How to find an apprenticeship?

We provide an official service to search through available apprenticeships. To get started, create an account here and specify your desired region and preferences. You will then be able to search through all officially registered open apprenticeships.

You can contact the apprenticeship office through our official phone hotline above or through the web form below. We generally respond to written requests within 7-10 days.

Data Samples and Off-the-Shelf Datasets

Built on our deep expertise in GenAI use cases and powered by exceptional raters, these datasets are designed to improve your model's performance.

Why Deccan's datasets

Model-grounded, high-touch, real-world data

End-to-end dataset pipeline

Expert, domain-trained annotators

Proven scalability and reliability

Curated Data Samples

Doc Intelligence

Datasets spanning a wide range of document types, with structured outputs and grounded responses for evaluating retrieval accuracy, numerical fidelity, and document-based reasoning.

Coding

Programming tasks spanning algorithms, APIs, debugging, and refactoring, designed to assess correctness, efficiency, tool use, and structured problem decomposition.

STEM

Problem sets across physics, chemistry, mathematics, and biology (PCMB) that evaluate quantitative reasoning, derivations, conceptual clarity, and step-by-step solution accuracy at varied difficulty levels.

Multimodal Data

Supervised fine-tuning datasets for multimodal tasks, including transformation, enhancement, and style adaptation, with aligned inputs and outputs to evaluate temporal consistency and visual fidelity.

Deep Research

Subject-agnostic research assignments requiring structured exploration, multi-source synthesis, critical comparison, and evidence-backed conclusions to assess long-horizon reasoning and analytical depth.

Agentic AI

Interactive task datasets spanning mobile and browser interfaces, evaluating planning, tool use, state tracking, and reliable multi-step action execution in dynamic environments.

Multimodal QA / Data Interpretation SFT

This dataset supports use cases such as market research and analytics over infographics. Each data point consists of one or more vivid images (infographics, reports, charts), a prompt (a complex analytical question about the image), and a detailed, step-by-step response (the answer).
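As a rough illustration, one data point of this kind could be held in a record like the one below. The field names (`images`, `prompt`, `response`), the image path, and the JSON Lines export are assumptions made for this sketch; they are not the dataset's published schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MultimodalQARecord:
    """Hypothetical record layout; field names are illustrative only."""
    images: list[str]  # assumed: paths or URLs of infographics, reports, or charts
    prompt: str        # the analytical question about the image(s)
    response: str      # the detailed, step-by-step answer

    def validate(self) -> None:
        """Raise ValueError if a required part of the data point is missing."""
        if not self.images:
            raise ValueError("a data point needs at least one image")
        if not self.prompt.strip() or not self.response.strip():
            raise ValueError("prompt and response must be non-empty")

    def to_jsonl(self) -> str:
        """Validate, then serialize the record as one JSON Lines row."""
        self.validate()
        return json.dumps(asdict(self), ensure_ascii=False)


# Hypothetical example data point.
record = MultimodalQARecord(
    images=["charts/quarterly_revenue.png"],
    prompt="Which quarter shows the largest revenue growth, and by how much?",
    response="Step 1: Read each quarter's revenue from the chart. Step 2: ...",
)
print(record.to_jsonl())
```

Keeping validation inside `to_jsonl` means a malformed data point fails at export time rather than later during fine-tuning.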

Indic languages SFT

This is a high-quality, single-turn Indic-language LLM fine-tuning dataset that enables general-purpose LLMs to extend their multilingual capabilities.

Super-Pristine Single Shot Text2SQL SFT

This is a high-quality, single-shot Text2SQL LLM fine-tuning dataset that enables conversational business intelligence use cases. Each data point consists of a natural-language question (NLQ) paired with a SQL query: the NLQ poses a complex business-insight question over the database, and the SQL query answers it.
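To make the NLQ/SQL pairing concrete, here is a minimal sketch that stores one pair and checks it by running the query against a toy database. The `orders` table, its rows, and the `Text2SQLPair` class are hypothetical; they only illustrate the shape of a pair and one way to sanity-check that its SQL actually answers its question.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Text2SQLPair:
    """One hypothetical NLQ/SQL data point (field names are illustrative)."""
    nlq: str
    sql: str


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("North", 120.0), ("South", 80.0), ("North", 40.0), ("West", 150.0)],
    )
    return conn


def run_pair(conn: sqlite3.Connection, pair: Text2SQLPair) -> list[tuple]:
    """Execute the pair's SQL and return every result row."""
    return conn.execute(pair.sql).fetchall()


pair = Text2SQLPair(
    nlq="Which region has the highest total order value?",
    sql=(
        "SELECT region, SUM(amount) AS total FROM orders "
        "GROUP BY region ORDER BY total DESC LIMIT 1"
    ),
)

conn = build_demo_db()
print(run_pair(conn, pair))  # [('North', 160.0)]
```

Executing each query against a reference schema like this is one common way to catch SQL that parses but does not answer the question it is paired with.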

Indic languages Code Switching

This is a high-quality, single-turn Indic-language LLM RLHF dataset for code switching, which enables general-purpose LLMs to extend their multilingual capabilities.

Preference Ranking

This is a high-quality RLHF dataset for PPO, designed to address end-consumer use cases across various domains. Each data point consists of a challenging prompt and a pair of responses generated by two different large language models (LLMs).
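In a typical PPO-based RLHF pipeline, response pairs like these are first used to train a reward model, and PPO then optimizes the policy against that reward model. The sketch below shows a common choice for the reward-model step, the pairwise (Bradley-Terry) loss -log(sigmoid(r_chosen - r_rejected)). The `PreferencePair` layout, and especially the `preferred` label, are assumptions for illustration; the description above does not say how the ranking is stored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical layout: a prompt, two responses from different LLMs, and a label."""
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # assumed label: "a" or "b"

    def chosen_rejected(self) -> tuple[str, str]:
        """Return (chosen, rejected) according to the assumed `preferred` label."""
        if self.preferred == "a":
            return self.response_a, self.response_b
        if self.preferred == "b":
            return self.response_b, self.response_a
        raise ValueError(f"preferred must be 'a' or 'b', got {self.preferred!r}")


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign to avoid exp overflow
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


pair = PreferencePair(
    prompt="Explain compound interest to a teenager.",
    response_a="Response from model A ...",
    response_b="Response from model B ...",
    preferred="b",
)
chosen, rejected = pair.chosen_rejected()
# A reward model that scores the chosen response higher receives a smaller loss.
print(pairwise_loss(2.0, 0.5), pairwise_loss(0.5, 2.0))
```

The loss equals log 2 when both responses get the same reward and shrinks toward zero as the chosen response's reward pulls ahead.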

QA in Finance

This dataset uses financial documents from U.S. companies, including annual reports (Form 10-K), annual general meeting materials, and quarterly reports. To simulate real-world scenarios, annotators who are finance experts carefully designed questions that reflect the needs of different financial companies and varying levels of financial literacy.