Research

Independent benchmarks that test the limits of frontier models

At Deccan AI Research, our mission is to deeply understand, evaluate, and advance the science of foundation models, providing clearer insights to the AI community.


From Compliance to Foresight: Benchmarking Deep Research Agents

This study examines the proactive research capabilities of SOTA models using novel rubrics.

IF Benchmark: Constraint Choice Predicts Failure Rate Better Than Model Choice

This study evaluates instruction-following reliability in frontier models under controlled, multi-constraint conditions.

Image Generation Study: A Deeper Assessment of SOTA Image Models

We introduce a new framework for end-to-end evaluation of next-generation image generation models.

Anthar Study: Evaluating AI Coding Agents Beyond Benchmarks

A human evaluation benchmark measuring how well autonomous coding agents perform in real development workflows and security-critical scenarios.

WebApp Study: Evaluating AI-Native Web Application Builders

A human evaluation benchmark measuring how well no-code platforms handle real application development tasks.

