By Domain

LLM agents built for specific application areas and industries

These pages cover agents designed for particular domains, where combining LLM reasoning with domain-specific knowledge, tools, and evaluation criteria creates distinct research challenges.

💻

Coding Agents

From autocomplete to autonomous software engineering — SWE-agent, Devin, Claude Code, Cursor, OpenHands, benchmarks (SWE-bench, HumanEval, LiveCodeBench), agent-computer interfaces, and the architecture of edit-test-debug loops.

🔬

Science & Research Agents

FutureHouse Platform, Google AI Co-Scientist, PaperCoder, SkyRL, METR, and the frontier of agents built specifically for scientific discovery.

📊

Data Science & Analytics Agents

Agents that explore, analyze, and model data: Code Interpreter, Data Interpreter, DS-Agent, AIDE (Kaggle bronze medals), MLAgentBench, LIDA visualization, NL-to-SQL (Spider, BIRD), and the distinct challenge of data correctness as opposed to code correctness.

🏥

Domain-Specific Agents

AI agents built for professional domains — medical (Med-PaLM, MedAgents, radiology), legal (LegalBench, Harvey, hallucinated citations), financial (BloombergGPT, FinanceBench, trading agents), customer service, chemistry, and the cross-cutting challenge of high-stakes hallucination.

Looking for cross-cutting themes? See Topics →