By Domain
LLM agents built for specific application areas and industries
These pages cover agents designed for particular domains — where the combination of LLM reasoning with domain-specific knowledge, tools, and evaluation criteria creates distinct research challenges.
Coding Agents
From autocomplete to autonomous software engineering — SWE-agent, Devin, Claude Code, Cursor, OpenHands, benchmarks (SWE-bench, HumanEval, LiveCodeBench), agent-computer interfaces, and the architecture of edit-test-debug loops.
Science & Research Agents
FutureHouse Platform, Google AI Co-Scientist, PaperCoder, SkyRL, METR, and the frontier of agents built specifically for scientific discovery.
Data Science & Analytics Agents
Agents that explore, analyze, and model data — Code Interpreter, Data Interpreter, DS-Agent, AIDE (Kaggle bronze medals), MLAgentBench, LIDA visualization, NL-to-SQL (Spider, BIRD), and the distinct challenge of ensuring data correctness rather than just code correctness.
Domain-Specific Agents
AI agents built for professional domains — medical (Med-PaLM, MedAgents, radiology), legal (LegalBench, Harvey, hallucinated citations), financial (BloombergGPT, FinanceBench, trading agents), customer service, chemistry, and the cross-cutting challenge of high-stakes hallucination.
Looking for cross-cutting themes? See Topics →