Field Timeline

Key milestones in LLM agent research and deployment, 2021–2026

A consolidated chronology of the major papers, releases, frameworks, benchmarks, products, and protocols that shaped the field.


2021

Date | Event | Significance
Dec 2021 | WebGPT (OpenAI) | First major browser-using LLM agent; RLHF for browsing behavior

2022

Date | Event | Significance
Feb 2022 | Chain-of-Thought Prompting (Wei et al.) | Foundation for much subsequent LLM reasoning work; step-by-step prompting elicits multi-step reasoning in large models
Apr 2022 | SayCan (Google) | Embodied agent: LLM proposes actions, affordance model filters by physical feasibility
May 2022 | MRKL Systems (Karpas et al.) | LLM as orchestrator of specialized symbolic modules; ancestor of tool calling
May 2022 | Zero-Shot CoT (“Let’s think step by step”) | Remarkably simple: adding one sentence to the prompt elicits multi-step reasoning
Jun 2022 | Self-Consistency (Wang et al.) | Sample many reasoning paths and take a majority vote over the final answers; significant reliability boost
Jul 2022 | Inner Monologue (Huang et al.) | Robots form natural-language inner monologues for planning with environment feedback
Oct 2022 | ReAct (Yao et al.) | The canonical agent loop: interleaved Thought / Action / Observation
Oct 2022 | Decomposed Prompting (Khot et al.) | Modular sub-prompt decomposition
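
The Self-Consistency entry above (sample several reasoning paths, then majority-vote) can be illustrated with a minimal sketch. This is not the paper's implementation: `self_consistent_answer` and `fake_sampler` are hypothetical names, and the sampler is a canned stand-in for a model call that would normally return the final answer extracted from one sampled reasoning path.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample_answer: Callable[[], str], n_samples: int = 5) -> str:
    """Draw n_samples final answers and return the most frequent one."""
    answers = [sample_answer() for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)] for the top-voted answer.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for an LLM: each call yields the final answer
# from one independently sampled reasoning path.
_canned_answers = iter(["18", "17", "18", "18", "20"])


def fake_sampler() -> str:
    return next(_canned_answers)


result = self_consistent_answer(fake_sampler, n_samples=5)
print(result)  # prints 18 (three of the five sampled answers agree)
```

In practice the sampler would call a model with a nonzero temperature so that the reasoning paths actually differ; only the final answers are compared, not the intermediate steps.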

2023 (Early)

Date | Event | Significance
Jan 2023 | Least-to-Most Prompting (Zhou et al.) | Sequential subproblem decomposition; better generalization than CoT
Feb 2023 | Toolformer (Schick et al.) | Self-supervised tool learning: LLMs teach themselves to call APIs
Mar 2023 | CAMEL (Li et al.) | First major role-playing communicative multi-agent paper
Mar 2023 | HuggingGPT / JARVIS (Shen et al.) | ChatGPT orchestrates hundreds of Hugging Face models
Mar 2023 | AutoGPT goes viral | Explosive public interest in autonomous agents; exposed failure modes
Mar 2023 | GPT-4 released | Capability threshold enabling reliable agent reasoning
Apr 2023 | Generative Agents (Park et al.) | 25 agents in a simulated town; foundational episodic memory architecture
Apr 2023 | BabyAGI (Nakajima) | Minimalist task management agent; viral open-source project
Apr 2023 | Reflexion (Shinn et al.) | Verbal RL: agents reflect on failures; HumanEval 80%→91% (GPT-4 baseline to Reflexion)
May 2023 | Tree of Thoughts (Yao et al.) | Branching search over reasoning steps; Game of 24: 4%→74%
May 2023 | Reasoning via Planning / RAP (Hao et al.) | LLM as world model for MCTS-based planning
May 2023 | Voyager (Wang et al.) | GPT-4 in Minecraft with lifelong skill library accumulation
May 2023 | Self-Refine (Madaan et al.) | Iterative self-critique without additional training
May 2023 | Gorilla LLM (Patil et al.) | Fine-tuned LLM for reliable API calling
May 2023 | Large Language Models as Tool Makers (Cai et al.) | Agents create reusable tools

2023 (Mid–Late)

Date | Event | Significance
Jun 2023 | OpenAI function calling | Industry-standard tool use API; enabled the framework ecosystem
Jun 2023 | LLM Powered Autonomous Agents (Lilian Weng) | Most-cited blog post introduction to the field
Jun 2023 | CRITIC (Gou et al.) | Tool-grounded self-critique: verification via web search and code execution
Jun 2023 | RestGPT | REST API orchestration
Jul 2023 | ChatDev (Qian et al.) | Multi-agent software development team
Jul 2023 | ToolLLM (Qin et al.) | 16,000+ REST API coverage
Jul 2023 | WebArena (Zhou et al.) | 5-site web benchmark (+ Wikipedia); GPT-4 baseline ~14%
Aug 2023 | MetaGPT (Hong et al.) | SOP-driven multi-agent software engineering
Aug 2023 | AutoGen (Wu et al., Microsoft) | Conversable agents framework
Aug 2023 | AgentVerse (Chen et al.) | Multi-agent dynamics study; first systematic failure mode analysis
Aug 2023 | AgentBench (Liu et al.) | 8-task multi-domain agent benchmark
Aug 2023 | A Survey on Large Language Model based Autonomous Agents (Wang et al., 2308.11432) | First comprehensive survey; 200+ papers, Profile/Memory/Planning/Action framework
Sep 2023 | The Rise and Potential of LLM-Based Agents (Xi et al., 2309.07864) | 86-page survey; agent societies vision
Sep 2023 | Cognitive Architectures for Language Agents (Sumers et al.) | ACT-R/SOAR grounding for agent memory
Oct 2023 | Reflexion final paper (NeurIPS) |
Oct 2023 | MemGPT (Packer et al.) | LLM as OS: hierarchical memory paging to extend effective context beyond the model's window
Oct 2023 | Step-Back Prompting | Abstract before solving; +7% MMLU Physics, +27% TimeQA (PaLM-2L)
Oct 2023 | Self-RAG (Asai et al.) | Selective retrieval with self-critique
Oct 2023 | SWE-bench (Jimenez et al.) | GitHub issue resolution benchmark; defines the coding agent race
Nov 2023 | GAIA: A Benchmark for General AI Assistants (Mialon et al.) | General AI assistant tasks; humans 92%, GPT-4 with plugins ~15%
Nov 2023 | DyLAN | Dynamic agent selection per reasoning step
Dec 2023 | CogAgent, AppAgent | Specialist GUI models for mobile/desktop

2024 (Early)

Date | Event | Significance
Jan 2024 | AgentScope (Alibaba) | Production-oriented multi-agent platform
Feb 2024 | CrewAI released | Role-based agent framework; high-level crew abstraction
Feb 2024 | ReadAgent (Google) | Gist memory extending effective context 3.5–20×
Feb 2024 | CodeAct (Wang et al.) | Code as unified action space; more expressive than discrete tool calls
Feb 2024 | AnyTool | Hierarchical API selection from thousands of options
Feb 2024 | OS-Copilot | General computer control agent
Mar 2024 | Devin (Cognition AI) | “First AI software engineer”; validates commercial demand; initial SWE-bench claims later revised
Apr 2024 | OSWorld (Xie et al.) | Desktop GUI tasks across real apps; best initial agents reach only ~12% vs. ~72% for humans
Apr 2024 | SWE-bench Pro (Scale) | Harder private-codebase variant to combat benchmark overfitting
May 2024 | SWE-agent (Yang et al., Princeton) | Open-source coding agent; ~12% on SWE-bench; agent-computer interface (ACI) design

2024 (Mid–Late)

Date | Event | Significance
Jun 2024 | LangGraph gains production adoption | Graph-based stateful agent workflows
Aug 2024 | Plaat et al. survey starts circulating | Reason–Act–Interact taxonomy; published in JAIR Dec 2025
Aug 2024 | Scaling LLM Test-Time Compute (Snell et al.) | Well-allocated test-time compute can beat a larger model
Oct 2024 | Anthropic Computer Use | First major LLM provider with native computer control
Oct 2024 | OpenAI Swarm (experimental) | Minimal multi-agent handoff framework
Nov 2024 | MCP (Model Context Protocol) (Anthropic) | Open standard: universal tool/data connection for agents
Dec 2024 | Building Effective Agents (Anthropic) | Defining practitioner post; workflows vs. agents distinction; 5 workflow patterns

2025

Date | Event | Significance
Jan 2025 | DeepSeek-R1 | Open RL-trained reasoning model comparable to o1 on math; “aha moments” emerge from GRPO training (arXiv:2501.12948)
Jan 2025 | OpenAI Operator launch | Browser-use agent; CUA (Computer-Using Agent) model
Jan 2025 | Goose (Block / Jack Dorsey) | Open-source agent framework; extensible, LLM-agnostic
Feb 2025 | OpenAI Deep Research | o3-powered web research agent; hours of research in minutes
Mar 2025 | Manus AI launch | General-purpose autonomous agent; described as “turning point”; acquired by Meta Dec 2025
Mar 2025 | METR time horizons paper | Agent task time horizon doubling roughly every 7 months; Claude 3.7 Sonnet at ~1hr horizon (50% success)
Apr 2025 | Google A2A (Agent2Agent) protocol | Agent-to-agent communication standard; donated to Linux Foundation Jun 2025
Apr 2025 | PaperCoder (Seo et al., ICLR 2026) | Multi-agent framework: ML papers → working code repositories
May 2025 | GitHub Copilot Coding Agent | Autonomous coding in VS Code, Xcode, JetBrains, Eclipse
Jun 2025 | A2A → Linux Foundation | Neutral governance for agent interoperability
Jul 2025 | ChatGPT Agent | Operator + Deep Research merged into unified general-purpose agent
Sep 2025 | Claude Agent SDK | Claude Code generalized to full agent harness
Oct 2025 | Microsoft Agent Framework | AutoGen + Semantic Kernel merged; production-ready
Oct 2025 | GitHub Agent HQ (Universe 2025) | Unified agent orchestration within GitHub
Nov 2025 | LangGraph v1.0 | Production-ready release; enterprise adoption accelerates
Nov 2025 | Gemini 3 Pro + Live SWE-agent: 77.4% | Major SWE-bench Verified milestone
Dec 2025 | Meta acquires Manus AI |
Dec 2025 | Plaat et al. published in JAIR | Peer-reviewed survey; Reason–Act–Interact taxonomy
Dec 2025 | Google Cloud “Lessons from 2025” | Production retrospective; agent undo stacks, reversibility as design principle

2026 (to March)

Date | Event | Significance
Jan 2026 | Multiple 2025 retrospectives | Industry-wide reflection on agent deployment learnings
Feb 2026 | MIT AI Agent Index published | 30 agents documented; transparency gap exposed; autonomy levels mapped
Mar 2026 | Anthropic Code Review for Claude Code | Parallel multi-agent PR review; multi-agent coding goes mainstream
Mar 2026 | This survey compiled |


References

Foundational Papers (2021–2022)

  • WebGPT: Browser-assisted question-answering with human feedback (Nakano et al., 2021) — arXiv:2112.09332
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — arXiv:2201.11903
  • MRKL Systems: A modular, neuro-symbolic architecture (Karpas et al., 2022) — arXiv:2205.00445
  • Large Language Models are Zero-Shot Reasoners (Kojima et al., 2022) — arXiv:2205.11916
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022) — arXiv:2203.11171
  • SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (Ahn et al., 2022) — arXiv:2204.01691
  • Inner Monologue: Embodied Reasoning through Planning with Language Models (Huang et al., 2022) — arXiv:2207.05608
  • ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2023) — arXiv:2210.03629
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks (Khot et al., 2022) — arXiv:2210.02406

Early 2023 Breakthroughs

  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (Zhou et al., 2023) — arXiv:2205.10625

  • Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023) — arXiv:2302.04761

  • CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society (Li et al., 2023) — arXiv:2303.17760

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (Shen et al., 2023) — arXiv:2303.17580

  • Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023) — arXiv:2303.11366

  • Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023) — arXiv:2304.03442

  • BabyAGI (Nakajima, 2023) — GitHub Repository

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Yao et al., 2023) — arXiv:2305.10601

  • Reasoning with Language Model is Planning with World Model (RAP) (Hao et al., 2023) — arXiv:2305.14992

  • Voyager: An Open-Ended Embodied Agent with Large Language Models (Wang et al., 2023) — arXiv:2305.16291

  • Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023) — arXiv:2303.17651

  • Gorilla: Large Language Model Connected with Massive APIs (Patil et al., 2023) — arXiv:2305.15334

  • Large Language Models as Tool Makers (Cai et al., 2023) — arXiv:2305.17126

Mid–Late 2023 Frameworks & Systems

  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Gou et al., 2023) — arXiv:2305.11738

  • RestGPT: Connecting Large Language Models with Real-World RESTful APIs (Song et al., 2023) — arXiv:2306.06624

  • ChatDev: Communicative Agents for Software Development (Qian et al., 2023) — arXiv:2307.07924

  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (Qin et al., 2023) — arXiv:2307.16789

  • WebArena: A Realistic Web Environment for Building Autonomous Agents (Zhou et al., 2023) — arXiv:2307.13854

  • MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (Hong et al., 2023) — arXiv:2308.00352

  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Wu et al., 2023) — arXiv:2308.08155

  • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents (Chen et al., 2023) — arXiv:2308.10848

  • A Survey on Large Language Model based Autonomous Agents (Wang et al., 2023) — arXiv:2308.11432

  • AgentBench: Evaluating LLMs as Agents (Liu et al., 2023) — arXiv:2308.03688

  • The Rise and Potential of Large Language Model Based Agents: A Survey (Xi et al., 2023) — arXiv:2309.07864

  • Cognitive Architectures for Language Agents (Sumers et al., 2023) — arXiv:2309.02427

  • MemGPT: Towards LLMs as Operating Systems (Packer et al., 2023) — arXiv:2310.08560

  • Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models (Zheng et al., 2023) — arXiv:2310.06117

  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al., 2023) — arXiv:2310.11511

  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Jimenez et al., 2023) — arXiv:2310.06770

  • GAIA: A Benchmark for General AI Assistants (Mialon et al., 2023) — arXiv:2311.12983

  • DyLAN: Dynamic LLM-Agent Network (Liu et al., 2023) — arXiv:2310.02779

  • CogAgent: A Visual Language Model for GUI Agents (Hong et al., 2023) — arXiv:2312.08914

  • AppAgent: Multimodal Agents as Smartphone Users (Zhang et al., 2023) — arXiv:2312.13771

2024 Production & Scale

  • AgentScope: A Flexible yet Robust Multi-Agent Platform (Alibaba, 2024) — arXiv:2402.14034
  • CrewAI (2024) — GitHub
  • ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (Google Research, 2024) — arXiv:2402.09727
  • CodeAct: Executable Code Actions Elicit Better LLM Agents (Wang et al., 2024) — arXiv:2402.01030
  • AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls (Du et al., 2024) — arXiv:2402.04253
  • OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (Wu et al., 2024) — arXiv:2402.07456
  • Devin: AI Software Engineer (Cognition AI, 2024) — Website
  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (Xie et al., 2024) — arXiv:2404.07972
  • SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (Yang et al., 2024) — arXiv:2405.15793
  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Snell et al., 2024) — arXiv:2408.03314

2024–2025 Frameworks & Standards

  • LangGraph (LangChain, 2024) — Documentation
  • Anthropic Computer Use (October 2024) — Research Post
  • OpenAI Swarm (October 2024) — GitHub Repository
  • Model Context Protocol (MCP) (Anthropic, November 2024) — Website
  • Building Effective Agents (Anthropic, December 2024) — Blog Post
  • Agentic Large Language Models, a survey (Plaat et al., 2025, JAIR) — arXiv:2503.23037

2025 Launches & Milestones

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek, January 2025) — arXiv:2501.12948
  • OpenAI Operator (January 2025) — Website
  • Goose (Block / Jack Dorsey, January 2025) — GitHub
  • Google A2A (Agent2Agent) Protocol (April 2025) — Google Cloud Blog
  • PaperCoder / Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning (Seo et al., ICLR 2026) — arXiv:2504.17192
  • GitHub Copilot Coding Agent (May 2025) — Blog Post
  • GitHub Agent HQ (October 2025, Universe 2025) — Blog Post
  • LangGraph v1.0 (November 2025) — Documentation

Blog Posts & Resources

  • LLM Powered Autonomous Agents — Lilian Weng, 2023 — Blog
  • Building Effective Agents — Anthropic, 2024 — Blog Post
  • METR’s Time Horizons of AI Agents — March 2025 — Blog
  • MIT AI Agent Index — February 2026 — Website
