Field Timeline

Key milestones in LLM agent research and deployment, 2021–2026

A consolidated chronology of the major papers, releases, frameworks, benchmarks, products, and protocols that shaped the field.

2021

Date	Event	Significance
Dec 2021	WebGPT (OpenAI)	First major browser-using LLM agent; RLHF for browsing behavior

2022

Date	Event	Significance
Feb 2022	Chain-of-Thought Prompting (Wei et al.)	Foundation for much of the later LLM reasoning work; step-by-step prompting unlocks emergent capability
Apr 2022	SayCan (Google)	Embodied agent: LLM proposes actions, affordance model filters by physical feasibility
May 2022	MRKL Systems (Karpas et al.)	LLM as orchestrator of specialized symbolic modules — ancestor of tool-calling
May 2022	Zero-Shot CoT (“Let’s think step by step”)	Remarkably simple — adding one sentence enables multi-step reasoning
Jun 2022	Self-Consistency (Wang et al.)	Sample many reasoning paths, majority vote; significant reliability boost
Jul 2022	Inner Monologue (Huang et al.)	Robots form natural-language inner monologues for planning with environment feedback
Oct 2022	ReAct (Yao et al.)	The canonical agent loop: interleaved Thought / Action / Observation
Oct 2022	Decomposed Prompting (Khot et al.)	Modular sub-prompt decomposition

2023 (Early)

Date	Event	Significance
Jan 2023	Least-to-Most Prompting (Zhou et al.)	Sequential subproblem decomposition; better generalization than CoT
Feb 2023	Toolformer (Schick et al.)	Self-supervised tool learning — LLMs teach themselves to call APIs
Mar 2023	CAMEL (Li et al.)	First major role-playing communicative multi-agent paper
Mar 2023	HuggingGPT / JARVIS (Shen et al.)	ChatGPT orchestrates hundreds of Hugging Face models
Mar 2023	AutoGPT goes viral	Explosive public interest in autonomous agents; exposed failure modes
Mar 2023	GPT-4 released	Capability threshold enabling reliable agent reasoning
Apr 2023	Generative Agents (Park et al.)	25 agents in a simulated town; foundational episodic memory architecture
Apr 2023	BabyAGI (Nakajima)	Minimalist task management agent; viral open-source project
Apr 2023	Reflexion (Shinn et al.)	Verbal RL: agents reflect on failures; HumanEval 80%→91% (GPT-4 baseline to Reflexion)
May 2023	Tree of Thoughts (Yao et al.)	Branching search over reasoning steps; Game of 24: 4%→74%
May 2023	Reasoning via Planning / RAP (Hao et al.)	LLM as world model for MCTS-based planning
May 2023	Voyager (Wang et al.)	GPT-4 in Minecraft with lifelong skill library accumulation
May 2023	Self-Refine (Madaan et al.)	Iterative self-critique without additional training
May 2023	Gorilla LLM (Patil et al.)	Fine-tuned LLM for reliable API calling
May 2023	Large Language Models as Tool Makers (Cai et al.)	Agents create reusable tools

2023 (Mid–Late)

Date	Event	Significance
Jun 2023	OpenAI function calling	Industry-standard tool use API; enabled the framework ecosystem
Jun 2023	LLM Powered Autonomous Agents (Lilian Weng)	Widely cited blog-post introduction to the field
Jun 2023	CRITIC (Gou et al.)	Tool-grounded self-critique — verification via web search and code execution
Jun 2023	RestGPT	REST API orchestration
Jul 2023	ChatDev (Qian et al.)	Multi-agent software development team
Jul 2023	ToolLLM (Qin et al.)	16,000+ REST API coverage
Jul 2023	WebArena (Zhou et al.)	5-site web benchmark (+ Wikipedia); GPT-4 baseline ~14%
Aug 2023	MetaGPT (Hong et al.)	SOP-driven multi-agent software engineering
Aug 2023	AutoGen (Wu et al., Microsoft)	Conversable agents framework
Aug 2023	AgentVerse (Chen et al.)	Multi-agent dynamics study; first systematic failure mode analysis
Aug 2023	AgentBench (Liu et al.)	8-task multi-domain agent benchmark
Aug 2023	A Survey on Large Language Model based Autonomous Agents (Wang et al., 2308.11432)	First comprehensive survey; 200+ papers, Profile/Memory/Planning/Action framework
Sep 2023	The Rise and Potential of LLM-Based Agents (Xi et al., 2309.07864)	86-page survey; Brain/Perception/Action framework; agent societies vision
Sep 2023	Cognitive Architectures for Language Agents (Sumers et al.)	ACT-R/SOAR grounding for agent memory
Oct 2023	Reflexion final paper (NeurIPS)
Oct 2023	MemGPT (Packer et al.)	LLM as OS: hierarchical memory paging; gives the appearance of unbounded context
Oct 2023	Step-Back Prompting	Abstract before solving; +7% MMLU Physics, +27% TimeQA (PaLM-2L)
Oct 2023	Self-RAG (Asai et al.)	Selective retrieval with self-critique
Oct 2023	SWE-bench (Jimenez et al.)	GitHub issue resolution benchmark; defines the coding agent race
Nov 2023	GAIA: A Benchmark for General AI Assistants (Mialon et al.)	General AI assistant tasks; humans 92%, GPT-4+plugins ~15%
Nov 2023	DyLAN	Dynamic agent selection per reasoning step
Dec 2023	CogAgent, AppAgent	Specialist GUI models for mobile/desktop

2024 (Early)

Date	Event	Significance
Jan 2024	AgentScope (Alibaba)	Production-oriented multi-agent platform
Feb 2024	CrewAI released	Role-based agent framework; high-level crew abstraction
Feb 2024	ReadAgent (Google)	Gist memory extending effective context 3.5–20×
Feb 2024	CodeAct (Wang et al.)	Code as unified action space; more expressive than discrete tool calls
Feb 2024	AnyTool	Hierarchical API selection from thousands of options
Feb 2024	OS-Copilot	General computer control agent
Mar 2024	Devin (Cognition AI)	“First AI software engineer”; validates commercial demand; initial SWE-bench claims later revised
Apr 2024	OSWorld (Xie et al.)	Desktop GUI tasks across real apps; best baseline agent ~12% vs. ~72% for humans
Apr 2024	SWE-bench Pro (Scale)	Harder private-codebase variant to combat benchmark overfitting
May 2024	SWE-agent (Yang et al., Princeton)	Open-source coding agent; ~12% on SWE-bench; introduces agent-computer interface (ACI) design

2024 (Mid–Late)

Date	Event	Significance
Jun 2024	LangGraph gains production adoption	Graph-based stateful agent workflows
Aug 2024	Plaat et al. survey starts circulating	Reason–Act–Interact taxonomy; published JAIR Dec 2025
Aug 2024	Scaling LLM Test-Time Compute (Snell et al.)	Targeted compute scaling can beat a larger model
Oct 2024	Anthropic Computer Use	First major LLM provider with native computer control
Oct 2024	OpenAI Swarm (experimental)	Minimal multi-agent handoff framework
Nov 2024	MCP (Model Context Protocol) (Anthropic)	Open standard: universal tool/data connection for agents
Dec 2024	Building Effective Agents (Anthropic)	Defining practitioner post; workflows vs. agents distinction; 5 workflow patterns

2025

Date	Event	Significance
Jan 2025	DeepSeek-R1	Open RL-trained reasoning model matching o1 on math; “aha moments” emerge from GRPO (arXiv:2501.12948)
Jan 2025	OpenAI Operator launch	Browser-use agent; CUA (Computer-Using Agent) model
Jan 2025	Goose (Block / Jack Dorsey)	Open-source agent framework; extensible, LLM-agnostic
Feb 2025	OpenAI Deep Research	o3-powered web research agent; hours of research in minutes
Mar 2025	Manus AI launch	General-purpose autonomous agent; described as “turning point”; acquired by Meta Dec 2025
Mar 2025	METR time horizons paper	Agent capability doubling every 7 months; Claude 3.7 Sonnet at ~1hr horizon
Apr 2025	Google A2A (Agent2Agent) protocol	Agent-to-agent communication standard; donated to Linux Foundation Jun 2025
Apr 2025	PaperCoder (Seo et al., ICLR 2026)	Multi-agent framework: ML papers → working code repositories
May 2025	GitHub Copilot Coding Agent	Autonomous coding in VS Code, Xcode, JetBrains, Eclipse
Jun 2025	A2A → Linux Foundation	Neutral governance for agent interoperability
Jul 2025	ChatGPT Agent	Operator + Deep Research merged into unified general-purpose agent
Sep 2025	Claude Agent SDK	Claude Code generalized to full agent harness
Oct 2025	Microsoft Agent Framework	AutoGen + Semantic Kernel merged; production-ready
Oct 2025	GitHub Agent HQ (Universe 2025)	Unified agent orchestration within GitHub
Nov 2025	LangGraph v1.0	Production-ready release; enterprise adoption accelerates
Nov 2025	Gemini 3 Pro + Live SWE-agent: 77.4%	Major SWE-bench Verified milestone
Dec 2025	Meta acquires Manus AI
Dec 2025	Plaat et al. published in JAIR	Peer-reviewed survey; Reason–Act–Interact taxonomy
Dec 2025	Google Cloud “Lessons from 2025”	Production retrospective; agent undo stacks, reversibility as design principle

2026 (to March)

Date	Event	Significance
Jan 2026	Multiple 2025 retrospectives	Industry-wide reflection on agent deployment learnings
Feb 2026	MIT AI Agent Index published	30 agents documented; transparency gap exposed; autonomy levels mapped
Mar 2026	Anthropic Code Review for Claude Code	Parallel multi-agent PR review; multi-agent coding goes mainstream
Mar 2026	This survey compiled

References

Foundational Papers (2021–2022)

WebGPT: Browser-assisted question-answering with human feedback (Nakano et al., 2021) — arXiv:2112.09332
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — arXiv:2201.11903
MRKL Systems: A modular, neuro-symbolic architecture (Karpas et al., 2022) — arXiv:2205.00445
Large Language Models are Zero-Shot Reasoners (Kojima et al., 2022) — arXiv:2205.11916
Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022) — arXiv:2203.11171
SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (Ahn et al., 2022) — arXiv:2204.01691
Inner Monologue: Embodied Reasoning through Planning with Language Models (Huang et al., 2022) — arXiv:2207.05608
ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2023) — arXiv:2210.03629
Decomposed Prompting: A Modular Approach for Solving Complex Tasks (Khot et al., 2022) — arXiv:2210.02406

Early 2023 Breakthroughs

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (Zhou et al., 2023) — arXiv:2205.10625
Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023) — arXiv:2302.04761
CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society (Li et al., 2023) — arXiv:2303.17760
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (Shen et al., 2023) — arXiv:2303.17580
Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023) — arXiv:2303.11366
Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023) — arXiv:2304.03442
BabyAGI (Nakajima, 2023) — GitHub Repository
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Yao et al., 2023) — arXiv:2305.10601
RAP: Reasoning with Language Model is Planning with World Model (Hao et al., 2023) — arXiv:2305.14992
Voyager: An Open-Ended Embodied Agent with Large Language Models (Wang et al., 2023) — arXiv:2305.16291
Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023) — arXiv:2303.17651
Gorilla: Large Language Model Connected with Massive APIs (Patil et al., 2023) — arXiv:2305.15334
Large Language Models as Tool Makers (Cai et al., 2023) — arXiv:2305.17126

Mid–Late 2023 Frameworks & Systems

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Gou et al., 2023) — arXiv:2305.11738
RestGPT: Connecting Large Language Models with Real-World RESTful APIs (Song et al., 2023) — arXiv:2306.06624
ChatDev: Communicative Agents for Software Development (Qian et al., 2023) — arXiv:2307.07924
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (Qin et al., 2023) — arXiv:2307.16789
WebArena: A Realistic Web Environment for Building Autonomous Agents (Zhou et al., 2023) — arXiv:2307.13854
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (Hong et al., 2023) — arXiv:2308.00352
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Wu et al., 2023) — arXiv:2308.08155
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents (Chen et al., 2023) — arXiv:2308.10848
A Survey on Large Language Model based Autonomous Agents (Wang et al., 2023) — arXiv:2308.11432
AgentBench: Evaluating LLMs as Agents (Liu et al., 2023) — arXiv:2308.03688
The Rise and Potential of Large Language Model Based Agents: A Survey (Xi et al., 2023) — arXiv:2309.07864
Cognitive Architectures for Language Agents (Sumers et al., 2023) — arXiv:2309.02427
MemGPT: Towards LLMs as Operating Systems (Packer et al., 2023) — arXiv:2310.08560
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models (Zheng et al., 2023) — arXiv:2310.06117
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al., 2023) — arXiv:2310.11511
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Jimenez et al., 2023) — arXiv:2310.06770
GAIA: A Benchmark for General AI Assistants (Mialon et al., 2023) — arXiv:2311.12983
DyLAN: Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization (Liu et al., 2023) — arXiv:2310.02779
CogAgent: A Visual Language Model for GUI Agents (Hong et al., 2023) — arXiv:2312.08914
AppAgent: Multimodal Agents as Smartphone Users (Zhang et al., 2023) — arXiv:2312.13771

2024 Production & Scale

AgentScope: A Flexible yet Robust Multi-Agent Platform (Alibaba, 2024) — arXiv:2402.14034
CrewAI (2024) — GitHub
ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (Google Research, 2024) — arXiv:2402.09727
CodeAct: Executable Code Actions Elicit Better LLM Agents (Wang et al., 2024) — arXiv:2402.01030
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls (Du et al., 2024) — arXiv:2402.04253
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (Wu et al., 2024) — arXiv:2402.07456
Devin: AI Software Engineer (Cognition AI, 2024) — Website
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (Xie et al., 2024) — arXiv:2404.07972
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (Yang et al., 2024) — arXiv:2405.15793
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Snell et al., 2024) — arXiv:2408.03314

2024–2025 Frameworks & Standards

LangGraph (LangChain, 2024) — Documentation
Anthropic Computer Use (October 2024) — Research Post
OpenAI Swarm (October 2024) — GitHub Repository
Model Context Protocol (MCP) (Anthropic, November 2024) — Website
Building Effective Agents (Anthropic, December 2024) — Blog Post
Agentic Large Language Models, a survey (Plaat et al., 2025, JAIR) — arXiv:2503.23037

2025 Launches & Milestones

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek, January 2025) — GitHub
OpenAI Operator (January 2025) — Website
Goose (Block / Jack Dorsey, January 2025) — GitHub
Google A2A (Agent2Agent) Protocol (April 2025) — Google Cloud Blog
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning (Seo et al., ICLR 2026) — arXiv:2504.17192
GitHub Copilot Coding Agent (May 2025) — Blog Post
GitHub Agent HQ (October 2025, Universe 2025) — Blog Post
LangGraph v1.0 (November 2025) — Documentation

Blog Posts & Resources

LLM Powered Autonomous Agents — Lilian Weng, 2023 — Blog
Building Effective Agents — Anthropic, 2024 — Blog Post
Measuring AI Ability to Complete Long Tasks — METR, March 2025 — Blog
MIT AI Agent Index — February 2026 — Website

For a conceptual map of how these fit together, see the Taxonomy →. For deeper coverage by topic, use the navigation above.