Foundations (2022–2023)
The papers and ideas that built the field
Overview
The modern LLM agent field traces to a cluster of papers from 2022–2023 that established the core paradigms still in use today. Before these works, LLMs were primarily used as single-shot text generators. This period showed they could act — using tools, browsing the web, executing code, and reasoning across multiple steps toward a goal.
The foundational insight: an LLM becomes an agent when you give it a loop — the ability to observe outcomes of its actions and decide what to do next.
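That loop can be sketched in a few lines. Below is a minimal, illustrative version in which `llm_decide` and `run_tool` are hypothetical stubs standing in for a real model call and a real tool:

```python
# Minimal agent loop: act, observe the outcome, decide again.
# `llm_decide` and `run_tool` are hypothetical stubs, not a real API.

def llm_decide(goal, history):
    # Stub policy: search once, then finish with the last observation.
    if not history:
        return ("search", goal)
    return ("finish", history[-1][1])

def run_tool(action, arg):
    # Stub tool: a real agent would call a search API, code runner, etc.
    return f"result for {arg!r}"

def agent_loop(goal, max_steps=5):
    history = []  # list of (action, observation) pairs
    for _ in range(max_steps):
        action, arg = llm_decide(goal, history)
        if action == "finish":
            return arg
        observation = run_tool(action, arg)
        history.append((action, observation))
    return None

print(agent_loop("population of France"))
```

The single design choice that matters is that each iteration feeds the previous observation back into the decision, which is exactly what single-shot generation lacks.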
Survey & Taxonomy Papers
These surveys provide the broadest maps of the field:
A Survey on Large Language Model based Autonomous Agents (2023)
Wang et al. · arXiv:2308.11432 · Frontiers of Computer Science, 2024
The most comprehensive survey of the field. Covers 200+ papers and presents a unified framework decomposing agents into four pillars: Brain (LLM), Perception, Memory, and Action. Analyzes applications across social science, natural science, and engineering. Actively maintained through v7 (March 2025).
- Key ideas: Unified architecture spanning majority of prior work; systematic taxonomy by application domain; evaluation methodology discussion
- Impact: De facto reference paper for new entrants to the field
The Rise and Potential of Large Language Model Based Agents (2023)
Xi et al. (29 co-authors) · arXiv:2309.07864 · Science China Information Sciences, 2025
An 86-page survey tracing agents from philosophical origins through modern AI. Uses a brain-perception-action framework. Explores single-agent, multi-agent, and human-agent cooperation. Discusses agent societies, emergent behaviors, and the path toward AGI.
- Key ideas: Historical and philosophical grounding; vision for agent societies; multi-agent coordination patterns
Agent AI: Surveying the Horizons of Multimodal Interaction (2024)
Durante et al. · arXiv:2401.03568
Expands scope to multimodal agents operating across text, vision, and action spaces. Covers gaming, robotics, healthcare, GUI navigation. A useful bridge between language-only and embodied/multimodal agents.
Large Language Model Agent: A Survey on Methodology, Applications and Challenges (2025)
Luo et al. · arXiv:2503.21460
Most recent comprehensive survey (March 2025) with structured taxonomy for LLM agent methodology. Catalogs current applications and synthesizes open challenges.
Agentic Large Language Models: A Survey (2025)
Plaat, van Duijn, van Stein, Preuss, van der Putten, Batenburg (Leiden University) · arXiv:2503.23037 · JAIR Vol. 84, December 2025 · companion website
A major peer-reviewed survey published in the Journal of Artificial Intelligence Research. Organizes the field around a distinctive Reason–Act–Interact taxonomy, offering a cleaner alternative to component-based frameworks:
- Reason: Multi-step reasoning (CoT, search trees), self-reflection, retrieval augmentation
- Act: Action models (world models, VLA models), robots and tools, domain assistants (medicine, finance, science)
- Interact: Social capabilities, role-based multi-agent interaction, open-ended agent societies and emergent norms
Key original contributions:
1. Virtuous cycle framing — the three capabilities are mutually reinforcing and generate new training data through agent-world interaction, addressing the “running out of training data” problem
2. Thinking Fast and Slow — connects LLM reasoning models (System 2 slow deliberation) to Kahneman’s dual-process theory
3. Theory of Mind — covers strategic behavior, negotiation, and theory of mind as prerequisites for effective multi-agent interaction
4. Emergent social norms — agent societies can develop emergent norms through interaction, enabling large-scale social science simulation
5. Research agenda — explicit open problems table covering all three categories
Applications highlighted: medical diagnosis, logistics, financial market analysis, scientific research augmentation.
Cognitive Architectures for Language Agents (2023)
Sumers et al. · arXiv:2309.02427 · TMLR 2024
Grounds LLM agent design in cognitive science. Proposes four-part memory model (working, procedural, semantic, episodic) drawing from ACT-R and SOAR. Essential for understanding memory system design.
Core Foundational Papers
ReAct: Synergizing Reasoning and Acting in Language Models (2023)
Yao et al. · arXiv:2210.03629 · ICLR 2023
The most cited and influential agent paper. Introduces the ReAct paradigm: interleaved reasoning traces and actions. The agent alternates between Thought: (reasoning about what to do) and Action: (doing it), with Observation: feeding back the result. This loop is the skeleton of most modern agent systems.
Thought: I need to find the population of France.
Action: search("France population 2024")
Observation: France has a population of approximately 68 million.
Thought: Now I can answer the question.
Action: finish("approximately 68 million")
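A ReAct driver can be sketched as a loop that appends the model's Thought/Action output to a running trace, executes the action, and appends the Observation. Here `scripted_model` is a hypothetical stub that replays the trace above instead of calling a real LLM:

```python
# ReAct loop sketch: the model emits Thought/Action text, the environment
# appends an Observation, and the trace grows until a finish action.
# `scripted_model` and `search` are illustrative stubs.

import re

def scripted_model(trace):
    # Stub standing in for an LLM conditioned on the trace so far.
    if "Observation:" not in trace:
        return ('Thought: I need to find the population of France.\n'
                'Action: search("France population 2024")')
    return ('Thought: Now I can answer the question.\n'
            'Action: finish("approximately 68 million")')

def search(query):
    return "France has a population of approximately 68 million."

def react_loop(question, model, max_steps=5):
    trace = f"Question: {question}"
    for _ in range(max_steps):
        step = model(trace)
        trace += "\n" + step
        match = re.search(r'Action: (\w+)\("(.+)"\)', step)
        name, arg = match.group(1), match.group(2)
        if name == "finish":
            return arg
        trace += f"\nObservation: {search(arg)}"
    return None

print(react_loop("What is the population of France?", scripted_model))
# → approximately 68 million
```

In real systems the trace is the LLM's prompt: each Observation conditions the next Thought, which is the "synergy" the paper's title refers to.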
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)
Wei et al. · arXiv:2201.11903 · NeurIPS 2022
The prerequisite to ReAct. Shows that prompting LLMs to generate intermediate reasoning steps dramatically improves performance on arithmetic, commonsense, and symbolic reasoning. Established that LLMs can reason step-by-step — the cognitive foundation for all agent work.
- Key ideas: Few-shot CoT prompting; emergent capability (most effective at 100B+ params); generalizes across reasoning domains
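The mechanism is purely a prompting change: the few-shot exemplar shows its reasoning steps instead of only the final answer. A minimal sketch using the paper's well-known tennis-ball exemplar:

```python
# Few-shot chain-of-thought prompt construction (Wei et al. style):
# the exemplar answer walks through intermediate steps, so the model
# imitates step-by-step reasoning before stating its answer.

exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question):
    # A standard (non-CoT) prompt would show only "A: The answer is 11."
    return exemplar + f"Q: {question}\nA:"

print(cot_prompt("A jug holds 4 liters. How much do 3 jugs hold?"))
```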
Toolformer: Language Models Can Teach Themselves to Use Tools (2023)
Schick et al. · arXiv:2302.04761
Shows how to train an LLM to use tools in a self-supervised manner. The model learns to insert API calls into its own generated text, execute them, and incorporate results — without large human-annotated datasets.
- Tools learned: Calculator, Wikipedia search, calendar, Q&A system, translator
- Key ideas: Self-supervised tool learning; decides which API to call, when, and with what arguments; training signal from self-generated examples
- Legacy: Prefigures the function-calling paradigm now standard in GPT-4, Claude, etc.
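At inference time, Toolformer-style text contains inline markers such as `[Calculator(400 / 1400)]` that are executed and replaced by their results. A sketch of that execution step, following the paper's marker syntax but with an illustrative calculator-only executor:

```python
# Toolformer-style inline API calls: markers embedded in generated text
# are executed and rewritten as "call → result". Only a Calculator tool
# is implemented here; the eval is illustrative, not hardened.

import re

def calculator(expr):
    # Restricted arithmetic eval (no builtins available to the expression).
    return round(eval(expr, {"__builtins__": {}}), 2)

def execute_api_calls(text):
    def replace(match):
        tool, arg = match.group(1), match.group(2)
        result = {"Calculator": calculator}[tool](arg)
        return f"[{tool}({arg}) → {result}]"
    return re.sub(r"\[(\w+)\(([^)]*)\)\]", replace, text)

annotated = "Out of 1400 participants, 400 [Calculator(400 / 1400)] passed."
print(execute_api_calls(annotated))
# → Out of 1400 participants, 400 [Calculator(400 / 1400) → 0.29] passed.
```

The training trick is that these markers are self-generated: the model proposes candidate calls, and only calls whose results reduce the loss on subsequent tokens are kept as training data.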
MRKL Systems: A Modular, Neuro-Symbolic Architecture (2022)
Karpas et al. · arXiv:2205.00445
Early vision of modular AI where an LLM router dispatches to specialized expert modules (calculators, databases, ML models). One of the first papers to articulate the “LLM + tools” architecture.
- Key ideas: LLM as orchestrator; discrete symbolic modules for reliable computation; router learns which module is appropriate
- Legacy: Direct ancestor of today’s tool-calling agents
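The architecture reduces to a router in front of a table of modules. A sketch in which the learned router is replaced by a hypothetical keyword heuristic:

```python
# MRKL-style routing: a router picks a discrete symbolic module; the
# module does the reliable computation. `route` is a keyword placeholder
# for what MRKL frames as a learned component.

MODULES = {
    "calculator": lambda q: str(eval(q.split(":", 1)[1], {"__builtins__": {}})),
    "lookup": lambda q: "Paris",  # stub knowledge base
}

def route(query):
    # A real MRKL router is learned; this heuristic is illustrative only.
    return "calculator" if query.startswith("calc:") else "lookup"

def mrkl_answer(query):
    return MODULES[route(query)](query)

print(mrkl_answer("calc: 17 * 24"))      # → 408
print(mrkl_answer("capital of France"))  # → Paris
```

The point of the design is that arithmetic never touches the neural model: anything a symbolic module can do exactly is delegated rather than generated.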
WebGPT: Browser-Assisted Question Answering (2021)
Nakano et al. · arXiv:2112.09332
GPT-3 fine-tuned to browse the web to answer questions. Introduced the concept of browser-using agents — now a major category. Used RLHF to train the browsing behavior.
- Key ideas: Web browsing as an action space; RLHF for agent behavior; long-form question answering with citations
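The "action space" framing means the model emits one discrete text command per step against a text-rendered browser. A sketch with command names paraphrasing the paper's interface and a stub environment:

```python
# WebGPT-style browsing as a discrete action space: the model emits one
# command per step. Command names paraphrase the paper's interface; the
# environment here only records commands and collects quoted references.

ACTIONS = ["search", "click_link", "quote", "scroll_down", "back", "answer"]

def step(state, command, arg=None):
    assert command in ACTIONS
    state = dict(state)
    state.setdefault("log", []).append((command, arg))
    if command == "quote":
        # Quoted passages become the citations in the final answer.
        state.setdefault("references", []).append(arg)
    return state

state = {}
state = step(state, "search", "France population")
state = step(state, "quote", "France has ~68 million people.")
state = step(state, "answer")
print(state["references"])
```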
HuggingGPT / JARVIS (2023)
Shen et al. · arXiv:2303.17580 · NeurIPS 2023
LLM (ChatGPT) as a task planner that orchestrates hundreds of specialized ML models from Hugging Face. The LLM parses user requests, selects the right ML models, executes them in sequence, and synthesizes results.
- Key ideas: LLM as controller of a model hub; structured task planning; multi-modal capability via model composition
- GitHub: microsoft/JARVIS
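The pipeline has four stages: task planning, model selection, task execution, and response synthesis. A sketch with a hypothetical two-entry hub and a stub planner (these are not real Hugging Face model IDs):

```python
# HuggingGPT's four-stage pipeline in miniature. The planner and hub
# entries are illustrative stubs; the real system asks an LLM to plan
# and picks actual Hugging Face models by their descriptions.

MODEL_HUB = {
    "image-captioning": lambda x: f"caption of {x}",
    "text-to-speech": lambda x: f"audio for {x!r}",
}

def plan(request):
    # Stub planner: a real system decomposes the request via the LLM.
    return ["image-captioning", "text-to-speech"]

def run(request, data):
    results = []
    for task in plan(request):                            # 1. task planning
        model = MODEL_HUB[task]                           # 2. model selection
        output = model(data if not results else results[-1])  # 3. execution
        results.append(output)
    return " | ".join(results)                            # 4. synthesis (stub)

print(run("describe this image out loud", "photo.jpg"))
```

Chaining each model's output into the next is what gives a text-only LLM multi-modal reach.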
SayCan: Grounding Language in Robotic Affordances (2022)
Ahn et al. · arXiv:2204.01691 · CoRL 2022
Landmark paper combining LLMs with robotics. The LLM generates possible action plans; a learned “affordance” model scores which actions are physically feasible in the current environment. Early example of embodied agent planning.
- Key ideas: LLM proposes, affordance model filters; language grounding in physical world; “what is both useful AND possible”
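The core decision rule is a product of two scores: the LLM's estimate that a skill is useful for the instruction, and the affordance model's estimate that it can succeed in the current state. A sketch with made-up illustrative numbers:

```python
# SayCan's scoring rule: choose the skill maximizing
# (LLM usefulness) * (affordance feasibility). Scores are invented
# for illustration.

llm_usefulness = {          # p(skill helps the instruction), per the LLM
    "pick up sponge": 0.6,
    "pick up apple": 0.3,
    "go to counter": 0.1,
}
affordance = {              # p(skill can succeed here), per the value function
    "pick up sponge": 0.1,  # sponge is out of reach
    "pick up apple": 0.9,
    "go to counter": 0.95,
}

def saycan_choose(skills):
    # "What is both useful AND possible": argmax of the product.
    return max(skills, key=lambda s: llm_usefulness[s] * affordance[s])

print(saycan_choose(list(llm_usefulness)))
# → pick up apple   (0.3 * 0.9 = 0.27 beats 0.6 * 0.1 and 0.1 * 0.95)
```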
Voyager: An Open-Ended Embodied Agent in Minecraft (2023)
Wang et al. · arXiv:2305.16291
GPT-4 playing Minecraft autonomously for indefinite time spans. Features a procedural skill library that grows over time, an automatic curriculum for proposing increasingly complex tasks, and iterative skill refinement via execution feedback.
- Key ideas: Lifelong learning via skill accumulation; automatic curriculum; code-as-action (skills written as JavaScript); never “forgets” learned skills
- Results: 3.3× more unique items collected, 2.3× longer distances covered, up to 15.3× faster at unlocking tech tree milestones compared to prior methods
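The skill library can be sketched as an append-only store of (description, code) pairs queried by task. Voyager writes skills in JavaScript and retrieves by embedding similarity; this keyword-match version is a simplified stand-in:

```python
# Voyager-style skill library: verified skills are stored as code and
# never discarded. Retrieval here uses keyword overlap as a stand-in
# for the embedding similarity used in the paper.

class SkillLibrary:
    def __init__(self):
        self.skills = {}  # name -> (description, code)

    def add(self, name, description, code):
        self.skills[name] = (description, code)  # lifelong: nothing removed

    def retrieve(self, task):
        words = set(task.lower().split())
        return [name for name, (desc, _) in self.skills.items()
                if words & set(desc.lower().split())]

lib = SkillLibrary()
lib.add("mineWood", "chop a tree to collect wood",
        "async function mineWood(bot) { /* ... */ }")
lib.add("craftTable", "craft a crafting table from planks",
        "async function craftTable(bot) { /* ... */ }")
print(lib.retrieve("collect wood"))
```

Because skills are code rather than weights, accumulated abilities survive context resets, which is what the paper means by never forgetting.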
AutoGPT (2023)
Toran Bruce Richards (Significant Gravitas) · GitHub
Not a paper, but arguably the most influential artifact of 2023. AutoGPT showed the world what a fully autonomous LLM agent looked like: give it a goal, let it run. It sparked massive public interest and downstream research, despite significant reliability issues.
- Key ideas: Autonomous goal pursuit; persistent memory; web search + code execution; iterative task decomposition
- Legacy: Demonstrated demand for autonomous agents; exposed failure modes that drove later research
BabyAGI (2023)
Nakajima · GitHub
Minimalist task management agent. Maintains a task list, executes tasks with an LLM + web search, and adds new tasks based on results. Showed that a simple agent loop could produce surprising emergent behaviors.
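The loop fits in a few lines: pop a task, execute it, generate follow-up tasks from the result, repeat. In this sketch, `execute` and `create_tasks` are hypothetical stubs at the points where BabyAGI calls an LLM:

```python
# BabyAGI's task loop in miniature: execute the front task, spawn new
# tasks from the result, continue until the queue is empty. `execute`
# and `create_tasks` are stubs standing in for LLM calls.

from collections import deque

def execute(task):
    return f"result of {task!r}"

def create_tasks(task, result):
    # Stub: one follow-up per task until depth 2, then stop.
    return [f"{task}/follow-up"] if task.count("/") < 2 else []

def babyagi(objective, max_iterations=10):
    tasks = deque([objective])
    done = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(task)
        done.append((task, result))
        tasks.extend(create_tasks(task, result))
    return done

for task, result in babyagi("research LLM agents"):
    print(task, "->", result)
```

The original also reprioritizes the queue with an LLM each iteration; that step is omitted here for brevity.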
Standard Agent Decomposition
Based on the survey literature, the field converged on this standard decomposition:
| Component | Description | Examples |
|---|---|---|
| Perception | Inputs from environment | Text, images, tool outputs, web pages |
| Memory | Information storage & retrieval | In-context, vector DB, key-value stores |
| Planning | Deciding what to do | ReAct, CoT, hierarchical decomposition |
| Action | Executing decisions | Tool calls, web search, code execution |
| Learning | Improving over time | Reflexion, skill libraries, RLHF |
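One step of an agent wires these components together in a fixed order: perceive, plan against memory, act, store. A minimal sketch with each component as a plain callable (real systems back them with an LLM, a vector store, and tool APIs):

```python
# The five-component decomposition wired into a single agent step.
# Components are plain callables here; memory doubles as the learning
# signal by accumulating (observation, action, result) triples.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    perceive: Callable   # raw environment input -> observation
    plan: Callable       # (observation, memory) -> action
    act: Callable        # action -> result
    memory: list = field(default_factory=list)

    def step(self, raw_input):
        observation = self.perceive(raw_input)
        action = self.plan(observation, self.memory)
        result = self.act(action)
        self.memory.append((observation, action, result))
        return result

agent = Agent(
    perceive=lambda raw: raw.strip().lower(),
    plan=lambda obs, mem: f"search({obs})",
    act=lambda action: f"executed {action}",
)
print(agent.step("  What is ReAct?  "))
# → executed search(what is react?)
```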
References
Survey Papers
- A Survey on Large Language Model based Autonomous Agents (Wang et al., 2023) — arXiv:2308.11432 — Frontiers of Computer Science, 2024 — Comprehensive 200+ paper survey with unified Brain-Perception-Memory-Action framework
- The Rise and Potential of Large Language Model Based Agents (Xi et al., 2023) — arXiv:2309.07864 — Science China Information Sciences, 2025 — 86-page survey with philosophical grounding and multi-agent coordination
- Agent AI: Surveying the Horizons of Multimodal Interaction (Durante et al., 2024) — arXiv:2401.03568 — Bridges language-only and embodied agents across text, vision, and action
- Large Language Model Agent: A Survey on Methodology, Applications and Challenges (Luo et al., 2025) — arXiv:2503.21460 — Latest comprehensive survey with structured methodology taxonomy
- Agentic Large Language Models: A Survey (Plaat, van Duijn, van Stein, Preuss, van der Putten, Batenburg, 2025) — arXiv:2503.23037 — Journal of Artificial Intelligence Research, Vol. 84, December 2025 — Distinctive Reason–Act–Interact taxonomy with virtuous cycle framing — companion website
- Cognitive Architectures for Language Agents (Sumers et al., 2023) — arXiv:2309.02427 — TMLR 2024 — Grounds agent design in cognitive science with four-part memory model
Core Foundational Papers
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022) — arXiv:2210.03629 — ICLR 2023 — Most cited agent paper; introduces interleaved reasoning-action loop
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — arXiv:2201.11903 — NeurIPS 2022 — Foundational work showing step-by-step reasoning improves LLM performance
- Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023) — arXiv:2302.04761 — Self-supervised tool learning without large annotated datasets
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning (Karpas et al., 2022) — arXiv:2205.00445 — Early modular AI vision with LLM as orchestrator
Browser & Web-Based Agents
- WebGPT: Browser-Assisted Question Answering with Human Feedback (Nakano et al., 2021) — arXiv:2112.09332 — Pioneering web-browsing agents with RLHF training
Multi-Modal & Embodied Agents
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (Shen et al., 2023) — arXiv:2303.17580 — NeurIPS 2023 — LLM orchestration of 100s of specialized ML models — GitHub: microsoft/JARVIS
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (Ahn et al., 2022) — arXiv:2204.01691 — CoRL 2022 — Combines LLMs with robotic affordances for embodied planning
Open-Ended & Lifelong Learning
- Voyager: An Open-Ended Embodied Agent with Large Language Models (Wang et al., 2023) — arXiv:2305.16291 — Autonomous Minecraft agent with skill library and curriculum learning
Landmark Agents (Non-Academic)
- AutoGPT (Toran Bruce Richards / Significant Gravitas, 2023) — GitHub: Significant-Gravitas/AutoGPT — Fully autonomous agent demonstrating goal pursuit and web integration
- BabyAGI (Yohei Nakajima, 2023) — GitHub: yoheinakajima/babyagi — Minimalist task management agent with emergent behaviors
For a full chronological view of the field, see the Timeline →. Continue to Reasoning & Planning →