Foundations (2022–2023)

The papers and ideas that built the field

Overview

The modern LLM agent field traces to a cluster of papers from 2022–2023 that established the core paradigms still in use today. Before these works, LLMs were primarily used as single-shot text generators. This period showed they could act — using tools, browsing the web, executing code, and reasoning across multiple steps toward a goal.

The foundational insight: an LLM becomes an agent when you give it a loop — the ability to observe outcomes of its actions and decide what to do next.
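That loop is small enough to sketch directly. Everything below is a stub (`call_llm` and `run_tool` stand in for a real model API and a real tool), but the control flow is the point:

```python
# Minimal observe-act loop. `call_llm` is a stand-in for any chat-model API;
# it is stubbed here so the sketch runs end to end.
def call_llm(history):
    # A real system would send `history` to a model; this stub always
    # answers after one tool call, to illustrate the control flow.
    if any(msg.startswith("observation:") for msg in history):
        return "finish: 68 million"
    return "tool: search France population"

def run_tool(command):
    # Stand-in for a real tool (search engine, code runner, ...).
    return "observation: France has ~68 million people."

def agent_loop(goal, max_steps=5):
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        decision = call_llm(history)
        if decision.startswith("finish:"):        # the model decides it is done
            return decision.removeprefix("finish:").strip()
        history.append(run_tool(decision))        # observe the outcome, loop again
    return "gave up"

print(agent_loop("What is the population of France?"))
```

The only structural ingredients are the history, the decide step, and the observe step; everything else in a production agent is elaboration of these three.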


Survey & Taxonomy Papers

These surveys provide the broadest maps of the field:

A Survey on Large Language Model based Autonomous Agents (2023)

Wang et al. · arXiv:2308.11432 · Frontiers of Computer Science, 2024

The most comprehensive survey of the field. Covers 200+ papers and presents a unified framework decomposing agents into four pillars: Brain (LLM), Perception, Memory, and Action. Analyzes applications across social science, natural science, and engineering. Actively maintained through v7 (March 2025).

  • Key ideas: Unified architecture spanning the majority of prior work; systematic taxonomy by application domain; discussion of evaluation methodology
  • Impact: De facto reference paper for new entrants to the field

The Rise and Potential of Large Language Model Based Agents (2023)

Xi et al. (29 co-authors) · arXiv:2309.07864 · Science China Information Sciences, 2025

An 86-page survey tracing agents from philosophical origins through modern AI. Uses a brain-perception-action framework. Explores single-agent, multi-agent, and human-agent cooperation. Discusses agent societies, emergent behaviors, and the path toward AGI.

  • Key ideas: Historical and philosophical grounding; vision for agent societies; multi-agent coordination patterns

Agent AI: Surveying the Horizons of Multimodal Interaction (2024)

Durante et al. · arXiv:2401.03568

Expands scope to multimodal agents operating across text, vision, and action spaces. Covers gaming, robotics, healthcare, GUI navigation. A useful bridge between language-only and embodied/multimodal agents.

Large Language Model Agent: A Survey on Methodology, Applications and Challenges (2025)

Luo et al. · arXiv:2503.21460

The most recent comprehensive survey (March 2025), offering a structured taxonomy of LLM agent methodology. It catalogs current applications and synthesizes open challenges.

Agentic Large Language Models: A Survey (2025)

Plaat, van Duijn, van Stein, Preuss, van der Putten, Batenburg (Leiden University) · arXiv:2503.23037 · JAIR Vol. 84, December 2025 · companion website

A major peer-reviewed survey published in the Journal of Artificial Intelligence Research. Organizes the field around a distinctive Reason–Act–Interact taxonomy, offering a cleaner alternative to component-based frameworks:

  • Reason: Multi-step reasoning (CoT, search trees), self-reflection, retrieval augmentation
  • Act: Action models (world models, VLA models), robots and tools, domain assistants (medicine, finance, science)
  • Interact: Social capabilities, role-based multi-agent interaction, open-ended agent societies and emergent norms

Key original contributions:

  1. Virtuous cycle framing — the three capabilities are mutually reinforcing and generate new training data through agent-world interaction, addressing the “running out of training data” problem
  2. Thinking Fast and Slow — connects LLM reasoning models (System 2 slow deliberation) to Kahneman’s dual-process theory
  3. Theory of Mind — covers strategic behavior, negotiation, and theory of mind as prerequisites for effective multi-agent interaction
  4. Emergent social norms — agent societies can develop emergent norms through interaction, enabling large-scale social science simulation
  5. Research agenda — an explicit open-problems table covering all three categories

Applications highlighted: medical diagnosis, logistics, financial market analysis, scientific research augmentation.

Cognitive Architectures for Language Agents (2023)

Sumers et al. · arXiv:2309.02427 · TMLR 2024

Grounds LLM agent design in cognitive science. Proposes four-part memory model (working, procedural, semantic, episodic) drawing from ACT-R and SOAR. Essential for understanding memory system design.
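The four-part split can be rendered as a simple data structure; the class and field names below are illustrative, not from the paper:

```python
# A toy rendering of the four-part memory split (working, procedural,
# semantic, episodic). Field names and contents are illustrative.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    working: list = field(default_factory=list)     # current context-window contents
    procedural: dict = field(default_factory=dict)  # skills / code the agent can run
    semantic: dict = field(default_factory=dict)    # facts about the world
    episodic: list = field(default_factory=list)    # records of past episodes

mem = AgentMemory()
mem.semantic["capital_of_france"] = "Paris"
mem.episodic.append({"task": "lookup", "outcome": "success"})
mem.working.append("User asked about France.")
```

The useful design question the paper raises is not the container types but the read/write policies: which memory each step consults, and which it updates.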


Core Foundational Papers

ReAct: Synergizing Reasoning and Acting in Language Models (2023)

Yao et al. · arXiv:2210.03629 · ICLR 2023

The most cited and influential agent paper. Introduces the ReAct paradigm: interleaved reasoning traces and actions. The agent alternates between Thought: (reasoning about what to do) and Action: (doing it), with Observation: feeding back the result. This loop is the skeleton of most modern agent systems.

Thought: I need to find the population of France.
Action: search("France population 2024")
Observation: France has a population of approximately 68 million.
Thought: Now I can answer the question.
Action: finish("approximately 68 million")

  • Key ideas: Reasoning and acting are synergistic, not separate; grounding in external feedback reduces hallucination; enables error correction mid-task
  • Results: Outperforms pure CoT and pure acting on HotpotQA, FEVER, ALFWorld, WebShop
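A trace like the one above can be driven by a small runtime that parses `Action:` lines, executes the named tool, and appends an `Observation:` line for the next turn. This sketch stubs the model with scripted outputs; nothing here is from the paper's code:

```python
# ReAct control loop sketch: the "model" emits Thought/Action text, the
# runtime executes the action and feeds back an Observation.
import re

def fake_model(transcript):
    # Scripted outputs standing in for a real LLM.
    if "Observation:" in transcript:
        return 'Thought: Now I can answer.\nAction: finish("approximately 68 million")'
    return 'Thought: I need the population of France.\nAction: search("France population 2024")'

def react(question, max_turns=4):
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = fake_model(transcript)
        transcript += step + "\n"
        match = re.search(r'Action: (\w+)\("(.+)"\)', step)
        tool, arg = match.group(1), match.group(2)
        if tool == "finish":
            return arg
        # Execute the tool and append the observation for the next turn.
        transcript += "Observation: France has a population of approximately 68 million.\n"
    return None

print(react("What is the population of France?"))
```

Note that the observation enters the transcript, not some side channel: the model's next decision is conditioned on everything that has happened so far.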

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)

Wei et al. · arXiv:2201.11903 · NeurIPS 2022

The prerequisite to ReAct. Shows that prompting LLMs to generate intermediate reasoning steps dramatically improves performance on arithmetic, commonsense, and symbolic reasoning. Established that LLMs can reason step-by-step — the cognitive foundation for all agent work.

  • Key ideas: Few-shot CoT prompting; emergent capability (most effective at 100B+ params); generalizes across reasoning domains
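In practice, few-shot CoT is just prompt construction: each exemplar includes the worked intermediate steps before the answer. The sketch below uses a simplified version of the paper's well-known tennis-ball exemplar:

```python
# The shape of a few-shot chain-of-thought prompt: a worked exemplar
# followed by the new question, ending at "A:" so the model continues
# with its own reasoning steps.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

prompt = COT_PROMPT.format(question="A pen costs 2 dollars. How much do 4 pens cost?")
print(prompt)
```

The contrast with standard few-shot prompting is only the exemplar format: answers are shown with their derivations, which biases the model toward producing derivations too.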

Toolformer: Language Models Can Teach Themselves to Use Tools (2023)

Schick et al. · arXiv:2302.04761

Shows how to train an LLM to use tools in a self-supervised manner. The model learns to insert API calls into its own generated text, execute them, and incorporate results — without large human-annotated datasets.

  • Tools learned: Calculator, Wikipedia search, calendar, Q&A system, translator
  • Key ideas: Self-supervised tool learning; decides which API to call, when, and with what arguments; training signal from self-generated examples
  • Legacy: Prefigures the function-calling paradigm now standard in GPT-4, Claude, etc.
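At inference time, the Toolformer idea amounts to scanning generated text for inline API calls, executing them, and splicing the results back in. This sketch uses a simplified call syntax and a toy calculator tool (the paper's actual format and tool set differ):

```python
# Sketch of Toolformer-style inference: the model's text contains inline
# calls like [Calculator(5*7)]; the runtime executes each call and splices
# the result back into the text.
import re

TOOLS = {"Calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def expand_api_calls(text):
    def run(match):
        tool, arg = match.group(1), match.group(2)
        return TOOLS[tool](arg)
    return re.sub(r"\[(\w+)\(([^)]*)\)\]", run, text)

print(expand_api_calls("The farm has [Calculator(5*7)] chickens in total."))
# → "The farm has 35 chickens in total."
```

The training contribution of the paper is deciding where such calls belong: candidate calls are kept only when their results reduce the language-modeling loss on the following tokens.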

MRKL Systems: A Modular, Neuro-Symbolic Architecture (2022)

Karpas et al. · arXiv:2205.00445

Early vision of modular AI where an LLM router dispatches to specialized expert modules (calculators, databases, ML models). One of the first papers to articulate the “LLM + tools” architecture.

  • Key ideas: LLM as orchestrator; discrete symbolic modules for reliable computation; router learns which module is appropriate
  • Legacy: Direct ancestor of today’s tool-calling agents
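The MRKL pattern reduces to a router plus a table of modules. In this sketch a keyword heuristic stands in for the learned router, and both modules are toys:

```python
# Toy MRKL-style dispatch: a router picks a symbolic module per query.
import re

KB = {"capital of france": "Paris"}

MODULES = {
    "calculator": lambda q: str(eval(q, {"__builtins__": {}})),  # reliable symbolic math
    "knowledge":  lambda q: KB.get(q.lower(), "unknown"),        # external knowledge source
}

def route(query):
    # A real MRKL router is an LLM deciding the module; a pattern
    # heuristic stands in here so the sketch is self-contained.
    module = "calculator" if re.fullmatch(r"[\d\s+*/().-]+", query) else "knowledge"
    return MODULES[module](query)

print(route("12 * 12"))            # dispatched to the calculator
print(route("capital of France"))  # dispatched to the knowledge module
```

The point of the architecture is the division of labor: the LLM handles language and routing, while arithmetic and lookups go to modules that cannot hallucinate.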

WebGPT: Browser-Assisted Question Answering (2021)

Nakano et al. · arXiv:2112.09332

GPT-3 fine-tuned to browse the web to answer questions. Introduced the concept of browser-using agents — now a major category. Used RLHF to train the browsing behavior.

  • Key ideas: Web browsing as an action space; RLHF for agent behavior; long-form question answering with citations

HuggingGPT / JARVIS (2023)

Shen et al. · arXiv:2303.17580 · NeurIPS 2023

LLM (ChatGPT) as a task planner that orchestrates hundreds of specialized ML models from Hugging Face. The LLM parses user requests, selects the right ML models, executes them in sequence, and synthesizes results.

  • Key ideas: LLM as controller of a model hub; structured task planning; multi-modal capability via model composition
  • GitHub: microsoft/JARVIS
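The four HuggingGPT stages (task planning, model selection, execution, response synthesis) can be sketched as a pipeline over a model hub. The hub entries, task names, and planner below are all illustrative stand-ins:

```python
# Toy HuggingGPT-style pipeline: plan tasks, select a model per task,
# execute in sequence, feed each output to the next task.
MODEL_HUB = {
    "image-captioning": lambda x: f"caption({x})",
    "text-to-speech":   lambda x: f"audio({x})",
}

def plan(request):
    # A real planner is the LLM emitting a structured task list
    # parsed from the user request; this one is hard-coded.
    return ["image-captioning", "text-to-speech"]

def run_pipeline(request, payload):
    result = payload
    for task in plan(request):
        model = MODEL_HUB[task]   # model selection by task type
        result = model(result)    # execution; output feeds the next task
    return result                 # response synthesis would wrap this for the user

print(run_pipeline("describe this photo out loud", "photo.png"))
# → "audio(caption(photo.png))"
```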

SayCan: Grounding Language in Robotic Affordances (2022)

Ahn et al. · arXiv:2204.01691 · CoRL 2022

Landmark paper combining LLMs with robotics. The LLM generates possible action plans; a learned “affordance” model scores which actions are physically feasible in the current environment. Early example of embodied agent planning.

  • Key ideas: LLM proposes, affordance model filters; language grounding in physical world; “what is both useful AND possible”
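The paper's core combination is a product of two scores per candidate skill: the LLM's usefulness estimate and the affordance model's feasibility estimate. A sketch with made-up scores:

```python
# SayCan-style skill selection: rank candidates by the product of a
# usefulness score (from the LLM) and a feasibility score (from the
# affordance model). All numbers below are invented for illustration.
def pick_skill(llm_scores, affordance_scores):
    combined = {
        skill: llm_scores[skill] * affordance_scores[skill]
        for skill in llm_scores
    }
    return max(combined, key=combined.get)

# Suppose the can is out of reach: the LLM prefers "pick up can", but
# the affordance model says it is infeasible right now.
llm_scores = {"pick up can": 0.7, "go to table": 0.2, "open fridge": 0.1}
affordance = {"pick up can": 0.1, "go to table": 0.9, "open fridge": 0.5}
print(pick_skill(llm_scores, affordance))  # useful AND possible wins
```

The product form is what encodes "useful AND possible": a skill scoring zero on either axis is never selected, however strongly the other model favors it.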

Voyager: An Open-Ended Embodied Agent in Minecraft (2023)

Wang et al. · arXiv:2305.16291

GPT-4 playing Minecraft autonomously for indefinite time spans. Features a procedural skill library that grows over time, an automatic curriculum for proposing increasingly complex tasks, and iterative skill refinement via execution feedback.

  • Key ideas: Lifelong learning via skill accumulation; automatic curriculum; code-as-action (skills written as JavaScript); never “forgets” learned skills
  • Results: 3.3× more unique items collected, 2.3× longer distances covered, up to 15.3× faster at unlocking tech tree milestones compared to prior methods
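The skill library itself is simple to sketch: skills are committed as code only after execution feedback verifies them, and are retrieved later, so nothing learned is lost. Voyager stores JavaScript; plain strings stand in here, and the class is illustrative rather than from the paper's code:

```python
# Toy Voyager-style skill library: verified skills are stored as code
# and never forgotten; failed attempts are not committed.
class SkillLibrary:
    def __init__(self):
        self.skills = {}

    def add(self, name, code, verified):
        # Voyager only commits a skill after execution feedback verifies it.
        if verified:
            self.skills[name] = code

    def get(self, name):
        return self.skills.get(name)

lib = SkillLibrary()
lib.add("mine_wood", "def mine_wood(): ...", verified=True)
lib.add("fight_dragon", "def fight_dragon(): ...", verified=False)  # failed, not stored
```

The real system retrieves skills by embedding similarity to the current task rather than by exact name, but the accumulate-and-reuse contract is the same.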

AutoGPT (2023)

Toran Bruce Richards (Significant Gravitas) · GitHub

Not a paper, but arguably the most influential artifact of 2023. AutoGPT showed the world what a fully autonomous LLM agent looked like: give it a goal, let it run. It sparked massive public interest and downstream research, despite significant reliability issues.

  • Key ideas: Autonomous goal pursuit; persistent memory; web search + code execution; iterative task decomposition
  • Legacy: Demonstrated demand for autonomous agents; exposed failure modes that drove later research

BabyAGI (2023)

Nakajima · GitHub

Minimalist task management agent. Maintains a task list, executes tasks with an LLM + web search, and adds new tasks based on results. Showed that a simple agent loop could produce surprising emergent behaviors.
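That loop fits in a few lines. The execution and task-creation steps below are stubs standing in for LLM calls, with a depth cap so the sketch halts:

```python
# BabyAGI-style loop in miniature: pop a task, execute it, then derive
# follow-up tasks from the result and enqueue them.
from collections import deque

def execute(task):
    # Stand-in for "LLM + web search" task execution.
    return f"result of {task}"

def create_new_tasks(task, result):
    # A real system asks an LLM for follow-ups; cap depth here so it halts.
    return [f"follow up on {task}"] if "follow up" not in task else []

def baby_agi(objective):
    tasks, done = deque([objective]), []
    while tasks:
        task = tasks.popleft()
        result = execute(task)
        done.append((task, result))
        tasks.extend(create_new_tasks(task, result))
    return done

log = baby_agi("research LLM agents")
```

The emergent behavior comes entirely from `create_new_tasks`: because new tasks are conditioned on prior results, the queue can wander in directions the original objective never named.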


Standard Agent Decomposition

Based on the survey literature, the field converged on this standard decomposition:

  • Perception — inputs from the environment (text, images, tool outputs, web pages)
  • Memory — information storage & retrieval (in-context, vector DBs, key-value stores)
  • Planning — deciding what to do (ReAct, CoT, hierarchical decomposition)
  • Action — executing decisions (tool calls, web search, code execution)
  • Learning — improving over time (Reflexion, skill libraries, RLHF)

References

Survey Papers

  • A Survey on Large Language Model based Autonomous Agents (Wang et al., 2023) — arXiv:2308.11432 — Frontiers of Computer Science, 2024 — Comprehensive 200+ paper survey with unified Brain-Perception-Memory-Action framework
  • The Rise and Potential of Large Language Model Based Agents (Xi et al., 2023) — arXiv:2309.07864 — Science China Information Sciences, 2025 — 86-page survey with philosophical grounding and multi-agent coordination
  • Agent AI: Surveying the Horizons of Multimodal Interaction (Durante et al., 2024) — arXiv:2401.03568 — Bridges language-only and embodied agents across text, vision, and action
  • Large Language Model Agent: A Survey on Methodology, Applications and Challenges (Luo et al., 2025) — arXiv:2503.21460 — Latest comprehensive survey with structured methodology taxonomy
  • Agentic Large Language Models: A Survey (Plaat et al., 2025) — arXiv:2503.23037 — Journal of Artificial Intelligence Research, Vol. 84, December 2025 — Distinctive Reason–Act–Interact taxonomy with virtuous cycle framing — companion website
  • Cognitive Architectures for Language Agents (Sumers et al., 2023) — arXiv:2309.02427 — TMLR 2024 — Grounds agent design in cognitive science with four-part memory model

Core Foundational Papers

  • ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022) — arXiv:2210.03629 — ICLR 2023 — Most cited agent paper; introduces interleaved reasoning-action loop
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — arXiv:2201.11903 — NeurIPS 2022 — Foundational work showing step-by-step reasoning improves LLM performance
  • Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023) — arXiv:2302.04761 — Self-supervised tool learning without large annotated datasets
  • MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning (Karpas et al., 2022) — arXiv:2205.00445 — Early modular AI vision with LLM as orchestrator

Browser & Web-Based Agents

  • WebGPT: Browser-Assisted Question Answering with Human Feedback (Nakano et al., 2021) — arXiv:2112.09332 — Pioneering web-browsing agents with RLHF training

Multi-Modal & Embodied Agents

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (Shen et al., 2023) — arXiv:2303.17580 — NeurIPS 2023 — LLM orchestration of 100s of specialized ML models — GitHub: microsoft/JARVIS
  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (Ahn et al., 2022) — arXiv:2204.01691 — CoRL 2022 — Combines LLMs with robotic affordances for embodied planning

Open-Ended & Lifelong Learning

  • Voyager: An Open-Ended Embodied Agent with Large Language Models (Wang et al., 2023) — arXiv:2305.16291 — Autonomous Minecraft agent with skill library and curriculum learning

Landmark Agents (Non-Academic)

  • AutoGPT (Toran Bruce Richards / Significant Gravitas, 2023) — GitHub — Autonomous goal pursuit with persistent memory, web search, and code execution; sparked massive public interest
  • BabyAGI (Nakajima, 2023) — GitHub — Minimalist task-management agent loop with emergent behaviors

For a full chronological view of the field, see the Timeline →. Continue to Reasoning & Planning →