Taxonomy & Conceptual Framework

How to think about LLM agents — a map of the design space

What Is an LLM Agent?

An LLM agent is a system in which a large language model serves as a central reasoning engine, capable of taking sequences of actions toward a goal — not merely producing a single response. The key distinctions from standard LLM use:

  1. Agency over time: The model takes multiple steps, observes results, and adapts
  2. Tool use: The model can invoke external capabilities (search, code execution, APIs)
  3. Goal-directedness: The system pursues an objective, not just answers a question
  4. Memory beyond context: The system can store and retrieve information across turns

The canonical formulation (from Wang et al., 2023 survey): an agent has a Brain (the LLM), Perception (inputs), Memory (short- and long-term storage), Action (what it can do), and Planning (how it decides what to do).


A Note on Taxonomies

There is no single consensus taxonomy for LLM agents — different surveys slice the design space differently. Two major competing frameworks:

Wang et al. (2023) — Component-based: Brain (LLM) + Perception + Memory + Planning + Action. Organized around what an agent has.

Plaat et al. (2025, JAIR) — Capability-based: Reason + Act + Interact. Organized around what an agent does.

The Plaat et al. framework is elegant in its simplicity and maps cleanly onto the literature: reasoning papers, acting/tool papers, and multi-agent interaction papers. It also reveals a virtuous cycle: agents that reason generate better actions; reflection improves multi-agent interaction; and crucially, acting and interacting generate new training data — a solution to the “running out of training data” problem. See Plaat et al., arXiv:2503.23037.

The rest of this page follows a hybrid approach: the component decomposition from Wang et al. for design dimensions, and the Plaat et al. capability framing for architectural families.


The Core Agent Loop

Most LLM agent architectures implement some version of this loop:

Observe → Think → Act → Observe → ...

This was crystallized in ReAct (Yao et al., 2023):

Thought: I need to find X
Action: search("X")
Observation: [result]
Thought: Now I know X, I should do Y
Action: ...

Variations on this loop form the backbone of nearly every agent system.
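To make the loop concrete, here is a minimal self-contained sketch in Python. The model and the tool are hard-coded stubs (`fake_llm` and `search` are illustrative names, not any framework's API); a real agent would replace `fake_llm` with an actual model call and parse its output more robustly.

```python
# Minimal ReAct-style loop: alternate Thought/Action, feed each
# Observation back into the transcript until the model emits an answer.
# `fake_llm` is a hard-coded stand-in for a real model call.

def fake_llm(transcript: str) -> str:
    if "Observation:" not in transcript:
        return 'Thought: I need to find X\nAction: search("X")'
    return "Thought: Now I know X\nAnswer: X is 42"

def search(query: str) -> str:
    return f"result for {query}"           # stub tool

def react_loop(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += step + "\n"
        if "Answer:" in step:              # terminal step
            return step.split("Answer:", 1)[1].strip()
        if 'Action: search("' in step:     # parse and run the tool
            query = step.split('search("', 1)[1].split('"')[0]
            transcript += f"Observation: {search(query)}\n"
    return "no answer"

print(react_loop("find X"))  # -> X is 42
```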


Decomposing Agent Design Space

Note: Editorial synthesis

The five dimensions below are an editorial synthesis for this survey, not a taxonomy from a single source. They draw primarily from Wang et al. (2023) (Brain/Perception/Memory/Action/Planning), Plaat et al. (2025) (Reason/Act/Interact), and Sumers et al. (2023) (cognitive memory architecture), combined with common framings in the practitioner literature. The goal is a practical map of design choices rather than a formal taxonomy.

We can decompose the design space of LLM agents along five key dimensions:

1. 🧠 Reasoning / Planning

How does the agent decide what to do next?

| Approach | Description | Examples |
| --- | --- | --- |
| Direct | Single-step action selection | Basic tool-use with function calling |
| Chain-of-Thought | Linear reasoning trace | ReAct, CoT prompting |
| Tree/Graph Search | Branching exploration | Tree of Thoughts, MCTS agents |
| Hierarchical | Decompose → solve subgoals | Plan-and-Execute, HierAgent |
| Reflective | Evaluate → revise plans | Reflexion, Self-Refine, CRITIC |
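As a toy illustration of the tree/graph-search row, the sketch below expands candidate "thoughts", scores them with a stub evaluator, and keeps a greedy beam of width one. Both `expand` and `score` are hypothetical stand-ins for model calls, not the Tree of Thoughts implementation.

```python
# Tiny search over a tree of candidate "thoughts": expand each frontier
# state, score the candidates, keep only the best one per depth.

def expand(state: str) -> list[str]:
    return [state + "a", state + "b"]     # stub: two candidate next thoughts

def score(state: str) -> int:
    return state.count("b")               # stub value function

def tree_search(start: str = "", depth: int = 3) -> str:
    frontier = [start]
    for _ in range(depth):
        candidates = [c for s in frontier for c in expand(s)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:1]         # greedy beam of width 1
    return frontier[0]

print(tree_search())  # -> 'bbb'
```

A wider beam, or backtracking over the full tree, recovers the more deliberate search that Tree of Thoughts describes.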

2. 💾 Memory Architecture

What information can the agent access and how?

| Type | Description | Examples |
| --- | --- | --- |
| In-context | Everything in the prompt window | Standard LLM, short conversations |
| External (vector) | Retrieved from embedding store | RAG, MemGPT external storage |
| Episodic | Record of past events | MemGPT, Generative Agents diary |
| Semantic | Facts about the world | Knowledge graph integration |
| Procedural | How-to skills, code | Voyager skill library |
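The external (vector) row can be sketched with a toy store that retrieves by bag-of-words cosine similarity. Real systems use learned embeddings and a vector database; `VectorMemory` and its methods here are illustrative only.

```python
# Toy external memory: store text snippets, retrieve by bag-of-words
# cosine similarity. Only illustrates the store/retrieve shape.
import math
from collections import Counter

class VectorMemory:
    def __init__(self):
        self.entries: list[str] = []

    def add(self, text: str) -> None:
        self.entries.append(text)

    @staticmethod
    def _sim(a: str, b: str) -> float:
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in va)
        na = math.sqrt(sum(c * c for c in va.values()))
        nb = math.sqrt(sum(c * c for c in vb.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Rank all entries by similarity to the query; return the top k.
        return sorted(self.entries, key=lambda e: -self._sim(query, e))[:k]

mem = VectorMemory()
mem.add("the user prefers Python examples")
mem.add("the meeting is on Tuesday")
print(mem.retrieve("when is the meeting?"))
```

Note that retrieval is lossy by construction: only the top-k entries re-enter the context, which is the tradeoff discussed under Key Design Tensions.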

3. 🔧 Action Space

What can the agent actually do?

| Category | Examples |
| --- | --- |
| Search/Web | Google search, web browsing, Wikipedia |
| Code execution | Python REPL, shell, Jupyter |
| API calls | REST APIs, function calling |
| File I/O | Read/write files |
| GUI/Computer | Click, type, screenshot |
| Agent spawning | Spawn sub-agents, delegate tasks |
| Communication | Send messages, emails |
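One common way to expose an action space is a registry mapping tool names to callables, from which the model selects by name. This is a hedged sketch, not any specific framework's API; the `tool` decorator and the stub tools are invented for illustration.

```python
# Action space as a registry: tool name -> callable. The agent picks a
# tool by name; dispatch is a dictionary lookup.

TOOLS = {}

def tool(fn):
    """Register a callable as an agent-visible action."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def read_file(path: str) -> str:
    return f"<contents of {path}>"        # stub file I/O

@tool
def run_python(code: str) -> str:
    return str(eval(code))                # toy code execution (unsafe in real use)

def dispatch(name: str, arg: str) -> str:
    if name not in TOOLS:
        return f"error: unknown tool {name}"   # surface bad calls to the model
    return TOOLS[name](arg)

print(dispatch("run_python", "2 + 3"))   # -> 5
```

Returning errors as strings, rather than raising, lets the model see failed calls as observations and retry.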

4. 🤝 Agent Multiplicity

Is the system a single agent or multiple?

| Pattern | Description | Examples |
| --- | --- | --- |
| Single | One LLM does everything | Standard ReAct agents |
| Specialist pipeline | Sequential specialization | HuggingGPT, modular agents |
| Peer collaboration | Agents debate and refine | CAMEL, SPP |
| Hierarchical | Manager + workers | AutoGen, MetaGPT, ChatDev |
| Society | Many autonomous agents | Generative Agents, AgentVerse |

5. 🎯 Degree of Autonomy

How much human oversight is in the loop?

| Level | Description |
| --- | --- |
| Tool-augmented | LLM with tools, human-in-the-loop |
| Semi-autonomous | Agent acts, human approves key steps |
| Fully autonomous | Agent runs until task complete |
| Multi-hop autonomous | Long-horizon, fully unsupervised |
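The semi-autonomous level can be sketched as an approval callback that gates risky actions. The action names and the `RISKY` set are invented for illustration; real systems classify actions by policy rather than a hard-coded list.

```python
# Semi-autonomous sketch: the agent proposes actions; a human-approval
# callback gates the risky ones before execution.

RISKY = {"delete_file", "send_email"}

def run_with_oversight(actions, approve) -> list[str]:
    """Execute actions in order, asking `approve(action)` before any risky one."""
    executed = []
    for action in actions:
        if action in RISKY and not approve(action):
            executed.append(f"skipped {action}")   # human declined
            continue
        executed.append(f"did {action}")           # stub execution
    return executed

log = run_with_oversight(
    ["read_file", "send_email", "write_file"],
    approve=lambda a: False,              # human rejects everything risky
)
print(log)  # -> ['did read_file', 'skipped send_email', 'did write_file']
```

Swapping the callback for one that always returns True moves the same system to the fully autonomous level.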

A Taxonomy of Agent Architectures

Based on the literature, we can identify several major architectural families:

Family 1: Tool-Augmented LLMs

“Give the LLM tools to call”

The simplest agent pattern. A single LLM is given a set of callable tools. It reasons about which tool to call and synthesizes results. The focus is on tool integration more than autonomous goal-pursuit.

Key papers: Toolformer, MRKL Systems, OpenAI function calling, Claude tool use
Key frameworks: LangChain tools, OpenAI Assistants

Family 2: ReAct-Style Agents

“Interleave reasoning and acting”

The agent alternates between Thought (reasoning about what to do) and Action (doing it). The observation from the action feeds back into the next thought. This is the dominant pattern for task-solving agents.

Key papers: ReAct, DEPS, Inner Monologue
Key frameworks: LangChain agents, LlamaIndex agents

Family 3: Plan-Then-Execute

“Make a plan, then execute it”

A planner agent generates a high-level plan, then an executor carries out each step. Allows more structured task decomposition and is easier to monitor.

Key papers: Plan-and-Execute (Chase, 2023; blog post), LLM+P, DEPS
Key frameworks: LangGraph, LangChain Plan-and-Execute
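A minimal sketch of the pattern, with the planner and the executor both stubbed (real systems put an LLM behind each; `planner` and `executor` are invented names):

```python
# Plan-then-execute sketch: a stubbed "planner" emits the whole step list
# up front; an executor then runs each step in order. This sketch does no
# replanning, which is the pattern's main weakness.

def planner(goal: str) -> list[str]:
    return [f"research {goal}", f"draft {goal}", f"review {goal}"]

def executor(step: str) -> str:
    return f"done: {step}"

def plan_then_execute(goal: str) -> list[str]:
    plan = planner(goal)                      # one up-front planning call
    return [executor(step) for step in plan]  # execute steps sequentially

print(plan_then_execute("report"))
```

Because the plan is explicit, each step can be logged and inspected, which is why this family is easier to monitor than interleaved loops.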

Family 4: Reflective / Self-Improving Agents

“Learn from mistakes within a task”

Agents that evaluate their own performance and update their approach. Introduces a critic or reflection component.

Key papers: Reflexion, Self-Refine, CRITIC, Constitutional AI
Key frameworks: AutoGen (with feedback), LangGraph with loops

Family 5: Multi-Agent Systems

“Specialized agents collaborating”

Multiple LLM instances with different roles (planner, coder, critic, etc.) communicate and collaborate to solve complex tasks.

Key papers: CAMEL, MetaGPT, ChatDev, AutoGen, AgentVerse
Key frameworks: AutoGen, CrewAI, LangGraph multi-agent
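A two-role sketch of the pattern: a "coder" and a "critic" exchange messages until the critic approves. Both roles are stubs standing in for per-role LLM calls; the role names and verdict strings are invented for illustration.

```python
# Multi-agent sketch: two specialized roles pass messages in rounds until
# the critic approves the coder's draft.

def coder(spec: str, notes: list[str]) -> str:
    return "v2" if notes else "v1"        # stub: revises once criticized

def critic(draft: str) -> str:
    return "APPROVE" if draft == "v2" else "add error handling"

def collaborate(spec: str, max_rounds: int = 4) -> str:
    notes: list[str] = []                 # critic feedback passed to the coder
    for _ in range(max_rounds):
        draft = coder(spec, notes)
        verdict = critic(draft)
        if verdict == "APPROVE":
            return draft
        notes.append(verdict)             # message routed back to the coder
    return draft

print(collaborate("parse a CSV"))  # -> v2
```

Hierarchical systems add a manager role that routes these messages; the round-based structure stays the same.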

Family 6: Memory-Augmented Agents

“Agents with persistent, structured memory”

Go beyond the context window with external memory systems: vector databases, structured stores, memory hierarchies modeled after cognitive architectures.

Key papers: MemGPT, Generative Agents, A-MEM, ReadAgent
Key frameworks: MemGPT, LangChain Memory
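A MemGPT-flavored sketch of memory paging: a fixed-size context with overflow evicted to an archive, plus a recall call that searches the archive. The class and its eviction policy are illustrative, not MemGPT's actual design.

```python
# Paged-memory sketch: the "context" models the prompt window; items that
# overflow it are evicted to an external archive and can be recalled.

class PagedMemory:
    def __init__(self, context_size: int = 3):
        self.size = context_size
        self.context: list[str] = []      # what fits in the prompt window
        self.archive: list[str] = []      # external store for evicted items

    def add(self, item: str) -> None:
        self.context.append(item)
        while len(self.context) > self.size:
            self.archive.append(self.context.pop(0))  # page out the oldest

    def recall(self, keyword: str) -> list[str]:
        """Page archived items matching `keyword` back into view."""
        return [m for m in self.archive if keyword in m]

mem = PagedMemory(context_size=2)
for note in ["met Bob", "Bob likes tea", "deadline Friday", "ship v1"]:
    mem.add(note)
print(mem.context)                 # -> ['deadline Friday', 'ship v1']
print(mem.recall("Bob"))           # -> ['met Bob', 'Bob likes tea']
```

In MemGPT proper, the eviction and recall calls are themselves tools the LLM invokes, so the model manages its own memory hierarchy.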

Family 7: Embodied / Action Agents

“Agents that act in the physical or digital world”

Agents that control computers (GUI agents), robots (embodied agents), or execute code. The action space is rich and grounded.

Key papers: SayCan, RT-2, CogAgent, UFO, OS-Copilot, SWE-agent
Key products: Devin, Claude Computer Use, OpenAI Operator


The Virtuous Cycle (Plaat et al.)

One underappreciated insight from the Plaat et al. survey: the three categories of agentic behavior form a virtuous cycle that also generates training data:

Reasoning ──→ better decisions
    ↑                ↓
Interacting ←── Acting in the world
    │                │
    └──→ new training data ──→ better LLMs

Key insight: agentic LLMs that act and interact generate new empirical data — action-feedback sequences, multi-agent dialogues, role-play transcripts — that can feed back into pretraining and finetuning. This offers a potential solution to the “running out of training data” problem: agents create their own curriculum through experience. Vision-Language-Action models (RT-2, π₀, Magma) are the clearest current example.

The flip side: feedback loops can destabilize learning. Agent-generated data may amplify biases or errors if not carefully filtered and validated.


Key Design Tensions

The literature reveals several recurring tensions that different systems resolve differently:

| Tension | Tradeoff |
| --- | --- |
| Autonomy vs. Oversight | More autonomy = more capability, more risk |
| Generality vs. Specialization | Specialist agents perform better, general agents are more flexible |
| In-context vs. External memory | Context is fast but limited; external is vast but retrieval is lossy |
| Natural language vs. Structured | Natural language is flexible; structured is reliable |
| Single vs. Multi-agent | Multi-agent enables specialization; single-agent is simpler to debug |
| Plan-first vs. Interleave | Planning enables lookahead; interleaving enables reactivity |

How the Field Has Evolved

| Era | Dominant Pattern | Key Innovation |
| --- | --- | --- |
| 2022 | Tool-augmented LLMs | MRKL, WebGPT, SayCan — LLMs + specialized modules |
| Early 2023 | ReAct agents | Interleaved reasoning + action; Toolformer |
| Mid 2023 | Autonomous agents | AutoGPT, BabyAGI — long-horizon goal pursuit |
| Late 2023 | Multi-agent + Memory | MetaGPT, AutoGen, MemGPT, Generative Agents |
| 2024 | Coding agents + Infra | SWE-agent, Devin; LangGraph, CrewAI |
| 2025-2026 | Agentic products + MCP | Claude Computer Use, Operator; model-native tooling |


References

Foundation Surveys

  • On the Opportunities and Risks of Foundation Models (Bommasani et al., 2021) — arXiv:2108.07258
  • A Survey on Large Language Model based Autonomous Agents (Wang et al., 2023) — arXiv:2308.11432
  • The Rise and Potential of Large Language Model Based Agents: A Survey (Xi et al., 2023) — arXiv:2309.07864
  • Agentic Large Language Models, a survey (Plaat et al., 2025, JAIR) — arXiv:2503.23037

Reasoning & Planning Papers

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — arXiv:2201.11903
  • Large Language Models are Zero-Shot Reasoners (Kojima et al., 2022) — arXiv:2205.11916
  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (Zhou et al., 2022) — arXiv:2205.10625
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Yao et al., 2023) — arXiv:2305.10601
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks (Khot et al., 2022) — arXiv:2210.02406
  • Reasoning with Language Model is Planning with World Model (Hao et al., 2023) — arXiv:2305.14992

Acting & Tool Use Papers

  • WebGPT: Browser-Assisted Question-Answering with Human Feedback (Nakano et al., 2021) — arXiv:2112.09332
  • MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning (Karpas et al., 2022) — arXiv:2205.00445
  • Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023) — arXiv:2302.04761
  • ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2023) — arXiv:2210.03629
  • SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (Ahn et al., 2022) — arXiv:2204.01691
  • Gorilla: Large Language Model Connected with Massive APIs (Patil et al., 2023) — arXiv:2305.15334
  • Large Language Models as Tool Makers (Cai et al., 2023) — arXiv:2305.17126
  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Gou et al., 2023) — arXiv:2305.11738
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al., 2023) — arXiv:2310.11511
  • CodeAct: Executable Code Actions Elicit Better LLM Agents (Wang et al., 2024) — arXiv:2402.01030

Reflection & Self-Improvement Papers

  • Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023) — arXiv:2303.11366
  • Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023) — arXiv:2303.17651
  • Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022) — arXiv:2212.08073

Multi-Agent Systems Papers

  • CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society (Li et al., 2023) — arXiv:2303.17760
  • MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (Hong et al., 2023) — arXiv:2308.00352
  • ChatDev: Communicative Agents for Software Development (Qian et al., 2023) — arXiv:2307.07924
  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Wu et al., 2023) — arXiv:2308.08155
  • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents (Chen et al., 2023) — arXiv:2308.10848

Memory & Context Management Papers

  • MemGPT: Towards LLMs as Operating Systems (Packer et al., 2023) — arXiv:2310.08560
  • Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023) — arXiv:2304.03442
  • ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (Lee et al., 2024) — arXiv:2402.09727
  • Cognitive Architectures for Language Agents (Sumers et al., 2023) — arXiv:2309.02427

Embodied & GUI Agents Papers

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023) — arXiv:2307.15818
  • CogAgent: A Visual Language Model for GUI Agents (Hong et al., 2023) — arXiv:2312.08914
  • UFO: A UI-Focused Agent for Windows OS Interaction (Zhang et al., 2024) — arXiv:2402.07939
  • SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (Yang et al., 2024) — arXiv:2405.15793
  • OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (Wu et al., 2024) — arXiv:2402.07456

Vision-Language-Action Models

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023) — arXiv:2307.15818
  • π₀: A Vision-Language-Action Flow Model for Open-World Robotic Manipulation (source needed)
  • Magma: Multimodal Agents Grounded in Machine Learning and Mobility (source needed)

Benchmarks & Evaluations

  • AgentBench: Evaluating LLMs as Agents (Liu et al., 2023) — arXiv:2308.03688
  • WebArena: A Realistic Web Environment for Building Autonomous Agents (Zhou et al., 2023) — arXiv:2307.13854
  • GAIA: A Benchmark for General AI Assistants (Mialon et al., 2023) — arXiv:2311.12983
  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Jimenez et al., 2023) — arXiv:2310.06770
  • OSWorld: Benchmarking Multimodal Agents in Real Computer Environments (Xie et al., 2024) — arXiv:2404.07972

Standards & Protocols

  • Model Context Protocol (MCP) (Anthropic, 2024) — GitHub
  • A2A (Agent-to-Agent) Protocol (Google, 2025) — Linux Foundation

Industry Resources & Blog Posts

  • LLM Powered Autonomous Agents (Lilian Weng, 2023) — Blog post
  • Building Effective Agents — Anthropic, 2024 — Blog post
  • Agent Architecture Patterns & Best Practices — LangChain — Docs