Taxonomy & Conceptual Framework
How to think about LLM agents — a map of the design space
What Is an LLM Agent?
An LLM agent is a system in which a large language model serves as a central reasoning engine, capable of taking sequences of actions toward a goal — not merely producing a single response. The key distinctions from standard LLM use:
- Agency over time: The model takes multiple steps, observes results, and adapts
- Tool use: The model can invoke external capabilities (search, code execution, APIs)
- Goal-directedness: The system pursues an objective, not just answers a question
- Memory beyond context: The system can store and retrieve information across turns
The canonical formulation (from Wang et al., 2023 survey): an agent has a Brain (the LLM), Perception (inputs), Memory (short- and long-term storage), Action (what it can do), and Planning (how it decides what to do).
A Note on Taxonomies
There is no single consensus taxonomy for LLM agents — different surveys slice the design space differently. Two major competing frameworks:
Wang et al. (2023) — Component-based: Brain (LLM) + Perception + Memory + Planning + Action. Organized around what an agent has.
Plaat et al. (2025, JAIR) — Capability-based: Reason + Act + Interact. Organized around what an agent does.
The Plaat et al. framework is elegant in its simplicity and maps cleanly onto the literature: reasoning papers, acting/tool papers, and multi-agent interaction papers. It also reveals a virtuous cycle: agents that reason generate better actions; reflection improves multi-agent interaction; and crucially, acting and interacting generate new training data — a potential solution to the "running out of training data" problem. See Plaat et al., arXiv:2503.23037.
The rest of this page follows a hybrid approach: the component decomposition from Wang et al. for design dimensions, and the Plaat et al. capability framing for architectural families.
The Core Agent Loop
Most LLM agent architectures implement some version of this loop:
Observe → Think → Act → Observe → ...
This was crystallized in ReAct (Yao et al., 2023):
Thought: I need to find X
Action: search("X")
Observation: [result]
Thought: Now I know X, I should do Y
Action: ...
Variations on this loop form the backbone of nearly every agent system.
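The Thought/Action/Observation cycle above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's API: the `search` tool is a stub, and the model's outputs are scripted rather than produced by an LLM.

```python
def search(query: str) -> str:
    """Stub tool: a real agent would call a search API here."""
    return f"[top result for {query!r}]"

TOOLS = {"search": search}

# Scripted model outputs: (thought, action, argument); action=None means
# the model is ready to emit a final answer. A real agent would get these
# by prompting an LLM with the trace so far.
SCRIPT = [
    ("I need to find X", "search", "X"),
    ("Now I know X, I can answer", None, "X is ..."),
]

def react_loop(script):
    trace = []
    for thought, action, arg in script:
        trace.append(f"Thought: {thought}")
        if action is None:                       # model decided it can answer
            trace.append(f"Answer: {arg}")
            break
        trace.append(f"Action: {action}({arg!r})")
        observation = TOOLS[action](arg)         # Act...
        trace.append(f"Observation: {observation}")  # ...and Observe

    return trace

trace = react_loop(SCRIPT)
```

The essential property is that each Observation is appended to the transcript the model sees before producing its next Thought — the loop, not the model, is what makes this an agent.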
Decomposing Agent Design Space
The five dimensions below are an editorial synthesis for this survey, not a taxonomy from a single source. They draw primarily from Wang et al. (2023) (Brain/Perception/Memory/Action/Planning), Plaat et al. (2025) (Reason/Act/Interact), and Sumers et al. (2023) (cognitive memory architecture), combined with common framings in the practitioner literature. The goal is a practical map of design choices rather than a formal taxonomy.
We can decompose the design space of LLM agents along five key dimensions:
1. 🧠 Reasoning / Planning
How does the agent decide what to do next?
| Approach | Description | Examples |
|---|---|---|
| Direct | Single-step action selection | Basic tool-use with function calling |
| Chain-of-Thought | Linear reasoning trace | ReAct, CoT prompting |
| Tree/Graph Search | Branching exploration | Tree of Thoughts, MCTS agents |
| Hierarchical | Decompose → solve subgoals | Plan-and-Execute, HierAgent |
| Reflective | Evaluate → revise plans | Reflexion, Self-Refine, CRITIC |
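To make the tree/graph-search row concrete, here is a toy greedy search in the Tree-of-Thoughts spirit: expand candidate continuations, score them with an evaluator, keep the best branch. Both `expand` and `score` are invented stand-ins for what would be LLM calls in a real system.

```python
def expand(state: str) -> list:
    """Stub proposer: a real system would ask an LLM for candidate thoughts."""
    return [state + c for c in "ab"]

def score(state: str) -> int:
    """Stub evaluator: a real system would ask an LLM to rate each branch."""
    return state.count("a")

def greedy_tree_search(start: str, depth: int) -> str:
    state = start
    for _ in range(depth):
        state = max(expand(state), key=score)  # keep the best child each level
    return state

best = greedy_tree_search("", 3)
```

Full Tree of Thoughts keeps several branches alive (beam search or BFS/DFS) rather than one; the greedy variant is just the smallest example of the expand-score-select pattern.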
2. 💾 Memory Architecture
What information can the agent access and how?
| Type | Description | Examples |
|---|---|---|
| In-context | Everything in the prompt window | Standard LLM, short conversations |
| External (vector) | Retrieved from embedding store | RAG, MemGPT external storage |
| Episodic | Record of past events | MemGPT, Generative Agents diary |
| Semantic | Facts about the world | Knowledge graph integration |
| Procedural | How-to skills, code | Voyager skill library |
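The external (vector) memory row can be sketched end to end under toy assumptions: here "embeddings" are bag-of-words counts and similarity is cosine, standing in for a real embedding model and vector database.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    def __init__(self):
        self.items = []                      # (embedding, text) pairs

    def store(self, text: str):
        self.items.append((embed(text), text))

    def retrieve(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

mem = VectorMemory()
mem.store("the user prefers dark mode")
mem.store("the build failed with a linker error")
top = mem.retrieve("what UI theme does the user like?")
```

The point of the sketch is the interface, not the math: the agent writes free text in and gets the most similar stored text back, which is exactly where the "retrieval is lossy" tradeoff discussed later comes from.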
3. 🔧 Action Space
What can the agent actually do?
| Category | Examples |
|---|---|
| Search/Web | Google search, web browsing, Wikipedia |
| Code execution | Python REPL, shell, Jupyter |
| API calls | REST APIs, function calling |
| File I/O | Read/write files |
| GUI/Computer | Click, type, screenshot |
| Agent spawning | Spawn sub-agents, delegate tasks |
| Communication | Send messages, emails |
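An action space is typically exposed to the model as a tool registry: each tool carries a description the model sees plus the callable the runtime dispatches to. The sketch below uses invented tool names and a simplified call format loosely modeled on function calling; it is not any specific vendor's API.

```python
import json

TOOLS = {}

def tool(name: str, description: str):
    """Decorator that registers a callable as an agent tool."""
    def register(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return register

@tool("read_file", "Read a file and return its contents")
def read_file(path: str) -> str:
    return f"<contents of {path}>"           # stub; a real tool would open(path)

@tool("run_python", "Evaluate a Python expression")
def run_python(expr: str) -> str:
    return str(eval(expr))                   # sketch only: never eval untrusted input

def dispatch(call_json: str) -> str:
    """Execute a model-emitted call like {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    return TOOLS[call["name"]]["fn"](**call["arguments"])

result = dispatch('{"name": "run_python", "arguments": {"expr": "2 + 2"}}')
```

In a real system the descriptions in `TOOLS` are serialized into the prompt (or the provider's tool schema), and `dispatch` is where sandboxing and permission checks live.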
4. 🤝 Agent Multiplicity
Is the system a single agent or multiple?
| Pattern | Description | Examples |
|---|---|---|
| Single | One LLM does everything | Standard ReAct agents |
| Specialist pipeline | Sequential specialization | HuggingGPT, modular agents |
| Peer collaboration | Agents debate and refine | CAMEL, SPP |
| Hierarchical | Manager + workers | AutoGen, MetaGPT, ChatDev |
| Society | Many autonomous agents | Generative Agents, AgentVerse |
5. 🎯 Degree of Autonomy
How much human oversight is in the loop?
| Level | Description |
|---|---|
| Tool-augmented | LLM with tools, human-in-the-loop |
| Semi-autonomous | Agent acts, human approves key steps |
| Fully autonomous | Agent runs until task complete |
| Multi-hop autonomous | Long-horizon, fully unsupervised |
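The semi-autonomous level reduces to a gate in the action loop: the agent proposes actions freely, but actions flagged as risky require approval before they run. The risk labels and the approval policy below are illustrative assumptions, not a standard.

```python
RISKY = {"delete_file", "send_email"}        # illustrative risk policy

def execute(action: str) -> str:
    return f"executed {action}"              # stub executor

def run_with_oversight(actions, approve):
    """approve(action) stands in for a human reviewer's yes/no decision."""
    log = []
    for action in actions:
        if action in RISKY and not approve(action):
            log.append(f"blocked {action}")  # human veto: skip, keep going
            continue
        log.append(execute(action))
    return log

log = run_with_oversight(
    ["read_file", "delete_file", "search"],
    approve=lambda a: False,                 # reviewer denies everything risky
)
```

Moving up the autonomy ladder amounts to shrinking `RISKY` or replacing `approve` with an automatic policy; fully autonomous operation is the degenerate case where the gate always says yes.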
A Taxonomy of Agent Architectures
Based on the literature, we can identify several major architectural families:
Family 1: Tool-Augmented LLMs
“Give the LLM tools to call”
The simplest agent pattern: a single LLM is given a set of callable tools, reasons about which tool to call, and synthesizes the results. The emphasis is on tool integration rather than autonomous goal pursuit.
Key papers: Toolformer, MRKL Systems, OpenAI function calling, Claude tool use
Key frameworks: LangChain tools, OpenAI Assistants
Family 2: ReAct-Style Agents
“Interleave reasoning and acting”
The agent alternates between Thought (reasoning about what to do) and Action (doing it). The observation from the action feeds back into the next thought. This is the dominant pattern for task-solving agents.
Key papers: ReAct, DEPS, Inner Monologue
Key frameworks: LangChain agents, LlamaIndex agents
Family 3: Plan-Then-Execute
“Make a plan, then execute it”
A planner agent generates a high-level plan, then an executor carries out each step. Allows more structured task decomposition and is easier to monitor.
Key papers: Plan-and-Execute (Chase, 2023; blog post), LLM+P, DEPS
Key frameworks: LangGraph, LangChain Plan-and-Execute
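The planner/executor split can be sketched directly; both roles are stubs here, where a real system would prompt one LLM (or one prompt template) for the plan and another for each step.

```python
def plan(goal: str) -> list:
    """Stub planner: a real system would ask an LLM for this step list."""
    return [f"research {goal}", f"draft {goal}", f"review {goal}"]

def execute_step(step: str) -> str:
    """Stub executor: a real system would run a ReAct-style loop per step."""
    return f"done: {step}"

def plan_and_execute(goal: str) -> list:
    steps = plan(goal)                        # plan once, up front...
    return [execute_step(s) for s in steps]   # ...then execute in order

results = plan_and_execute("summary")
```

The structural benefit shows even in the sketch: the plan exists as data before anything runs, so it can be logged, reviewed, or edited by a human — which is what makes this family easier to monitor than interleaved loops.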
Family 4: Reflective / Self-Improving Agents
“Learn from mistakes within a task”
Agents that evaluate their own performance and update their approach. Introduces a critic or reflection component.
Key papers: Reflexion, Self-Refine, CRITIC, Constitutional AI
Key frameworks: AutoGen (with feedback), LangGraph with loops
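The attempt-critique-retry pattern shared by this family can be sketched with stubbed components: `attempt` stands in for the task-solving LLM call and `critic` for the evaluator (a second LLM call, a unit test, or a tool check in real systems).

```python
def attempt(task: str, feedback):
    """Stub solver: succeeds only once it has a critique to work from."""
    return "correct answer" if feedback else "wrong answer"

def critic(answer: str):
    """Stub critic: return a critique string, or None if the answer passes."""
    return None if answer == "correct answer" else "answer failed the check"

def reflective_loop(task: str, max_tries: int = 3):
    feedback = None
    for i in range(1, max_tries + 1):
        answer = attempt(task, feedback)     # feedback is folded into the prompt
        feedback = critic(answer)
        if feedback is None:                 # critic is satisfied: stop early
            return answer, i
    return answer, max_tries

answer, tries = reflective_loop("solve the puzzle")
```

The defining feature — Reflexion's "verbal reinforcement" — is that the critique is carried forward as text into the next attempt rather than as a gradient update.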
Family 5: Multi-Agent Systems
“Specialized agents collaborating”
Multiple LLM instances with different roles (planner, coder, critic, etc.) communicate and collaborate to solve complex tasks.
Key papers: CAMEL, MetaGPT, ChatDev, AutoGen, AgentVerse
Key frameworks: AutoGen, CrewAI, LangGraph multi-agent
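A two-role version of this family fits in a few lines: a "coder" proposes, a "critic" reviews, and messages alternate until the critic accepts. Both roles are scripted stubs standing in for separate LLM calls with different system prompts; the role names are illustrative.

```python
def coder(task: str, review):
    """Stub proposer: revises once after receiving feedback."""
    return "v2" if review else "v1"

def critic(draft: str):
    """Stub reviewer: return comments, or None to accept."""
    return "needs error handling" if draft == "v1" else None

def collaborate(task: str, max_rounds: int = 4):
    transcript, review = [], None
    for _ in range(max_rounds):
        draft = coder(task, review)
        transcript.append(("coder", draft))
        review = critic(draft)
        if review is None:                   # critic accepts: conversation ends
            break
        transcript.append(("critic", review))
    return transcript

transcript = collaborate("write a parser")
```

Real frameworks differ mainly in how this conversation is orchestrated — fixed turn-taking (CAMEL), a standard-operating-procedure pipeline (MetaGPT), or a programmable conversation graph (AutoGen) — but the transcript-as-shared-state core is the same.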
Family 6: Memory-Augmented Agents
“Agents with persistent, structured memory”
Go beyond the context window with external memory systems: vector databases, structured stores, memory hierarchies modeled after cognitive architectures.
Key papers: MemGPT, Generative Agents, A-MEM, ReadAgent
Key frameworks: MemGPT, LangChain Memory
Family 7: Embodied / Action Agents
“Agents that act in the physical or digital world”
Agents that control computers (GUI agents), robots (embodied agents), or execute code. The action space is rich and grounded.
Key papers: SayCan, RT-2, CogAgent, UFO, OS-Copilot, SWE-agent
Key products: Devin, Claude Computer Use, OpenAI Operator
The Virtuous Cycle (Plaat et al.)
One underappreciated insight from the Plaat et al. survey: the three categories of agentic behavior form a virtuous cycle that also generates training data:
Reasoning ──→ better decisions ──→ Acting in the world ──→ Interacting ──→ (back to) Reasoning

Acting + Interacting ──→ new training data ──→ better LLMs
Key insight: agentic LLMs that act and interact generate new empirical data — action-feedback sequences, multi-agent dialogues, role-play transcripts — that can feed back into pretraining and finetuning. This offers a potential solution to the “running out of training data” problem: agents create their own curriculum through experience. Vision-Language-Action models (RT-2, π₀, Magma) are the clearest current example.
The flip side: feedback loops can destabilize learning. Agent-generated data may amplify biases or errors if not carefully filtered and validated.
Key Design Tensions
The literature reveals several recurring tensions that different systems resolve differently:
| Tension | Tradeoff |
|---|---|
| Autonomy vs. Oversight | More autonomy = more capability, more risk |
| Generality vs. Specialization | Specialist agents perform better, general agents are more flexible |
| In-context vs. External memory | Context is fast but limited; external is vast but retrieval is lossy |
| Natural language vs. Structured | Natural language is flexible; structured is reliable |
| Single vs. Multi-agent | Multi-agent enables specialization; single-agent is simpler to debug |
| Plan-first vs. Interleave | Planning enables lookahead; interleaving enables reactivity |
How the Field Has Evolved
| Era | Dominant Pattern | Key Innovation |
|---|---|---|
| 2022 | Tool-augmented LLMs | MRKL, WebGPT, SayCan — LLMs + specialized modules |
| Early 2023 | ReAct agents | Interleaved reasoning + action; Toolformer |
| Mid 2023 | Autonomous agents | AutoGPT, BabyAGI — long-horizon goal pursuit |
| Late 2023 | Multi-agent + Memory | MetaGPT, AutoGen, MemGPT, Generative Agents |
| 2024 | Coding agents + Infra | SWE-agent, Devin; LangGraph, CrewAI |
| 2025-2026 | Agentic products + MCP | Claude Computer Use, Operator; model-native tooling |
References
Foundation Surveys
- On the Opportunities and Risks of Foundation Models (Bommasani et al., 2021) — arXiv:2108.07258
- A Survey on Large Language Model based Autonomous Agents (Wang et al., 2023) — arXiv:2308.11432
- The Rise and Potential of Large Language Model Based Agents: A Survey (Xi et al., 2023) — arXiv:2309.07864
- Agentic Large Language Models, a survey (Plaat et al., 2025, JAIR) — arXiv:2503.23037
Reasoning & Planning Papers
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — arXiv:2201.11903
- Large Language Models are Zero-Shot Reasoners (Kojima et al., 2022) — arXiv:2205.11916
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (Zhou et al., 2022) — arXiv:2205.10625
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Yao et al., 2023) — arXiv:2305.10601
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks (Khot et al., 2022) — arXiv:2210.02406
- Reasoning with Language Model is Planning with a World Model (RAP) (Hao et al., 2023) — arXiv:2305.14992
Acting & Tool Use Papers
- WebGPT: Browser-assisted question-answering with human feedback (Nakano et al., 2021) — arXiv:2112.09332
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning (Karpas et al., 2022) — arXiv:2205.00445
- Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023) — arXiv:2302.04761
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2023) — arXiv:2210.03629
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan) (Ahn et al., 2022) — arXiv:2204.01691
- Gorilla: Large Language Model Connected with Massive APIs (Patil et al., 2023) — arXiv:2305.15334
- Large Language Models as Tool Makers (Cai et al., 2023) — arXiv:2305.17126
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Gou et al., 2023) — arXiv:2305.11738
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al., 2023) — arXiv:2310.11511
- CodeAct: Executable Code Actions Elicit Better LLM Agents (Wang et al., 2024) — arXiv:2402.01030
Reflection & Self-Improvement Papers
- Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023) — arXiv:2303.11366
- Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023) — arXiv:2303.17651
- Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022) — arXiv:2212.08073
Multi-Agent Systems Papers
- CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Models (Li et al., 2023) — arXiv:2303.17760
- MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework (Hong et al., 2023) — arXiv:2308.00352
- ChatDev: Communicative Agents for Software Development (Qian et al., 2023) — arXiv:2307.07924
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Wu et al., 2023) — arXiv:2308.08155
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents (Chen et al., 2023) — arXiv:2308.10848
Memory & Context Management Papers
- MemGPT: Towards LLMs as Operating Systems (Packer et al., 2023) — arXiv:2310.08560
- Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023) — arXiv:2304.03442
- ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (Lee et al., 2024) — arXiv:2402.09727
- Cognitive Architectures for Language Agents (Sumers et al., 2023) — arXiv:2309.02427
Embodied & GUI Agents Papers
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023) — arXiv:2307.15818
- CogAgent: A Visual Language Model for GUI Agents (Hong et al., 2023) — arXiv:2312.08914
- UFO: A UI-Focused Agent for Windows OS Interaction (Zhang et al., 2024) — arXiv:2402.07939
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (Yang et al., 2024) — arXiv:2405.15793
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (Wu et al., 2024) — arXiv:2402.07456
Vision-Language-Action Models
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023) — arXiv:2307.15818
- π₀: A Vision-Language-Action Flow Model for General Robot Control (Black et al., 2024) — arXiv:2410.24164
- Magma: A Foundation Model for Multimodal AI Agents (Yang et al., 2025) — arXiv:2502.13130
Benchmarks & Evaluations
- AgentBench: Evaluating LLMs as Agents (Liu et al., 2023) — arXiv:2308.03688
- WebArena: A Realistic Web Environment for Building Autonomous Agents (Zhou et al., 2023) — arXiv:2307.13854
- GAIA: A Benchmark for General AI Assistants (Mialon et al., 2023) — arXiv:2311.12983
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Jimenez et al., 2023) — arXiv:2310.06770
- OSWorld: Benchmarking Multimodal Agents in Real Computer Environments (Xie et al., 2024) — arXiv:2404.07972
Standards & Protocols
- Model Context Protocol (MCP) (Anthropic, 2024) — GitHub
- A2A (Agent-to-Agent) Protocol (Google, 2025) — Linux Foundation
Industry Resources & Blog Posts
- LLM Powered Autonomous Agents — Lilian Weng, 2023 — Blog post
- Building Effective Agents — Anthropic, 2024 — Blog post
- Agent Architecture Patterns & Best Practices — LangChain — Docs
Further Reading
- Foundations (2022–2023) — The papers that built the field
- Reasoning & Planning — Deep dive on how agents think
- Multi-Agent Systems — Collaborative architectures
- Memory, Tools & Actions — The building blocks
- 2024–2026 Frontier — Where the field is now