Taxonomy & Conceptual Framework
How to think about LLM agents — a map of the design space
What Is an LLM Agent?
An LLM agent is a system in which a large language model serves as a central reasoning engine, capable of taking sequences of actions toward a goal — not merely producing a single response. The key distinctions from standard LLM use:
- Agency over time: The model takes multiple steps, observes results, and adapts
- Tool use: The model can invoke external capabilities (search, code execution, APIs)
- Goal-directedness: The system pursues an objective, not just answers a question
- Memory beyond context: The system can store and retrieve information across turns
The canonical formulation (from Wang et al., 2023 survey): an agent has a Brain (the LLM), Perception (inputs), Memory (short- and long-term storage), Action (what it can do), and Planning (how it decides what to do).
A Note on Taxonomies
There is no single consensus taxonomy for LLM agents — different surveys slice the design space differently. Two major competing frameworks:
Wang et al. (2023) — Component-based: Brain (LLM) + Perception + Memory + Planning + Action. Organized around what an agent has.
Plaat et al. (2025, JAIR) — Capability-based: Reason + Act + Interact. Organized around what an agent does.
The Plaat et al. framework is elegant in its simplicity and maps cleanly onto the literature: reasoning papers, acting/tool papers, and multi-agent interaction papers. It also reveals a virtuous cycle: agents that reason generate better actions; reflection improves multi-agent interaction; and crucially, acting and interacting generate new training data — a potential solution to the "running out of training data" problem. See Plaat et al., arXiv:2503.23037.
The rest of this page follows a hybrid approach: the component decomposition from Wang et al. for design dimensions, and the Plaat et al. capability framing for architectural families.
The Core Agent Loop
Most LLM agent architectures implement some version of this loop:
Observe → Think → Act → Observe → ...
This was crystallized in ReAct (Yao et al., 2023):
Thought: I need to find X
Action: search("X")
Observation: [result]
Thought: Now I know X, I should do Y
Action: ...
Variations on this loop form the backbone of nearly every agent system.
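The Thought/Action/Observation cycle above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's API: the `search` tool is a stub, and the model's outputs are scripted rather than produced by an LLM.

```python
def search(query: str) -> str:
    """Stub tool: a real agent would call a search API here."""
    return f"[top result for {query!r}]"

TOOLS = {"search": search}

# Scripted model outputs: (thought, action, argument); action=None means
# the model is ready to emit a final answer. A real agent would get these
# by prompting an LLM with the trace so far.
SCRIPT = [
    ("I need to find X", "search", "X"),
    ("Now I know X, I can answer", None, "X is ..."),
]

def react_loop(script):
    trace = []
    for thought, action, arg in script:
        trace.append(f"Thought: {thought}")
        if action is None:                       # model decided it can answer
            trace.append(f"Answer: {arg}")
            break
        trace.append(f"Action: {action}({arg!r})")
        observation = TOOLS[action](arg)         # Act...
        trace.append(f"Observation: {observation}")  # ...and Observe

    return trace

trace = react_loop(SCRIPT)
```

The essential property is that each Observation is appended to the transcript the model sees before producing its next Thought — the loop, not the model, is what makes this an agent.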
Decomposing Agent Design Space
The five dimensions below are an editorial synthesis for this survey, not a taxonomy from a single source. They draw primarily from Wang et al. (2023) (Brain/Perception/Memory/Action/Planning), Plaat et al. (2025) (Reason/Act/Interact), and Sumers et al. (2023) (cognitive memory architecture), combined with common framings in the practitioner literature. The goal is a practical map of design choices rather than a formal taxonomy.
We can decompose the design space of LLM agents along five key dimensions:
1. 🧠 Reasoning / Planning
How does the agent decide what to do next?
| Approach | Description | Examples |
|---|---|---|
| Direct | Single-step action selection | Basic tool-use with function calling |
| Chain-of-Thought | Linear reasoning trace | ReAct, CoT prompting |
| Tree/Graph Search | Branching exploration | Tree of Thoughts, MCTS agents |
| Hierarchical | Decompose → solve subgoals | Plan-and-Execute, HierAgent |
| Reflective | Evaluate → revise plans | Reflexion, Self-Refine, CRITIC |
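To make the tree/graph-search row concrete, here is a toy greedy search in the Tree-of-Thoughts spirit: expand candidate continuations, score them with an evaluator, keep the best branch. Both `expand` and `score` are invented stand-ins for what would be LLM calls in a real system.

```python
def expand(state: str) -> list:
    """Stub proposer: a real system would ask an LLM for candidate thoughts."""
    return [state + c for c in "ab"]

def score(state: str) -> int:
    """Stub evaluator: a real system would ask an LLM to rate each branch."""
    return state.count("a")

def greedy_tree_search(start: str, depth: int) -> str:
    state = start
    for _ in range(depth):
        state = max(expand(state), key=score)  # keep the best child each level
    return state

best = greedy_tree_search("", 3)
```

Full Tree of Thoughts keeps several branches alive (beam search or BFS/DFS) rather than one; the greedy variant is just the smallest example of the expand-score-select pattern.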
2. 💾 Memory Architecture
What information can the agent access and how?
| Type | Description | Examples |
|---|---|---|
| In-context | Everything in the prompt window | Standard LLM, short conversations |
| External (vector) | Retrieved from embedding store | RAG, MemGPT external storage |
| Episodic | Record of past events | MemGPT, Generative Agents diary |
| Semantic | Facts about the world | Knowledge graph integration |
| Procedural | How-to skills, code | Voyager skill library |
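The external (vector) memory row can be sketched end to end under toy assumptions: here "embeddings" are bag-of-words counts and similarity is cosine, standing in for a real embedding model and vector database.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    def __init__(self):
        self.items = []                      # (embedding, text) pairs

    def store(self, text: str):
        self.items.append((embed(text), text))

    def retrieve(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

mem = VectorMemory()
mem.store("the user prefers dark mode")
mem.store("the build failed with a linker error")
top = mem.retrieve("what UI theme does the user like?")
```

The point of the sketch is the interface, not the math: the agent writes free text in and gets the most similar stored text back, which is exactly where the "retrieval is lossy" tradeoff discussed later comes from.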
3. 🔧 Action Space
What can the agent actually do?
| Category | Examples |
|---|---|
| Search/Web | Google search, web browsing, Wikipedia |
| Code execution | Python REPL, shell, Jupyter |
| API calls | REST APIs, function calling |
| File I/O | Read/write files |
| GUI/Computer | Click, type, screenshot |
| Agent spawning | Spawn sub-agents, delegate tasks |
| Communication | Send messages, emails |
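An action space is typically exposed to the model as a tool registry: each tool carries a description the model sees plus the callable the runtime dispatches to. The sketch below uses invented tool names and a simplified call format loosely modeled on function calling; it is not any specific vendor's API.

```python
import json

TOOLS = {}

def tool(name: str, description: str):
    """Decorator that registers a callable as an agent tool."""
    def register(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return register

@tool("read_file", "Read a file and return its contents")
def read_file(path: str) -> str:
    return f"<contents of {path}>"           # stub; a real tool would open(path)

@tool("run_python", "Evaluate a Python expression")
def run_python(expr: str) -> str:
    return str(eval(expr))                   # sketch only: never eval untrusted input

def dispatch(call_json: str) -> str:
    """Execute a model-emitted call like {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    return TOOLS[call["name"]]["fn"](**call["arguments"])

result = dispatch('{"name": "run_python", "arguments": {"expr": "2 + 2"}}')
```

In a real system the descriptions in `TOOLS` are serialized into the prompt (or the provider's tool schema), and `dispatch` is where sandboxing and permission checks live.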
4. 🤝 Agent Multiplicity
Is the system a single agent or multiple?
| Pattern | Description | Examples |
|---|---|---|
| Single | One LLM does everything | Standard ReAct agents |
| Specialist pipeline | Sequential specialization | HuggingGPT, modular agents |
| Peer collaboration | Agents debate and refine | CAMEL, SPP |
| Hierarchical | Manager + workers | AutoGen, MetaGPT, ChatDev |
| Society | Many autonomous agents | Generative Agents, AgentVerse |
5. 🎯 Degree of Autonomy
How much human oversight is in the loop?
| Level | Description |
|---|---|
| Tool-augmented | LLM with tools, human-in-the-loop |
| Semi-autonomous | Agent acts, human approves key steps |
| Fully autonomous | Agent runs until task complete |
| Multi-hop autonomous | Long-horizon, fully unsupervised |
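The semi-autonomous level reduces to a gate in the action loop: the agent proposes actions freely, but actions flagged as risky require approval before they run. The risk labels and the approval policy below are illustrative assumptions, not a standard.

```python
RISKY = {"delete_file", "send_email"}        # illustrative risk policy

def execute(action: str) -> str:
    return f"executed {action}"              # stub executor

def run_with_oversight(actions, approve):
    """approve(action) stands in for a human reviewer's yes/no decision."""
    log = []
    for action in actions:
        if action in RISKY and not approve(action):
            log.append(f"blocked {action}")  # human veto: skip, keep going
            continue
        log.append(execute(action))
    return log

log = run_with_oversight(
    ["read_file", "delete_file", "search"],
    approve=lambda a: False,                 # reviewer denies everything risky
)
```

Moving up the autonomy ladder amounts to shrinking `RISKY` or replacing `approve` with an automatic policy; fully autonomous operation is the degenerate case where the gate always says yes.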
A Taxonomy of Agent Architectures
Based on the literature, we can identify several major architectural families:
Family 1: Tool-Augmented LLMs
“Give the LLM tools to call”
The simplest agent pattern: a single LLM is given a set of callable tools, reasons about which tool to call, and synthesizes the results. The emphasis is on tool integration rather than autonomous goal pursuit.
Key papers: Toolformer, MRKL Systems, OpenAI function calling, Claude tool use
Key frameworks: LangChain tools, OpenAI Assistants
Family 2: ReAct-Style Agents
“Interleave reasoning and acting”
The agent alternates between Thought (reasoning about what to do) and Action (doing it). The observation from the action feeds back into the next thought. This is the dominant pattern for task-solving agents.
Key papers: ReAct, DEPS, Inner Monologue
Key frameworks: LangChain agents, LlamaIndex agents
Family 3: Plan-Then-Execute
“Make a plan, then execute it”
A planner agent generates a high-level plan, then an executor carries out each step. Allows more structured task decomposition and is easier to monitor.
Key papers: Plan-and-Execute (Chase, 2023; blog post), LLM+P, DEPS
Key frameworks: LangGraph, LangChain Plan-and-Execute
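The planner/executor split can be sketched directly; both roles are stubs here, where a real system would prompt one LLM (or one prompt template) for the plan and another for each step.

```python
def plan(goal: str) -> list:
    """Stub planner: a real system would ask an LLM for this step list."""
    return [f"research {goal}", f"draft {goal}", f"review {goal}"]

def execute_step(step: str) -> str:
    """Stub executor: a real system would run a ReAct-style loop per step."""
    return f"done: {step}"

def plan_and_execute(goal: str) -> list:
    steps = plan(goal)                        # plan once, up front...
    return [execute_step(s) for s in steps]   # ...then execute in order

results = plan_and_execute("summary")
```

The structural benefit shows even in the sketch: the plan exists as data before anything runs, so it can be logged, reviewed, or edited by a human — which is what makes this family easier to monitor than interleaved loops.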
Family 4: Reflective / Self-Improving Agents
“Learn from mistakes within a task”
Agents that evaluate their own performance and update their approach. Introduces a critic or reflection component.
Key papers: Reflexion, Self-Refine, CRITIC, Constitutional AI
Key frameworks: AutoGen (with feedback), LangGraph with loops
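The attempt-critique-retry pattern shared by this family can be sketched with stubbed components: `attempt` stands in for the task-solving LLM call and `critic` for the evaluator (a second LLM call, a unit test, or a tool check in real systems).

```python
def attempt(task: str, feedback):
    """Stub solver: succeeds only once it has a critique to work from."""
    return "correct answer" if feedback else "wrong answer"

def critic(answer: str):
    """Stub critic: return a critique string, or None if the answer passes."""
    return None if answer == "correct answer" else "answer failed the check"

def reflective_loop(task: str, max_tries: int = 3):
    feedback = None
    for i in range(1, max_tries + 1):
        answer = attempt(task, feedback)     # feedback is folded into the prompt
        feedback = critic(answer)
        if feedback is None:                 # critic is satisfied: stop early
            return answer, i
    return answer, max_tries

answer, tries = reflective_loop("solve the puzzle")
```

The defining feature — Reflexion's "verbal reinforcement" — is that the critique is carried forward as text into the next attempt rather than as a gradient update.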
Family 5: Multi-Agent Systems
“Specialized agents collaborating”
Multiple LLM instances with different roles (planner, coder, critic, etc.) communicate and collaborate to solve complex tasks.
Key papers: CAMEL, MetaGPT, ChatDev, AutoGen, AgentVerse
Key frameworks: AutoGen, CrewAI, LangGraph multi-agent
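A two-role version of this family fits in a few lines: a "coder" proposes, a "critic" reviews, and messages alternate until the critic accepts. Both roles are scripted stubs standing in for separate LLM calls with different system prompts; the role names are illustrative.

```python
def coder(task: str, review):
    """Stub proposer: revises once after receiving feedback."""
    return "v2" if review else "v1"

def critic(draft: str):
    """Stub reviewer: return comments, or None to accept."""
    return "needs error handling" if draft == "v1" else None

def collaborate(task: str, max_rounds: int = 4):
    transcript, review = [], None
    for _ in range(max_rounds):
        draft = coder(task, review)
        transcript.append(("coder", draft))
        review = critic(draft)
        if review is None:                   # critic accepts: conversation ends
            break
        transcript.append(("critic", review))
    return transcript

transcript = collaborate("write a parser")
```

Real frameworks differ mainly in how this conversation is orchestrated — fixed turn-taking (CAMEL), a standard-operating-procedure pipeline (MetaGPT), or a programmable conversation graph (AutoGen) — but the transcript-as-shared-state core is the same.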
Family 6: Memory-Augmented Agents
“Agents with persistent, structured memory”
Go beyond the context window with external memory systems: vector databases, structured stores, memory hierarchies modeled after cognitive architectures.
Key papers: MemGPT, Generative Agents, A-MEM, ReadAgent
Key frameworks: MemGPT, LangChain Memory
Family 7: Embodied / Action Agents
“Agents that act in the physical or digital world”
Agents that control computers (GUI agents), robots (embodied agents), or execute code. The action space is rich and grounded.
Key papers: SayCan, RT-2, CogAgent, UFO, OS-Copilot, SWE-agent
Key products: Devin, Claude Computer Use, OpenAI Operator
The Virtuous Cycle (Plaat et al.)
One underappreciated insight from the Plaat et al. survey: the three categories of agentic behavior form a virtuous cycle that also generates training data:
Reasoning ──→ better decisions ──→ Acting in the world ──→ Interacting ──→ (back to) Reasoning

Acting + Interacting ──→ new training data ──→ better LLMs
Key insight: agentic LLMs that act and interact generate new empirical data — action-feedback sequences, multi-agent dialogues, role-play transcripts — that can feed back into pretraining and finetuning. This offers a potential solution to the “running out of training data” problem: agents create their own curriculum through experience. Vision-Language-Action models (RT-2, π₀, Magma) are the clearest current example.
The flip side: feedback loops can destabilize learning. Agent-generated data may amplify biases or errors if not carefully filtered and validated.
Key Design Tensions
The literature reveals several recurring tensions that different systems resolve differently:
| Tension | Tradeoff |
|---|---|
| Autonomy vs. Oversight | More autonomy = more capability, more risk |
| Generality vs. Specialization | Specialist agents perform better, general agents are more flexible |
| In-context vs. External memory | Context is fast but limited; external is vast but retrieval is lossy |
| Natural language vs. Structured | Natural language is flexible; structured is reliable |
| Single vs. Multi-agent | Multi-agent enables specialization; single-agent is simpler to debug |
| Plan-first vs. Interleave | Planning enables lookahead; interleaving enables reactivity |
How the Field Has Evolved
| Era | Dominant Pattern | Key Innovation |
|---|---|---|
| 2022 | Tool-augmented LLMs | MRKL, WebGPT, SayCan — LLMs + specialized modules |
| Early 2023 | ReAct agents | Interleaved reasoning + action; Toolformer |
| Mid 2023 | Autonomous agents | AutoGPT, BabyAGI — long-horizon goal pursuit |
| Late 2023 | Multi-agent + Memory | MetaGPT, AutoGen, MemGPT, Generative Agents |
| 2024 | Coding agents + Infra | SWE-agent, Devin; LangGraph, CrewAI |
| 2025-2026 | Agentic products + MCP | Claude Computer Use, Operator; model-native tooling |
References
Foundation Surveys
- On the Opportunities and Risks of Foundation Models (Bommasani et al., 2021) — arXiv:2108.07258
- A Survey on Large Language Model based Autonomous Agents (Wang et al., 2023) — arXiv:2308.11432
- The Rise and Potential of Large Language Model Based Agents: A Survey (Xi et al., 2023) — arXiv:2309.07864
- Agentic Large Language Models, a survey (Plaat et al., 2025, JAIR) — arXiv:2503.23037
Reasoning & Planning Papers
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — arXiv:2201.11903
- Large Language Models are Zero-Shot Reasoners (Kojima et al., 2022) — arXiv:2205.11916
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (Zhou et al., 2022) — arXiv:2205.10625
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Yao et al., 2023) — arXiv:2305.10601
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks (Khot et al., 2022) — arXiv:2210.02406
- Reasoning with Language Model is Planning with a World Model (RAP) (Hao et al., 2023) — arXiv:2305.14992
Acting & Tool Use Papers
- WebGPT: Browser-assisted question-answering with human feedback (Nakano et al., 2021) — arXiv:2112.09332
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning (Karpas et al., 2022) — arXiv:2205.00445
- Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023) — arXiv:2302.04761
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2023) — arXiv:2210.03629
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan) (Ahn et al., 2022) — arXiv:2204.01691
- Gorilla: Large Language Model Connected with Massive APIs (Patil et al., 2023) — arXiv:2305.15334
- Large Language Models as Tool Makers (Cai et al., 2023) — arXiv:2305.17126
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Gou et al., 2023) — arXiv:2305.11738
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al., 2023) — arXiv:2310.11511
- CodeAct: Executable Code Actions Elicit Better LLM Agents (Wang et al., 2024) — arXiv:2402.01030
Reflection & Self-Improvement Papers
- Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023) — arXiv:2303.11366
- Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023) — arXiv:2303.17651
- Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022) — arXiv:2212.08073
Multi-Agent Systems Papers
- CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Models (Li et al., 2023) — arXiv:2303.17760
- MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework (Hong et al., 2023) — arXiv:2308.00352
- ChatDev: Communicative Agents for Software Development (Qian et al., 2023) — arXiv:2307.07924
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Wu et al., 2023) — arXiv:2308.08155
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents (Chen et al., 2023) — arXiv:2308.10848
Memory & Context Management Papers
- MemGPT: Towards LLMs as Operating Systems (Packer et al., 2023) — arXiv:2310.08560
- Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023) — arXiv:2304.03442
- ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (Lee et al., 2024) — arXiv:2402.09727
- Cognitive Architectures for Language Agents (Sumers et al., 2023) — arXiv:2309.02427
Embodied & GUI Agents Papers
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023) — arXiv:2307.15818
- CogAgent: A Visual Language Model for GUI Agents (Hong et al., 2023) — arXiv:2312.08914
- UFO: A UI-Focused Agent for Windows OS Interaction (Zhang et al., 2024) — arXiv:2402.07939
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (Yang et al., 2024) — arXiv:2405.15793
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (Wu et al., 2024) — arXiv:2402.07456
Vision-Language-Action Models
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023) — arXiv:2307.15818
- π₀: A Vision-Language-Action Flow Model for General Robot Control (Black et al., 2024) — arXiv:2410.24164
- Magma: A Foundation Model for Multimodal AI Agents (Yang et al., 2025) — arXiv:2502.13130
Benchmarks & Evaluations
- AgentBench: Evaluating LLMs as Agents (Liu et al., 2023) — arXiv:2308.03688
- WebArena: A Realistic Web Environment for Building Autonomous Agents (Zhou et al., 2023) — arXiv:2307.13854
- GAIA: A Benchmark for General AI Assistants (Mialon et al., 2023) — arXiv:2311.12983
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Jimenez et al., 2023) — arXiv:2310.06770
- OSWorld: Benchmarking Multimodal Agents in Real Computer Environments (Xie et al., 2024) — arXiv:2404.07972
Standards & Protocols
- Model Context Protocol (MCP) (Anthropic, 2024) — GitHub
- A2A (Agent-to-Agent) Protocol (Google, 2025) — Linux Foundation
Industry Resources & Blog Posts
- LLM Powered Autonomous Agents — Lilian Weng, 2023 — Blog post
- Building Effective Agents — Anthropic, 2024 — Blog post
- Agent Architecture Patterns & Best Practices — LangChain — Docs
Further Reading
- Foundations (2022–2023) — The papers that built the field
- Reasoning & Planning — Deep dive on how agents think
- Multi-Agent Systems — Collaborative architectures
- Memory, Tools & Actions — The building blocks
- 2024–2026 Frontier — Where the field is now