Chapter 3.5: Agent Foundations - From ReAct to Multi-Agent Systems
Introduction​
Before diving into production multi-agent systems, we need to understand how agents work from the ground up. This chapter traces the evolution from simple ReAct agents to sophisticated multi-agent orchestration, showing you each step of the journey.
Think of this progression like learning web development: starting with vanilla JavaScript, then jQuery, then React, then Next.js. Each level builds on the previous, adding structure and capabilities. We're doing the same for AI agents.
What is an Agent?​
An agent is a program that:
- Perceives its environment (receives input)
- Reasons about what to do (makes decisions)
- Acts on the environment (executes actions)
- Learns from outcomes (improves over time)
Simple Example:
def simple_agent(user_question: str) -> str:
    """A basic agent that just calls an LLM"""
    response = llm.generate(user_question)
    return response
Problem: This isn't really an "agent" - it's just an LLM call. It can't use tools, check facts, or break down complex tasks.
The ReAct Pattern: Reasoning + Acting​
ReAct (Reasoning and Acting) is a foundational pattern that makes LLMs act like true agents. Introduced by Yao et al. in 2022, it interleaves reasoning steps with actions.
How ReAct Works​
Thought: I need to find current weather data
Action: search_weather("San Francisco")
Observation: Temperature is 65°F, partly cloudy
Thought: Now I can answer the user's question
Action: finish("The weather in San Francisco is 65°F and partly cloudy")
Key Insight: By making the model think out loud before acting, we get:
- Better decision making
- Transparency (we see the reasoning)
- Error correction (model can reconsider)
Implementing Basic ReAct​
from typing import List, Dict, Callable
import google.generativeai as genai

class ReactAgent:
    """Basic ReAct agent with tool calling"""

    def __init__(self, tools: Dict[str, Callable]):
        self.tools = tools
        self.model = genai.GenerativeModel('gemini-1.5-flash')

    def run(self, task: str, max_iterations: int = 5) -> str:
        """Execute ReAct loop"""
        conversation_history = []

        for iteration in range(max_iterations):
            # Generate thought and action
            prompt = self._build_prompt(task, conversation_history)
            response = self.model.generate_content(prompt)

            # Parse response
            thought, action, action_input = self._parse_response(response.text)
            print(f"Thought: {thought}")
            print(f"Action: {action}({action_input})")

            # Execute action
            if action == "finish":
                return action_input

            if action in self.tools:
                observation = self.tools[action](action_input)
                print(f"Observation: {observation}\n")

                # Add to history
                conversation_history.append({
                    "thought": thought,
                    "action": action,
                    "action_input": action_input,
                    "observation": observation
                })
            else:
                print(f"Error: Unknown action '{action}'")
                break

        return "Task incomplete after max iterations"

    def _build_prompt(self, task: str, history: List[Dict]) -> str:
        """Build ReAct prompt with tools description"""
        tools_desc = "\n".join([
            f"- {name}: {func.__doc__}"
            for name, func in self.tools.items()
        ])

        prompt = f"""You are an agent that can use tools to solve tasks.
Available tools:
{tools_desc}
Your response must follow this format:
Thought: <your reasoning>
Action: <tool_name>
Action Input: <input for the tool>
Or to finish:
Thought: <final reasoning>
Action: finish
Action Input: <final answer>
Task: {task}
"""
        # Add conversation history
        for step in history:
            prompt += f"""Thought: {step['thought']}
Action: {step['action']}
Action Input: {step['action_input']}
Observation: {step['observation']}
"""
        prompt += "Thought:"
        return prompt

    def _parse_response(self, response: str) -> tuple:
        """Extract thought, action, and input from response"""
        lines = response.strip().split('\n')
        thought = ""
        action = ""
        action_input = ""

        for line in lines:
            if line.startswith("Thought:"):
                thought = line.replace("Thought:", "").strip()
            elif line.startswith("Action:"):
                action = line.replace("Action:", "").strip()
            elif line.startswith("Action Input:"):
                action_input = line.replace("Action Input:", "").strip()

        return thought, action, action_input
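The line-by-line parser above is brittle: it silently drops a multi-line Action Input and breaks on stray whitespace. A more tolerant regex-based sketch (the function name is illustrative, not from any library):

```python
import re

def parse_react_response(text: str) -> tuple:
    """Parse a ReAct-format response, tolerating multi-line
    Action Input and extra whitespace."""
    # Thought runs until the Action line (or end of text)
    thought_m = re.search(r"Thought:\s*(.*?)(?=\nAction:|\Z)", text, re.DOTALL)
    # Action is a single token on its own line
    action_m = re.search(r"^Action:\s*(.+)$", text, re.MULTILINE)
    # Action Input may span several lines, so capture to the end
    input_m = re.search(r"Action Input:\s*(.*)", text, re.DOTALL)

    thought = thought_m.group(1).strip() if thought_m else ""
    action = action_m.group(1).strip() if action_m else ""
    action_input = input_m.group(1).strip() if input_m else ""
    return thought, action, action_input
```

Note that `^Action:` does not match the `Action Input:` line, because the colon must follow `Action` immediately.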
Using the ReAct Agent​
# Define tools
def search_papers(query: str) -> str:
    """Search for academic papers on arXiv"""
    # Simplified for example
    return f"Found 3 papers about '{query}'"

def get_paper_abstract(paper_id: str) -> str:
    """Get the abstract of a specific paper"""
    return f"Abstract of paper {paper_id}: This paper discusses..."

# Create agent with tools
agent = ReactAgent(tools={
    "search_papers": search_papers,
    "get_paper_abstract": get_paper_abstract,
    "finish": lambda x: x
})

# Run task
result = agent.run("Find recent papers about multi-agent systems")
Output:
Thought: I need to search for papers about multi-agent systems
Action: search_papers
Action Input: multi-agent systems
Observation: Found 3 papers about 'multi-agent systems'
Thought: I found relevant papers, I can provide the answer
Action: finish
Action Input: I found 3 recent papers about multi-agent systems on arXiv
ReAct transforms a passive LLM into an active agent that can interact with the world through tools. This is the foundation of all agent systems.
System Prompts: Giving Agents Identity​
System prompts define an agent's behavior, personality, and capabilities.
Anatomy of a Good System Prompt​
RESEARCHER_SYSTEM_PROMPT = """You are a research assistant specialized in academic literature.
Your capabilities:
- Search academic databases (arXiv, Semantic Scholar, PubMed)
- Extract and summarize research papers
- Identify relationships between papers
- Provide citations in proper format
Your behavior:
- Always cite sources with paper IDs
- Admit when you don't know something
- Break complex questions into steps
- Verify information before responding
Your limitations:
- Cannot access papers behind paywalls
- Cannot read full PDFs (only abstracts and metadata)
- Information is current as of your last update
When responding:
1. Think through the problem step-by-step
2. Use tools to gather information
3. Synthesize findings clearly
4. Provide citations
"""
Integrating System Prompts with ReAct​
class ReactAgent:
    """ReAct agent with system prompt"""

    def __init__(self, tools: Dict[str, Callable], system_prompt: str):
        self.tools = tools
        self.system_prompt = system_prompt
        self.model = genai.GenerativeModel(
            'gemini-1.5-flash',
            system_instruction=system_prompt  # System prompt here
        )

    def run(self, task: str, max_iterations: int = 5) -> str:
        # System prompt is automatically prepended to all messages
        # Rest of implementation same as before
        pass
Impact: System prompts make agents:
- More reliable (consistent behavior)
- More focused (stay on task)
- More transparent (explain their reasoning)
- More controllable (follow rules)
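A practical pattern is to assemble the prompt from named sections, so each concern (capabilities, behavior, limitations) can be edited and tested separately. A small sketch, where `build_system_prompt` is an illustrative helper, not a library API:

```python
def build_system_prompt(role: str, sections: dict) -> str:
    """Assemble a system prompt from a role line plus named
    bullet-list sections, separated by blank lines."""
    parts = [role]
    for title, items in sections.items():
        bullet_list = "\n".join(f"- {item}" for item in items)
        parts.append(f"{title}:\n{bullet_list}")
    return "\n\n".join(parts)

prompt = build_system_prompt(
    "You are a research assistant specialized in academic literature.",
    {
        "Your capabilities": ["Search academic databases", "Summarize papers"],
        "Your limitations": ["Cannot access paywalled papers"],
    },
)
```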
Tool Integration: Extending Agent Capabilities​
Tools are functions that agents can call to interact with the external world.
Types of Tools​
1. Information Retrieval
def search_arxiv(query: str, max_results: int = 5) -> List[Dict]:
    """Search arXiv for papers matching query"""
    import arxiv

    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate
    )

    results = []
    for paper in search.results():
        results.append({
            "title": paper.title,
            "authors": [author.name for author in paper.authors],
            "abstract": paper.summary,
            "pdf_url": paper.pdf_url,
            "published": paper.published.isoformat()
        })

    return results
2. Data Processing
def extract_entities(text: str) -> Dict[str, List[str]]:
    """Extract named entities from text"""
    # Using spaCy or similar
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)

    entities = {
        "PERSON": [],
        "ORG": [],
        "DATE": []
    }
    for ent in doc.ents:
        if ent.label_ in entities:
            entities[ent.label_].append(ent.text)

    return entities
3. External APIs
def query_database(cypher_query: str) -> List[Dict]:
    """Execute Cypher query on Neo4j database"""
    from neo4j import GraphDatabase

    # Pass auth=("neo4j", "<password>") if the server requires it
    driver = GraphDatabase.driver("bolt://localhost:7687")
    with driver.session() as session:
        result = session.run(cypher_query)
        # Consume the result inside the session, before it closes
        return [record.data() for record in result]
4. State Modification
# Module-level store shared across calls
agent_memory: Dict[str, str] = {}

def add_to_memory(key: str, value: str) -> str:
    """Store information in agent's memory"""
    agent_memory[key] = value
    return f"Stored '{key}': {value[:50]}..."
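In practice, tools like these are collected into a registry the agent can enumerate when building its prompt. A minimal sketch, assuming nothing beyond the standard library (`ToolRegistry` and its methods are illustrative names):

```python
from typing import Callable, Dict

class ToolRegistry:
    """Minimal tool registry: maps names to functions and exposes
    their docstrings for inclusion in the agent prompt."""
    def __init__(self):
        self._tools: Dict[str, Callable] = {}

    def register(self, func: Callable) -> Callable:
        self._tools[func.__name__] = func
        return func  # so it can be used as a decorator

    def describe(self) -> str:
        # One "- name: docstring" line per tool, for the prompt
        return "\n".join(
            f"- {name}: {func.__doc__}" for name, func in self._tools.items()
        )

    def call(self, name: str, *args, **kwargs):
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name](*args, **kwargs)

registry = ToolRegistry()

@registry.register
def word_count(text: str) -> int:
    """Count words in a text"""
    return len(text.split())
```

The decorator keeps tool definition and registration in one place, so the prompt's tool list can never drift out of sync with the callable tools.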
Tool Calling with Modern LLMs​
Modern LLMs like Gemini have native tool calling:
import google.generativeai as genai

# Define tool for Gemini
search_tool = genai.protos.Tool(
    function_declarations=[
        genai.protos.FunctionDeclaration(
            name="search_papers",
            description="Search academic databases for research papers",
            parameters=genai.protos.Schema(
                type=genai.protos.Type.OBJECT,
                properties={
                    "query": genai.protos.Schema(
                        type=genai.protos.Type.STRING,
                        description="Search query"
                    ),
                    "max_results": genai.protos.Schema(
                        type=genai.protos.Type.INTEGER,
                        description="Max results"
                    )
                },
                required=["query"]
            )
        )
    ]
)

# Create model with tools
model = genai.GenerativeModel(
    'gemini-1.5-flash',
    tools=[search_tool]
)

# Agent automatically decides when to use tools
chat = model.start_chat()
response = chat.send_message("Find papers about RAG systems")

# Handle tool calls
for part in response.parts:
    if part.function_call:
        function_name = part.function_call.name
        function_args = dict(part.function_call.args)

        # Execute the function
        result = search_papers(**function_args)

        # Send result back to model
        response = chat.send_message(
            genai.protos.Content(
                parts=[genai.protos.Part(
                    function_response=genai.protos.FunctionResponse(
                        name=function_name,
                        response={"result": result}
                    )
                )]
            )
        )
Tool calling is like API endpoints in a web app. The agent (frontend) calls tools (backend endpoints) to fetch data or perform actions, then uses the results to generate responses.
From Single Agent to Stateful Workflows​
Single-turn ReAct agents work for simple tasks, but complex workflows need:
- State persistence (remember past actions)
- Multiple steps (break down complex tasks)
- Error recovery (retry failed actions)
- Branching logic (different paths based on results)
This is where LangGraph comes in.
Why LangGraph?​
Traditional ReAct agents have limitations:
# Problem 1: No state persistence
result1 = agent.run("Search for papers on topic X")
result2 = agent.run("Summarize the first paper") # Agent forgot previous search!
# Problem 2: No branching logic
# Can't say: "If search returns no results, try broader query"
# Problem 3: No error recovery
# If one action fails, entire workflow stops
LangGraph Solution: Represent workflows as stateful graphs.
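Before reaching for the library, the core idea fits in a few lines of plain Python: a workflow is a set of node functions plus edges, and one shared state dict threads through them. This is only a conceptual sketch (none of these names are LangGraph APIs):

```python
from typing import Callable, Dict

def run_graph(nodes: Dict[str, Callable], edges: Dict[str, str],
              entry: str, state: dict) -> dict:
    """Tiny stateful-graph runner: each node transforms the shared
    state, then control follows the edge to the next node."""
    current = entry
    while current != "END":
        state = nodes[current](state)
        current = edges[current]
    return state

# Two toy nodes that read and write the shared state
def search(state):
    state["papers"] = [f"paper about {state['query']}"]
    return state

def summarize(state):
    state["summary"] = f"{len(state['papers'])} paper(s) found"
    return state

result = run_graph(
    nodes={"search": search, "summarize": summarize},
    edges={"search": "summarize", "summarize": "END"},
    entry="search",
    state={"query": "multi-agent systems"},
)
# The state written by `search` is still there when `summarize` runs
```

LangGraph adds what this sketch lacks: typed state, conditional edges, parallel branches, and checkpointing.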
LangGraph Basics​
from langgraph.graph import StateGraph, END
from typing import TypedDict

# Define state
class AgentState(TypedDict):
    """State that flows through the workflow"""
    messages: list
    papers: list
    current_query: str
    next_action: str

# Create graph
workflow = StateGraph(AgentState)

# Define nodes (functions that modify state)
def search_node(state: AgentState) -> AgentState:
    """Search for papers"""
    papers = search_papers(state["current_query"])
    state["papers"] = papers
    state["messages"].append(f"Found {len(papers)} papers")
    return state

def summarize_node(state: AgentState) -> AgentState:
    """Summarize papers"""
    summaries = [summarize(p) for p in state["papers"]]
    state["messages"].append(f"Summarized {len(summaries)} papers")
    return state

# Add nodes to graph
workflow.add_node("search", search_node)
workflow.add_node("summarize", summarize_node)

# Define edges (workflow flow)
workflow.set_entry_point("search")
workflow.add_edge("search", "summarize")
workflow.add_edge("summarize", END)

# Compile graph
app = workflow.compile()

# Run workflow (state persists across nodes!)
result = app.invoke({
    "messages": [],
    "papers": [],
    "current_query": "multi-agent systems",
    "next_action": ""
})
Benefits:
- State persists between nodes
- Clear workflow (visualize as graph)
- Composable (add/remove nodes easily)
- Testable (test each node independently)
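That last benefit is worth seeing: because nodes are plain functions over a state dict, each one can be unit-tested without compiling a graph at all. A sketch with a stand-in node (not the `search_node` above, which would need a live search backend):

```python
def dedupe_node(state: dict) -> dict:
    """Toy node: remove duplicate paper titles, preserving order."""
    seen, unique = set(), []
    for title in state["papers"]:
        if title not in seen:
            seen.add(title)
            unique.append(title)
    state["papers"] = unique
    state["messages"].append(f"Kept {len(unique)} unique papers")
    return state

# Unit test: call the node directly with a hand-built state
state = {"papers": ["A", "B", "A"], "messages": []}
out = dedupe_node(state)
assert out["papers"] == ["A", "B"]
assert out["messages"] == ["Kept 2 unique papers"]
```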
LangGraph with ReAct Tools​
Combining LangGraph's state management with ReAct's tool calling:
from langgraph.graph import StateGraph, END
from typing import TypedDict, List

class AgentState(TypedDict):
    messages: List[str]
    intermediate_steps: List[tuple]  # Store (action, observation) pairs

def create_react_langgraph_agent(tools: List):
    """Create a ReAct agent using LangGraph"""

    # Define nodes
    def agent_node(state: AgentState) -> AgentState:
        """Agent decides next action"""
        # Build prompt with history
        prompt = build_prompt(state["messages"], state["intermediate_steps"])

        # Get LLM decision
        response = model.generate_content(prompt)

        # Parse action
        action, action_input = parse_action(response.text)
        state["intermediate_steps"].append((action, action_input))
        return state

    def tool_node(state: AgentState) -> AgentState:
        """Execute the tool"""
        last_action, action_input = state["intermediate_steps"][-1]

        # Execute tool
        observation = execute_tool(last_action, action_input, tools)

        # Add observation to state
        state["intermediate_steps"][-1] = (last_action, action_input, observation)
        return state

    def should_continue(state: AgentState) -> str:
        """Decide if we should continue or finish"""
        last_action = state["intermediate_steps"][-1][0]
        if last_action == "finish":
            return "end"
        else:
            return "continue"

    # Build graph
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent",
        should_continue,
        {
            "continue": "tools",
            "end": END
        }
    )
    workflow.add_edge("tools", "agent")  # Loop back for next action

    return workflow.compile()
What's Happening Here:
- agent_node: LLM decides next action (ReAct reasoning)
- tool_node: Execute the tool (ReAct acting)
- Conditional routing: Continue loop or end (ReAct decision)
- State: Preserves full history (ReAct memory)
LangGraph + ReAct gives you the best of both worlds:
- ReAct's reasoning and tool use
- LangGraph's state management and control flow
Graph Compilation: Making Workflows Executable​
When you call workflow.compile(), LangGraph transforms your graph definition into an executable state machine.
What Happens During Compilation​
workflow = StateGraph(AgentState)
workflow.add_node("search", search_node)
workflow.add_node("process", process_node)
workflow.add_edge("search", "process")
# Before compilation: Just a graph definition
print(type(workflow)) # <class 'StateGraph'>
# After compilation: Executable runtime
app = workflow.compile()
print(type(app)) # <class 'CompiledGraph'>
Compilation Steps:
- Validation: Check all nodes are connected, no orphans
- Topological Sort: Determine execution order
- Checkpointing Setup: Enable state saving
- Optimization: Parallelize independent nodes
- Runtime Creation: Generate executable code
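The validation step, for instance, amounts to checking that every edge references a declared node and that every node is reachable from the entry point. A rough stdlib sketch of that idea (not LangGraph's actual implementation):

```python
from collections import deque

def validate_graph(nodes, edges, entry):
    """Check that edges reference declared nodes and that every
    node is reachable from the entry point (no orphans)."""
    for src, dst in edges:
        if src not in nodes or (dst != "END" and dst not in nodes):
            raise ValueError(f"Edge {src}->{dst} references unknown node")

    # Breadth-first search from the entry point
    reachable, queue = {entry}, deque([entry])
    while queue:
        current = queue.popleft()
        for src, dst in edges:
            if src == current and dst != "END" and dst not in reachable:
                reachable.add(dst)
                queue.append(dst)

    orphans = set(nodes) - reachable
    if orphans:
        raise ValueError(f"Orphan nodes: {sorted(orphans)}")
    return True

assert validate_graph({"a", "b"}, [("a", "b"), ("b", "END")], "a")
```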
Understanding Compiled Graphs​
from langgraph.graph import StateGraph, END
workflow = StateGraph(AgentState)
# Define workflow
workflow.add_node("start", start_node)
workflow.add_node("branch_a", branch_a_node)
workflow.add_node("branch_b", branch_b_node)
workflow.add_node("merge", merge_node)
workflow.set_entry_point("start")
workflow.add_conditional_edges(
"start",
routing_function,
{
"a": "branch_a",
"b": "branch_b"
}
)
workflow.add_edge("branch_a", "merge")
workflow.add_edge("branch_b", "merge")
workflow.add_edge("merge", END)
# Compile
app = workflow.compile()
# Execution creates a state machine:
# 1. Initialize state
# 2. Execute entry_point node
# 3. Check conditional edges (if any)
# 4. Execute next node
# 5. Update state
# 6. Repeat until END
Checkpointing: State Persistence​
from langgraph.checkpoint.memory import MemorySaver
# Compile with checkpointing
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)
# Run with thread ID (for persistence)
config = {"configurable": {"thread_id": "conversation_1"}}
# First invocation
result1 = app.invoke(initial_state, config)
# Later invocation (continues from saved state!)
result2 = app.invoke(new_state, config)
Use Cases:
- Multi-turn conversations
- Long-running workflows
- Error recovery
- Pause and resume
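Conceptually, a checkpointer is just a store keyed by thread ID: save a snapshot of the state after a run, load it at the start of the next. A minimal in-memory sketch (illustrative only, not the MemorySaver API):

```python
import copy

class InMemoryCheckpointer:
    """Toy checkpointer: one saved state snapshot per thread ID."""
    def __init__(self):
        self._store = {}

    def save(self, thread_id: str, state: dict) -> None:
        # Deep-copy so later mutations don't alter the snapshot
        self._store[thread_id] = copy.deepcopy(state)

    def load(self, thread_id: str) -> dict:
        return copy.deepcopy(self._store.get(thread_id, {}))

cp = InMemoryCheckpointer()

# First turn: run the workflow, then persist the resulting state
cp.save("conversation_1", {"messages": ["Find papers on RAG"]})

# Later turn: resume from the snapshot and continue the conversation
resumed = cp.load("conversation_1")
resumed["messages"].append("Summarize the first one")
```

A production checkpointer would persist to disk or a database so workflows survive restarts; the thread-ID keying is the part that carries over.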
Evolution to Multi-Agent Systems​
Now that we understand single agents with tools and stateful workflows, let's see how this evolves into multi-agent systems.
Single Agent Limitations​
# Single agent trying to do everything
agent = ReactAgent(tools=[
    search_papers,
    extract_entities,
    build_graph,
    compute_vectors,
    generate_answer
])
# Problems:
# 1. Complex prompts (agent must know about ALL tools)
# 2. Context limits (too much in memory)
# 3. No specialization (jack of all trades, master of none)
# 4. No parallelization (sequential execution)
Multi-Agent Solution: Specialization​
Instead of one agent doing everything, create specialized agents:
class DataCollectorAgent:
    """Specialized in collecting papers from sources"""
    tools = [search_arxiv, search_semantic_scholar, search_pubmed]

class KnowledgeGraphAgent:
    """Specialized in building knowledge graphs"""
    tools = [extract_entities, create_relationships, query_graph]

class VectorAgent:
    """Specialized in semantic search"""
    tools = [embed_text, search_vectors, find_similar]

class ReasoningAgent:
    """Specialized in answering questions"""
    tools = [generate_answer, cite_sources, explain_reasoning]
Benefits:
- Focused: Each agent masters its domain
- Maintainable: Changes to one agent don't affect others
- Testable: Test each agent independently
- Scalable: Run agents in parallel
Multi-Agent Orchestration with LangGraph​
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Dict
import numpy as np

class MultiAgentState(TypedDict):
    query: str
    papers: List[Dict]
    graph_data: Dict
    vectors: np.ndarray
    answer: str

def create_multi_agent_workflow():
    """Multi-agent system with LangGraph"""
    # Initialize agents
    data_collector = DataCollectorAgent()
    graph_agent = KnowledgeGraphAgent()
    vector_agent = VectorAgent()
    reasoning_agent = ReasoningAgent()

    # Define workflow nodes (one per agent).
    # Each node returns only the keys it updates, so the parallel
    # branches below don't try to write the same keys concurrently.
    def collect_data_node(state: MultiAgentState) -> dict:
        papers = data_collector.collect(state["query"])
        return {"papers": papers}

    def build_graph_node(state: MultiAgentState) -> dict:
        graph = graph_agent.build_graph(state["papers"])
        return {"graph_data": graph}

    def create_vectors_node(state: MultiAgentState) -> dict:
        vectors = vector_agent.embed(state["papers"])
        return {"vectors": vectors}

    def reason_node(state: MultiAgentState) -> dict:
        # Use data from other agents
        context = {
            "papers": state["papers"],
            "graph": state["graph_data"],
            "vectors": state["vectors"]
        }
        answer = reasoning_agent.generate_answer(state["query"], context)
        return {"answer": answer}

    # Build workflow
    workflow = StateGraph(MultiAgentState)
    workflow.add_node("collect", collect_data_node)
    workflow.add_node("graph", build_graph_node)
    workflow.add_node("vectors", create_vectors_node)
    workflow.add_node("reason", reason_node)

    # Define flow
    workflow.set_entry_point("collect")
    # Parallel execution: graph and vectors can run simultaneously
    workflow.add_edge("collect", "graph")
    workflow.add_edge("collect", "vectors")
    # Both must complete before reasoning
    workflow.add_edge("graph", "reason")
    workflow.add_edge("vectors", "reason")
    workflow.add_edge("reason", END)

    return workflow.compile()
Key Features:
- Specialized Agents: Each handles one concern
- Parallel Execution: graph and vectors run together
- Shared State: All agents read/write to MultiAgentState
- Orchestration: LangGraph manages coordination
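The fan-out/fan-in above can also be pictured with plain threads: both branch functions run concurrently on the collected papers, and the join waits for each before reasoning. A hedged stdlib sketch (this is not how LangGraph schedules internally; the two functions are stand-ins for the agents):

```python
from concurrent.futures import ThreadPoolExecutor

def build_graph(papers):
    return {"nodes": len(papers)}  # stand-in for the graph agent

def embed(papers):
    return [[0.0] * 3 for _ in papers]  # stand-in for the vector agent

papers = ["paper A", "paper B"]

# Fan out: both branches start from the same collected papers
with ThreadPoolExecutor() as pool:
    graph_future = pool.submit(build_graph, papers)
    vector_future = pool.submit(embed, papers)
    # Fan in: .result() blocks until each branch completes
    graph_data = graph_future.result()
    vectors = vector_future.result()

# Only now does the reasoning step see both results
context = {"papers": papers, "graph": graph_data, "vectors": vectors}
```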
Progression Summary​
Single Function Call
↓
ReAct Agent (Reason + Act)
↓
ReAct + System Prompts (Controlled behavior)
↓
ReAct + Tools (External capabilities)
↓
LangGraph + ReAct (Stateful workflows)
↓
Multi-Agent LangGraph (Specialized + Coordinated)
↓
Production Multi-Agent System (ResearcherAI)
Each level adds capabilities while building on previous foundations.
Key Takeaways​
- ReAct Pattern: Foundation for agents that can reason and act
- System Prompts: Define agent behavior and capabilities
- Tools: Extend agents to interact with external world
- LangGraph: Manage complex, stateful workflows
- Graph Compilation: Transform definitions into executables
- Multi-Agent Systems: Specialized agents coordinated by graphs
Next Steps​
Now that you understand agent foundations, you're ready to:
- Chapter 4 (Orchestration Frameworks): Deep dive into LangGraph and LlamaIndex
- Chapter 5 (Backend): Implement production agents with databases
- Chapter 6 (Frontend): Build UIs for multi-agent systems
The foundations you learned here power everything in ResearcherAI. Every concept - from ReAct to graph compilation - is used in the production system.
Try building your own agent:
- Start with basic ReAct
- Add system prompt for personality
- Add 2-3 tools (search, calculate, memory)
- Convert to LangGraph workflow
- Add a second specialized agent
- Coordinate with conditional routing
This progression mirrors how ResearcherAI was built!