LlamaIndex Tutorial (Python): persisting agent state for advanced developers
This tutorial shows how to persist LlamaIndex agent state in Python so you can stop and restart an agent without losing its conversation history, tool context, or memory-backed behavior. You need this when your agent runs in a long-lived service, survives deploys, or needs to resume work after a crash instead of starting from zero.
What You'll Need
- Python 3.10+
- The llama-index, llama-index-llms-openai, llama-index-agent-openai, and openai packages
- An OpenAI API key
- A writable local directory for persisted state
- Basic familiarity with LlamaIndex agents and tools
Install the packages:
```bash
pip install llama-index llama-index-llms-openai llama-index-agent-openai openai
```
Set your API key:
```bash
export OPENAI_API_KEY="your-key-here"
```
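If you want to fail fast, a two-line check (plain stdlib; the openai SDK reads this same variable) confirms the key is actually visible to your process before the first agent call:

```python
import os

# The OpenAI SDK reads OPENAI_API_KEY from the environment; checking it
# up front makes a misconfigured shell fail loudly instead of mid-conversation.
key_present = bool(os.environ.get("OPENAI_API_KEY"))
print("OPENAI_API_KEY set:", key_present)
```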
Step-by-Step
- Start by creating a simple agent with a tool and a memory object. The important part is that the memory is explicit, because that is what we will persist and reload later.
```python
from pathlib import Path

from llama_index.core import Settings
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI
from llama_index.agent.openai import OpenAIAgent

Settings.llm = OpenAI(model="gpt-4o-mini")


def get_account_balance(account_id: str) -> str:
    """Look up a (mock) account balance by account ID."""
    balances = {"123": "$4,250.17", "456": "$980.42"}
    return balances.get(account_id, "Account not found")


balance_tool = FunctionTool.from_defaults(fn=get_account_balance)

# The memory object is explicit so we can persist and reload it later.
memory = ChatMemoryBuffer.from_defaults(token_limit=4000)

agent = OpenAIAgent.from_tools(
    tools=[balance_tool],
    llm=Settings.llm,
    memory=memory,
    verbose=True,
)
```
- Run a few turns, then persist the agent's chat history to disk. ChatMemoryBuffer keeps its messages in an underlying chat store, and persisting that store is the checkpoint you want before process shutdown, deploys, or handoff to another worker.
```python
response1 = agent.chat("What's the balance for account 123?")
print(response1)

response2 = agent.chat("Remember that I asked about account 123.")
print(response2)

persist_dir = Path("./agent_state")
persist_dir.mkdir(parents=True, exist_ok=True)

# ChatMemoryBuffer stores its messages in an underlying chat store;
# persisting that store writes the full history to disk as JSON.
store_path = persist_dir / "chat_store.json"
memory.chat_store.persist(persist_path=str(store_path))
print(f"Saved chat history to {store_path}")
```
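One hardening note: if you checkpoint on a timer or in a shutdown hook, a crash mid-write can leave a truncated JSON file behind. A common guard, shown here as a stdlib-only sketch independent of LlamaIndex (the `atomic_write_json` helper and the checkpoint payload are illustrative, not library APIs), is to write to a temp file and atomically rename it into place:

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON via a temp file + os.replace so readers never
    observe a half-written checkpoint."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)
        raise


# Illustrative checkpoint payload, not the real chat store format.
checkpoint = {"messages": [{"role": "user", "content": "balance for 123?"}]}
atomic_write_json(Path("./agent_state/checkpoint.json"), checkpoint)
print(json.loads(Path("./agent_state/checkpoint.json").read_text())["messages"][0]["role"])
```

Readers of the checkpoint then see either the old complete file or the new complete file, never a partial write.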
- Recreate the agent in a fresh process by loading the persisted state from disk. This is the key pattern: rebuild the runtime objects, then inject the restored state before continuing.
```python
from pathlib import Path

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI
from llama_index.agent.openai import OpenAIAgent


def get_account_balance(account_id: str) -> str:
    """Look up a (mock) account balance by account ID."""
    balances = {"123": "$4,250.17", "456": "$980.42"}
    return balances.get(account_id, "Account not found")


balance_tool = FunctionTool.from_defaults(fn=get_account_balance)

persist_dir = Path("./agent_state")

# Reload the persisted chat store, then rebuild the memory buffer around it.
chat_store = SimpleChatStore.from_persist_path(str(persist_dir / "chat_store.json"))
loaded_memory = ChatMemoryBuffer.from_defaults(
    token_limit=4000,
    chat_store=chat_store,
)

llm = OpenAI(model="gpt-4o-mini")
restored_agent = OpenAIAgent.from_tools(
    tools=[balance_tool],
    llm=llm,
    memory=loaded_memory,
    verbose=True,
)
```
- Continue the conversation and confirm the restored context is still available. If persistence worked, the agent should answer follow-up questions using prior turns instead of behaving like a new session.
```python
response3 = restored_agent.chat("What account did I ask about earlier?")
print(response3)

response4 = restored_agent.chat("Now compare that to account 456.")
print(response4)
```
- If you want stronger production behavior, persist more than just chat memory. For multi-step workflows, store session metadata alongside memory so you can restore routing decisions, customer IDs, or workflow stage.
```python
import json
from pathlib import Path

session_state = {
    "customer_id": "cust_001",
    "workflow_stage": "balance_review",
    "last_updated": "2026-04-21T12:00:00Z",
}

state_path = Path("./agent_state/session_meta.json")
state_path.parent.mkdir(parents=True, exist_ok=True)  # fresh processes may not have the dir yet
state_path.write_text(json.dumps(session_state, indent=2))

loaded_session_state = json.loads(state_path.read_text())
print(loaded_session_state["workflow_stage"])
```
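The snippets above use one fixed ./agent_state directory, which silently mixes users. A small loader sketch keeps each session's files isolated per session ID; the `BASE_DIR` layout and helper names here are hypothetical, not LlamaIndex APIs:

```python
import json
import re
from pathlib import Path

# Hypothetical on-disk layout: one subdirectory per session.
BASE_DIR = Path("./agent_state/sessions")


def session_dir(session_id: str) -> Path:
    """Return an isolated state directory for one session, rejecting
    IDs that could escape the base directory (path traversal)."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", session_id):
        raise ValueError(f"unsafe session id: {session_id!r}")
    d = BASE_DIR / session_id
    d.mkdir(parents=True, exist_ok=True)
    return d


def save_session_meta(session_id: str, meta: dict) -> Path:
    """Persist per-session metadata next to that session's chat store."""
    path = session_dir(session_id) / "session_meta.json"
    path.write_text(json.dumps(meta, indent=2))
    return path


p = save_session_meta("cust_001", {"workflow_stage": "balance_review"})
print(json.loads(p.read_text())["workflow_stage"])
```

The same `session_dir` would hold that session's persisted chat store file, so restore logic only ever needs a session ID.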
Testing It
Run the first script once, then run the restore script in a separate Python process. The second run should still know what was discussed earlier because it loaded the persisted chat memory.
Check that ./agent_state/chat_store.json exists and contains messages after the first run. If it is empty or missing, check your persistence path and file permissions.
Also verify that follow-up prompts reference prior context correctly. A good test is asking “What did I ask about earlier?” after restart; if it answers correctly, your state restoration path is working.
For extra confidence in production-like setups, kill the process between save and restore. Persistence only matters if recovery works across process boundaries.
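The kill-and-restore check can be automated. This stdlib-only harness stands in for the real scripts with inline JSON save/load snippets (a simplification; swap in your actual save and restore scripts), proving the state survives a true process boundary:

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

state_file = Path(tempfile.mkdtemp()) / "state.json"

# Child 1: "the agent" saves its state and exits (the process dies).
save_code = (
    "import json, sys; "
    "json.dump({'last_topic': 'account 123'}, open(sys.argv[1], 'w'))"
)
subprocess.run([sys.executable, "-c", save_code, str(state_file)], check=True)

# Child 2: a brand-new process restores the state and reports it.
load_code = (
    "import json, sys; "
    "print(json.load(open(sys.argv[1]))['last_topic'])"
)
out = subprocess.run(
    [sys.executable, "-c", load_code, str(state_file)],
    check=True, capture_output=True, text=True,
)
print(out.stdout.strip())
```

If the second process prints the topic saved by the first, recovery genuinely crosses process boundaries rather than relying on in-memory state.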
Next Steps
- Add Redis or Postgres-backed persistence for shared agent state across workers.
- Persist tool outputs separately from chat memory when tool calls are expensive or rate-limited.
- Wrap restore logic in a session loader so each user gets isolated state by session ID.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.