LlamaIndex Tutorial (Python): persisting agent state for intermediate developers
This tutorial shows you how to persist LlamaIndex agent state in Python so conversations survive process restarts, deploys, and worker crashes. If you’re building an agent that needs memory across sessions, this is the difference between a demo and something you can actually ship.
What You'll Need
- Python 3.10+
- llama-index
- llama-index-llms-openai
- llama-index-storage-docstore-mongodb is not needed for this tutorial; we'll use local persistence
- An OpenAI API key set as OPENAI_API_KEY
- A writable directory on disk for persisted state
- Basic familiarity with ReActAgent, StorageContext, and LlamaIndex tools and chat loops

Install the packages:

```bash
pip install llama-index llama-index-llms-openai
```

Set your API key:

```bash
export OPENAI_API_KEY="your-key-here"
```
Step-by-Step
- Start by creating a simple tool and agent. The point here is not the tool itself, but having an agent with internal state worth saving between runs.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def get_account_status(account_id: str) -> str:
    return f"Account {account_id} is active with no overdue balance."

status_tool = FunctionTool.from_defaults(fn=get_account_status)
llm = OpenAI(model="gpt-4o-mini")

agent = ReActAgent.from_tools(
    tools=[status_tool],
    llm=llm,
    verbose=True,
)
```
- Next, create a persistent storage location and save the agent's current state after a conversation turn. In LlamaIndex, this layer of persistence is handled through the storage context, which writes index-related state (docstore, index store, vector store) to disk. A fresh default StorageContext only scaffolds that layout; chat memory gets its own persistence in a later step.

```python
import os

from llama_index.core import StorageContext

persist_dir = "./agent_state"
os.makedirs(persist_dir, exist_ok=True)

response = agent.chat("Check account 12345.")
print(response)

storage_context = StorageContext.from_defaults()
storage_context.persist(persist_dir=persist_dir)
print(f"Persisted state to {persist_dir}")
```
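If you want to sanity-check what persist() actually wrote, a small stdlib-only helper (not part of LlamaIndex; the exact filenames vary by version) can enumerate the artifacts:

```python
import os

def list_persisted(persist_dir: str) -> list[str]:
    """Return the persisted artifact filenames, sorted for stable output.
    An empty list means nothing has been persisted yet."""
    if not os.path.isdir(persist_dir):
        return []
    return sorted(
        name
        for name in os.listdir(persist_dir)
        if os.path.isfile(os.path.join(persist_dir, name))
    )

print(list_persisted("./agent_state"))
```

Seeing JSON files appear here after a run is your first confirmation that persistence is wired up at all.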
- Now reload that state in a fresh process. This is the real test: if your app restarts, you should still be able to continue from where the agent left off.

```python
from llama_index.core import StorageContext, load_index_from_storage

loaded_storage_context = StorageContext.from_defaults(persist_dir=persist_dir)

# load_index_from_storage only succeeds if an index was actually persisted;
# guard it so a state directory without an index doesn't crash startup.
try:
    loaded_index = load_index_from_storage(loaded_storage_context)
except ValueError:
    loaded_index = None  # no index was persisted in the previous step

# Rebuild the agent around the loaded context. Note: this restores index
# artifacts, not chat history -- the follow-up below will not remember the
# earlier turn until conversational memory is persisted too (next step).
restored_agent = ReActAgent.from_tools(
    tools=[status_tool],
    llm=llm,
    verbose=True,
)
follow_up = restored_agent.chat("What did you just tell me about account 12345?")
print(follow_up)
```
- If you want actual conversational memory persisted across turns, use a chat store-backed memory. This is what you need when you want the agent to remember prior messages after restart, not just reload index artifacts. Note that messages go in as ChatMessage objects via memory.put().

```python
import os

from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store_path = "./chat_store.json"
chat_store = (
    SimpleChatStore.from_persist_path(chat_store_path)
    if os.path.exists(chat_store_path)
    else SimpleChatStore()
)

memory = ChatMemoryBuffer.from_defaults(token_limit=4000, chat_store=chat_store)
memory.put(ChatMessage(role="user", content="Remember that my policy number is POL-7781."))
memory.put(ChatMessage(role="assistant", content="Got it."))

chat_store.persist(persist_path=chat_store_path)
print(f"Saved chat memory to {chat_store_path}")
```
- On restart, load the chat store back into memory before creating the next response. This gives you durable conversational context without relying on in-process variables.

```python
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

reloaded_chat_store = SimpleChatStore.from_persist_path(chat_store_path)
reloaded_memory = ChatMemoryBuffer.from_defaults(
    token_limit=4000,
    chat_store=reloaded_chat_store,
)

# Pass memory=reloaded_memory when constructing the agent
# (e.g. ReActAgent.from_tools(..., memory=reloaded_memory))
# so the next turn actually sees the restored history.
messages = reloaded_memory.get_all()
for msg in messages:
    print(f"{msg.role}: {msg.content}")
```
- Finally, wire persistence into your application lifecycle. Persist after each meaningful interaction, then reload at startup before serving requests.

```python
def save_state(storage_context: StorageContext, chat_store: SimpleChatStore) -> None:
    """Persist both artifacts after each meaningful interaction."""
    storage_context.persist(persist_dir=persist_dir)
    chat_store.persist(persist_path=chat_store_path)

def load_state() -> tuple[StorageContext, SimpleChatStore]:
    """Reload both artifacts at startup, before serving requests."""
    storage = StorageContext.from_defaults(persist_dir=persist_dir)
    store = SimpleChatStore.from_persist_path(chat_store_path)
    return storage, store

storage_context_2, chat_store_2 = load_state()
print("State loaded successfully")
```
Testing It
Run the script once, ask a question, and confirm both the persisted directory and chat store file are created on disk. Then stop the process completely and run it again; your code should load without errors and print previously saved messages.
For a real verification pass, add a second turn after reload and confirm the agent can reference earlier conversation content. If it cannot, your persistence layer is saving files correctly but your runtime isn’t actually rehydrating memory into the request path.
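One cheap way to script that verification is a format-agnostic string search over the persisted chat store file. This is a smoke-test sketch, not a LlamaIndex API; it deliberately avoids assuming the JSON schema:

```python
import os

def state_mentions(chat_store_path: str, needle: str) -> bool:
    """Smoke test: does the persisted chat store contain a phrase the
    agent should remember? Works regardless of the JSON layout."""
    if not os.path.isfile(chat_store_path):
        return False
    with open(chat_store_path, "r", encoding="utf-8") as f:
        return needle in f.read()

print(state_mentions("./chat_store.json", "POL-7781"))
```

If this returns True but the agent still can't recall the fact, the bug is in rehydration, not in persistence.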
A good production check is to delete only one of the persisted artifacts and see how your app behaves. That tells you whether your startup code fails fast or silently loses context.
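A fail-fast startup check makes that behavior explicit. This is a sketch using the paths from earlier steps; adapt the artifact list to whatever your app actually persists:

```python
import os

def missing_artifacts(persist_dir: str, chat_store_path: str) -> list[str]:
    """Return every persisted artifact that is absent, so startup can
    refuse to serve with partial context instead of failing silently."""
    missing = []
    if not os.path.isdir(persist_dir):
        missing.append(persist_dir)
    if not os.path.isfile(chat_store_path):
        missing.append(chat_store_path)
    return missing

problems = missing_artifacts("./agent_state", "./chat_store.json")
if problems:
    print(f"Refusing to start: missing state artifacts {problems}")
```

Whether you abort or rebuild from scratch is a product decision; the point is that it should be a decision, not an accident.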
Next Steps
- Move from local disk persistence to S3 or Redis-backed storage for multi-instance deployments.
- Add session IDs so each user gets isolated memory and state.
- Learn how Context, Memory, and ChatStore fit together in LlamaIndex agents for cleaner production architecture.
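For the session-ID idea, one simple approach is to derive per-session persist locations. The session_paths helper below is a hypothetical sketch (not a LlamaIndex API) that keeps each user's state in its own directory:

```python
import os
import re

def session_paths(base_dir: str, session_id: str) -> tuple[str, str]:
    """Map a session ID to its own persist_dir and chat store file,
    sanitizing the ID so it is safe to use as a path component."""
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", session_id)
    persist_dir = os.path.join(base_dir, safe, "agent_state")
    chat_store_path = os.path.join(base_dir, safe, "chat_store.json")
    return persist_dir, chat_store_path

print(session_paths("./sessions", "user-42"))
```

With SimpleChatStore you can alternatively keep one store and use a distinct key per session; separate files per session just make cleanup and debugging easier.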
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit