LlamaIndex Tutorial (Python): persisting agent state for advanced developers

By Cyprian Aarons. Updated 2026-04-21.

This tutorial shows how to persist LlamaIndex agent state in Python so you can stop and restart an agent without losing its conversation history, tool context, or memory-backed behavior. You need this when your agent runs in a long-lived service, survives deploys, or needs to resume work after a crash instead of starting from zero.

What You'll Need

  • Python 3.10+
  • llama-index
  • llama-index-llms-openai
  • llama-index-agent-openai
  • openai API key
  • A writable local directory for persisted state
  • Basic familiarity with LlamaIndex agents and tools

Install the packages:

pip install llama-index llama-index-llms-openai llama-index-agent-openai openai

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start by creating a simple agent with a tool and a memory object. The important part is that the memory is explicit, because that is what we will persist and reload later.
from pathlib import Path

from llama_index.core import Settings
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI
from llama_index.agent.openai import OpenAIAgent

Settings.llm = OpenAI(model="gpt-4o-mini")

def get_account_balance(account_id: str) -> str:
    balances = {"123": "$4,250.17", "456": "$980.42"}
    return balances.get(account_id, "Account not found")

balance_tool = FunctionTool.from_defaults(fn=get_account_balance)

memory = ChatMemoryBuffer.from_defaults(token_limit=4000)
agent = OpenAIAgent.from_tools(
    tools=[balance_tool],
    llm=Settings.llm,
    memory=memory,
    verbose=True,
)
  2. Run a few turns and then persist the agent’s memory to disk. In production, this is the checkpoint you want before process shutdown, deploys, or handoff to another worker.
response1 = agent.chat("What's the balance for account 123?")
print(response1)

response2 = agent.chat("Remember that I asked about account 123.")
print(response2)

persist_dir = Path("./agent_state")
persist_dir.mkdir(parents=True, exist_ok=True)

# ChatMemoryBuffer has no persist() method of its own; persist the
# underlying chat store (a SimpleChatStore by default) instead.
memory.chat_store.persist(persist_path=str(persist_dir / "chat_memory.json"))
print(f"Saved memory to {persist_dir / 'chat_memory.json'}")
  3. Recreate the agent in a fresh process by loading the saved memory file. This is the key pattern: rebuild the runtime objects, then inject the restored state before continuing.
from pathlib import Path

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI
from llama_index.agent.openai import OpenAIAgent

def get_account_balance(account_id: str) -> str:
    balances = {"123": "$4,250.17", "456": "$980.42"}
    return balances.get(account_id, "Account not found")

balance_tool = FunctionTool.from_defaults(fn=get_account_balance)

persist_dir = Path("./agent_state")
# Load the persisted chat store, then wrap it in a fresh memory buffer.
# The default chat_store_key matches the one used when the memory was saved.
chat_store = SimpleChatStore.from_persist_path(
    persist_path=str(persist_dir / "chat_memory.json")
)
loaded_memory = ChatMemoryBuffer.from_defaults(
    chat_store=chat_store,
    token_limit=4000,
)

llm = OpenAI(model="gpt-4o-mini")
restored_agent = OpenAIAgent.from_tools(
    tools=[balance_tool],
    llm=llm,
    memory=loaded_memory,
    verbose=True,
)
  4. Continue the conversation and confirm the restored context is still available. If persistence worked, the agent should answer follow-up questions using prior turns instead of behaving like a new session.
response3 = restored_agent.chat("What account did I ask about earlier?")
print(response3)

response4 = restored_agent.chat("Now compare that to account 456.")
print(response4)
  5. If you want stronger production behavior, persist more than just chat memory. For multi-step workflows, store session metadata alongside memory so you can restore routing decisions, customer IDs, or workflow stage.
import json
from pathlib import Path

session_state = {
    "customer_id": "cust_001",
    "workflow_stage": "balance_review",
    "last_updated": "2026-04-21T12:00:00Z",
}

state_path = Path("./agent_state/session_meta.json")
state_path.parent.mkdir(parents=True, exist_ok=True)
state_path.write_text(json.dumps(session_state, indent=2))

loaded_session_state = json.loads(state_path.read_text())
print(loaded_session_state["workflow_stage"])

Testing It

Run the first script once, then run the restore script in a separate Python process. The second run should still know what was discussed earlier because it loaded the persisted chat memory.

Check that ./agent_state/chat_memory.json exists and contains messages after the first run. If it is empty or missing, your persistence path or file permissions are wrong.

Also verify that follow-up prompts reference prior context correctly. A good test is asking “What did I ask about earlier?” after restart; if it answers correctly, your state restoration path is working.

For extra confidence in production-like setups, kill the process between save and restore. Persistence only matters if recovery works across process boundaries.

Next Steps

  • Add Redis or Postgres-backed persistence for shared agent state across workers.
  • Persist tool outputs separately from chat memory when tool calls are expensive or rate-limited.
  • Wrap restore logic in a session loader so each user gets isolated state by session ID.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
