LlamaIndex Tutorial (Python): persisting agent state for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to persist a LlamaIndex agent’s state to disk in Python and load it back later. You need this when you want an agent to remember its tools, chat history, or workflow state across process restarts instead of starting from zero every time.

What You'll Need

  • Python 3.10+
  • llama-index installed
  • An OpenAI API key set as OPENAI_API_KEY
  • A writable local directory for persistence
  • Basic familiarity with ReActAgent and LlamaIndex tools

Install the package:

pip install llama-index

Step-by-Step

  1. Start by creating a simple agent with one tool. The standalone ReActAgent is a sanity check that the tool and LLM work; the later steps reuse that same tool and LLM inside a workflow that supports persistence.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI


def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


tool = FunctionTool.from_defaults(fn=multiply)
llm = OpenAI(model="gpt-4o-mini")

agent = ReActAgent.from_tools(
    tools=[tool],
    llm=llm,
    verbose=True,
)
  2. Add persistent chat memory and point it at a local directory. The memory object is handed to the workflow at run time, and it is what lets the agent recover its conversation state after the process exits.
import os

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.memory import ChatMemoryBuffer


persist_dir = "./agent_state"
os.makedirs(persist_dir, exist_ok=True)

chat_store = SimpleChatStore()
# ChatMemoryBuffer ties the chat history to a named key inside the store.
memory = ChatMemoryBuffer.from_defaults(
    chat_store=chat_store,
    chat_store_key="demo_session",
    token_limit=4000,
)

workflow = AgentWorkflow.from_tools_or_functions(
    [tool],
    llm=llm,
)
  3. Run the agent once, then persist the chat store and a small state file to disk. You save both because the conversation history lives in the chat store, while the JSON file records which session key to reload on the next start. Note that AgentWorkflow.run is async, so it needs an event loop.
import asyncio
import json


async def main() -> None:
    # Passing memory= attaches the persistent chat history to this run.
    response = await workflow.run(user_msg="What is 6 times 7?", memory=memory)
    print(response)


asyncio.run(main())

# SimpleChatStore persists to a single JSON file, not a directory.
chat_store.persist(persist_path=os.path.join(persist_dir, "chat_store.json"))

state_path = os.path.join(persist_dir, "workflow_state.json")
with open(state_path, "w", encoding="utf-8") as f:
    json.dump(
        {
            "chat_store_key": "demo_session",
        },
        f,
        indent=2,
    )
  4. Simulate a restart by creating fresh objects and loading the persisted data back. This is the part beginners usually miss: if you rebuild everything from scratch without loading storage, the agent will not remember anything.
import json

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.memory import ChatMemoryBuffer


with open("./agent_state/workflow_state.json", "r", encoding="utf-8") as f:
    saved_state = json.load(f)

# Load from the JSON file written by chat_store.persist() above.
loaded_chat_store = SimpleChatStore.from_persist_path("./agent_state/chat_store.json")
loaded_memory = ChatMemoryBuffer.from_defaults(
    chat_store=loaded_chat_store,
    chat_store_key=saved_state["chat_store_key"],
    token_limit=4000,
)

# tool and llm are rebuilt exactly as in step 1.
reloaded_workflow = AgentWorkflow.from_tools_or_functions(
    [tool],
    llm=llm,
)
  5. Ask a follow-up question and confirm the agent still has context from before. If persistence worked, it continues the same session instead of behaving like a brand-new assistant.
import asyncio


async def resume() -> None:
    follow_up = await reloaded_workflow.run(
        user_msg="What was my previous question?", memory=loaded_memory
    )
    print(follow_up)

    another_answer = await reloaded_workflow.run(
        user_msg="Now multiply 8 by 9.", memory=loaded_memory
    )
    print(another_answer)


asyncio.run(resume())

Testing It

Run the script once and confirm it answers 42 for 6 times 7. Then stop the process, rerun it, and ask a follow-up like “What was my previous question?”; if persistence is working, the agent should reference the earlier interaction stored in demo_session.

Check that ./agent_state/ contains persisted files after the first run. If you delete that directory and rerun, the agent should lose its memory, which is a good sanity check that your persistence path is actually being used.

If you want stronger verification, print out stored messages from loaded_chat_store before running the second query. That lets you confirm that conversation history is being restored before any new inference happens.
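If you prefer a check that does not touch LlamaIndex at all, you can inspect the persisted file directly. This is a minimal stdlib-only sketch; it assumes SimpleChatStore's JSON layout nests message lists under a top-level "store" key, which is an implementation detail that can vary between llama-index versions, so verify the schema against your installed version.

```python
import json
from pathlib import Path


def count_persisted_messages(path: str, key: str) -> int:
    """Count messages saved under one session key in a persisted chat store.

    Assumes the {"store": {key: [messages]}} layout used by SimpleChatStore;
    check the file by eye if your version differs.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return len(data.get("store", {}).get(key, []))


# e.g. count_persisted_messages("./agent_state/chat_store.json", "demo_session")
```

A count greater than zero before the second query tells you the history survived the restart without having to trust the agent's own answer.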

Next Steps

  • Move from SimpleChatStore to a production-backed store like Redis or Postgres for multi-process deployments.
  • Add per-user session keys so each customer gets isolated memory.
  • Persist more than chat history by looking into LlamaIndex workflow/state serialization patterns for longer-running agents.
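For the per-user idea, the only change is how you derive chat_store_key. A tiny sketch of one way to do it; the "app:user" naming scheme and the session_key helper are conventions of this example, not a LlamaIndex API:

```python
def session_key(app: str, user_id: str) -> str:
    """Derive an isolated chat_store_key per user.

    Namespacing keys as "app:user" keeps each customer's history
    separate inside a single shared chat store.
    """
    return f"{app}:{user_id}"


# Used in place of the fixed "demo_session" key from the tutorial, e.g.:
# chat_store_key=session_key("demo", "alice")
print(session_key("demo", "alice"))  # demo:alice
```

Because every user's history lives under its own key, one chat store (and one persisted file or database) can safely serve many sessions.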


By Cyprian Aarons, AI Consultant at Topiax.

