LlamaIndex Tutorial (Python): adding memory to agents for advanced developers
This tutorial shows how to add durable conversation memory to a LlamaIndex agent in Python, so it can remember user context across turns instead of treating every request as isolated. You need this when building support bots, case-handling assistants, or any agent that must carry forward facts like names, policy numbers, preferences, or prior decisions.
What You'll Need
- Python 3.10+
- llama-index
- An OpenAI API key
- A terminal and a virtual environment
- Basic familiarity with ReActAgent or FunctionAgent in LlamaIndex
- Optional but useful:
  - python-dotenv for loading secrets from .env
  - llama-index-llms-openai if your install is split by provider
Step-by-Step
1. Install the packages and set your API key.

For this example, we’ll use OpenAI-backed LlamaIndex components because they’re the most straightforward way to validate memory behavior end to end.

```shell
pip install llama-index python-dotenv
export OPENAI_API_KEY="your-openai-key"
```
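If you keep the key in a .env file instead of exporting it, python-dotenv's `load_dotenv()` does the loading for you. To see what that amounts to, here is a stdlib-only sketch; the `load_env_file` helper is illustrative, not part of any library, and skips the quoting and edge cases the real package handles:

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: KEY=VALUE lines, '#' comments ignored.
    Illustrative only -- python-dotenv handles quoting and edge cases."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: an already-exported variable wins over the file
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# Usage: put OPENAI_API_KEY="sk-..." in a .env next to your script,
# then call load_env_file() before constructing the OpenAI LLM.
```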
2. Create a small agent that can answer questions and keep state in memory.

The key piece here is ChatMemoryBuffer, which stores chat history and is passed into the agent at creation time.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")
memory = ChatMemoryBuffer.from_defaults(token_limit=2000)

agent = ReActAgent.from_tools(
    tools=[],
    llm=llm,
    memory=memory,
    verbose=True,
)
```
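The token_limit above caps how much history the buffer returns: when the conversation outgrows it, the oldest messages are dropped first. A rough pure-Python sketch of that trimming policy, purely for intuition — the `trim_to_budget` helper and the one-word-equals-one-token approximation are simplifications, while the real buffer counts tokens with the model's tokenizer:

```python
def trim_to_budget(messages, token_limit):
    """Keep the most recent messages whose combined (approximate) token
    count fits within token_limit, dropping the oldest first.
    Approximation: one whitespace-separated word == one token."""
    kept = []
    used = 0
    for msg in reversed(messages):      # walk newest-first
        cost = len(msg.split())
        if used + cost > token_limit:
            break                       # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = [
    "My name is Priya and I work in claims.",
    "Got it.",
    "I also handle fraud escalations.",
]
# With a budget of 8 "tokens", the oldest message no longer fits:
print(trim_to_budget(history, token_limit=8))
```

The practical takeaway: if the agent starts "forgetting" early facts, the buffer may simply be trimming them, and you either raise token_limit or move to a summarizing memory.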
3. Send multiple turns through the same agent instance.

Memory only works if you reuse the same agent object; if you recreate it on every request, the conversation state is gone.

```python
response1 = agent.chat("My name is Priya and I work in claims.")
print(response1)

response2 = agent.chat("What do you remember about me?")
print(response2)

response3 = agent.chat("I also handle fraud escalations.")
print(response3)
```
4. Inspect the stored chat history directly.

This is useful in production when you want to debug what the model actually saw before generating a response.

```python
chat_history = memory.get()
for idx, message in enumerate(chat_history):
    print(f"{idx}: {message.role} -> {message.content}")
```
5. Persist memory outside the process if you need session continuity.

In real applications, you should not rely on in-process memory alone because it disappears when the worker restarts. A common pattern is to store messages in your own database keyed by session ID, then reconstruct ChatMemoryBuffer from that history when the user returns.

```python
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms import ChatMessage, MessageRole

stored_messages = [
    ChatMessage(role=MessageRole.USER, content="My name is Priya and I work in claims."),
    ChatMessage(role=MessageRole.ASSISTANT, content="Got it."),
]

restored_memory = ChatMemoryBuffer.from_defaults(
    chat_history=stored_messages,
    token_limit=2000,
)
print(restored_memory.get())
```
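The storage side of that pattern can be as small as one table. A sketch of the database layer using stdlib sqlite3 — the `chat_messages` schema and the helper names are illustrative, not a LlamaIndex API; the (role, content) rows it returns are what you would map onto the ChatMessage objects above:

```python
import sqlite3

def init_store(conn):
    """One row per message, keyed by session ID and ordered by seq."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chat_messages (
               session_id TEXT NOT NULL,
               seq        INTEGER NOT NULL,
               role       TEXT NOT NULL,
               content    TEXT NOT NULL,
               PRIMARY KEY (session_id, seq)
           )"""
    )

def save_message(conn, session_id, role, content):
    # Next sequence number within this session (starts at 0)
    (seq,) = conn.execute(
        "SELECT COALESCE(MAX(seq), -1) + 1 FROM chat_messages WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO chat_messages VALUES (?, ?, ?, ?)",
        (session_id, seq, role, content),
    )

def load_history(conn, session_id):
    """Return [(role, content), ...] in order; feed these into
    ChatMessage(role=..., content=...) to rebuild a ChatMemoryBuffer."""
    return conn.execute(
        "SELECT role, content FROM chat_messages WHERE session_id = ? ORDER BY seq",
        (session_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")   # use a file path in real deployments
init_store(conn)
save_message(conn, "sess-1", "user", "My name is Priya and I work in claims.")
save_message(conn, "sess-1", "assistant", "Got it.")
print(load_history(conn, "sess-1"))
```

Swap sqlite3 for Postgres or Redis in production; the shape of the pattern (append on every turn, load by session ID on reconnect) stays the same.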
6. Use memory with tools when your agent needs long-running context.

The pattern stays the same: create one memory object per session, attach it to the agent, and keep reusing that agent while the session is active.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI

def build_agent():
    llm = OpenAI(model="gpt-4o-mini")
    memory = ChatMemoryBuffer.from_defaults(token_limit=2000)
    return ReActAgent.from_tools(tools=[], llm=llm, memory=memory)

agent = build_agent()
print(agent.chat("Remember that my department is underwriting."))
print(agent.chat("What department am I in?"))
```
Testing It
Run the script and confirm that the second response references details from earlier turns. If you ask “What do you remember about me?” after introducing your name or role, the agent should answer using prior context instead of guessing from scratch.
Then restart the Python process and run it again with a fresh agent instance. The model should no longer remember anything from the previous run unless you explicitly restored chat history from storage.
For a stronger test, simulate two different users with two separate memory objects. If both users share one memory instance by mistake, their conversations will bleed into each other.
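One way to guard against that bleed-through is a registry that hands each session ID its own memory object. A minimal sketch — the `SessionMemories` class is hypothetical, and plain lists stand in for per-session ChatMemoryBuffer instances so the isolation is easy to verify without an API key:

```python
class SessionMemories:
    """Maps each session ID to its own history object. In a real app the
    value would be a ChatMemoryBuffer; lists keep the sketch runnable."""
    def __init__(self):
        self._sessions = {}

    def for_session(self, session_id):
        # setdefault guarantees each session ID gets a distinct object,
        # and repeat calls for the same ID return the same one
        return self._sessions.setdefault(session_id, [])

registry = SessionMemories()
registry.for_session("priya").append("My name is Priya and I work in claims.")
registry.for_session("marcus").append("My name is Marcus, fraud team.")

# Each session sees only its own turns -- no bleed-through.
print(registry.for_session("priya"))
print(registry.for_session("marcus"))
```

The bug this catches is usually a module-level `memory = ChatMemoryBuffer.from_defaults(...)` shared by every request handler; moving the buffer behind a per-session lookup like this fixes it.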
Next Steps
- Add per-session persistence with Redis or Postgres so memory survives restarts.
- Switch from raw chat history to summarization memory for long conversations.
- Combine memory with tools and retrieval so the agent remembers both dialogue and enterprise data.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.