LlamaIndex Tutorial (Python): Adding Memory to Agents for Intermediate Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to give a LlamaIndex agent short-term memory in Python so it can remember prior turns in a conversation. You need this when your agent has to carry context across multiple user messages without re-sending the full history every time.

What You'll Need

  • Python 3.10+
  • A virtual environment
  • llama-index
  • An OpenAI API key
  • Basic familiarity with LlamaIndex agents and tools
  • A terminal where you can run a Python script

Install the package first:

pip install llama-index

Set your API key in the shell:

export OPENAI_API_KEY="your-api-key"

Step-by-Step

  1. Start with a clean agent setup.
    We’ll use a simple calculator tool so you can see memory affecting follow-up questions instead of just single-turn answers.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def add_numbers(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

tool = FunctionTool.from_defaults(fn=add_numbers)
llm = OpenAI(model="gpt-4o-mini")
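Before wiring the tool into an agent, it's worth calling it directly to confirm it behaves. A minimal sanity check, assuming FunctionTool's call method (which wraps the underlying function and returns a ToolOutput):

# Optional sanity check: invoke the tool directly before the agent does.
# FunctionTool.call returns a ToolOutput whose string form is the result.
print(tool.call(a=12.5, b=7.25))  # expect 19.75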
  2. Add chat memory to the agent.
    LlamaIndex stores conversation state in a memory object, and the agent reads from it on each turn. This is what lets it answer follow-up questions like “what was that total again?”
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=2000)

agent = ReActAgent.from_tools(
    tools=[tool],
    llm=llm,
    memory=memory,
    verbose=True,
)
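If you want the agent to start a session with context it didn't generate itself, you can pre-seed the buffer before the first turn. A minimal sketch; the example messages are hypothetical, and it assumes ChatMessage accepts a plain role string (current versions coerce it to MessageRole):

from llama_index.core.llms import ChatMessage

# Hypothetical pre-seeded exchange: the buffer treats these like real
# history, so the agent can reference them on its next turn.
memory.put(ChatMessage(role="user", content="My name is Dana."))
memory.put(ChatMessage(role="assistant", content="Nice to meet you, Dana."))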
  3. Run a first turn, then ask a follow-up.
    The first call establishes context in memory; the follow-up only works if the agent reads that context back. Keep the interaction simple so you can verify memory is actually being updated.
response1 = agent.chat("Add 12.5 and 7.25.")
print(response1)

response2 = agent.chat("What was the result?")
print(response2)
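agent.chat returns an AgentChatResponse rather than a bare string, so printing it only shows the final text. If you also want the tool calls behind the answer, the response object exposes them; a sketch, assuming the sources attribute holds one ToolOutput per tool call:

print(response1.response)  # the final answer as a plain string

# Each ToolOutput records which tool ran and what it returned.
for source in response1.sources:
    print(source.tool_name, source.raw_output)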
  4. Confirm the memory contains the conversation.
    I always check the stored messages during development, because it tells you whether your agent is truly stateful or just sounding stateful.
for message in memory.get_all():
    # Each entry is a ChatMessage; its role attribute says who sent it.
    print(f"{message.role}: {message.content}")
  5. Persist memory if you need continuity across sessions.
    If your app restarts, in-memory chat history disappears. For real systems, save and reload messages from your own database or cache layer.
import json

history = [
    # role is a MessageRole enum; .value gives the plain string ("user", "assistant").
    {"role": msg.role.value, "content": msg.content}
    for msg in memory.get_all()
]

with open("chat_history.json", "w") as f:
    json.dump(history, f, indent=2)
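The reverse direction matters just as much: on startup, rebuild the buffer from the saved file so a fresh agent picks up where the last session left off. A minimal sketch that reuses the chat_history.json written above and assumes from_defaults accepts a chat_history list (it does in current releases):

import json

from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

# Rebuild ChatMessage objects from the saved JSON records.
with open("chat_history.json") as f:
    saved = json.load(f)

restored_history = [
    ChatMessage(role=item["role"], content=item["content"]) for item in saved
]

# Hand the restored history to a fresh buffer; an agent constructed with
# this memory starts the new session already knowing the old conversation.
restored_memory = ChatMemoryBuffer.from_defaults(
    chat_history=restored_history,
    token_limit=2000,
)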

Testing It

Run the script and ask a question that depends on prior context, such as “What was the result?” after an arithmetic turn. If the agent answers correctly without you restating the numbers, memory is working.

Then restart the script and ask the follow-up again. If you only used ChatMemoryBuffer, it should forget everything after restart, which is expected.

If you want stronger validation, print memory.get_all() after each turn and confirm new user and assistant messages are appended in order. That’s the fastest way to catch bugs where memory is instantiated but not actually attached to the agent.
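If you want that check automated, compare the buffer size before and after a turn. A small sketch, assuming each successful turn appends at least the user message and the assistant reply:

# A chat turn should grow the buffer by at least two messages
# (the user message plus the assistant reply).
before = len(memory.get_all())
agent.chat("Add 3 and 4.")
after = len(memory.get_all())
assert after >= before + 2, "memory is not attached to the agent"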

Next Steps

  • Swap ChatMemoryBuffer for a persistent storage pattern backed by Redis, Postgres, or your app database.
  • Add summarization so long conversations don’t hit token limits (see the sketch after this list).
  • Combine memory with retrieval so the agent remembers both conversation state and external knowledge.
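For the summarization bullet, llama-index ships a ChatSummaryMemoryBuffer that condenses older turns with an LLM once the history exceeds the token limit, instead of dropping them outright. A minimal sketch; the exact parameters may vary between releases, so check the version you have installed:

from llama_index.core.memory import ChatSummaryMemoryBuffer

# Messages beyond the token limit are summarized by the LLM rather
# than truncated, so long conversations keep a compressed history.
summary_memory = ChatSummaryMemoryBuffer.from_defaults(
    llm=llm,
    token_limit=2000,
)

agent = ReActAgent.from_tools(
    tools=[tool],
    llm=llm,
    memory=summary_memory,
    verbose=True,
)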


By Cyprian Aarons, AI Consultant at Topiax.
