Haystack Tutorial (Python): adding memory to agents for beginners
This tutorial shows how to give a Haystack agent short-term memory so it can remember prior turns in a conversation and use that context in later responses. You need this when your agent must handle follow-up questions, preserve user preferences, or avoid asking the same thing twice.
What You'll Need
- Python 3.10+
- haystack-ai installed
- An OpenAI API key
- Basic familiarity with Haystack pipelines and components
- A terminal and a virtual environment
Install the package:
pip install haystack-ai
Set your API key:
export OPENAI_API_KEY="your-key-here"
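Before running anything, it can save debugging time to confirm the key is actually visible to Python. This is a small sanity-check sketch; the `require_api_key` helper name is my own, not part of Haystack:

```python
import os

def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Fail fast with a clear error if the API key is not set."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the tutorial."
        )
    return key
```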
Step-by-Step
- Start by creating an LLM component and a place to store chat history. In Haystack, memory is usually implemented by keeping conversation messages and passing them back into the model on each turn.
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")

memory = [
    ChatMessage.from_system("You are a helpful banking assistant."),
    ChatMessage.from_user("My name is Priya."),
    ChatMessage.from_assistant("Nice to meet you, Priya."),
]
- Add a small helper function that appends the new user message, calls the model, and stores the response back into memory. This pattern is explicit and easy to debug, which makes it a good starting point for beginner agents.
def chat_with_memory(user_text: str) -> str:
    global memory
    memory.append(ChatMessage.from_user(user_text))
    result = chat_generator.run(messages=memory)
    # replies[0] is a ChatMessage; its text lives on the .text property
    assistant_message = result["replies"][0]
    memory.append(assistant_message)
    return assistant_message.text
- Send a first prompt that depends on previous context. The model should use the remembered name from earlier messages instead of asking again.
reply1 = chat_with_memory("What is my name?")
print(reply1)
reply2 = chat_with_memory("Can you remind me what I told you earlier?")
print(reply2)
- If you want cleaner state management, wrap memory in a class instead of using globals. This makes it easier to attach one memory store per user session in a web app or agent service.
class MemoryChatAgent:
    def __init__(self):
        self.chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")
        self.memory = [
            ChatMessage.from_system("You are a helpful banking assistant.")
        ]

    def run(self, user_text: str) -> str:
        self.memory.append(ChatMessage.from_user(user_text))
        result = self.chat_generator.run(messages=self.memory)
        reply = result["replies"][0]
        self.memory.append(reply)
        return reply.text
- Use the class in a real conversation flow. This pattern keeps memory tied to one agent instance, which is what you want for per-session conversations.
agent = MemoryChatAgent()
print(agent.run("My account nickname is salary account."))
print(agent.run("What nickname did I give my account?"))
print(agent.run("Summarize what you know about me so far in one sentence."))
- For longer conversations, trim old messages so the prompt does not grow forever. A simple windowed memory keeps only the system message plus the last few turns.
from haystack.dataclasses import ChatRole

def trim_memory(messages, max_messages=8):
    # Compare roles via ChatRole rather than raw strings
    system_messages = [m for m in messages if m.is_from(ChatRole.SYSTEM)]
    other_messages = [m for m in messages if not m.is_from(ChatRole.SYSTEM)]
    kept = other_messages[-max_messages:]
    return system_messages + kept

agent.memory = trim_memory(agent.memory, max_messages=6)
print(agent.run("Now tell me the last thing I asked about."))
Testing It
Run the script and ask a question that depends on earlier context, like your name or an account nickname. If memory works, the second response should reference information from previous turns without you repeating it.
Watch how many messages are being sent to the generator as the conversation grows. If responses start getting slower or more expensive, your trimming logic needs to be stricter.
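If you do need stricter trimming, one option is to cut by an approximate token budget instead of a fixed message count. The sketch below is a heuristic of my own (roughly four characters per token, with messages represented by their text); a real tokenizer such as tiktoken would give accurate counts:

```python
def trim_to_token_budget(texts, max_tokens=1000, chars_per_token=4):
    """Keep the most recent texts that fit a rough token budget.

    Uses a crude chars/4 heuristic as a stand-in for real tokenization.
    """
    budget = max_tokens * chars_per_token  # budget expressed in characters
    kept = []
    for text in reversed(texts):  # walk newest to oldest
        if len(text) > budget:
            break
        budget -= len(text)
        kept.append(text)
    return list(reversed(kept))  # restore chronological order
```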
Try restarting the process and asking the same follow-up question again. If the answer changes after restart, that confirms your memory was only in-process and not persisted across sessions.
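If you want memory to survive restarts, a minimal option before reaching for Redis or Postgres is a JSON file. The sketch below models messages as plain role/text dicts for simplicity; with haystack-ai you would serialize real ChatMessage objects instead (check your installed version for its serialization API):

```python
import json
from pathlib import Path

def save_memory(messages, path):
    """Write the message list (plain dicts here) to a JSON file."""
    Path(path).write_text(json.dumps(messages))

def load_memory(path):
    """Load messages from disk, or start fresh if the file is absent."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text())
```

Call `save_memory` after each turn and `load_memory` at startup, and the restart test above should then return a consistent answer.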
Next Steps
- Add persistent storage for memory using Redis or Postgres so sessions survive restarts.
- Learn Haystack Agents and tool calling so memory can work alongside search and external actions.
- Implement summarization-based memory when conversations get too long for a simple message window.
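The summarization idea in the last bullet can be sketched as a small compaction step: everything older than the last few turns is replaced by one summary message. Here `summarize` is any callable mapping a list of message strings to one string; in a real agent it could call the chat generator, but a stub stands in below:

```python
def compact_memory(messages, summarize, keep_last=4):
    """Replace all but the last `keep_last` messages with one summary entry."""
    if len(messages) <= keep_last:
        return list(messages)
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(old)  # an LLM call in a real agent
    return [f"Summary of earlier conversation: {summary}"] + recent

# Stub summarizer for illustration only
history = [f"turn {i}" for i in range(6)]
compacted = compact_memory(history, lambda old: f"{len(old)} earlier turns")
```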
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.