Haystack Tutorial (Python): Adding Memory to Agents for Intermediate Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add persistent memory to a Haystack agent in Python using a document store as the backing layer. You need this when your agent must remember prior user facts, conversation context, or case details across turns instead of treating every request as a clean slate.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • An OpenAI API key
  • A writable local environment for storing memory data
  • Basic familiarity with Haystack pipelines and components

Install the packages:

pip install haystack-ai openai

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start with a simple memory store.

For agent memory, you want a place to persist facts and retrieve them later. In this tutorial, we’ll use InMemoryDocumentStore so you can run everything locally without extra infrastructure. Note that it only lives for the duration of the process; for memory that survives restarts, swap in one of the persistent backends listed under Next Steps.

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

memory_docs = [
    Document(content="User prefers concise answers."),
    Document(content="User works on bank compliance workflows."),
]

document_store.write_documents(memory_docs)
print("Stored:", document_store.count_documents())
  2. Add retrieval over the stored memories.

The agent needs a way to pull relevant memories back into context. We’ll use a retriever over the same document store so the agent can query prior facts before answering.

from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

retriever = InMemoryBM25Retriever(document_store=document_store)

query = "What style should I use for this user?"
result = retriever.run(query=query)

for doc in result["documents"]:
    print(doc.content)
  3. Wrap memory lookup and response generation into one pipeline.

This is the core pattern: retrieve memory first, then send both the user query and retrieved context to an LLM generator. Haystack’s OpenAIGenerator works well here because it accepts prompt text directly.

from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack import Pipeline

template = """
You are an assistant with memory.
Relevant memory:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

User question: {{ question }}
Answer using the memory when relevant.
"""

prompt_builder = PromptBuilder(template=template)
llm = OpenAIGenerator(model="gpt-4o-mini")

pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")
  4. Run the pipeline with a new user query.

At runtime, retrieve memories using a search phrase related to the current turn. Then generate an answer that incorporates those memories instead of ignoring them.

question = "How should I respond to this user?"
search_query = "user preferences and work domain"

result = pipe.run(
    {
        "retriever": {"query": search_query},
        "prompt_builder": {"question": question},
    }
)

print(result["llm"]["replies"][0])
  5. Persist new facts after each interaction.

Memory only becomes useful if you keep updating it. After each user turn, extract stable facts and write them back into the document store so future turns can retrieve them.

from datetime import datetime, timezone

new_fact = Document(
    content=f"User asked about adding memory to Haystack agents on {datetime.now(timezone.utc).isoformat()}."
)

document_store.write_documents([new_fact])

updated = retriever.run(query="Haystack agents memory")
for doc in updated["documents"]:
    print("-", doc.content)

Testing It

Run the script and confirm that the retriever returns both the seeded memories and the newly added facts. Then change the query to something unrelated and verify that scores drop or fewer documents come back, which tells you BM25 is actually ranking by relevance.
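
A quick way to check is to compare scores directly. This assumes each returned Document carries its BM25 score in the score attribute, which InMemoryBM25Retriever populates.

# Print per-document scores for a relevant vs. an unrelated query.
for q in ["user preferences and work domain", "weather next weekend"]:
    docs = retriever.run(query=q)["documents"]
    print(q, "->", [(d.content[:40], round(d.score, 2)) for d in docs])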

Next, ask a follow-up question that depends on earlier context, like “What tone should I use?” If the answer reflects “concise” or “bank compliance,” your memory path is working end-to-end.
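
For example, reusing the pipeline from step 4 with a search phrase aimed at style preferences:

followup = pipe.run(
    {
        "retriever": {"query": "preferred answer style"},
        "prompt_builder": {"question": "What tone should I use?"},
    }
)
print(followup["llm"]["replies"][0])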

If you want to be stricter, log retrieved documents before generation and inspect whether they match the current turn. That’s how you catch bad memory writes early instead of discovering them in production conversations.
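
One way to do that without restructuring the pipeline is Pipeline.run’s include_outputs_from argument, assuming a recent haystack-ai 2.x release where it is available:

debug = pipe.run(
    {
        "retriever": {"query": "user preferences and work domain"},
        "prompt_builder": {"question": "How should I respond to this user?"},
    },
    include_outputs_from={"retriever"},
)

# Log exactly which memories reached the prompt for this turn.
for doc in debug["retriever"]["documents"]:
    print("retrieved:", doc.content)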

Next Steps

  • Replace InMemoryDocumentStore with QdrantDocumentStore or another persistent backend for real sessions.
  • Add an extraction step that turns raw chat messages into stable memory facts before writing them.
  • Introduce metadata fields like user_id, tenant_id, and session_id so memory stays isolated per customer or case (sketched below).
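
Here is a minimal sketch of the isolation idea with the in-memory store from this tutorial. The user_id value is illustrative; the filter dict follows Haystack 2.x’s standard filter syntax.

# Tag each memory with its owner when writing it.
document_store.write_documents(
    [Document(content="User prefers concise answers.", meta={"user_id": "u-123"})]
)

# Scope retrieval to a single user's memories.
scoped = retriever.run(
    query="user preferences",
    filters={"field": "meta.user_id", "operator": "==", "value": "u-123"},
)
for doc in scoped["documents"]:
    print(doc.content)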

Keep Learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
