Haystack Tutorial (TypeScript): adding memory to agents for advanced developers
This tutorial shows how to give a Haystack agent persistent memory in TypeScript so it can retain context across turns, recall prior facts, and stop behaving like a stateless chatbot. You need this when your agent has to handle multi-step workflows, remember user preferences, or keep track of case details across a long conversation.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project with `ts-node` or a build step
- Haystack for TypeScript packages installed:
  - `@haystack-ai/core`
  - `@haystack-ai/agents`
  - `@haystack-ai/components`
- An LLM API key, such as `OPENAI_API_KEY`
- A `.env` file or equivalent environment variable setup
- Basic familiarity with Haystack pipelines and agents
Step-by-Step
- Start by installing the packages and setting up environment variables. I’m using OpenAI here because it’s the most straightforward path for an agent demo, but the memory pattern is the same if you swap providers later.
```bash
npm install @haystack-ai/core @haystack-ai/agents @haystack-ai/components dotenv
npm install -D typescript ts-node @types/node
export OPENAI_API_KEY="your-key-here"
```
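If you’d rather not export the key in your shell, a `.env` file works too; the `dotenv` package installed above loads it through the `import "dotenv/config"` line used in the files below:

```
# .env
OPENAI_API_KEY=your-key-here
```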
- Create a small memory store abstraction. For production systems, you usually want memory separated from the agent itself so you can swap in Redis, Postgres, or a vector store later without rewriting your agent logic.
```typescript
// memory.ts
export type MemoryItem = {
  role: "user" | "assistant";
  content: string;
};

export class ConversationMemory {
  private items: MemoryItem[] = [];

  add(item: MemoryItem) {
    this.items.push(item);
  }

  getAll() {
    return [...this.items];
  }

  toPrompt() {
    return this.items
      .map((item) => `${item.role.toUpperCase()}: ${item.content}`)
      .join("\n");
  }
}
```
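One way to keep that backend swap cheap is to have the agent depend on a small storage interface instead of the concrete class. Here is a minimal sketch of that pattern; `MemoryStore` and `InMemoryStore` are illustrative names of my own, not Haystack APIs:

```typescript
// memory-store.ts (illustrative sketch; not a Haystack API)
import { MemoryItem } from "./memory";

// Any backend that can append and replay turns satisfies this contract.
// The methods are async so a Redis- or Postgres-backed implementation
// can slot in later without the agent code changing.
export interface MemoryStore {
  add(item: MemoryItem): Promise<void>;
  getAll(): Promise<MemoryItem[]>;
}

export class InMemoryStore implements MemoryStore {
  private items: MemoryItem[] = [];

  async add(item: MemoryItem) {
    this.items.push(item);
  }

  async getAll() {
    return [...this.items];
  }
}
```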
- Build the agent with a prompt that injects prior conversation state before each response. The key point is that the model does not “remember” anything by default; you have to pass memory back into the context on every turn.
```typescript
// agent.ts
import "dotenv/config";
import { OpenAIChatGenerator } from "@haystack-ai/components";
import { ConversationMemory } from "./memory";

const memory = new ConversationMemory();
const generator = new OpenAIChatGenerator({
  model: "gpt-4o-mini",
});

async function chat(userInput: string) {
  // Record the user turn first so it is part of the history we inject.
  memory.add({ role: "user", content: userInput });

  // The model is stateless: this injected history is the only "memory"
  // it ever sees, so it must be rebuilt on every turn.
  const prompt = `
You are a helpful banking operations assistant.
Use the conversation history below as working memory.

${memory.toPrompt()}

ASSISTANT:
`.trim();

  const result = await generator.run({
    messages: [{ role: "user", content: prompt }],
  });

  const reply = result.replies[0].content;
  // Record the assistant turn so the next call can reference it too.
  memory.add({ role: "assistant", content: reply });
  return reply;
}
```
- Wire the memory into a simple multi-turn session. `runner.ts` below repackages the chat logic from the previous step as an exported module so `index.ts` can import it. This is where you see the value: after one turn establishes context, later turns can reference it without re-entering everything manually.
```typescript
// index.ts
import { chat } from "./runner";

async function main() {
  // Turn one establishes a fact; turn two relies on memory to recall it.
  console.log(await chat("My name is Amina and I work in claims."));
  console.log(await chat("What department do I work in?"));
}

main().catch(console.error);
```
```typescript
// runner.ts
import "dotenv/config";
import { OpenAIChatGenerator } from "@haystack-ai/components";
import { ConversationMemory } from "./memory";

const memory = new ConversationMemory();
const generator = new OpenAIChatGenerator({ model: "gpt-4o-mini" });

export async function chat(userInput: string) {
  memory.add({ role: "user", content: userInput });

  const messages = [
    {
      role: "user" as const,
      content: `Use this history:\n${memory.toPrompt()}\n\nAnswer the latest user message only.`,
    },
  ];

  const result = await generator.run({ messages });
  const reply = result.replies[0].content;
  memory.add({ role: "assistant", content: reply });
  return reply;
}
```
- If you want something closer to production behavior, add a trimming policy so the prompt does not grow forever. In regulated environments, this matters because you want predictable token usage and clear control over what gets retained.
```typescript
// trimmed-memory.ts
export type MemoryItem = {
  role: "user" | "assistant";
  content: string;
};

export class TrimmedConversationMemory {
  private items: MemoryItem[] = [];

  constructor(private maxItems = 12) {}

  add(item: MemoryItem) {
    this.items.push(item);
    // Keep only the most recent turns so prompt size stays bounded.
    if (this.items.length > this.maxItems) {
      this.items = this.items.slice(-this.maxItems);
    }
  }

  toPrompt() {
    return this.items.map((item) => `${item.role}: ${item.content}`).join("\n");
  }
}
```
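A quick, LLM-free way to confirm the trimming behavior is to push more turns than `maxItems` allows and inspect what survives. This is a throwaway check, not one of the tutorial files:

```typescript
// trim-check.ts (throwaway verification script)
import { TrimmedConversationMemory } from "./trimmed-memory";

const memory = new TrimmedConversationMemory(4);

// Push 6 turns into a memory capped at 4 items.
for (let i = 1; i <= 6; i++) {
  memory.add({ role: "user", content: `turn ${i}` });
}

// Only turns 3-6 should remain; turns 1 and 2 have been trimmed away.
console.log(memory.toPrompt());
// user: turn 3
// user: turn 4
// user: turn 5
// user: turn 6
```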
Testing It
Run the app, state some identifying detail in the first turn, then reference it in the second turn. If the assistant answers correctly without you repeating the detail, your memory wiring works.
Also check that each new request includes only the intended history and that old context drops off when your trimming limit is hit. That protects you from runaway token costs and stale facts bleeding into later responses.
A good smoke test is:
- “My policy number is P-1042.”
- “What is my policy number?”
If it returns P-1042, your session memory is being injected correctly.
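You can automate that check with a few lines. This sketch assumes the `runner.ts` module from the session step and makes a real API call, so the reply wording will vary, but the policy number itself should appear verbatim:

```typescript
// smoke-test.ts (assumes runner.ts from the session step)
import { chat } from "./runner";

async function smokeTest() {
  await chat("My policy number is P-1042.");
  const reply = await chat("What is my policy number?");

  if (reply.includes("P-1042")) {
    console.log("PASS: session memory is being injected correctly.");
  } else {
    console.error("FAIL: reply did not contain the policy number:", reply);
    process.exit(1);
  }
}

smokeTest().catch(console.error);
```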
Next Steps
- Replace in-memory storage with Redis or Postgres so conversation state survives process restarts.
- Add summarization for older turns instead of hard truncation (see the sketch after this list).
- Move from raw prompt injection to structured message history so tool calls and system instructions stay cleanly separated.
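For the summarization idea, one possible shape is sketched below. It is a sketch under assumptions: `Summarizer` is a placeholder type you would back with an LLM call, and none of this is an existing Haystack API.

```typescript
// summarizing-memory.ts (illustrative sketch; Summarizer is a placeholder)
import { MemoryItem } from "./memory";

// You would implement this with an LLM call that condenses old turns.
type Summarizer = (items: MemoryItem[]) => Promise<string>;

export class SummarizingMemory {
  private items: MemoryItem[] = [];
  private summary = "";

  constructor(private summarize: Summarizer, private maxItems = 12) {}

  async add(item: MemoryItem) {
    this.items.push(item);
    if (this.items.length > this.maxItems) {
      // Fold the oldest half of the history into a rolling summary
      // instead of discarding it outright.
      const half = Math.floor(this.maxItems / 2);
      const older = this.items.slice(0, half);
      this.items = this.items.slice(half);
      this.summary = await this.summarize([
        { role: "assistant", content: this.summary },
        ...older,
      ]);
    }
  }

  toPrompt() {
    const history = this.items
      .map((item) => `${item.role}: ${item.content}`)
      .join("\n");
    return this.summary ? `SUMMARY: ${this.summary}\n${history}` : history;
  }
}
```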
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.