LlamaIndex Tutorial (TypeScript): adding memory to agents for advanced developers
This tutorial shows how to add persistent memory to a LlamaIndex TypeScript agent so it can remember prior turns, carry context across requests, and behave like a real support or operations assistant. You need this when your agent must handle multi-step workflows, follow-up questions, or long-running conversations without losing state.
What You'll Need
- Node.js 18+
- A TypeScript project with `ts-node` or `tsx`
- `@llamaindex/core`
- `@llamaindex/openai`
- An OpenAI API key in `OPENAI_API_KEY`
- A terminal where you can run a local script
- Basic familiarity with LlamaIndex agents and tools
Install the packages:
```bash
npm install @llamaindex/core @llamaindex/openai
npm install -D typescript tsx @types/node
```
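If you use tsx, each step below can be run as a plain script (the filename here is just a placeholder):

```bash
npx tsx memory-agent.ts
```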
Step-by-Step
- First, set up the OpenAI model and a small tool the agent can call. The tool is intentionally simple so you can focus on memory behavior instead of business logic.
```ts
import { openai } from "@llamaindex/openai";
import { FunctionTool } from "@llamaindex/core/tools";

// A deliberately trivial tool so the example stays focused on memory.
const weatherTool = FunctionTool.from(
  async ({ city }: { city: string }) => {
    return `Weather in ${city}: sunny, 24C`;
  },
  {
    name: "weather_tool",
    description: "Returns the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name" },
      },
      required: ["city"],
      additionalProperties: false,
    },
  }
);

// Temperature 0 keeps responses deterministic enough to test memory behavior.
const llm = openai({
  model: "gpt-4o-mini",
  temperature: 0,
});
```
- Next, create an in-memory chat store keyed by session ID. A process-local `Map` is the simplest pattern for request-scoped memory in a TypeScript service; you'll swap it for a shared store like Redis before running multiple instances.
```ts
type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

// One message array per session ID; lives only as long as the process.
const sessions = new Map<string, Message[]>();

function getSessionMessages(sessionId: string): Message[] {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }
  return sessions.get(sessionId)!;
}
```
- Now wire memory into the agent loop by prepending prior messages before each call and persisting the new turn after each response. This is the core pattern: load context, run the agent, save context.
```ts
import { AgentWorkflow } from "@llamaindex/core/agent";

const agent = new AgentWorkflow({
  llm,
  tools: [weatherTool],
});

async function chat(sessionId: string, input: string) {
  // Load context, run the agent with it, then persist the new turn.
  const history = getSessionMessages(sessionId);
  const result = await agent.run({
    input,
    history,
  });
  history.push({ role: "user", content: input });
  history.push({ role: "assistant", content: result.response });
  return result.response;
}
```
- Add a small wrapper that demonstrates continuity across turns. In real systems this is where you would connect Redis, Postgres, or DynamoDB instead of a process-local map; a Redis-backed sketch follows the demo below.
```ts
async function main() {
  const sessionId = "customer-123";

  // Turn 1 stores a fact; turn 2 depends on it being remembered.
  const first = await chat(sessionId, "Remember that my preferred city is Nairobi.");
  console.log("Turn 1:", first);

  const second = await chat(sessionId, "What's the weather there?");
  console.log("Turn 2:", second);
}

main().catch(console.error);
```
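As a reference for that swap, here is a minimal sketch of a Redis-backed store using the ioredis client. The `chat:` key prefix and the one-day TTL are illustrative assumptions, not requirements:

```ts
import Redis from "ioredis";

const redis = new Redis(); // assumes a local Redis on the default port

// Same Message shape as above; history is serialized as JSON per session.
async function loadMessages(sessionId: string): Promise<Message[]> {
  const raw = await redis.get(`chat:${sessionId}`);
  return raw ? (JSON.parse(raw) as Message[]) : [];
}

async function saveMessages(sessionId: string, messages: Message[]) {
  // Expire idle sessions after 24 hours so abandoned chats don't pile up.
  await redis.set(`chat:${sessionId}`, JSON.stringify(messages), "EX", 60 * 60 * 24);
}
```

The `chat` function would then await `loadMessages` before the run and `saveMessages` after it, instead of reading the `Map` directly.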
- If you want stronger control over what gets stored, keep only the last N messages and summarize older turns. That avoids unbounded growth and is the pattern I’d use once conversations become long-lived.
```ts
// Keep only the most recent messages so the prompt stays bounded.
function trimHistory(messages: Message[], maxMessages = 12) {
  if (messages.length <= maxMessages) return messages;
  return messages.slice(messages.length - maxMessages);
}

async function chatWithTrim(sessionId: string, input: string) {
  const history = trimHistory(getSessionMessages(sessionId));
  const result = await agent.run({
    input,
    history,
  });
  history.push({ role: "user", content: input });
  history.push({ role: "assistant", content: result.response });
  // Persist the trimmed copy: trimHistory may have returned a new array,
  // so re-set it on the map rather than relying on mutation.
  sessions.set(sessionId, trimHistory(history));
  return result.response;
}
```
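Trimming alone throws information away. If older turns still matter, a common refinement is to fold them into a single summary message before discarding them. This is a minimal sketch, assuming the `llm` instance from earlier exposes a `complete({ prompt })` call returning a `{ text }` result; verify that against your installed LlamaIndex version:

```ts
// Collapse everything except the most recent turns into one system
// message, so older context survives trimming in compressed form.
async function summarizeAndTrim(messages: Message[], keepRecent = 8): Promise<Message[]> {
  if (messages.length <= keepRecent) return messages;

  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);

  const transcript = older.map((m) => `${m.role}: ${m.content}`).join("\n");
  const summary = await llm.complete({
    prompt: `Summarize this conversation in a few sentences, keeping names, preferences, and decisions:\n\n${transcript}`,
  });

  return [
    { role: "system", content: `Summary of earlier conversation: ${summary.text}` },
    ...recent,
  ];
}
```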
Testing It
Run the script and compare the two turns: the second response should reflect information introduced in the first. Because the session store is process-local, memory lasts only for a single run, so to test longer conversations add more chat calls with the same sessionId rather than re-running the script.
If you change sessionId, the agent should behave like a fresh conversation with no prior memory. That tells you your memory boundary is working correctly.
For a more realistic test, ask one turn to store a preference like location or policy number, then ask a vague follow-up like “what about that one?” If memory is wired correctly, the agent should resolve the reference from stored history instead of guessing.
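If you'd rather script that check than eyeball it, a quick smoke test might look like the following. The policy number and the loose string assertion are illustrative; model wording varies, so assert on the stored fact rather than exact phrasing:

```ts
import assert from "node:assert";

async function smokeTest() {
  const sessionId = `test-${Date.now()}`; // fresh session per test run

  await chat(sessionId, "My policy number is AB-1234.");
  const reply = await chat(sessionId, "What policy number do I have on file?");

  // Loose check: the agent should recover the number from stored history.
  assert(reply.includes("AB-1234"), `expected policy number in reply, got: ${reply}`);
  console.log("memory smoke test passed");
}

smokeTest().catch(console.error);
```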
Next Steps
- Replace the in-memory `Map` with Redis for multi-instance deployments.
- Add conversation summarization so long chats stay within token limits.
- Store structured memory separately from raw chat history for things like customer preferences, policy metadata, and case status (see the sketch below).
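Here is a minimal sketch of that last split, with a hypothetical facts record stored alongside the raw transcript and injected as a system message on each turn:

```ts
type SessionMemory = {
  messages: Message[];
  // Structured facts live outside the transcript, so they survive
  // trimming and can be updated in place.
  facts: Record<string, string>; // e.g. { preferredCity: "Nairobi" }
};

const memory = new Map<string, SessionMemory>();

function buildHistory(mem: SessionMemory): Message[] {
  const factLines = Object.entries(mem.facts)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");

  // Prepend known facts as a system message, then the recent transcript.
  return factLines
    ? [{ role: "system", content: `Known customer facts:\n${factLines}` }, ...mem.messages]
    : mem.messages;
}
```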
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.