LlamaIndex Tutorial (TypeScript): adding memory to agents for intermediate developers
This tutorial shows you how to add conversational memory to a LlamaIndex agent in TypeScript so it can remember prior turns in the same session. You need this when your agent has to handle follow-up questions, preserve context across messages, or avoid asking the user for the same details twice.
What You'll Need
- Node.js 18+
- A TypeScript project with `npm` or `pnpm`
- OpenAI API key set in your environment
- These packages:
  - `llamaindex`
  - `dotenv`
  - `typescript`
  - `tsx` or another TypeScript runner
Install them like this:
```bash
npm install llamaindex dotenv
npm install -D typescript tsx @types/node
```
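If you're starting from scratch, a minimal `tsconfig.json` along these lines works with `tsx`. The exact options are an assumption; adjust them to your project:

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}
```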
Set your API key:
```bash
export OPENAI_API_KEY="your-key-here"
```
Step-by-Step
1. Create a small TypeScript entry file and load environment variables.

Keep this simple: one file is enough for the whole example.

```typescript
import "dotenv/config";
import { OpenAI } from "llamaindex";

// Quick sanity check that the environment and SDK are wired up.
const llm = new OpenAI({ model: "gpt-4o-mini" });
console.log("OpenAI model ready:", llm.model);
```
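With `tsx` installed, you can run the file directly (assuming you named it `index.ts`):

```bash
npx tsx index.ts
```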
2. Define a memory-backed chat function.

In LlamaIndex TypeScript, the easiest production-friendly pattern is to keep the conversation history in an array and pass it back into each turn.
import "dotenv/config";
import {
OpenAI,
type ChatMessage,
} from "llamaindex";
const llm = new OpenAI({ model: "gpt-4o-mini" });
const memory: ChatMessage[] = [];
async function chatOnce(userInput: string) {
memory.push({ role: "user", content: userInput });
const response = await llm.chat({
messages: [
{
role: "system",
content:
"You are a banking assistant. Use prior conversation context when relevant.",
},
...memory,
],
});
memory.push(response.message);
return response.message.content;
}
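Called in sequence, each turn sees everything that came before. A quick way to try it (the account-number prompt is just an illustration):

```typescript
// Each call appends to `memory`, so the second turn can reference the first.
(async () => {
  console.log(await chatOnce("My account ends in 4421."));
  console.log(await chatOnce("What are the last four digits I mentioned?"));
})();
```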
3. Wrap that logic in a reusable agent loop.

This is where memory becomes useful: every new user message gets appended, and the full transcript is sent back to the model on each turn.
import "dotenv/config";
import { OpenAI, type ChatMessage } from "llamaindex";
const llm = new OpenAI({ model: "gpt-4o-mini" });
const memory: ChatMessage[] = [];
async function sendMessage(input: string) {
memory.push({ role: "user", content: input });
const result = await llm.chat({
messages: [
{ role: "system", content: "You are a helpful insurance claims assistant." },
...memory,
],
});
memory.push(result.message);
return result.message.content;
}
async function main() {
console.log(await sendMessage("My policy number is P-10291."));
console.log(await sendMessage("What did I just tell you?"));
}
main();
4. Add a trimming strategy so memory does not grow forever.

For real systems, keep only the last few turns (counting messages is a simple proxy for counting tokens) or summarize older messages before they hit model limits; a summarization sketch follows the code below.
import "dotenv/config";
import { OpenAI, type ChatMessage } from "llamaindex";
const llm = new OpenAI({ model: "gpt-4o-mini" });
const memory: ChatMessage[] = [];
const MAX_MESSAGES = 8;
function trimMemory() {
if (memory.length > MAX_MESSAGES) {
memory.splice(0, memory.length - MAX_MESSAGES);
}
}
async function sendMessage(input: string) {
memory.push({ role: "user", content: input });
trimMemory();
const result = await llm.chat({
messages: [
{ role: "system", content: "You are a support agent with short-term memory." },
...memory,
],
});
memory.push(result.message);
trimMemory();
return result.message.content;
}
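If hard trimming loses too much, you can fold older turns into a summary instead. Here's a minimal sketch reusing the same `llm`; `summarizeOldTurns` is a name made up for this example, not a LlamaIndex API, and it assumes message contents are plain strings:

```typescript
const KEEP_RECENT = 6; // keep the newest messages verbatim

async function summarizeOldTurns() {
  if (memory.length <= KEEP_RECENT) return;

  // Pull everything except the most recent turns out of memory.
  // Assumes `content` is a plain string, not multimodal content parts.
  const older = memory.splice(0, memory.length - KEEP_RECENT);
  const transcript = older.map((m) => `${m.role}: ${m.content}`).join("\n");

  const summary = await llm.chat({
    messages: [
      {
        role: "system",
        content: "Summarize this conversation in a few sentences. Keep key facts.",
      },
      { role: "user", content: transcript },
    ],
  });

  // Re-insert the summary at the front so later turns still see old facts.
  memory.unshift({
    role: "system",
    content: `Summary of earlier conversation: ${summary.message.content}`,
  });
}
```

You would call this in place of `trimMemory()` inside `sendMessage`.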
5. Persist session state if you want memory across restarts.

For local testing, an in-memory array is fine. In production, write `memory` to Redis, Postgres, or your app database, keyed by `sessionId`.
```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import type { ChatMessage } from "llamaindex";

const path = "./session-memory.json";

// Returns an empty history on first run, the saved transcript afterwards.
export function loadMemory(): ChatMessage[] {
  if (!existsSync(path)) return [];
  return JSON.parse(readFileSync(path, "utf8")) as ChatMessage[];
}

export function saveMemory(memory: ChatMessage[]) {
  writeFileSync(path, JSON.stringify(memory, null, 2));
}
```
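To wire this into the chat loop, load the transcript at startup and save after every completed turn. A sketch, assuming the file above lives at `./session-store.ts` (the filename is my choice):

```typescript
import "dotenv/config";
import { OpenAI, type ChatMessage } from "llamaindex";
import { loadMemory, saveMemory } from "./session-store";

const llm = new OpenAI({ model: "gpt-4o-mini" });

// Restore whatever the last run left behind.
const memory: ChatMessage[] = loadMemory();

async function sendMessage(input: string) {
  memory.push({ role: "user", content: input });

  const result = await llm.chat({
    messages: [
      { role: "system", content: "You are a helpful assistant with persistent memory." },
      ...memory,
    ],
  });

  memory.push(result.message);
  saveMemory(memory); // persist the full transcript after each turn
  return result.message.content;
}
```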
Testing It
Run your script and ask two related questions in sequence. The second answer should reflect the first message without you repeating it explicitly.
For example:
- “My policy number is P-10291.”
- “What’s my policy number?”
If the agent answers correctly on the second turn, memory is working. If it forgets, check that you are passing the full `memory` array back into `llm.chat()` and that you are storing both user and assistant messages after each response.
Also verify trimming does not remove too much context. If responses become vague after several turns, increase `MAX_MESSAGES` or switch to summarization for older turns.
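For a quick automated check instead of eyeballing output, something like this works with the `sendMessage` function from step 3:

```typescript
// Fails loudly if the second turn can't recall the first.
async function testMemory() {
  await sendMessage("My policy number is P-10291.");
  const answer = await sendMessage("What's my policy number?");
  if (String(answer).includes("P-10291")) {
    console.log("Memory test passed.");
  } else {
    console.error("Memory test failed. Got:", answer);
  }
}

testMemory().catch(console.error);
```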
Next Steps
- Add Redis-backed session storage so multiple server instances share the same conversation state (see the sketch after this list).
- Replace raw message history with a summary-plus-recent-turns strategy for longer conversations.
- Combine this pattern with tools so your agent can remember context and call internal APIs at the same time.
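For the Redis option, a minimal sketch using the `redis` npm package (node-redis v4); the key naming and the 24-hour TTL are assumptions, not requirements:

```typescript
import { createClient } from "redis";
import type { ChatMessage } from "llamaindex";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

export async function loadMemory(sessionId: string): Promise<ChatMessage[]> {
  const raw = await redis.get(`memory:${sessionId}`);
  return raw ? (JSON.parse(raw) as ChatMessage[]) : [];
}

export async function saveMemory(sessionId: string, memory: ChatMessage[]) {
  // Expire idle sessions after 24 hours so abandoned conversations clean up.
  await redis.set(`memory:${sessionId}`, JSON.stringify(memory), {
    EX: 60 * 60 * 24,
  });
}
```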
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.