LlamaIndex Tutorial (TypeScript): adding memory to agents for beginners

By Cyprian Aarons · Updated 2026-04-21
Tags: llamaindex, adding-memory-to-agents-for-beginners, typescript

This tutorial shows you how to add conversational memory to a LlamaIndex TypeScript agent so it can remember prior turns in the same session. You need this when your agent must keep context across multiple messages, like remembering a user’s name, a case number, or the last action it took.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project with ts-node or tsx
  • @llamaindex/core
  • @llamaindex/openai
  • An OpenAI API key in OPENAI_API_KEY
  • Basic familiarity with creating a LlamaIndex agent in TypeScript

Install the packages:

npm install @llamaindex/core @llamaindex/openai
npm install -D typescript tsx @types/node

Set your API key:

export OPENAI_API_KEY="your-api-key"

Step-by-Step

  1. Start with a minimal agent setup. The key point is that memory is not a separate magic feature; it is usually attached to the chat engine or agent workflow through a message history object.
import { openai } from "@llamaindex/openai";
import { AgentWorkflow } from "@llamaindex/core/agent";

async function main() {
  const llm = openai({ model: "gpt-4o-mini" });

  const agent = new AgentWorkflow({
    llm,
    tools: [],
    systemPrompt: "You are a helpful assistant.",
  });

  console.log("Agent ready:", !!agent);
}

main();
  2. Add a memory store using chat history. For beginners, the simplest pattern is an in-memory array of messages per session; pass that history back into each new turn. It works for prototyping, but the history is lost on restart, so move it into a durable store before production.
import { ChatMessage } from "@llamaindex/core/llms";
import { openai } from "@llamaindex/openai";
import { AgentWorkflow } from "@llamaindex/core/agent";

const sessionMemory: ChatMessage[] = [];

async function main() {
  const llm = openai({ model: "gpt-4o-mini" });

  const agent = new AgentWorkflow({
    llm,
    tools: [],
    systemPrompt: "You are a helpful assistant.",
  });

  console.log("Memory slots:", sessionMemory.length);
}

main();
  3. Run each user turn through the agent and append both the user and assistant messages to memory. This is the important part: every request includes prior context, and every response is stored for the next turn.
import { ChatMessage } from "@llamaindex/core/llms";
import { openai } from "@llamaindex/openai";
import { AgentWorkflow, runAgentStep } from "@llamaindex/core/agent";

const memory: ChatMessage[] = [];

async function ask(question: string) {
  const llm = openai({ model: "gpt-4o-mini" });
  const agent = new AgentWorkflow({
    llm,
    tools: [],
    systemPrompt: "You are a helpful assistant.",
  });

  memory.push({ role: "user", content: question });

  const result = await runAgentStep(agent, {
    input: question,
    chatHistory: memory,
  });

  memory.push({
    role: "assistant",
    content: result.response.message.content ?? "",
  });
  return result.response.message.content;
}
  4. Build a small interactive loop so you can test whether the agent remembers earlier turns. This example keeps one session alive in process, which is enough to prove the pattern before you wire it into Redis, Postgres, or your app server.
import readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";
import { ChatMessage } from "@llamaindex/core/llms";
import { openai } from "@llamaindex/openai";
import { AgentWorkflow, runAgentStep } from "@llamaindex/core/agent";

const memory: ChatMessage[] = [];

async function ask(question: string) {
  const llm = openai({ model: "gpt-4o-mini" });
  const agent = new AgentWorkflow({
    llm,
    tools: [],
    systemPrompt: "Remember what the user says in this conversation.",
  });

  memory.push({ role: "user", content: question });
  const result = await runAgentStep(agent, { input: question, chatHistory: memory });
  const answer = result.response.message.content ?? "";
  memory.push({ role: "assistant", content: answer });
  return answer;
}

async function main() {
  const rl = readline.createInterface({ input, output });
  while (true) {
    const q = await rl.question("> ");
    if (q === "exit") break;
    console.log(await ask(q));
  }
  rl.close();
}

main();
  5. Trim old messages before sending them back to the model. Real agents need bounded context because long-running sessions will eventually hit token limits or become expensive.
import { ChatMessage } from "@llamaindex/core/llms";

function getRecentMessages(messages: ChatMessage[], maxTurns = 10): ChatMessage[] {
  const maxMessages = maxTurns * 2;
  return messages.slice(-maxMessages);
}

// `memory` is the session array from the earlier steps; pass the trimmed
// slice as `chatHistory` instead of the full array.
const recentMemory = getRecentMessages(memory);
console.log("Sending messages:", recentMemory.length);
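You can sanity-check the window math without any LLM calls: 10 turns means at most 20 messages, and the oldest turns fall off first. A standalone sketch (the `ChatMessage` shape is simplified here for illustration):

```typescript
// Standalone check of the trimming helper; no API key or LLM calls needed.
type ChatMessage = { role: "user" | "assistant"; content: string };

function getRecentMessages(messages: ChatMessage[], maxTurns = 10): ChatMessage[] {
  const maxMessages = maxTurns * 2; // one user + one assistant message per turn
  return messages.slice(-maxMessages);
}

// Simulate 15 turns (30 messages) of conversation.
const history: ChatMessage[] = [];
for (let i = 0; i < 15; i++) {
  history.push({ role: "user", content: `question ${i}` });
  history.push({ role: "assistant", content: `answer ${i}` });
}

const trimmed = getRecentMessages(history);
console.log(trimmed.length);     // 20
console.log(trimmed[0].content); // "question 5" — the five oldest turns were dropped
```

If you also want the system prompt to survive trimming, keep it outside the array (as the examples above do, via `systemPrompt`) rather than as the first message.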

Testing It

Run the script and type two related prompts. For example, ask “My name is Priya” first, then ask “What is my name?”; if memory is wired correctly, the second response should mention Priya without you repeating it.

Also test a short follow-up like “What did I ask before this?” to confirm that previous turns are being passed into the model. If the agent forgets everything between prompts, check that you are persisting memory across calls and not recreating it inside the request handler.

If you are using this in an API server, make sure each user gets their own session key. Shared global memory will mix conversations across users, which is a bad failure mode in banking and insurance workflows.
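One simple way to keep sessions separate is a `Map` keyed by session ID. This is a hypothetical sketch (the `getSessionMemory` helper and the session IDs are illustrative names, not part of LlamaIndex):

```typescript
// Hypothetical per-session store: each session ID gets its own history array,
// so two users' conversations can never bleed into each other.
type ChatMessage = { role: "user" | "assistant"; content: string };

const sessions = new Map<string, ChatMessage[]>();

function getSessionMemory(sessionId: string): ChatMessage[] {
  let memory = sessions.get(sessionId);
  if (!memory) {
    memory = [];
    sessions.set(sessionId, memory);
  }
  return memory;
}

// Two users, two isolated histories.
getSessionMemory("user-priya").push({ role: "user", content: "My name is Priya" });
getSessionMemory("user-sam").push({ role: "user", content: "My name is Sam" });

console.log(getSessionMemory("user-priya").length); // 1
console.log(getSessionMemory("user-sam").length);   // 1
```

In a real server you would derive the session ID from an authenticated user or a signed cookie, never from client-supplied input alone.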

Next Steps

  • Move memory into Redis or Postgres so conversations survive restarts
  • Add summarization for long sessions instead of keeping raw chat history forever
  • Attach tools to the agent so it can remember and act on account-specific data
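As a stepping stone before Redis or Postgres, you can persist the history to disk so it survives restarts. A minimal sketch using only Node's standard library (the `memory.json` filename and temp-directory location are arbitrary choices for this example):

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

type ChatMessage = { role: "user" | "assistant"; content: string };

// Arbitrary location for this example; a real app would scope the file per session.
const MEMORY_FILE = join(tmpdir(), "memory.json");

function loadMemory(): ChatMessage[] {
  if (!existsSync(MEMORY_FILE)) return [];
  return JSON.parse(readFileSync(MEMORY_FILE, "utf8"));
}

function saveMemory(memory: ChatMessage[]): void {
  writeFileSync(MEMORY_FILE, JSON.stringify(memory, null, 2));
}

// Round-trip: save a turn, then reload it as a fresh process would.
saveMemory([{ role: "user", content: "My name is Priya" }]);
const restored = loadMemory();
console.log(restored[0].content); // "My name is Priya"
```

Call `loadMemory()` at startup and `saveMemory()` after each turn; the same load/save seam is where a Redis or Postgres client would slot in later.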


By Cyprian Aarons, AI Consultant at Topiax.
