LlamaIndex Tutorial (TypeScript): Adding Memory to Agents for Advanced Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to add persistent memory to a LlamaIndex TypeScript agent so it can remember prior turns, carry context across requests, and behave like a real support or operations assistant. You need this when your agent must handle multi-step workflows, follow-up questions, or long-running conversations without losing state.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or tsx
  • @llamaindex/core
  • @llamaindex/openai
  • An OpenAI API key in OPENAI_API_KEY
  • A terminal where you can run a local script
  • Basic familiarity with LlamaIndex agents and tools

Install the packages:

npm install @llamaindex/core @llamaindex/openai
npm install -D typescript tsx @types/node
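
You can run each complete snippet with tsx as you go; the filename is only a placeholder:

npx tsx agent.ts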

Step-by-Step

  1. First, set up the OpenAI model and a small tool the agent can call. The tool is intentionally simple so you can focus on memory behavior instead of business logic.

import { openai } from "@llamaindex/openai";
import { FunctionTool } from "@llamaindex/core/tools";

const weatherTool = FunctionTool.from(
  async ({ city }: { city: string }) => {
    return `Weather in ${city}: sunny, 24C`;
  },
  {
    name: "weather_tool",
    description: "Returns the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name" },
      },
      required: ["city"],
      additionalProperties: false,
    },
  }
);

const llm = openai({
  model: "gpt-4o-mini",
  temperature: 0,
});
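
Before wiring the tool into an agent, you can sanity-check it in isolation. This sketch assumes the FunctionTool instance can be invoked directly via its call method:

// Quick sanity check: invoke the tool directly, outside any agent loop.
const sample = await weatherTool.call({ city: "Nairobi" });
console.log(sample); // "Weather in Nairobi: sunny, 24C"
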
  2. Next, create an in-memory chat store keyed by session ID. This is the simplest pattern for session-scoped memory in a single-process TypeScript service, and it is easy to swap for a shared store later.

type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

const sessions = new Map<string, Message[]>();

function getSessionMessages(sessionId: string): Message[] {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }
  return sessions.get(sessionId)!;
}
  3. Now wire memory into the agent loop by prepending prior messages before each call and persisting the new turn after each response. This is the core pattern: load context, run the agent, save context.

import { AgentWorkflow } from "@llamaindex/core/agent";

const agent = new AgentWorkflow({
  llm,
  tools: [weatherTool],
});

async function chat(sessionId: string, input: string) {
  const history = getSessionMessages(sessionId);

  const result = await agent.run({
    input,
    history,
  });

  history.push({ role: "user", content: input });
  history.push({ role: "assistant", content: result.response });

  return result.response;
}
  4. Add a small wrapper that demonstrates continuity across turns. In real systems this is where you would connect Redis, Postgres, or DynamoDB instead of a process-local map; a Redis-backed sketch follows the example.

async function main() {
  const sessionId = "customer-123";

  const first = await chat(sessionId, "Remember that my preferred city is Nairobi.");
  console.log("Turn 1:", first);

  const second = await chat(
    sessionId,
    "What's the weather there?"
  );
  console.log("Turn 2:", second);
}

main().catch(console.error);
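
For the Redis variant mentioned above, here is a minimal sketch using the node-redis client (npm install redis). The key prefix and 24-hour TTL are illustrative choices, not requirements:

import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Load a session's history from Redis, falling back to an empty array.
async function loadMessages(sessionId: string): Promise<Message[]> {
  const raw = await redis.get(`chat:${sessionId}`);
  return raw ? (JSON.parse(raw) as Message[]) : [];
}

// Persist the full history, expiring idle sessions after 24 hours.
async function saveMessages(sessionId: string, messages: Message[]) {
  await redis.set(`chat:${sessionId}`, JSON.stringify(messages), {
    EX: 60 * 60 * 24,
  });
}
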
  5. If you want stronger control over what gets stored, keep only the last N messages and summarize older turns. That avoids unbounded growth and is the pattern I’d use once conversations become long-lived. The trimming half looks like this; a summarization sketch follows.

function trimHistory(messages: Message[], maxMessages = 12) {
  if (messages.length <= maxMessages) return messages;
  return messages.slice(messages.length - maxMessages);
}

async function chatWithTrim(sessionId: string, input: string) {
  const history = trimHistory(getSessionMessages(sessionId));

  const result = await agent.run({
    input,
    history,
  });

  history.push({ role: "user", content: input });
  history.push({ role: "assistant", content: result.response });

  sessions.set(sessionId, trimHistory(history));
  return result.response;
}
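
Trimming alone silently discards older context. A common complement is to condense the dropped messages into a single system message before they disappear. This is a minimal sketch; it reuses the llm instance from Step 1 and assumes its complete method for a one-off prompt:

// Summarize messages that trimming would drop, and fold the summary
// into a single system message that stays in context.
async function summarizeAndTrim(
  messages: Message[],
  maxMessages = 12
): Promise<Message[]> {
  if (messages.length <= maxMessages) return messages;

  const dropped = messages.slice(0, messages.length - maxMessages);
  const kept = messages.slice(messages.length - maxMessages);

  const transcript = dropped.map((m) => `${m.role}: ${m.content}`).join("\n");

  const summary = await llm.complete({
    prompt: `Summarize the key facts and user preferences from this conversation in a few sentences:\n\n${transcript}`,
  });

  return [
    { role: "system", content: `Earlier conversation summary: ${summary.text}` },
    ...kept,
  ];
}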

Testing It

Run the script and ask follow-up questions that depend on earlier context: the second response should reflect information introduced in the first turn. Note that because the store is a process-local Map, memory only survives within a single run; two separate executions start fresh unless you back the store with Redis or a database.

If you change sessionId, the agent should behave like a fresh conversation with no prior memory. That tells you your memory boundary is working correctly.

For a more realistic test, ask one turn to store a preference like location or policy number, then ask a vague follow-up like “what about that one?” If memory is wired correctly, the agent should resolve the reference from stored history instead of guessing.
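
If you prefer this check in code rather than by eye, a throwaway script like the following exercises both cases (the session IDs are arbitrary):

async function smokeTest() {
  // Same session: the second turn should resolve "there" to Nairobi.
  await chat("session-a", "Remember that my preferred city is Nairobi.");
  const followUp = await chat("session-a", "What's the weather there?");
  console.log("Same session:", followUp);

  // Fresh session: no stored preference, so the agent should ask
  // for clarification instead of answering for Nairobi.
  const fresh = await chat("session-b", "What's the weather there?");
  console.log("Fresh session:", fresh);
}

smokeTest().catch(console.error);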

Next Steps

  • Replace the in-memory Map with Redis for multi-instance deployments.
  • Add conversation summarization so long chats stay within token limits.
  • Store structured memory separately from raw chat history for things like customer preferences, policy metadata, and case status (see the sketch below).
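
One way to start on that last point is a small typed record per session kept alongside the transcript, so key facts survive trimming. The shape here is purely illustrative:

type SessionFacts = {
  preferredCity?: string;
  policyNumber?: string;
  caseStatus?: "open" | "pending" | "closed";
};

const facts = new Map<string, SessionFacts>();

// Merge newly extracted facts into the session's structured record.
function updateFacts(sessionId: string, update: SessionFacts) {
  facts.set(sessionId, { ...facts.get(sessionId), ...update });
}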

By Cyprian Aarons, AI Consultant at Topiax.
