LlamaIndex Tutorial (TypeScript): Adding Memory to Agents for Advanced Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to add persistent memory to a LlamaIndex TypeScript agent so it can remember prior turns, carry context across requests, and behave like a real support or operations assistant. You need this when your agent must handle multi-step workflows, follow-up questions, or long-running conversations without losing state.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or tsx
  • @llamaindex/core
  • @llamaindex/openai
  • An OpenAI API key in OPENAI_API_KEY
  • A terminal where you can run a local script
  • Basic familiarity with LlamaIndex agents and tools

Install the packages:

npm install @llamaindex/core @llamaindex/openai
npm install -D typescript tsx @types/node
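
You can run each complete snippet with tsx as you go; the filename is only a placeholder:

npx tsx agent.ts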

Step-by-Step

  1. First, set up the OpenAI model and a small tool the agent can call. The tool is intentionally simple so you can focus on memory behavior instead of business logic.

import { openai } from "@llamaindex/openai";
import { FunctionTool } from "@llamaindex/core/tools";

const weatherTool = FunctionTool.from(
  async ({ city }: { city: string }) => {
    return `Weather in ${city}: sunny, 24C`;
  },
  {
    name: "weather_tool",
    description: "Returns the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name" },
      },
      required: ["city"],
      additionalProperties: false,
    },
  }
);

const llm = openai({
  model: "gpt-4o-mini",
  temperature: 0,
});
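
Before wiring the tool into an agent, you can sanity-check it in isolation. This sketch assumes the FunctionTool instance can be invoked directly via its call method:

// Quick sanity check: invoke the tool directly, outside any agent loop.
const sample = await weatherTool.call({ city: "Nairobi" });
console.log(sample); // "Weather in Nairobi: sunny, 24C"
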
  2. Next, create an in-memory chat store keyed by session ID. This is the simplest pattern for session-scoped memory in a single-process TypeScript service, and it is easy to swap for a shared store later.

type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

const sessions = new Map<string, Message[]>();

function getSessionMessages(sessionId: string): Message[] {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }
  return sessions.get(sessionId)!;
}
  3. Now wire memory into the agent loop by prepending prior messages before each call and persisting the new turn after each response. This is the core pattern: load context, run the agent, save context.

import { AgentWorkflow } from "@llamaindex/core/agent";

const agent = new AgentWorkflow({
  llm,
  tools: [weatherTool],
});

async function chat(sessionId: string, input: string) {
  const history = getSessionMessages(sessionId);

  const result = await agent.run({
    input,
    history,
  });

  history.push({ role: "user", content: input });
  history.push({ role: "assistant", content: result.response });

  return result.response;
}
  4. Add a small wrapper that demonstrates continuity across turns. In real systems this is where you would connect Redis, Postgres, or DynamoDB instead of a process-local map; a Redis-backed sketch follows the example.

async function main() {
  const sessionId = "customer-123";

  const first = await chat(sessionId, "Remember that my preferred city is Nairobi.");
  console.log("Turn 1:", first);

  const second = await chat(
    sessionId,
    "What's the weather there?"
  );
  console.log("Turn 2:", second);
}

main().catch(console.error);
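
For the Redis variant mentioned above, here is a minimal sketch using the node-redis client (npm install redis). The key prefix and 24-hour TTL are illustrative choices, not requirements:

import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Load a session's history from Redis, falling back to an empty array.
async function loadMessages(sessionId: string): Promise<Message[]> {
  const raw = await redis.get(`chat:${sessionId}`);
  return raw ? (JSON.parse(raw) as Message[]) : [];
}

// Persist the full history, expiring idle sessions after 24 hours.
async function saveMessages(sessionId: string, messages: Message[]) {
  await redis.set(`chat:${sessionId}`, JSON.stringify(messages), {
    EX: 60 * 60 * 24,
  });
}
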
  5. If you want stronger control over what gets stored, keep only the last N messages and summarize older turns. That avoids unbounded growth and is the pattern I’d use once conversations become long-lived. The trimming half looks like this; a summarization sketch follows.

function trimHistory(messages: Message[], maxMessages = 12) {
  if (messages.length <= maxMessages) return messages;
  return messages.slice(messages.length - maxMessages);
}

async function chatWithTrim(sessionId: string, input: string) {
  const history = trimHistory(getSessionMessages(sessionId));

  const result = await agent.run({
    input,
    history,
  });

  history.push({ role: "user", content: input });
  history.push({ role: "assistant", content: result.response });

  sessions.set(sessionId, trimHistory(history));
  return result.response;
}
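
Trimming alone silently discards older context. A common complement is to condense the dropped messages into a single system message before they disappear. This is a minimal sketch; it reuses the llm instance from Step 1 and assumes its complete method for a one-off prompt:

// Summarize messages that trimming would drop, and fold the summary
// into a single system message that stays in context.
async function summarizeAndTrim(
  messages: Message[],
  maxMessages = 12
): Promise<Message[]> {
  if (messages.length <= maxMessages) return messages;

  const dropped = messages.slice(0, messages.length - maxMessages);
  const kept = messages.slice(messages.length - maxMessages);

  const transcript = dropped.map((m) => `${m.role}: ${m.content}`).join("\n");

  const summary = await llm.complete({
    prompt: `Summarize the key facts and user preferences from this conversation in a few sentences:\n\n${transcript}`,
  });

  return [
    { role: "system", content: `Earlier conversation summary: ${summary.text}` },
    ...kept,
  ];
}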

Testing It

Run the script and ask follow-up questions that depend on earlier context: the second response should reflect information introduced in the first turn. Note that because the store is a process-local Map, memory only survives within a single run; two separate executions start fresh unless you back the store with Redis or a database.

If you change sessionId, the agent should behave like a fresh conversation with no prior memory. That tells you your memory boundary is working correctly.

For a more realistic test, ask one turn to store a preference like location or policy number, then ask a vague follow-up like “what about that one?” If memory is wired correctly, the agent should resolve the reference from stored history instead of guessing.
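
If you prefer this check in code rather than by eye, a throwaway script like the following exercises both cases (the session IDs are arbitrary):

async function smokeTest() {
  // Same session: the second turn should resolve "there" to Nairobi.
  await chat("session-a", "Remember that my preferred city is Nairobi.");
  const followUp = await chat("session-a", "What's the weather there?");
  console.log("Same session:", followUp);

  // Fresh session: no stored preference, so the agent should ask
  // for clarification instead of answering for Nairobi.
  const fresh = await chat("session-b", "What's the weather there?");
  console.log("Fresh session:", fresh);
}

smokeTest().catch(console.error);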

Next Steps

  • Replace the in-memory Map with Redis for multi-instance deployments.
  • Add conversation summarization so long chats stay within token limits.
  • Store structured memory separately from raw chat history for things like customer preferences, policy metadata, and case status (see the sketch below).
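
One way to start on that last point is a small typed record per session kept alongside the transcript, so key facts survive trimming. The shape here is purely illustrative:

type SessionFacts = {
  preferredCity?: string;
  policyNumber?: string;
  caseStatus?: "open" | "pending" | "closed";
};

const facts = new Map<string, SessionFacts>();

// Merge newly extracted facts into the session's structured record.
function updateFacts(sessionId: string, update: SessionFacts) {
  facts.set(sessionId, { ...facts.get(sessionId), ...update });
}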

By Cyprian Aarons, AI Consultant at Topiax.
