LlamaIndex Tutorial (TypeScript): streaming agent responses for beginners

By Cyprian Aarons · Updated 2026-04-21
Tags: llamaindex, streaming agent responses for beginners, typescript

This tutorial shows you how to build a TypeScript LlamaIndex agent that streams its response token-by-token instead of waiting for the full answer. You need this when you want a chat UI, CLI, or API endpoint to feel responsive while the model is still generating output.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project initialized with tsconfig.json
  • OpenAI API key set as an environment variable (or in a .env file, shown after the install commands):
    • OPENAI_API_KEY=...
  • These packages:
    • llamaindex
    • @llamaindex/openai for the OpenAI model client
    • dotenv
    • typescript
    • tsx for running TypeScript directly during development

Install them like this:

npm install llamaindex @llamaindex/openai dotenv
npm install -D typescript tsx @types/node
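If you keep the key in a .env file instead of exporting it in your shell, one line is enough; dotenv loads it through the import "dotenv/config" line in the first step. The value below is a placeholder:

OPENAI_API_KEY=sk-your-key-here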

Step-by-Step

  1. Create a small TypeScript entry file and load your environment variables first.
    Keep this simple: one file, one agent, one streamed chat loop.
import "dotenv/config";
import { openai } from "@llamaindex/openai";
import {
  Agent,
  ChatMemoryBuffer,
  Settings,
} from "llamaindex";

Settings.llm = openai({
  model: "gpt-4o-mini",
});
  2. Define the agent with memory so it can keep context across turns.
    For beginners, memory matters because streaming only changes how output arrives, not how the conversation is stored.
const memory = new ChatMemoryBuffer({
  tokenLimit: 4000,
});

const agent = new Agent({
  llm: Settings.llm,
  memory,
});
  3. Call the streaming API and consume chunks as they arrive.
    The key method is chatStream, which returns an async iterator you can read with for await.
async function main() {
  const stream = await agent.chatStream({
    message: "Explain what streaming responses are in one short paragraph.",
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.response ?? "");
  }

  process.stdout.write("\n");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
  4. Add a reusable helper if you want multiple prompts in one session.
    This keeps the same agent instance alive, so memory and streaming both work across turns.
async function ask(message: string) {
  const stream = await agent.chatStream({ message });

  let fullText = "";
  for await (const chunk of stream) {
    const text = chunk.response ?? "";
    fullText += text;
    process.stdout.write(text);
  }

  process.stdout.write("\n");
  return fullText;
}
  5. Put it together and run two prompts back-to-back.
    Replace the earlier main function with one that calls ask twice; this demonstrates that the agent streams output and remembers prior messages.
async function main() {
  await ask("You are helping a beginner understand streaming.");
  await ask("Now explain why this is useful in a chat app.");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Testing It

Run the file with tsx:

npx tsx src/index.ts
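
Optionally, add a script to package.json so the same command is available as npm run dev (the script name is just a convention):

{
  "scripts": {
    "dev": "tsx src/index.ts"
  }
}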

If everything is wired correctly, you should see text appear gradually instead of all at once. The first prompt should stream normally, and the second prompt should reference the earlier context because memory is enabled.

If you get an authentication error, check that OPENAI_API_KEY is exported in your shell or loaded from .env. If you get import errors, confirm that your installed llamaindex version matches the API used above.
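
A quick throwaway check for the first case is to print whether the key is visible to Node before wiring up the agent:

import "dotenv/config";

console.log("OPENAI_API_KEY loaded:", Boolean(process.env.OPENAI_API_KEY));

Delete this once it logs true.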

Next Steps

  • Add tool calling so the agent can fetch data from internal APIs while still streaming responses.
  • Wrap the streamed output in an HTTP endpoint using Server-Sent Events for a web UI (see the sketch after this list).
  • Swap process.stdout.write for a frontend state update so tokens render live in React or Next.js.
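
As a rough sketch of the second idea, here is a minimal Server-Sent Events endpoint built on Node's built-in http module. It assumes the same agent and chatStream API used in the steps above; the port, route handling, and q query parameter are arbitrary choices for illustration:

import { createServer } from "node:http";

const server = createServer(async (req, res) => {
  // Read the prompt from a ?q= query parameter (illustrative only).
  const url = new URL(req.url ?? "/", "http://localhost");
  const message = url.searchParams.get("q") ?? "Say hello.";

  // Standard SSE headers so the client keeps the connection open.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Reuse the streaming agent from the tutorial; each chunk becomes one SSE event.
  const stream = await agent.chatStream({ message });
  for await (const chunk of stream) {
    res.write(`data: ${JSON.stringify(chunk.response ?? "")}\n\n`);
  }

  // Tell the client the stream is finished, then close the response.
  res.write("data: [DONE]\n\n");
  res.end();
});

server.listen(3000);

On the client side, an EventSource subscribed to this endpoint can append each event's data to component state, which is essentially the third bullet: the same stream, rendered live instead of written to stdout.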

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
