LlamaIndex Tutorial (TypeScript): streaming agent responses for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a TypeScript LlamaIndex agent that streams partial responses back to the caller instead of waiting for the full answer. You need this when your assistant is talking to users in a chat UI, where showing tokens as they arrive feels responsive and lets you surface long-running reasoning or tool calls.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or tsx
  • @llamaindex/core
  • @llamaindex/openai
  • An OpenAI API key in OPENAI_API_KEY
  • Basic familiarity with async iterators and for await...of (a quick refresher follows this list)
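
If for await...of is unfamiliar, here is the whole pattern in miniature; the agent stream you consume later in this tutorial behaves the same way, yielding values one at a time as they become ready:

async function* numbers() {
  // An async generator is the simplest way to produce an async iterable.
  for (const n of [1, 2, 3]) {
    yield n;
  }
}

for await (const n of numbers()) {
  console.log(n); // prints 1, then 2, then 3
}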

Install the packages:

npm install @llamaindex/core @llamaindex/openai
npm install -D typescript tsx @types/node

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start by creating a minimal LlamaIndex runtime with an OpenAI LLM. The important part for streaming is using a model that supports token streaming and wiring it into an agent workflow.
import { Settings } from "@llamaindex/core";
import { OpenAI } from "@llamaindex/openai";

// Register the LLM globally so the agent created later picks it up.
Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

console.log("LLM configured");
  2. Define one or more tools the agent can call. Streaming is most useful when the model reasons through tool output and then emits a response incrementally. A quick way to sanity-check the tool on its own follows the code block.
import { FunctionTool } from "@llamaindex/core/tools";

// Tools pair an async implementation with a JSON Schema the model uses to call them.
const getPolicyStatus = FunctionTool.from(
  async ({ policyId }: { policyId: string }) => {
    return `Policy ${policyId} is active and next renewal is 2026-01-15.`;
  },
  {
    name: "get_policy_status",
    description: "Fetch the status of an insurance policy by ID.",
    parameters: {
      type: "object",
      properties: {
        policyId: { type: "string" },
      },
      required: ["policyId"],
    },
  }
);

console.log(getPolicyStatus.metadata.name);
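
Before wiring the tool into an agent, it helps to sanity-check it on its own. A minimal sketch, assuming FunctionTool instances expose a call method as in recent llamaindex releases (if yours does not, invoking the wrapped function by hand works just as well):

// Sanity check: invoke the tool outside of any agent loop.
const direct = await getPolicyStatus.call({ policyId: "POL-1234" });
console.log(direct);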
  3. Create an agent runner and ask it to stream the response. In LlamaIndex TS, you consume streamed output with an async iterator, which makes it easy to push chunks straight into a terminal, WebSocket, or HTTP response.
import { ReActAgent } from "@llamaindex/core/agent";

// The agent decides when to call the tool, then streams its final answer.
const agent = new ReActAgent({
  tools: [getPolicyStatus],
});

async function main() {
  const stream = await agent.chat({
    message: "Check policy POL-1234 and explain its status clearly.",
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.delta ?? "");
  }

  process.stdout.write("\n");
}

main();
  4. If you want cleaner integration in a real app, wrap the stream in a helper that returns both the final text and each emitted chunk. This pattern keeps your UI code separate from your agent logic.
async function collectStream(stream: AsyncIterable<{ delta?: string }>) {
  let fullText = "";

  for await (const chunk of stream) {
    const delta = chunk.delta ?? "";
    fullText += delta;
    process.stdout.write(delta);
  }

  return fullText;
}

async function run() {
  const stream = await agent.chat({
    message: "Summarize policy POL-1234 in one paragraph.",
    stream: true,
  });

  const finalText = await collectStream(stream);
  console.log("\n---");
  console.log(finalText);
}

run();
  5. If you are building for a web backend, send each chunk as it arrives instead of buffering everything first. The same async iterator works for Server-Sent Events, WebSockets, or chunked HTTP responses.
import http from "node:http";

http
  .createServer(async (_req, res) => {
    res.writeHead(200, {
      "Content-Type": "text/plain; charset=utf-8",
      "Transfer-Encoding": "chunked",
    });

    try {
      const stream = await agent.chat({
        message: "Explain the policy status for POL-1234.",
        stream: true,
      });

      for await (const chunk of stream) {
        res.write(chunk.delta ?? "");
      }
    } catch (err) {
      // Surface errors in-band; an unhandled rejection here could crash the process.
      res.write(`\n[stream error] ${String(err)}`);
    }

    res.end();
  })
  .listen(3000);

console.log("Server running on http://localhost:3000");

Testing It

Run the script with npx tsx your-file.ts and watch the output appear incrementally instead of all at once. If streaming is working, you should see text arrive in pieces while the model is still generating. If you added a tool, confirm that the final answer reflects the tool output rather than hallucinated policy details. For web apps, hit the endpoint with curl, as shown below, and verify that bytes are flushed before the request completes.
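
For the HTTP server in step 5, curl's -N (--no-buffer) flag disables client-side output buffering, so each chunk prints the moment it arrives:

curl -N http://localhost:3000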

Next Steps

  • Add multiple tools and test how streamed responses behave after tool calls
  • Switch from plain text streaming to SSE so frontend clients can parse events reliably (see the sketch after this list)
  • Add conversation memory so streamed answers stay grounded across turns
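
As a starting point for the SSE item above, here is a minimal sketch. It reuses the agent from step 3 and the same chunk.delta shape shown earlier; JSON-encoding each delta keeps newlines inside tokens from breaking the data: framing.

import http from "node:http";

http
  .createServer(async (_req, res) => {
    // SSE is a long-lived response with the text/event-stream content type.
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });

    const stream = await agent.chat({
      message: "Explain the policy status for POL-1234.",
      stream: true,
    });

    for await (const chunk of stream) {
      // Each SSE message is a data: line followed by a blank line.
      res.write(`data: ${JSON.stringify({ delta: chunk.delta ?? "" })}\n\n`);
    }

    // A sentinel event tells the client it can close its EventSource.
    res.write("data: [DONE]\n\n");
    res.end();
  })
  .listen(3001);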

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

