LlamaIndex Tutorial (TypeScript): streaming agent responses for advanced developers
This tutorial shows you how to build a TypeScript LlamaIndex agent that streams partial responses token-by-token instead of waiting for the full answer. You need this when you’re wiring an assistant into a UI, a chat service, or any workflow where latency and user feedback matter.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project with `"type": "module"` in `package.json`
- An OpenAI API key in `OPENAI_API_KEY`
- Packages:
  - `llamaindex`
  - `dotenv`
  - `typescript`
  - `tsx` or another TypeScript runtime for local execution
- A terminal that can run async Node scripts
Step-by-Step
- Install the dependencies and set up your environment. Use the official LlamaIndex TypeScript package and load your API key from `.env` so you don't hardcode secrets.

```shell
npm init -y
npm install llamaindex dotenv
npm install -D typescript tsx @types/node
```
- Create your environment file and make sure Node can read it. This is the minimum setup needed for OpenAI-backed streaming with LlamaIndex.

```shell
cat > .env << 'EOF'
OPENAI_API_KEY=your_openai_api_key_here
EOF
```
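Before wiring up the agent, it helps to fail fast when the key is missing rather than erroring deep inside an API call. A minimal sketch of such a guard (the `requireEnv` helper is illustrative, not part of LlamaIndex):

```typescript
// Returns the named environment variable, or throws a clear error if
// it is missing or empty. Call this once at startup.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage at the top of your script, after `import "dotenv/config"`:
//   const apiKey = requireEnv("OPENAI_API_KEY");
```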
- Create a small streaming agent script. The important part is consuming the agent's output as an async stream, handling each chunk as it arrives. Note that the streaming entry point has shifted across LlamaIndex.TS releases (older versions use `chat({ message, stream: true })`), so check the docs for your installed version if the method or chunk shape differs.

```typescript
import "dotenv/config";
import { FunctionTool, OpenAIAgent } from "llamaindex";

const cityNoteTool = FunctionTool.from(
  async ({ city }: { city: string }) => {
    return `I don't have live weather, but ${city} is a good place to test tool calls.`;
  },
  {
    name: "city_note",
    description: "Returns a short note about the requested city.",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  }
);

const agent = new OpenAIAgent({
  tools: [cityNoteTool],
});

const stream = await agent.run({
  message:
    "Stream a concise response about why Tokyo is useful for testing agent streaming.",
});
```
- Consume the stream token by token. This is where streaming becomes useful in production: your client can render partial output immediately instead of waiting for completion.

```typescript
for await (const chunk of stream) {
  if (chunk.delta) {
    process.stdout.write(chunk.delta);
  }
}
process.stdout.write("\n");
```
- Add a second example that includes tool use and explicit streaming handling. In practice, you want to verify that plain generation and tool-assisted generation behave the same way from the transport layer.

```typescript
import "dotenv/config";
import { FunctionTool, OpenAIAgent } from "llamaindex";

async function main() {
  const lookupTool = FunctionTool.from(
    async ({ accountId }: { accountId: string }) => {
      return `Account ${accountId} is active and eligible for review.`;
    },
    {
      name: "lookup_account",
      description: "Looks up a mock bank account record.",
      parameters: {
        type: "object",
        properties: { accountId: { type: "string" } },
        required: ["accountId"],
      },
    }
  );

  const agent = new OpenAIAgent({ tools: [lookupTool] });

  const stream = await agent.run({
    message: "Check account A-1029 and explain the result in one paragraph.",
  });

  for await (const chunk of stream) {
    if (chunk.delta) process.stdout.write(chunk.delta);
  }
  process.stdout.write("\n");
}

main().catch(console.error);
```
- Run the script with a TypeScript runtime. If you want to keep this in a real app, wrap the same pattern behind an HTTP response stream or WebSocket handler.

```shell
npx tsx stream-agent.ts
```
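If you forward the stream over HTTP, Server-Sent Events is the simplest transport, and the framing is easy to implement yourself. A minimal frame encoder sketch (the `toSSE` helper is illustrative, not a LlamaIndex API):

```typescript
// Encodes one chunk of model output as a Server-Sent Events frame.
// An SSE frame is one or more `data:` lines terminated by a blank line;
// multi-line payloads need one `data:` line per line of text.
function toSSE(delta: string): string {
  return (
    delta
      .split("\n")
      .map((line) => `data: ${line}`)
      .join("\n") + "\n\n"
  );
}
```

In an Express or `node:http` handler you would set `Content-Type: text/event-stream` and call `res.write(toSSE(chunk.delta))` inside the `for await` loop.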
Testing It
Run the script and watch stdout fill incrementally instead of printing one final block at the end. If you see chunks arriving one by one, streaming is working.
Then test a prompt that triggers tool usage, like asking for an account lookup or city note. You should still see partial output during generation, even when the agent calls a tool behind the scenes.
If nothing prints until the end, check three things first:

- `OPENAI_API_KEY` is loaded correctly
- You are using `for await...of` on the returned stream
- Your model access is valid in your OpenAI account
For production validation, add timestamps around each emitted chunk so you can measure first-token latency and total completion time.
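The timestamp idea above can be sketched as a small wrapper around any async iterable of text chunks (the `measureStream` name and return shape are illustrative):

```typescript
// Wraps an async iterable of text chunks, forwarding each chunk to a
// callback while recording first-token latency and total completion time.
async function measureStream(
  chunks: AsyncIterable<string>,
  onChunk: (text: string) => void
): Promise<{ firstTokenMs: number; totalMs: number }> {
  const start = Date.now();
  let firstTokenMs = -1;
  for await (const text of chunks) {
    if (firstTokenMs < 0) firstTokenMs = Date.now() - start;
    onChunk(text);
  }
  return { firstTokenMs, totalMs: Date.now() - start };
}
```

To use it with the agent, pass the stream's deltas through it and log the returned timings after each run; a consistently large gap between `firstTokenMs` and `totalMs` with no intermediate chunks means you are buffering somewhere.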
Next Steps
- Wire this stream into an Express or Fastify route using Server-Sent Events.
- Add structured tool outputs so downstream systems can parse results without regex.
- Combine streaming with retrieval so your agent can answer from internal documents while still emitting tokens early.
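The structured-output idea can be sketched as a tool handler that returns JSON instead of free text, so consumers call `JSON.parse` rather than scraping prose (the record shape and helper name below are assumptions for illustration, not a LlamaIndex convention):

```typescript
// A machine-parseable result type for the account lookup tool.
interface AccountStatus {
  accountId: string;
  active: boolean;
  eligibleForReview: boolean;
}

// Tool handler that returns a JSON string instead of a sentence, so
// downstream systems can parse fields directly.
async function lookupAccountStructured({
  accountId,
}: {
  accountId: string;
}): Promise<string> {
  const record: AccountStatus = {
    accountId,
    active: true, // mock data; replace with a real account lookup
    eligibleForReview: true,
  };
  return JSON.stringify(record);
}
```

You can drop this handler into `FunctionTool.from` in place of the string-returning one from the earlier example.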
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.