LlamaIndex Tutorial (TypeScript): streaming agent responses for beginners
This tutorial shows you how to build a TypeScript LlamaIndex agent that streams its response token-by-token instead of waiting for the full answer. You need this when you want a chat UI, CLI, or API endpoint to feel responsive while the model is still generating output.
What You'll Need
- Node.js 18+ installed
- A TypeScript project initialized with a `tsconfig.json`
- An OpenAI API key set as an environment variable: `OPENAI_API_KEY=...`
- These packages:
  - `llamaindex`
  - `dotenv`
  - `typescript`
  - `tsx` for running TypeScript directly during development
Install them like this (note that `@llamaindex/openai` is needed too, since the code below imports from it):

```shell
npm install llamaindex @llamaindex/openai dotenv
npm install -D typescript tsx @types/node
```
Step-by-Step
- Create a small TypeScript entry file and load your environment variables first. Keep this simple: one file, one agent, one streamed chat loop.

```typescript
import "dotenv/config";
import { openai } from "@llamaindex/openai";
import { Agent, ChatMemoryBuffer, Settings } from "llamaindex";

// Configure the default LLM for the whole app.
Settings.llm = openai({
  model: "gpt-4o-mini",
});
```
- Define the agent with memory so it can keep context across turns. For beginners, memory matters because streaming only changes how output arrives, not how the conversation is stored.

```typescript
const memory = new ChatMemoryBuffer({
  tokenLimit: 4000,
});

const agent = new Agent({
  llm: Settings.llm,
  memory,
});
```
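To build intuition for what `tokenLimit` does, here is an illustrative sketch of a token-limited buffer. This is not `ChatMemoryBuffer`'s real implementation; the class name, the trimming loop, and the rough 4-characters-per-token estimate are all assumptions for demonstration.

```typescript
// Illustrative sketch only: NOT ChatMemoryBuffer's real internals, just the
// idea behind a token-limited conversation buffer. The 4-characters-per-token
// estimate is a crude assumption for demonstration.
type ChatMessage = { role: "user" | "assistant"; content: string };

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

class TinyMemory {
  private messages: ChatMessage[] = [];
  constructor(private tokenLimit: number) {}

  put(message: ChatMessage): void {
    this.messages.push(message);
    // Drop the oldest turns once the estimated total exceeds the limit.
    while (
      this.messages.reduce((sum, m) => sum + estimateTokens(m.content), 0) >
        this.tokenLimit &&
      this.messages.length > 1
    ) {
      this.messages.shift();
    }
  }

  getAll(): ChatMessage[] {
    return this.messages;
  }
}

const mem = new TinyMemory(10); // tiny limit so trimming is visible
mem.put({ role: "user", content: "first message, fairly long text" }); // ~8 tokens
mem.put({ role: "assistant", content: "second reply, also long" }); // ~6 tokens
console.log(mem.getAll().length); // → 1 (the oldest turn was trimmed)
```

The real buffer counts tokens with a proper tokenizer, but the trade-off is the same: a bigger limit keeps more context at the cost of a larger prompt per turn.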
- Call the streaming API and consume chunks as they arrive. The key method is `chatStream`, which returns an async iterator you can read with `for await`.

```typescript
async function main() {
  const stream = await agent.chatStream({
    message: "Explain what streaming responses are in one short paragraph.",
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.response ?? "");
  }
  process.stdout.write("\n");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
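If you want to see the consumption pattern without spending API calls, you can mock the stream with an async generator. The chunk shape below simply mirrors the `response` field used above; it is an assumption for illustration, not the library's real chunk type.

```typescript
// A minimal mock of a streamed response, so the for-await pattern can be
// seen without an API key. The real chatStream yields richer chunk objects.
type Chunk = { response?: string };

async function* mockChatStream(): AsyncGenerator<Chunk> {
  const pieces = ["Streaming ", "sends ", "tokens ", "as ", "they ", "arrive."];
  for (const piece of pieces) {
    // Simulate network latency between tokens.
    await new Promise((resolve) => setTimeout(resolve, 10));
    yield { response: piece };
  }
}

async function demo(): Promise<string> {
  let full = "";
  for await (const chunk of mockChatStream()) {
    full += chunk.response ?? "";
    process.stdout.write(chunk.response ?? "");
  }
  process.stdout.write("\n");
  return full;
}

demo(); // prints "Streaming sends tokens as they arrive." piece by piece
```

Because the generator pauses between yields, you can watch the text appear gradually, exactly the effect you should see with the real agent.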
- Add a reusable helper if you want multiple prompts in one session. This keeps the same agent instance alive, so memory and streaming both work across turns.

```typescript
async function ask(message: string) {
  const stream = await agent.chatStream({ message });
  let fullText = "";
  for await (const chunk of stream) {
    const text = chunk.response ?? "";
    fullText += text;
    process.stdout.write(text);
  }
  process.stdout.write("\n");
  return fullText;
}
```
- Put it together and run two prompts back-to-back. This demonstrates that the agent streams output and remembers prior messages. Replace the earlier `main` with this version so the file has only one entry point.

```typescript
async function main() {
  await ask("You are helping a beginner understand streaming.");
  await ask("Now explain why this is useful in a chat app.");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
Testing It
Run the file with `tsx`:

```shell
npx tsx src/index.ts
```
If everything is wired correctly, you should see text appear gradually instead of all at once. The first prompt should stream normally, and the second prompt should reference the earlier context because memory is enabled.
If you get an authentication error, check that `OPENAI_API_KEY` is exported in your shell or loaded from `.env`. If you get import errors, confirm that your installed `llamaindex` version matches the API used above.
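One way to catch the authentication problem early is to validate the environment before constructing the agent. `requireEnv` is a hypothetical helper name, not part of `llamaindex`; it just fails fast with a clear message instead of a confusing error deep inside the first API call.

```typescript
// requireEnv is our own helper (an assumption, not a library function):
// it throws immediately if a required variable is missing.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(
      `Missing environment variable ${name}. Export it or add it to .env.`
    );
  }
  return value;
}

// Call this once at startup, before constructing the agent:
// requireEnv("OPENAI_API_KEY");
```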
Next Steps
- Add tool calling so the agent can fetch data from internal APIs while still streaming responses.
- Wrap the streamed output in an HTTP endpoint using Server-Sent Events for a web UI.
- Swap `process.stdout.write` for a frontend state update so tokens render live in React or Next.js.
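The Server-Sent Events idea above can be sketched in a few lines. Each chunk becomes a `data:` line terminated by a blank line, and multi-line chunks need one `data:` line per line of text. The function name `toSSEFrame` is our own, not from any library.

```typescript
// Format a streamed text chunk as a Server-Sent Events frame.
// Per the SSE format, each line of payload gets its own "data:" prefix,
// and a blank line terminates the event.
function toSSEFrame(text: string): string {
  return (
    text
      .split("\n")
      .map((line) => `data: ${line}`)
      .join("\n") + "\n\n"
  );
}

// In an HTTP handler you would write each streamed chunk like:
//   for await (const chunk of stream) res.write(toSSEFrame(chunk.response ?? ""));
console.log(JSON.stringify(toSSEFrame("Hello"))); // → "data: Hello\n\n"
```

On the browser side, an `EventSource` (or a `fetch` reader) receives these frames and appends each payload to the UI as it arrives.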
Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.