LlamaIndex Tutorial (Python): streaming agent responses for beginners
This tutorial shows you how to build a Python LlamaIndex agent that streams its response token-by-token instead of waiting for the full answer. You need this when you want a chat UI, terminal app, or API endpoint to feel responsive while the model is still generating output.
What You'll Need
- Python 3.10+
- An OpenAI API key set as `OPENAI_API_KEY`
- `llama-index` installed
- `python-dotenv` if you want to load environment variables from a `.env` file
- Basic familiarity with LlamaIndex agents and `QueryEngineTool`
Install the packages:
```bash
pip install llama-index python-dotenv
```
Step-by-Step
1. Start by setting your OpenAI key in the environment. LlamaIndex will read it automatically through the OpenAI integration.

```python
import os

os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
```
2. Create a simple tool the agent can call. For beginners, a local `FunctionTool` is easier than wiring up a full vector index.
```python
from llama_index.core.tools import FunctionTool

def get_policy_status(policy_id: str) -> str:
    return f"Policy {policy_id} is active and next renewal is 2026-01-15."

policy_tool = FunctionTool.from_defaults(
    fn=get_policy_status,
    name="get_policy_status",
    description="Look up the status of an insurance policy by policy ID.",
)
```
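Before wiring the tool into an agent, you can call it directly to confirm the wrapper works. This is a quick sketch; in recent LlamaIndex versions `FunctionTool.call()` returns a `ToolOutput` whose `content` field holds the string result, so check your installed version if the attribute differs.

```python
# Quick sanity check: invoke the tool directly, with no LLM or agent involved.
result = policy_tool.call(policy_id="12345")
print(result.content)  # expected: "Policy 12345 is active and next renewal is 2026-01-15."
```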
3. Build an agent with that tool. With `FunctionAgent` you do not need a special constructor flag for streaming: the handler returned by `agent.run()` emits events as the model generates, and you will consume them in the next step. Pass an LLM explicitly (for example `gpt-4o-mini`) or configure one globally via `Settings.llm`.

```python
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

agent = FunctionAgent(
    tools=[policy_tool],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="You are an insurance support assistant. Use tools when needed.",
)
```
4. Run a streaming chat request and print each chunk as it arrives. This is the core pattern you will use in a CLI, WebSocket, or server-sent events handler.

```python
import asyncio

async def main():
    handler = agent.run("Check policy 12345 and tell me the status.")
    async for chunk in handler.stream_events():
        if hasattr(chunk, "delta") and chunk.delta:
            print(chunk.delta, end="", flush=True)

asyncio.run(main())
```
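The `hasattr` check above works, but if you also want to surface tool activity you can match on the event classes the workflow emits. The sketch below assumes the `AgentStream` and `ToolCallResult` events exported by recent `llama_index.core.agent.workflow` releases; names may differ in older versions.

```python
import asyncio

from llama_index.core.agent.workflow import AgentStream, ToolCallResult

async def main():
    handler = agent.run("Check policy 12345 and tell me the status.")
    async for event in handler.stream_events():
        if isinstance(event, AgentStream):
            # Token-by-token text from the LLM.
            print(event.delta, end="", flush=True)
        elif isinstance(event, ToolCallResult):
            # Emitted once a tool call finishes; useful for logging and debugging.
            print(f"\n[tool {event.tool_name} -> {event.tool_output}]")

asyncio.run(main())
```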
5. If you want cleaner terminal output, collect the streamed text into a buffer while still printing live updates. This makes it easier to log the final response after streaming finishes.

```python
import asyncio

async def main():
    handler = agent.run("Check policy 12345 and tell me the status.")
    full_text = []
    async for chunk in handler.stream_events():
        if hasattr(chunk, "delta") and chunk.delta:
            print(chunk.delta, end="", flush=True)
            full_text.append(chunk.delta)
    print("\n\nFinal response:")
    print("".join(full_text))

asyncio.run(main())
```
Testing It
Run the script from your terminal and watch for text to appear immediately instead of all at once at the end. If streaming is working, you should see partial output printed before the full answer completes.
Try changing the prompt so the agent clearly needs the tool, such as asking for a specific policy ID. If the tool call happens correctly, you should see the final answer include the mocked policy status.
If nothing streams, check three things: your OpenAI key is set, your installed LlamaIndex version matches the imports above, and your event loop is running correctly. In practice, most failures come from version mismatches or missing API credentials.
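A small diagnostic script can rule out the first two causes quickly. The package names below are what the `llama-index` meta package typically installs, so adjust them if your environment differs:

```python
# Check the two most common causes of "nothing streams": missing key, wrong versions.
import os
from importlib.metadata import PackageNotFoundError, version

print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))
for pkg in ("llama-index", "llama-index-core", "llama-index-llms-openai"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```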
Next Steps
- Replace the mock function with a real internal API call for policy lookup or claims status.
- Add a `QueryEngineTool` backed by a vector index so the agent can stream answers from documents.
- Wire this pattern into FastAPI with Server-Sent Events for browser-based streaming responses; see the sketch below.
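For the FastAPI route, here is a minimal sketch of what a server-sent events endpoint could look like. It assumes the `agent` built in this tutorial is importable from your own module (shown here as a hypothetical `my_agent`), and that you install `fastapi` and `uvicorn` separately:

```python
# Hedged sketch: stream agent output to the browser via Server-Sent Events.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from my_agent import agent  # hypothetical module exposing the agent from this tutorial

app = FastAPI()

@app.get("/chat")
async def chat(q: str):
    async def event_stream():
        handler = agent.run(q)
        async for chunk in handler.stream_events():
            if hasattr(chunk, "delta") and chunk.delta:
                # Each SSE frame is a "data:" line followed by a blank line.
                yield f"data: {chunk.delta}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```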
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit