LlamaIndex Tutorial (Python): streaming agent responses for intermediate developers
This tutorial shows you how to build a LlamaIndex agent in Python that streams partial responses back to the caller instead of waiting for the full answer. You need this when your app has to feel responsive, show token-by-token progress, or forward agent output into a UI, webhook, or SSE endpoint.
What You'll Need
- Python 3.10+
- llama-index installed
- An OpenAI API key set as OPENAI_API_KEY
- A terminal and a virtual environment
- Basic familiarity with LlamaIndex agents and tools
Install the package:

```shell
pip install llama-index
```

Set your API key:

```shell
export OPENAI_API_KEY="your-key-here"
```
Step-by-Step
- Start with a simple tool the agent can call.

Streaming only matters if the agent has something useful to do, so we'll give it a small calculator tool first.

```python
from llama_index.core.tools import FunctionTool

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)
```
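Before handing the function to the agent, it's worth a quick sanity check that it behaves as expected; this is plain Python and needs no LlamaIndex imports:

```python
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

# The agent will call the tool with arguments it parses from the user's question.
print(multiply(17, 23))  # 391
```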
- Create the agent.

The workflow-based FunctionAgent streams by default: calling agent.run() returns a handler whose stream_events() iterator yields partial output as it is produced, instead of buffering everything until completion, so no special constructor flag is needed.

```python
import os

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])

agent = FunctionAgent(
    tools=[multiply_tool],
    llm=llm,
    system_prompt="You are a helpful assistant. Use tools when needed.",
)
```
- Run the agent and print tokens as they arrive.

In a real service, this loop is where you would push chunks into your frontend, SSE stream, or async queue.

```python
import asyncio

async def main():
    handler = agent.run("What is 17 times 23? Explain briefly.")
    # Token deltas arrive as events carrying a non-empty `delta` attribute.
    async for event in handler.stream_events():
        if hasattr(event, "delta") and event.delta:
            print(event.delta, end="", flush=True)
    # Awaiting the handler yields the final response object.
    result = await handler
    print("\n\nFinal response:")
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
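If you want to exercise this consumption loop without spending tokens, the same pattern can be prototyped against a fake event stream. The FakeEvent, fake_stream, and consume names below are illustrative stand-ins, not LlamaIndex APIs:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class FakeEvent:
    """Stand-in for a streamed workflow event with a token delta."""
    delta: str = ""

async def fake_stream():
    # Stand-in for handler.stream_events(): yields token deltas one at a time.
    for token in ["17 ", "times ", "23 ", "is ", "391."]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield FakeEvent(delta=token)

async def consume() -> str:
    chunks = []
    async for event in fake_stream():
        # Same filter as the real loop: only act on events with a delta.
        if hasattr(event, "delta") and event.delta:
            chunks.append(event.delta)
    return "".join(chunks)

print(asyncio.run(consume()))  # 17 times 23 is 391.
```

Swapping fake_stream() for handler.stream_events() is the only change needed to go from this prototype to the real agent.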
- Handle streamed output in a way that works in production.

The pattern below separates raw streaming from final result handling, which makes it easier to log, trace, and retry without mixing concerns.

```python
import asyncio

async def stream_agent_answer(prompt: str) -> tuple[str, str]:
    handler = agent.run(prompt)
    chunks = []
    async for event in handler.stream_events():
        delta = getattr(event, "delta", None)
        if delta:
            chunks.append(delta)
            print(delta, end="", flush=True)
    result = await handler
    return "".join(chunks), str(result)

if __name__ == "__main__":
    text, final = asyncio.run(stream_agent_answer("Multiply 12 by 8 and keep it short."))
    print("\n---")
    print("Streamed text:", text)
    print("Final text:", final)
```
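Because the helper returns cleanly, a thin retry layer can sit above it without touching the streaming loop. The sketch below is generic asyncio code; the with_retries name, the catch-all exception policy, and the backoff values are assumptions you should adapt to the errors your deployment actually sees:

```python
import asyncio

async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 0.5):
    """Run an async callable, retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)

# Hypothetical usage with the helper defined above:
# text, final = await with_retries(lambda: stream_agent_answer("Multiply 12 by 8."))
```

Note that retrying restarts the whole stream, so your frontend should be prepared to discard partial output from a failed attempt.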
- If you want to stream from an API endpoint later, keep the same core pattern.

Your web layer should just forward chunks as they arrive; don't rebuild streaming logic inside the route handler.

```python
async def get_streamed_response(prompt: str):
    handler = agent.run(prompt)
    async for event in handler.stream_events():
        delta = getattr(event, "delta", None)
        if delta:
            yield delta

# Example usage in an async context:
# async for chunk in get_streamed_response("What is 9 times 11?"):
#     send_to_client(chunk)
```
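If those chunks ultimately feed a Server-Sent Events endpoint, each delta needs to be wrapped in the SSE wire format: one or more `data:` lines terminated by a blank line. A minimal encoder (a sketch; real frameworks like Starlette's EventSourceResponse handle this for you):

```python
def to_sse(delta: str) -> str:
    """Encode one chunk in Server-Sent Events wire format.

    Multi-line payloads need one `data:` line per line of text,
    and every event ends with a blank line.
    """
    lines = delta.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"

print(repr(to_sse("Hello")))  # 'data: Hello\n\n'
```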
Testing It
Run the script from your terminal and watch the response appear incrementally instead of all at once. If streaming is working, you should see partial text printed before the final response object is returned.
Test both a direct-answer prompt and a tool-using prompt like "What is 17 times 23?" so you can confirm the agent can still call tools while streaming output. If nothing streams until the end, check that you are iterating handler.stream_events() rather than only awaiting the handler, and that your OpenAI API key is valid.
For debugging, log each event type before filtering on delta. That makes it easier to see whether you’re receiving model tokens, tool events, or other internal workflow events.
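That logging can be as simple as printing the event's class name before the delta filter. The classes below are stand-ins for illustration; in a real run you would see LlamaIndex workflow event types, including tool-call events alongside token events:

```python
class ToolCallEvent:
    """Stand-in for a tool invocation event (carries no token delta)."""

class TokenEvent:
    """Stand-in for a token event carrying a text delta."""
    def __init__(self, delta: str):
        self.delta = delta

events = [ToolCallEvent(), TokenEvent("391")]
for event in events:
    # Log the concrete event type before filtering on `delta`, so you can
    # see which kinds of events the stream actually delivers.
    print(type(event).__name__, repr(getattr(event, "delta", None)))
```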
Next Steps
- Wire the same streaming loop into FastAPI or Starlette using Server-Sent Events.
- Add more tools with FunctionTool and test how tool calls behave during streamed runs.
- Learn how to persist conversation state so streamed agents can continue multi-turn workflows.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.