LangChain Tutorial (Python): streaming agent responses for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to stream agent responses token by token in LangChain using Python, so you can build chat UIs that feel responsive instead of waiting for the full answer. You need this when your agent calls tools, reasons over multiple steps, or sits behind a WebSocket where partial output is better than blocking until the final message is ready.

What You'll Need

  • Python 3.10+
  • An OpenAI API key set as OPENAI_API_KEY
  • LangChain packages:
    • langchain
    • langchain-openai
    • langchain-core
  • A terminal and a virtual environment
  • Basic familiarity with LangChain agents and tools

Install the packages:

pip install langchain langchain-openai langchain-core

Step-by-Step

  1. Start by creating a model that supports streaming. In LangChain, streaming only matters if your model client is configured to emit chunks as they arrive.
import os
from langchain_openai import ChatOpenAI

if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("Set OPENAI_API_KEY in your environment")

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    streaming=True,
)
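Before wiring up an agent, it is worth confirming that the client actually emits chunks. This is a minimal sanity check using the standard Runnable .stream() interface; the prompt text is just an example.

for chunk in llm.stream("Say hello in exactly five words."):
    # Each chunk is an AIMessageChunk; some chunks carry empty content
    print(chunk.content, end="", flush=True)
print()

If this prints all at once rather than incrementally, fix the model configuration before touching the agent code.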
  2. Define at least one tool. A streaming agent is only useful if it can do real work while it talks, so use a simple deterministic tool first.
from langchain_core.tools import tool

@tool
def get_policy_status(policy_id: str) -> str:
    """Return a mock policy status for a given policy ID."""
    lookup = {
        "POL123": "Active",
        "POL456": "Pending renewal",
        "POL789": "Lapsed",
    }
    return lookup.get(policy_id.upper(), "Policy not found")
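Invoke the tool directly before handing it to the agent, so that failures later clearly come from the agent wiring rather than the tool itself. Tools created with @tool expose the same Runnable interface as everything else:

# Direct invocation, bypassing the agent entirely
print(get_policy_status.invoke({"policy_id": "POL123"}))  # Active
print(get_policy_status.invoke({"policy_id": "POL000"}))  # Policy not found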
  3. Build the agent with a prompt that tells it how to use the tool. This example uses the modern LangChain agent stack with create_tool_calling_agent and AgentExecutor. Note that on some LangChain versions, hub.pull requires the separate langchainhub package (pip install langchainhub).
from langchain import hub
from langchain.agents import AgentExecutor, create_tool_calling_agent

prompt = hub.pull("hwchase17/openai-tools-agent")
tools = [get_policy_status]

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=False,
)
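A plain blocking call is a useful checkpoint before moving to streaming: it confirms the agent, prompt, and tool are wired correctly. AgentExecutor returns a dict whose "output" key holds the final answer.

# Blocking sanity check: no streaming, just the final result
result = agent_executor.invoke({"input": "What is the status of policy POL789?"})
print(result["output"])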
  4. Stream the response events and print tokens as they arrive. For advanced developers, this is the part that matters: you can separate model tokens from tool events and wire them into a UI or WebSocket.
import asyncio

async def main():
    user_input = "Check policy POL123 and explain what it means for renewal."
    async for event in agent_executor.astream_events(
        {"input": user_input},
        version="v2",
    ):
        event_type = event["event"]

        if event_type == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if getattr(chunk, "content", None):
                print(chunk.content, end="", flush=True)

        if event_type == "on_tool_start":
            print("\n\n[tool started]", flush=True)

        if event_type == "on_tool_end":
            print(f"\n[tool result] {event['data']['output']}\n", flush=True)

asyncio.run(main())
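In a larger app you will receive events from every runnable in the chain, not just the top-level model. astream_events accepts include_types, include_names, and include_tags filters so you only receive the events you plan to handle. A minimal sketch, reusing the agent_executor from above:

import asyncio

async def filtered_stream():
    # Only chat-model events arrive; tool and chain events are dropped at the source
    async for event in agent_executor.astream_events(
        {"input": "Check policy POL456."},
        version="v2",
        include_types=["chat_model"],
    ):
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if getattr(chunk, "content", None):
                print(chunk.content, end="", flush=True)

asyncio.run(filtered_stream())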
  5. If you want cleaner production output, capture streamed text into a buffer instead of printing directly. That makes it easier to send partial messages to clients while still storing the final answer for logs or audits.
import asyncio

async def collect_stream():
    chunks = []
    async for event in agent_executor.astream_events(
        {"input": "What is the status of policy POL456?"},
        version="v2",
    ):
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if getattr(chunk, "content", None):
                chunks.append(chunk.content)

    final_text = "".join(chunks)
    print(final_text)

asyncio.run(collect_stream())
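The same pattern generalizes to an async generator you can hand to a transport layer: yield each token to the client as it arrives and keep the joined text for audits. stream_answer below is a hypothetical helper name, reusing the agent_executor defined earlier.

from typing import AsyncIterator

async def stream_answer(question: str) -> AsyncIterator[str]:
    chunks = []
    async for event in agent_executor.astream_events(
        {"input": question},
        version="v2",
    ):
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if getattr(chunk, "content", None):
                chunks.append(chunk.content)
                yield chunk.content  # partial message for the client
    final_text = "".join(chunks)  # final answer for logs or audits
    print(f"[audit] {final_text}")

A caller consumes it with async for token in stream_answer("..."), forwarding each token to its client.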

Testing It

Run the script from your terminal and watch for two things: immediate token output and tool lifecycle events. If everything is wired correctly, you should see the tool start before the final answer completes, then streamed text continuing after the tool returns.

If you get no streaming output, check that streaming=True is set on ChatOpenAI and that you are using astream_events, not plain invoke. If the model errors on tool calling, confirm your installed LangChain versions are current and that your OpenAI key has access to the selected model.
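To check which versions are actually installed, importlib.metadata works regardless of whether a package exposes __version__:

from importlib.metadata import version

for pkg in ("langchain", "langchain-core", "langchain-openai"):
    print(pkg, version(pkg))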

For a real app, test with a slow tool or an API call so you can verify that users see progress while the agent waits on external systems.
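An artificially slow tool is an easy way to simulate that locally. slow_lookup here is a made-up example, with asyncio.sleep standing in for the external call; add it to the tools list and rebuild the agent to try it.

import asyncio
from langchain_core.tools import tool

@tool
async def slow_lookup(policy_id: str) -> str:
    """Simulate a slow external system for a given policy ID."""
    await asyncio.sleep(3)  # stands in for a network or database call
    return f"{policy_id}: Active (slow backend)"

With this in place you should see the [tool started] marker, a visible pause, and then streamed text resuming once the tool returns.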

Next Steps

  • Add multiple tools and route them through an agent executor with structured outputs
  • Wire astream_events into FastAPI WebSockets or Server-Sent Events (a minimal SSE sketch follows this list)
  • Add audit logging for streamed tokens, tool inputs, and tool outputs
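As a sketch of the second bullet, here is a minimal Server-Sent Events endpoint. It assumes the agent_executor built in this tutorial is importable in the same module; the /chat path and the [DONE] sentinel are arbitrary choices, not a LangChain or FastAPI convention.

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/chat")
async def chat(q: str):
    async def event_source():
        async for event in agent_executor.astream_events(
            {"input": q},
            version="v2",
        ):
            if event["event"] == "on_chat_model_stream":
                chunk = event["data"]["chunk"]
                if getattr(chunk, "content", None):
                    # SSE frames are "data: ..." lines followed by a blank line
                    yield f"data: {chunk.content}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_source(), media_type="text/event-stream")

Run it with uvicorn (for example, uvicorn app:app if the file is app.py) and consume it from the browser with EventSource.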

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

