LangChain Tutorial (Python): streaming agent responses for intermediate developers

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows you how to build a LangChain agent in Python that streams responses token-by-token instead of waiting for the full answer. You need this when you’re building chat UX, copilots, or support tools where users should see progress immediately rather than staring at a blank screen.

What You'll Need

  • Python 3.10+
  • An OpenAI API key
  • langchain
  • langchain-openai
  • openai
  • A terminal and a virtual environment
  • Basic familiarity with LangChain agents and chat models

Install the packages:

pip install langchain langchain-openai openai

Set your API key:

export OPENAI_API_KEY="your-api-key-here"
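Optionally, confirm the key is visible to Python before going further; a two-line check using only the standard library:

import os

if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; export it and re-run.")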

Step-by-Step

  1. Start with a simple streaming chat model. In LangChain, the simplest path is to call .stream() on the model and consume the chunks as they arrive. We also set streaming=True here, since the callback-based pattern in step 3 relies on the underlying OpenAI call streaming.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    streaming=True,
)

stream = llm.stream("Write one short sentence about streaming.")
for chunk in stream:
    if chunk.content:
        print(chunk.content, end="", flush=True)
print()
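The chunks yielded by .stream() are message chunks that support addition, so you can print tokens as they arrive and still reassemble the full message at the end. A minimal sketch of that pattern:

full = None
for chunk in llm.stream("Write one short sentence about streaming."):
    if chunk.content:
        print(chunk.content, end="", flush=True)
    # Message chunks support +, which concatenates their content.
    full = chunk if full is None else full + chunk
print()
print("Reassembled:", full.content)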
  2. Wrap the model in a tool-using agent. For a real agent, define at least one tool so the model has something to call before answering. Here we use a calculator-style tool built with @tool.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def add_numbers(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

tools = [add_numbers]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, streaming=True)
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=False)
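Before adding streaming, it's worth a quick non-streaming sanity check so you know the agent and tool work on their own:

result = executor.invoke({"input": "What is 19 + 23?"})
print(result["output"])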
  3. Stream the agent output through callbacks. The cleanest production pattern is to attach a custom callback handler and print tokens as they arrive. This gives you token-level updates while still letting the agent use tools.
from langchain_core.callbacks import BaseCallbackHandler

class StreamingPrinter(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)

result = executor.invoke(
    {"input": "What is 19 + 23? Explain briefly."},
    config={"callbacks": [StreamingPrinter()]},
)

print("\n\nFinal result:")
print(result["output"])
  4. If you want async streaming, use ainvoke with the same callback pattern. This is useful when your app already runs on asyncio, such as FastAPI or an async worker.
import asyncio
from langchain_core.callbacks import BaseCallbackHandler

class StreamingPrinter(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)

async def main():
    result = await executor.ainvoke(
        {"input": "Add 41 and 59, then say the total."},
        config={"callbacks": [StreamingPrinter()]},
    )
    print("\n\nFinal result:")
    print(result["output"])

asyncio.run(main())
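An async alternative to callbacks is the astream_events API, available on any LangChain runnable, including AgentExecutor. It yields typed events you can filter; a sketch that keeps only chat-model token events (event names follow langchain-core's v2 event schema, which requires a reasonably recent langchain-core):

async def main_events():
    async for event in executor.astream_events(
        {"input": "Add 41 and 59, then say the total."},
        version="v2",
    ):
        # Token chunks from the underlying chat model arrive as this event type.
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if chunk.content:
                print(chunk.content, end="", flush=True)

asyncio.run(main_events())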
  5. If you want to stream directly into a web UI later, keep the callback isolated from your business logic. That way your agent code stays reusable across CLI apps, APIs, and frontend event streams.
def run_agent(question: str) -> str:
    result = executor.invoke(
        {"input": question},
        config={"callbacks": [StreamingPrinter()]},
    )
    return result["output"]

if __name__ == "__main__":
    answer = run_agent("Use the tool to add 8 and 12.")
    print("\n\nReturned:", answer)

Testing It

Run the script from your terminal and confirm that text appears incrementally instead of all at once. You should see streamed tokens during generation and then a final full answer after the agent finishes.

Test two cases: one prompt that needs no tool call and one that does. For example, ask for a short explanation first, then ask it to add two numbers so you can verify both direct generation and tool-assisted reasoning.
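A quick harness that exercises both cases, reusing run_agent from step 5:

for question in [
    "Explain token streaming in one short sentence.",  # no tool call needed
    "Use the tool to add 19 and 23.",                  # forces a tool call
]:
    print(f"\n--- {question}")
    run_agent(question)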

If nothing streams until the end, check three things: streaming=True on ChatOpenAI, valid API credentials, and that your callback handler is actually passed through config={"callbacks": [...]}.

Next Steps

  • Wire the callback into FastAPI Server-Sent Events or WebSockets for browser streaming.
  • Add more tools and test how streamed output behaves during multi-step agent execution.
  • Learn LangChain message history so you can stream responses in multi-turn conversations without losing context.

By Cyprian Aarons, AI Consultant at Topiax.