LangChain Tutorial (Python): streaming agent responses for beginners

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows you how to build a LangChain agent in Python that streams its responses token by token instead of waiting for the full answer. You need this when you want a chat UI, CLI, or backend service to feel responsive while the model is still generating output.

What You'll Need

  • Python 3.10+
  • An OpenAI API key
  • These packages:
    • langchain
    • langchain-openai
    • langchain-community
    • python-dotenv
  • A terminal and a virtual environment
  • Basic familiarity with LangChain agents and tools

Step-by-Step

  1. Set up your environment and install the packages. Keep your API key in a .env file so you do not hardcode credentials into your codebase.
python -m venv .venv
source .venv/bin/activate
pip install langchain langchain-openai langchain-community python-dotenv

Create a .env file:

OPENAI_API_KEY=your_openai_api_key_here
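
Optionally, you can confirm the key is actually picked up before writing any agent code. This is a quick standalone check, not part of the agent itself:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

# Fail fast if the key is missing or the .env file was not found
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not found in .env"
print("API key loaded.")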
  2. Create a simple tool the agent can call. Streaming works best when you can see both tool usage and final answer generation in real time.
from langchain_core.tools import tool

@tool
def get_account_status(account_id: str) -> str:
    """Return a mock account status for a given account ID."""
    mock_db = {
        "1001": "Account 1001 is active with no overdue balance.",
        "1002": "Account 1002 is pending verification.",
    }
    return mock_db.get(account_id, f"Account {account_id} not found.")
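
To sanity-check the tool in isolation, note that functions decorated with @tool become LangChain tool objects, so you can call them through .invoke before wiring up the agent:

# Call the tool directly with its argument as a dict
print(get_account_status.invoke({"account_id": "1001"}))
# -> Account 1001 is active with no overdue balance.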
  3. Build a streaming chat model and an agent prompt. The key detail is streaming=True, which tells the model to emit chunks as they arrive.
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

load_dotenv()

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    streaming=True,
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful support agent."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
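
If you want to verify streaming at the model level before adding tools, a minimal check like the one below can live in the same script. It reuses the llm defined above; llm.astream yields message chunks as tokens arrive:

import asyncio

async def check_model_stream():
    # Each chunk is a partial AI message; print its text as it arrives
    async for chunk in llm.astream("Say hello in five words."):
        print(chunk.content, end="", flush=True)
    print()

asyncio.run(check_model_stream())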
  4. Create the agent and executor. This is the part that wires the LLM, prompt, and tool together into something that can reason and act.
from langchain.agents import create_openai_tools_agent, AgentExecutor

tools = [get_account_status]

agent = create_openai_tools_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
)
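
As an aside, AgentExecutor also supports step-level streaming through its .stream method, which yields completed actions and the final output rather than individual tokens. A rough sketch for comparison:

# Step-level streaming: chunks arrive per action/observation, not per token
for step in agent_executor.stream({"input": "Check account 1002."}):
    if "actions" in step:
        for action in step["actions"]:
            print(f"Calling tool {action.tool} with input {action.tool_input}")
    elif "output" in step:
        print(f"Final answer: {step['output']}")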
  5. Stream the response chunk by chunk in your terminal. Use astream_events so you can inspect tokens as they are produced instead of waiting for the full result.
import asyncio

async def main():
    user_input = "Check account 1001 and explain the status briefly."
    
    async for event in agent_executor.astream_events(
        {"input": user_input},
        version="v2",
    ):
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if chunk.content:
                print(chunk.content, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
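
You can extend the same loop to surface tool activity as it happens. The event names below (on_tool_start, on_tool_end) come from the astream_events v2 schema; treat this as a sketch and adjust which fields you print to taste:

async def main_with_tool_events():
    async for event in agent_executor.astream_events(
        {"input": "Check account 1002 and summarize it."},
        version="v2",
    ):
        kind = event["event"]
        if kind == "on_tool_start":
            # event["name"] is the tool name; data carries its input
            print(f"\n[tool start] {event['name']} input={event['data'].get('input')}")
        elif kind == "on_tool_end":
            print(f"\n[tool end] {event['name']}")
        elif kind == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if chunk.content:
                print(chunk.content, end="", flush=True)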
  6. Add a non-streaming fallback for debugging tool behavior first. This helps when you want to compare the final answer with the streamed events during development.
result = agent_executor.invoke({"input": "Check account 1002 and summarize it."})
print("\n\nFinal result:")
print(result["output"])

Testing It

Run the script from your terminal and watch for partial text appearing before the full answer completes. If everything is wired correctly, you should see the agent think through the request, call get_account_status, then continue streaming its final response.

If you only see one complete block at the end, check that streaming=True is set on ChatOpenAI and that you are using astream_events, not invoke. Also confirm your OpenAI key is loaded correctly from .env.

For a better test, try prompts that force tool use, like asking about account 1001 or 1002. That makes it obvious whether the agent is actually calling tools before generating its final answer.

Next Steps

  • Add more tools, such as policy lookup or claims status functions.
  • Stream events into a FastAPI endpoint or WebSocket for a real frontend (a minimal sketch follows this list).
  • Learn how to handle intermediate tool-call events separately from token streaming.
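
To give a sense of that second bullet, here is a minimal sketch of serving the token stream over HTTP with FastAPI. The module name agent_setup is hypothetical; it stands in for wherever you defined agent_executor in the steps above:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from agent_setup import agent_executor  # hypothetical module holding the executor

app = FastAPI()

async def token_stream(question: str):
    # Yield only the model's text chunks; tool events are filtered out
    async for event in agent_executor.astream_events(
        {"input": question}, version="v2"
    ):
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if chunk.content:
                yield chunk.content

@app.get("/chat")
async def chat(q: str):
    # StreamingResponse forwards chunks to the client as they are produced
    return StreamingResponse(token_stream(q), media_type="text/plain")

Run it with uvicorn and curl the endpoint to watch tokens arrive incrementally instead of as one final block.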
