LangGraph Tutorial (Python): streaming agent responses for intermediate developers

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows you how to stream agent responses from a LangGraph app in Python, token by token and step by step. You need this when you want a UI, CLI, or API client to start showing useful output before the full agent run is finished.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-openai
  • An OpenAI API key
  • A .env file or exported environment variable for OPENAI_API_KEY
  • Basic familiarity with LangGraph nodes, edges, and state

Install the packages first:

pip install langgraph langchain-openai python-dotenv
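
If you go the .env route, the file needs just one line (the value below is a placeholder, not a real key):

OPENAI_API_KEY=sk-your-key-here

The load_dotenv() call in the first code block reads it into the environment at startup.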

Step-by-Step

  1. Start with a minimal graph state and a chat model that supports streaming. The key idea is that LangGraph can emit updates as each node finishes, while the model itself can stream tokens inside the node.
from typing import Annotated, TypedDict
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

load_dotenv()  # pulls OPENAI_API_KEY from .env into the environment

class State(TypedDict):
    messages: Annotated[list, add_messages]

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
  2. Define a single agent node that calls the model with the current message history. Returning the assistant message into state is enough for LangGraph to keep track of the conversation.
def agent(state: State):
    # Invoke the model on the full history; add_messages appends the reply to state
    response = model.invoke(state["messages"])
    return {"messages": [response]}
  3. Build and compile the graph. This is the part most people already know, but it matters here: the stream methods live on the compiled app object, not on the builder.
builder = StateGraph(State)
builder.add_node("agent", agent)
builder.add_edge(START, "agent")
builder.add_edge("agent", END)

app = builder.compile()
  4. Run the graph with stream_mode="updates" if you want node-level progress. This gives you each state update as it happens, which is useful for logging, progress indicators, or incremental UI rendering.
inputs = {
    "messages": [
        ("user", "Write a 1-sentence summary of why streaming matters in agent apps.")
    ]
}

for chunk in app.stream(inputs, stream_mode="updates"):
    print(chunk)
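Each chunk is a dict keyed by the node that just finished, so with this one-node graph you should see something shaped like {'agent': {'messages': [AIMessage(...)]}} (the exact message repr depends on your LangGraph version).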
  5. For actual token streaming from the LLM, use stream_mode="messages". This is the mode for chat UIs because it lets you display partial assistant text before the node finishes.
inputs = {
    "messages": [
        ("user", "Explain LangGraph streaming in one short paragraph.")
    ]
}

for event in app.stream(inputs, stream_mode="messages"):
    token, metadata = event
    if token.content:
        print(token.content, end="", flush=True)
print()
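The metadata half of each event identifies which node emitted the chunk. In multi-node graphs you will usually filter on it so tool or router output does not leak into the chat transcript. A minimal sketch, assuming the node is named "agent" as above:

for event in app.stream(inputs, stream_mode="messages"):
    token, metadata = event
    # metadata["langgraph_node"] names the node that produced this chunk
    if metadata.get("langgraph_node") == "agent" and token.content:
        print(token.content, end="", flush=True)
print()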
  6. For production code, separate transport from execution so your API layer can forward chunks directly to clients. This pattern keeps your graph logic clean and makes it easy to swap console output for SSE or WebSockets later.
def run_stream(prompt: str):
    # Generator of assistant text chunks; callers choose the transport (console, SSE, WebSocket)
    inputs = {"messages": [("user", prompt)]}
    for event in app.stream(inputs, stream_mode="messages"):
        token, metadata = event
        if token.content:
            yield token.content

if __name__ == "__main__":
    for text in run_stream("Give me three reasons to stream agent responses."):
        print(text, end="", flush=True)
    print()
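
If your server is async, the same pattern carries over to astream, the compiled graph's async counterpart. A sketch under that assumption:

import asyncio

async def run_stream_async(prompt: str):
    inputs = {"messages": [("user", prompt)]}
    # astream mirrors stream and accepts the same stream_mode values
    async for event in app.astream(inputs, stream_mode="messages"):
        token, metadata = event
        if token.content:
            yield token.content

async def main():
    async for text in run_stream_async("Name one benefit of streaming."):
        print(text, end="", flush=True)
    print()

asyncio.run(main())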

Testing It

Run the script and confirm that output appears incrementally instead of all at once. If you use stream_mode="updates", you should see dictionaries representing graph updates; if you use stream_mode="messages", you should see assistant text arriving piece by piece.

If nothing streams, check that your OpenAI key is available in the environment and that your model name is valid. Also verify that your terminal flushes output correctly; print(..., end="", flush=True) matters here.
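
A two-line sanity check rules out the most common failure:

import os
# Fails fast if the key never made it into the environment
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"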

A good test prompt is something short and deterministic like “Write three bullet points about risk scoring.” That makes it obvious whether partial tokens are being emitted.

Next Steps

  • Add a second node for tool execution and stream both tool progress and model output.
  • Wrap app.stream(...) in FastAPI Server-Sent Events or WebSockets (see the sketch after this list).
  • Learn stream_mode="values" so you can inspect full state snapshots during debugging (also sketched below).
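
Here is a minimal SSE sketch, assuming FastAPI is installed and reusing the run_stream helper from step 6 (the /chat endpoint name is arbitrary):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

api = FastAPI()

@api.get("/chat")
def chat(prompt: str):
    def event_stream():
        # SSE frames are "data: ..." lines followed by a blank line
        for text in run_stream(prompt):
            yield f"data: {text}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")

And stream_mode="values" looks like this: each event is a full state snapshot rather than a delta, so you can print the latest message after every step:

for state in app.stream(inputs, stream_mode="values"):
    print(state["messages"][-1])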
