LangGraph Tutorial (Python): streaming agent responses for advanced developers
This tutorial shows you how to build a LangGraph agent in Python that streams responses token-by-token and emits intermediate graph events. You need this when you want UI updates, progress indicators, or partial answers before the full agent run completes.
What You'll Need
- Python 3.10+
- langgraph
- langchain-core
- langchain-openai
- An OpenAI API key set as OPENAI_API_KEY
- A terminal with pip
- Basic familiarity with LangGraph state graphs and chat models
Step-by-Step
- Install the dependencies and set up your environment. Keep the versions current enough to support astream and chat model streaming.
pip install langgraph langchain-core langchain-openai
export OPENAI_API_KEY="your-key-here"
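If you want to confirm what's installed, a quick check with the standard importlib.metadata module looks like this (a sketch; any reasonably recent versions should work):
# Print installed versions of the three packages used in this tutorial
from importlib.metadata import version

for pkg in ("langgraph", "langchain-core", "langchain-openai"):
    print(pkg, version(pkg))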
- Define a minimal graph state and a streaming-friendly node. The important part is using a chat model that supports streaming, then returning messages in the shape LangGraph expects.
from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
class AgentState(TypedDict):
    # add_messages appends new messages to the list instead of overwriting it
    messages: Annotated[Sequence[BaseMessage], add_messages]

# streaming=True makes the model emit token chunks through the callback system
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, streaming=True)

def assistant_node(state: AgentState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}
- Build the graph and compile it. For a simple streaming setup, one assistant node is enough; the graph still gives you the same event interface you’ll use in larger systems with tools and routing.
builder = StateGraph(AgentState)
builder.add_node("assistant", assistant_node)
builder.add_edge(START, "assistant")
builder.add_edge("assistant", END)
graph = builder.compile()
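Before wiring up streaming, you can sanity-check the compiled graph with a plain synchronous call; the prompt here is just an example:
# Non-streaming baseline: blocks until the node finishes, then returns the full state
result = graph.invoke({"messages": [HumanMessage(content="Say hello.")]})
print(result["messages"][-1].content)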
- Stream events from the graph instead of calling it synchronously. Use astream_events when you want fine-grained event visibility, which is what you need for token streams, node transitions, and observability hooks.
import asyncio

async def main():
    inputs = {
        "messages": [
            HumanMessage(content="Write a one-sentence summary of why streaming matters in agents.")
        ]
    }
    async for event in graph.astream_events(inputs, version="v2"):
        event_type = event["event"]
        if event_type == "on_chat_model_stream":
            # Each chunk is a partial message; print its tokens as they arrive
            chunk = event["data"]["chunk"]
            if getattr(chunk, "content", ""):
                print(chunk.content, end="", flush=True)
        # Filter on the run name so these fire once for the whole graph,
        # not for every node (compiled graphs are named "LangGraph" by default)
        elif event_type == "on_chain_start" and event.get("name") == "LangGraph":
            print("\n[graph started]")
        elif event_type == "on_chain_end" and event.get("name") == "LangGraph":
            print("\n[graph finished]")

if __name__ == "__main__":
    asyncio.run(main())
- If you only want final outputs but still need async execution, use astream. This gives you graph-level streamed state updates rather than low-level callback events.
import asyncio

async def main():
    inputs = {
        "messages": [
            HumanMessage(content="Give me three reasons to stream agent responses.")
        ]
    }
    # Default stream_mode is "values": each item is the full graph state after a step
    async for state in graph.astream(inputs):
        last_message = state["messages"][-1]
        if isinstance(last_message, AIMessage):
            print(last_message.content)

if __name__ == "__main__":
    asyncio.run(main())
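astream also accepts a stream_mode argument. As a sketch against the same graph, "updates" yields only the keys each node returned instead of the full state:
# "updates" mode: each item maps a node name to the partial state it returned
async def main_updates():
    inputs = {"messages": [HumanMessage(content="Give me three reasons to stream agent responses.")]}
    async for update in graph.astream(inputs, stream_mode="updates"):
        for node_name, delta in update.items():
            print(node_name, "->", delta["messages"][-1].content)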
- Add a real production pattern: separate token streaming from final state handling. In practice, your UI consumes stream events while your backend keeps the final message for persistence or audit logging.
async def run_and_capture():
    inputs = {
        "messages": [
            HumanMessage(content="Explain LangGraph streaming in one paragraph.")
        ]
    }
    final_text = ""
    async for event in graph.astream_events(inputs, version="v2"):
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if getattr(chunk, "content", ""):
                final_text += chunk.content
                print(chunk.content, end="", flush=True)
    return final_text
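A caller can then hand the captured text to whatever persistence layer you use. Here is a minimal sketch; save_transcript is a hypothetical stand-in for your database or audit-log write:
import asyncio

async def save_transcript(text: str) -> None:
    # Hypothetical persistence hook: replace with your Postgres/document-store write
    print(f"\n[persisted {len(text)} characters]")

async def serve_request():
    final_text = await run_and_capture()  # tokens stream to stdout while this awaits
    await save_transcript(final_text)

if __name__ == "__main__":
    asyncio.run(serve_request())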
Testing It
Run the script from your terminal and confirm you see output before the full response completes. If nothing streams, check that streaming=True is set on ChatOpenAI and that your model/provider actually supports streaming.
For debugging, print every event once before filtering so you can inspect what LangGraph is emitting in your environment. Also verify that your OpenAI key is available in the same shell session where you launch Python.
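For that raw-event dump, a throwaway loop like this is usually enough (a sketch; it reuses graph from above and defines a tiny input inline):
# Print every event type and run name to see exactly what your setup emits
async def debug_events():
    inputs = {"messages": [HumanMessage(content="ping")]}
    async for event in graph.astream_events(inputs, version="v2"):
        print(event["event"], event.get("name", ""))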
If you’re integrating this into an API server, test it behind an async endpoint like FastAPI or an SSE/WebSocket handler. That’s where streaming matters most: users should see partial output without waiting for the full graph to finish.
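As one sketch of that integration, assuming FastAPI and the compiled graph from this tutorial (the /chat route and query-parameter shape are illustrative):
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_core.messages import HumanMessage

app = FastAPI()

@app.get("/chat")
async def chat(q: str):
    async def event_stream():
        inputs = {"messages": [HumanMessage(content=q)]}
        async for event in graph.astream_events(inputs, version="v2"):
            if event["event"] == "on_chat_model_stream":
                content = getattr(event["data"]["chunk"], "content", "")
                if content:
                    # SSE frame format: "data: <payload>\n\n" (escape newlines in real use)
                    yield f"data: {content}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")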
Next Steps
- Add tool nodes and stream both tool progress and model tokens (see the sketch after this list).
- Persist final messages to Postgres or a document store after each run.
- Wrap astream_events in Server-Sent Events or WebSocket transport for browser clients.
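As a sketch for the first item: once the graph contains tool nodes, astream_events also emits on_tool_start and on_tool_end, so you can surface tool progress next to the token stream. This reuses graph and inputs from earlier and assumes a tool-calling node has been added:
# Assumes a graph with tool-calling; event names are from astream_events v2
async for event in graph.astream_events(inputs, version="v2"):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = getattr(event["data"]["chunk"], "content", "")
        if content:
            print(content, end="", flush=True)
    elif kind == "on_tool_start":
        print(f"\n[tool {event['name']} started]")
    elif kind == "on_tool_end":
        print(f"\n[tool {event['name']} finished]")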
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.