LangGraph Tutorial (Python): streaming agent responses for beginners

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows how to build a LangGraph agent in Python that streams tokens and intermediate updates back to the caller. You need this when you want a chat UI, CLI, or API endpoint to show progress immediately instead of waiting for the full response.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-openai
  • An OpenAI API key
  • A terminal and a text editor
  • Basic familiarity with LangGraph nodes, edges, and state

Install the packages:

pip install langgraph langchain-openai openai

Set your OpenAI key in the environment:

export OPENAI_API_KEY="your-key-here"
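
To confirm Python can actually see the key before you start, a quick check like this works (a minimal sketch, assuming you exported the key in the same shell session):

import os

# Fail fast if the key never made it into the environment.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"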

Step-by-Step

  1. Start by defining a minimal graph state and a single assistant node. For streaming, the important part is that the model is created with streaming=True so LangGraph can emit partial output as it runs.
from typing import Annotated, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class State(TypedDict):
    messages: Annotated[list, add_messages]


llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)


def assistant(state: State):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}
  2. Build the graph with one node and compile it. This gives you an executable agent that accepts chat messages and returns updated state.
graph_builder = StateGraph(State)
graph_builder.add_node("assistant", assistant)
graph_builder.add_edge(START, "assistant")
graph_builder.add_edge("assistant", END)

app = graph_builder.compile()
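As an optional sanity check, you can render the compiled topology as ASCII art; this assumes the optional grandalf package is installed (pip install grandalf):

# Prints the graph layout: START -> assistant -> END.
app.get_graph().print_ascii()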
  3. First run the graph with a user message using normal (non-streaming) execution so you can confirm the base agent works. This is your baseline before adding streaming behavior.
from langchain_core.messages import HumanMessage

inputs = {
    "messages": [
        HumanMessage(content="Write one sentence about why streaming matters in chat apps.")
    ]
}

result = app.invoke(inputs)
print(result["messages"][-1].content)
  4. Switch to streaming mode with app.stream(). Each chunk you receive represents an incremental update from the graph, which is what you use to render live output in a UI or terminal.
inputs = {
    "messages": [
        HumanMessage(content="Give me three short benefits of streaming responses.")
    ]
}

for chunk in app.stream(inputs):
    print(chunk)
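These chunks arrive at graph granularity, not token granularity. If your LangGraph version supports stream_mode="messages", you can also get token-level chunks straight from app.stream(); a sketch, assuming a version with that mode:

# Token-level streaming through the graph itself; yields
# (message_chunk, metadata) pairs as the model generates.
for message_chunk, metadata in app.stream(inputs, stream_mode="messages"):
    if message_chunk.content:
        print(message_chunk.content, end="", flush=True)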
  5. If you want token-level output from the model itself, stream directly from the LLM inside your node. This pattern is useful when you want to print tokens as they arrive instead of waiting for the node to finish; the node below also accumulates the chunks so the full reply still lands in state.
def assistant_streaming(state: State):
    # Stream tokens as they arrive, accumulating the full message.
    full = None
    for chunk in llm.stream(state["messages"]):
        if chunk.content:
            print(chunk.content, end="", flush=True)
        full = chunk if full is None else full + chunk
    print()  # end the streamed line
    # Return the accumulated message so it is appended to state.
    return {"messages": [full] if full else []}
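To try it, build a second graph that registers this node in place of the original (the names streaming_builder and streaming_app are just illustrative):

streaming_builder = StateGraph(State)
streaming_builder.add_node("assistant", assistant_streaming)
streaming_builder.add_edge(START, "assistant")
streaming_builder.add_edge("assistant", END)
streaming_app = streaming_builder.compile()

# Tokens print as they arrive; the final state still holds the reply.
streaming_app.invoke(inputs)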
  6. When you want structured events instead of raw chunks, use astream_events(). This works better for production because you can separate model tokens, tool calls, and graph lifecycle events.
import asyncio


async def main():
    async for event in app.astream_events(inputs, version="v2"):
        print(event["event"], event.get("name", ""))


asyncio.run(main())
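
In practice you usually filter for the events you care about. A sketch that prints only model tokens, assuming the v2 schema where tokens arrive as on_chat_model_stream events:

async def stream_tokens():
    async for event in app.astream_events(inputs, version="v2"):
        # Model token chunks arrive as on_chat_model_stream events.
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if chunk.content:
                print(chunk.content, end="", flush=True)


asyncio.run(stream_tokens())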

Testing It

Run the script from your terminal and confirm that invoke() returns a final assistant message first. Then run stream() and check that you see multiple chunks instead of one final payload.

If you use astream_events(), make sure you see events like graph start, node execution, and graph end. If nothing streams, verify that OPENAI_API_KEY is set and that your model name is valid.

For UI work, replace print(chunk) with logic that appends text to your frontend or websocket response. The main thing to verify is that users see progress before generation finishes.
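
As a sketch of what that can look like, here is a minimal FastAPI endpoint that relays model tokens over SSE. The route path, query parameter, and payload format are illustrative assumptions, not part of LangGraph; app is the compiled graph from earlier:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_core.messages import HumanMessage

api = FastAPI()


@api.get("/chat")
async def chat(q: str):
    inputs = {"messages": [HumanMessage(content=q)]}

    async def event_source():
        # Relay each model token as an SSE "data:" frame.
        async for event in app.astream_events(inputs, version="v2"):
            if event["event"] == "on_chat_model_stream":
                chunk = event["data"]["chunk"]
                if chunk.content:
                    yield f"data: {chunk.content}\n\n"

    return StreamingResponse(event_source(), media_type="text/event-stream")

Serve it with uvicorn (pip install fastapi uvicorn) and point your frontend's EventSource at the /chat route.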

Next Steps

  • Add a tool node and stream tool execution events alongside model output.
  • Replace invoke() inside nodes with proper token handling for finer-grained streaming.
  • Wrap the graph in FastAPI or Flask so your frontend can consume streamed updates over SSE or websockets.

By Cyprian Aarons, AI Consultant at Topiax.
