LangGraph Tutorial (Python): testing agents locally for advanced developers

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows you how to run and test a LangGraph agent locally in Python, with the same graph structure you would use in production. The goal is to make debugging deterministic: you can inspect state, mock model calls, and verify tool execution without pushing anything to a remote service.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-core
  • langchain-openai
  • pytest
  • An OpenAI API key set as OPENAI_API_KEY if you want to run real model calls
  • Optional but useful:
    • python-dotenv for local env loading
    • pydantic for structured state models

Install the packages:

pip install langgraph langchain-core langchain-openai pytest python-dotenv
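
If you keep your API key in a .env file, python-dotenv can load it before anything constructs ChatOpenAI. A minimal sketch, assuming a .env file containing OPENAI_API_KEY sits next to your script:

import os

from dotenv import load_dotenv

# Pull OPENAI_API_KEY (and anything else) out of a local .env file.
load_dotenv()

assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY to run real model calls"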

Step-by-Step

  1. Start with a minimal graph state and one tool. For local testing, keep the state small and explicit so assertions are easy.
from typing import Annotated, TypedDict

from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class State(TypedDict):
    messages: Annotated[list, add_messages]


@tool
def lookup_policy(policy_id: str) -> str:
    """Return a fake policy status for local testing."""
    return f"Policy {policy_id} is active"


def should_continue(state: State) -> str:
    last = state["messages"][-1]
    if isinstance(last, AIMessage) and last.tool_calls:
        return "tools"
    return END
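
Before wiring the tool into the graph, it is worth confirming that the @tool wrapper behaves on its own. Tools created with @tool are runnables, so you can invoke them directly with a dict of arguments:

# Sanity-check the tool in isolation, with no graph or model involved.
assert lookup_policy.invoke({"policy_id": "12345"}) == "Policy 12345 is active"
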
  2. Build a graph that can call tools and then return control to the model. This pattern is what you want to test locally before wiring in real dependencies.
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import ToolNode


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
tools = [lookup_policy]
llm_with_tools = llm.bind_tools(tools)

def call_model(state: State) -> dict:
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(State)
graph.add_node("agent", call_model)
graph.add_node("tools", ToolNode(tools))
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
graph.add_edge("tools", "agent")

app = graph.compile()
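
Once the graph compiles, you can verify the wiring before running anything. A quick sketch using the built-in Mermaid renderer:

# Print the topology as Mermaid source to confirm the agent -> tools -> agent
# loop and the conditional exit to END.
print(app.get_graph().draw_mermaid())
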
  3. Run the graph once with a real prompt and inspect the output messages. For local testing, print the full message trace so you can see whether tool calls happened where you expected.
if __name__ == "__main__":
    result = app.invoke(
        {"messages": [HumanMessage(content="Check policy 12345 status")]}
    )

    for msg in result["messages"]:
        print(type(msg).__name__, "=>", msg.content)
        if hasattr(msg, "tool_calls") and msg.tool_calls:
            print("tool_calls =>", msg.tool_calls)
  4. Add a deterministic test by mocking the model instead of calling OpenAI. This is the part most teams skip, and it is exactly what makes local agent testing reliable.
from langchain_core.language_models.fake_chat_models import FakeMessagesListChatModel


# Scripted turn sequence: first a tool call, then the final answer.
fake_model = FakeMessagesListChatModel(
    responses=[
        AIMessage(
            content="",
            tool_calls=[{"name": "lookup_policy", "args": {"policy_id": "12345"}, "id": "call_1"}],
        ),
        AIMessage(content="Policy 12345 is active"),
    ]
)

# FakeMessagesListChatModel replays its scripted responses in order and
# ignores the input, so there is nothing to bind tools to: calling
# bind_tools on it raises NotImplementedError.
def fake_call_model(state: State) -> dict:
    response = fake_model.invoke(state["messages"])
    return {"messages": [response]}
  5. Swap the node implementation and assert on behavior with pytest. You want tests that verify both the final answer and the intermediate tool invocation path.
def build_app(call_model_fn):
    g = StateGraph(State)
    g.add_node("agent", call_model_fn)
    g.add_node("tools", ToolNode(tools))
    g.add_edge(START, "agent")
    g.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    g.add_edge("tools", "agent")
    return g.compile()


from langchain_core.messages import ToolMessage


def test_agent_tool_flow():
    test_app = build_app(fake_call_model)
    result = test_app.invoke({"messages": [HumanMessage(content="Check policy 12345 status")]})

    # Verify the intermediate tool invocation path, not just the final answer.
    tool_messages = [m for m in result["messages"] if isinstance(m, ToolMessage)]
    assert len(tool_messages) == 1
    assert "Policy 12345 is active" in tool_messages[0].content

    final_message = result["messages"][-1]
    assert isinstance(final_message, AIMessage)
    assert final_message.content == "Policy 12345 is active"

Testing It

Run the script once with your real model configured and confirm that the message trace includes an assistant tool call followed by a tool response. Then run pytest and verify the deterministic test passes without any network dependency.
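
Assuming the deterministic test lives in a file pytest can discover, for example test_agent.py (the filename here is illustrative):

pytest test_agent.py -v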

If your graph hangs or loops forever, your conditional edge logic is wrong or your model keeps emitting tool calls after the tool result. In practice, I also log state["messages"][-3:] during development because it makes bad transitions obvious fast.
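
To make a looping graph fail fast instead of hanging, cap the number of supersteps with LangGraph's recursion_limit config key. A sketch:

from langgraph.errors import GraphRecursionError

try:
    # Abort after 10 supersteps instead of looping indefinitely.
    result = app.invoke(
        {"messages": [HumanMessage(content="Check policy 12345 status")]},
        config={"recursion_limit": 10},
    )
except GraphRecursionError:
    print("Hit the recursion limit; inspect should_continue and the model's tool calls")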

Next Steps

  • Add more tools and test each one with isolated fake responses.
  • Move from plain TypedDict state to Pydantic models when your agent state gets larger.
  • Test streaming with .stream() so you can verify partial outputs before shipping to production; a sketch follows this list.
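
A minimal streaming sketch, using stream_mode="values" so each yielded chunk is the full graph state after a superstep:

for chunk in app.stream(
    {"messages": [HumanMessage(content="Check policy 12345 status")]},
    stream_mode="values",
):
    # Each chunk is the full state; print the newest message as it arrives.
    chunk["messages"][-1].pretty_print()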

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

