LangChain Tutorial (Python): testing agents locally for advanced developers
This tutorial shows you how to run and test a LangChain agent locally in Python, with deterministic tools, structured outputs, and a repeatable test harness. You need this when you want to debug agent behavior before wiring it into production systems, especially for workflows that touch customer data, internal APIs, or regulated decisions.
What You'll Need
- Python 3.10+
- `pip`
- A LangChain chat model provider API key (for example, `OPENAI_API_KEY`)
- Packages: `langchain`, `langchain-openai`, `langchain-core`, and `pytest` for local tests
- A local `.env` file or shell environment variables
- Basic familiarity with LangChain tools and prompts
Step-by-Step
1. Install the dependencies and set up your environment.

For local agent testing, keep the stack small and explicit so failures are easy to trace.

```bash
pip install langchain langchain-openai langchain-core pytest python-dotenv
export OPENAI_API_KEY="your-key-here"
```
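If you prefer the `.env` route from the prerequisites, `python-dotenv` (installed above) can load the key at startup. A minimal sketch, assuming a `.env` file next to your script:

```python
# Sketch: load OPENAI_API_KEY from a local .env file instead of exporting it.
# Assumes the file contains a line like OPENAI_API_KEY=your-key-here.
from dotenv import load_dotenv

load_dotenv()  # searches for a .env file starting from the working directory
```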
2. Build a deterministic tool the agent can call.

In real systems, I start with one safe tool that has no side effects so I can validate planning and tool selection before connecting anything risky.

```python
from langchain_core.tools import tool

@tool
def account_summary(customer_id: str) -> str:
    """Return a mock account summary for local testing."""
    mock_db = {
        "cust_1001": "Customer cust_1001: balance=$4,250.18, status=active",
        "cust_1002": "Customer cust_1002: balance=$120.00, status=past_due",
    }
    return mock_db.get(customer_id, f"Customer {customer_id}: not found")
```
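The model only ever sees the tool's metadata, so it is worth confirming what the `@tool` decorator generated before binding anything. These are standard `BaseTool` properties in `langchain-core`; the exact shape of the `args` dict may vary slightly by version:

```python
# The agent plans tool calls based on this metadata alone.
print(account_summary.name)         # account_summary
print(account_summary.description)  # Return a mock account summary for local testing.
print(account_summary.args)         # e.g. {'customer_id': {'title': 'Customer Id', 'type': 'string'}}
```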
3. Create the agent with a chat model and the tool attached.

Use a model that supports tool calling, then bind the tool list so the agent can decide when to invoke it.

```python
import os

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.environ["OPENAI_API_KEY"],
)

tools = [account_summary]
llm_with_tools = llm.bind_tools(tools)

messages = [
    HumanMessage(content="Check cust_1001 and tell me the account status.")
]
response = llm_with_tools.invoke(messages)
print(response)
```
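When you need to separate plumbing bugs from planning bugs, `bind_tools` in recent `langchain-openai` versions accepts a `tool_choice` argument that forces a specific tool call. Treat this sketch as a debugging aid, not production behavior:

```python
# Sketch: force the model to call account_summary regardless of the prompt.
# Assumes tool_choice support in your langchain-openai version.
forced = llm.bind_tools(tools, tool_choice="account_summary")
plan = forced.invoke([HumanMessage(content="Check cust_1001.")])
print(plan.tool_calls)  # should contain exactly one account_summary call
```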
4. Wrap the model call in a small local test runner.

This is where you make behavior observable: capture the tool calls, execute them locally, then feed the results back into the model for a final answer.

```python
from langchain_core.messages import ToolMessage

def run_agent(question: str) -> str:
    # First pass: the model either answers directly or plans tool calls.
    first = llm_with_tools.invoke([HumanMessage(content=question)])
    if not first.tool_calls:
        return first.content

    # Execute each planned call locally; the tool_call_id lets the model
    # match each result back to the request that produced it.
    tool_messages = []
    for call in first.tool_calls:
        if call["name"] == "account_summary":
            result = account_summary.invoke(call["args"])
            tool_messages.append(
                ToolMessage(
                    content=result,
                    tool_call_id=call["id"],
                )
            )

    # Second pass: hand the full exchange back for the final answer.
    final = llm_with_tools.invoke(
        [HumanMessage(content=question), first, *tool_messages]
    )
    return final.content

print(run_agent("Check cust_1002 and summarize the account state."))
```
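The hard-coded name check works for a single tool but will not scale. A common generalization, sketched here, is a name-keyed dispatch table built from the same `tools` list; `execute_tool_calls` is an illustrative helper, not part of the tutorial code above:

```python
from langchain_core.messages import ToolMessage

# Sketch: dispatch tool calls by name instead of hard-coding each tool.
tools_by_name = {t.name: t for t in tools}

def execute_tool_calls(ai_message) -> list[ToolMessage]:
    return [
        ToolMessage(
            content=tools_by_name[call["name"]].invoke(call["args"]),
            tool_call_id=call["id"],
        )
        for call in ai_message.tool_calls
    ]
```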
5. Add a pytest check so you can validate output locally without manual inspection every time.

This gives you a repeatable regression test when prompts or tools change.

```python
def test_account_summary():
    result = account_summary.invoke({"customer_id": "cust_1001"})
    assert "balance=$4,250.18" in result

def test_agent_returns_customer_status():
    answer = run_agent("Check cust_1001 and tell me the status.")
    assert "active" in answer.lower()
```
6. Run the tests and inspect failures like an engineer, not like a prompt user.

If the agent misses the tool call or returns unstable text, confirm `temperature=0` first, then tighten your prompt or tool schema.

```bash
pytest -q
```
Testing It
Start by running the script directly and confirming that `cust_1001` returns an answer mentioning "active". Then run `pytest` and make sure both tests pass consistently across multiple runs.

If the agent answers without calling the tool, your prompt is too vague or your model choice is too weak for reliable function calling. If outputs vary between runs, keep `temperature=0` and avoid free-form parsing in your assertions.

For more advanced debugging, print `first.tool_calls` before executing tools so you can see exactly what the model planned. That is usually where most agent bugs show up.
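Concretely, each entry in `tool_calls` is a plain dict in recent `langchain-core` versions; the output below is illustrative and the exact shape may vary by version:

```python
# Inside run_agent, before executing tools:
print(first.tool_calls)
# Illustrative output (ids vary):
# [{'name': 'account_summary',
#   'args': {'customer_id': 'cust_1002'},
#   'id': 'call_abc123',
#   'type': 'tool_call'}]
```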
Next Steps
- Move from manual message chaining to LangGraph for explicit, stateful agent flows.
- Replace mock tools with sandboxed wrappers around internal APIs.
- Add structured output parsing with Pydantic models so your tests assert on fields, not strings (see the sketch below).
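As a taste of that last step, here is a minimal sketch using `with_structured_output`, which `ChatOpenAI` supports; the `AccountStatus` model and the extraction prompt are illustrative, not part of the tutorial code above:

```python
from pydantic import BaseModel

class AccountStatus(BaseModel):
    customer_id: str
    status: str

# Sketch: parse a known summary string into a validated object so tests
# can assert on fields instead of substring-matching free text.
structured_llm = llm.with_structured_output(AccountStatus)
result = structured_llm.invoke(
    "Extract the fields: Customer cust_1001: balance=$4,250.18, status=active"
)
assert result.customer_id == "cust_1001"
assert result.status == "active"
```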
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit