CrewAI Tutorial (Python): adding observability for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add observability to a CrewAI project so you can inspect agent runs, tool calls, and task outputs instead of guessing why a crew behaved a certain way. You need this when your agents start doing real work and you want traces, logs, and failure points you can actually debug.

What You'll Need

  • Python 3.10 or newer
  • A CrewAI project already working locally
  • crewai installed
  • An OpenAI API key for the LLM used by your agents
  • A Langfuse account and project for tracing
  • LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST
  • OPENAI_API_KEY set in your environment

Step-by-Step

  1. First install the packages you need. CrewAI handles the agent runtime, while Langfuse gives you structured traces for runs, tasks, and tool execution.
pip install crewai langfuse python-dotenv
  2. Add your environment variables in a .env file. Keep secrets out of source control and load them at runtime.
OPENAI_API_KEY=your_openai_key
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com
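If you want to fail fast when a variable is missing, a small startup check catches typos before any agent runs. This is an optional sketch that assumes the same variable names used in the .env file above.
import os
from dotenv import load_dotenv

load_dotenv()

# Fail early if any required key is missing or empty.
required = ["OPENAI_API_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")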
  3. Create a small CrewAI crew with one research agent and one writing task. This is the baseline run you will instrument.
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

load_dotenv()

llm = LLM(model="gpt-4o-mini")

researcher = Agent(
    role="Researcher",
    goal="Find concise facts about observability in agent systems",
    backstory="You are precise and practical.",
    llm=llm,
    verbose=True,
)

task = Task(
    description="Explain why observability matters for AI agents in 3 bullets.",
    expected_output="Three clear bullets with practical reasoning.",
    agent=researcher,
)

crew = Crew(
    agents=[researcher],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)
  4. Now wire in Langfuse so each run becomes traceable. The cleanest pattern is to initialize the client once and use it around your crew execution so you can attach metadata like user ID, request ID, or environment.
import os
from dotenv import load_dotenv
from langfuse import Langfuse

load_dotenv()

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ["LANGFUSE_HOST"],
)

trace = langfuse.trace(
    name="crewai-observability-demo",
    user_id="developer-local",
    metadata={
        "service": "crew-observability-tutorial",
        "environment": "local",
    },
)

span = trace.span(name="crew-kickoff")
span.update(input={"task": "Explain why observability matters for AI agents"})
  5. Wrap the crew run and record the output into the same trace. This gives you a single place to inspect what was asked, what was returned, and whether the run succeeded.
# Continues from step 4: langfuse, trace, and span are already defined.
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

load_dotenv()

llm = LLM(model="gpt-4o-mini")

agent = Agent(
    role="Researcher",
    goal="Explain observability clearly",
    backstory="You write practical internal docs.",
    llm=llm,
)

task = Task(
    description="List 3 reasons observability matters for CrewAI workflows.",
    expected_output="A short list of reasons.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task], process=Process.sequential)

result = crew.kickoff()
span.update(output={"result": str(result)})
span.end()
langfuse.flush()
print(result)
  6. Add basic error capture so failed runs are visible too. In production, this is the difference between “the agent failed” and “the tool call timed out after 12 seconds on task 2.”
# Replaces the plain kickoff() call from step 5; span and langfuse come from step 4.
try:
    result = crew.kickoff()
    span.update(output={"result": str(result)})
except Exception as exc:
    span.update(
        level="ERROR",
        output={"error": type(exc).__name__, "message": str(exc)},
    )
    raise
finally:
    span.end()
    langfuse.flush()

Testing It

Run the script from your terminal with your environment variables loaded. You should see the normal CrewAI output in stdout, and a matching trace should appear in Langfuse within a few seconds.

If the trace does not show up, check that LANGFUSE_HOST matches your deployment and that both keys are valid for the same project. Also confirm that langfuse.flush() is called before process exit; otherwise spans may not be sent.
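
If the process can exit before your finally block runs (for example in a short script or a worker that gets recycled), registering the flush at interpreter shutdown is a cheap safety net. A minimal sketch, assuming the langfuse client from step 4:

import atexit

# Flush buffered spans at interpreter shutdown so short-lived
# scripts do not silently drop their last trace.
atexit.register(langfuse.flush)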

A good test is to intentionally break a prompt or point an agent at a missing tool. You want to verify that failures are captured as error spans with enough context to reproduce the issue later.
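
One way to force a failure is to give an agent a model name that does not exist and run it through the error-capture wrapper from step 6. This is a rough sketch: the model string is deliberately invalid, and the exact exception type depends on your LLM provider.

from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

# Deliberately invalid model name so kickoff() raises.
bad_llm = LLM(model="gpt-4o-mini-does-not-exist")

broken_agent = Agent(
    role="Researcher",
    goal="Explain observability clearly",
    backstory="You write practical internal docs.",
    llm=bad_llm,
)
broken_task = Task(
    description="List 3 reasons observability matters.",
    expected_output="A short list.",
    agent=broken_agent,
)
broken_crew = Crew(agents=[broken_agent], tasks=[broken_task], process=Process.sequential)

# Run this crew through the try/except from step 6 and confirm
# Langfuse shows a span with level ERROR and the exception message.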

Next Steps

  • Add per-task metadata like customer segment, workflow name, or request ID so traces are searchable in production.
  • Instrument tool functions separately so you can measure latency and failure rates outside the agent loop (see the sketch after this list).
  • Connect traces to structured logging so you can correlate CrewAI runs with app logs and API requests.
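
For the tool instrumentation bullet above, one pattern is to time the function body inside its own span before handing the function to CrewAI as a tool. A sketch under assumptions: search_docs is a hypothetical tool function, and trace is the Langfuse trace object created in step 4.

import time

def search_docs(query: str) -> str:
    """Hypothetical tool: look up internal docs for a query."""
    span = trace.span(name="tool:search_docs", input={"query": query})
    start = time.time()
    try:
        result = f"stubbed results for {query}"  # replace with the real lookup
        span.update(output={"result": result})
        return result
    except Exception as exc:
        span.update(level="ERROR", output={"error": str(exc)})
        raise
    finally:
        span.update(metadata={"latency_seconds": round(time.time() - start, 3)})
        span.end()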

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

