CrewAI Tutorial (Python): streaming agent responses for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to stream CrewAI agent output in Python so you can display tokens or chunks as they are generated instead of waiting for the full response. You need this when building chat UIs, long-running agent workflows, or anything where users should see progress immediately.

What You'll Need

  • Python 3.10+
  • crewai
  • openai
  • An OpenAI API key
  • Basic familiarity with CrewAI agents, tasks, and crews
  • A terminal and a text editor

Install the packages first:

pip install crewai openai

Set your API key in the environment:

export OPENAI_API_KEY="your-api-key"
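
If you want to catch a missing key before any agent runs, you can add a quick optional check at the top of your script:

import os

# Fail fast with a clear message instead of an auth error mid-run.
if not os.getenv("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; export it before running.")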

Step-by-Step

  1. Start with a minimal CrewAI setup that returns a normal response first. This gives you a baseline before adding streaming, which makes debugging much easier.
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

llm = LLM(model="gpt-4o-mini")

writer = Agent(
    role="Support Writer",
    goal="Write short helpful responses",
    backstory="You write concise support replies for internal tools.",
    llm=llm,
)

task = Task(
    description="Write a 2-sentence reply explaining what streaming is.",
    expected_output="A short explanation of streaming.",
    agent=writer,
)

crew = Crew(
    agents=[writer],
    tasks=[task],
    process=Process.sequential,
)

result = crew.kickoff()
print(result)
  2. Add a callback that receives streamed chunks from the model. In CrewAI, the cleanest beginner-friendly pattern is to enable streaming on the LLM and pass a callback that prints each chunk as it arrives.
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

def stream_callback(chunk):
    # Chunks may arrive as objects with a .content attribute or as plain
    # strings, so fall back to str() when .content is missing.
    text = getattr(chunk, "content", None) or str(chunk)
    print(text, end="", flush=True)

llm = LLM(
    model="gpt-4o-mini",
    temperature=0.2,
    stream=True,  # request chunked output (supported in recent CrewAI releases)
    callbacks=[stream_callback],
)

agent = Agent(
    role="Streaming Assistant",
    goal="Respond clearly and incrementally",
    backstory="You help developers understand streamed output.",
    llm=llm,
)

task = Task(
    description="Explain in one paragraph why streaming improves UX.",
    expected_output="A single paragraph explanation.",
    agent=agent,
)
  3. Run the crew with the same execution flow, but now every token or chunk is printed as it arrives. This is the part you wire into a CLI app or a WebSocket endpoint later.
crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
)

print("\n--- Streaming response ---\n")
result = crew.kickoff()
print("\n\n--- Final result object ---\n")
print(result)
  4. If you want cleaner output in a real app, wrap the streaming logic in a function and keep UI concerns separate from agent logic. That makes it easier to reuse the same agent in FastAPI, Streamlit, or a plain terminal app.
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

def make_streaming_llm():
    def on_chunk(chunk):
        text = getattr(chunk, "content", None) or str(chunk)
        print(text, end="", flush=True)

    return LLM(model="gpt-4o-mini", stream=True, callbacks=[on_chunk])

def build_crew():
    llm = make_streaming_llm()

    agent = Agent(
        role="Help Desk Agent",
        goal="Answer questions with streamed output",
        backstory="You are concise and practical.",
        llm=llm,
    )

    task = Task(
        description="Give three bullet points on when to use streaming.",
        expected_output="Three bullet points.",
        agent=agent,
    )

    return Crew(agents=[agent], tasks=[task], process=Process.sequential)

crew = build_crew()
print(crew.kickoff())
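
One optional refinement of that separation, under the same assumptions as above, is to let the caller supply the chunk handler instead of hardcoding print. The handler names here are our own, not CrewAI conventions:

from crewai.llm import LLM

def make_streaming_llm(on_chunk):
    # The caller decides what each chunk means: print it, push it over a
    # socket, or collect it into a buffer for tests.
    return LLM(model="gpt-4o-mini", stream=True, callbacks=[on_chunk])

def print_chunk(chunk):
    text = getattr(chunk, "content", None) or str(chunk)
    print(text, end="", flush=True)

llm = make_streaming_llm(print_chunk)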
  5. For multi-agent crews, stream each agent through the same callback if you want consistent behavior across the whole workflow. That keeps your UI simple because every model call uses the same output path.
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

def stream_callback(chunk):
    text = getattr(chunk, "content", None) or str(chunk)
    print(text, end="", flush=True)

llm = LLM(model="gpt-4o-mini", stream=True, callbacks=[stream_callback])

researcher = Agent(
    role="Researcher",
    goal="Summarize technical concepts",
    backstory="You gather accurate implementation details.",
    llm=llm,
)

writer = Agent(
    role="Writer",
    goal="Turn research into clear guidance",
    backstory="You write practical tutorials for developers.",
    llm=llm,
)

task1 = Task(
    description="Explain what streaming means in one paragraph.",
    expected_output="One paragraph defining streaming.",
    agent=researcher,
)

task2 = Task(
    description="Rewrite that explanation for beginners.",
    expected_output="A beginner-friendly version of the paragraph.",
    agent=writer,
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    process=Process.sequential,
)

print(crew.kickoff())
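
If you instead want to tell the agents apart in the stream, a light variation is to build a separate callback for each agent from one factory. This is a minimal sketch under the same assumptions as above; the bracketed prefix format is our own choice, not a CrewAI convention:

from crewai.llm import LLM

def make_prefixed_llm(prefix):
    printed_header = False

    def on_chunk(chunk):
        # Print the agent label once, then stream chunks as usual.
        nonlocal printed_header
        if not printed_header:
            print(f"\n[{prefix}]\n", end="")
            printed_header = True
        text = getattr(chunk, "content", None) or str(chunk)
        print(text, end="", flush=True)

    return LLM(model="gpt-4o-mini", stream=True, callbacks=[on_chunk])

researcher_llm = make_prefixed_llm("researcher")
writer_llm = make_prefixed_llm("writer")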

Testing It

Run the script from your terminal and watch for text appearing before the final result prints. If everything is wired correctly, you should see partial output arrive progressively instead of one full block at the end.

If you only see a final response with no incremental text, check three things: your API key is set correctly, your model supports streaming through the SDK path you’re using, and the LLM instance passed into the agent has streaming enabled with your callback actually attached.

Also test with a longer prompt like “write 10 numbered steps” so there’s enough output to notice streaming behavior. Short answers can look non-streamed simply because they finish too quickly.
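
For instance, reusing the step 2 setup with a longer task makes incremental output obvious; the wording below is just an illustration:

# A longer answer gives streaming time to be visible in the terminal.
task = Task(
    description="Write 10 numbered steps for debugging a failing API call.",
    expected_output="A numbered list with exactly 10 steps.",
    agent=agent,
)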

Next Steps

  • Wire the callback into a FastAPI endpoint using Server-Sent Events or WebSockets (a minimal SSE sketch follows this list).
  • Add per-agent prefixes so you can tell which agent is currently speaking in multi-agent crews, as sketched at the end of step 5.
  • Learn how to persist streamed messages into Redis or Postgres for audit logs and replay.
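
Here is one way the first bullet could look. This is a sketch under the same assumptions as the steps above, not CrewAI's official FastAPI integration: the /chat route is illustrative, and the queue-plus-thread bridge is just one way to connect a blocking kickoff to a streaming response.

import queue
import threading

from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/chat")
def chat():
    chunks: queue.Queue = queue.Queue()

    def on_chunk(chunk):
        # Same chunk handling as in the steps above, but pushed to a queue.
        text = getattr(chunk, "content", None) or str(chunk)
        chunks.put(text)

    def run_crew():
        llm = LLM(model="gpt-4o-mini", stream=True, callbacks=[on_chunk])
        agent = Agent(
            role="Help Desk Agent",
            goal="Answer questions with streamed output",
            backstory="You are concise and practical.",
            llm=llm,
        )
        task = Task(
            description="Give three bullet points on when to use streaming.",
            expected_output="Three bullet points.",
            agent=agent,
        )
        Crew(agents=[agent], tasks=[task], process=Process.sequential).kickoff()
        chunks.put(None)  # sentinel: the crew is done

    # Run the crew in the background so the response can start immediately.
    threading.Thread(target=run_crew, daemon=True).start()

    def event_stream():
        while True:
            text = chunks.get()
            if text is None:
                break
            yield f"data: {text}\n\n"  # one SSE frame per chunk

    return StreamingResponse(event_stream(), media_type="text/event-stream")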

By Cyprian Aarons, AI Consultant at Topiax.
