CrewAI Tutorial (Python): streaming agent responses for intermediate developers
This tutorial shows how to stream CrewAI agent output in Python so you can display partial responses as they’re generated instead of waiting for the final answer. You need this when you’re building chat UIs, internal copilots, or support workflows where latency and user feedback matter.
What You'll Need
- Python 3.10 or newer
- crewai
- OpenAI API access through an environment variable
- A valid OPENAI_API_KEY
- Basic familiarity with CrewAI agents, tasks, and crews
- A terminal and a Python virtual environment
Install the packages (the token-streaming examples later in this tutorial also use langchain-openai):
pip install crewai langchain-openai
Set your API key:
export OPENAI_API_KEY="your-key-here"
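Optionally, fail fast if the key is missing instead of hitting an error deep inside a crew run. A small sketch:

import os

# Stop early with a clear message if the key was never exported.
if not os.getenv("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; export it before running the examples.")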
Step-by-Step
- Start with a minimal agent and task setup. The key idea is that streaming is controlled at execution time, not by changing the agent itself.
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Research Analyst",
    goal="Summarize technical topics clearly",
    backstory="You are concise and accurate.",
    verbose=True,
)

task = Task(
    description="Explain what streaming agent responses means in one paragraph.",
    expected_output="A short technical explanation.",
    agent=researcher,
)

crew = Crew(
    agents=[researcher],
    tasks=[task],
    process=Process.sequential,
)
- Run the crew and watch the output arrive. In current CrewAI versions, the simplest production-friendly pattern is to use kickoff() with verbose agents and let the console output from the underlying LLM client show progress as it arrives.
import os

from crewai import Agent, Task, Crew, Process

# Re-read the key from the environment (already set via `export` above).
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY", "")

writer = Agent(
    role="Technical Writer",
    goal="Write clear explanations for developers",
    backstory="You write practical docs for production systems.",
    verbose=True,
)

task = Task(
    description="Write three bullet points about why streaming helps UX.",
    expected_output="Three concise bullets.",
    agent=writer,
)

crew = Crew(
    agents=[writer],
    tasks=[task],
    process=Process.sequential,
)

result = crew.kickoff()
print("\n\nFinal result:\n")
print(result)
- If you need true token-by-token streaming into your own application layer, wire the underlying LLM with a callback handler. This is useful when you want to push chunks into a websocket, SSE endpoint, or CLI spinner.
from crewai import Agent, Task, Crew
from langchain_core.callbacks.base import BaseCallbackHandler
from langchain_openai import ChatOpenAI


class PrintTokensHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)


llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    streaming=True,
    callbacks=[PrintTokensHandler()],
)

agent = Agent(
    role="Assistant",
    goal="Stream answers clearly",
    backstory="You respond in real time.",
    llm=llm,
)

task = Task(
    description="Explain how streaming works in one short paragraph.",
    expected_output="A short explanation.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
- Build a reusable wrapper so your app can switch between streamed and non-streamed runs without rewriting your crew logic. This pattern keeps your orchestration code clean.
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI


def build_crew(streaming: bool) -> Crew:
    llm = ChatOpenAI(
        model="gpt-4o-mini",
        temperature=0,
        streaming=streaming,
    )
    agent = Agent(
        role="Support Assistant",
        goal="Answer questions in plain language",
        backstory="You help users understand technical workflows.",
        llm=llm,
        verbose=True,
    )
    task = Task(
        description="Describe one benefit of response streaming.",
        expected_output="A short answer.",
        agent=agent,
    )
    return Crew(agents=[agent], tasks=[task])


crew = build_crew(streaming=False)
print(crew.kickoff())
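As a usage sketch, you could drive the flag from an environment variable so deployments can toggle streaming without code changes. The STREAM_TOKENS name below is just an example, not a CrewAI setting:

import os

# Hypothetical toggle: any value other than "true" disables streaming.
use_streaming = os.getenv("STREAM_TOKENS", "true").lower() == "true"
crew = build_crew(streaming=use_streaming)
print(crew.kickoff())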
- If you’re building a web app, send streamed tokens to the client instead of printing them. The exact transport depends on your stack, but the shape stays the same: callback receives token → forward token to UI.
from queue import Empty, Queue
from threading import Thread

from crewai import Agent, Task, Crew
from langchain_core.callbacks.base import BaseCallbackHandler
from langchain_openai import ChatOpenAI

token_queue: Queue[str] = Queue()


class QueueTokenHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        token_queue.put(token)


def run_crew():
    llm = ChatOpenAI(model="gpt-4o-mini", streaming=True, callbacks=[QueueTokenHandler()])
    agent = Agent(role="Assistant", goal="Stream output", backstory="You answer briefly.", llm=llm)
    task = Task(description="Say hello in a helpful way.", expected_output="Greeting", agent=agent)
    Crew(agents=[agent], tasks=[task]).kickoff()


# Keep a handle to the thread so the consumer can tell when the run has finished.
worker = Thread(target=run_crew)
worker.start()
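Continuing the same script, one way to drain the queue on the consumer side is to poll it while the worker thread is alive and forward each token onward. This is a sketch of the pattern, not a CrewAI API:

# Consumer loop: forward each token to your UI; here we just print it.
while worker.is_alive() or not token_queue.empty():
    try:
        token = token_queue.get(timeout=0.1)
    except Empty:
        continue
    print(token, end="", flush=True)  # replace with a websocket/SSE send in a real app

print()  # final newline once the stream is done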
Testing It
Run the script from your terminal and watch for output appearing incrementally rather than all at once. If you used a callback handler, you should see tokens printed as they arrive from the model.
If nothing streams, check three things first: OPENAI_API_KEY is set correctly, your installed versions of crewai and langchain-openai are compatible, and the model name supports streaming. For web apps or queues, confirm that your callback is actually attached to the LLM instance used by the agent.
A good sanity check is to compare streamed vs non-streamed behavior using the same task prompt. Non-streamed execution returns only after completion; streamed execution should show partial text before the full response is done.
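If you want a rough number for that comparison, here is a minimal sketch that records when the first streamed token arrives, assuming your CrewAI version accepts a LangChain chat model as the agent llm (as in the examples above). The FirstTokenTimer class is illustrative, not part of either library:

import time

from crewai import Agent, Task, Crew
from langchain_core.callbacks.base import BaseCallbackHandler
from langchain_openai import ChatOpenAI


class FirstTokenTimer(BaseCallbackHandler):
    # Illustrative helper: remembers when the first streamed token showed up.
    first_token_at: float | None = None

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        if self.first_token_at is None:
            self.first_token_at = time.monotonic()


timer = FirstTokenTimer()
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True, callbacks=[timer])
agent = Agent(role="Assistant", goal="Answer briefly", backstory="You are concise.", llm=llm)
task = Task(description="Name one benefit of streaming.", expected_output="One sentence.", agent=agent)

start = time.monotonic()
Crew(agents=[agent], tasks=[task]).kickoff()
end = time.monotonic()

if timer.first_token_at is not None:
    print(f"\nFirst token after ~{timer.first_token_at - start:.2f}s; full run took ~{end - start:.2f}s")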
Next Steps
- Add a FastAPI SSE endpoint so browser clients can consume streamed tokens directly (see the sketch after this list).
- Use multiple agents with one shared streamed LLM for coordinated live output.
- Add retry logic and timeouts around your callback path so dropped connections don’t break execution.
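For the first item, here is a rough sketch of what such an endpoint could look like, reusing the queue-plus-worker pattern from the web-app step and assuming fastapi (plus an ASGI server such as uvicorn) is installed. The /stream path and the build_streaming_crew helper are hypothetical names, not CrewAI or FastAPI APIs:

from queue import Empty, Queue
from threading import Thread

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


@app.get("/stream")
def stream():
    token_queue: Queue[str] = Queue()

    def run() -> None:
        # Hypothetical helper: builds a crew whose LLM pushes tokens into token_queue
        # via a QueueTokenHandler, then blocks until the run completes.
        build_streaming_crew(token_queue).kickoff()

    worker = Thread(target=run, daemon=True)
    worker.start()

    def event_source():
        # Server-Sent Events framing: "data: ..." followed by a blank line.
        # Real tokens can contain newlines, which would need escaping here.
        while worker.is_alive() or not token_queue.empty():
            try:
                token = token_queue.get(timeout=0.1)
            except Empty:
                continue
            yield f"data: {token}\n\n"

    return StreamingResponse(event_source(), media_type="text/event-stream")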
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.