# How to Fix "streaming response cutoff in production" in CrewAI (TypeScript)
## What this error means
A "streaming response cutoff in production" usually means CrewAI started streaming tokens back from a model, but the stream was terminated before the agent finished its response. In TypeScript projects, this shows up most often when the process exits early, the HTTP connection is closed, or your handler stops reading the stream.
You’ll typically see it in production behind a serverless runtime, reverse proxy, or API route that has a short timeout. It can also happen when you use AgentExecutor or Crew streaming output but don’t keep the request alive long enough to finish.
## The Most Common Cause
The #1 cause is that your runtime ends before the stream completes.
This happens a lot in Next.js API routes, serverless functions, and background jobs where the handler returns early. CrewAI keeps emitting chunks, but your process or response object is already gone.
### Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Returns before stream finishes | Awaits full completion |
| Doesn’t keep HTTP response open | Keeps response open until done |
| Uses fire-and-forget async call | Uses explicit await / stream drain |
```ts
// BROKEN: handler returns while CrewAI is still streaming
import { NextRequest, NextResponse } from "next/server";
import { Crew } from "@crew-ai/crewai";

export async function POST(req: NextRequest) {
  const crew = new Crew({
    // ...agents/tasks
    verbose: true,
    // streaming enabled somewhere in your setup
  });

  crew.kickoff(); // fire-and-forget: nothing waits for the stream
  return NextResponse.json({ ok: true });
}
```
```ts
// FIXED: wait for completion before returning
import { NextRequest, NextResponse } from "next/server";
import { Crew } from "@crew-ai/crewai";

export async function POST(req: NextRequest) {
  const crew = new Crew({
    // ...agents/tasks
    verbose: true,
  });

  const result = await crew.kickoff();
  return NextResponse.json({ ok: true, result });
}
```
If you’re using a streaming API, don’t just start the stream and exit. Drain it fully or pipe it to the client until completion.
```ts
// FIXED for streaming use cases: keep reading until done
const stream = await crew.kickoffStream();

for await (const chunk of stream) {
  console.log(chunk);
}
```
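In a route handler, "pipe it to the client" usually means wrapping that async iterator in a web `ReadableStream` so the HTTP response stays open until the last chunk arrives. Here is a minimal sketch using only standard web APIs available in Node 18+; `fakeKickoffStream` is a stand-in for `crew.kickoffStream()`, which is assumed to yield strings:

```typescript
// Stand-in for crew.kickoffStream(): an async iterator of text chunks.
async function* fakeKickoffStream(): AsyncGenerator<string> {
  yield "Analyzing claim... ";
  yield "Status: approved.";
}

// Wrap an async iterator of text chunks in a ReadableStream so a route
// handler can return it as a streaming Response body.
function toReadableStream(chunks: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const chunk of chunks) {
        controller.enqueue(encoder.encode(chunk)); // forward each chunk as bytes
      }
      controller.close(); // only close once the agent has finished
    },
  });
}
```

A Next.js route handler would then `return new Response(toReadableStream(stream))`, which keeps the connection open until the stream closes. (This sketch drains eagerly in `start` and does not apply backpressure, which is usually fine for token streams.)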
## Other Possible Causes
### 1) Reverse proxy timeout
Nginx, Cloudflare, Vercel, ALB, or an API gateway may cut idle or long-running responses.
```nginx
location /api/crew {
  proxy_read_timeout 300s;
  proxy_send_timeout 300s;
}
```
If your model call takes longer than the proxy timeout, you’ll get truncated output even if your app code is correct.
### 2) Serverless timeout
AWS Lambda, Vercel Functions, and similar runtimes have hard execution limits.
```ts
export const maxDuration = 60; // Vercel example
```
If your task routinely runs past that limit, move it to a queue worker or background job. Don’t try to force long LLM streams through a short-lived request lifecycle.
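A background-job version of this can be as small as "start the run, hand back an id, let the client poll". Below is a sketch of that shape; it is in-memory only, and in production you would back it with a real queue (BullMQ, SQS, etc.). The `run` callback stands in for your actual `crew.kickoff()` call:

```typescript
// Enqueue-and-poll sketch: the HTTP handler returns a job id immediately,
// and the long agent run completes in the background.
type Job = { id: string; status: "running" | "done" | "failed"; result?: string };

const jobs = new Map<string, Job>();

function startJob(run: () => Promise<string>): string {
  const id = Math.random().toString(36).slice(2, 10);
  jobs.set(id, { id, status: "running" });
  run()
    .then((result) => jobs.set(id, { id, status: "done", result }))
    .catch(() => jobs.set(id, { id, status: "failed" }));
  return id; // the POST handler responds with this id right away
}

function getJob(id: string): Job | undefined {
  return jobs.get(id); // a GET /status handler reads from here
}
```

A POST route calls `startJob(() => crew.kickoff())` and returns the id; a GET route exposes `getJob(id)` for the frontend to poll until the status flips to `done`.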
### 3) Client disconnects
If the browser closes the tab or your frontend aborts the request with AbortController, the backend stream dies too.
```ts
const controller = new AbortController();

fetch("/api/crew", {
  method: "POST",
  signal: controller.signal,
});
```
Check whether your frontend is cancelling requests on route changes or component unmounts.
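On the server side, the complement is to stop reading, and stop paying for tokens, as soon as the client is gone. A sketch of a drain loop that checks the request's `AbortSignal` between chunks; in a real route handler the signal would come from `req.signal`, and the chunk source from your streaming call:

```typescript
// Drain an agent stream, but bail out as soon as the client disconnects.
async function drainUnlessAborted(
  chunks: AsyncIterable<string>,
  signal: AbortSignal,
): Promise<string> {
  let out = "";
  for await (const chunk of chunks) {
    if (signal.aborted) break; // client went away: stop consuming the stream
    out += chunk;
  }
  return out;
}
```

Whatever you do with the partial output (discard it, persist it, log it), the important part is that the loop terminates instead of streaming into a dead socket.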
### 4) Token budget too large for the transport window
Sometimes the model is fine, but your response is too large for your app layer to hold open safely.
```ts
// Keep outputs bounded with explicit task constraints
const task = {
  description: "Summarize claim status in <= 200 words",
};

const crew = new Crew({
  agents,
  tasks: [task],
});
```
If you ask for huge outputs with streaming enabled, reduce max tokens, split work into smaller tasks, or persist partial results as they arrive.
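"Persist partial results as they arrive" can be a simple checkpoint callback: append each chunk and save, so a cutoff costs you at most the chunk in flight. A sketch, where the `save` callback is a placeholder for your database or cache write:

```typescript
// Accumulate streamed chunks and checkpoint after each one, so a cutoff
// never loses the whole response.
async function streamWithCheckpoints(
  chunks: AsyncIterable<string>,
  save: (partial: string) => Promise<void>,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
    await save(text); // persist progress; a crash now loses only unflushed work
  }
  return text;
}
```

Awaiting `save` on every chunk adds latency; batching every N chunks or every few seconds is a common middle ground.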
## How to Debug It
1. Check whether the process exits early
   - Add logs before and after `await crew.kickoff()`.
   - If "after" never prints, you're returning too soon or crashing mid-stream.
2. Inspect infrastructure timeouts
   - Check Vercel `maxDuration`, Lambda timeout, Nginx `proxy_read_timeout`, and Cloudflare limits.
   - Compare them against real model latency under load.
3. Test without streaming
   - Temporarily disable streaming and use plain `await crew.kickoff()`.
   - If non-streaming works, your issue is in stream handling rather than agent logic.
4. Add abort and disconnect logging
   - Log request cancellation events.
   - In Node handlers, confirm whether the client closed the connection before completion.

```ts
req.signal.addEventListener("abort", () => {
  console.error("Client aborted request");
});
```
## Prevention

- Use `await` on every CrewAI execution path unless you are explicitly piping a stream end-to-end.
- Keep LLM responses bounded with clear task constraints like word limits, JSON output formats, or chunked subtasks.
- Match your runtime to the workload:
  - short HTTP requests for short tasks
  - queues/workers for long agent runs
If you’re building a bank or insurance workflow, treat streamed agent output like any other production socket:

- keep it open intentionally
- time it out intentionally
- log cancellations intentionally
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.