How to Fix 'streaming response cutoff when scaling' in LangGraph (TypeScript)
What the error means
streaming response cutoff when scaling usually shows up when a LangGraph app streams tokens or events correctly on one machine, then starts dropping or truncating responses once you add more traffic, more replicas, or a proxy/load balancer in front of it. In practice, it means the stream was interrupted before the full response reached the client.
In TypeScript apps, this is often not a LangGraph bug. It’s usually a deployment issue: the request handler closes too early, the stream isn’t drained correctly, or your infra is buffering/chopping chunked responses.
The Most Common Cause
The #1 cause is returning from your route before the stream is fully consumed, or using a server/runtime that buffers the response instead of keeping it open for SSE/chunked output.
With LangGraph’s CompiledStateGraph.stream() or streamEvents(), you must keep the HTTP connection alive until the stream ends. If you wrap it in a framework handler that auto-serializes JSON, you’ll get partial output and errors like:
- Error: streaming response cutoff when scaling
- TypeError: Cannot read properties of undefined
- AbortError: The operation was aborted
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Returns JSON immediately after starting stream | Pipes chunks to ReadableStream / SSE until completion |
| Lets framework buffer the response | Uses streaming response headers and flushes chunks |
| Closes handler before LangGraph finishes | Awaits iterator completion |
// BROKEN: returns before the LangGraph stream is fully consumed
import { NextRequest } from "next/server";
import { graph } from "./graph";

export async function POST(req: NextRequest) {
  const body = await req.json();
  const stream = await graph.stream(
    { messages: body.messages },
    { configurable: { thread_id: body.threadId } }
  );
  // Wrong: this does not actually forward streamed chunks to the client
  return Response.json({ ok: true, stream });
}
// FIXED: keep the connection open and forward chunks as they arrive
import { NextRequest } from "next/server";
import { graph } from "./graph";

export async function POST(req: NextRequest) {
  const body = await req.json();
  const encoder = new TextEncoder();
  const stream = await graph.stream(
    { messages: body.messages },
    { configurable: { thread_id: body.threadId } }
  );

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`)
          );
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
    },
  });
}
The same rule applies if you're using streamEvents(): keep iterating until the event stream completes, and don't serialize it to JSON early.
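Since both stream() and streamEvents() return async iterables, one way to avoid repeating the forwarding logic is a small helper that turns any async iterable into an SSE ReadableStream. This is a sketch, not a LangGraph API; the streamEvents call in the trailing comment (including the version option) reflects LangGraph's documented usage but the helper itself is generic:

```typescript
// Sketch: convert any async iterable (graph.stream() or graph.streamEvents()
// output) into an SSE-formatted ReadableStream that stays open to completion.
function sseFromAsyncIterable<T>(source: AsyncIterable<T>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      try {
        for await (const item of source) {
          // One SSE frame per chunk; the blank line terminates the frame
          controller.enqueue(encoder.encode(`data: ${JSON.stringify(item)}\n\n`));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });
}

// Illustrative usage with streamEvents (assumes a compiled graph):
// const events = graph.streamEvents(input, { version: "v2" });
// return new Response(sseFromAsyncIterable(events), {
//   headers: { "Content-Type": "text/event-stream" },
// });
```

Because the helper only depends on the AsyncIterable protocol, you can reuse it unchanged whether you stream state updates, messages, or low-level events.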
Other Possible Causes
1. Load balancer or reverse proxy buffering
Nginx, Cloudflare, ALB, and some API gateways buffer responses by default. That breaks token streaming under load.
location /api/chat {
    proxy_buffering off;
    proxy_cache off;
    proxy_http_version 1.1;
    chunked_transfer_encoding on;
    # raise if your graph can legitimately stream for longer
    proxy_read_timeout 300s;
}
If you’re behind Nginx and forget this, your app may work locally but cut off in staging or prod.
2. Serverless timeout or cold start limits
If your Lambda/edge function times out mid-stream, LangGraph will look like it failed randomly during scaling.
export const maxDuration = 60; // Next.js route handler example
Also check platform limits:
- Vercel function duration
- AWS Lambda timeout
- API Gateway idle timeout
- Cloud Run request timeout
If your graph can run longer than the platform timeout, streaming will be cut off even if the code is correct.
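When you can't raise the platform limit, it helps to stop the stream on purpose before the platform kills it, so the client receives an explicit marker instead of a dead socket. A minimal sketch, assuming you wrap the graph's async iterable in a deadline guard (the budget value and the {type: "timeout"} marker shape are both illustrative choices, not LangGraph conventions):

```typescript
// Sketch: stop consuming a stream shortly before the platform deadline so the
// client gets a clean "stopped on purpose" marker instead of an abrupt cutoff.
async function* withDeadline<T>(
  source: AsyncIterable<T>,
  budgetMs: number
): AsyncGenerator<T | { type: "timeout" }> {
  const deadline = Date.now() + budgetMs;
  for await (const item of source) {
    if (Date.now() > deadline) {
      yield { type: "timeout" }; // client can render "response truncated"
      return;
    }
    yield item;
  }
}

// Usage sketch: for a 60s platform limit, keep a few seconds of slack:
// for await (const chunk of withDeadline(await graph.stream(input), 55_000)) { ... }
```

The client can then distinguish "the model finished" from "we hit the platform limit" and, for example, offer a retry that resumes from the last checkpoint.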
3. Missing heartbeat/keep-alive behavior
Some proxies kill idle connections if no data is sent for a few seconds. If your graph has long tool calls between tokens, the stream can die mid-flight.
const encoder = new TextEncoder();

const readable = new ReadableStream({
  async start(controller) {
    // Send an SSE comment every 15s so idle proxies keep the socket open
    const ping = setInterval(() => {
      controller.enqueue(encoder.encode(": ping\n\n"));
    }, 15000);
    try {
      for await (const chunk of await graph.stream(input)) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
      }
      controller.close();
    } catch (err) {
      controller.error(err);
    } finally {
      clearInterval(ping);
    }
  },
});
That : ping comment keeps SSE connections alive through idle periods.
4. State explosion during scaling
If each request carries huge state objects through LangGraph checkpoints, replicas may stall or fail under memory pressure. Then you see truncated streams instead of clean errors.
Bad pattern:
const input = {
messages,
hugeDocumentBlob,
entireConversationHistory,
};
Better pattern:
const input = {
messages,
};
Store large artifacts outside graph state. Pass references, IDs, or retrieval keys instead of raw blobs.
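One way to sketch this, assuming an external store (the docStore Map here is a stand-in for S3, Redis, or a database, and saveDocument/loadDocument are hypothetical helpers, not LangGraph APIs):

```typescript
// Sketch: keep large artifacts in an external store and put only an ID in
// graph state, so every checkpoint stays a few bytes instead of megabytes.
const docStore = new Map<string, string>();

function saveDocument(content: string): string {
  const id = `doc-${docStore.size + 1}`;
  docStore.set(id, content);
  return id;
}

function loadDocument(id: string): string | undefined {
  return docStore.get(id);
}

// State stays tiny: the node that needs the blob fetches it by ID.
const docId = saveDocument("...very large PDF text...");
const input = {
  messages: [{ role: "user" as const, content: "Summarize the report" }],
  documentId: docId,
};
```

Any node that needs the full document calls loadDocument(state.documentId) at execution time, so checkpointing and replication never copy the blob itself.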
How to Debug It
1. Reproduce locally with production-like streaming

- Hit the endpoint with curl -N.
- If output stops early locally, it's your code.
- If local works but prod fails, it's infra buffering or timeout.

2. Log when the iterator starts and ends

- Add logs before for await, on each chunk, and after completion.
- If you never hit "completed", something closed the connection early.

console.log("stream started");
for await (const chunk of stream) {
  console.log("chunk", chunk);
}
console.log("stream completed");
3. Check proxy and platform limits

- Look at Nginx proxy_buffering
- Check ALB idle timeout
- Check serverless duration limits
- Check whether your host supports SSE properly

4. Reduce state and remove tool latency

- Remove large payloads from state.
- Temporarily stub slow tools.
- If cutoff disappears, you're hitting timeout/pressure during scale-up.
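While debugging with curl -N, it also helps to count how many SSE frames actually arrived before the cutoff. A minimal parser sketch (it handles single-line data frames and ignores comments, which covers the format used in this article; multi-line SSE data fields are out of scope):

```typescript
// Sketch: split a raw SSE buffer into its data payloads. Useful in a debug
// client to count how many chunks arrived before the stream was cut off.
function parseSSE(raw: string): string[] {
  return raw
    .split("\n\n")                               // frames end with a blank line
    .map((frame) => frame.trim())
    .filter((frame) => frame.startsWith("data: ")) // drop ": ping" comments
    .map((frame) => frame.slice("data: ".length));
}
```

Pipe the captured curl output through this and compare the frame count against the chunk count in your server logs; a mismatch tells you the cutoff happened between the server and the client, not inside the graph.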
Prevention
- Use real streaming responses end-to-end: ReadableStream, SSE headers, no premature JSON serialization.
- Keep LangGraph state small: store references instead of blobs.
- Test under load before shipping: run multiple concurrent requests with curl, k6, or Artillery.
- Verify infra settings: disable proxy buffering and raise timeouts where needed.
If you want one rule to remember: LangGraph can only stream as long as your runtime keeps the socket open. Most “cutoff when scaling” issues are just that socket getting closed by code or infrastructure before the graph finishes.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.