How to Fix 'streaming response cutoff when scaling' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

What the error means

streaming response cutoff when scaling usually shows up when a LangGraph app streams tokens or events correctly on one machine, then starts dropping or truncating responses once you add more traffic, more replicas, or a proxy/load balancer in front of it. In practice, it means the stream was interrupted before the full response reached the client.

In TypeScript apps, this is often not a LangGraph bug. It’s usually a deployment issue: the request handler closes too early, the stream isn’t drained correctly, or your infra is buffering/chopping chunked responses.

The Most Common Cause

The #1 cause is returning from your route before the stream is fully consumed, or using a server/runtime that buffers the response instead of keeping it open for SSE/chunked output.

With LangGraph’s CompiledStateGraph.stream() or streamEvents(), you must keep the HTTP connection alive until the stream ends. If you wrap it in a framework handler that auto-serializes JSON, you’ll get partial output and errors like:

  • Error: streaming response cutoff when scaling
  • TypeError: Cannot read properties of undefined
  • AbortError: The operation was aborted

Broken vs fixed pattern

| Broken | Fixed |
| --- | --- |
| Returns JSON immediately after starting the stream | Pipes chunks to a ReadableStream / SSE until completion |
| Lets the framework buffer the response | Uses streaming response headers and flushes chunks |
| Closes the handler before LangGraph finishes | Awaits iterator completion |

// BROKEN: returns before the LangGraph stream is fully consumed
import { NextRequest } from "next/server";
import { graph } from "./graph";

export async function POST(req: NextRequest) {
  const body = await req.json();

  const stream = await graph.stream(
    { messages: body.messages },
    { configurable: { thread_id: body.threadId } }
  );

  // Wrong: this does not actually forward streamed chunks to the client
  return Response.json({ ok: true, stream });
}
// FIXED: keep the connection open and forward chunks as they arrive
import { NextRequest } from "next/server";
import { graph } from "./graph";

export async function POST(req: NextRequest) {
  const body = await req.json();

  const encoder = new TextEncoder();
  const stream = await graph.stream(
    { messages: body.messages },
    { configurable: { thread_id: body.threadId } }
  );

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`)
          );
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
    },
  });
}

If you’re using streamEvents(), the same rule applies: keep the connection open and forward events until the iterator finishes, rather than converting everything to JSON up front.
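As a sketch of that pattern with streamEvents(): it assumes the same graph and route shape as the fixed handler above, and it forwards only on_chat_model_stream events; filter on whichever events your UI actually needs, and check which events API version ({ version: "v2" } below) your dependencies expose.

// Sketch: forward streamEvents() output as SSE, same keep-alive rule as graph.stream()
import { NextRequest } from "next/server";
import { graph } from "./graph";

export async function POST(req: NextRequest) {
  const body = await req.json();
  const encoder = new TextEncoder();

  const events = graph.streamEvents(
    { messages: body.messages },
    { version: "v2", configurable: { thread_id: body.threadId } }
  );

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const event of events) {
          // Forward token chunks only; adjust the filter to the events you care about
          if (event.event === "on_chat_model_stream") {
            controller.enqueue(encoder.encode(`data: ${JSON.stringify(event.data)}\n\n`));
          }
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
    },
  });
}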

Other Possible Causes

1. Load balancer or reverse proxy buffering

Nginx, Cloudflare, ALB, and some API gateways buffer responses by default. That breaks token streaming under load.

location /api/chat {
  proxy_buffering off;
  proxy_cache off;
  chunked_transfer_encoding on;
}

If you’re behind Nginx and forget this, your app may work locally but cut off in staging or prod.
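If changing the proxy config isn’t an option, Nginx also honors a per-response X-Accel-Buffering header (other proxies may ignore it). A minimal sketch, applied to the fixed handler’s response above:

// Sketch: disable Nginx buffering for this response only
return new Response(readable, {
  headers: {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform",
    Connection: "keep-alive",
    "X-Accel-Buffering": "no", // respected by Nginx; harmless elsewhere
  },
});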

2. Serverless timeout or cold start limits

If your Lambda/edge function times out mid-stream, LangGraph will look like it failed randomly during scaling.

export const maxDuration = 60; // Next.js route handler example

Also check platform limits:

  • Vercel function duration
  • AWS Lambda timeout
  • API Gateway idle timeout
  • Cloud Run request timeout

If your graph can run longer than the platform timeout, streaming will be cut off even if the code is correct.
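To confirm the connection is being torn down from the outside rather than by your code, one option is to watch the request’s abort signal; client disconnects and some proxy terminations surface there, though a hard platform timeout may kill the function without ever firing it. A minimal sketch:

// Sketch: log external disconnects in a Next.js route handler
import { NextRequest } from "next/server";

export async function POST(req: NextRequest) {
  req.signal.addEventListener("abort", () => {
    console.warn("request aborted before the stream completed");
  });

  // ...build and return the streaming Response exactly as in the fixed example above
  return new Response(null, { status: 200 });
}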

3. Missing heartbeat/keep-alive behavior

Some proxies kill idle connections if no data is sent for a few seconds. If your graph has long tool calls between tokens, the stream can die mid-flight.

const encoder = new TextEncoder();

const readable = new ReadableStream({
  async start(controller) {
    // SSE comment sent every 15s so idle proxies don't close the connection
    const ping = setInterval(() => {
      controller.enqueue(encoder.encode(": ping\n\n"));
    }, 15000);

    try {
      for await (const chunk of await graph.stream(input)) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
      }
      controller.close();
    } catch (err) {
      controller.error(err);
    } finally {
      clearInterval(ping);
    }
  },
});

That : ping comment keeps SSE connections alive through idle periods.

4. State explosion during scaling

If each request carries huge state objects through LangGraph checkpoints, replicas may stall or fail under memory pressure. Then you see truncated streams instead of clean errors.

Bad pattern:

const input = {
  messages,
  hugeDocumentBlob,
  entireConversationHistory,
};

Better pattern:

const input = {
  messages,
};

Store large artifacts outside graph state. Pass references, IDs, or retrieval keys instead of raw blobs.
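A sketch of what that looks like in practice; fetchDocument, documentId, and the storage URL below are hypothetical placeholders for whatever store you actually use:

// Sketch: state carries a reference; the blob is hydrated only inside the node that needs it
interface ChatState {
  messages: unknown[];
  documentId: string; // reference, not the raw document
}

// Hypothetical storage helper; swap in S3, a database, or a vector store lookup
async function fetchDocument(id: string): Promise<string> {
  const res = await fetch(`https://storage.example.com/docs/${id}`);
  return res.text();
}

async function summarizeNode(state: ChatState) {
  // Loaded on demand, so it never bloats checkpoints or inter-replica traffic
  const doc = await fetchDocument(state.documentId);
  return { messages: [...state.messages, `summary based on ${doc.length} chars`] };
}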

How to Debug It

  1. Reproduce locally with production-like streaming

    • Hit the endpoint with curl -N.
    • If output stops early locally, it’s your code.
    • If local works but prod fails, it’s infra buffering or timeout.
  2. Log when the iterator starts and ends

    • Add logs before for await, on each chunk, and after completion.
    • If you never hit “completed”, something closed the connection early.
console.log("stream started");
for await (const chunk of stream) {
  console.log("chunk", chunk);
}
console.log("stream completed");
  3. Check proxy and platform limits

    • Look at Nginx proxy_buffering
    • Check ALB idle timeout
    • Check serverless duration limits
    • Check whether your host supports SSE properly
  4. Reduce state and remove tool latency

    • Remove large payloads from state.
    • Temporarily stub slow tools.
    • If cutoff disappears, you’re hitting timeout/pressure during scale-up.

Prevention

  • Use real streaming responses end-to-end:
    • ReadableStream, SSE headers, no premature JSON serialization.
  • Keep LangGraph state small:
    • store references instead of blobs.
  • Test under load before shipping:
    • run multiple concurrent requests with curl, k6, or Artillery (see the sketch after this list).
  • Verify infra settings:
    • disable proxy buffering and raise timeouts where needed.
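For a quick smoke test without extra tooling, here is a minimal TypeScript sketch that fires concurrent requests and checks that each stream reaches its end; the endpoint URL and body shape are assumptions based on the handler above:

// Sketch: fire N concurrent streaming requests and verify each one completes
const ENDPOINT = "http://localhost:3000/api/chat"; // assumed local endpoint
const CONCURRENCY = 20;

async function drain(res: Response): Promise<number> {
  const reader = res.body!.getReader();
  let bytes = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    bytes += value.length;
  }
  return bytes;
}

async function main() {
  const results = await Promise.allSettled(
    Array.from({ length: CONCURRENCY }, (_, i) =>
      fetch(ENDPOINT, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          messages: [{ role: "user", content: "hi" }],
          threadId: `load-${i}`,
        }),
      }).then(drain)
    )
  );

  for (const r of results) {
    console.log(r.status === "fulfilled" ? `completed, ${r.value} bytes` : `cut off: ${r.reason}`);
  }
}

main();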

If you want one rule to remember: LangGraph can only stream as long as your runtime keeps the socket open. Most “cutoff when scaling” issues are just that socket getting closed by code or infrastructure before the graph finishes.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
