How to Fix 'streaming response cutoff' in LangChain (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: streaming-response-cutoff, langchain, typescript

If you’re seeing streaming response cutoff in a LangChain TypeScript app, it usually means the stream ended before LangChain finished receiving or forwarding all tokens. In practice, this shows up when you wire streaming incorrectly, close the process too early, or mix streaming and non-streaming APIs in the same chain.

The error often appears with ChatOpenAI, RunnableSequence, AgentExecutor, or any custom callback setup where token chunks are being handled asynchronously.

The Most Common Cause

The #1 cause is not consuming the stream to completion. In TypeScript, people often start streaming and then return from the request handler too early, especially in Next.js route handlers, Express handlers, or serverless functions.

Here’s the broken pattern:

Broken: starts streaming but doesn't fully consume it.
Fixed: awaits the stream until completion.
// Broken: returns before the stream is fully consumed
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
});

export async function POST(req: Request) {
  const result = await model.stream("Write a short summary");

  // Bug: if you don't iterate the stream, tokens never finish flushing
  return new Response("Started streaming");
}
// Fixed: consume the stream completely
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
});

export async function POST(req: Request) {
  const stream = await model.stream("Write a short summary");

  let output = "";
  for await (const chunk of stream) {
    output += chunk.content ?? "";
  }

  return new Response(output);
}

If you’re using callbacks instead of direct iteration, the same rule applies: don’t let the request lifecycle end before handleLLMNewToken finishes running.
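
For example, here's a minimal sketch of the callback pattern, assuming an inline handler and a placeholder prompt. Awaiting invoke() keeps the request alive until the run has emitted its last token:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
});

// Awaiting invoke() means the handler below keeps firing until the
// model is done, instead of being cut off by an early return.
await model.invoke("Write a short summary", {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        process.stdout.write(token);
      },
    },
  ],
});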

A very common stack trace looks like this:

Error: streaming response cutoff
    at CallbackManagerForLLMRun.handleLLMNewToken (...)
    at ChatOpenAI._streamResponseChunks (...)

That usually means LangChain was still emitting chunks when your code already exited.

Other Possible Causes

1. Mixing streaming and non-streaming config

If streaming: true is set on the model but your downstream code expects a single final message, the mismatch makes it easy to mishandle partial output.

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
});

const response = await model.invoke("Hello");
// Wrong assumption: invoke() gives you streamed chunks

Fix it by matching API style to your intent:

const response = await model.invoke("Hello"); // non-streaming usage

Or:

const stream = await model.stream("Hello"); // streaming usage
for await (const chunk of stream) {
  process.stdout.write(String(chunk.content ?? ""));
}

2. Aborting the request too aggressively

In serverless or HTTP environments, an AbortController can kill the request before LangChain flushes all tokens.

const controller = new AbortController();

// Abort after 1 second, which is far too short for most generations
setTimeout(() => controller.abort(), 1000);

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
});

// The signal is passed as a call option, not a constructor option
const stream = await model.stream("Generate text", {
  signal: controller.signal,
});

If your timeout is too short, increase it or remove it for long generations.
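
If you keep a guardrail, give it a budget that covers long generations and clear it once the stream has been consumed. A rough sketch, reusing the model above (the two-minute budget is an arbitrary example):

const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 120_000); // generous budget

const stream = await model.stream("Generate text", {
  signal: controller.signal,
});

for await (const chunk of stream) {
  process.stdout.write(String(chunk.content ?? ""));
}

// Don't let the timer fire after the stream has already completed
clearTimeout(timer);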

3. Callback handler not awaiting async work

If your custom callback does file writes, DB inserts, or websocket sends inside handleLLMNewToken, those operations must be awaited properly.

import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

class MyHandler extends BaseCallbackHandler {
  name = "my-handler";

  async handleLLMNewToken(token: string) {
    // Bad if the caller doesn't wait for this to finish
    await saveTokenToDb(token);
  }
}

Use a proper callback manager setup and make sure your transport layer stays open until all async side effects complete.
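
One way to do that, reusing the MyHandler class above, is to attach the handler per call and await the run, so execution doesn't move on while tokens are still arriving:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
});

// Attaching the handler per call and awaiting the run keeps the
// process alive while tokens stream, so the async writes inside
// handleLLMNewToken aren't cut off by an early return.
await model.invoke("Generate text", {
  callbacks: [new MyHandler()],
});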

4. Runtime closes before flush in Next.js / serverless

This one hits hard in route handlers that return immediately after starting a stream.

export async function POST() {
  const stream = await model.stream("Generate text");
  // Returning here cuts off token delivery
  return new Response("ok");
}

Use a real streamed response body instead:

export async function POST() {
  const stream = await model.stream("Generate text");

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(String(chunk.content ?? "")));
      }
      controller.close();
    },
  });

  return new Response(readable);
}
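
Whatever consumes that route has to read the body to the end as well. A minimal client-side sketch, assuming the handler above is mounted at /api/generate:

const res = await fetch("/api/generate", { method: "POST" });
const reader = res.body!.getReader();
const decoder = new TextDecoder();

let text = "";
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // Accumulate each decoded chunk as it arrives
  text += decoder.decode(value, { stream: true });
}

console.log(text);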

How to Debug It

  1. Check whether you are actually consuming the stream

    • Search for stream() calls without for await.
    • If you call model.stream(...), make sure every chunk is read to completion.
  2. Compare invoke() vs stream() usage

    • If you want one final answer, use invoke().
    • If you want partial tokens, use stream() and keep the connection open.
  3. Inspect timeouts and abort signals

    • Look for AbortController, reverse proxy timeouts, Lambda time limits, or framework defaults.
    • A cutoff that happens at a consistent timestamp is usually infrastructure-related.
  4. Log callback lifecycle events

    • Add logs in:
      • handleLLMStart
      • handleLLMNewToken
      • handleLLMEnd
      • handleLLMError
    • If you see tokens stop mid-generation without handleLLMEnd, your transport is closing early.
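
A minimal logging handler for step 4 might look like this; the class name and log format are just illustrations, and model is any instance like the ones above:

import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

class LifecycleLogger extends BaseCallbackHandler {
  name = "lifecycle-logger";

  handleLLMStart() {
    console.log("[llm] start", new Date().toISOString());
  }

  handleLLMNewToken(token: string) {
    console.log("[llm] token", JSON.stringify(token));
  }

  handleLLMEnd() {
    console.log("[llm] end", new Date().toISOString());
  }

  handleLLMError(err: Error) {
    console.error("[llm] error", err);
  }
}

// Attach it to any call you want to trace
await model.invoke("Generate text", { callbacks: [new LifecycleLogger()] });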

Prevention

  • Use one pattern per endpoint:

    • non-streaming with invoke()
    • streaming with stream() plus full consumption
  • Keep route handlers alive until output is flushed.

    • In Next.js and serverless runtimes, return a real streamed body instead of a placeholder response.
  • Set explicit timeouts and test long outputs.

    • Short prompts hide this bug.
    • Long completions expose it immediately.
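
A quick way to exercise the long-output case, reusing the model from earlier (the prompt is arbitrary; anything that forces a few thousand tokens will do):

// Smoke test: a deliberately long completion surfaces early cutoffs
const stream = await model.stream(
  "Write a detailed, multi-section report on the history of typography"
);

let chars = 0;
for await (const chunk of stream) {
  chars += String(chunk.content ?? "").length;
}

// Far fewer characters than expected means something upstream is
// closing the stream early.
console.log(`received ${chars} characters`);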

If you’re still stuck, look at where execution stops relative to handleLLMNewToken. In almost every case I’ve seen, the fix is not in LangChain itself — it’s in how the app consumes or terminates the stream.

