How to Fix 'intermittent 500 errors when scaling' in CrewAI (Python)

By Cyprian Aarons
Updated 2026-04-21

Intermittent 500 errors when scaling in CrewAI usually mean your agents are fine in a single-process test, but something breaks under concurrent load. In practice, this shows up when you run more tasks, more workers, or multiple requests at the same time and the underlying LLM client, tool code, or shared state can’t keep up.

The key detail: this is rarely a “CrewAI is broken” problem. It’s usually a concurrency, rate-limit, or shared-resource issue that only becomes visible once traffic increases.

The Most Common Cause

The #1 cause is shared mutable state across tasks or agents.

If you reuse the same tool instance, client object, file handle, database session, or global variable across multiple CrewAI runs, one request can corrupt another. Under light load it looks fine. Under scaling, you start seeing failures like:

  • InternalServerError: 500 Server Error
  • openai.InternalServerError
  • httpx.ReadTimeout
  • litellm.exceptions.InternalServerError

Broken pattern vs fixed pattern

Broken                            Fixed
Shared singleton client/tool      Create per-run or per-task instances
Global mutable state              Pass immutable config only
Reused DB session / file handle   Open resources inside task scope

# BROKEN: shared state across concurrent tasks
from crewai import Agent, Task, Crew
from crewai.tools import tool  # decorator that turns a function into a CrewAI tool
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # one client shared by every run
shared_context = {"attempts": 0}       # module-level mutable state

@tool("risky_tool")
def risky_tool() -> str:
    """Count attempts. Every concurrent run mutates the same dict."""
    shared_context["attempts"] += 1
    return f"attempt={shared_context['attempts']}"

agent = Agent(
    role="Support Analyst",
    goal="Resolve incidents",
    backstory="Handles customer issues",
    llm=llm,
    tools=[risky_tool],
)

tasks = [
    Task(description="Investigate ticket A", expected_output="Root cause", agent=agent),
    Task(description="Investigate ticket B", expected_output="Root cause", agent=agent),
]

crew = Crew(agents=[agent], tasks=tasks)
result = crew.kickoff()

# FIXED: isolate state per task/run
from crewai import Agent, Task, Crew
from crewai.tools import tool
from langchain_openai import ChatOpenAI

def make_tool():
    attempts = {"count": 0}  # private to this tool instance

    @tool("safe_counter")
    def safe_counter() -> str:
        """Count attempts for this run only."""
        attempts["count"] += 1
        return f"attempt={attempts['count']}"

    return safe_counter

def build_agent():
    return Agent(
        role="Support Analyst",
        goal="Resolve incidents",
        backstory="Handles customer issues",
        llm=ChatOpenAI(model="gpt-4o-mini"),  # fresh client per agent
        tools=[make_tool()],                  # fresh tool per agent
    )

agents = [build_agent() for _ in range(2)]
tasks = [
    Task(description="Investigate ticket A", expected_output="Root cause", agent=agents[0]),
    Task(description="Investigate ticket B", expected_output="Root cause", agent=agents[1]),
]

crew = Crew(agents=agents, tasks=tasks)
result = crew.kickoff()

If you’re running this behind an API server like FastAPI or Flask with multiple requests in flight, this is the first thing to fix.
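
If requests must share nothing, build everything per request. Here is a minimal factory sketch; build_crew and the ticket parameter are illustrative names, not CrewAI APIs:

# Per-request factory: nothing created here outlives one request
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

def build_crew(ticket: str) -> Crew:
    agent = Agent(
        role="Support Analyst",
        goal="Resolve incidents",
        backstory="Handles customer issues",
        llm=ChatOpenAI(model="gpt-4o-mini"),  # fresh client per request
    )
    task = Task(
        description=f"Investigate ticket {ticket}",
        expected_output="Root cause summary",
        agent=agent,
    )
    return Crew(agents=[agent], tasks=[task])

result = build_crew("A").kickoff()  # each caller gets its own crew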

Other Possible Causes

1) LLM provider rate limits under burst traffic

When you scale requests, your provider may start returning transient 429 responses that bubble up as 500 in your app.

# Add retries and backoff at the LLM layer, but only for transient errors
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from litellm import completion
from litellm.exceptions import InternalServerError, RateLimitError

@retry(
    retry=retry_if_exception_type((RateLimitError, InternalServerError)),
    wait=wait_exponential(min=1, max=8),
    stop=stop_after_attempt(3),
)
def call_llm(prompt: str):
    return completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
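
CrewAI also exposes a built-in throttle. If your version supports the max_rpm option, capping requests per minute at the source is a simpler first line of defense (the limit of 10 below is an assumption; match it to your provider tier):

# Throttle LLM call rate at the crew level; max_rpm is also accepted per Agent
crew = Crew(agents=agents, tasks=tasks, max_rpm=10)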

2) Tool code is not thread-safe

A tool writing to the same file path, temp file, cache key, or database row can fail intermittently.

# BAD: every request writes to the same file
def export_report(data):
    with open("/tmp/report.json", "w") as f:
        f.write(data)

# GOOD: unique per-run path
import uuid
from pathlib import Path

def export_report(data):
    path = Path("/tmp") / f"report-{uuid.uuid4().hex}.json"
    with open(path, "w") as f:
        f.write(data)
    return str(path)
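
If a resource genuinely must be shared, make the sharing explicit and thread-safe. A minimal sketch using threading.Lock from the standard library (the cache and its keys are illustrative):

import threading

_cache: dict[str, str] = {}
_cache_lock = threading.Lock()

def cache_result(key: str, value: str) -> None:
    with _cache_lock:  # only one thread mutates the shared dict at a time
        _cache[key] = value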

3) Context window overflow from growing conversation history

If you keep appending messages across runs, one request may succeed and the next may fail with provider-side errors.

# BAD: unbounded history growth passed into every kickoff
conversation_history.append(user_message)
crew.kickoff(inputs={"history": "\n".join(conversation_history)})

# GOOD: trim history before each call
MAX_MESSAGES = 12
trimmed_history = conversation_history[-MAX_MESSAGES:]
crew.kickoff(inputs={"history": "\n".join(trimmed_history)})
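
Trimming by message count is crude when message lengths vary. A character-budget variant (a sketch; the ~4-characters-per-token ratio is a rough heuristic, not an exact tokenizer):

# Keep the newest messages that fit a rough character budget
MAX_CHARS = 24_000  # roughly 6k tokens at ~4 chars/token

def trim_to_budget(messages: list[str], max_chars: int = MAX_CHARS) -> list[str]:
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk newest-first
        total += len(msg)
        if total > max_chars:
            break
        kept.append(msg)
    return list(reversed(kept))  # restore chronological order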

4) Async/sync mismatch in your app server

CrewAI code called from an async endpoint can block the event loop if you use sync tools or synchronous LLM calls.

# FastAPI example: offload blocking work so the event loop stays free
from fastapi import FastAPI
import asyncio

app = FastAPI()

@app.post("/run")
async def run_crew():
    crew = build_crew("A")  # per-request instance, from the factory shown earlier
    result = await asyncio.to_thread(crew.kickoff)
    return {"result": str(result)}

How to Debug It

  1. Check whether failures correlate with concurrency

    • Run one request at a time.
    • Then run 5–10 parallel requests.
    • If errors appear only under load, suspect shared state or rate limits (see the load-test sketch after this list).
  2. Log the real exception chain

    • Don’t stop at 500.
    • Capture the full traceback and inspect for:
      • openai.InternalServerError
      • httpx.TimeoutException
      • litellm.exceptions.RateLimitError
      • pydantic.ValidationError
  3. Disable custom tools temporarily

    • Run the crew with only an LLM and no tools.
    • If the issue disappears, your tool code is the problem.
    • Add tools back one by one until it breaks again.
  4. Add request-scoped IDs

    • Log a unique ID for every kickoff.
    • Track which agent/tool call fails.
    • This makes it obvious whether two requests are sharing state.
import uuid
request_id = uuid.uuid4().hex[:8]
print(f"[{request_id}] starting crew kickoff")

Prevention

  • Build agents and tools per request instead of storing them as globals.
  • Put retries with exponential backoff around provider calls and external APIs.
  • Keep tool side effects isolated: unique temp files, scoped DB sessions, no shared caches unless they’re explicitly thread-safe.

If you’re seeing intermittent 500 errors only when scaling CrewAI apps in Python, assume concurrency first. In most cases the fix is not “more retries everywhere” — it’s removing shared mutable state and making each run independent.


By Cyprian Aarons, AI Consultant at Topiax.