How to Fix 'connection timeout' in CrewAI (Python)
A connection timeout in CrewAI usually means your Python process tried to reach an external service and never got a response before the timeout window closed. In practice, this shows up when CrewAI is calling an LLM provider, a tool endpoint, or an internal service that is slow, blocked, or misconfigured.
You’ll typically see it when an agent starts reasoning, when a tool makes an HTTP request, or right after initializing a Crew with an LLM that can’t be reached.
The Most Common Cause
The #1 cause is a bad model endpoint or missing provider configuration. With CrewAI, this often happens when the `LLM` object is pointed at the wrong base URL, the provider is down, or your network can’t reach the API host.
Here’s the broken pattern:
```python
from crewai import Agent, Task, Crew
from crewai.llm import LLM

llm = LLM(
    model="gpt-4o",
    api_base="https://api.openai.com/v1",  # wrong field for many setups
    api_key="sk-your-key",
)

agent = Agent(
    role="Support Analyst",
    goal="Answer customer questions",
    backstory="You handle bank support tickets.",
    llm=llm,
)

task = Task(
    description="Summarize this ticket.",
    expected_output="A short summary.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
print(crew.kickoff())
```
And here’s the fixed pattern:
```python
import os

from crewai import Agent, Task, Crew
from crewai.llm import LLM

llm = LLM(
    model="gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"],
)

agent = Agent(
    role="Support Analyst",
    goal="Answer customer questions",
    backstory="You handle bank support tickets.",
    llm=llm,
)

task = Task(
    description="Summarize this ticket.",
    expected_output="A short summary.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
print(result)
```
If the provider is reachable but slow, you may also see stack traces like:
- `requests.exceptions.ConnectTimeout`
- `httpx.ConnectTimeout`
- `ConnectionError: HTTPSConnectionPool(...): Max retries exceeded`
- `litellm.exceptions.TimeoutError`
The fix is not always “increase timeout.” First make sure you are hitting the correct provider config and that the model name is valid for that account.
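As a sketch of that "validate first" step, here is a minimal config check you can run before kickoff. The `KNOWN_MODELS` allowlist is hypothetical; fill it with the models your account can actually use.

```python
# A minimal sketch of checking provider config before blaming timeouts.
# KNOWN_MODELS is a hypothetical allowlist; adjust for your account.
import os

KNOWN_MODELS = {"gpt-4o", "gpt-4o-mini"}

def validate_llm_config(model: str, key_var: str = "OPENAI_API_KEY") -> list:
    """Return a list of config problems; empty means the config looks sane."""
    problems = []
    if model not in KNOWN_MODELS:
        problems.append(f"unknown model name: {model!r}")
    if not os.environ.get(key_var):
        problems.append(f"missing env var: {key_var}")
    return problems

print(validate_llm_config("gpt-4oo"))  # a typo'd model name gets flagged
```

A check like this turns a silent hang into an immediate, readable error at startup.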
Other Possible Causes
1) Your network blocks outbound traffic
This is common in corporate environments, VPNs, and locked-down containers. If Python cannot reach the API host, CrewAI will fail during the first model call.
```bash
curl -I https://api.openai.com/v1/models
```
If that hangs or fails from the same machine, it’s not a CrewAI bug.
2) Proxy settings are missing
If your environment requires a proxy and Python doesn’t know about it, requests will stall until timeout.
```bash
export HTTPS_PROXY=http://proxy.company.local:8080
export HTTP_PROXY=http://proxy.company.local:8080
```

For Windows PowerShell:

```powershell
$env:HTTPS_PROXY="http://proxy.company.local:8080"
$env:HTTP_PROXY="http://proxy.company.local:8080"
```
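To confirm which proxy settings your Python process actually sees (requests and httpx both read these environment variables by default), a quick stdlib check:

```python
# Print the proxy configuration visible to this Python process.
import os
import urllib.request

for var in ("HTTPS_PROXY", "HTTP_PROXY", "NO_PROXY"):
    print(f"{var}={os.environ.get(var, '<unset>')}")

# urllib's consolidated view of proxy settings for this environment
print(urllib.request.getproxies())
```

Run this inside the same container or VM where CrewAI runs; proxy vars set in your shell often don't make it into the runtime environment.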
3) Tool calls are timing out
CrewAI agents often use tools that call internal APIs, databases, or web services. A slow tool can look like an LLM timeout because the agent waits on the tool result.
```python
import requests

def fetch_policy(customer_id: str):
    r = requests.get(
        f"https://internal-api.example.com/policies/{customer_id}",
        timeout=5,
    )
    return r.json()
```
If your service needs more time than that, either raise the timeout or make the tool asynchronous and cache results.
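One way to act on the caching advice is an in-process LRU cache around the fetcher, so repeated lookups never re-hit the slow service. This is a sketch with a stand-in `slow_fetch`; in real code the wrapped callable would be your HTTP fetch (note that `requests` also accepts a `(connect, read)` timeout tuple if you only need a longer read window).

```python
# Cache tool results so repeated lookups skip the slow service entirely.
# slow_fetch is a stand-in for a real HTTP call.
from functools import lru_cache

def make_cached_fetcher(fetch, maxsize=256):
    """Wrap a fetch(customer_id) callable with an in-process LRU cache."""
    return lru_cache(maxsize=maxsize)(fetch)

calls = []

def slow_fetch(customer_id: str):
    calls.append(customer_id)  # records each real fetch
    return {"customer": customer_id, "policy": "standard"}

cached_fetch = make_cached_fetcher(slow_fetch)
cached_fetch("c-1")
cached_fetch("c-1")  # served from cache; slow_fetch runs only once
print(len(calls))  # 1
```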
4) Rate limiting or overloaded provider
Some providers respond slowly under load. You may get timeouts instead of clean 429 errors if retries stack up.
```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    num_retries=2,
)
```
If retry behavior is too aggressive, requests can pile up and look like connection issues.
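To keep retries from stacking up, bound the attempt count and back off exponentially with jitter. A sketch, with `flaky_call` standing in for a provider request:

```python
# Bounded retries with exponential backoff plus jitter, so retried requests
# spread out instead of piling up. flaky_call is a stand-in for a provider call.
import random
import time

def call_with_backoff(fn, max_attempts=3, base_delay=0.1):
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # delays: 0.1s, 0.2s, 0.4s... plus a little random jitter
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.05))

attempts = []

def flaky_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("provider slow")
    return "ok"

print(call_with_backoff(flaky_call))  # succeeds on the third attempt
```

The jitter matters: without it, many clients that timed out together all retry at the same instant and overload the provider again.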
How to Debug It
1) Reproduce outside CrewAI

- Call the same provider directly with a minimal script.
- If direct SDK calls fail too, your issue is provider/network related.

2) Print the exact exception

- Don’t stop at "Crew kickoff failed." Capture the full traceback and look for:
  - `ConnectTimeout`
  - `ReadTimeout`
  - `Max retries exceeded`
  - `401`, `403`, or `429` before the timeout

3) Test connectivity from the runtime

- Run `curl` from inside Docker, CI, or your VM.
- If local works but container fails, check DNS, proxy env vars, and firewall rules.

4) Isolate tools from LLM calls

- Temporarily remove tools from your agent.
- If timeouts disappear, one of your tools is blocking or too slow.
Example isolation test:
```python
agent = Agent(
    role="Support Analyst",
    goal="Answer customer questions",
    backstory="You handle bank support tickets.",
)
```
If this works without a custom LLM/tool setup but fails once you add them back, you’ve narrowed it down fast.
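For the "print the exact exception" step, capture the full traceback rather than a one-line failure message. Here `run_crew` is a stand-in; in real code you would call `crew.kickoff()` inside the `try` block.

```python
# Capture the full traceback so you can see *where* the timeout happened.
# run_crew is a stand-in; in real code, call crew.kickoff() here.
import traceback

def run_crew():
    raise TimeoutError("simulated ConnectTimeout")

try:
    run_crew()
except Exception:
    tb = traceback.format_exc()
    print(tb)  # scan for ConnectTimeout, ReadTimeout, or a 401/403/429
```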
Prevention
- Set explicit timeouts on every outbound HTTP call in tools.
- Validate model names and provider credentials at startup before running crews.
- Add a health check script for API reachability in CI and deployment environments.
- Log request duration per tool and per model call so slow failures are obvious.
- Keep proxy settings and environment variables in one place across dev/staging/prod.
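Several of those points can be combined into a small startup health check. This is a sketch with example entries; swap in the env vars and hosts your own stack depends on.

```python
# Startup health check: verify required credentials exist and provider hosts
# are reachable before any crew runs. Entries below are examples.
import os
import socket

REQUIRED_ENV = ["OPENAI_API_KEY"]
ENDPOINTS = [("api.openai.com", 443)]

def health_check() -> list:
    """Return a list of failures; empty means the environment looks healthy."""
    failures = []
    for var in REQUIRED_ENV:
        if not os.environ.get(var):
            failures.append(f"missing env var {var}")
    for host, port in ENDPOINTS:
        try:
            socket.create_connection((host, port), timeout=5).close()
        except OSError as exc:
            failures.append(f"cannot reach {host}:{port} ({exc})")
    return failures

print(health_check() or "OK")
```

Running this in CI and at deploy time catches missing keys and blocked egress before a crew ever hangs on its first model call.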
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.