How to Fix 'memory not persisting in production' in CrewAI (Python)
When CrewAI memory “works locally” but not in production, the problem is usually not CrewAI itself: your agent state is being stored somewhere that disappears between requests, process restarts, or container reschedules.
This usually shows up after you deploy behind Docker, Kubernetes, Gunicorn/Uvicorn workers, or serverless. You’ll see behavior like ShortTermMemory or EntityMemory resetting on every request, even though your code looks fine.
The Most Common Cause
The #1 cause is using in-process memory in a stateless production runtime.
If you create the Crew and its memory objects inside the request handler, each worker gets its own isolated Python process. That means memory=True may work during local dev, then appear broken once traffic is spread across multiple workers.
Wrong pattern vs right pattern
| Broken pattern | Fixed pattern |
|---|---|
| Memory created per request | Persistent storage configured once |
| Uses ephemeral defaults | Uses Redis/Postgres/SQLite persistence |
| New process = new memory | Same backing store across workers |
```python
# ❌ WRONG: memory lives only inside the request process
from crewai import Agent, Task, Crew
from flask import Flask, request

app = Flask(__name__)

@app.post("/chat")
def chat():
    agent = Agent(
        role="Support Agent",
        goal="Help the user",
        backstory="You are a support assistant.",
    )
    task = Task(
        description=request.json["message"],
        expected_output="Helpful answer",
    )
    crew = Crew(
        agents=[agent],
        tasks=[task],
        memory=True,  # looks fine, but often backed by process-local state
        verbose=True,
    )
    result = crew.kickoff()
    return {"result": str(result)}
```
```python
# ✅ RIGHT: use a shared persistent backend and reuse the same config
from crewai import Agent, Task, Crew
from crewai.memory import LongTermMemory
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage
from flask import Flask, request

app = Flask(__name__)

# Configure persistence once, at module load, on durable storage
long_term_memory = LongTermMemory(
    storage=LTMSQLiteStorage(
        db_path="/data/crewai_memory.db"  # mount this volume in prod
    )
)

agent = Agent(
    role="Support Agent",
    goal="Help the user",
    backstory="You are a support assistant.",
)

@app.post("/chat")
def chat():
    task = Task(
        description=request.json["message"],
        expected_output="Helpful answer",
    )
    crew = Crew(
        agents=[agent],
        tasks=[task],
        memory=True,  # enable memory…
        long_term_memory=long_term_memory,  # …and point it at the shared store
        verbose=True,
    )
    result = crew.kickoff()
    return {"result": str(result)}
```

Note that `memory` is a boolean flag on `Crew`; the persistent backend is passed separately via `long_term_memory`.
If you’re running multiple replicas, SQLite only helps if the file is on persistent shared storage. In practice, Redis or Postgres is the better production choice.
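You can verify the cross-process behavior without CrewAI at all. In this sketch, a separate Python process plays the role of “worker A” and writes one record to a SQLite file; the parent process (“worker B”) reads it back, which a plain in-process dict could never do:

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "memory.db")

# "Worker A": a separate process writes one memory record to the shared file
writer = (
    "import sqlite3, sys; db = sqlite3.connect(sys.argv[1]); "
    "db.execute('CREATE TABLE IF NOT EXISTS memory (v TEXT)'); "
    "db.execute(\"INSERT INTO memory VALUES ('remembered')\"); db.commit()"
)
subprocess.run([sys.executable, "-c", writer, db_path], check=True)

# "Worker B": this process sees the record because the store is file-backed,
# not because any Python object was shared between the two processes
db = sqlite3.connect(db_path)
row = db.execute("SELECT v FROM memory").fetchone()
print(row[0])  # prints: remembered
```

The same two-process check works against Redis or Postgres; if the read-back fails across processes, your backend is not actually shared.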
Other Possible Causes
1) Your container filesystem is ephemeral
If you store memory in /tmp, inside the image layer, or anywhere not mounted as a volume, it disappears on restart.
```yaml
# ❌ Broken: no persistent volume
services:
  api:
    image: my-crewai-app:latest
```

```yaml
# ✅ Fixed: mount durable storage
services:
  api:
    image: my-crewai-app:latest
    volumes:
      - crewai-data:/data

volumes:
  crewai-data:
```
2) You’re scaling horizontally without shared memory
Two Gunicorn workers or two Kubernetes pods do not share Python objects. One request hits worker A, the next hits worker B, and memory appears to “reset.”
```bash
# ❌ Broken if using process-local memory on an ephemeral filesystem
gunicorn app:app --workers 4 --threads 2

# ✅ Fixed: point every worker at the same durable storage location
gunicorn app:app --workers 4 --threads 2 \
    --env CREWAI_STORAGE_DIR=/data/crewai
```

`CREWAI_STORAGE_DIR` tells CrewAI where to write its memory files; `/data` must be a volume shared by all workers. With Redis or Postgres, a connection URL in your own config plays the same role.
3) You’re recreating agents/crews with different session keys
CrewAI memory depends on consistent identifiers. If your user/session ID changes on every request, retrieval won’t find prior context.
```python
# ❌ Broken: random session key every call
session_id = str(uuid.uuid4())  # new identity on every request

# ✅ Fixed: stable user/session identifier from auth or cookie,
# used to key the persistent store
session_id = request.headers["X-User-Id"]
crew = Crew(
    agents=[agent],
    tasks=[task],
    memory=True,
    long_term_memory=LongTermMemory(
        storage=LTMSQLiteStorage(db_path=f"/data/memory_{session_id}.db")
    ),
)
```

One database file per session is just one way to key the store; a sessions table in Postgres works the same way.
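If not every client is authenticated, derive a stable key deterministically rather than minting a fresh UUID. The header names below are assumptions; adapt them to your auth layer:

```python
import hashlib

def stable_session_id(headers: dict) -> str:
    """Prefer an authenticated user id; otherwise derive a deterministic
    key from a client-supplied conversation id (never a fresh UUID)."""
    user_id = headers.get("X-User-Id")
    if user_id:
        return user_id
    conversation = headers.get("X-Conversation-Id", "anonymous")
    return "conv-" + hashlib.sha256(conversation.encode()).hexdigest()[:16]
```

Calling this twice with the same headers always yields the same key, so retries and load-balanced requests land on the same memory.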
4) Your vector store or DB connection is failing silently
Sometimes “memory not persisting” is really “storage writes are failing.” Check for connection errors like:
- `psycopg2.OperationalError`
- `redis.exceptions.ConnectionError`
- `sqlite3.OperationalError: unable to open database file`
```python
# Example config check for Redis-backed persistence
import os

REDIS_URL = os.getenv("REDIS_URL")
if not REDIS_URL:
    raise RuntimeError("REDIS_URL is missing; CrewAI memory cannot persist")
```
How to Debug It
- Confirm whether you are using process-local storage.
  - Search for `memory=True`, temporary paths, or any in-memory defaults.
  - If you don’t see Redis/Postgres/SQLite on durable storage, that’s your first suspect.
- Print the active session/user key.
  - Make sure it stays stable across requests.
  - Log values like `session_id`, `user_id`, or the conversation ID before kickoff.
- Check whether writes actually happen.
  - Turn on verbose logging: `crew = Crew(..., verbose=True)`.
  - Look for storage-related errors during `kickoff()`.
- Test persistence outside the web server.
  - Run one script that writes memory.
  - Run a second script/process that reads it back.
  - If it fails across processes but works in one process, your backend is not shared.
Prevention
- Use an external persistence layer from day one:
  - Redis for fast session memory
  - Postgres for durable long-term records
  - Shared volumes only when you truly control a single-node deployment
- Treat session identity as part of your API contract.
  - Never generate a new UUID per request unless that’s intentional.
  - Use auth subject IDs or conversation IDs that survive retries and restarts.
- Add a startup health check for memory dependencies.
  - Fail fast if Redis/Postgres is down.
  - Don’t let the app boot with fake “memory” that only exists in RAM.
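The health check can be as small as this sketch against the SQLite-backed setup (the path argument and where you call it are up to your app):

```python
import sqlite3

def check_memory_backend(db_path: str) -> None:
    """Fail fast at startup if the memory store is unreachable or read-only."""
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS healthcheck (ok INTEGER)")
        conn.execute("INSERT INTO healthcheck VALUES (1)")
        conn.commit()
        conn.close()
    except sqlite3.OperationalError as exc:
        # Better to refuse to boot than to run with RAM-only "memory"
        raise SystemExit(f"memory backend unreachable: {exc}")

# e.g. call check_memory_backend("/data/crewai_memory.db") before creating the app
```

The same shape works for Redis (`PING`) or Postgres (`SELECT 1`): do one real write or round-trip before serving traffic.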
If you’re seeing CrewAI memory reset in production, assume stateless infrastructure first. In most cases the fix is not inside your agent logic — it’s in where and how you persist state.
By Cyprian Aarons, AI Consultant at Topiax.