How to Fix 'memory not persisting when scaling' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21

What this error actually means

When AutoGen memory stops persisting after you scale out, the issue is usually not “memory” itself. It’s a state-sharing problem: your agents are writing to one process, one container, or one in-memory store, then your next request lands somewhere else.

You’ll see it when you move from a single local worker to multiple Uvicorn workers, Kubernetes replicas, Celery tasks, or any setup where the same AssistantAgent/UserProxyAgent instance is not reused across requests.

The Most Common Cause

The #1 cause is using in-process memory for something that needs to survive across requests or pods.

In AutoGen Python, people often keep conversation state in a Python object or a local dict and assume it will persist. It won’t once you scale horizontally.

Broken pattern vs fixed pattern

| Broken | Fixed |
| --- | --- |
| Stores conversation state in RAM | Stores conversation state in shared external storage |
| Recreates agent every request | Rehydrates agent from persisted state |
| Works on localhost | Breaks behind a load balancer / multiple workers |
# BROKEN: state lives only inside one Python process
from autogen import AssistantAgent, UserProxyAgent

session_memory = {}

def handle_request(session_id: str, message: str):
    if session_id not in session_memory:
        assistant = AssistantAgent(
            name="assistant",
            llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
        )
        user_proxy = UserProxyAgent(name="user_proxy")

        session_memory[session_id] = {
            "assistant": assistant,
            "user_proxy": user_proxy,
            "messages": [],
        }

    state = session_memory[session_id]
    state["messages"].append({"role": "user", "content": message})

    # This may work on one worker, then fail when the next request hits another worker.
    result = state["user_proxy"].initiate_chat(
        state["assistant"],
        message=message,
    )
    return result

# FIXED: persist chat history outside the process
import json

import redis

from autogen import AssistantAgent, UserProxyAgent

# Assumes a reachable Redis; swap in Postgres / DynamoDB with the same interface.
redis_client = redis.Redis(host="localhost", port=6379)

def load_messages(session_id: str) -> list[dict]:
    raw = redis_client.get(f"chat:{session_id}")
    return json.loads(raw) if raw else []

def save_messages(session_id: str, messages: list[dict]) -> None:
    redis_client.set(f"chat:{session_id}", json.dumps(messages))

def handle_request(session_id: str, message: str):
    messages = load_messages(session_id)

    assistant = AssistantAgent(
        name="assistant",
        llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
    )
    user_proxy = UserProxyAgent(name="user_proxy")

    messages.append({"role": "user", "content": message})
    save_messages(session_id, messages)

    # Rehydrate context as well: any worker can now load `messages` and feed
    # prior turns into the prompt before starting the chat.
    result = user_proxy.initiate_chat(
        assistant,
        message=message,
    )

    # Persist the new transcript after the turn
    messages.append({"role": "assistant", "content": str(result)})
    save_messages(session_id, messages)
    return result

If you’re using AutoGen’s newer memory abstractions, the same rule applies: don’t back them with local process memory if you need persistence across replicas. Use a shared store and reload on each request.

Other Possible Causes

1) Multiple workers without sticky sessions

If you run uvicorn --workers 4 or scale pods behind a load balancer, request A and request B may hit different processes.

uvicorn app:api --workers 4

If your “memory” is just a module-level variable, each worker gets its own copy.
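Here is a minimal, framework-free sketch of why that happens. It forks two worker processes (stand-ins for Uvicorn workers) that both write to the same module-level dict; each one sees only its own copy, and the parent never sees either write:

```python
import multiprocessing as mp

# Module-level "memory", as in the broken pattern above.
session_memory: dict[str, list[str]] = {}

def handle(session_id: str, queue) -> None:
    # Each worker process mutates its OWN copy of session_memory.
    session_memory.setdefault(session_id, []).append("turn")
    queue.put((mp.current_process().pid, len(session_memory[session_id])))

# fork keeps this example guard-free on Linux/macOS.
ctx = mp.get_context("fork")
queue = ctx.Queue()
workers = [ctx.Process(target=handle, args=("abc", queue)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

results = [queue.get() for _ in range(2)]
# Two distinct PIDs, and each worker saw a history of length 1 -- its own.
print(results)
# The parent's dict is still empty: no write ever reached it.
print(session_memory)
```

This is exactly the failure mode behind a load balancer: each process has a private `session_memory`, so whichever worker a request lands on determines which "memory" it sees.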

2) Recreating agents every request without restoring history

This looks clean in code review and still breaks.

# Bad: new agent every request with no restored context
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

Fix by loading prior turns before creating the next prompt:

history = load_messages(session_id)
prompt = build_prompt(history + [{"role": "user", "content": message}])
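`build_prompt` is left undefined above; a minimal, hypothetical version just flattens the stored turns into text (production code would usually pass structured messages to the model instead):

```python
def build_prompt(messages: list[dict]) -> str:
    """Flatten a stored transcript into one prompt string (illustrative only)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

history = [
    {"role": "user", "content": "My policy number is 123"},
    {"role": "assistant", "content": "Thanks, noted."},
]
prompt = build_prompt(history + [{"role": "user", "content": "What is my policy number?"}])
print(prompt)
```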

3) Using ephemeral Docker/Kubernetes storage

An emptyDir volume, a mount under /tmp, or the container's own writable filesystem disappears when the pod restarts.

volumeMounts:
  - name: scratch
    mountPath: /tmp/autogen-memory

Use Redis, Postgres, S3, or a persistent volume claim if you truly need file-backed storage.

4) Confusing chat history with durable memory

AutoGen chat transcripts are not the same as long-term memory. If you only store the transcript but your app expects to retrieve specific facts later, persistence will look broken when the real gap is missing retrieval logic.

# You stored transcript...
messages.append({"role": "user", "content": "My policy number is 123"})

# ...but never indexed it for retrieval later.

For durable memory use case-specific storage:

  • vector DB for semantic recall
  • relational DB for structured facts
  • Redis for short-lived session state

How to Debug It

  1. Check whether the bug appears only after scaling

    • Run one worker locally.
    • Then run two workers or two pods.
    • If it breaks only when scaled out, this is almost always shared-state loss.
  2. Log process identity and session routing

    import os
    print(f"pid={os.getpid()} session_id={session_id}")
    

    If different requests for the same session hit different PIDs or pods, your memory is not shared.

  3. Inspect where history is stored

    • If it’s a global dict, class attribute, or local variable: broken.
    • If it’s Redis/Postgres/DynamoDB/vector store: likely fine.
    • If it’s inside AssistantAgent only: verify how you rehydrate it on restart.
  4. Search for reset points. Look for code like:

    assistant = AssistantAgent(...)
    user_proxy = UserProxyAgent(...)
    

    inside your request handler. If those objects are recreated every call and no history is loaded first, persistence will fail.

Prevention

  • Use an external store for anything that must survive process restarts or horizontal scaling.
  • Treat AutoGen agents as stateless workers; persist transcripts and retrieved facts separately.
  • Add an integration test that runs two requests against two workers and verifies the second turn sees prior context.
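That last test can be sketched without any infrastructure: simulate two worker instances that share one external store and assert the second worker sees the first worker's turn. `FakeStore` stands in for Redis, and `Worker` is a stand-in for your request handler, not an AutoGen API:

```python
class FakeStore:
    """In-test stand-in for Redis/Postgres: one shared store, many workers."""
    def __init__(self) -> None:
        self.data: dict[str, list[dict]] = {}

    def load(self, session_id: str) -> list[dict]:
        return list(self.data.get(session_id, []))

    def save(self, session_id: str, messages: list[dict]) -> None:
        self.data[session_id] = list(messages)

class Worker:
    """Stateless request handler: all session state lives in the store."""
    def __init__(self, store: FakeStore) -> None:
        self.store = store

    def handle(self, session_id: str, message: str) -> list[dict]:
        messages = self.store.load(session_id)
        messages.append({"role": "user", "content": message})
        self.store.save(session_id, messages)
        return messages

store = FakeStore()
worker_a, worker_b = Worker(store), Worker(store)

worker_a.handle("abc", "My policy number is 123")
second_turn = worker_b.handle("abc", "What is my policy number?")

# worker_b sees worker_a's turn because state lives in the shared store.
assert any("policy number is 123" in m["content"] for m in second_turn)
```

If you give each `Worker` its own private dict instead of a shared store, the assertion fails, which is the scaled-out bug reproduced in miniature.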

By Cyprian Aarons, AI Consultant at Topiax.
