How to Fix 'state not updating when scaling' in LlamaIndex (Python)
When you see state not updating when scaling in LlamaIndex, it usually means your app is creating or mutating state in one process, but the query path is reading from another. In practice, this shows up when you move from a single local worker to multiple workers, threads, or replicas and your index state stops reflecting writes.
The usual failure mode is simple: the ingestion path updates an in-memory object, but the serving path loads a fresh VectorStoreIndex, StorageContext, or custom StatefulRetriever instance that never saw those updates.
The Most Common Cause
The #1 cause is keeping index state in memory instead of persisting it. This works locally with one process, then breaks as soon as you scale horizontally.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Build index once in memory, mutate it, and assume every worker sees the same state | Persist storage and reload from the same backing store in every worker |
```python
# BROKEN
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

# Later: add more docs
new_docs = SimpleDirectoryReader("./more_data").load_data()
for doc in new_docs:
    index.insert(doc)  # only updates this process's in-memory index

# In another worker/process:
# (this worker's `index` never saw the insert above)
query_engine = index.as_query_engine()
response = query_engine.query("What changed?")
print(response)
```
```python
# FIXED
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

persist_dir = "./storage"

# Ingest/update path
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
index.storage_context.persist(persist_dir=persist_dir)

new_docs = SimpleDirectoryReader("./more_data").load_data()
for doc in new_docs:
    index.insert(doc)
index.storage_context.persist(persist_dir=persist_dir)

# Query path: run in any worker/process, reloading from the shared store
storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
response = query_engine.query("What changed?")
print(response)
```
If you are using a vector DB, the same rule applies: write to the shared backend and reload from it. Do not treat a Python object as the source of truth once you have more than one process.
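The write-through rule can be demonstrated with nothing but the standard library. The snippet below is a toy stand-in for a `persist_dir` or vector DB, not LlamaIndex code: a reader only sees what the writer has actually persisted, never the writer's in-memory mutations.

```python
import json
import tempfile
from pathlib import Path

# Toy stand-in for a shared backing store (a persist_dir or vector DB).
store = Path(tempfile.mkdtemp()) / "index.json"

def persist(state: dict) -> None:
    store.write_text(json.dumps(state))

def load() -> dict:
    return json.loads(store.read_text())

# Writer: mutate, then persist.
writer_state = {"docs": ["a"]}
persist(writer_state)

writer_state["docs"].append("b")  # in-memory only -- invisible to readers
print(load())                     # {'docs': ['a']}

persist(writer_state)             # write through to the shared store
print(load())                     # {'docs': ['a', 'b']}
```

The same discipline applies whether the durable layer is a local `persist_dir` or a remote vector database: readers rehydrate from it, never from another process's Python objects.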
Other Possible Causes
1. You are mixing async and sync updates incorrectly
A common bug is calling async ingestion code without awaiting it, then querying immediately.
```python
# BROKEN
async def update_index(index, docs):
    await index.aupdate_ref_doc(docs[0])

update_index(index, docs)  # coroutine never awaited -- nothing runs
```

```python
# FIXED
import asyncio

async def update_index(index, docs):
    await index.aupdate_ref_doc(docs[0])

# Inside an async function:
await update_index(index, docs)
# At top level, use: asyncio.run(update_index(index, docs))
```
If you see warnings like `RuntimeWarning: coroutine was never awaited`, this is likely part of the problem.
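The underlying failure is plain asyncio behavior and reproduces without LlamaIndex: calling an async function only creates a coroutine object, and the body never executes until something drives it.

```python
import asyncio

state = {"updated": False}

async def update_index():
    state["updated"] = True

# BROKEN: calling an async function only creates a coroutine object.
coro = update_index()    # body has not run
print(state["updated"])  # False
coro.close()             # avoid the "never awaited" warning in this demo

# FIXED: drive the coroutine to completion before querying.
asyncio.run(update_index())
print(state["updated"])  # True
```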
2. Your retriever is cached with stale state
If you build a retriever once at startup and keep reusing it after inserts, it may hold stale internal references depending on your setup.
```python
# BROKEN
retriever = index.as_retriever()
index.insert_nodes(new_nodes)
# retriever may still reflect old state
nodes = retriever.retrieve("policy changes")
```

```python
# FIXED
index.insert_nodes(new_nodes)
# rebuild after mutation
retriever = index.as_retriever()
nodes = retriever.retrieve("policy changes")
```
This matters more with retrievers such as `VectorIndexRetriever` or custom wrappers around a query engine, where internal references are built at construction time.
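Whether a given retriever actually goes stale depends on the index and vector store behind it, but the hazard is easy to model: anything that snapshots state at construction time stops reflecting later writes. A toy illustration (not LlamaIndex internals):

```python
class ToyIndex:
    """Minimal index whose retriever snapshots state at construction time."""

    def __init__(self):
        self._nodes = []

    def insert(self, node: str) -> None:
        self._nodes.append(node)

    def as_retriever(self):
        snapshot = list(self._nodes)  # copied now, frozen forever
        return lambda query: [n for n in snapshot if query in n]

idx = ToyIndex()
retriever = idx.as_retriever()     # built BEFORE the write
idx.insert("policy changes 2024")

print(retriever("policy"))              # [] -- stale snapshot
print(idx.as_retriever()("policy"))     # ['policy changes 2024'] -- rebuilt
```

If rebuilding the reader after each write fixes your symptoms, you have confirmed the stale-snapshot variant of this bug.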
3. You are writing to one vector store and reading from another
This happens when environment variables differ across services or pods. One service writes to Pinecone/Weaviate/Chroma namespace A, while another reads namespace B.
```bash
# BROKEN CONFIG
PINECONE_INDEX="customer-support"
PINECONE_NAMESPACE="dev"   # writer service uses dev
PINECONE_NAMESPACE="prod"  # reader service uses prod
```
Fix the namespace/index mismatch first. If your reader and writer do not point to the same backend location, no amount of Python-side debugging will help.
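One way to catch the mismatch early is to diff the effective configuration of both services at startup. `config_diff` below is a hypothetical helper, not a LlamaIndex API, and the key names are examples:

```python
def config_diff(writer: dict, reader: dict) -> dict:
    """Return every key whose value differs between writer and reader."""
    keys = writer.keys() | reader.keys()
    return {
        k: (writer.get(k), reader.get(k))
        for k in keys
        if writer.get(k) != reader.get(k)
    }

# Example: two services' environment, expressed as dicts.
writer = {"PINECONE_INDEX": "customer-support", "PINECONE_NAMESPACE": "dev"}
reader = {"PINECONE_INDEX": "customer-support", "PINECONE_NAMESPACE": "prod"}
print(config_diff(writer, reader))  # {'PINECONE_NAMESPACE': ('dev', 'prod')}
```

Logging this diff (or asserting it is empty) in each service's startup path turns a silent stale-read bug into an immediate, visible failure.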
4. You are sharing in-memory state across multiprocessing workers
If each worker forks its own copy of memory, updates disappear between processes. This shows up with Gunicorn workers, Celery tasks, or Python multiprocessing.
```python
# BROKEN PATTERN
from multiprocessing import Process

def worker():
    global index
    index.insert_nodes(new_nodes)  # mutates this child's copy only

p1 = Process(target=worker)
p2 = Process(target=worker)
p1.start()
p2.start()
```
Each process gets its own memory space. If you need shared state, use a persistent store and rehydrate per worker.
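You can watch that isolation directly with a few lines of stdlib code: the child's update dies with the child and never reaches the parent.

```python
from multiprocessing import Process

counter = {"inserts": 0}

def worker():
    # Runs in a child process: mutates the child's copy of `counter`.
    counter["inserts"] += 1

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    p.join()
    # The parent's copy is untouched -- the child's write is gone.
    print(counter["inserts"])  # 0
```

The same thing happens to an in-memory `VectorStoreIndex` under Gunicorn or Celery: each worker holds an independent copy, so cross-worker visibility only exists through a shared persistent store.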
How to Debug It
- Check whether your update survives a process restart
  - Insert data.
  - Restart the app.
  - Query again.
  - If the data disappears after restart, you were relying on memory instead of persistence.
- Log the exact storage backend being used
  - Print `persist_dir`, vector DB host/index/namespace, and any tenant/project IDs.
  - Mismatched config across services is one of the fastest ways to get stale reads.
- Verify that your mutation path actually runs
  - Add logs before and after calls like `insert_nodes`, `upsert`, `refresh_ref_docs`, or `aupdate_ref_doc`.
  - If you use async code, confirm the coroutine is awaited and completed before querying.
- Rebuild readers after writes
  - Recreate the `QueryEngine`, `Retriever`, or any cached wrapper after updating the index.
  - If rebuilding fixes it, your bug is stale in-process state.
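For verifying that the mutation path runs, a small decorator keeps the before/after logging in one place. `logged` is a generic sketch, not a LlamaIndex utility; wrap whichever mutation method you actually call:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

def logged(fn):
    """Log entry and exit of a mutation call (insert_nodes, upsert, ...)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.info("calling %s", getattr(fn, "__name__", repr(fn)))
        result = fn(*args, **kwargs)
        log.info("finished %s", getattr(fn, "__name__", repr(fn)))
        return result
    return wrapper

# Usage sketch: wrap the mutation path once at startup, e.g.
# index.insert_nodes = logged(index.insert_nodes)
```

If the "calling" line appears but "finished" never does, the mutation raised or (in the async case) was never awaited to completion.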
Prevention
- Persist every index or document store that needs to survive beyond one request.
- Treat LlamaIndex objects as ephemeral views over durable storage, not as shared application state.
- In multi-worker deployments, initialize readers from shared storage inside each worker's startup path.
- Keep writer and reader configuration identical for region, namespace, tenant, collection name, and embedding model version.
If you want a clean mental model: writes go to durable storage; reads load fresh state from durable storage. Once you follow that rule, this class of LlamaIndex scaling bugs mostly disappears.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.