LlamaIndex Tutorial (Python): deploying with Docker for intermediate developers
This tutorial shows you how to package a LlamaIndex Python app into a Docker image, run it locally, and keep the setup clean enough for deployment. You need this when your index-backed app works on your laptop but you want the same runtime in staging, CI, or a container platform.
What You'll Need
- Python 3.10 or newer
- Docker Desktop or Docker Engine
- A working OpenAI API key
- `pip` and `venv`
- Basic familiarity with LlamaIndex concepts like loaders, vector stores, and query engines
- These Python packages:
  - `llama-index`
  - `llama-index-llms-openai`
  - `llama-index-embeddings-openai`
Step-by-Step
- Start by creating a small project with one script that loads documents, builds an index, and answers a query. Keep it simple: one entrypoint is easier to containerize and debug than a multi-module app on day one.

```python
# app.py
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


def main() -> None:
    # Fail fast with a clear message instead of erroring mid-query.
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")

    Settings.llm = OpenAI(model="gpt-4o-mini")
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    query_engine = index.as_query_engine()
    response = query_engine.query("What is this document collection about?")
    print(response)


if __name__ == "__main__":
    main()
```
- Add a small document set so the app has something to index. In production you'll usually mount data from object storage or fetch it from a database, but local files are enough to validate the container pathing and runtime behavior.

```bash
mkdir -p data
cat > data/sample.txt <<'EOF'
LlamaIndex helps build retrieval-based applications over private data.
This sample document exists to verify that the index can be built inside Docker.
EOF
```
- Create a virtual environment locally first so you can lock down dependencies before moving into Docker. This avoids guessing what belongs in the image and makes it easier to reproduce failures outside the container.

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
python app.py
```
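If you want the Docker build to install exactly the versions you just validated, you can optionally freeze them into a pinned file (a common practice, not something this tutorial requires; the `requirements-lock.txt` name is just an example):

```shell
# Capture the exact versions installed in the venv. You could point the
# Dockerfile's `pip install -r` at this file instead of the loose one.
python -m pip freeze > requirements-lock.txt
cat requirements-lock.txt
```

Pinned versions make the "do my local and container environments match?" question trivial to answer later.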
- Add a `requirements.txt` and a `.dockerignore`, then write the Dockerfile. The important part is keeping the image lean and making sure your code runs the same way inside the container as it does on your machine.

```text
# requirements.txt
llama-index
llama-index-llms-openai
llama-index-embeddings-openai
```

```text
# .dockerignore
.venv
__pycache__
*.pyc
.git
.gitignore
```

```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py ./app.py
COPY data ./data
CMD ["python", "app.py"]
```
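One optional refinement you may want before deploying: the image above runs as root, which many container platforms flag. A sketch of the same Dockerfile with an unprivileged user (the `appuser` name is arbitrary):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py ./app.py
COPY data ./data
# Create an unprivileged user and drop root for the runtime process.
RUN useradd --create-home appuser
USER appuser
CMD ["python", "app.py"]
```

Nothing else in the tutorial changes; the app only reads `./data` and environment variables, so it needs no root privileges.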
- Build and run the container with your API key passed in at runtime. Do not bake secrets into the image; that creates unnecessary risk and makes promotion across environments harder.

```bash
docker build -t llamaindex-docker-demo .
docker run --rm \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  llamaindex-docker-demo
```
- If you want a cleaner deployment pattern, move configuration into environment variables and keep the code free of hardcoded model names where possible. That gives you one image that can run against different models or environments without rebuilding.

```python
# app.py
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


def main() -> None:
    # Model names come from the environment, with sensible defaults.
    Settings.llm = OpenAI(model=os.getenv("LLM_MODEL", "gpt-4o-mini"))
    Settings.embed_model = OpenAIEmbedding(
        model=os.getenv("EMBED_MODEL", "text-embedding-3-small")
    )

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    print(query_engine.query("Summarize the documents in one sentence."))


if __name__ == "__main__":
    main()
```
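The pattern driving this version is plain `os.getenv` with a default, which you can sanity-check in isolation. A minimal sketch (the helper name `resolve_model` is illustrative, not part of LlamaIndex):

```python
import os


def resolve_model(env_var: str, default: str) -> str:
    """Read a model name from the environment, falling back to a default."""
    return os.getenv(env_var, default)


# Without the variable set, the baked-in default applies.
os.environ.pop("LLM_MODEL", None)
print(resolve_model("LLM_MODEL", "gpt-4o-mini"))  # gpt-4o-mini

# With it set, the override wins and no rebuild is needed.
os.environ["LLM_MODEL"] = "gpt-4o"
print(resolve_model("LLM_MODEL", "gpt-4o-mini"))  # gpt-4o
```

Applied to the container, that means something like `docker run --rm -e OPENAI_API_KEY="$OPENAI_API_KEY" -e LLM_MODEL=gpt-4o llamaindex-docker-demo` switches models at runtime while the image stays untouched.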
Testing It
Run the container once with your real API key and confirm it prints an answer instead of stack traces. If it fails, check three things first: whether OPENAI_API_KEY is present, whether /app/data exists in the image, and whether your installed package versions match what you tested locally.
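Those three checks are easy to script. Here is a minimal pre-flight sketch (the `preflight` name, file name, and messages are illustrative, not part of LlamaIndex) that you could run inside the container before the app itself:

```python
# preflight.py - sanity checks before the real app runs.
import os
from pathlib import Path


def preflight(data_dir: str = "./data") -> list:
    """Return a list of problems; an empty list means the basics look right."""
    problems = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    data = Path(data_dir)
    if not data.is_dir():
        problems.append(f"{data_dir} is missing from the image")
    elif not any(data.iterdir()):
        problems.append(f"{data_dir} is empty, so there is nothing to index")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(f"FAIL: {issue}")
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```

Add a `COPY preflight.py ./` line to the Dockerfile and run it with `docker run --rm -e OPENAI_API_KEY=... llamaindex-docker-demo python preflight.py`; for CI use you would typically exit non-zero when `preflight()` returns problems.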
Then make one change at a time: edit data/sample.txt, rebuild the image, and rerun it to confirm Docker is actually packaging your updated files. If you want stronger validation, add a second document and verify that retrieval changes when you ask more specific questions.
A good production check is to run the same image in CI with a dummy smoke test command that imports your app and builds an index over fixture data. That catches broken dependency pins before deployment day.
Next Steps
- Add FastAPI around the query engine so the container exposes an HTTP endpoint instead of printing to stdout.
- Swap local files for S3, Postgres, or another real source of truth using LlamaIndex readers.
- Introduce persistent vector storage with Chroma, Qdrant, or pgvector so indexing survives container restarts.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.