LlamaIndex Tutorial (Python): deploying with Docker for intermediate developers

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows you how to package a LlamaIndex Python app into a Docker image, run it locally, and keep the setup clean enough for deployment. You need this when your index-backed app works on your laptop but you want the same runtime in staging, CI, or a container platform.

What You'll Need

  • Python 3.10 or newer
  • Docker Desktop or Docker Engine
  • A working OpenAI API key
  • pip and venv
  • Basic familiarity with LlamaIndex concepts like loaders, vector stores, and query engines
  • These Python packages:
    • llama-index
    • llama-index-llms-openai
    • llama-index-embeddings-openai

Step-by-Step

  1. Start by creating a small project with one script that loads documents, builds an index, and answers a query. Keep it simple: one entrypoint is easier to containerize and debug than a multi-module app on day one.
# app.py
import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

def main() -> None:
    os.environ["OPENAI_API_KEY"] = os.environ["OPENAI_API_KEY"]

    llm = OpenAI(model="gpt-4o-mini")
    embed_model = OpenAIEmbedding(model="text-embedding-3-small")

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(
        documents,
        embed_model=embed_model,
    )

    # the LLM is used at query time, so pass it to the query engine
    query_engine = index.as_query_engine(llm=llm)
    response = query_engine.query("What is this document collection about?")
    print(response)

if __name__ == "__main__":
    main()
  2. Add a small document set so the app has something to index. In production you’ll usually mount data from object storage or fetch it from a database, but local files are enough to validate the container pathing and runtime behavior.
mkdir -p data

cat > data/sample.txt <<'EOF'
LlamaIndex helps build retrieval-based applications over private data.
This sample document exists to verify that the index can be built inside Docker.
EOF
  3. Create a virtual environment locally first so you can lock down dependencies before moving into Docker. This avoids guessing what belongs in the image and makes it easier to reproduce failures outside the container.
python -m venv .venv
source .venv/bin/activate

pip install --upgrade pip
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

python app.py
  4. Add a requirements.txt and a .dockerignore, then write the Dockerfile. The important part is keeping the image lean and making sure your code runs the same way inside the container as it does on your machine.
# requirements.txt
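# consider pinning exact versions here (e.g. from pip freeze) so the image matches what you tested locally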
llama-index
llama-index-llms-openai
llama-index-embeddings-openai
# .dockerignore
.venv
__pycache__
*.pyc
.git
.gitignore
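# Dockerfile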
FROM python:3.11-slim

WORKDIR /app

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py ./app.py
COPY data ./data

CMD ["python", "app.py"]
  5. Build and run the container with your API key passed in at runtime. Do not bake secrets into the image; that creates unnecessary risk and makes promotion across environments harder.
docker build -t llamaindex-docker-demo .

docker run --rm \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  llamaindex-docker-demo
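If you prefer to keep the key off the command line, Docker can also read runtime variables from a file. A minimal sketch, assuming a local .env file that stays out of version control and out of the build context:
# .env (keep this out of git and out of the Docker build context)
OPENAI_API_KEY=your-key-here

docker run --rm --env-file .env llamaindex-docker-demo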
  6. If you want a cleaner deployment pattern, move configuration into environment variables and keep the code free of hardcoded model names where possible. That gives you one image that can run against different models or environments without rebuilding.
# app.py
import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

def main() -> None:
    llm_model = os.getenv("LLM_MODEL", "gpt-4o-mini")
    embed_model_name = os.getenv("EMBED_MODEL", "text-embedding-3-small")

    llm = OpenAI(model=llm_model)
    embed_model = OpenAIEmbedding(model=embed_model_name)

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

    query_engine = index.as_query_engine(llm=llm)
    print(query_engine.query("Summarize the documents in one sentence."))

if __name__ == "__main__":
    main()
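With that change, the same image can be pointed at different models at runtime by overriding the environment variables when you start the container. The model names below are only illustrative:
docker run --rm \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e LLM_MODEL="gpt-4o" \
  -e EMBED_MODEL="text-embedding-3-large" \
  llamaindex-docker-demo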

Testing It

Run the container once with your real API key and confirm it prints an answer instead of stack traces. If it fails, check three things first: whether OPENAI_API_KEY is present, whether /app/data exists in the image, and whether your installed package versions match what you tested locally.
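
If you want to check those three things without editing the app, you can poke at the image directly. These commands assume the image tag from the build step above:
# confirm the data directory made it into the image
docker run --rm llamaindex-docker-demo ls /app/data

# confirm which llama-index packages the image actually contains
docker run --rm llamaindex-docker-demo pip freeze | grep llama-index

# confirm the key reaches the container (prints "key is set" or "key is missing")
docker run --rm -e OPENAI_API_KEY="$OPENAI_API_KEY" llamaindex-docker-demo \
  sh -c 'test -n "$OPENAI_API_KEY" && echo "key is set" || echo "key is missing"'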

Then make one change at a time: edit data/sample.txt, rebuild the image, and rerun it to confirm Docker is actually packaging your updated files. If you want stronger validation, add a second document and verify that retrieval changes when you ask more specific questions.

A good production check is to run the same image in CI with a dummy smoke test command that imports your app and builds an index over fixture data. That catches broken dependency pins before deployment day.
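
A minimal sketch of that smoke test as a shell script, assuming your CI runner has Docker available and exposes OPENAI_API_KEY as a secret (the script name is just an example):
# ci-smoke-test.sh
set -e

docker build -t llamaindex-docker-demo .

# importing app and calling main() builds the index over the bundled fixture data
docker run --rm \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  llamaindex-docker-demo \
  python -c "import app; app.main()"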

Next Steps

  • Add FastAPI around the query engine so the container exposes an HTTP endpoint instead of printing to stdout; a minimal sketch follows this list.
  • Swap local files for S3, Postgres, or another real source of truth using LlamaIndex readers.
  • Introduce persistent vector storage with Chroma, Qdrant, or pgvector so indexing survives container restarts.
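
Here is a rough sketch of that first step as a separate api.py (the filename is arbitrary; fastapi and uvicorn are extra dependencies you would add to requirements.txt). It reuses the LLM_MODEL and EMBED_MODEL variables from step 6:
# api.py
import os

from fastapi import FastAPI
from pydantic import BaseModel

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

app = FastAPI()

# build the index once at startup instead of once per request
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model=os.getenv("EMBED_MODEL", "text-embedding-3-small")),
)
query_engine = index.as_query_engine(llm=OpenAI(model=os.getenv("LLM_MODEL", "gpt-4o-mini")))

class Query(BaseModel):
    question: str

@app.post("/query")
def query_documents(body: Query) -> dict:
    response = query_engine.query(body.question)
    return {"answer": str(response)}

To serve it from the same image, change the Dockerfile CMD to run uvicorn (for example uvicorn api:app --host 0.0.0.0 --port 8000) and publish the port with -p 8000:8000 when you start the container.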

By Cyprian Aarons, AI Consultant at Topiax.