LlamaIndex Tutorial (Python): deploying with Docker for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to package a LlamaIndex Python app into Docker, run it locally, and make it ready for deployment in a real environment. You need this when you want your index-building and query logic to behave the same way on every machine, from your laptop to CI to a container host.

What You'll Need

  • Python 3.10 or newer
  • Docker Desktop or Docker Engine
  • An OpenAI API key set as OPENAI_API_KEY
  • Basic familiarity with LlamaIndex VectorStoreIndex, SimpleDirectoryReader, and query engines
  • A small local dataset in a data/ directory
  • These Python packages:
    • llama-index
    • llama-index-llms-openai
    • llama-index-embeddings-openai

Step-by-Step

  1. Create a minimal project layout and pin your dependencies.
    Keep the app small and explicit so container builds stay predictable.
```text
.
├── app.py
├── requirements.txt
├── Dockerfile
└── data/
    └── policy.txt
```

```text
# requirements.txt
llama-index==0.11.23
llama-index-llms-openai==0.2.9
llama-index-embeddings-openai==0.2.5
```
  2. Write the LlamaIndex app so it reads documents, builds an index, and answers one query from the command line.
    This version uses OpenAI for both embeddings and chat, which is the cleanest path for a production-style container demo.
```python
# app.py
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


def main() -> None:
    # Fail fast with a KeyError if the key is missing, rather than at query time.
    api_key = os.environ["OPENAI_API_KEY"]
    Settings.llm = OpenAI(model="gpt-4o-mini", api_key=api_key)
    Settings.embed_model = OpenAIEmbedding(
        model="text-embedding-3-small",
        api_key=api_key,
    )

    # Load every file under data/, embed it, and build an in-memory index.
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    response = query_engine.query("What does this document say about claims?")
    print(response)


if __name__ == "__main__":
    main()
```
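One optional tweak, not part of the tutorial code above: accept the query as a command-line argument so you can ask different questions without rebuilding the image. A minimal sketch (the `pick_query` helper and its default are assumptions, not LlamaIndex API):

```python
import sys

# Hypothetical helper: fall back to a fixed question when no argument is given.
DEFAULT_QUERY = "What does this document say about claims?"


def pick_query(argv: list[str]) -> str:
    """Return the first CLI argument if present, else the default query."""
    return argv[1] if len(argv) > 1 else DEFAULT_QUERY


if __name__ == "__main__":
    print(pick_query(sys.argv))
```

Inside `main()` you would then call `query_engine.query(pick_query(sys.argv))`, and append the question after the image name in `docker run`.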
  3. Add a real document to index so you can test retrieval inside the container.
    Use something deterministic, not random text, so you can tell whether the pipeline is working.
```text
# data/policy.txt
Claims are reviewed within 5 business days.
If supporting evidence is missing, the claim is placed on hold.
Approved claims are paid by bank transfer.
The policy requires all claims to include a reference number.
```
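Before baking the data into the image, it can also help to fail fast when the directory is missing or empty, since a forgotten `COPY` step may otherwise surface as a less obvious reader error. A small stdlib-only guard you could call at the top of `main()` (the `check_data_dir` helper is an assumption, not part of LlamaIndex):

```python
from pathlib import Path


def check_data_dir(path: str = "data") -> list[str]:
    """Return the file names to index, raising early if there is nothing to read."""
    root = Path(path)
    if not root.is_dir():
        # Clearer than a downstream error when the COPY step was forgotten.
        raise FileNotFoundError(f"expected a directory at {root.resolve()}")
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    if not files:
        raise ValueError(f"{root} exists but contains no files to index")
    return files
```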
  4. Build the container image with a slim Python base and install only what you need.
    The key detail here is copying both code and data into the image while keeping runtime configuration external through environment variables.
```dockerfile
FROM python:3.11-slim

WORKDIR /app

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .
COPY data ./data

CMD ["python", "app.py"]
```
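To keep the build context small and avoid leaking local files into the image, a `.dockerignore` next to the Dockerfile helps. The entries below are typical assumptions for this layout; note that `data/` is deliberately not ignored, since the Dockerfile `COPY`s it:

```text
# .dockerignore
.git
.venv
__pycache__/
*.pyc
.env
```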
  5. Build and run the container with your API key injected at runtime.
    Do not bake secrets into the image; pass them as environment variables or through your orchestrator later.
```shell
docker build -t llamaindex-docker-demo .
docker run --rm \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  llamaindex-docker-demo
```
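Since a missing key is the most common startup failure, you can guard the run with a small POSIX-shell check first. This is a sketch; the `require_env` helper is an assumption, not a Docker feature:

```shell
# Hypothetical helper: refuse to start when a required variable is unset or empty.
require_env() {
  # $1 is the variable name; eval expands it indirectly.
  if [ -z "$(eval "printf %s \"\${$1:-}\"")" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# Usage before running the container:
#   require_env OPENAI_API_KEY && docker run --rm \
#     -e OPENAI_API_KEY="$OPENAI_API_KEY" llamaindex-docker-demo
```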
  6. If you want faster iteration during development, mount your source code instead of rebuilding on every edit.
    This pattern is useful when you're tuning prompts, swapping models, or changing retrieval logic frequently.
```shell
docker run --rm -it \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -v "$PWD/app.py:/app/app.py" \
  -v "$PWD/data:/app/data" \
  llamaindex-docker-demo
```
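If you prefer Compose, the same dev setup can live in a `docker-compose.yml`. This is a sketch; the service name `app` is an assumption:

```yaml
services:
  app:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}   # passed through from your shell
    volumes:
      - ./app.py:/app/app.py               # live-edit the app without rebuilding
      - ./data:/app/data
```

Run it with `docker compose run --rm app`.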

Testing It

You should see a natural-language answer printed to stdout based on the contents of data/policy.txt. If the container fails immediately, check that OPENAI_API_KEY is set in your shell and that Docker has network access to OpenAI’s APIs.

If you get an import error, verify that all three packages are listed in requirements.txt with mutually compatible versions. If retrieval looks wrong, confirm that the document actually contains the answer and that SimpleDirectoryReader("data") is reading the right folder inside the container.

For a more production-like test, run the image in CI after building it from scratch with no cached layers. That catches dependency drift early and confirms your Dockerfile is reproducible.
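As a sketch of that CI step, assuming GitHub Actions (the workflow name and trigger are assumptions; other CI systems need the equivalent `docker build --no-cache` invocation):

```yaml
# .github/workflows/docker.yml
name: docker-build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the image with no cached layers
        run: docker build --no-cache -t llamaindex-docker-demo .
```

To also run the container in CI, inject the key from a repository secret rather than hard-coding it in the workflow.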

Next Steps

  • Add a FastAPI layer so queries are served over HTTP instead of only via CLI.
  • Move from local files to S3 or SharePoint ingestion using LlamaIndex readers.
  • Add persistent storage for indexes using a vector database like Qdrant or Postgres with pgvector.

By Cyprian Aarons, AI Consultant at Topiax.
