LlamaIndex Tutorial (Python): deploying to AWS Lambda for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to package a LlamaIndex-based Python app and run it on AWS Lambda behind API Gateway. You need this when you want a serverless retrieval endpoint for chat, search, or document Q&A without managing servers.

What You'll Need

  • Python 3.11 locally
  • AWS account with permission to create:
    • Lambda functions
    • IAM roles
    • API Gateway HTTP APIs
    • CloudWatch logs
  • An OpenAI API key exported as OPENAI_API_KEY
  • pip and venv
  • AWS CLI configured locally
  • These Python packages:
    • llama-index
    • openai
    • boto3
  • A simple document file to index, for example docs/employee-handbook.txt

Step-by-Step

  1. Create a small project with a persistent index artifact.
    On Lambda, you do not want to build the index from scratch on every request. Build it once during deployment and load it from the Lambda bundle or /tmp at runtime.
mkdir llamaindex-lambda && cd llamaindex-lambda
python3.11 -m venv .venv
source .venv/bin/activate
pip install llama-index openai boto3
mkdir data
cat > data/employee-handbook.txt <<'EOF'
Employees must submit expense reports within 30 days.
Remote work requires manager approval.
Security incidents must be reported immediately.
EOF
  2. Build and persist the index locally.
    This creates a storage/ directory that Lambda can ship with the deployment package. The script below reads the sample document, builds the vector index, and writes it to disk.
from pathlib import Path

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

data_dir = Path("data")
storage_dir = Path("storage")

documents = SimpleDirectoryReader(input_dir=str(data_dir)).load_data()
index = VectorStoreIndex.from_documents(documents)

index.storage_context.persist(persist_dir=str(storage_dir))

print(f"Indexed {len(documents)} documents into {storage_dir}")
  3. Add the Lambda handler that loads the persisted index and answers queries.
    Keep initialization at module scope so warm invocations reuse the loaded index. The handler reads OPENAI_API_KEY from the Lambda environment rather than hardcoding it, and it fails fast at import time if the variable is missing.
import os
import json

from llama_index.core import StorageContext, load_index_from_storage

PERSIST_DIR = "storage"

if "OPENAI_API_KEY" not in os.environ:
    raise RuntimeError("OPENAI_API_KEY is required")

storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "What are the security incident rules?")
    response = query_engine.query(question)

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": str(response)}),
    }
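    If you later put API Gateway in front of this handler, note that HTTP API events can arrive with a base64-encoded body. A small sketch of a body parser that handles both cases (extract_question is a helper name introduced here, not part of the handler above):
import base64
import json

def extract_question(event, default="What are the security incident rules?"):
    # API Gateway HTTP APIs set isBase64Encoded when the body is encoded.
    raw = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(raw).decode("utf-8")
    return json.loads(raw).get("question", default)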
  4. Package the function for AWS Lambda.
    Lambda expects your handler file at the root of the deployment ZIP unless you use a container image. This example keeps it simple with a ZIP deployment.
    Save the handler from the previous step as lambda_function.py in the project root. The ZIP must contain your dependencies at the archive root, not nested under .venv/, or Lambda will fail to import them:

cd .venv/lib/python3.11/site-packages
zip -r "$OLDPWD/function.zip" . >/dev/null
cd "$OLDPWD"
zip -r function.zip lambda_function.py storage/ >/dev/null
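    A quick way to catch the most common packaging mistake, dependencies nested under .venv/ instead of sitting at the archive root, is to inspect the ZIP before uploading. A small sanity check in Python:
import zipfile

with zipfile.ZipFile("function.zip") as zf:
    names = set(zf.namelist())

# The handler must sit at the archive root.
assert "lambda_function.py" in names
# The persisted index must ship alongside it.
assert any(n.startswith("storage/") for n in names)
# Dependencies must be unpacked at the root, not nested under .venv/.
assert any(n.startswith("llama_index/") for n in names)
assert not any(n.startswith(".venv/") for n in names)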
  5. Create the Lambda function and wire in environment variables.
    Use an execution role with basic logging permissions, then upload your ZIP file. If your ZIP gets too large, switch to a container image later.
aws iam create-role \
  --role-name llamaindex-lambda-role \
  --assume-role-policy-document '{
    "Version":"2012-10-17",
    "Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]
  }'

aws iam attach-role-policy \
  --role-name llamaindex-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

ROLE_ARN=$(aws iam get-role --role-name llamaindex-lambda-role --query 'Role.Arn' --output text)

aws lambda create-function \
  --function-name llamaindex-query \
  --runtime python3.11 \
  --handler lambda_function.lambda_handler \
  --zip-file fileb://function.zip \
  --role "$ROLE_ARN" \
  --timeout 30 \
  --memory-size 1024 \
  --environment Variables="{OPENAI_API_KEY=$OPENAI_API_KEY}"
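    create-function returns while the function is still in the Pending state, so scripted deployments can invoke too early. If you are scripting with boto3 (installed in step 1), a short wait avoids that, assuming your default credentials and region match the CLI:
import boto3

lambda_client = boto3.client("lambda")

# Block until the new function's state is Active and ready to invoke.
waiter = lambda_client.get_waiter("function_active_v2")
waiter.wait(FunctionName="llamaindex-query")
print("llamaindex-query is active")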
  6. Invoke it through the CLI first, then expose it through API Gateway if needed.
    Start with direct invocation so you can isolate packaging issues before adding HTTP routing.
cat > event.json <<'EOF'
{
  "body": "{\"question\":\"How long do employees have to submit expense reports?\"}"
}
EOF

aws lambda invoke \
  --function-name llamaindex-query \
  --payload fileb://event.json \
  response.json

cat response.json
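    The same invocation also works from Python via boto3, which is convenient for integration tests. This mirrors the CLI call above:
import json

import boto3

lambda_client = boto3.client("lambda")

# The event wraps the question the same way API Gateway would.
payload = {"body": json.dumps({"question": "Does remote work require approval?"})}

response = lambda_client.invoke(
    FunctionName="llamaindex-query",
    Payload=json.dumps(payload).encode("utf-8"),
)

result = json.loads(response["Payload"].read())
print(json.loads(result["body"])["answer"])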

Testing It

Run a question that should map directly to your sample document, like expense report timing or remote work approval. If the answer comes back as expected, your index loading and OpenAI access are working.

Check CloudWatch logs for cold start errors, missing environment variables, or import failures from packaging mistakes. If you see module import issues, your ZIP layout is wrong or dependencies were built on an incompatible OS.

If latency feels high, remember that Lambda cold starts plus model calls add up quickly. For production use, keep memory at 1024 MB or higher and avoid rebuilding indexes inside the handler.

Next Steps

  • Move from ZIP deployment to an AWS Lambda container image when dependency size grows.
  • Add API Gateway HTTP API routing so clients can POST questions over HTTPS.
  • Replace SimpleDirectoryReader with S3-backed ingestion for real document pipelines; a rough sketch follows below.
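
As a starting point for that last item, one common pattern is to sync documents from S3 into /tmp, the only writable path on Lambda, and index from there. A minimal sketch, assuming a hypothetical bucket and prefix (your execution role would also need s3:ListBucket and s3:GetObject):
from pathlib import Path

import boto3
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

BUCKET = "my-docs-bucket"  # hypothetical bucket name; use your own
PREFIX = "handbook/"       # hypothetical key prefix

local_dir = Path("/tmp/docs")
local_dir.mkdir(parents=True, exist_ok=True)

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue  # skip folder placeholder objects
        s3.download_file(BUCKET, key, str(local_dir / Path(key).name))

documents = SimpleDirectoryReader(input_dir=str(local_dir)).load_data()
index = VectorStoreIndex.from_documents(documents)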

By Cyprian Aarons, AI Consultant at Topiax.