LlamaIndex Tutorial (Python): deploying to AWS Lambda for intermediate developers
This tutorial shows how to package a LlamaIndex-based Python app and run it on AWS Lambda behind API Gateway. You need this when you want a serverless retrieval endpoint for chat, search, or document Q&A without managing servers.
## What You'll Need

- Python 3.11 locally
- An AWS account with permission to create:
  - Lambda functions
  - IAM roles
  - API Gateway HTTP APIs
  - CloudWatch log groups
- An OpenAI API key exported as `OPENAI_API_KEY`
- `pip` and `venv`
- The AWS CLI configured locally
- These Python packages:
  - `llama-index`
  - `openai`
  - `boto3`
- A simple document file to index, for example `data/employee-handbook.txt`
## Step-by-Step

- Create a small project with a persistent index artifact.

On Lambda, you do not want to build the index from scratch on every request. Build it once during deployment and load it from the Lambda bundle or `/tmp` at runtime.
```sh
mkdir llamaindex-lambda && cd llamaindex-lambda
python3.11 -m venv .venv
source .venv/bin/activate
pip install llama-index openai boto3
mkdir data
cat > data/employee-handbook.txt <<'EOF'
Employees must submit expense reports within 30 days.
Remote work requires manager approval.
Security incidents must be reported immediately.
EOF
```
- Build and persist the index locally.

This creates a `storage/` directory that Lambda can ship with the deployment package. The script below builds the index with LlamaIndex and writes it to disk.
```python
# build_index.py — run once locally before packaging
from pathlib import Path

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

data_dir = Path("data")
storage_dir = Path("storage")

# Load every file under data/ and build an in-memory vector index.
documents = SimpleDirectoryReader(input_dir=str(data_dir)).load_data()
index = VectorStoreIndex.from_documents(documents)

# Persist the index's storage context to storage/ for the Lambda bundle.
index.storage_context.persist(persist_dir=str(storage_dir))
print(f"Indexed {len(documents)} documents into {storage_dir}")
```
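Before packaging, it is worth sanity-checking that the persisted artifacts exist. A minimal sketch, assuming the default JSON persistence file names (these can vary across llama-index versions, so adjust `EXPECTED` to match your `storage/` directory):

```python
import tempfile
from pathlib import Path

# Typical file names from llama-index's default JSON persistence;
# adjust if your storage/ directory contains different files.
EXPECTED = ["docstore.json", "index_store.json"]

def missing_artifacts(persist_dir):
    """Return the expected persistence files that are absent."""
    root = Path(persist_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

# Demonstrate against a throwaway directory instead of a real storage/.
tmp = Path(tempfile.mkdtemp())
for name in EXPECTED:
    (tmp / name).write_text("{}")
print(missing_artifacts(tmp))  # []
```

Run `missing_artifacts("storage")` after the build script; a non-empty result means the persist step did not complete.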
- Add the Lambda handler that loads the persisted index and answers queries.

Keep initialization at module scope so warm invocations reuse the loaded index. Supply the OpenAI API key through Lambda environment variables rather than hardcoding it in the handler.
```python
# lambda_function.py
import json
import os

from llama_index.core import StorageContext, load_index_from_storage

PERSIST_DIR = "storage"

# Fail fast on cold start if the key is missing from the Lambda environment.
if "OPENAI_API_KEY" not in os.environ:
    raise RuntimeError("OPENAI_API_KEY is required")

# Module scope: runs once per cold start; warm invocations reuse these objects.
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "What are the security incident rules?")
    response = query_engine.query(question)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": str(response)}),
    }
```
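Because the handler's request/response plumbing is plain Python, you can unit-test it locally without an index or OpenAI access by swapping in a fake query engine. A sketch (`FakeQueryEngine` is a local stand-in, not a LlamaIndex class):

```python
import json

# FakeQueryEngine is a stand-in, not part of LlamaIndex: it lets the
# handler's request parsing and response shaping run without any LLM calls.
class FakeQueryEngine:
    def query(self, question):
        return f"echo: {question}"

query_engine = FakeQueryEngine()

def lambda_handler(event, context):
    # Same parsing logic as the real handler.
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "What are the security incident rules?")
    response = query_engine.query(question)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": str(response)}),
    }

event = {"body": json.dumps({"question": "How long for expense reports?"})}
result = lambda_handler(event, None)
print(json.loads(result["body"])["answer"])  # echo: How long for expense reports?
```

This catches malformed-body and encoding bugs before you ever upload a ZIP.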
- Package the function for AWS Lambda.

Lambda expects your handler file and its dependencies at the root of the deployment ZIP unless you use a container image. This example keeps it simple with a ZIP deployment.
```sh
# Save the handler from the previous step as lambda_function.py, then build
# the ZIP. Dependencies must sit at the ZIP root, so zip site-packages from
# inside that directory rather than from the project root.
cd .venv/lib/python3.11/site-packages
zip -r ../../../../function.zip . >/dev/null
cd -
zip -r function.zip lambda_function.py storage/ >/dev/null
```
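A common packaging failure is dependencies ending up under a `.venv/lib/...` prefix inside the ZIP instead of at the root, which produces import errors at cold start. This sketch shows the layout check in miniature with an in-memory ZIP (the entry names are illustrative):

```python
import io
import zipfile

def top_level_entries(zip_bytes):
    """Return the set of top-level names inside a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name.split("/")[0] for name in zf.namelist()}

# Build a tiny in-memory ZIP mimicking the layout Lambda expects:
# handler, storage, and packages all at the root.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lambda_function.py", "# handler")
    zf.writestr("storage/docstore.json", "{}")
    zf.writestr("llama_index/__init__.py", "")

entries = top_level_entries(buf.getvalue())
print(sorted(entries))  # ['lambda_function.py', 'llama_index', 'storage']
```

Pointing `top_level_entries` at the bytes of your real `function.zip` should show `lambda_function.py`, `storage`, and your package directories, with no `.venv` entry.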
- Create the Lambda function and wire in environment variables.

Use an execution role with basic logging permissions, then upload your ZIP. If the ZIP grows past Lambda's size limits, switch to a container image.
```sh
aws iam create-role \
  --role-name llamaindex-lambda-role \
  --assume-role-policy-document '{
    "Version":"2012-10-17",
    "Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]
  }'

aws iam attach-role-policy \
  --role-name llamaindex-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

ROLE_ARN=$(aws iam get-role --role-name llamaindex-lambda-role --query 'Role.Arn' --output text)

# New IAM roles can take a few seconds to propagate; retry create-function
# if it fails with an assume-role error immediately after role creation.
aws lambda create-function \
  --function-name llamaindex-query \
  --runtime python3.11 \
  --handler lambda_function.lambda_handler \
  --zip-file fileb://function.zip \
  --role "$ROLE_ARN" \
  --timeout 30 \
  --memory-size 1024 \
  --environment Variables="{OPENAI_API_KEY=$OPENAI_API_KEY}"
```
- Invoke it through the CLI first, then expose it through API Gateway if needed.

Start with direct invocation so you can isolate packaging issues before adding HTTP routing.
```sh
cat > event.json <<'EOF'
{
  "body": "{\"question\":\"How long do employees have to submit expense reports?\"}"
}
EOF

aws lambda invoke \
  --function-name llamaindex-query \
  --payload fileb://event.json \
  response.json
cat response.json
```
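Note that `response.json` holds the handler's return value, so the answer is JSON-encoded twice: once as the Lambda payload and again inside the `body` field. A small sketch of unwrapping it (the sample payload below is illustrative):

```python
import json

# Illustrative payload matching what `aws lambda invoke` writes to response.json.
raw = '{"statusCode": 200, "body": "{\\"answer\\": \\"Within 30 days.\\"}"}'

payload = json.loads(raw)                       # outer layer: the handler's return dict
answer = json.loads(payload["body"])["answer"]  # inner layer: the JSON-encoded body
print(answer)  # Within 30 days.
```

If `json.loads(payload["body"])` fails, the handler returned an error payload or a stack trace rather than the expected response shape.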
## Testing It

Run a question that should map directly to your sample document, like expense report timing or remote work approval. If the answer comes back as expected, your index loading and OpenAI access are working.

Check CloudWatch logs for cold start errors, missing environment variables, or import failures from packaging mistakes. If you see module import errors, your ZIP layout is wrong or dependencies were built on an OS incompatible with the Lambda runtime.

If latency feels high, remember that Lambda cold starts plus model calls add up quickly. For production use, keep memory at 1024 MB or higher and avoid rebuilding indexes inside the handler.
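The module-scope initialization pattern from the handler step is what keeps warm latency down. A toy sketch showing that module-level setup runs once while the handler runs per invocation (`_expensive_init` stands in for `load_index_from_storage` plus query-engine setup; it is not a real API):

```python
INIT_COUNT = 0

def _expensive_init():
    # Stand-in for loading the index and building the query engine.
    global INIT_COUNT
    INIT_COUNT += 1
    return object()

# Module scope: in Lambda this runs once per execution environment (cold start).
engine = _expensive_init()

def lambda_handler(event, context):
    # Warm invocations reuse the module-level engine instead of rebuilding it.
    return {"inits": INIT_COUNT}

for _ in range(3):
    result = lambda_handler({}, None)
print(result["inits"])  # 1
```

Moving the init inside `lambda_handler` would pay the full load cost on every request, not just cold starts.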
## Next Steps

- Move from ZIP deployment to an AWS Lambda container image when dependency size grows.
- Add an API Gateway HTTP API route so clients can POST questions over HTTPS.
- Replace `SimpleDirectoryReader` with S3-backed ingestion for real document pipelines.
## Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.