LlamaIndex Tutorial (Python): deploying to AWS Lambda for beginners
This tutorial shows you how to package a basic LlamaIndex app as a Lambda handler and deploy it on AWS Lambda with Python. You’d do this when you want a lightweight question-answering endpoint without running a server yourself.
What You'll Need
- Python 3.11 locally
- An AWS account with permission to create:
  - Lambda functions
  - IAM roles
  - CloudWatch log groups
- AWS CLI configured locally
- `pip` and `venv`
- An OpenAI API key
- These Python packages:
  - `llama-index`
  - `llama-index-llms-openai`
  - `llama-index-embeddings-openai`
- A ZIP-based deployment workflow, not container images
- Basic familiarity with AWS Lambda handlers
Step-by-Step
- First, create a small project with a local virtual environment and install the dependencies. Keep the dependency set minimal so the deployment package stays manageable.
```shell
mkdir llamaindex-lambda-demo
cd llamaindex-lambda-demo
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
```
- Create the Lambda handler in `app.py`. This example builds an index from a few in-memory documents, then answers a query from the incoming event payload.
```python
import json

from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding


def build_query_engine():
    Settings.llm = OpenAI(model="gpt-4o-mini")
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
    docs = [
        Document(text="Lambda is AWS's serverless compute service."),
        Document(text="LlamaIndex helps connect LLMs to your data."),
        Document(text="This demo answers questions from a tiny built-in knowledge base."),
    ]
    index = VectorStoreIndex.from_documents(docs)
    return index.as_query_engine()


# Built once at import time, so warm invocations reuse the same index.
query_engine = build_query_engine()


def lambda_handler(event, context):
    question = event.get("question", "What is this demo about?")
    response = query_engine.query(question)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"question": question, "answer": str(response)}),
    }
```
- Set your OpenAI API key as an environment variable before testing locally or in Lambda. Lambda reads it at runtime, so don’t hardcode secrets into the source file.
```shell
export OPENAI_API_KEY="your-openai-api-key"
python -c "from app import lambda_handler; print(lambda_handler({'question':'What is LlamaIndex?'}, None))"
```
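Because the OpenAI client picks up `OPENAI_API_KEY` implicitly, a missing key surfaces as a confusing failure deep inside a library call. A small guard that fails fast with a clear message can help; this is a sketch, and the `require_api_key` helper is a hypothetical addition, not part of the handler above.

```python
import os

def require_api_key():
    """Fail fast with a clear error when OPENAI_API_KEY is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it locally or add it to the "
            "Lambda environment variables."
        )
    return key
```

You could call this at the top of `build_query_engine()` so a misconfigured deployment fails at cold start instead of on the first query.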
- Package the code for Lambda. For beginners, the easiest path is to install the dependencies into a separate `package` directory with `pip install --target`, copy your handler file in alongside them, and zip the result for upload.
```shell
mkdir -p package
pip install --target package llama-index llama-index-llms-openai llama-index-embeddings-openai
cp app.py package/
cd package
zip -r ../lambda-package.zip .
cd ..
```
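If the `zip` CLI isn’t available (on Windows, for example), the same archive can be built with Python’s standard-library `zipfile` module. This is a sketch of an equivalent step, with `zip_dir` as a hypothetical helper name; the important detail is that entries are stored relative to the package directory so `app.py` sits at the ZIP root, which is where Lambda expects it.

```python
import os
import zipfile

def zip_dir(src_dir, zip_path):
    """Zip the contents of src_dir at the archive root, mirroring
    `cd package && zip -r ../lambda-package.zip .`."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # arcname is relative to src_dir so app.py lands at the ZIP root
                zf.write(full, os.path.relpath(full, src_dir))

# zip_dir("package", "lambda-package.zip")
```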
- Create the Lambda function in AWS and point it at your handler. Use the Python 3.11 runtime, set the handler to `app.lambda_handler`, and add the `OPENAI_API_KEY` environment variable in the console or CLI.
```shell
aws lambda create-function \
  --function-name llamaindex-demo \
  --runtime python3.11 \
  --role arn:aws:iam::123456789012:role/your-lambda-execution-role \
  --handler app.lambda_handler \
  --zip-file fileb://lambda-package.zip \
  --timeout 30 \
  --memory-size 1024 \
  --environment Variables="{OPENAI_API_KEY=your-openai-api-key}"
```
- Invoke the function with a JSON payload containing a question. With AWS CLI v2, pass `--cli-binary-format raw-in-base64-out` so the CLI sends the raw JSON payload instead of expecting base64. If everything is wired correctly, you’ll get back a JSON response with an answer generated by LlamaIndex and OpenAI.
```shell
cat > event.json <<'EOF'
{"question":"What does this demo use LlamaIndex for?"}
EOF

aws lambda invoke \
  --function-name llamaindex-demo \
  --cli-binary-format raw-in-base64-out \
  --payload file://event.json \
  response.json

cat response.json
```
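Note that the handler’s `body` field is itself a JSON string, so reading `response.json` programmatically takes two decode steps. A minimal sketch, with `extract_answer` as a hypothetical helper and the sample string standing in for the file contents:

```python
import json

def extract_answer(response_text):
    """Decode the Lambda response, then decode its JSON-string body."""
    outer = json.loads(response_text)
    body = json.loads(outer["body"])
    return body["answer"]

# Simulated contents of response.json (shape matches app.py's return value):
sample = json.dumps({
    "statusCode": 200,
    "headers": {"Content-Type": "application/json"},
    "body": json.dumps({"question": "What is this?", "answer": "A demo."}),
})
print(extract_answer(sample))  # A demo.
```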
Testing It
Start by testing locally with the `python -c` one-liner before touching AWS. That catches bad imports, missing environment variables, and model configuration issues fast.
After deployment, check CloudWatch Logs for timeouts or dependency errors. If you see import failures, your ZIP package is missing dependencies or was built on the wrong Python version.
If the function runs but returns empty or slow responses, keep memory at 1024 MB or higher and make sure your timeout is at least 20–30 seconds. LLM calls are network-bound, so low timeouts are usually self-inflicted.
A good final test is to change the question and confirm responses vary based on your document set. That tells you the query engine is actually using your index instead of returning generic model output.
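The handler logic can also be exercised without any OpenAI calls by swapping in a stub engine. This is a self-contained sketch, not the real `app.py`: `FakeQueryEngine` and `make_handler` are hypothetical test helpers that mirror the handler’s wiring so you can assert on the response shape offline.

```python
import json

class FakeQueryEngine:
    """Stand-in for the LlamaIndex query engine: returns a canned answer
    so the handler's wiring can be tested without network calls."""
    def query(self, question):
        return f"stub answer for: {question}"

def make_handler(query_engine):
    """Build a handler around an injected engine, mirroring app.py's logic."""
    def lambda_handler(event, context):
        question = event.get("question", "What is this demo about?")
        response = query_engine.query(question)
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"question": question, "answer": str(response)}),
        }
    return lambda_handler

handler = make_handler(FakeQueryEngine())
result = handler({"question": "What is LlamaIndex?"}, None)
body = json.loads(result["body"])
print(result["statusCode"])  # 200
print(body["answer"])        # stub answer for: What is LlamaIndex?
```

Once this passes, any remaining failures in Lambda are down to packaging, permissions, or the OpenAI calls themselves.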
Next Steps
- Move from in-memory documents to S3-backed loading with LlamaIndex readers.
- Replace ZIP deployment with an AWS SAM or CDK pipeline for repeatable releases.
- Add API Gateway in front of Lambda so you can call this from HTTP clients instead of only `aws lambda invoke`.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.