LlamaIndex Tutorial (TypeScript): deploying to AWS Lambda for advanced developers
This tutorial shows you how to package a TypeScript LlamaIndex query handler for AWS Lambda, wire it to API Gateway, and keep the runtime lean enough for real deployments. You need this when you want your agent or retrieval endpoint to run serverlessly without standing up and managing a long-lived Node service.
What You'll Need
- An AWS account with permission to create:
  - Lambda functions
  - IAM roles
  - API Gateway HTTP APIs
  - CloudWatch logs
- Node.js 18 or newer
- TypeScript 5.x
- These packages: @llamaindex/core, @llamaindex/openai, aws-lambda, esbuild, typescript
- An OpenAI API key exported as OPENAI_API_KEY
- A local folder with a few text files to index
- The AWS CLI configured locally if you want to deploy from the terminal
Step-by-Step
- Set up a minimal project structure and install dependencies. The key constraint on Lambda is cold-start size, so we keep the app small and avoid shipping unnecessary runtime baggage.
mkdir llamaindex-lambda-ts && cd llamaindex-lambda-ts
npm init -y
npm i @llamaindex/core @llamaindex/openai aws-lambda
npm i -D typescript esbuild @types/aws-lambda @types/node
mkdir -p src data dist
cat > tsconfig.json <<'EOF'
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "outDir": "dist",
    "rootDir": "src",
    "esModuleInterop": true,
    "skipLibCheck": true
  },
  "include": ["src"]
}
EOF
- Create a small local index builder that runs once at startup inside Lambda. For advanced use cases, you’d usually replace the file reader with S3, DynamoDB, or a vector DB, but this keeps the deployment path clear.
// src/index.ts
import { SimpleDirectoryReader, VectorStoreIndex } from "@llamaindex/core";
import { OpenAIEmbedding } from "@llamaindex/openai";

export async function buildIndex() {
  const reader = new SimpleDirectoryReader();
  // "./data" resolves against the Lambda task root (/var/task), so the
  // data/ folder must ship inside the deployment zip.
  const docs = await reader.loadData({ directoryPath: "./data" });
  return await VectorStoreIndex.fromDocuments(docs, {
    embedModel: new OpenAIEmbedding({
      model: "text-embedding-3-small",
      apiKey: process.env.OPENAI_API_KEY!,
    }),
  });
}
- Add the Lambda handler. The important part is caching the index outside the handler so warm invocations reuse it, which reduces both latency and OpenAI embedding calls.
// src/handler.ts
import type { APIGatewayProxyHandlerV2 } from "aws-lambda";
import { OpenAI } from "@llamaindex/openai";
import { buildIndex } from "./index.js";

// Built once per container; warm invocations reuse the same promise.
const indexPromise = buildIndex();

export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  let body: { query?: unknown };
  try {
    body = event.body ? JSON.parse(event.body) : {};
  } catch {
    return { statusCode: 400, body: JSON.stringify({ error: "body must be valid JSON" }) };
  }
  const query = String(body.query ?? "").trim();
  if (!query) {
    return { statusCode: 400, body: JSON.stringify({ error: "query is required" }) };
  }
  const index = await indexPromise;
  const queryEngine = index.asQueryEngine({
    llm: new OpenAI({ model: "gpt-4o-mini", apiKey: process.env.OPENAI_API_KEY! }),
  });
  const response = await queryEngine.query({ query });
  return {
    statusCode: 200,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ answer: response.toString() }),
  };
};
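The module-level promise is the whole caching trick, and it generalizes: any expensive async initialization (the index, a Secrets Manager lookup, a database connection) can be wrapped the same way. A minimal sketch of the pattern; cachedAsync is a helper name I'm introducing here, not part of any library:

```typescript
// Memoize an async loader: the first call starts the work, and every
// later call (a warm invocation) reuses the same in-flight or settled
// promise, so the expensive setup runs once per container.
export function cachedAsync<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= loader());
}

// Usage sketch: the handler's index cache, expressed with the helper.
// const getIndex = cachedAsync(buildIndex);
// const index = await getIndex(); // inside the handler
```

One caveat: a rejected promise is cached too, so a failed startup stays failed for the life of the container; in production you may want to clear the cache on rejection so the next invocation retries.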
- Add a build script to package.json and compile for Lambda. I prefer bundling with esbuild because it gives you one deployable artifact and avoids CommonJS/ESM surprises in Lambda.
{
  "name": "llamaindex-lambda-ts",
  "private": true,
  "type": "module",
  "scripts": {
    "build": "esbuild src/handler.ts --bundle --platform=node --target=node18 --format=esm --outfile=dist/index.mjs"
  }
}
npm run build
- Package and deploy to AWS Lambda. Use an environment variable for the API key; do not bake secrets into the bundle. If you want production hygiene, move the key into Secrets Manager later.
zip -j function.zip dist/index.mjs
# The index builder reads ./data at startup, so ship that folder in the zip too.
zip -r function.zip data
aws lambda create-function \
--function-name llamaindex-query \
--runtime nodejs18.x \
--handler index.handler \
--zip-file fileb://function.zip \
--role arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<LAMBDA_EXEC_ROLE> \
--timeout 30 \
--memory-size 1024 \
--environment Variables="{OPENAI_API_KEY=$OPENAI_API_KEY}"
- Expose it through an API Gateway HTTP API. This gives you a clean HTTPS endpoint without having to manage an ALB or custom auth plumbing first.
API_ID=$(aws apigatewayv2 create-api \
--name llamaindex-http-api \
--protocol-type HTTP \
--query 'ApiId' --output text)
INTEGRATION_ID=$(aws apigatewayv2 create-integration \
--api-id "$API_ID" \
--integration-type AWS_PROXY \
--integration-uri arn:aws:lambda:<REGION>:<YOUR_ACCOUNT_ID>:function:llamaindex-query \
--payload-format-version '2.0' \
--query 'IntegrationId' --output text)
aws apigatewayv2 create-route \
--api-id "$API_ID" \
--route-key 'POST /query' \
--target "integrations/$INTEGRATION_ID"
aws apigatewayv2 create-stage \
--api-id "$API_ID" \
--stage-name '$default' \
--auto-deploy
aws lambda add-permission \
--function-name llamaindex-query \
--statement-id apigw-invoke \
--action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn "arn:aws:execute-api:<REGION>:<YOUR_ACCOUNT_ID>:$API_ID/*"
echo https://$API_ID.execute-api.<REGION>.amazonaws.com/query
Testing It
Send a POST request with a query payload and confirm that the function returns JSON with an answer field. If the first call is slow, that’s normal; it’s paying the cold start plus index initialization cost.
curl -X POST https://$API_ID.execute-api.<REGION>.amazonaws.com/query \
-H 'content-type: application/json' \
-d '{"query":"What documents are in this dataset?"}'
Check CloudWatch logs for two things: whether your data files were loaded successfully and whether any OpenAI auth or timeout errors occurred. If you see memory pressure or long startup times, increase Lambda memory before touching code.
For repeat traffic, invoke it twice and compare latency. Warm invocations should be materially faster because indexPromise stays in memory across requests until the container is recycled.
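The two-invocation comparison can be scripted as well as done with curl. A sketch using Node 18's built-in fetch; timedQuery is a hypothetical helper, and the injectable fetcher parameter exists only to make it easy to test:

```typescript
// POST a query to the endpoint and time the round trip.
export async function timedQuery(
  url: string,
  query: string,
  fetcher: typeof fetch = fetch
): Promise<{ answer: string; ms: number }> {
  const start = Date.now();
  const res = await fetcher(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const { answer } = (await res.json()) as { answer: string };
  return { answer, ms: Date.now() - start };
}

// Usage sketch (ENDPOINT is your API Gateway URL):
// const cold = await timedQuery(ENDPOINT, "What documents are in this dataset?");
// const warm = await timedQuery(ENDPOINT, "What documents are in this dataset?");
// console.log({ coldMs: cold.ms, warmMs: warm.ms });
```

Expect the second number to be materially lower, since the warm container skips both the cold start and the index build.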
Next Steps
- Move document storage from local files to S3 and load them during initialization.
- Swap the in-memory vector store for Pinecone, Qdrant, or another persistent backend.
- Add request authentication with JWT authorizers before exposing this endpoint publicly.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.