LlamaIndex Tutorial (TypeScript): deploying to AWS Lambda for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to build a small LlamaIndex TypeScript app that runs inside AWS Lambda and answers questions against your data. You need this when you want serverless inference for low-traffic workloads, event-driven document Q&A, or an API that scales to zero without managing servers.

What You'll Need

  • Node.js 18+ and npm
  • An AWS account with permission to create:
    • Lambda functions
    • IAM roles
    • CloudWatch logs
  • An OpenAI API key exported as OPENAI_API_KEY
  • A local TypeScript project
  • These packages:
    • llamaindex
    • @aws-sdk/client-s3 if you plan to load documents from S3 later
    • esbuild for bundling the Lambda artifact
  • Basic familiarity with:
    • async/await
    • AWS Lambda handler signatures
    • environment variables

Step-by-Step

  1. Create a new TypeScript project and install dependencies. Keep the runtime small and bundle everything into one file, because Lambda cold starts get worse when you ship a large dependency tree.
mkdir llamaindex-lambda && cd llamaindex-lambda
npm init -y
npm install llamaindex
npm install -D typescript @types/node esbuild
npx tsc --init --rootDir src --outDir dist --module commonjs --target es2020 --esModuleInterop true
mkdir src
  2. Add a minimal LlamaIndex query handler. This example builds an index from in-memory text so it works end-to-end without external storage. In production, you would usually swap the source for S3, DynamoDB, or a vector store.
// src/index.ts
import { Document, VectorStoreIndex } from "llamaindex";

const docs = [
  new Document({ text: "AWS Lambda is a serverless compute service." }),
  new Document({ text: "LlamaIndex helps structure and query data for LLM applications." }),
];

let cachedIndex: VectorStoreIndex | null = null;

async function getIndex() {
  if (!cachedIndex) {
    cachedIndex = await VectorStoreIndex.fromDocuments(docs);
  }
  return cachedIndex;
}

export const handler = async (event: { question?: string }) => {
  const question = event.question ?? "What is Lambda?";
  const index = await getIndex();
  const engine = index.asQueryEngine();
  const response = await engine.query({ query: question });

  return {
    statusCode: 200,
    body: JSON.stringify({ question, answer: response.toString() }),
  };
};
  3. Add an environment-aware OpenAI configuration and keep the index warm across invocations. Lambda may reuse the same container, so module-level caching reduces repeated initialization work.
// src/index.ts
import { Document, VectorStoreIndex, Settings, OpenAI, OpenAIEmbedding } from "llamaindex";

// Note: in newer llamaindex releases these classes may live in @llamaindex/openai instead.
Settings.llm = new OpenAI({ model: "gpt-4o-mini" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });

const docs = [
  new Document({ text: "AWS Lambda is a serverless compute service." }),
  new Document({ text: "LlamaIndex helps structure and query data for LLM applications." }),
];

let cachedIndex: VectorStoreIndex | null = null;

async function getIndex() {
  if (!cachedIndex) cachedIndex = await VectorStoreIndex.fromDocuments(docs);
  return cachedIndex;
}
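
By default, the OpenAI integration reads OPENAI_API_KEY from the environment, so the snippet above needs no extra key handling. If you would rather fail fast at cold start when the key is missing, a hedged alternative to the two Settings assignments (assuming the constructors accept an apiKey option, as recent llamaindex releases do) looks like this:

// Optional: explicit key handling instead of the two Settings assignments above.
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) throw new Error("OPENAI_API_KEY is not set on this function");

Settings.llm = new OpenAI({ model: "gpt-4o-mini", apiKey });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small", apiKey });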
  4. Make the handler compatible with API Gateway or direct Lambda invocation. This version accepts either { question } or an HTTP event body, which makes local testing and API Gateway integration easier.
// src/index.ts
export const handler = async (event: any) => {
  const body =
    typeof event?.body === "string" ? JSON.parse(event.body) : event ?? {};
  const question = body.question ?? "What is Lambda?";

  const index = await getIndex();
  const engine = index.asQueryEngine();
  const response = await engine.query({ query: question });

  return {
    statusCode: 200,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ question, answer: response.toString() }),
  };
};
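
To sanity-check both paths before deploying, you can call the handler directly with each event shape. The script below is a hypothetical helper (scripts/local-check.ts is not part of the deployed bundle); it makes real OpenAI calls, so it needs OPENAI_API_KEY in your shell, and you can run it with a TypeScript runner such as npx tsx or after compiling:

// scripts/local-check.ts (hypothetical helper, not part of the deployed bundle)
import { handler } from "../src/index";

async function main() {
  // Direct-invocation shape
  console.log(await handler({ question: "What is Lambda?" }));

  // API Gateway proxy shape: the body arrives as a JSON string
  console.log(await handler({ body: JSON.stringify({ question: "What is Lambda?" }) }));
}

main().catch(console.error);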
  5. Bundle for Lambda and run a local smoke test. Use esbuild so the deployed artifact contains the compiled code and its dependencies in one file, and add build and test scripts to package.json:
{
  "name": "llamaindex-lambda",
  "version": "1.0.0",
  "main": "dist/index.js",
  "scripts": {
    "build": "esbuild src/index.ts --bundle --platform=node --target=node18 --outfile=dist/index.js",
    "test": "node -e \"require('./dist/index').handler({question:'What is LlamaIndex?' }).then(console.log)\""
  }
}
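
With OPENAI_API_KEY exported in your shell, the smoke test exercises the full path (embedding, retrieval, and the LLM call), so expect a few seconds of latency and a small API cost. Roughly:

export OPENAI_API_KEY=your-key-here
npm run build
npm test
# Expected result shape (answer text will vary):
# { statusCode: 200, headers: {...}, body: '{"question":"What is LlamaIndex?","answer":"..."}' }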
  6. Deploy the bundle to AWS Lambda and set the API key as an environment variable. If you use an HTTP trigger, attach API Gateway; if you use direct invocation, call the function with JSON payloads from your backend.
npm run build

cd dist && zip -r ../function.zip . && cd ..

aws lambda create-function \
  --function-name llamaindex-ts-demo \
  --runtime nodejs18.x \
  --handler index.handler \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --zip-file fileb://function.zip

For subsequent code changes, rebuild, re-zip, and update the function in place; set the API key once as an environment variable:

npm run build
cd dist && zip -r ../function.zip . && cd ..
aws lambda update-function-code \
  --function-name llamaindex-ts-demo \
  --zip-file fileb://function.zip

aws lambda update-function-configuration \
  --function-name llamaindex-ts-demo \
  --environment Variables="{OPENAI_API_KEY=your-key-here}"
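
Lambda's defaults (a 3-second timeout and 128 MB of memory) are usually too tight for a cold start plus an OpenAI round trip, so consider raising both:

aws lambda update-function-configuration \
  --function-name llamaindex-ts-demo \
  --timeout 30 \
  --memory-size 512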

Testing It

Invoke the function with a simple payload like {"question":"What does LlamaIndex do?"} and confirm you get a JSON response with an answer field. Check CloudWatch logs if the function times out or fails during model initialization.
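
With the AWS CLI v2, a direct invocation looks like the following; the --cli-binary-format flag keeps the CLI from treating the JSON payload as base64:

aws lambda invoke \
  --function-name llamaindex-ts-demo \
  --cli-binary-format raw-in-base64-out \
  --payload '{"question":"What does LlamaIndex do?"}' \
  response.json

cat response.json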

If you see import errors, your bundle is probably wrong; rebuild with esbuild and make sure node_modules is not being required at runtime. If responses are slow on the first request but fast after that, that’s expected cold-start behavior plus cached initialization.

For API Gateway deployments, send a POST request to the endpoint and verify that both direct JSON bodies and proxy events are handled correctly. If the model returns empty or irrelevant answers, confirm your OPENAI_API_KEY is set in Lambda and that your documents actually contain the information you’re asking for.
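
For example, with a placeholder endpoint URL (substitute your own API ID, region, stage, and route):

curl -s -X POST \
  -H "content-type: application/json" \
  -d '{"question":"What does LlamaIndex do?"}' \
  https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/query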

Next Steps

  • Replace in-memory documents with S3-loaded files using @aws-sdk/client-s3 (see the sketch after this list)
  • Add a persistent vector store like Pinecone or OpenSearch instead of rebuilding on every cold start
  • Wrap this handler in API Gateway + Cognito if you need authenticated access
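
For the first item, a minimal sketch of the S3 swap might look like the following, assuming a single plain-text object; the bucket and key names are placeholders:

// Sketch: load one plain-text object from S3 into a Document (bucket/key are placeholders).
import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { Document } from "llamaindex";

const s3 = new S3Client({});

async function loadDocsFromS3(): Promise<Document[]> {
  const res = await s3.send(
    new GetObjectCommand({ Bucket: "my-docs-bucket", Key: "notes.txt" })
  );
  const text = await res.Body!.transformToString();
  return [new Document({ text })];
}

You would then call loadDocsFromS3() inside getIndex() instead of using the in-memory docs array, and grant the Lambda execution role s3:GetObject on the bucket.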

By Cyprian Aarons, AI Consultant at Topiax.