Haystack Tutorial (Python): deploying to AWS Lambda for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to package a Haystack pipeline into an AWS Lambda handler that can answer queries from API Gateway or direct invocations. You need this when you want retrieval-augmented generation behind a serverless endpoint without running and scaling a full container service.

What You'll Need

  • Python 3.11
  • AWS account with Lambda, IAM, and CloudWatch access
  • AWS CLI configured locally
  • A working OpenAI API key
  • A Lambda execution role with permission to write logs
  • These Python packages:
    • haystack-ai (the OpenAI generator components ship with the core package)
    • openai
    • boto3 is already available in Lambda, but include it locally for parity
  • A deployment method:
    • ZIP upload, or
    • AWS SAM, or
    • Serverless Framework

Step-by-Step

  1. Start by building the Haystack pipeline in a way that works both locally and inside Lambda. Keep initialization outside the handler so warm invocations reuse the same objects.
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

PROMPT_TEMPLATE = """
Answer the question using only the provided documents.

Question: {{question}}

Documents:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Answer:
"""

document_store = InMemoryDocumentStore()

# Split at indexing time: InMemoryBM25Retriever searches the document store
# and takes a query (not documents) at query time, so the splitter belongs
# in the indexing path rather than in the query pipeline.
splitter = DocumentSplitter(split_by="word", split_length=50)
split_docs = splitter.run(
    documents=[
        Document(content="Lambda is AWS's serverless compute service."),
        Document(content="Haystack pipelines can connect retrievers, prompt builders, and generators."),
        Document(content="Keep cold-start work small in AWS Lambda."),
    ]
)["documents"]
document_store.write_documents(split_docs)

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("prompt_builder", PromptBuilder(template=PROMPT_TEMPLATE))
pipeline.add_component(
    "llm",
    # PromptBuilder emits a plain string, so use OpenAIGenerator (which takes
    # a "prompt" input) rather than the chat generator, which expects messages.
    OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4o-mini"),
)
  2. Wire the components together and define a handler-friendly function. For Lambda, keep input/output JSON serializable and avoid returning Haystack objects directly.
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

def answer_question(question: str) -> str:
    result = pipeline.run(
        {
            "retriever": {"query": question},
            "prompt_builder": {"question": question},
        }
    )
    # OpenAIGenerator returns plain strings in "replies"
    return result["llm"]["replies"][0].strip()
  3. Add the Lambda entrypoint. This version handles both API Gateway events and direct test invocations, which makes local testing much easier.
import json

def lambda_handler(event, context):
    if isinstance(event, dict) and "body" in event:
        body = event["body"]
        if isinstance(body, str):
            payload = json.loads(body)
        else:
            payload = body
    else:
        payload = event if isinstance(event, dict) else {}

    question = payload.get("question", "").strip()
    if not question:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "Missing required field: question"}),
        }

    answer = answer_question(question)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": answer}),
    }
  4. Package dependencies into your deployment artifact. On Lambda, native imports must match the runtime, so build inside a Linux-compatible environment or use an AWS SAM build container.
mkdir -p package
pip install --target package haystack-ai openai boto3

cp lambda_function.py package/
cd package
zip -r ../haystack-lambda.zip .
cd ..
aws lambda create-function \
  --function-name haystack-answerer \
  --runtime python3.11 \
  --handler lambda_function.lambda_handler \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --zip-file fileb://haystack-lambda.zip \
  --timeout 30 \
  --memory-size 1024 \
  --environment Variables="{OPENAI_API_KEY=your-key-here}"
  5. Invoke it and inspect the response shape before putting it behind API Gateway. If you get serialization errors, they usually come from returning non-JSON data or forgetting to parse the request body.
aws lambda invoke \
  --function-name haystack-answerer \
  --cli-binary-format raw-in-base64-out \
  --payload '{"question":"What is Lambda?"}' \
  response.json

cat response.json
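The proxy-style response nests a JSON string inside a JSON document, so the body has to be decoded twice. A minimal sketch of a checker (`parse_answer` is an illustrative helper, and the sample dict mirrors the shape `aws lambda invoke` writes to response.json):

```python
import json

def parse_answer(resp: dict) -> str:
    """Extract the answer from a Lambda proxy-style response dict."""
    if resp.get("statusCode") != 200:
        raise RuntimeError(f"Lambda returned {resp.get('statusCode')}: {resp.get('body')}")
    # The body is itself a JSON string, so it needs a second json.loads --
    # skipping this double decode is a common source of "serialization" bugs.
    return json.loads(resp["body"])["answer"]

sample = {
    "statusCode": 200,
    "headers": {"Content-Type": "application/json"},
    "body": json.dumps({"answer": "Lambda is AWS's serverless compute service."}),
}
print(parse_answer(sample))  # Lambda is AWS's serverless compute service.
```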

Testing It

First test the function locally by calling lambda_handler({"question": "What is Haystack?"}, None) from a Python shell and confirming you get a response dict with a 200 status code and a JSON string body. Then deploy to Lambda and invoke it with a minimal JSON payload so you can verify the request parsing path.
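The parsing branch is the part most worth a unit test. Here is a sketch that pulls the handler's body-normalization logic into a standalone helper (`extract_question` is an illustrative name, not part of the tutorial's module) so both event shapes can be checked without deploying anything:

```python
import json

def extract_question(event) -> str:
    # Mirrors lambda_handler's parsing: API Gateway wraps the payload in a
    # "body" field (usually a JSON string); direct invocations pass it raw.
    if isinstance(event, dict) and "body" in event:
        body = event["body"]
        payload = json.loads(body) if isinstance(body, str) else body
    else:
        payload = event if isinstance(event, dict) else {}
    return (payload.get("question") or "").strip()

direct_event = {"question": "What is Haystack?"}
gateway_event = {"body": json.dumps({"question": "What is Haystack?"})}

# Both shapes should normalize to the same question string.
assert extract_question(direct_event) == extract_question(gateway_event) == "What is Haystack?"
```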

Check CloudWatch logs (for example with `aws logs tail /aws/lambda/haystack-answerer --follow`) for cold-start duration and any import errors; those are usually packaging issues, not Haystack issues. If the model call fails, confirm the OPENAI_API_KEY environment variable is present in the Lambda configuration and that your function has outbound internet access (a Lambda in a VPC needs a NAT gateway or equivalent to reach the OpenAI API).

Next Steps

  • Move from InMemoryDocumentStore to a real vector store like OpenSearch or Pinecone for persistent retrieval.
  • Cache static resources outside the handler to reduce cold starts further.
  • Add structured logging around query latency, retrieval hits, and LLM token usage.
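For the structured-logging bullet, one common pattern is emitting a single JSON line per query so CloudWatch Logs Insights can parse and aggregate it. A sketch, with illustrative field names:

```python
import json
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def timed_answer(question: str, answer_fn) -> str:
    # Wraps any answer function and logs latency as one JSON line, which
    # CloudWatch Logs Insights can query with `parse` / `stats` expressions.
    start = time.perf_counter()
    answer = answer_fn(question)
    logger.info(json.dumps({
        "event": "query_answered",
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        "question_chars": len(question),
        "answer_chars": len(answer),
    }))
    return answer
```

In the handler you would then call `timed_answer(question, answer_question)` instead of calling `answer_question(question)` directly.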

By Cyprian Aarons, AI Consultant at Topiax.