How to Integrate Anthropic for Healthcare with Cloudflare Workers for RAG

By Cyprian Aarons · Updated 2026-04-21
anthropic-for-healthcare · cloudflare-workers · rag

Why this integration matters

If you’re building healthcare AI, the hard part is not calling a model. It’s controlling where patient-adjacent data flows, keeping retrieval close to the edge, and making sure your agent can answer from governed sources instead of hallucinating.

Anthropic handles the reasoning layer. Cloudflare Workers gives you a low-latency edge runtime for retrieval, policy checks, and request orchestration. Put them together and you get a RAG pipeline that can answer clinical ops questions, summarize internal policy docs, and route sensitive prompts without shipping everything back to a central app server.

Prerequisites

  • An Anthropic API key with access to the Claude models you plan to use
  • A Cloudflare account with:
    • Workers enabled
    • D1 or KV configured for document metadata
    • Vectorize enabled if you want semantic retrieval
  • Python 3.10+
  • pip installed
  • These packages:
    • anthropic
    • requests
    • cloudflare or direct HTTP access to Workers endpoints
  • A deployed Cloudflare Worker endpoint for retrieval
  • A document corpus for healthcare RAG:
    • clinical policies
    • benefits docs
    • internal SOPs
    • de-identified knowledge base content

Integration Steps

1) Install the SDKs and wire secrets

Start with the Python client for Anthropic and a simple HTTP client for your Worker. For production, keep secrets in environment variables and never hardcode them.

pip install anthropic requests
export ANTHROPIC_API_KEY="your-anthropic-key"
export WORKER_URL="https://your-worker.your-subdomain.workers.dev"

In Python, initialize both sides:

import os
import requests
from anthropic import Anthropic

anthropic_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
worker_url = os.environ["WORKER_URL"]

2) Build the Cloudflare Worker retrieval endpoint

Your Worker should accept a query, run vector search against your indexed healthcare documents, and return top chunks. This keeps retrieval close to the edge and lets you apply access control before any LLM call happens.

The Worker itself is usually written in JavaScript or TypeScript and deployed separately; from Python, you call it as a plain HTTP service. The client-side call looks like this:

import requests

def retrieve_context(query: str) -> dict:
    response = requests.post(
        f"{worker_url}/rag/search",
        json={"query": query, "top_k": 5},
        timeout=15,
    )
    response.raise_for_status()
    return response.json()

result = retrieve_context("What is our prior authorization policy for MRI?")
print(result)

Expected response shape:

{
  "chunks": [
    {"id": "doc_12", "text": "MRI prior authorization is required for ...", "score": 0.92},
    {"id": "doc_44", "text": "Urgent imaging exceptions apply when ...", "score": 0.88}
  ]
}
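
If you want to fail fast when that contract drifts, you can validate the shape on the client before building a prompt from it. A minimal sketch, assuming the response format shown above:

from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    text: str
    score: float

def parse_chunks(payload: dict) -> list[Chunk]:
    # Raise early if the Worker response is missing expected fields,
    # rather than letting a malformed chunk reach the prompt builder.
    return [
        Chunk(id=c["id"], text=c["text"], score=float(c["score"]))
        for c in payload.get("chunks", [])
    ]

chunks = parse_chunks(result)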

3) Pass retrieved context into Anthropic Messages API

Use Anthropic’s Messages API to answer only from retrieved context. Keep the prompt strict: if the context doesn’t contain the answer, the model should say so.

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def answer_with_rag(question: str) -> str:
    retrieval = retrieve_context(question)
    context_blocks = "\n\n".join(
        f"[{chunk['id']}] {chunk['text']}" for chunk in retrieval["chunks"]
    )

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=500,
        temperature=0,
        system=(
            "You are a healthcare support assistant. "
            "Answer only using the provided context. "
            "If the context is insufficient, say you do not have enough information."
        ),
        messages=[
            {
                "role": "user",
                "content": f"Context:\n{context_blocks}\n\nQuestion: {question}"
            }
        ],
    )
    return message.content[0].text

print(answer_with_rag("Do we need prior authorization for MRI?"))

This is the core pattern:

  • Cloudflare Workers handles retrieval and policy enforcement.
  • Anthropic handles grounded generation.
  • Your app stays thin and stateless.

4) Add healthcare-safe request filtering at the edge

Before a query reaches Claude, validate that it does not contain PHI you do not want processed. In practice, your Worker can reject unsafe inputs or redact them before forwarding.
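
The redaction itself belongs in the Worker, but a lightweight client-side pre-check is a cheap extra guard. The sketch below is illustrative only: a few regex patterns are nowhere near a real PHI detection service, so treat this as a tripwire, not a control.

import re

# Illustrative patterns only; real PHI detection needs much broader coverage
# (names, addresses, site-specific MRN formats, free-text dates, and so on).
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # SSN-like
    re.compile(r"\b\d{10}\b"),              # bare 10-digit numbers (phone/MRN-like)
    re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),   # date-of-birth-like
]

def redact_query(query: str) -> str:
    # Replace suspicious spans before the query leaves your process.
    for pattern in PHI_PATTERNS:
        query = pattern.sub("[REDACTED]", query)
    return query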

From Python, send a metadata envelope alongside the query so your Worker can make routing decisions.

def safe_answer(question: str, user_role: str) -> str:
    payload = {
        "query": question,
        "top_k": 5,
        "user_role": user_role,
        "source_system": "care_ops_agent"
    }

    response = requests.post(
        f"{worker_url}/rag/search",
        json=payload,
        timeout=15,
    )
    response.raise_for_status()
    retrieval = response.json()

    context = "\n\n".join(chunk["text"] for chunk in retrieval["chunks"])

    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=300,
        temperature=0,
        system="Use only retrieved policy text. Do not infer medical advice.",
        messages=[{"role": "user", "content": f"{context}\n\nQ: {question}"}],
    )
    return msg.content[0].text

This pattern matters in healthcare because role-based access often decides which documents can be retrieved at all.
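
One way to express that from the client is to map each role to the document collections it may search and include that in the payload. The collections field below is an assumed contract with your Worker rather than a built-in parameter, and the Worker must re-check the role and enforce the filter server side; a client-supplied hint is not access control on its own.

# Hypothetical role-to-collection mapping; adjust to your own document taxonomy.
ROLE_COLLECTIONS = {
    "care_ops": ["clinical_policies", "internal_sops"],
    "benefits_support": ["benefits_docs"],
}

def retrieval_payload(question: str, user_role: str) -> dict:
    # The "collections" filter is an assumed Worker contract; the Worker
    # must re-validate the role and apply the filter before vector search.
    return {
        "query": question,
        "top_k": 5,
        "user_role": user_role,
        "collections": ROLE_COLLECTIONS.get(user_role, []),
    }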

5) Return structured answers for downstream systems

Don’t stop at plain text if this is going into an agent workflow. Ask Claude for structured output so your orchestration layer can route it to ticketing, CRM, or review queues.

import json

def answer_as_json(question: str) -> dict:
    retrieval = retrieve_context(question)
    context = "\n\n".join(chunk["text"] for chunk in retrieval["chunks"])

    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=400,
        temperature=0,
        system="Return valid JSON with keys: answer, citations, confidence.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }],
    )

    return json.loads(msg.content[0].text)
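
json.loads will raise if the model wraps its JSON in a code fence or adds a preamble, which can happen even with a strict system prompt. A small best-effort cleanup before parsing, as a sketch:

import json

def parse_model_json(raw: str) -> dict:
    # Strip an optional leading ```json fence and trailing ``` before parsing.
    text = raw.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    return json.loads(text.strip())

For stricter guarantees, also validate that the answer, citations, and confidence keys are present, and route parse failures to a review queue rather than retrying blindly.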

A clean JSON contract makes it easier to plug this into:

  • prior auth workflows
  • nurse support copilots
  • claims triage agents
  • compliance review pipelines

Testing the Integration

Run a simple end-to-end test from Python. This verifies that your Worker returns context and Anthropic turns it into an answer.

test_question = "What documentation is required before approving an MRI?"
answer = safe_answer(test_question, user_role="care_ops")

print("QUESTION:", test_question)
print("ANSWER:", answer)

Expected output:

QUESTION: What documentation is required before approving an MRI?
ANSWER: Prior authorization requires clinical notes showing medical necessity...

If you want to validate grounding more aggressively, inspect whether citations from retrieved chunks are present in the returned JSON or final text.
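
A simple check is to look for the retrieved chunk IDs in the model output. This assumes you keep the bracketed [doc_id] labels from step 3 and instruct the model to cite them in its answer; the system prompts above would need that instruction added.

def check_grounding(answer: str, retrieval: dict) -> dict:
    # Report which retrieved chunk IDs the answer actually cites.
    retrieved_ids = {chunk["id"] for chunk in retrieval["chunks"]}
    cited = {cid for cid in retrieved_ids if f"[{cid}]" in answer}
    return {
        "cited": sorted(cited),
        "uncited": sorted(retrieved_ids - cited),
        "grounded": bool(cited),
    }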

Real-World Use Cases

  • Prior authorization assistant

    • Retrieve payer rules from Cloudflare Workers-backed indexed docs.
    • Use Anthropic to draft approval guidance grounded in those rules.
  • Clinical operations copilot

    • Answer staff questions about scheduling rules, referral workflows, and escalation paths.
    • Keep sensitive policy content at the edge instead of centralizing every lookup.
  • Claims and benefits support agent

    • Pull plan-specific coverage language from vector search.
    • Generate consistent responses with citations for auditability.

Production Notes That Matter

Keep retrieval and generation separate. Workers should decide what content can be seen; Anthropic should only reason over what was allowed through.

For healthcare systems, also track the following (a minimal audit-record sketch follows the list):

  • document versioning
  • access logs per query
  • citation IDs returned by retrieval
  • prompt/response retention policy
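
A minimal per-query audit record covering the items above might look like the following sketch. The field names are illustrative, and where you persist it (D1, a SIEM, a warehouse) depends on your retention policy.

import hashlib
import time

def audit_record(question: str, user_role: str, retrieval: dict, model: str) -> dict:
    # Hash the raw query if your retention policy forbids storing it verbatim.
    return {
        "ts": time.time(),
        "user_role": user_role,
        "query_sha256": hashlib.sha256(question.encode("utf-8")).hexdigest(),
        "chunk_ids": [chunk["id"] for chunk in retrieval["chunks"]],
        "model": model,
    }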

That’s what turns a demo into something you can actually ship inside a regulated workflow.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
