How to Integrate FastAPI for Payments with LangChain for RAG
Combining FastAPI for payments with LangChain for RAG gives you a clean pattern for monetized AI workflows: users pay for access, then the agent retrieves grounded answers from your private knowledge base. That’s the right shape for billing-sensitive systems like legal assistants, insurance quote bots, and internal copilots where every query may need authorization before retrieval.
Prerequisites
- Python 3.10+
- FastAPI installed
- Uvicorn installed
- LangChain installed
- A payment provider SDK or API access behind your FastAPI service
- A vector store or document source for RAG
- OpenAI API key or another LLM provider supported by LangChain
- `pydantic`, `httpx`, and `python-dotenv`
Install the core packages:
```bash
pip install fastapi uvicorn langchain langchain-openai langchain-community pydantic httpx python-dotenv
```
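The examples below assume your OpenAI key is available as an environment variable, which is what `langchain-openai` reads by default. A minimal way to wire that up with `python-dotenv`, assuming a `.env` file next to `main.py` containing `OPENAI_API_KEY=...`:

```python
# main.py: load secrets before constructing any LLM clients.
from dotenv import load_dotenv

load_dotenv()  # copies OPENAI_API_KEY (and anything else in .env) into os.environ
```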
Integration Steps
- Set up a FastAPI payment endpoint that returns an entitlement token after successful payment.
Use FastAPI to receive the payment event from your billing layer. In production, this is usually a webhook from Stripe, Adyen, or another processor. The important part is that your app issues a signed entitlement after payment succeeds.
```python
from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel

app = FastAPI()

class PaymentRequest(BaseModel):
    user_id: str
    amount_cents: int
    currency: str = "usd"

@app.post("/payments/confirm")
async def confirm_payment(payload: PaymentRequest):
    # Replace this with your actual payment provider verification.
    if payload.amount_cents < 100:
        raise HTTPException(status_code=400, detail="Minimum payment not met")
    entitlement_token = f"entitlement:{payload.user_id}:{payload.amount_cents}"
    return {"status": "paid", "entitlement_token": entitlement_token}
```
- Build your LangChain RAG pipeline around a retriever and chat model.
This example uses FAISS as a simple in-memory vector store. Swap it for Pinecone, pgvector, or Elasticsearch depending on your stack.
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain.chains import RetrievalQA

docs = [
    Document(page_content="Claims are processed within 5 business days."),
    Document(page_content="Refunds require verified account ownership."),
]

splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
```
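If you rebuild the index on every restart, you re-embed the same documents each time. One option with the FAISS wrapper shown above is to persist the index to disk. This sketch assumes a local `faiss_index/` directory; the `allow_dangerous_deserialization` flag is required on recent `langchain-community` versions because loading unpickles the stored docstore.

```python
# Persist the index after building it once.
vectorstore.save_local("faiss_index")

# On later runs, reload it instead of re-embedding the documents.
vectorstore = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
```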
- Gate retrieval behind the entitlement token issued by FastAPI.
Your agent should refuse to answer unless the user has paid. This is where you connect the payment state to the RAG flow. In practice, validate the token against your database or JWT verifier.
```python
from fastapi import Depends

def verify_entitlement(x_entitlement_token: str | None = Header(default=None)):
    if not x_entitlement_token or not x_entitlement_token.startswith("entitlement:"):
        raise HTTPException(status_code=403, detail="Payment required")
    return x_entitlement_token

class QueryRequest(BaseModel):
    question: str

@app.post("/agent/query")
async def query_agent(payload: QueryRequest, token: str = Depends(verify_entitlement)):
    result = qa_chain.invoke({"query": payload.question})
    return {
        "entitlement_token": token,
        "answer": result["result"],
    }
```
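If you issue JWTs as sketched earlier, the dependency can verify the signature and expiry instead of checking a string prefix. Again a sketch, assuming PyJWT and the same hypothetical `SECRET_KEY`:

```python
import jwt

def verify_entitlement_jwt(x_entitlement_token: str | None = Header(default=None)) -> dict:
    if not x_entitlement_token:
        raise HTTPException(status_code=403, detail="Payment required")
    try:
        # Raises if the signature is invalid or the token has expired.
        return jwt.decode(x_entitlement_token, SECRET_KEY, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=403, detail="Invalid or expired entitlement")
```

Swap `Depends(verify_entitlement)` for `Depends(verify_entitlement_jwt)` on the endpoint to use it.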
- Add a single orchestration endpoint that handles payment first, then RAG answer generation.
This pattern is useful when you want one API call from the client side. The client pays, receives entitlement, then immediately queries the agent using that token.
```python
import httpx

@app.post("/agent/pay-and-query")
async def pay_and_query(payload: QueryRequest):
    async with httpx.AsyncClient() as client:
        pay_resp = await client.post(
            "http://localhost:8000/payments/confirm",
            json={"user_id": "user_123", "amount_cents": 5000, "currency": "usd"},
        )
        pay_resp.raise_for_status()
        entitlement_token = pay_resp.json()["entitlement_token"]

        rag_resp = await client.post(
            "http://localhost:8000/agent/query",
            json={"question": payload.question},
            headers={"X-Entitlement-Token": entitlement_token},
        )
        rag_resp.raise_for_status()
        return rag_resp.json()
```
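The HTTP round trip to your own service works, but it hard-codes the base URL and adds a loopback hop. A variant worth considering, sketched here against the same in-memory setup, is to reuse the handlers in-process:

```python
@app.post("/agent/pay-and-query-inprocess")
async def pay_and_query_inprocess(payload: QueryRequest):
    # Confirm payment by calling the handler directly; the amount here is a placeholder.
    payment = await confirm_payment(
        PaymentRequest(user_id="user_123", amount_cents=5000)
    )
    # Reuse the same entitlement check, then run the RAG chain.
    token = verify_entitlement(payment["entitlement_token"])
    result = qa_chain.invoke({"query": payload.question})
    return {"entitlement_token": token, "answer": result["result"]}
```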
- Keep payment verification and retrieval separate in production.
Do not embed billing logic inside your LangChain chain. Keep it at the API boundary so you can audit access, revoke entitlements, and trace usage per customer.
| Concern | FastAPI layer | LangChain layer |
|---|---|---|
| Payment verification | Webhook/API auth | Not handled |
| Entitlement checks | Header/JWT/db lookup | Not handled |
| Retrieval and synthesis | Not handled | RetrievalQA.invoke() |
| Audit logging | Request logs + billing records | Chain traces |
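For the audit-logging row above, one lightweight starting point is an HTTP middleware that records which entitlement was used on which path. This is a sketch using standard-library logging; in production you would ship these events to your billing or observability system instead.

```python
import logging

from fastapi import Request

audit_logger = logging.getLogger("entitlement_audit")

@app.middleware("http")
async def audit_entitlement_usage(request: Request, call_next):
    response = await call_next(request)
    # Record path, status, and the entitlement header (if any) for each request.
    audit_logger.info(
        "path=%s status=%s entitlement=%s",
        request.url.path,
        response.status_code,
        request.headers.get("X-Entitlement-Token", "none"),
    )
    return response
```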
Testing the Integration
Run the app:
```bash
uvicorn main:app --reload
```
Then test both steps end-to-end with httpx (already installed above):

```python
import httpx

base_url = "http://127.0.0.1:8000"

# Confirm a payment to get the entitlement token.
payment = httpx.post(
    f"{base_url}/payments/confirm",
    json={"user_id": "user_123", "amount_cents": 5000},
)
token = payment.json()["entitlement_token"]

# Query the agent with the entitlement header set.
answer = httpx.post(
    f"{base_url}/agent/query",
    json={"question": "How long do claims take?"},
    headers={"X-Entitlement-Token": token},
)
print(answer.json())
```
Expected output (the answer wording may vary slightly depending on the model):

```json
{
  "entitlement_token": "entitlement:user_123:5000",
  "answer": "Claims are processed within 5 business days."
}
```
If you send the query without the `X-Entitlement-Token` header, you should get a 403 response:

```json
{
  "detail": "Payment required"
}
```
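To exercise that failure path from the same test script, send the query without the header:

```python
denied = httpx.post(f"{base_url}/agent/query", json={"question": "How long do claims take?"})
print(denied.status_code)  # 403
print(denied.json())       # {"detail": "Payment required"}
```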
Real-World Use Cases
- Paid support agents that answer from internal policy documents after subscription validation.
- Insurance assistant workflows where customers pay per report retrieval or per claims guidance session.
- Enterprise knowledge bots that only expose sensitive documents after department-level billing or quota checks.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.