How to Integrate FastAPI for retail banking with LangChain for RAG

By Cyprian Aarons · Updated 2026-04-21

Combining a FastAPI retail banking service with LangChain gives you a clean way to expose bank-grade APIs and wrap them with retrieval-augmented generation (RAG). The result is an assistant that can answer customer questions from policy docs, product PDFs, and account data without hardcoding every branch of logic.

For retail banking, this pattern is useful when you need deterministic API workflows on the backend and natural-language interaction on the front end. FastAPI handles the service layer; LangChain handles retrieval, prompt orchestration, and tool calling.

Prerequisites

  • Python 3.10+
  • A running FastAPI service for your banking domain
  • langchain, langchain-openai, langchain-community, fastapi, uvicorn, httpx
  • An OpenAI API key or another supported LLM provider configured in env vars
  • Banking documents indexed somewhere accessible:
    • PDFs
    • FAQ pages
    • product terms
    • internal policy docs
  • A local vector store for RAG, such as FAISS or Chroma

Install the packages:

pip install fastapi uvicorn httpx langchain langchain-openai langchain-community faiss-cpu pydantic

Integration Steps

  1. Build the FastAPI retail banking endpoint

Start with a small API that exposes customer-facing banking data. In production, this would sit behind auth, audit logging, and rate limits.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Retail Banking API")

class AccountRequest(BaseModel):
    customer_id: str

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/accounts/balance")
def get_balance(req: AccountRequest):
    mock_db = {
        "cust_001": {"balance": 1520.75, "currency": "USD"},
        "cust_002": {"balance": 9843.10, "currency": "USD"},
    }

    account = mock_db.get(req.customer_id)
    if not account:
        raise HTTPException(status_code=404, detail="Customer not found")

    return {
        "customer_id": req.customer_id,
        "balance": account["balance"],
        "currency": account["currency"],
    }

Run it:

uvicorn app:app --reload

  2. Index your banking knowledge base for RAG

LangChain needs retrievable documents: a document loader, an embedding model, and a vector store.

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

loader = TextLoader("banking_faq.txt", encoding="utf-8")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

This gives you semantic search over policies like overdraft rules, card replacement timelines, or wire transfer limits.
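The chunk_size and chunk_overlap values control how much text each chunk carries and how much adjacent chunks share. As a rough illustration of the sliding-window behavior (a simplified sketch, not the actual RecursiveCharacterTextSplitter, which prefers paragraph and sentence boundaries), a fixed character window looks like this:

```python
def window_chunks(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Naive fixed-window chunker: each chunk repeats the last `overlap`
    characters of the previous one, so a policy sentence that straddles
    a boundary still lands intact in at least one chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1200 characters with step 400 -> chunks starting at offsets 0, 400, 800
chunks = window_chunks("x" * 1200)
```

The recursive splitter adds one refinement on top of this idea: it tries to cut on separators (paragraphs, then sentences, then words) so chunks stay semantically coherent.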

  3. Wrap the FastAPI endpoint as a LangChain tool

This is the bridge. LangChain can call your banking API as a tool when the user asks for account-specific information.

import httpx
from langchain_core.tools import tool

FASTAPI_BASE_URL = "http://127.0.0.1:8000"

@tool
def get_account_balance(customer_id: str) -> str:
    """Fetch a customer's balance from the retail banking API."""
    response = httpx.post(
        f"{FASTAPI_BASE_URL}/accounts/balance",
        json={"customer_id": customer_id},
        timeout=10.0,
    )
    response.raise_for_status()
    data = response.json()
    return f"Customer {data['customer_id']} has {data['balance']} {data['currency']}"

Now your agent can decide whether to answer from retrieved docs or call the live banking system.
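In a full agent setup you would bind the tool to the model and let the LLM's tool calling make that decision. The decision itself, though, can be sketched as a plain router. This keyword heuristic is an illustrative assumption, not a LangChain API:

```python
# Keywords that indicate the user wants live account data, not policy text.
LIVE_DATA_KEYWORDS = {"balance", "transaction", "statement"}

def route_question(question: str) -> str:
    """Return 'api_tool' for questions that need live account data,
    'rag' for questions answerable from the indexed documents."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    return "api_tool" if words & LIVE_DATA_KEYWORDS else "rag"
```

In production this kind of heuristic is at most a cheap pre-filter; the model's own tool-calling handles the ambiguous cases.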

  4. Create a RAG chain with tool-aware prompting

Use LangChain to combine retrieval and tool use in one flow.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a retail banking assistant. Use retrieved context for policy questions and tools for live account data."),
    ("human", "{question}\n\nContext:\n{context}")
])

def answer_question(question: str):
    docs = retriever.invoke(question)
    context = "\n\n".join([doc.page_content for doc in docs])

    # Simple routing example: call tool when user asks about balance.
    if "balance" in question.lower():
        customer_id = question.split()[-1].strip("?.,!")  # replace with proper entity extraction in production
        return get_account_balance.invoke({"customer_id": customer_id})

    chain_input = {"question": question, "context": context}
    return (prompt | llm).invoke(chain_input).content

In production, replace the string split with structured extraction using function calling or an intent classifier.

  5. Expose a single orchestration endpoint

Your FastAPI app can serve both direct bank APIs and the AI assistant endpoint.

from fastapi import Body

@app.post("/assistant")
def assistant(question: str = Body(embed=True)):
    result = answer_question(question)
    return {"answer": result}

This is the pattern you want in real systems:

  • FastAPI owns transport, auth, validation, observability.
  • LangChain owns retrieval and model orchestration.
  • Your business logic stays behind typed endpoints.

Testing the Integration

Hit the assistant endpoint with a policy question and an account query.

import httpx

base_url = "http://127.0.0.1:8000"

policy_resp = httpx.post(f"{base_url}/assistant", json={"question": "What is your overdraft fee?"})
balance_resp = httpx.post(f"{base_url}/assistant", json={"question": "What is my balance cust_001?"})

print(policy_resp.json())
print(balance_resp.json())

Expected output:

{"answer":"Overdraft fees are charged when..."}
{"answer":"Customer cust_001 has 1520.75 USD"}

If the policy answer comes from your indexed docs and the balance answer comes from /accounts/balance, the integration is working correctly.

Real-World Use Cases

  • Customer self-service assistant

    • Answer FAQs from product docs.
    • Pull live balances or transaction summaries through FastAPI tools.
    • Reduce call center load without exposing raw backend systems.
  • Branch staff copilot

    • Retrieve internal procedures for KYC, disputes, card replacement, and loan servicing.
    • Call approved banking APIs for customer lookup or case status.
    • Keep responses grounded in policy and system state.
  • Collections or servicing agent

    • Retrieve repayment rules and hardship policies from documents.
    • Query account delinquency status through secure FastAPI endpoints.
    • Generate next-best-action suggestions with traceable sources.

The cleanest architecture here is simple: use FastAPI as your controlled banking interface and LangChain as the reasoning layer on top of it. That gives you RAG where it belongs — grounded in documents for knowledge, grounded in APIs for facts.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

