How to Integrate Next.js with the Vercel AI SDK for RAG in Fintech
Next.js gives your fintech product the app surface: authenticated dashboards, secure workflows, and fast server-rendered UX. The Vercel AI SDK gives you the agent runtime on the server or edge, which is where retrieval-augmented generation (RAG) lives when you need grounded answers from policy docs, product manuals, or transaction metadata.
The useful pattern is simple: Next.js handles user context and request orchestration, while the AI SDK calls a RAG backend that retrieves from your fintech knowledge base and returns a controlled answer. That gives you chat assistants, support copilots, and internal ops tools that answer from real documents instead of hallucinating.
Prerequisites
- Node.js 18+ and Python 3.10+
- A Next.js app already set up for fintech workflows
- Vercel AI SDK installed in your Next.js project: npm install ai @ai-sdk/openai
- A Python RAG service with fastapi, uvicorn, openai, and chromadb or faiss-cpu
- An OpenAI API key or compatible model endpoint
- A vector store with indexed fintech documents: policy PDFs, FAQ pages, compliance playbooks, product terms
- Environment variables configured: OPENAI_API_KEY and RAG_API_URL
Integration Steps
1. Build the RAG service in Python.
Start with a small FastAPI service that embeds queries, retrieves top chunks, and returns grounded context. In production, swap the local Chroma store for your real index and keep the tenant filter on every query so one organization can never retrieve another's documents.
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI
import chromadb

app = FastAPI()
client = OpenAI()
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection(name="fintech_docs")

class QueryRequest(BaseModel):
    query: str
    tenant_id: str

@app.post("/rag/query")
def rag_query(req: QueryRequest):
    # Embed the query with the same model used at ingestion time, so query
    # and document vectors live in the same embedding space.
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=req.query,
    )
    results = collection.query(
        query_embeddings=[embedding.data[0].embedding],
        n_results=3,
        where={"tenant_id": req.tenant_id},  # hard tenant isolation at retrieval time
    )
    chunks = results["documents"][0] if results["documents"] else []
    context = "\n\n".join(chunks)
    return {
        "context": context,
        "sources": results["metadatas"][0] if results["metadatas"] else [],
    }
2. Sketch the server-side flow: retrieve first, then generate.
Whatever runs the route, the pattern is the same: fetch retrieved context first, then pass it into the model call as grounded system input. The Python sketch below shows that flow end to end; step 3 wires it into a Next.js route with the AI SDK's streamText so the response streams back to the UI.
import os
import requests
from openai import OpenAI

RAG_API_URL = os.environ["RAG_API_URL"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

def fetch_rag_context(query: str, tenant_id: str) -> dict:
    # Ask the RAG service for the top chunks, scoped to this tenant.
    resp = requests.post(
        f"{RAG_API_URL}/rag/query",
        json={"query": query, "tenant_id": tenant_id},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()

def build_messages(query: str, context: str):
    # Pin the model to the retrieved context so answers stay grounded.
    return [
        {
            "role": "system",
            "content": (
                "You are a fintech assistant. Answer only from the provided context. "
                "If the context is insufficient, say you don't know."
            ),
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion:\n{query}",
        },
    ]

def call_model(query: str, tenant_id: str):
    # Retrieve first, then generate: the core RAG ordering.
    rag = fetch_rag_context(query, tenant_id)
    messages = build_messages(query, rag["context"])
    client = OpenAI(api_key=OPENAI_API_KEY)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content
3. Wire this into a Next.js route using the Vercel AI SDK primitives.
In Next.js, use streamText from ai and a provider like @ai-sdk/openai. The Python snippet below shows the payload shape the route builds, system instructions plus the user question bundled with retrieved evidence, and a hedged TypeScript sketch of the route itself follows it.
import json
import requests

def nextjs_route_payload(user_query: str, tenant_id: str):
    # Mirror what the Next.js route does: retrieve evidence, then build
    # the message list handed to the model.
    rag_resp = requests.post(
        "http://localhost:8000/rag/query",
        json={"query": user_query, "tenant_id": tenant_id},
        timeout=15,
    )
    rag_resp.raise_for_status()
    rag_data = rag_resp.json()
    return {
        "messages": [
            {
                "role": "system",
                "content": "Use only retrieved evidence. Do not invent policy details.",
            },
            {
                "role": "user",
                "content": f"Question: {user_query}\n\nEvidence:\n{rag_data['context']}",
            },
        ]
    }

print(json.dumps(nextjs_route_payload("What is our chargeback SLA?", "tenant_123"), indent=2))
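For the route itself, here is a minimal TypeScript sketch rather than a drop-in implementation. It assumes an App Router project with the handler at app/api/chat/route.ts, AI SDK v4-era APIs (streamText and toDataStreamResponse; newer major versions rename some of these helpers), and that RAG_API_URL points at the FastAPI service from step 1.

// app/api/chat/route.ts (sketch; file path and SDK version as assumed above)
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages, tenantId } = await req.json();
  const userQuery = messages[messages.length - 1].content;

  // Retrieve grounded context from the Python RAG service first.
  const ragResp = await fetch(`${process.env.RAG_API_URL}/rag/query`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: userQuery, tenant_id: tenantId }),
  });
  if (!ragResp.ok) {
    return new Response('Retrieval failed', { status: 502 });
  }
  const { context } = await ragResp.json();

  // Stream a grounded answer back to the UI.
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system:
      'You are a fintech assistant. Answer only from the provided context. ' +
      "If the context is insufficient, say you don't know.",
    messages: [
      { role: 'user', content: `Context:\n${context}\n\nQuestion:\n${userQuery}` },
    ],
  });
  return result.toDataStreamResponse();
}

streamText starts emitting tokens as soon as the model does, so the retrieval round-trip to the Python service is the only blocking step before the first token reaches the browser.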
4. Add document ingestion so your fintech knowledge base stays current.
In production, ingestion should be separate from query time. Chunk documents by section headers (a sketch of header-based chunking follows the example below), attach metadata like tenant ID and document type, then embed and store everything before the app ever serves traffic.
from openai import OpenAI
import chromadb

client = OpenAI()
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection(name="fintech_docs")

def ingest_chunk(text: str, doc_id: str, tenant_id: str, source_url: str):
    # Embed with the same model the query path uses.
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    collection.add(
        ids=[doc_id],
        documents=[text],
        embeddings=[embedding.data[0].embedding],
        metadatas=[{
            "tenant_id": tenant_id,    # enables tenant filtering at query time
            "source_url": source_url,  # surfaces citations in answers
            "doc_type": "policy"
        }],
    )

ingest_chunk(
    text="Chargebacks must be reviewed within 5 business days.",
    doc_id="cb_001",
    tenant_id="tenant_123",
    source_url="https://docs.example.com/chargebacks"
)
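Chunking itself is not shown above. As a minimal sketch of header-based chunking, here is the idea in TypeScript (the same logic ports directly to a Python ingestion script); it assumes markdown-style # headings mark section boundaries:

// Split a document into sections at markdown-style headings.
function chunkByHeaders(doc: string): { heading: string; text: string }[] {
  const chunks: { heading: string; text: string }[] = [];
  let heading = 'untitled';
  let lines: string[] = [];

  const flush = () => {
    const text = lines.join('\n').trim();
    if (text) chunks.push({ heading, text }); // skip empty sections
    lines = [];
  };

  for (const line of doc.split('\n')) {
    if (/^#{1,6}\s/.test(line)) {
      flush(); // close the previous section before starting a new one
      heading = line.replace(/^#{1,6}\s*/, '');
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}

Each returned chunk keeps its section heading, which is worth storing as metadata alongside tenant_id so citations can point at a specific section rather than a whole document.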
5. Add guardrails before exposing answers to users.
Fintech systems need answer constraints. Filter by tenant ID, redact PII before indexing, and refuse to answer when no supporting chunk exists or retrieval confidence is low (a sketch of a confidence gate follows the example below).
def safe_answer(query: str, tenant_id: str):
    rag_resp = requests.post(
        f"{RAG_API_URL}/rag/query",
        json={"query": query, "tenant_id": tenant_id},
        timeout=15,
    )
    rag_resp.raise_for_status()
    data = rag_resp.json()
    # Refuse rather than guess when nothing relevant was retrieved.
    if not data["context"].strip():
        return {"answer": "I don't have enough evidence to answer that."}
    # Note: call_model re-runs retrieval internally; in production, thread
    # data["context"] through instead of fetching twice.
    return {
        "answer": call_model(query=query, tenant_id=tenant_id),
        "sources": data["sources"]
    }
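The check above only catches empty retrieval. If you also want a gate on retrieval confidence, here is a sketch of one in the route layer. It assumes you extend the RAG service to return a distances array (Chroma can include distances in query results); the threshold is illustrative, not a recommendation:

// Gate answers on how close the best retrieved chunk actually is.
type RagResult = { context: string; sources: unknown[]; distances?: number[] };

function hasSufficientEvidence(rag: RagResult, maxDistance = 0.5): boolean {
  if (!rag.context.trim()) return false; // nothing retrieved at all
  if (!rag.distances || rag.distances.length === 0) return true; // no signal: fall back to the non-empty check
  return Math.min(...rag.distances) <= maxDistance; // best chunk must be close enough
}

Tune the threshold against a labeled evaluation set for your own corpus rather than picking a number by feel.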
Testing the Integration
Run your Python RAG service:
uvicorn app:app --reload --port 8000
Then test it with a direct request:
import requests

resp = requests.post(
    "http://localhost:8000/rag/query",
    json={
        "query": "What is our chargeback SLA?",
        "tenant_id": "tenant_123"
    },
    timeout=15,
)
print(resp.status_code)
print(resp.json())
Expected output:
200
{'context': 'Chargebacks must be reviewed within 5 business days.', 'sources': [{'tenant_id': 'tenant_123', 'source_url': 'https://docs.example.com/chargebacks', 'doc_type': 'policy'}]}
If that passes, your Next.js route should be able to call the same endpoint through the Vercel AI SDK flow and stream a grounded response to the frontend.
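On the frontend, the AI SDK's React hook consumes that stream. Here is a minimal client sketch, assuming useChat from ai/react (moved to @ai-sdk/react in newer major versions) and the route from step 3; in a real app the tenant ID would come from the session, not a literal:

// app/components/SupportCopilot.tsx (sketch; component name is illustrative)
'use client';
import { useChat } from 'ai/react';

export default function SupportCopilot() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
    body: { tenantId: 'tenant_123' }, // extra body fields are forwarded to the route
  });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <p key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <input
        value={input}
        onChange={handleInputChange}
        placeholder="Ask about a policy"
      />
    </form>
  );
}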
Real-World Use Cases
- Customer support copilot: answer questions about fees, limits, disputes, KYC status rules, and card replacement policies using approved documentation only.
- Internal ops assistant: help analysts query runbooks for payment failures, settlement delays, AML escalation steps, and reconciliation procedures.
- Compliance Q&A layer: let risk teams ask natural-language questions over policy PDFs and get answers with source citations tied to each regulated document segment.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit