How to Integrate Azure OpenAI with Azure Cosmos DB for Banking RAG
Azure OpenAI plus Cosmos DB is a practical stack for bank-grade RAG systems. You use Azure OpenAI to generate grounded answers, and Cosmos DB to store and retrieve policy docs, product terms, KYC rules, and support knowledge with low-latency vector search.
For banking, this matters because the assistant needs controlled retrieval, auditability, and data residency options. You are not just “chatting with documents”; you are building an agent that can answer customer and ops questions from approved internal sources.
Prerequisites
- An Azure subscription with:
  - An Azure OpenAI resource
  - An Azure Cosmos DB for NoSQL account
- Deployed Azure OpenAI models:
  - A chat model such as `gpt-4o` or `gpt-4.1`
  - An embedding model such as `text-embedding-3-large` or `text-embedding-3-small`
- A Cosmos DB database and container with vector search enabled
- Python 3.10+
- Installed packages:
  - `openai`
  - `azure-cosmos`
  - `python-dotenv`
- Environment variables set:
  - `AZURE_OPENAI_ENDPOINT`
  - `AZURE_OPENAI_API_KEY`
  - `AZURE_OPENAI_CHAT_DEPLOYMENT`
  - `AZURE_OPENAI_EMBEDDING_DEPLOYMENT`
  - `COSMOS_ENDPOINT`
  - `COSMOS_KEY`
  - `COSMOS_DB_NAME`
  - `COSMOS_CONTAINER_NAME`
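For local development, a `.env` file with these variables might look like the sketch below. All values here are placeholders, not real endpoints or credentials; substitute your own resource names and deployment names.

```
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=<your-key>
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-4o
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-3-large
COSMOS_ENDPOINT=https://your-account.documents.azure.com:443/
COSMOS_KEY=<your-key>
COSMOS_DB_NAME=banking-rag
COSMOS_CONTAINER_NAME=policies
```

Never commit this file; for banking workloads, production secrets belong in Key Vault, not on disk.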
Integration Steps
Step 1: Set up your clients and configuration.

Keep your connection details out of code. For banking systems, use Key Vault in production; `.env` is fine for local development.

```python
import os

from dotenv import load_dotenv
from openai import AzureOpenAI
from azure.cosmos import CosmosClient

load_dotenv()

azure_openai_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-06-01"
)

cosmos_client = CosmosClient(
    url=os.environ["COSMOS_ENDPOINT"],
    credential=os.environ["COSMOS_KEY"]
)

db = cosmos_client.get_database_client(os.environ["COSMOS_DB_NAME"])
container = db.get_container_client(os.environ["COSMOS_CONTAINER_NAME"])
```
Step 2: Create embeddings for your banking documents.

Use the embedding deployment to convert policy text into vectors. In RAG, this is the indexable representation you store in Cosmos DB.

```python
def embed_text(text: str) -> list[float]:
    response = azure_openai_client.embeddings.create(
        model=os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"],
        input=text
    )
    return response.data[0].embedding

sample_doc = {
    "id": "bank-policy-001",
    "doc_type": "credit_card_policy",
    "title": "Credit Card Fee Policy",
    "content": "Annual fees are waived for premium customers with monthly deposits above threshold.",
}
sample_doc["embedding"] = embed_text(sample_doc["content"])
print(len(sample_doc["embedding"]))
```
Step 3: Store documents in Cosmos DB with their vectors.

Your container should be configured with a vector policy and indexing policy that matches your embedding dimensions. The exact setup depends on your Cosmos DB API version, but the application-side write is straightforward.

```python
container.upsert_item({
    "id": sample_doc["id"],
    "doc_type": sample_doc["doc_type"],
    "title": sample_doc["title"],
    "content": sample_doc["content"],
    "embedding": sample_doc["embedding"]
})
```
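If you create the container in code, recent versions of the `azure-cosmos` SDK accept a vector embedding policy and an indexing policy at container creation time. The sketch below shows only the policy shapes; the `/embedding` path, the `quantizedFlat` index type, and the dimension value (3072, matching `text-embedding-3-large`) are assumptions that must match your own documents and model.

```python
# Vector embedding policy: tells Cosmos DB which property holds vectors,
# their element type, dimensionality, and the distance function to use.
# 3072 matches text-embedding-3-large; use 1536 for text-embedding-3-small.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 3072,
        }
    ]
}

# Indexing policy: add a vector index so VectorDistance queries are efficient,
# and exclude the raw vector from the default range index to save RU on writes.
indexing_policy = {
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/embedding/*"}],
    "vectorIndexes": [{"path": "/embedding", "type": "quantizedFlat"}],
}

# These dicts would be passed to db.create_container_if_not_exists(...)
# alongside the container id and a PartitionKey (e.g. path="/doc_type").
print(vector_embedding_policy["vectorEmbeddings"][0]["dimensions"])
```

If the dimensions here disagree with what your embedding deployment actually returns, writes may succeed but retrieval quality will silently degrade, so verify this pairing first.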
Step 4: Retrieve top matches using vector search.

Query Cosmos DB with the user question embedding. For RAG, you only send the top retrieved chunks to Azure OpenAI, not the whole corpus.

```python
def retrieve_context(query: str, top_k: int = 3) -> list[dict]:
    query_embedding = embed_text(query)
    query_text = """
        SELECT TOP @top_k c.id, c.title, c.content
        FROM c
        ORDER BY VectorDistance(c.embedding, @query_embedding)
    """
    parameters = [
        {"name": "@top_k", "value": top_k},
        {"name": "@query_embedding", "value": query_embedding},
    ]
    return list(container.query_items(
        query=query_text,
        parameters=parameters,
        enable_cross_partition_query=True
    ))

hits = retrieve_context("What are the annual fee waiver rules?")
for hit in hits:
    print(hit["title"], hit["content"])
```
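With a cosine distance function configured, `VectorDistance` ranks documents roughly like the self-contained sketch below: lower distance means more similar, and the `ORDER BY` ranks ascending. This is for intuition only, not the service's implementation, and the toy 2-dimensional vectors are illustrative.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Cosine distance = 1 - cosine similarity; 0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
doc_close = [0.9, 0.1]   # nearly the same direction as the query
doc_far = [0.0, 1.0]     # orthogonal to the query

# Ranking ascending by distance puts the semantically closer document first,
# which is what the ORDER BY VectorDistance(...) clause does server-side.
ranked = sorted([("close", doc_close), ("far", doc_far)],
                key=lambda item: cosine_distance(query, item[1]))
print([name for name, _ in ranked])  # ['close', 'far']
```

Doing this ranking inside Cosmos DB, rather than pulling vectors to the application, is what keeps retrieval latency low at corpus scale.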
Step 5: Generate a grounded answer with Azure OpenAI.

Pass retrieved context into the chat completion call. In banking, keep the prompt strict: answer only from retrieved content and say when data is insufficient.

```python
def answer_question(question: str) -> str:
    docs = retrieve_context(question)
    context_block = "\n\n".join(
        f"[{i+1}] {doc['title']}: {doc['content']}"
        for i, doc in enumerate(docs)
    )
    messages = [
        {
            "role": "system",
            "content": (
                "You are a banking assistant. Answer only using the provided context. "
                "If the answer is not in the context, say you do not have enough information."
            )
        },
        {
            "role": "user",
            "content": f"Context:\n{context_block}\n\nQuestion: {question}"
        }
    ]
    response = azure_openai_client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
        messages=messages,
        temperature=0.2,
        max_tokens=300
    )
    return response.choices[0].message.content

print(answer_question("When is the annual fee waived?"))
```
Testing the Integration
Run a simple end-to-end test with one known document and one question that should match it.
```python
def test_rag_flow():
    question = "Who gets an annual fee waiver?"
    answer = answer_question(question)
    print("QUESTION:", question)
    print("ANSWER:", answer)

test_rag_flow()
```
Expected output (the model's exact wording may vary):

```
QUESTION: Who gets an annual fee waiver?
ANSWER: Annual fees are waived for premium customers with monthly deposits above threshold.
```
If retrieval works but generation does not, check:
- The embedding deployment name matches what you deployed in Azure OpenAI
- The Cosmos DB container has vector indexing enabled
- The prompt includes only retrieved context
- The chat deployment name is correct
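A common failure mode behind the first two items is a dimension mismatch between the embedding model and the container's vector policy. A quick local sanity check is sketched below; the expected sizes are the published default dimensions for these OpenAI embedding models, so adjust if you pass a custom `dimensions` parameter at embedding time.

```python
# Published default output dimensions for common OpenAI embedding models.
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def check_embedding_dims(model_name: str, embedding: list[float]) -> bool:
    """Return True if the vector length matches the model's expected size."""
    expected = EXPECTED_DIMS.get(model_name)
    return expected is not None and len(embedding) == expected

# Hypothetical vector with the right length for text-embedding-3-small.
fake_vector = [0.0] * 1536
print(check_embedding_dims("text-embedding-3-small", fake_vector))  # True
print(check_embedding_dims("text-embedding-3-large", fake_vector))  # False
```

Run a check like this against one real embedding from your deployment before bulk-loading documents; a mismatch caught here saves reindexing the whole corpus later.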
Real-World Use Cases
- Customer service assistant for card fees, transfer limits, dispute timelines, and loan eligibility.
- Internal ops copilot for compliance teams querying policy manuals, AML procedures, and escalation rules.
- Branch staff assistant that answers product questions from approved knowledge bases without exposing raw backend systems.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.