How to Integrate Azure OpenAI with Azure Cosmos DB for Retail Banking Multi-Agent Systems
Combining Azure OpenAI with Cosmos DB gives you a practical backbone for retail banking agent systems: one service handles language reasoning, the other stores durable state, customer context, and agent memory. That matters when you need multi-agent workflows for things like fraud triage, loan pre-screening, dispute handling, or customer support where every agent needs shared context and auditability.
Prerequisites
- An Azure subscription with:
  - An Azure OpenAI resource deployed
  - An Azure Cosmos DB account provisioned
- Python 3.10+
- Installed packages: `openai`, `azure-cosmos`, `python-dotenv`
- An Azure OpenAI deployment name for a chat model, such as `gpt-4o-mini`
- A Cosmos DB database and container ready for agent state storage
- Environment variables configured:
  - `AZURE_OPENAI_ENDPOINT`
  - `AZURE_OPENAI_API_KEY`
  - `AZURE_OPENAI_API_VERSION`
  - `COSMOS_ENDPOINT`
  - `COSMOS_KEY`
  - `COSMOS_DATABASE`
  - `COSMOS_CONTAINER`
Integration Steps
1) Install dependencies and load configuration
Keep config out of code. For banking systems, that’s non-negotiable.
```shell
pip install openai azure-cosmos python-dotenv
```

```python
import os
from dotenv import load_dotenv

load_dotenv()

AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION", "2024-06-01")
COSMOS_ENDPOINT = os.getenv("COSMOS_ENDPOINT")
COSMOS_KEY = os.getenv("COSMOS_KEY")
COSMOS_DATABASE = os.getenv("COSMOS_DATABASE", "banking_agents")
COSMOS_CONTAINER = os.getenv("COSMOS_CONTAINER", "agent_memory")
```
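A small fail-fast check (not part of the config above, just a defensive sketch) catches missing variables at startup instead of at the first Azure call:

```python
import os

# Illustrative subset of the variables this article uses; extend as needed.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "COSMOS_ENDPOINT",
    "COSMOS_KEY",
]

def check_required_env(env=os.environ) -> None:
    # Fail fast at startup rather than at the first API call.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
```

Call `check_required_env()` right after `load_dotenv()`. The database and container names are omitted here because the code above falls back to defaults for them.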
2) Create the Azure OpenAI client
Use the Azure OpenAI client to generate agent responses. In retail banking, this usually powers intent extraction, customer-facing explanations, or internal routing decisions.
```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=AZURE_OPENAI_API_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
)

DEPLOYMENT_NAME = "gpt-4o-mini"
```
A typical call uses `client.chat.completions.create(...)`. That is the method you want for multi-agent orchestration because it supports structured prompts and role separation.
```python
def classify_customer_request(message: str) -> str:
    response = client.chat.completions.create(
        model=DEPLOYMENT_NAME,  # the Azure deployment name, not the base model name
        messages=[
            {"role": "system", "content": "You are a retail banking triage agent."},
            {"role": "user", "content": message},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content.strip()
```
3) Connect to Cosmos DB and create a container for agent memory
Cosmos DB is where you store conversation state, routing decisions, customer profile fragments, and tool outputs. For multi-agent systems, this becomes the shared memory layer.
```python
from azure.cosmos import CosmosClient, PartitionKey

cosmos_client = CosmosClient(COSMOS_ENDPOINT, credential=COSMOS_KEY)
database = cosmos_client.create_database_if_not_exists(id=COSMOS_DATABASE)
container = database.create_container_if_not_exists(
    id=COSMOS_CONTAINER,
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=400,
)
```
Use a partition key that matches your access pattern. For retail banking agents, `customerId` is usually the right choice because most reads and writes are scoped to one customer session.
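A rough intuition for why this matters, sketched with a stand-in hash function (Cosmos DB's actual routing is internal to the service): every item sharing a `customerId` routes to the same partition, so a customer session never fans out across partitions.

```python
def partition_for(partition_key_value: str, partition_count: int = 4) -> int:
    # Illustrative stand-in for Cosmos DB's internal hash-based routing.
    return hash(partition_key_value) % partition_count

# All items in one customer session carry the same partition key value...
session_items = [{"customerId": "cust-10001", "seq": i} for i in range(5)]
buckets = {partition_for(item["customerId"]) for item in session_items}
assert len(buckets) == 1  # ...so they all land on a single partition
```

If your access pattern were instead "scan all open disputes across customers", a different key (or a change-feed consumer) would serve you better.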
4) Write agent state after each decision
Store both the model output and the raw request so downstream agents can reuse it without re-running inference.
```python
from datetime import datetime, timezone
import uuid

def save_agent_memory(customer_id: str, user_message: str, agent_result: str):
    item = {
        "id": str(uuid.uuid4()),
        "customerId": customer_id,  # partition key
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "userMessage": user_message,
        "agentResult": agent_result,
        "source": "azure-openai",
        "status": "processed",
    }
    container.upsert_item(item)
    return item
```
For multi-agent flows, this pattern lets one agent write its output and another agent consume it later. That’s how you avoid brittle in-memory orchestration.
5) Read memory back into the next agent prompt
This is where the integration becomes useful. Pull recent context from Cosmos DB and inject it into the next prompt so each agent sees prior decisions.
```python
def get_recent_memory(customer_id: str):
    query = """
        SELECT TOP 5 c.userMessage, c.agentResult, c.timestamp
        FROM c
        WHERE c.customerId = @customerId AND c.status = 'processed'
        ORDER BY c.timestamp DESC
    """
    items = list(container.query_items(
        query=query,
        parameters=[{"name": "@customerId", "value": customer_id}],
        partition_key=customer_id,  # single-partition query; no cross-partition fan-out
    ))
    return items[::-1]  # query returns newest first; reverse to chronological order
```

Note that the SDK does not infer the partition from the `WHERE` clause, so pass `partition_key` explicitly to keep the query single-partition.
Then use that memory in a second OpenAI call:
```python
def generate_next_agent_response(customer_id: str, new_message: str):
    memory = get_recent_memory(customer_id)
    memory_text = "\n".join(
        f"- User: {m['userMessage']} | Agent: {m['agentResult']}"
        for m in memory
    )
    response = client.chat.completions.create(
        model=DEPLOYMENT_NAME,
        messages=[
            {"role": "system", "content": "You are a banking operations agent using prior case history."},
            {"role": "user", "content": f"Conversation history:\n{memory_text}\n\nNew message: {new_message}"},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content.strip()
```
Testing the Integration
Run a simple end-to-end check: classify a banking request, persist it to Cosmos DB, then read it back and feed it into another OpenAI call.
```python
if __name__ == "__main__":
    customer_id = "cust-10001"
    message = "I noticed two card charges I don't recognize on my debit account."

    result = classify_customer_request(message)
    saved = save_agent_memory(customer_id, message, result)
    print("Saved item id:", saved["id"])
    print("Agent result:", result)

    memory = get_recent_memory(customer_id)
    print("Memory count:", len(memory))

    next_response = generate_next_agent_response(
        customer_id,
        "Please draft the next action for dispute handling.",
    )
    print("Next agent response:", next_response)
```
Expected output (the generated ID and exact model wording will vary):

```
Saved item id: 8f4c1a7d-1b2e-4f77-bb6c-d9f9d8f2d1a2
Agent result: This looks like a card dispute case. Route to fraud operations and verify transaction timestamps.
Memory count: 1
Next agent response: Create a dispute case, confirm transaction details with the customer, and escalate to fraud review.
```
Real-World Use Cases
- **Fraud triage agents**
  - One agent classifies suspicious activity.
  - Another checks account history stored in Cosmos DB.
  - A third drafts a compliant customer response.
- **Loan application assistants**
  - One agent extracts income and employment details.
  - Another stores application state and missing fields.
  - A final agent generates follow-up questions based on prior steps.
- **Customer service orchestration**
  - One agent handles intent detection.
  - Another retrieves prior cases from Cosmos DB.
  - A supervisor agent decides whether to escalate to human support.
This setup works because Azure OpenAI handles reasoning while Cosmos DB handles persistence. In retail banking multi-agent systems, that split is what keeps your architecture usable under real workload pressure.
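The write-then-read handoff at the heart of this setup can be sketched with no cloud dependencies: below, a plain dict stands in for the Cosmos container and a canned classifier stands in for the Azure OpenAI call (both are illustrative stand-ins, not the real services).

```python
# Illustrative sketch of the shared-memory handoff between two agents.
memory_store: dict[str, list[dict]] = {}  # partition key value -> stored items

def fake_classify(message: str) -> str:
    # Stand-in for classify_customer_request(); the real version calls Azure OpenAI.
    return "dispute" if "charge" in message.lower() else "general"

def write_memory(customer_id: str, message: str, result: str) -> None:
    # Stand-in for save_agent_memory(); the real version upserts to Cosmos DB.
    memory_store.setdefault(customer_id, []).append(
        {"userMessage": message, "agentResult": result}
    )

def read_memory(customer_id: str) -> list[dict]:
    # Stand-in for get_recent_memory(); the real version queries Cosmos DB.
    return memory_store.get(customer_id, [])

# Agent 1 writes its decision; agent 2 reads it later, with no shared in-process state.
msg = "Unrecognized card charge on my account"
write_memory("cust-10001", msg, fake_classify(msg))
history = read_memory("cust-10001")
assert history[0]["agentResult"] == "dispute"
```

The point of the sketch is the contract, not the storage engine: as long as every agent reads and writes through the same durable memory layer, agents can run in separate processes, on separate schedules, and still share context.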
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit