How to Integrate OpenAI for Lending with Pinecone for AI Agents

By Cyprian Aarons · Updated 2026-04-21
Tags: openai-for-lending, pinecone, ai-agents

Combining OpenAI for lending with Pinecone gives you a practical pattern for building AI agents that can answer borrower questions, retrieve policy context, and reason over loan documents without stuffing everything into the prompt. OpenAI handles the language and decisioning layer, while Pinecone gives the agent fast semantic retrieval over underwriting guides, product terms, KYC notes, and historical case files.

Prerequisites

  • Python 3.10+
  • An OpenAI API key
  • A Pinecone API key and an existing index
  • Access to your lending content:
    • loan product docs
    • underwriting policies
    • FAQ knowledge base
    • customer support transcripts or case notes
  • Installed packages:
    • openai
    • pinecone
    • python-dotenv
pip install openai pinecone python-dotenv
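
The code below reads credentials from a local .env file. A minimal example with placeholder values (replace with your own keys):

# .env (placeholders, not real credentials)
OPENAI_API_KEY=your-openai-key
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX_NAME=lending-agent-index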

Integration Steps

  1. Set your environment variables and initialize both clients.
import os
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME", "lending-agent-index")

client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)
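
If the index does not exist yet, create it before calling pc.Index. A minimal sketch using a serverless index; the cloud and region here are assumptions, so match them to your Pinecone project. The dimension must be 1536 to line up with text-embedding-3-small:

from pinecone import ServerlessSpec

# Create the index once; 1536 matches text-embedding-3-small's output size.
if PINECONE_INDEX_NAME not in pc.list_indexes().names():
    pc.create_index(
        name=PINECONE_INDEX_NAME,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")  # assumed cloud/region
    )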
  2. Create embeddings for lending documents with OpenAI.

Use the embedding model to convert policy text into vectors before storing them in Pinecone.

documents = [
    {
        "id": "loan_policy_001",
        "text": "Personal loans above $25,000 require proof of income and two recent bank statements.",
        "metadata": {"type": "policy", "product": "personal_loan"}
    },
    {
        "id": "loan_policy_002",
        "text": "Applicants with a credit score below 620 require manual review by an underwriter.",
        "metadata": {"type": "policy", "product": "underwriting"}
    }
]

texts = [doc["text"] for doc in documents]

embedding_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

vectors = []
for doc, emb in zip(documents, embedding_response.data):
    vectors.append({
        "id": doc["id"],
        "values": emb.embedding,
        "metadata": {**doc["metadata"], "text": doc["text"]}
    })
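
The sample documents are single sentences, but real underwriting guides and product docs run to many pages, and embedding them whole hurts retrieval precision. A naive fixed-size chunker as a sketch; the window size and overlap are assumptions to tune for your content:

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Split long policy text into overlapping character windows so each
    # chunk stays small enough to embed and retrieve precisely.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks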
  3. Upsert the vectors into Pinecone.

This is where your agent’s long-term memory starts. Store both the vector and enough metadata to reconstruct the answer later.

index.upsert(vectors=vectors)

print(f"Upserted {len(vectors)} lending documents into Pinecone.")
  4. Retrieve relevant context at query time and feed it into OpenAI.

This is the core agent loop: embed the user question, query Pinecone, then pass retrieved context to the model.

user_question = "What documents are needed for a personal loan over $25,000?"

query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=user_question
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)

context_chunks = []
for match in results["matches"]:
    context_chunks.append(match["metadata"]["text"])

context = "\n".join(context_chunks)

response = client.responses.create(
    model="gpt-4.1-mini",
    input=f"""
You are a lending assistant.
Answer only using the provided context.

Context:
{context}

Question:
{user_question}
"""
)

print(response.output_text)
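
Because each vector carries metadata, you can also scope retrieval to a product or document type instead of searching the whole index. A sketch using Pinecone's metadata filters; the field names assume the metadata defined in step 2:

# Restrict retrieval to personal-loan policy documents.
results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    filter={"type": {"$eq": "policy"}, "product": {"$eq": "personal_loan"}}
)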
  5. Wrap retrieval + generation into a reusable agent function.

In production, you do not want retrieval logic scattered across handlers. Keep it in one function so your agent can be called from an API route, queue worker, or orchestration layer.

def answer_lending_question(question: str) -> str:
    q_emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    ).data[0].embedding

    matches = index.query(
        vector=q_emb,
        top_k=3,
        include_metadata=True
    )["matches"]

    context = "\n".join(m["metadata"]["text"] for m in matches)

    result = client.responses.create(
        model="gpt-4.1-mini",
        input=f"""
You are a lending operations assistant.
Use only this context to answer.

Context:
{context}

Question:
{question}
"""
    )

    return result.output_text


print(answer_lending_question("When does a borrower need manual underwriting review?"))
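
As an example of calling it from an API route, here is a minimal FastAPI sketch. FastAPI is not in the prerequisites, and the route path and request model are assumptions, so treat this as one possible wiring:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class LendingQuestion(BaseModel):
    question: str

@app.post("/lending/answer")
def lending_answer(payload: LendingQuestion) -> dict:
    # Delegate retrieval + generation to the shared agent function.
    return {"answer": answer_lending_question(payload.question)}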

Testing the Integration

Run a simple end-to-end check: with the policy snippets from earlier upserted, ask a question and confirm the response cites the right rule.

test_question = "What happens if the applicant's credit score is below 620?"
answer = answer_lending_question(test_question)

print("QUESTION:", test_question)
print("ANSWER:", answer)

Expected output:

QUESTION: What happens if the applicant's credit score is below 620?
ANSWER: Applicants with a credit score below 620 require manual review by an underwriter.

If you get an unrelated answer, check these first:

  • The embedding model used for indexing and querying is exactly the same; mixing models, even within the same family, produces incompatible vectors.
  • Your Pinecone index dimension matches text-embedding-3-small (1536 dimensions; see the check below).
  • You included include_metadata=True in the query.
  • Your prompt restricts the model to retrieved context only.
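
To rule out a dimension mismatch quickly, inspect the index directly. A short check using describe_index_stats; text-embedding-3-small produces 1536-dimensional vectors by default:

stats = index.describe_index_stats()

# The index dimension must match the embedding model's output size.
print("Index dimension:", stats["dimension"])          # expect 1536
print("Stored vectors:", stats["total_vector_count"])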

Real-World Use Cases

  • Loan policy assistant

    • Let internal ops teams ask questions like “What docs are needed for SME loans above $100k?” and get answers grounded in current policy docs.
  • Borrower support agent

    • Build a chat agent that explains eligibility rules, required paperwork, repayment terms, and escalation paths without hand-coded decision trees.
  • Underwriting copilot

    • Retrieve similar historical cases, compare them against current application data, and help underwriters triage applications faster.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

