What Is Top-p Sampling in AI Agents? A Guide for Compliance Officers in Banking
Top-p sampling is a method AI agents use to choose the next word by considering only the smallest set of likely options whose combined probability reaches a chosen threshold, called p. In practice, it lets the model stay creative without opening the door to low-probability, high-risk outputs.
How It Works
When an AI agent generates text, it assigns a probability to each possible next token. Top-p sampling, also called nucleus sampling, sorts those tokens from most likely to least likely and keeps adding them until their total probability reaches the threshold p, such as 0.9 or 0.95.
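The mechanics above can be sketched in a few lines of Python. The vocabulary and probabilities are a toy example, not output from any real model:

```python
import random

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches the threshold p, then renormalize (nucleus sampling)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 before sampling.
    total = sum(prob for _, prob in nucleus)
    return [(token, prob / total) for token, prob in nucleus]

def sample_top_p(probs, p=0.9, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = [token for token, _ in nucleus]
    weights = [weight for _, weight in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Note that the filter is applied first and sampling happens second: anything outside the nucleus has zero chance of being chosen, no matter how the draw goes.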
Think of it like a bank’s fraud review queue.
- You do not review every transaction in the system.
- You focus on the subset that accounts for most of the risk signal.
- You stop once you have enough coverage to make a decision efficiently.
Top-p works the same way. Instead of forcing the model to always pick the single most likely word, it samples from a controlled “risk pool” of plausible next tokens.
If p = 0.9, the model might keep:
- “approved”
- “declined”
- “pending”
- “requires manual review”
But it would exclude oddball tokens that are technically possible but unlikely, such as a random unrelated phrase or an incoherent continuation.
That makes top-p different from greedy decoding:
| Method | Behavior | Output Risk |
|---|---|---|
| Greedy decoding | Always picks the highest-probability token | Low variability, can be repetitive |
| Top-k sampling | Picks from the top k tokens only | More flexible, but k is fixed |
| Top-p sampling | Picks from tokens whose cumulative probability reaches p | Adaptive and usually more stable |
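The "adaptive" point in the table can be made concrete: the size of the top-p pool changes with the shape of the distribution, while a top-k pool is always the same size. Both distributions below are hypothetical:

```python
def nucleus_size(probs, p):
    """How many tokens a top-p filter keeps before cumulative
    probability reaches p."""
    cumulative, kept = 0.0, 0
    for prob in sorted(probs.values(), reverse=True):
        cumulative += prob
        kept += 1
        if cumulative >= p:
            break
    return kept

# A confident model vs. an uncertain one (illustrative numbers).
peaked = {"approved": 0.92, "declined": 0.05, "pending": 0.02, "other": 0.01}
flat = {"approved": 0.30, "declined": 0.28, "pending": 0.22, "other": 0.20}

print(nucleus_size(peaked, 0.9))  # → 1
print(nucleus_size(flat, 0.9))    # → 4
```

A fixed top-k of, say, 3 would keep three tokens in both cases; top-p narrows to one token when the model is confident and widens when it is not.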
For compliance teams, the important point is this: top-p does not make an AI agent deterministic. It introduces controlled randomness. That means two runs of the same prompt may produce slightly different wording, even if both are valid.
Why It Matters
Compliance officers should care because top-p affects how predictable an AI agent is in regulated workflows.
- **It changes output consistency**
  - Lower p values usually make responses more conservative and repeatable.
  - Higher p values increase variation, which can be useful for drafting but risky for customer-facing decisions.
- **It influences hallucination risk**
  - A wider token pool can allow less likely phrasing or unsupported claims.
  - In banking use cases, that matters when an agent summarizes policy terms, explains loan decisions, or drafts customer communications.
- **It affects auditability**
  - If an agent’s output varies run to run, you need stronger logging around prompts, parameters, and model versioning.
  - Without that, it becomes harder to explain why one response differed from another.
- **It should be tuned by use case**
  - A chat assistant for internal staff may tolerate a higher p.
  - A customer-facing compliance workflow should usually use tighter settings and stronger guardrails.
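The auditability point can be sketched as a minimal log record. The field names and hashing choice here are assumptions for illustration, not a regulatory schema:

```python
import hashlib
from datetime import datetime, timezone

def audit_record(prompt, sampling_params, model_version, output):
    """Capture what is needed to explain a run after the fact:
    what was asked, with which sampling settings, on which model."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "sampling_params": sampling_params,
        # Hashes let you prove which prompt/output occurred without
        # storing sensitive customer text in the log itself.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }

record = audit_record(
    prompt="Summarize dispute case 1234",
    sampling_params={"top_p": 0.8, "temperature": 0.2},
    model_version="dispute-agent-v3",  # hypothetical version label
    output="The provisional credit will remain in place...",
)
```

With records like this, two differing responses to the same prompt can at least be traced back to their exact parameters and model version.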
The practical takeaway: top-p is not just a model setting. It is part of your control environment.
Real Example
A retail bank deploys an AI agent to draft first-line responses for credit card dispute cases.
The agent must summarize:
- dispute reason
- transaction date
- provisional credit status
- next steps under policy
The bank configures the model with:
- `top_p = 0.8`
- low temperature
- approved response templates
- mandatory retrieval from policy documents
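One way to enforce the first two settings is to validate them against approved ranges before any call is made. The parameter names follow common LLM API conventions rather than a specific vendor's SDK, and the ceilings are assumptions for illustration, not regulatory limits:

```python
# Illustrative generation settings for the dispute-response agent.
GENERATION_CONFIG = {"top_p": 0.8, "temperature": 0.2, "max_tokens": 300}

def validate_config(cfg):
    """Reject sampling settings outside the (assumed) approved ranges,
    so no one quietly ships a riskier configuration."""
    if not 0.0 < cfg["top_p"] <= 0.9:
        raise ValueError("top_p outside approved range")
    if not 0.0 <= cfg["temperature"] <= 0.5:
        raise ValueError("temperature outside approved range")
    return cfg
```

Treating the check as code rather than convention means a change to `top_p` becomes a reviewable diff instead of a silent dial turn.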
Here is what happens in one case:
A customer disputes a hotel charge and asks whether provisional credit will be reversed if the merchant responds late. The model generates several possible next words after “The provisional credit will remain…”:
- “in place”
- “active”
- “temporarily applied”
- “reversed immediately”
- “reviewed by our team”
With top-p at 0.8, the model samples only from the most probable options like “in place” or “active.” It excludes unusual or risky continuations like “reversed immediately,” which could be misleading if stated without context.
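With hypothetical probabilities assigned to those candidates, the cutoff falls out directly (the numbers below are made up for illustration):

```python
# Hypothetical next-token probabilities after
# "The provisional credit will remain…"
candidates = {
    "in place": 0.46,
    "active": 0.27,
    "temporarily applied": 0.12,
    "reviewed by our team": 0.09,
    "reversed immediately": 0.06,
}

def nucleus(probs, p):
    """Tokens kept by a top-p filter at threshold p."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

print(nucleus(candidates, 0.8))
# → ['in place', 'active', 'temporarily applied']
```

The cumulative probability crosses 0.8 at the third token (0.46 + 0.27 + 0.12 = 0.85), so “reversed immediately” never enters the pool and can never be sampled, regardless of temperature.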
The resulting draft says:
> The provisional credit will remain in place while we complete our review under card network rules. If we receive supporting documentation from the merchant, we will assess whether the charge should stand or be reversed.
That is useful because it stays within policy language and avoids speculative phrasing. If top-p were set too high and guardrails were weak, the model might produce more varied wording that sounds confident but drifts from approved compliance language.
Related Concepts
- **Temperature**
  - Controls randomness directly.
  - Often used together with top-p; lower temperature makes outputs more conservative.
- **Top-k sampling**
  - Limits choices to a fixed number of candidates.
  - Simpler than top-p, but less adaptive across different contexts.
- **Greedy decoding**
  - Always selects the most likely token.
  - Good for consistency, bad for variety and sometimes bad for quality.
- **Prompt guardrails**
  - Rules that constrain what the model can say.
  - Important when outputs must stay within policy or regulatory boundaries.
- **Model determinism and reproducibility**
  - Whether repeated runs produce identical outputs.
  - Critical for audit trails, testing, and incident review in banking systems.
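On the reproducibility point: seeding the sampler is one way to make repeated runs identical for testing and incident review. This sketch assumes you control the sampling loop yourself; hosted model APIs may expose a seed parameter instead, or none at all:

```python
import random

def sample_top_p(probs, p, seed=None):
    """Top-p sample; a fixed seed makes repeated runs identical."""
    rng = random.Random(seed)
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]

# Illustrative distribution: same seed, same output across runs;
# without a seed, two runs may legitimately differ.
probs = {"in place": 0.5, "active": 0.3, "reviewed": 0.2}
```

Even without a seed, logging the parameters and model version (as discussed under auditability) keeps non-identical runs explainable.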
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.