LlamaIndex Tutorial (Python): building prompt templates for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build reusable prompt templates in LlamaIndex with Python, then wire them into query and chat workflows that you can actually ship. You need this when the default prompts are too generic, your domain language matters, or you want consistent output shape across retrieval and tool-using agents.

What You'll Need

  • Python 3.10+
  • llama-index
  • An LLM provider package, such as:
    • llama-index-llms-openai
  • An embedding package if you plan to use retrieval:
    • llama-index-embeddings-openai
  • An OpenAI API key set in your environment:
    • export OPENAI_API_KEY="your-key"
  • A basic understanding of:
    • VectorStoreIndex
    • QueryEngine
    • PromptTemplate

Step-by-Step

  1. Start by installing the packages and wiring up the LLM objects explicitly. For advanced prompt work, avoid relying on defaults so you can control model behavior and template rendering.
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
import os

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY", "")

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
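Before moving on, it's worth a one-line smoke test that the client is wired correctly; the prompt text here is arbitrary:

# Sanity check: if this raises, fix credentials before touching templates.
print(Settings.llm.complete("Reply with OK").text)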
  2. Create a prompt template for the exact output contract you want. The key move here is to make the format strict enough that downstream code can parse it reliably.
from llama_index.core.prompts import PromptTemplate

analysis_prompt = PromptTemplate(
    """
You are a senior insurance analyst.

Context:
{context_str}

Question:
{query_str}

Return:
1. A direct answer
2. A risk note
3. A one-line recommendation

Use concise language.
"""
)

print(analysis_prompt.template)
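For templates with several variables, it often helps to pin some of them early. A minimal sketch using partial_format, which returns a new template with the given values baked in while the rest stay open:

# Pre-fill context_str now; query_str stays open until query time.
partial_prompt = analysis_prompt.partial_format(
    context_str="Policy A covers water damage but excludes flood damage."
)

print(partial_prompt.format(query_str="Is flood damage covered?"))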
  3. Build a small index and attach your custom template to the query engine. This is where LlamaIndex becomes useful: the template gets injected into retrieval-backed prompting without you hand-crafting every prompt string.
from llama_index.core import Document, VectorStoreIndex

docs = [
    Document(text="Policy A covers water damage but excludes flood damage."),
    Document(text="Policy B includes flood coverage with a $5,000 deductible."),
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(text_qa_template=analysis_prompt)

response = query_engine.query("Which policy is better for flood exposure?")
print(str(response))
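It's worth verifying the override actually landed. Query engines expose their active prompts through get_prompts; the QA template usually sits under the response_synthesizer:text_qa_template key, though the exact keys can vary with your configuration:

# Dump every prompt the engine currently holds, keyed by
# "<module>:<prompt_name>".
for key, prompt in query_engine.get_prompts().items():
    print(key)
    print(prompt.get_template()[:100])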
  4. If you need more control, use separate templates for system-style instructions and formatting constraints. In production, I usually split “how to think” from “how to answer” so I can swap one without breaking the other.
from llama_index.core.prompts import PromptTemplate

system_prompt = PromptTemplate(
    "You are a compliance assistant for banking operations."
)

# Literal braces must be doubled ({{ }}) so PromptTemplate's
# str.format-style rendering doesn't treat them as variables.
format_prompt = PromptTemplate(
    """
Answer in this JSON-like structure:

{{
  "answer": "...",
  "risk": "...",
  "next_step": "..."
}}
"""
)

# Compose from the raw template strings (.template) rather than calling
# format() first: the doubled braces must survive until the final render,
# or the composed template would choke on the JSON braces.
combined_text = (
    system_prompt.template
    + "\n\n"
    + format_prompt.template
    + "\n\nContext:\n{context_str}\n\nQuestion:\n{query_str}"
)

composed_prompt = PromptTemplate(combined_text)
print(composed_prompt.format(context_str="Sample context", query_str="Sample question"))
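The composed template is still an ordinary PromptTemplate, so it drops into the same query-engine slot used in step 3:

# Swap the composed prompt into a fresh engine over the same index.
composed_engine = index.as_query_engine(text_qa_template=composed_prompt)
print(composed_engine.query("Which policy covers flood damage?"))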
  5. Use chat-style templates when your workflow is conversational rather than retrieval-first. LlamaIndex supports message-based prompting, which is cleaner when you want role separation and multi-turn behavior.
from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a precise claims assistant."),
    ChatMessage(role=MessageRole.USER, content="Summarize this claim in two bullets."),
]

chat_response = Settings.llm.chat(messages)
print(chat_response.message.content)
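If you want variables inside the messages themselves, ChatPromptTemplate renders placeholders per message before the chat call. A minimal sketch; claim_text and the sample claim are illustrative:

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate(
    message_templates=[
        ChatMessage(role=MessageRole.SYSTEM, content="You are a precise claims assistant."),
        ChatMessage(
            role=MessageRole.USER,
            content="Summarize this claim in two bullets:\n{claim_text}",
        ),
    ]
)

# format_messages renders the variables and returns ChatMessage objects
# ready for llm.chat().
rendered = chat_template.format_messages(
    claim_text="Burst pipe in the basement; water damage to flooring."
)
print(Settings.llm.chat(rendered).message.content)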
  6. Validate your template before shipping it by rendering it with real values and checking for missing variables or weak instructions. If your prompt depends on structured output, test that the response shape stays stable across multiple inputs.
test_render = analysis_prompt.format(
    context_str="Policy B includes flood coverage with a $5,000 deductible.",
    query_str="What should I recommend?"
)

print(test_render)

for q in [
    "Is flood covered?",
    "What is the deductible?",
]:
    print("\n---")
    print(query_engine.query(q))

Testing It

Run the script end-to-end and confirm three things: the template renders without missing placeholders, the query engine returns answers grounded in your documents, and the output style stays consistent across repeated queries. If you changed temperature or model settings, rerun the same question twice and check whether formatting remains stable.

For stricter validation, add assertions around rendered prompt text and response structure. In production systems, I also log both the final rendered prompt and model output so prompt regressions are easy to trace.
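A concrete starting point: assert on the template's declared variables (template_vars holds the placeholders LlamaIndex parsed from the template string) and on the response shape. The numbered-section check is a cheap heuristic, not a guarantee:

# The template should declare exactly the variables the engine will fill.
assert set(analysis_prompt.template_vars) == {"context_str", "query_str"}, (
    analysis_prompt.template_vars
)

# Cheap shape check: the three numbered sections requested by the
# template should appear in each answer.
answer = str(query_engine.query("Is flood covered?"))
for marker in ("1.", "2.", "3."):
    assert marker in answer, f"missing section {marker!r}:\n{answer}"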

Next Steps

  • Learn how to override response_synthesizer prompts for more control over final answer generation (see the sketch after this list).
  • Add structured outputs with Pydantic models so your prompts return machine-readable data.
  • Build per-domain prompt libraries for claims, underwriting, fraud review, or KYC workflows.
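For the first item, here's a minimal sketch of overriding the synthesizer's QA prompt on an existing engine with update_prompts. The key follows LlamaIndex's "<module>:<prompt_name>" convention; print query_engine.get_prompts() to confirm the exact keys on your setup:

from llama_index.core.prompts import PromptTemplate

strict_qa_prompt = PromptTemplate(
    "Context:\n{context_str}\n\n"
    "Answer strictly from the context above. If the answer is not "
    "present, say so.\n\nQuestion: {query_str}\n"
)

# Swap the prompt in place on the already-built engine.
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": strict_qa_prompt}
)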

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
