LlamaIndex Tutorial (Python): building prompt templates for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build reusable prompt templates in LlamaIndex with Python, then wire them into a query engine so your outputs stay consistent. You need this when raw prompting starts drifting across requests and you want a clean way to control tone, structure, and task-specific instructions.

What You'll Need

  • Python 3.10+
  • llama-index installed
  • An OpenAI API key set as OPENAI_API_KEY
  • A small local text file or sample documents to index
  • Basic familiarity with VectorStoreIndex, Document, and QueryEngine

Step-by-Step

  1. Start by installing the package and setting your API key. If you already have LlamaIndex basics down, this is just the foundation for using prompt templates on top of an index.
pip install llama-index
export OPENAI_API_KEY="your-api-key"
  2. Create a small index from sample documents. We’ll use a simple document set so you can focus on the prompt template mechanics instead of ingestion complexity.
from llama_index.core import VectorStoreIndex, Document

docs = [
    Document(text="LlamaIndex helps developers build LLM applications over private data."),
    Document(text="Prompt templates make outputs more predictable and easier to reuse."),
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
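Optionally, run a quick baseline query here with the default prompts, so you have output to compare against once the custom template is attached in the next step.

# Baseline answer using LlamaIndex's built-in prompts.
baseline = query_engine.query("What does LlamaIndex help developers build?")
print(baseline)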
  3. Define a custom prompt template for the response synthesis step. This is where you control the shape of the answer, including tone, format, and constraints.
from llama_index.core import PromptTemplate

custom_prompt = PromptTemplate(
    "You are a senior Python engineer.\n"
    "Answer the question using only the context below.\n"
    "Keep it concise and practical.\n\n"
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n"
    "Answer:"
)
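Before wiring the template in, you can sanity-check it by rendering it with plain keyword arguments; the context string below is just placeholder text for the preview.

# Render the template by hand to verify the final prompt string.
preview = custom_prompt.format(
    context_str="LlamaIndex helps developers build LLM applications.",
    query_str="What does LlamaIndex help developers build?",
)
print(preview)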
  4. Attach that template to the query engine. LlamaIndex lets you override built-in prompts without rewriting the whole retrieval pipeline.
query_engine.update_prompts(
    {
        "response_synthesizer:text_qa_template": custom_prompt,
    }
)

response = query_engine.query("What does LlamaIndex help developers build?")
print(response)
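If you're not sure which prompt keys an engine exposes, you can list them with get_prompts(); the text_qa_template key used above should appear in the output.

# Inspect which prompts this engine exposes for overriding.
for key in query_engine.get_prompts():
    print(key)  # e.g. response_synthesizer:text_qa_template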
  5. Build a second template for more structured output. In practice, this is useful when you want consistent formatting for downstream parsing or human review.
structured_prompt = PromptTemplate(
    "You are answering for an internal engineering team.\n"
    "Use exactly 3 bullet points.\n"
    "Each bullet must start with a verb.\n\n"
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n"
    "Bullets:"
)

query_engine.update_prompts(
    {
        "response_synthesizer:text_qa_template": structured_prompt,
    }
)

response = query_engine.query("Why use prompt templates in LlamaIndex?")
print(response)
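Models don't always follow formatting instructions, so a lightweight check on the response text is worth adding. This is a sketch that assumes bullets render as -, *, or • characters; adjust it to whatever your model actually emits.

# Count lines that look like bullet points in the response text.
bullet_lines = [
    line for line in str(response).splitlines()
    if line.strip().startswith(("-", "*", "•"))
]
print(f"Bullet count: {len(bullet_lines)}")  # expect 3 if the model complied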
  6. If you want to reuse templates across multiple engines, keep them in a separate module. That gives you one place to manage style, compliance language, and output rules across your app.
# prompts.py
from llama_index.core import PromptTemplate

def build_support_prompt() -> PromptTemplate:
    return PromptTemplate(
        "You are a support assistant for developers.\n"
        "Use the context only.\n"
        "If the answer is not in context, say 'I don't know.'\n\n"
        "Context:\n{context_str}\n\n"
        "Question: {query_str}\n"
        "Answer:"
    )
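A consuming module might then look like the sketch below; the app.py file name is illustrative.

# app.py (hypothetical consumer of prompts.py)
from prompts import build_support_prompt

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": build_support_prompt()}
)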

Testing It

Run a few queries that should produce clearly different answers if your template is working. For example, compare a plain query engine response with one using your custom instruction set, then check whether it follows formatting rules like “exactly 3 bullet points” or “use only context.”

Also test failure behavior. Ask something outside your indexed content and confirm the model either says it does not know or stays constrained by your prompt rules instead of hallucinating freely.
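For example, with the support prompt from step 6 attached, an out-of-scope question should come back as a refusal rather than an invented answer; the question below is deliberately unrelated to the indexed documents.

# Probe failure behavior with a question the index cannot answer.
response = query_engine.query("What is the capital of France?")
print(response)  # expect something like "I don't know." rather than a guess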

If you’re using this in production, log both the retrieved context and final rendered prompt during debugging. That makes it much easier to see whether bad output comes from retrieval quality or from weak prompt design.
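A minimal debugging sketch: response.source_nodes exposes the retrieved chunks, and get_prompts() returns the active templates, so you can log both sides and pinpoint where a bad answer came from.

response = query_engine.query("Why use prompt templates in LlamaIndex?")

# Retrieved context: similarity score plus a snippet of each chunk.
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:100])

# The template currently attached to the response synthesizer.
active = query_engine.get_prompts()["response_synthesizer:text_qa_template"]
print(active.get_template())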

Next Steps

  • Learn how to use ChatPromptTemplate for multi-message chat workflows (a starter sketch follows this list).
  • Add prompt versioning so different teams can ship safe changes without breaking output contracts.
  • Combine prompt templates with response parsing for JSON or schema-driven outputs.
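As a starting point for the first item above, here is a minimal ChatPromptTemplate sketch; the system/user split and the message wording are assumptions you should adapt to your own app.

from llama_index.core import ChatPromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole

# Multi-message template: a fixed system persona plus a templated user turn.
chat_prompt = ChatPromptTemplate(
    message_templates=[
        ChatMessage(
            role=MessageRole.SYSTEM,
            content="You are a senior Python engineer. Use only the provided context.",
        ),
        ChatMessage(
            role=MessageRole.USER,
            content="Context:\n{context_str}\n\nQuestion: {query_str}",
        ),
    ]
)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": chat_prompt}
)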

By Cyprian Aarons, AI Consultant at Topiax.