LlamaIndex Tutorial (Python): building a RAG pipeline for intermediate developers
This tutorial builds a working Retrieval-Augmented Generation (RAG) pipeline with LlamaIndex in Python, from document loading to query-time retrieval and answer generation. You’d use this when you need grounded answers over your own documents instead of relying on a model’s general knowledge.
What You'll Need
- Python 3.10+
- A virtual environment
- `llama-index`
- An embedding model and LLM API key
- `OPENAI_API_KEY` set in your environment
- A small document set to index, such as PDFs or text files
Install the packages first:
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai pypdf
Set your API key:
export OPENAI_API_KEY="your-key-here"
Step-by-Step
- Start by loading your source documents. For a first pass, keep it simple: put a few `.txt` files in a local folder and let LlamaIndex read them into memory.
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
print(f"Loaded {len(documents)} documents")
print(documents[0].text[:500])
- Next, configure the LLM and embedding model explicitly. This makes the pipeline predictable in production and avoids relying on hidden defaults.
import os
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Fail fast if the key is missing; the OpenAI clients read it from the environment.
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY before running"
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
- Build the index from your documents. Under the hood, LlamaIndex chunks the text, embeds each chunk, and stores it in a vector index for retrieval.
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")
print("Index built and persisted")
- Create a query engine and ask a question. This is where retrieval happens: relevant chunks are fetched first, then passed to the LLM to produce an answer grounded in your data.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query(
"What are the main topics covered in these documents?"
)
print(response)
- If you want better control over context size and traceability, inspect retrieved nodes before generating the final answer. This is useful when debugging bad retrieval or hallucinated answers.
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("What are the main topics covered in these documents?")
for i, node in enumerate(nodes, start=1):
    print(f"\n--- Match {i} ---")
    print(node.node.get_content()[:400])
    print(f"Score: {node.score:.4f}")
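The score printed above reflects how close a chunk's embedding is to the query's embedding. As a rough intuition (not LlamaIndex's actual implementation, which depends on the vector store), cosine similarity in plain Python looks like this:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors, normalized by their magnitudes.
    # Returns a value near 1.0 for vectors pointing the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
query_vec = [0.1, 0.9, 0.2]
chunk_vec = [0.15, 0.85, 0.25]
print(f"Score: {cosine_similarity(query_vec, chunk_vec):.4f}")
```

Scores close together across all matches often mean the query is too generic to discriminate between chunks.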
- Add a chat-style interface only after retrieval works well. For most RAG systems, query quality matters more than chat memory at the start.
chat_engine = index.as_chat_engine(chat_mode="condense_question", similarity_top_k=3)
print(chat_engine.chat("Summarize the key ideas from the documents"))
print(chat_engine.chat("Which part discusses implementation details?"))
Testing It
Run the script against a small folder of plain text files first. You should see documents load successfully, an index persist to disk, and answers that quote or reflect content from your files rather than generic model output.
If retrieval looks weak, inspect the top matches with retrieve() before blaming generation. In practice, most bad RAG behavior comes from poor chunking, weak embeddings, or irrelevant source content.
A good sanity check is to ask a question that only exists in one document. If the response can point to that specific topic without drifting into unrelated text, your pipeline is working.
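To automate that sanity check, a small helper can verify the answer actually mentions the topic you asked about. The function name and keywords here are illustrative, not part of LlamaIndex:

```python
def is_grounded(answer_text: str, expected_keywords: list[str]) -> bool:
    # Crude grounding check: the answer should mention at least one
    # keyword that appears only in the target document.
    text = answer_text.lower()
    return any(kw.lower() in text for kw in expected_keywords)

# Usage with a query engine (hypothetical question and keywords):
# response = query_engine.query("What does the onboarding doc say about SSO?")
# print(is_grounded(str(response), ["sso", "single sign-on"]))
```

This won't catch every hallucination, but it's a cheap first-line regression test as you change chunking or embedding models.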
Next Steps
- Add metadata filtering so you can scope retrieval by customer, policy type, region, or date.
- Replace `SimpleDirectoryReader` with loaders for PDFs, HTML pages, SharePoint exports, or S3 objects.
- Tune chunk size and overlap, then compare retrieval quality with different embedding models.
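To build intuition for why chunk size and overlap matter, here is a plain-Python sketch of fixed-size chunking with overlap. LlamaIndex's real splitters are token- and sentence-aware, so treat this only as a mental model:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window over the text; each step advances by
    # chunk_size - overlap so neighboring chunks share some context.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("a" * 1000, chunk_size=512, overlap=50)
print(len(chunks))  # three overlapping windows over 1000 characters
```

In LlamaIndex itself you would tune this via `Settings.chunk_size` and `Settings.chunk_overlap` rather than writing your own splitter. Larger chunks carry more context per match but retrieve less precisely; overlap prevents answers from being cut in half at chunk boundaries.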
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.