LlamaIndex Tutorial (Python): building a RAG pipeline for beginners
This tutorial shows you how to build a basic Retrieval-Augmented Generation (RAG) pipeline in Python with LlamaIndex. By the end, you’ll have a working app that loads documents, indexes them, retrieves relevant chunks, and answers questions using an LLM.
What You'll Need
- Python 3.10+
- A virtual environment
- llama-index
- An OpenAI API key
- A small set of local text files or PDFs to index
- Optional: python-dotenv for loading secrets from .env
Install the core packages:
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai python-dotenv
Set your API key:
export OPENAI_API_KEY="your-key-here"
Or put it in a .env file:
OPENAI_API_KEY=your-key-here
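If you go the .env route, load it at the top of your script before creating any LlamaIndex objects. A minimal sketch using python-dotenv (the assert is just a sanity check, not required):
import os
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from .env into the process environment
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"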
Step-by-Step
- Start by loading your documents from disk. For beginners, plain .txt files are easiest because they avoid extra parsing dependencies and make it obvious what is being indexed.
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
print(f"Loaded {len(documents)} documents")
print(documents[0].text[:500])
- Next, configure the LLM and embedding model. LlamaIndex uses embeddings for retrieval and an LLM for response generation, so both need to be set explicitly for a clean beginner setup.
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
- Build the index from the loaded documents. Under the hood, LlamaIndex chunks the text, embeds each chunk, and stores it so your query can retrieve the most relevant pieces later.
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
print("Index built successfully")
- Create a query engine and ask a question. This is the RAG part: the engine retrieves matching chunks from the index and passes them into the LLM to generate an answer.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query(
    "What are the main points discussed in these documents?"
)
print(response)
- If you want better control over retrieval quality, inspect the source nodes returned with each answer. This helps you debug whether bad answers come from poor document quality or weak retrieval.
response = query_engine.query("Summarize the document content.")
print("Answer:")
print(response.response)
print("\nSources:")
for node in response.source_nodes:
    print("-" * 40)
    print(node.node.text[:300])
Testing It
Create a ./data folder and add one or two short .txt files with clear content, then run the script end to end. Ask questions that should obviously be answered from those files, like “What is this document about?” or “List the main topics.”
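If you don't have files handy, you can generate throwaway ones. The filenames and contents below are placeholders, not anything LlamaIndex expects:
from pathlib import Path

data_dir = Path("./data")
data_dir.mkdir(exist_ok=True)

# Two tiny sample files; replace with your own content.
(data_dir / "llamaindex_notes.txt").write_text(
    "LlamaIndex is a Python framework for building RAG pipelines."
)
(data_dir / "rag_notes.txt").write_text(
    "RAG retrieves relevant chunks from your documents and passes them to an LLM."
)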
If the answer is vague or wrong, check three things first: whether documents were actually loaded, whether your API key is valid, and whether the text files contain enough relevant content to retrieve. You can also lower or raise similarity_top_k to see how many chunks are being passed into the prompt.
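One way to see the effect is to run the same question at a few settings. A rough sketch, reusing the index built earlier:
# Compare answers as more or fewer chunks are passed into the prompt.
for top_k in (1, 3, 5):
    engine = index.as_query_engine(similarity_top_k=top_k)
    answer = engine.query("What is this document about?")
    print(f"top_k={top_k}: {str(answer)[:200]}")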
A good sanity check is to ask for something specific that appears in only one file. If the source snippets match your expectation but the answer is still off, your issue is usually prompt quality or model choice rather than retrieval.
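To check which file each retrieved chunk came from, inspect the metadata that SimpleDirectoryReader attaches to each node. A small sketch (the question is a placeholder; use one that matches your files):
response = query_engine.query("Which topic appears in only one of the files?")
for node in response.source_nodes:
    # file_name comes from SimpleDirectoryReader's default metadata
    print(node.node.metadata.get("file_name"), "score:", node.score)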
Next Steps
- Add persistent storage, first locally and later with a vector database like Chroma or Qdrant (see the sketch after this list).
- Replace plain text files with PDF ingestion and document cleaning.
- Learn how to customize chunk size, metadata filters, and retrievers for better RAG quality.
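Before reaching for Chroma or Qdrant, the simplest upgrade is LlamaIndex's built-in local persistence, which avoids re-embedding your documents on every run. A minimal sketch (the ./storage path is just a convention):
from llama_index.core import StorageContext, load_index_from_storage

PERSIST_DIR = "./storage"

# First run: save the freshly built index to disk.
index.storage_context.persist(persist_dir=PERSIST_DIR)

# Later runs: reload it instead of rebuilding.
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)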
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.