LlamaIndex Tutorial (Python): handling long documents for advanced developers
This tutorial shows you how to load, chunk, index, and query long documents in LlamaIndex without blowing past context limits. You need this when your source files are too large for a single prompt, but you still want precise retrieval and stable answers.
What You'll Need
- Python 3.10+
- `llama-index`
- An LLM API key, such as `OPENAI_API_KEY`
- Optional but useful: `python-dotenv` for local env loading
- A folder of long documents in `.txt`, `.md`, or `.pdf` format
- Basic familiarity with LlamaIndex concepts: `Document`, `VectorStoreIndex`, retrievers, and query engines
Install the packages:
```shell
pip install llama-index python-dotenv
```
Step-by-Step
- Start by loading your long documents into LlamaIndex. For production work, keep documents on disk and load them explicitly so you can control preprocessing, metadata, and file boundaries.

```python
from pathlib import Path

from llama_index.core import SimpleDirectoryReader

docs_path = Path("./data")
documents = SimpleDirectoryReader(
    input_dir=str(docs_path),
    recursive=True,
).load_data()
print(f"Loaded {len(documents)} documents")
print(documents[0].metadata)
```
- Next, split the documents into smaller chunks. Long-document handling is mostly about controlling chunk size and overlap so retrieval stays accurate without losing local context.

```python
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

Settings.chunk_size = 1024
Settings.chunk_overlap = 150
Settings.node_parser = SentenceSplitter(
    chunk_size=Settings.chunk_size,
    chunk_overlap=Settings.chunk_overlap,
)

nodes = Settings.node_parser.get_nodes_from_documents(documents)
print(f"Created {len(nodes)} nodes")
print(nodes[0].get_content()[:300])
```
- Build a vector index over the chunks. This gives you semantic retrieval across the full document set instead of forcing the model to read everything at once. Load your API key from the environment (here via `python-dotenv`) before building, since embedding the nodes calls the API.

```python
import os

from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex

load_dotenv()  # reads OPENAI_API_KEY from a local .env file
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before building the index"

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What are the main risks discussed in these documents?")
print(response)
```
- For long documents, use a hierarchical summarization pass when you need broad coverage. This is useful when a question requires synthesis across many sections rather than exact passage retrieval.

```python
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", temperature=0)
summarizer = TreeSummarize(llm=llm)
summary = summarizer.get_response(
    "Summarize the key operational issues in these documents.",
    # First 20 nodes for brevity; iterate over the full node list for complete coverage.
    [node.get_content() for node in nodes[:20]],
)
print(summary)
```
- Add metadata filters or custom node metadata when you need document-level control. In real systems, this is how you separate policy docs from claims docs, or contracts from internal notes.

```python
from llama_index.core import VectorStoreIndex

tagged_documents = []
for doc in documents:
    # Tag every document so downstream retrieval can filter by source type.
    doc.metadata["source_type"] = "policy"
    tagged_documents.append(doc)

tagged_nodes = Settings.node_parser.get_nodes_from_documents(tagged_documents)
tagged_index = VectorStoreIndex(tagged_nodes)
retriever = tagged_index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("What exclusions are mentioned?")
for item in results:
    print(item.score, item.node.metadata, item.node.get_content()[:200])
```
Testing It
Run a few queries that require different levels of granularity. One should ask for a specific clause or section; another should ask for a cross-document summary.
Check that retrieved chunks actually contain the evidence needed to answer the question. If answers are vague or miss details, reduce chunk size slightly or increase overlap.
Also inspect top-k retrieval results directly before trusting the query engine output. In production, retrieval quality matters more than prompt wording when dealing with long documents.
If your document set is large, test memory use and indexing time separately from answer quality. Those failure modes show up later than syntax errors.
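One way to make the evidence check concrete is a small helper that reports which required phrases appear in none of the retrieved chunks. This is a hypothetical harness, not a LlamaIndex API; feed it the text of your top-k nodes:

```python
def missing_evidence(retrieved_texts: list[str], required_phrases: list[str]) -> list[str]:
    """Return the phrases that appear in none of the retrieved chunks."""
    lowered = [t.lower() for t in retrieved_texts]
    return [p for p in required_phrases if not any(p.lower() in t for t in lowered)]


# In practice: missing_evidence([r.node.get_content() for r in results], [...])
gaps = missing_evidence(
    ["The policy excludes flood damage.", "Coverage begins on day one."],
    ["flood damage", "earthquake"],
)
print(gaps)  # ['earthquake']
```

A non-empty result tells you retrieval, not the prompt, is where the answer is being lost, which is usually where chunk size or overlap needs adjusting.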
Next Steps
- Learn `AutoMergingRetriever` for better multi-chunk context assembly.
- Add reranking with `SentenceTransformerRerank` to improve precision on dense corpora.
- Store indexes in a persistent vector database instead of rebuilding them on every run.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.