LlamaIndex Tutorial (Python): handling long documents for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to load, chunk, index, and query long documents in LlamaIndex without blowing past context limits. You need this when your source files are too large for a single prompt, but you still want precise retrieval and stable answers.

What You'll Need

  • Python 3.10+
  • llama-index
  • An LLM API key, such as OPENAI_API_KEY
  • Optional but useful:
    • python-dotenv for local env loading
    • A folder of long documents in .txt, .md, or .pdf format
  • Basic familiarity with LlamaIndex concepts:
    • Document
    • VectorStoreIndex
    • retrievers and query engines

Install the packages:

pip install llama-index python-dotenv

Step-by-Step

  1. Start by loading your long documents into LlamaIndex. For production work, keep documents on disk and load them explicitly so you can control preprocessing, metadata, and file boundaries.
from pathlib import Path

from llama_index.core import SimpleDirectoryReader

docs_path = Path("./data")
documents = SimpleDirectoryReader(
    input_dir=str(docs_path),
    recursive=True,
).load_data()

print(f"Loaded {len(documents)} documents")
print(documents[0].metadata)
  2. Next, split the documents into smaller chunks. Long-document handling is mostly about controlling chunk size and overlap so retrieval stays accurate without losing local context.
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

Settings.chunk_size = 1024
Settings.chunk_overlap = 150
Settings.node_parser = SentenceSplitter(
    chunk_size=Settings.chunk_size,
    chunk_overlap=Settings.chunk_overlap,
)

nodes = Settings.node_parser.get_nodes_from_documents(documents)
print(f"Created {len(nodes)} nodes")
print(nodes[0].get_content()[:300])
  3. Build a vector index over the chunks. This gives you semantic retrieval across the full document set instead of forcing the model to read everything at once.
import os

from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex

load_dotenv()  # pick up OPENAI_API_KEY from a local .env file, if present
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY before building the index")

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(similarity_top_k=5)

response = query_engine.query("What are the main risks discussed in these documents?")
print(response)
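Before trusting an answer, check which chunks it was synthesized from. A minimal sketch using the response's source_nodes attribute (each entry pairs a retrieved node with its similarity score; the file_name key assumes SimpleDirectoryReader's default metadata):

# Inspect the evidence behind the answer, not just the answer itself.
for source in response.source_nodes:
    print(source.score, source.node.metadata.get("file_name"))
    print(source.node.get_content()[:200])
    print("---")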
  4. For long documents, use a hierarchical summarization pass when you need broad coverage. This is useful when a question requires synthesis across many sections rather than exact passage retrieval.
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", temperature=0)
summarizer = TreeSummarize(llm=llm)

summary = summarizer.get_response(
    "Summarize the key operational issues in these documents.",
    [node.get_content() for node in nodes[:20]],
)

print(summary)
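If you would rather keep synthesis wired into retrieval instead of calling TreeSummarize by hand, the query engine also accepts a response_mode. A sketch reusing the index from step 3 (the higher top_k is our assumption for broad-coverage questions, not a library default):

# Retrieval plus tree-style synthesis in a single engine.
summary_engine = index.as_query_engine(
    response_mode="tree_summarize",
    similarity_top_k=10,
)
print(summary_engine.query("Summarize the key operational issues."))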
  5. Add metadata filters or custom node metadata when you need document-level control. In real systems, this is how you separate policy docs from claims docs, or contracts from internal notes; a filter sketch follows the code below.
from llama_index.core import Document

tagged_documents = []
for doc in documents:
    doc.metadata["source_type"] = "policy"
    tagged_documents.append(doc)

tagged_nodes = Settings.node_parser.get_nodes_from_documents(tagged_documents)
tagged_index = VectorStoreIndex(tagged_nodes)

retriever = tagged_index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("What exclusions are mentioned?")
for item in results:
    print(item.score, item.node.metadata, item.node.get_content()[:200])
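Tagging only pays off once you filter on it at query time. A sketch using LlamaIndex metadata filters to restrict retrieval to the source_type value set above ("policy" is the illustrative tag from this step):

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Only consider nodes whose metadata has source_type == "policy".
policy_retriever = tagged_index.as_retriever(
    similarity_top_k=3,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="source_type", value="policy")]),
)
for item in policy_retriever.retrieve("What exclusions are mentioned?"):
    print(item.score, item.node.metadata.get("source_type"))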

Testing It

Run a few queries that require different levels of granularity. One should ask for a specific clause or section; another should ask for a cross-document summary.

Check that retrieved chunks actually contain the evidence needed to answer the question. If answers are vague or miss details, reduce chunk size slightly or increase overlap.

Also inspect top-k retrieval results directly before trusting the query engine output. In production, retrieval quality matters more than prompt wording when dealing with long documents.

If your document set is large, test memory use and indexing time separately from answer quality. Those failure modes show up later than syntax errors.
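A small harness makes these checks repeatable. A sketch; the two query strings are placeholders for your own narrow and broad questions:

# One clause-level query and one cross-document query.
checks = {
    "narrow": "Quote the termination clause verbatim.",
    "broad": "Summarize the main risks across all documents.",
}

retriever = index.as_retriever(similarity_top_k=5)
for label, question in checks.items():
    print(f"[{label}] {question}")
    for chunk in retriever.retrieve(question):
        # Eyeball the evidence before reading the synthesized answer.
        print(f"  {chunk.score:.3f}  {chunk.node.get_content()[:120]!r}")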

Next Steps

  • Learn AutoMergingRetriever for better multi-chunk context assembly.
  • Add reranking with SentenceTransformerRerank to improve precision on dense corpora.
  • Store indexes in a persistent vector database instead of rebuilding them on every run (a minimal on-disk sketch follows this list).
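For the last point, LlamaIndex's built-in storage context is a quick on-disk option before you commit to an external vector database. A minimal sketch:

from llama_index.core import StorageContext, load_index_from_storage

# Persist once after building...
index.storage_context.persist(persist_dir="./storage")

# ...then reload on later runs instead of re-embedding everything.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)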

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

