LlamaIndex Tutorial (Python): handling long documents for beginners
This tutorial shows you how to load a long document in Python, split it into manageable chunks, index it with LlamaIndex, and ask questions over the content without blowing past model context limits. You need this whenever a source is too large for a single prompt: policy manuals, contracts, research papers, or claim documents.
What You'll Need
- Python 3.10+
- llama-index
- An OpenAI API key
- A long text file to test with, such as document.txt
- Basic familiarity with Python and LlamaIndex query engines
Install the package:
pip install llama-index
Set your API key in your shell:
export OPENAI_API_KEY="your-api-key-here"
Step-by-Step
- Start by loading a long document from disk. For beginners, plain text is easiest because it avoids extra parsing complexity and lets you focus on the indexing flow.
from pathlib import Path
from llama_index.core import SimpleDirectoryReader

# Write a small sample file so the tutorial is self-contained; in practice,
# point input_dir at a folder containing your own documents
docs_path = Path("data")
docs_path.mkdir(exist_ok=True)
sample_text = """
LlamaIndex helps you build retrieval systems over large documents.
When documents get too long, you should split them into chunks before indexing.
This allows the model to retrieve only the relevant parts at query time.
""" * 50
(docs_path / "document.txt").write_text(sample_text, encoding="utf-8")

documents = SimpleDirectoryReader(input_dir="data").load_data()
print(f"Loaded {len(documents)} document(s)")
print(documents[0].text[:200])
- Next, configure chunking. This is the core fix for long-document handling: instead of sending the whole file to the model, LlamaIndex breaks it into smaller nodes that can be retrieved efficiently.
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# The SentenceSplitter carries the chunk settings; the overlap repeats a
# little text between neighboring chunks so sentences that straddle a
# boundary keep their surrounding context
Settings.node_parser = SentenceSplitter(chunk_size=300, chunk_overlap=50)

print("Chunking configured:")
print(f"chunk_size={Settings.chunk_size}, chunk_overlap={Settings.chunk_overlap}")
- Now create an index from the loaded documents. The VectorStoreIndex stores embeddings for each chunk so queries can pull back the most relevant sections instead of scanning the full file every time.
from llama_index.core import VectorStoreIndex

# Embeds every chunk and builds an in-memory vector index over them
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What does LlamaIndex help you build?")
print(response)
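as_query_engine() also accepts retrieval parameters, so you can tune how many chunks are fetched per query without rebuilding the index. A small sketch (the values here are illustrative, not recommendations):

# Pull 5 chunks per query; "compact" packs them into as few LLM calls as possible
tuned_engine = index.as_query_engine(similarity_top_k=5, response_mode="compact")
print(tuned_engine.query("How does LlamaIndex handle long documents?"))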
- Ask a question that requires retrieval from multiple parts of the document. This is where chunking pays off: even if the answer is buried deep in a long file, the retriever can surface the right text.
question = "Why should long documents be split into chunks before indexing?"
response = query_engine.query(question)
print("Question:", question)
print("Answer:", response)
- If you want more control, inspect the retrieved nodes directly. This is useful when you need to debug bad answers or verify that the right passages are being pulled from a large source.
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("How does chunking help with long documents?")
for i, node in enumerate(nodes, start=1):
    print(f"\n--- Result {i} ---")
    print(node.score)
    print(node.node.text[:400])
Testing It
Run the script and confirm that it prints a loaded document count, then returns an answer to each query without errors. If your API key is set correctly, LlamaIndex will embed the chunks and generate responses through OpenAI.
If answers look vague or irrelevant, reduce chunk_size or increase similarity_top_k. For very dense documents like legal text, smaller chunks often retrieve cleaner evidence than large ones.
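For example, rebuilding with smaller chunks and a wider retrieval net looks like this (a sketch; the right values depend on your documents):

from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Tighter chunks give more precise evidence; a higher top_k casts a wider net
Settings.node_parser = SentenceSplitter(chunk_size=150, chunk_overlap=30)
index = VectorStoreIndex.from_documents(documents)  # re-chunk and re-embed
query_engine = index.as_query_engine(similarity_top_k=5)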
A good sanity check is to ask for a phrase that appears multiple times in your sample text and see whether retrieved snippets contain that phrase. If they do, your ingestion and retrieval pipeline is working.
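Here is a minimal version of that sanity check, using a phrase that appears in the sample text from step 1:

probe = "split them into chunks"
hits = index.as_retriever(similarity_top_k=3).retrieve(probe)
found = any(probe in h.node.text for h in hits)
print("Pipeline OK" if found else "Phrase not retrieved - check chunking settings")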
Next Steps
- Try MarkdownNodeParser or SimpleFileNodeParser for sources beyond plain text, such as Markdown and PDFs.
- Learn about metadata filters so you can search only specific sections of a large corpus.
- Add persistence with a vector store like Chroma or Pinecone so you do not rebuild indexes on every run; the sketch below shows a simpler built-in option to start with.
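Before reaching for an external vector store, note that LlamaIndex can also persist its default index to local disk, which is often enough while you experiment. A minimal sketch (the "storage" directory name is arbitrary):

from llama_index.core import StorageContext, load_index_from_storage

# Save the index once...
index.storage_context.persist(persist_dir="storage")

# ...then on later runs, reload it instead of re-embedding everything
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context)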
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit