LangChain Tutorial (Python): handling long documents for intermediate developers
This tutorial shows you how to take long documents, split them into usable chunks, index them with LangChain, and answer questions over them without blowing past model context limits. You need this when your source material is too large for a single prompt, like policy manuals, contract packs, claims notes, or internal knowledge bases.
What You'll Need
- Python 3.10+
- An OpenAI API key set as OPENAI_API_KEY
- These packages: langchain, langchain-openai, langchain-community, tiktoken
- A text file or document corpus you want to query
- Basic familiarity with LangChain Document objects, loaders, and chains
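If you don't have the packages yet, a typical install looks like this (adjust for your own environment or virtualenv):

pip install langchain langchain-openai langchain-community tiktoken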
Step-by-Step
- Start by loading your long document into LangChain Document objects. For this tutorial, we'll use a plain text file because it keeps the example executable and easy to adapt to PDFs or HTML later.
from langchain_community.document_loaders import TextLoader
loader = TextLoader("long_document.txt", encoding="utf-8")
documents = loader.load()
print(f"Loaded {len(documents)} document(s)")
print(documents[0].page_content[:500])
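The same pattern adapts to other formats by swapping the loader. As a sketch only: if your source were a PDF, you could use the community PyPDFLoader (it needs the pypdf package installed, which is not in the requirements list above); everything downstream stays the same.

from langchain_community.document_loaders import PyPDFLoader

# Hypothetical PDF source; each page becomes its own Document with page metadata
pdf_loader = PyPDFLoader("long_document.pdf")
pdf_documents = pdf_loader.load()
print(f"Loaded {len(pdf_documents)} page(s) from the PDF")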
- Split the document into chunks that fit model context windows. The key is overlap: you want enough repeated text between chunks so the model does not lose continuity across boundaries.
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n\n", "\n", " ", ""],
)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")
for i, chunk in enumerate(chunks[:3]):
    print(f"\nChunk {i+1}:")
    print(chunk.page_content[:300])
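tiktoken is in the requirements because token counts map to model context limits more directly than character counts. If you prefer token-based chunk sizes, RecursiveCharacterTextSplitter has a tiktoken-aware constructor; a minimal sketch, with illustrative sizes and encoding name:

token_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # encoding used by recent OpenAI models
    chunk_size=400,               # now measured in tokens, not characters
    chunk_overlap=60,
)
token_chunks = token_splitter.split_documents(documents)
print(f"Created {len(token_chunks)} token-sized chunks")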
- Embed the chunks and store them in a vector database. Chroma is a solid local default for long-document retrieval because it keeps the setup simple and works well for iterative development.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="long_doc_demo",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
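Before wiring up the chain, it can help to sanity-check retrieval on its own. A quick sketch (the query is just an example; lower distance scores mean closer matches):

# Inspect the raw nearest-neighbour results and their distance scores
hits = vectorstore.similarity_search_with_score("termination clause", k=4)
for doc, score in hits:
    print(f"score={score:.4f}  {doc.page_content[:120]!r}")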
- Build a retrieval chain that answers questions using only the most relevant chunks. This is the part that makes long documents practical: you retrieve a few targeted chunks instead of stuffing the whole file into the prompt.
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer only from the provided context. If the answer is not in the context, say you don't know."),
    ("human", "Question: {input}\n\nContext:\n{context}")
])
document_chain = create_stuff_documents_chain(llm, prompt)
qa_chain = create_retrieval_chain(retriever, document_chain)
- Ask a question and inspect both the answer and retrieved sources. In production, this is where you validate whether your chunk size and retrieval settings are actually surfacing the right evidence.
result = qa_chain.invoke({"input": "What are the main obligations described in this document?"})
print("Answer:")
print(result["answer"])
print("\nRetrieved chunks:")
for i, doc in enumerate(result["context"], start=1):
    print(f"\nChunk {i}:")
    print(doc.page_content[:400])
- If your document is very large or highly structured, tune chunking before touching anything else. Smaller chunks improve precision; larger overlap improves continuity; k controls how much evidence reaches the model.
def build_qa_chain(chunk_size=1000, chunk_overlap=150, k=4):
    """Rebuild the split -> embed -> retrieve -> answer pipeline from one set of knobs."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        separators=["\n\n", "\n", " ", ""],
    )
    docs = loader.load()
    chunks = splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
        collection_name=f"doc_{chunk_size}_{chunk_overlap}_{k}",
    )
    retriever = vectorstore.as_retriever(search_kwargs={"k": k})
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer only from context."),
        ("human", "Question: {input}\n\nContext:\n{context}")
    ])
    return create_retrieval_chain(
        retriever,
        create_stuff_documents_chain(llm, prompt),
    )
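With the helper in place, you can compare settings side by side before committing to one configuration. A rough sketch of that loop (the question and the parameter grid are illustrative):

question = "What are the main obligations described in this document?"

# Try a few chunking/retrieval configurations and eyeball the answers
for chunk_size, chunk_overlap, k in [(500, 100, 4), (1000, 150, 4), (1500, 200, 6)]:
    chain = build_qa_chain(chunk_size=chunk_size, chunk_overlap=chunk_overlap, k=k)
    result = chain.invoke({"input": question})
    print(f"\n--- chunk_size={chunk_size}, overlap={chunk_overlap}, k={k} ---")
    print(result["answer"][:300])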
Testing It
Run the script against a document with clear factual statements, then ask questions whose answers appear in only one or two sections. If retrieval is working, the returned chunks should contain the exact language needed to answer.
Test edge cases too: ask something that is not in the document and confirm the model says it does not know instead of hallucinating. Then adjust chunk_size, chunk_overlap, and k until relevant passages consistently show up in result["context"].
If answers feel vague, your chunks are probably too large or your retriever is pulling too many weak matches. If answers miss cross-section details, increase overlap or switch to a structure-aware splitter later.
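One way to make these checks repeatable is a small script that runs both an in-document question and an out-of-scope one and prints the retrieved evidence next to each answer. A sketch, with placeholder questions you should swap for ones that fit your own document:

test_questions = [
    "What are the main obligations described in this document?",  # should be answerable
    "What is the capital of France?",                             # should trigger "I don't know"
]

for q in test_questions:
    result = qa_chain.invoke({"input": q})
    print(f"\nQ: {q}")
    print(f"A: {result['answer']}")
    print("Evidence:")
    for doc in result["context"]:
        print(f"  - {doc.page_content[:100]!r}")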
Next Steps
- Add metadata filters so you can search by section, date, product line, or claim type (see the sketch after this list)
- Replace Chroma with FAISS or a managed vector store for deployment
- Learn map-reduce and refine chains for cases where "stuffing" retrieved docs is still too expensive
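As a rough sketch of the metadata-filter idea: if you attach metadata to chunks before indexing, Chroma can restrict retrieval to matching chunks via a filter passed through the retriever. The field names and values here are hypothetical, and Chroma's filtering supports only simple equality-style conditions:

# Tag each chunk with hypothetical metadata before indexing
for chunk in chunks:
    chunk.metadata["doc_type"] = "policy_manual"

filtered_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="long_doc_filtered",
)

# Restrict retrieval to chunks whose metadata matches the filter
filtered_retriever = filtered_store.as_retriever(
    search_kwargs={"k": 4, "filter": {"doc_type": "policy_manual"}},
)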
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.