Your company has 3,000 support tickets from last year, a 200-page product manual, and a database of FAQ articles.
Right now, support agents answer the same questions every single day. Every answer is in those 3,000 tickets — but no one has time to read them all.
By the end of this article, you will have built the system that changes that. In TypeScript. With real data. With a real API.
WHAT YOU'LL BUILD IN THIS ARTICLE
This is Part 2 of a 4-part series. If you haven't read Part 1, start there — it covers installation, the 3-phase architecture, and your first working example.
Your environment from Part 1 is all you need. Same package.json, same .env file, same project structure. Let's build.
How LlamaIndex Actually Loads Your Data
Before we build anything, you need to understand what happens when you call reader.loadData(). This is the foundation everything else is built on.
The Document Object
Every piece of data in LlamaIndex starts as a Document. A Document has three things:
// This is what a Document looks like internally
{
id_: "doc-123-abc", // Unique identifier
text: "The full text...", // The actual content
metadata: {
file_name: "report.pdf", // Where it came from
file_type: "application/pdf",
page_label: "1", // PDF page number (if applicable)
creation_date: "2026-03-01",
// ... any custom metadata you add
}
}
This metadata travels with every chunk when LlamaIndex splits the document. That means when you get an answer, you can always trace which file and even which page it came from.
SimpleDirectoryReader — Your Swiss Army Knife
SimpleDirectoryReader is the easiest way to load documents. Pass it a folder, and it handles everything:
import { SimpleDirectoryReader } from "llamaindex/ingestion";
const reader = new SimpleDirectoryReader();
// Load all files in a folder
const docs = await reader.loadData("./data");
// Load recursively, optionally skipping certain file types
const allDocs = await reader.loadData("./data", {
recursive: true, // Include subdirectories
// excludedExtensions: [".png", ".jpg"], // Skip these file types
});
console.log(`Loaded ${docs.length} documents`);
docs.forEach((doc) => {
console.log(` • ${doc.metadata.file_name} (${doc.text.length} chars)`);
});
Supported file types out of the box: .txt, .md, .pdf, .docx, .csv, .json, .html, .epub, and more.
Custom Metadata — Track Where Every Answer Comes From
import { Document } from "llamaindex";
const doc = new Document({
text: "Your custom text here...",
metadata: {
source: "internal-wiki",
author: "engineering-team",
lastUpdated: "2026-03-24",
department: "product",
},
});
Chunking: The Most Important Tuning Decision You'll Make
When LlamaIndex indexes your documents, it splits them into chunks called Nodes. How you split determines everything: answer quality, retrieval accuracy, and token costs.
Document → Chunks → Nodes (Visualized)
Picture the full text of your document: an intro paragraph, several body sections, code examples, a conclusion. It is far too long to embed and retrieve in one piece, so the splitter breaks it into overlapping chunks, and each chunk becomes a Node that carries the parent Document's metadata.
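To make the split concrete, here is a dependency-free sketch of how a fixed-size window with overlap tiles a text. It is an approximation for illustration only: words stand in for tokens, and the real SentenceSplitter also snaps to sentence boundaries.

```typescript
// Simplified, word-based illustration of chunking with overlap.
// The real SentenceSplitter counts tokens and prefers sentence
// boundaries; this sketch only shows how the window tiles.
function chunkWords(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= overlap) throw new Error("chunkSize must exceed overlap");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each new chunk advances
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}

const doc = Array.from({ length: 20 }, (_, i) => `word${i}`).join(" ");
const chunks = chunkWords(doc, 8, 2);
// Each chunk repeats the previous chunk's last 2 words, so a thought
// cut at a boundary still appears intact in at least one chunk.
console.log(chunks.length); // → 3
```

The overlap is the safety margin: without it, a sentence that straddles a chunk boundary would exist only as two meaningless halves.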
Configuring the SentenceSplitter
import { Settings } from "llamaindex";
import { SentenceSplitter } from "llamaindex/node-parser";
// The SentenceSplitter is smart — it tries to split at sentence
// boundaries rather than cutting text mid-sentence.
Settings.nodeParser = new SentenceSplitter({
chunkSize: 512, // tokens per chunk (default: 1024)
chunkOverlap: 50, // overlap between adjacent chunks (default: 200)
});
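These two knobs also drive cost: every chunk becomes one embedding call and one retrieval candidate, and overlapped text is embedded twice. A rough, dependency-free estimate of the chunk count, treating the document as a flat token stream (the real splitter's counts will differ):

```typescript
// Rough chunk-count estimate for a document of totalTokens tokens.
// After the first chunk, each additional chunk advances the window
// by (chunkSize - overlap) tokens.
function estimateChunks(totalTokens: number, chunkSize: number, overlap: number): number {
  if (totalTokens <= chunkSize) return 1;
  return 1 + Math.ceil((totalTokens - chunkSize) / (chunkSize - overlap));
}

// A hypothetical 10,000-token manual:
console.log(estimateChunks(10_000, 512, 50));   // → 22 small, precise chunks
console.log(estimateChunks(10_000, 1024, 200)); // → 12 larger, contextual chunks
```

Halving the chunk size roughly doubles your embedding volume, which is why chunk size is a tuning decision and not a set-and-forget default.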
Chunk Size: The Golden Rule
A chunk should be big enough to contain one complete thought, and small enough that it stays on one topic. In practice: start from the defaults, move toward 256–512 tokens for precise factual lookup (like the policy documents below), and toward 1024 or more when each answer needs surrounding context (like long reports).
QueryEngine vs ChatEngine — Choosing the Right Tool
This is the most important architectural decision in every LlamaIndex application. They look similar but serve completely different purposes.
How Both Engines Work — The Retrieval + Synthesis Flow
Both engines share the same two-step core: retrieve (embed the question and pull the most similar chunks from the index), then synthesize (hand those chunks to the LLM to compose an answer). The difference is what happens before retrieval: a QueryEngine treats every query as independent, while a ChatEngine first condenses the chat history and the new message into a standalone question. That condensing step is what makes follow-up questions work.
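The retrieval half of that flow is just similarity ranking over embeddings. A dependency-free toy sketch: the two-dimensional vectors below stand in for real embedding-model output, and the chunk IDs are made up.

```typescript
// What similarityTopK does conceptually: score every chunk's embedding
// against the query embedding by cosine similarity, keep the top K.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function topK(query: number[], chunks: { id: string; vec: number[] }[], k: number) {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const results = topK(
  [1, 0], // toy query embedding
  [
    { id: "chunk-valuation", vec: [0.9, 0.1] },
    { id: "chunk-portfolio", vec: [0.2, 0.8] },
    { id: "chunk-risk", vec: [0.6, 0.4] },
  ],
  2,
);
console.log(results.map((r) => r.id)); // → [ 'chunk-valuation', 'chunk-risk' ]
```

Real embeddings have thousands of dimensions and the index uses optimized nearest-neighbor search, but the ranking principle is exactly this.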
Project 1 — Custom Document Q&A System
Let's build a document Q&A system that uses ChatEngine to allow follow-up questions. This is the pattern you'll use for customer support bots, internal knowledge assistants, and document explorers.
First, create your data folder with some sample content:
mkdir -p src/data/project1
cat > src/data/project1/remote-work-policy.txt << 'EOF'
Remote Work Policy — Effective March 2026
OVERVIEW
Our company supports flexible remote work arrangements for all full-time employees.
Employees may work remotely up to 4 days per week, with one mandatory in-office day on Wednesdays.
EQUIPMENT POLICY
The company provides a $1,500 equipment budget for remote work setup, renewed every 3 years.
Approved items include monitors, keyboards, ergonomic chairs, and high-speed internet equipment.
WORKING HOURS
Core hours are 10:00 AM – 3:00 PM in the employee's local timezone.
Meetings must be scheduled within core hours unless mutually agreed otherwise.
TIME OFF
Remote work does not change time-off policies. PTO requests must be submitted 2 weeks in advance.
All national holidays remain paid days off regardless of remote status.
EOF
Now write the Q&A system:
// src/project1-chat.ts
import "dotenv/config";
import * as readline from "readline";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { SentenceSplitter } from "llamaindex/node-parser";
async function main() {
// ── Configure LlamaIndex ────────────────────────────────────────────
Settings.llm = new OpenAI({
model: "gpt-4o",
apiKey: process.env.OPENAI_API_KEY,
temperature: 0.1,
});
Settings.embedModel = new OpenAIEmbedding({
model: "text-embedding-3-large",
apiKey: process.env.OPENAI_API_KEY,
});
// Use SentenceSplitter for policy docs — 512 tokens works well for
// documents with clearly separated policy sections
Settings.nodeParser = new SentenceSplitter({
chunkSize: 512,
chunkOverlap: 64,
});
// ── Load and index documents ────────────────────────────────────────
console.log("Loading documents...");
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("./src/data/project1");
console.log(`Indexing ${documents.length} document(s)...`);
const index = await VectorStoreIndex.fromDocuments(documents);
console.log("Ready! Type your question (or 'exit' to quit)\n");
// ── Create a ChatEngine with custom system prompt ───────────────────
const chatEngine = index.asChatEngine({
// The system prompt shapes how the AI responds
systemPrompt: `You are an HR assistant with deep knowledge of the company's
policies. Answer questions accurately based ONLY on the provided policy documents.
If the information is not in the documents, say so clearly.
Always be helpful, professional, and concise.`,
});
// ── Interactive chat loop ───────────────────────────────────────────
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const askQuestion = () => {
rl.question("\n💬 You: ", async (input) => {
const message = input.trim();
if (message.toLowerCase() === "exit") {
console.log("\nGoodbye!");
rl.close();
return;
}
if (!message) {
askQuestion();
return;
}
try {
const response = await chatEngine.chat({ message });
console.log(`\n🤖 Assistant: ${response.toString()}`);
} catch (error) {
console.error("Error:", error);
}
askQuestion(); // Continue the conversation loop
});
};
askQuestion();
}
main().catch(console.error);
Run it and have a multi-turn conversation:
npx ts-node src/project1-chat.ts
Loading documents...
Indexing 1 document(s)...
Ready! Type your question (or 'exit' to quit)
💬 You: How many days can I work from home each week?
🤖 Assistant: You can work remotely up to 4 days per week. One day per week
(Wednesday) is mandatory in-office attendance.
💬 You: What about the equipment budget?
🤖 Assistant: The company provides a $1,500 equipment budget for your remote
work setup, which is renewed every 3 years. You can use it for monitors,
keyboards, ergonomic chairs, and high-speed internet equipment.
💬 You: When do I need to submit PTO requests?
🤖 Assistant: PTO requests must be submitted at least 2 weeks in advance.
National holidays remain paid days off regardless of your remote work status.
Notice how the second question ("What about the equipment budget?") has no explicit context — the ChatEngine figured out you were still talking about remote work policy.
Project 2 — Query a PDF File
PDFs are the most common document format in the real world. LlamaIndex handles them natively. No extra libraries needed for basic PDFs — just point SimpleDirectoryReader at a folder with PDFs.
For this project, we'll query a PDF and also expose source nodes — showing exactly which page and section the answer came from.
# Download a real PDF to test with (Warren Buffett's 2025 Annual Letter)
# Or create a simple test PDF and place it in:
mkdir -p src/data/project2
# Place any PDF you have in src/data/project2/
# For testing, we'll create a mock via a text file that simulates a PDF
cat > src/data/project2/investment-principles.txt << 'EOF'
Investment Principles and Strategy Guide — 2026 Edition
Chapter 1: The Foundation of Long-Term Investing
The most important quality for an investor is temperament, not intellect.
Markets are driven by fear and greed in the short term, but by fundamentals in the long term.
The best investment you can make is in yourself — your skills compound faster than any stock.
Chapter 2: Portfolio Construction
Diversification is protection against ignorance. If you know what you're doing, diversify less.
The ideal portfolio for most investors contains 5–15 positions across different sectors.
Never invest borrowed money. Leverage amplifies both gains and losses.
Chapter 3: Valuation Principles
Price is what you pay. Value is what you get.
Buy great businesses at fair prices rather than fair businesses at great prices.
A margin of safety of at least 25% below intrinsic value is prudent for any investment.
Chapter 4: Risk Management
The first rule of investing is never lose money. The second rule is never forget rule one.
Position sizing is more important than stock selection for most investors.
Cash is not trash — it gives you the ability to act when others cannot.
EOF
Now write the PDF query tool with source attribution:
// src/project2-pdf.ts
import "dotenv/config";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { SentenceSplitter } from "llamaindex/node-parser";
async function queryDocument(question: string) {
Settings.llm = new OpenAI({
model: "gpt-4o",
apiKey: process.env.OPENAI_API_KEY,
temperature: 0, // Zero temp for factual document queries
});
Settings.embedModel = new OpenAIEmbedding({
model: "text-embedding-3-large",
apiKey: process.env.OPENAI_API_KEY,
});
// For longer documents like annual reports, use larger chunks
// so each retrieved chunk contains enough context
Settings.nodeParser = new SentenceSplitter({
chunkSize: 1024,
chunkOverlap: 100,
});
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("./src/data/project2");
console.log(
`Loaded ${documents.length} document(s) with ${documents.reduce((acc, d) => acc + d.text.length, 0).toLocaleString()} characters`,
);
const index = await VectorStoreIndex.fromDocuments(documents);
// Configure the QueryEngine to return the source nodes (where the answer came from)
const queryEngine = index.asQueryEngine({
similarityTopK: 3, // Retrieve top 3 most relevant chunks
});
console.log(`\n📌 Question: ${question}\n`);
const response = await queryEngine.query({ query: question });
// Display the answer
console.log("✅ Answer:");
console.log(response.toString());
// Display sources — this is the killer feature for trust and debugging
console.log("\n📚 Sources Used:");
const sourceNodes = response.sourceNodes || [];
sourceNodes.forEach((node, i) => {
const source = node.node.metadata?.file_name ?? "Unknown";
const score = node.score?.toFixed(3) ?? "N/A";
const preview = node.node
.getContent()
.substring(0, 100)
.replace(/\n/g, " ");
console.log(` [${i + 1}] ${source} (similarity: ${score})`);
console.log(` "${preview}..."`);
});
}
// Run three different types of queries to show versatility
async function main() {
await queryDocument(
"What is the recommended margin of safety for investments?",
);
console.log("\n" + "─".repeat(60) + "\n");
await queryDocument("How should I think about portfolio size?");
console.log("\n" + "─".repeat(60) + "\n");
await queryDocument("What does the guide say about using leverage?");
}
main().catch(console.error);
npx ts-node src/project2-pdf.ts
Loaded 1 document(s) with 1,847 characters
📌 Question: What is the recommended margin of safety for investments?
✅ Answer:
The guide recommends a margin of safety of at least 25% below intrinsic
value as prudent for any investment. This concept comes from Chapter 3
on valuation principles.
📚 Sources Used:
[1] investment-principles.txt (similarity: 0.892)
"Chapter 3: Valuation Principles Price is what you pay. Value is what..."
[2] investment-principles.txt (similarity: 0.743)
"Chapter 2: Portfolio Construction Diversification is protection against..."
WHY SOURCE ATTRIBUTION MATTERS
In production, source attribution is what separates a trusted RAG system from a hallucination machine. Always expose sourceNodes in your UI. Users need to verify where answers come from — especially in legal, medical, or financial contexts.
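When you do surface sources, a tiny formatter is often all the UI layer needs. A dependency-free sketch: the SourceRef shape here is an assumption modeled on the sources output in this article, not a LlamaIndex type.

```typescript
// Hypothetical display shape, mirroring the source listings above.
interface SourceRef {
  fileName: string;
  score: number;
  text: string;
}

// Render sources the way the console output in Project 2 does:
// numbered, with file name, similarity score, and a short preview.
function formatSources(sources: SourceRef[]): string {
  return sources
    .map(
      (s, i) =>
        `[${i + 1}] ${s.fileName} (similarity: ${s.score.toFixed(3)})\n` +
        `    "${s.text.slice(0, 80).replace(/\n/g, " ")}..."`,
    )
    .join("\n");
}

console.log(
  formatSources([
    { fileName: "investment-principles.txt", score: 0.892, text: "Chapter 3: Valuation Principles..." },
  ]),
);
```

Keeping this as a pure function makes it trivial to reuse in a CLI, an API response, or a React component.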
Project 3 — A Production Express.js RAG API
This is where everything comes together. We'll build a proper REST API that your frontend can call — complete with CORS support, error handling, and response streaming.
First, install Express:
npm install express cors
npm install -D @types/express @types/cors
// src/project3-api.ts
import "dotenv/config";
import express, { Request, Response } from "express";
import cors from "cors";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { SentenceSplitter } from "llamaindex/node-parser";
import type { ContextChatEngine, BaseQueryEngine } from "llamaindex";
const app = express();
app.use(cors());
app.use(express.json());
// ── Global index (initialized once at startup) ──────────────────────
// This is the key architectural decision: build the index ONCE when
// the server starts, then reuse it for all requests. No re-indexing!
let queryEngine: BaseQueryEngine;
let chatEngine: ContextChatEngine;
async function initializeRAG() {
console.log("Initializing RAG system...");
Settings.llm = new OpenAI({
model: "gpt-4o",
apiKey: process.env.OPENAI_API_KEY,
temperature: 0.1,
});
Settings.embedModel = new OpenAIEmbedding({
model: "text-embedding-3-large",
apiKey: process.env.OPENAI_API_KEY,
});
Settings.nodeParser = new SentenceSplitter({
chunkSize: 512,
chunkOverlap: 64,
});
const reader = new SimpleDirectoryReader();
// Load all documents from the data folder (recursive picks up the
// project1/ and project2/ subfolders) — add files, restart server
const documents = await reader.loadData("./src/data", { recursive: true });
const index = await VectorStoreIndex.fromDocuments(documents);
queryEngine = index.asQueryEngine({ similarityTopK: 4 });
chatEngine = index.asChatEngine({
systemPrompt:
"You are a helpful assistant. Answer questions based only on the provided documents. Be concise and accurate.",
});
console.log(`RAG initialized with documents from ./src/data`);
}
// ── POST /api/query — Single-turn question answering ─────────────────
// Use this for search bars, document lookup, batch processing
app.post("/api/query", async (req: Request, res: Response) => {
const { query } = req.body;
if (!query || typeof query !== "string") {
res
.status(400)
.json({ error: "query field is required and must be a string" });
return;
}
try {
const response = await queryEngine.query({ query });
const sources = (response.sourceNodes || []).map((node) => ({
text: node.node.getContent().substring(0, 200),
fileName: node.node.metadata?.file_name,
score: node.score,
}));
res.json({
answer: response.toString(),
sources,
query,
});
} catch (error) {
console.error("Query error:", error);
res.status(500).json({ error: "Failed to process query" });
}
});
// ── POST /api/chat — Multi-turn conversation ─────────────────────────
// Use this for chatbots — NOTE: this is stateful per instance.
// For production with multiple users, implement session-based engines.
app.post("/api/chat", async (req: Request, res: Response) => {
const { message } = req.body;
if (!message || typeof message !== "string") {
res
.status(400)
.json({ error: "message field is required and must be a string" });
return;
}
try {
const response = await chatEngine.chat({ message });
res.json({
response: response.toString(),
message,
});
} catch (error) {
console.error("Chat error:", error);
res.status(500).json({ error: "Failed to process message" });
}
});
// ── GET /api/health — Health check ───────────────────────────────────
app.get("/api/health", (_req: Request, res: Response) => {
res.json({ status: "ok", timestamp: new Date().toISOString() });
});
// ── Start the server ──────────────────────────────────────────────────
const PORT = process.env.PORT || 3001;
initializeRAG()
.then(() => {
app.listen(PORT, () => {
console.log(`\n✅ RAG API running on http://localhost:${PORT}`);
console.log(` POST /api/query — Single-turn Q&A`);
console.log(` POST /api/chat — Multi-turn chat`);
console.log(` GET /api/health — Health check`);
});
})
.catch((err) => {
console.error("Failed to initialize RAG:", err);
process.exit(1);
});
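The comment on /api/chat warns that one shared ChatEngine mixes every user's history together. One fix is a per-session engine cache. A dependency-free sketch, where createEngine stands in for something like () => index.asChatEngine({ systemPrompt }) from the server above (that wiring is an assumption; adapt it to your setup):

```typescript
// Per-session engine cache: each session ID gets its own engine,
// so conversation histories never leak between users.
function sessionStore<T>(createEngine: () => T): (sessionId: string) => T {
  const sessions = new Map<string, T>();
  return (sessionId: string): T => {
    let engine = sessions.get(sessionId);
    if (engine === undefined) {
      engine = createEngine(); // factory runs once per new session ID
      sessions.set(sessionId, engine);
    }
    return engine;
  };
}

// Toy usage: "alice" reuses her engine; "bob" gets a fresh one.
let created = 0;
const engineFor = sessionStore(() => ({ id: ++created }));
console.log(engineFor("alice").id, engineFor("alice").id, engineFor("bob").id); // → 1 1 2
```

In the route, read a sessionId from the request body or a cookie and call engineFor(sessionId) instead of the shared chatEngine. Add an eviction policy (TTL or LRU) before shipping, or the map grows forever.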
Run your API server:
npx ts-node src/project3-api.ts
Test it with curl:
# Single-turn query
curl -X POST http://localhost:3001/api/query \
-H "Content-Type: application/json" \
-d '{"query": "How many days can I work from home?"}'
# Multi-turn chat
curl -X POST http://localhost:3001/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "What is the equipment budget?"}'
# Health check
curl http://localhost:3001/api/health
Example response:
{
"answer": "Employees may work remotely up to 4 days per week, with Wednesday as the mandatory in-office day.",
"sources": [
{
"text": "Remote Work Policy — Effective March 2026\n\nOVERVIEW\nOur company supports flexible remote work...",
"fileName": "remote-work-policy.txt",
"score": 0.891
}
],
"query": "How many days can I work from home?"
}
Your Express.js RAG API — Request Flow
Frontend / curl / Postman → POST /api/query → QueryEngine → GPT-4o
Key Takeaways
Loading is one call. Point SimpleDirectoryReader at a folder, call loadData() — LlamaIndex figures out the rest.
metadata is your audit trail. LlamaIndex preserves metadata through every stage — from Document to Node to source citation. Use it to track file names, page numbers, authors, and timestamps.
Try It Yourself
3 Experiments to Go Deeper
1. Tune similarityTopK and measure accuracy. Compare similarityTopK: 1 (fastest, least context) vs similarityTopK: 8 (more context but slower and more expensive). Ask the same question both ways. Which gives better answers for your use case?
2. Filter by metadata. Create two documents, one with metadata: { department: "HR" } and one with metadata: { department: "Legal" }. Then explore how to filter queries to only return results from one department. This is the foundation of permission-based RAG.