From 3,000 Support Tickets to a Searchable Brain.
Your company has thousands of tickets and 200-page manuals. No one has time to read them—until now. Build a TypeScript RAG system that answers questions with page-level citations.
This is Part 2 of 4. If you missed the foundation, check out Part 1: The LlamaIndex 3-Phase Architecture.
What We're Building
The Project Roadmap
- Goal: Internal Knowledge Q&A.
- Tech: ChatEngine (multi-turn).
- Data: Markdown/Text policies.
- Goal: PDF Deep Search.
- Tech: QueryEngine + Source Attribution.
- Data: Real 200-page reports.
- Goal: Production RAG API.
- Tech: Express.js + TS.
- Data: Live endpoints for your frontend.
The Core Abstraction: The Document Object
Before building, you must understand how data is represented internally.
- ID: Unique identifier (doc-123-abc).
- Text: The raw content string.
- Metadata: Key-value pairs like file_name, page_label, or department.
- Impact: Metadata stays with every chunk, enabling perfect source tracking.
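To make the "metadata stays with every chunk" point concrete, here is a minimal sketch in plain TypeScript. The Doc and Chunk types, the toChunks helper, and the sample file name are all hypothetical stand-ins for illustration; they are not LlamaIndex's own classes, and the split-on-blank-lines rule is a simplification.

```typescript
// Illustrative model only: these types are NOT LlamaIndex's Document/TextNode classes.
type Metadata = Record<string, string>;

interface Doc {
  id: string;
  text: string;
  metadata: Metadata;
}

interface Chunk {
  id: string;
  docId: string;
  text: string;
  metadata: Metadata;
}

// Split on blank lines and copy the parent's metadata onto every chunk.
function toChunks(doc: Doc): Chunk[] {
  return doc.text
    .split(/\n\s*\n/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part, i) => ({
      id: `${doc.id}-chunk-${i}`,
      docId: doc.id,
      text: part,
      metadata: { ...doc.metadata },
    }));
}

// Hypothetical sample document.
const policyDoc: Doc = {
  id: "doc-123-abc",
  text: "Laptops are provided on day one.\n\nMonitors can be requested after 30 days.",
  metadata: { file_name: "equipment-policy.md", page_label: "1", department: "IT" },
};

const policyChunks = toChunks(policyDoc);
for (const chunk of policyChunks) {
  console.log(chunk.id, chunk.metadata.file_name, chunk.metadata.page_label);
}
```

Because each chunk carries its own copy of the metadata, a retrieved chunk can always be traced back to its file and page without a second lookup.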
Chunking & Tuning
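- Input: 2,000-token document.
- Process: Split into nodes using chunkSize=512 and chunkOverlap=50.
- Output: About 5 overlapping nodes (each window advances 512 - 50 = 462 tokens) that preserve context at boundaries. The exact count depends on where sentence boundaries fall.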
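The arithmetic behind overlapping chunks can be checked with a small sketch. It assumes a plain fixed-stride sliding window, which is a simplification: a sentence-aware splitter adjusts boundaries, so real node counts can differ. The windowStarts helper is hypothetical and not part of any library.

```typescript
// Start offsets of fixed-size windows that advance by (chunkSize - chunkOverlap) tokens.
function windowStarts(totalTokens: number, chunkSize: number, chunkOverlap: number): number[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const stride = chunkSize - chunkOverlap;
  const starts: number[] = [];
  for (let start = 0; start < totalTokens; start += stride) {
    starts.push(start);
    // Stop once a window reaches the end of the document.
    if (start + chunkSize >= totalTokens) break;
  }
  return starts;
}

const starts = windowStarts(2000, 512, 50);
console.log(starts); // [0, 462, 924, 1386, 1848]
```

Under this approximation a 2,000-token document yields five windows, with the last one shorter than 512 tokens.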
Chunk Size Sweet Spots
- Small chunks. Best for: FAQ lookup, specific data points. Trade-off: Minimal context.
- Medium chunks. Best for: Most RAG apps (The Golden Rule). Trade-off: Balanced speed & context.
- Large chunks. Best for: Legal contracts, research papers. Trade-off: Slower, higher token cost.
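The token-cost trade-off can be estimated with simple arithmetic: the retrieved chunks make up most of the prompt, so the context grows roughly as chunk size times the number of retrieved chunks. The sketch below uses 256, 512, and 1024 purely as illustrative sizes (an assumption, not figures from this article) and ignores the question and system prompt.

```typescript
// Rough upper bound on retrieved-context tokens sent to the LLM per query.
function contextTokens(chunkSize: number, topK: number): number {
  return chunkSize * topK;
}

// Illustrative sizes only; pick real values by testing on your own documents.
const candidateSizes = [256, 512, 1024];
const topK = 3; // matches the similarityTopK used in the projects in this article

const budget = candidateSizes.map((size) => ({
  chunkSize: size,
  contextTokens: contextTokens(size, topK),
}));

for (const row of budget) {
  console.log(`chunkSize=${row.chunkSize} -> up to ${row.contextTokens} context tokens per query`);
}
```

Doubling the chunk size doubles the context sent on every query, which is where the extra latency and cost of large chunks comes from.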
Choosing Your Engine
QueryEngine vs. ChatEngine
QueryEngine
- Mode: Single-Turn Q&A.
- Memory: None (Stateless).
- Use Case: Search bars, batch processing.
ChatEngine
- Mode: Multi-Turn Conversation.
- Memory: Full (Stateful).
- Use Case: Support bots, interactive tutors.
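The stateless versus stateful distinction can be sketched without any library. Below, makeQueryEngine and makeChatEngine are hypothetical helpers, and the echo function stands in for an LLM call by reporting how many messages it received; none of this is LlamaIndex's actual implementation.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Stand-in for an LLM call.
type Answerer = (messages: Message[]) => string;

// Stateless: every call sees only the current question.
function makeQueryEngine(answer: Answerer) {
  return {
    query(question: string): string {
      return answer([{ role: "user", content: question }]);
    },
  };
}

// Stateful: every call sees the full history, and the history grows.
function makeChatEngine(answer: Answerer) {
  const history: Message[] = [];
  return {
    chat(question: string): string {
      history.push({ role: "user", content: question });
      const reply = answer([...history]);
      history.push({ role: "assistant", content: reply });
      return reply;
    },
    historyLength: (): number => history.length,
  };
}

const echo: Answerer = (messages) => `saw ${messages.length} message(s)`;
const queryEngine = makeQueryEngine(echo);
const chatEngine = makeChatEngine(echo);

console.log(queryEngine.query("What is the equipment policy?")); // saw 1 message(s)
console.log(queryEngine.query("Does that cover Starter plans?")); // saw 1 message(s)
console.log(chatEngine.chat("What is the equipment policy?")); // saw 1 message(s)
console.log(chatEngine.chat("Does that cover Starter plans?")); // saw 3 message(s)
```

The follow-up question only makes sense to the chat variant, because "that" refers to a message the query variant never sees.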
Project 1: Stateful Internal Q&A (ChatEngine)
The key to a support assistant is memory. Here is how to configure a multi-turn chat experience that remembers context using LlamaIndex 0.12.
// Note: depending on your installed llamaindex version, OpenAI may need to be imported
// from "@llamaindex/openai" and SimpleDirectoryReader from "@llamaindex/readers/directory".
import { SimpleDirectoryReader, VectorStoreIndex, Settings, OpenAI } from "llamaindex";

// 1. Configure the LLM globally
Settings.llm = new OpenAI({ model: "gpt-4o", temperature: 0.2 });

async function runChatEngine() {
  // 2. Load markdown and text documents from your knowledge directory
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData({ directoryPath: "./src/data" });

  // 3. Parse, embed, and index the documents
  const index = await VectorStoreIndex.fromDocuments(documents);

  // 4. Create ChatEngine (maintains conversational history automatically)
  const chatEngine = index.asChatEngine({
    chatModel: Settings.llm,
    systemPrompt: "You are a customer support agent. Answer questions using the provided context."
  });

  // 5. Start multi-turn conversation
  const response1 = await chatEngine.chat({ message: "What is our remote work equipment policy?" });
  console.log("User: What is our remote work equipment policy?");
  console.log("AI:", response1.toString());

  // The follow-up relies on the history: "that" refers to the policy from the first turn
  const response2 = await chatEngine.chat({ message: "Does that cover the Starter plan users?" });
  console.log("User: Does that cover the Starter plan users?");
  console.log("AI:", response2.toString());
}
runChatEngine().catch(console.error);

Project 2: PDF Deep Search with Citations
In enterprise applications, users don't trust answers without proof. Using the metadata attached to LlamaIndex document chunks, we can return precise page-level citations.
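The citation step itself is independent of the retrieval library, so it can be sketched as a small formatter. SourceLike below is a hypothetical minimal shape for a retrieved chunk, and the sample file name, page, and score are made up for illustration.

```typescript
// Minimal stand-in for a retrieved chunk; not the library's own source node type.
interface SourceLike {
  metadata: Record<string, string | undefined>;
  score?: number;
  text: string;
}

// Turn one retrieved chunk into a one-line, human-readable citation.
function formatCitation(source: SourceLike, index: number): string {
  const file = source.metadata["file_name"] ?? "unknown file";
  const page = source.metadata["page_label"] ?? "N/A";
  const score = source.score !== undefined ? source.score.toFixed(4) : "N/A";
  const snippet = source.text.length > 150 ? `${source.text.slice(0, 150)}...` : source.text;
  return `[${index + 1}] ${file}, p. ${page} (score ${score}): ${snippet}`;
}

// Hypothetical sample data.
const sampleSources: SourceLike[] = [
  { metadata: { file_name: "security-audit.pdf", page_label: "42" }, score: 0.8123, text: "All findings were remediated." },
  { metadata: { file_name: "security-audit.pdf" }, text: "No page or score recorded for this chunk." },
];

const citations = sampleSources.map(formatCitation);
citations.forEach((line) => console.log(line));
```

Missing fields fall back to "N/A" instead of printing undefined, which keeps the citation list readable when a reader does not record page information.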
import { MetadataMode, SimpleDirectoryReader, VectorStoreIndex, Settings, OpenAI } from "llamaindex";

async function runPDFSearch() {
  Settings.llm = new OpenAI({ model: "gpt-4o", temperature: 0.1 });

  // 1. Load PDFs from a target directory
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData({ directoryPath: "./src/data/pdfs" });

  // 2. Create the Vector Index and retrieve the 3 most similar chunks per query
  const index = await VectorStoreIndex.fromDocuments(documents);
  const queryEngine = index.asQueryEngine({ similarityTopK: 3 });

  // 3. Run Query
  const response = await queryEngine.query({ query: "Summarize the Q4 security audit results." });
  console.log("Answer:", response.toString());

  // 4. Print Citations
  if (response.sourceNodes) {
    console.log("\n--- CITATIONS ---");
    response.sourceNodes.forEach((source, i) => {
      const { metadata } = source.node;
      // getContent() is available on every node type; .text exists only on text nodes
      const text = source.node.getContent(MetadataMode.NONE);
      // The page key depends on the reader; some readers may store it as page_number
      const page = metadata["page_label"] ?? metadata["page_number"] ?? "N/A";
      console.log(`Source ${i + 1}:`);
      console.log(`- File Name: ${metadata["file_name"]}`);
      console.log(`- Page: ${page}`);
      console.log(`- Score: ${source.score?.toFixed(4) ?? "N/A"}`);
      console.log(`- Text Snippet: ${text.slice(0, 150)}...\n`);
    });
  }
}
runPDFSearch().catch(console.error);

Project 3: Production RAG API (Express.js + TypeScript)
In a real deployment, RAG sits behind a backend API. We build the index once at server startup, so individual requests never pay the time and embedding cost of re-indexing the documents.
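One way to guarantee the "build once" behavior is to memoize the initializer so that concurrent callers share a single build. This is a sketch under assumptions: once and buildIndex are hypothetical names, and buildIndex is a stand-in for loading documents and creating the vector index.

```typescript
// Memoize an async initializer so concurrent callers share a single run.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = init().catch((err) => {
        pending = null; // allow a retry after a failed build
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical stand-in for the expensive index build.
let buildCount = 0;
async function buildIndex(): Promise<{ builtAt: number }> {
  buildCount += 1;
  return { builtAt: Date.now() };
}

const getIndex = once(buildIndex);

async function demo(): Promise<void> {
  // Two simultaneous requests trigger only one build and receive the same object.
  const [a, b] = await Promise.all([getIndex(), getIndex()]);
  console.log(a === b, buildCount); // true 1
}

demo().catch(console.error);
```

A request handler can then await getIndex() instead of checking a nullable variable, at the cost of making early requests wait for the build rather than receive a 503.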
import express from "express";
import { MetadataMode, SimpleDirectoryReader, VectorStoreIndex, Settings, OpenAI } from "llamaindex";

// Configure the LLM once (same settings as Project 2)
Settings.llm = new OpenAI({ model: "gpt-4o", temperature: 0.1 });

const app = express();
app.use(express.json());

let index: VectorStoreIndex | null = null;

// Initialize index once on startup
async function initIndex() {
  console.log("Initializing Vector Index...");
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData({ directoryPath: "./src/data" });
  index = await VectorStoreIndex.fromDocuments(documents);
  console.log("Index initialized successfully!");
}

app.post("/api/query", async (req, res) => {
  // Reject requests that arrive before the index is ready
  if (!index) {
    res.status(503).json({ error: "Index is still initializing." });
    return;
  }

  // Accept only a non-empty string query
  const { query } = req.body ?? {};
  if (typeof query !== "string" || query.trim() === "") {
    res.status(400).json({ error: "Query is required." });
    return;
  }

  try {
    const queryEngine = index.asQueryEngine({ similarityTopK: 3 });
    const response = await queryEngine.query({ query });

    // Return a short snippet of each source so the frontend can show citations
    const sources = response.sourceNodes?.map((source) => ({
      fileName: source.node.metadata["file_name"],
      score: source.score,
      text: source.node.getContent(MetadataMode.NONE).slice(0, 200) + "..."
    })) ?? [];

    res.json({
      answer: response.toString(),
      sources
    });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    res.status(500).json({ error: message });
  }
});
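const PORT = 3000;
app.listen(PORT, () => {
  console.log(`RAG API server listening on http://localhost:${PORT}`);
  // Build the index after the port is open; /api/query returns 503 until it is ready
  initIndex().catch((err) => {
    console.error("Failed to initialize index:", err);
    process.exit(1);
  });
});

Implementation Roadmap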
6 Steps to Production
1. Use SimpleDirectoryReader to ingest PDFs, Markdown files, and CSVs.
2. Configure SentenceSplitter for the optimal chunk size.
3. Create a VectorStoreIndex from your documents.
4. Decide between ChatEngine or QueryEngine.
5. Wrap the engine in an Express.js server for frontend access.
6. Expose sourceNodes so users can verify every answer.
Key Takeaways
Don't just load text. Add custom metadata like department or last_updated to your documents, so that retrieval can later be restricted to the chunks that match those fields.
In production, never show an answer without a 'Source' link. LlamaIndex's sourceNodes exposes the file name, score, and text of every retrieved chunk, so adding citations takes only a few lines of code.
Build your index when the server starts. Re-building the index on every request is a massive waste of tokens and time.
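The metadata takeaway above can be illustrated with a plain filter over chunk records. This sketch shows the idea only: ChunkRecord and filterByMetadata are hypothetical, the sample records and dates are invented, and a real vector store would apply such filters inside the retrieval query rather than in application code.

```typescript
interface ChunkRecord {
  text: string;
  metadata: Record<string, string | undefined>;
}

// Keep only chunks whose metadata matches every requested key/value pair.
function filterByMetadata(chunks: ChunkRecord[], where: Record<string, string>): ChunkRecord[] {
  return chunks.filter((chunk) =>
    Object.entries(where).every(([key, value]) => chunk.metadata[key] === value)
  );
}

// Hypothetical sample records.
const records: ChunkRecord[] = [
  { text: "VPN access requires MFA.", metadata: { department: "security", last_updated: "2024-11-02" } },
  { text: "Expense reports are due monthly.", metadata: { department: "finance", last_updated: "2024-08-15" } },
  { text: "Laptops are encrypted by default.", metadata: { department: "security", last_updated: "2023-05-20" } },
];

const securityOnly = filterByMetadata(records, { department: "security" });
console.log(securityOnly.map((record) => record.text));
```

Without the department field on each chunk, the only way to scope a question to one team would be to maintain a separate index per team.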