
Part 2 of 4 — Build a Real RAG System with LlamaIndex 0.12: PDFs, Chat, and a Live API

You understand what LlamaIndex does. Now let's build something real. In this article you'll build three complete RAG applications: a document Q&A system, a PDF querying tool, and a full Express.js API — all in TypeScript with LlamaIndex 0.12.

March 24, 2026
18 min read
#LlamaIndex #RAG #TypeScript #Express.js #PDF #QueryEngine #ChatEngine #AI Engineering #OpenAI

Your company has 3,000 support tickets from last year, a 200-page product manual, and a database of FAQ articles.

Right now, support agents answer the same questions every single day. Every answer is in those 3,000 tickets — but no one has time to read them all.

By the end of this article, you will have built the system that changes that. In TypeScript. With real data. With a real API.

📋 WHAT YOU'LL BUILD IN THIS ARTICLE

🔷 Project 1: A custom Q&A system over your own text documents
🔷 Project 2: A PDF querying tool (query a real 200-page report)
🔷 Project 3: A production-grade Express.js API for your RAG system

This is Part 2 of a 4-part series. If you haven't read Part 1, start there — it covers installation, the 3-phase architecture, and your first working example.

Your environment from Part 1 is all you need. Same package.json, same .env file, same project structure. Let's build.

STEP 1 OF 6

How LlamaIndex Actually Loads Your Data

Before we build anything, you need to understand what happens when you call reader.loadData(). This is the foundation everything else is built on.

The Document Object

Every piece of data in LlamaIndex starts as a Document. A Document has three things:

// This is what a Document looks like internally
{
  id_: "doc-123-abc",          // Unique identifier
  text: "The full text...",    // The actual content
  metadata: {
    file_name: "report.pdf",   // Where it came from
    file_type: "application/pdf",
    page_label: "1",           // PDF page number (if applicable)
    creation_date: "2026-03-01",
    // ... any custom metadata you add
  }
}

This metadata travels with every chunk when LlamaIndex splits the document. That means when you get an answer, you can always trace which file and even which page it came from.
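To make that propagation concrete, here is a toy sketch. It is not LlamaIndex's actual internals; the `Doc`/`Chunk` types and `splitIntoChunks` function are illustrative stand-ins showing how metadata gets copied onto every chunk:

```typescript
// Illustrative only — mimics how LlamaIndex carries document metadata
// into each chunk. The real splitter is token- and sentence-aware.
interface Doc {
  text: string;
  metadata: Record<string, string>;
}

interface Chunk {
  text: string;
  metadata: Record<string, string>; // copied from the parent document
}

function splitIntoChunks(doc: Doc, size: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i < doc.text.length; i += size) {
    chunks.push({
      text: doc.text.slice(i, i + size),
      metadata: { ...doc.metadata }, // every chunk keeps the source info
    });
  }
  return chunks;
}

const report: Doc = {
  text: "A".repeat(1000),
  metadata: { file_name: "report.pdf", page_label: "1" },
};

const chunks = splitIntoChunks(report, 400);
console.log(chunks.length); // 3
console.log(chunks[2].metadata.file_name); // "report.pdf"
```

Because every chunk carries `file_name` (and `page_label` for PDFs), any chunk that ends up in an answer can be traced back to its source.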

SimpleDirectoryReader — Your Swiss Army Knife

SimpleDirectoryReader is the easiest way to load documents. Pass it a folder, and it handles everything:

import { SimpleDirectoryReader } from "llamaindex/ingestion";

const reader = new SimpleDirectoryReader();

// Load all files in a folder
const docs = await reader.loadData("./data");

// Load only specific file types
const pdfDocs = await reader.loadData("./data", {
  recursive: true, // Include subdirectories
  // excludedExtensions: [".png", ".jpg"],  // Skip these file types
});

console.log(`Loaded ${docs.length} documents`);
docs.forEach((doc) => {
  console.log(`  • ${doc.metadata.file_name} (${doc.text.length} chars)`);
});

Supported file types out of the box: .txt, .md, .pdf, .docx, .csv, .json, .html, .epub, and more.

Custom Metadata — Track Where Every Answer Comes From

// Add custom metadata when creating documents manually
import { Document } from "llamaindex";

const doc = new Document({
  text: "Your custom text here...",
  metadata: {
    source: "internal-wiki",
    author: "engineering-team",
    lastUpdated: "2026-03-24",
    department: "product",
  },
});

STEP 2 OF 6

Chunking: The Most Important Tuning Decision You'll Make

When LlamaIndex indexes your documents, it splits them into chunks called Nodes. How you split determines everything: answer quality, retrieval accuracy, and token costs.

Document → Chunks → Nodes (Visualized)

📄 Original Document (~2,000 tokens)
The full text of your document: an intro paragraph, several body sections, code examples, a conclusion. It's too long to fit in a single retrieval step.
↓ SentenceSplitter (chunkSize=512, overlap=50)
Node 1 (~512 tokens): Intro text
Node 2 (~512 tokens): Section 1
Node 3 (~512 tokens): Section 2
Node 4 (~424 tokens): Conclusion
The 50-token overlap: The last 50 tokens of Node 1 are repeated as the first 50 tokens of Node 2. This prevents important context from being cut at a boundary — e.g., a sentence that starts at the end of one chunk and finishes in the next.
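A toy version of overlapping chunking makes the mechanics visible. It operates on words instead of tokens, and `chunkWithOverlap` is purely illustrative; the real SentenceSplitter is token-aware and prefers sentence boundaries:

```typescript
// Toy sliding-window chunker — NOT LlamaIndex's SentenceSplitter.
// Shows how the window advance (chunkSize - overlap) creates repeated tokens.
function chunkWithOverlap(
  words: string[],
  chunkSize: number,
  overlap: number,
): string[][] {
  const chunks: string[][] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize));
    if (start + chunkSize >= words.length) break; // final chunk reached
  }
  return chunks;
}

const words = Array.from({ length: 20 }, (_, i) => `w${i}`);
const result = chunkWithOverlap(words, 8, 2);

// Each chunk repeats the last 2 words of the previous one:
console.log(result[0].slice(-2)); // ["w6", "w7"]
console.log(result[1].slice(0, 2)); // ["w6", "w7"]
```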

Configuring the SentenceSplitter

import { Settings } from "llamaindex";
import { SentenceSplitter } from "llamaindex/node-parser";

// The SentenceSplitter is smart — it tries to split at sentence boundaries
// rather than cutting words in the middle of a sentence.
Settings.nodeParser = new SentenceSplitter({
  chunkSize: 512, // tokens per chunk (default: 1024)
  chunkOverlap: 50, // overlap between adjacent chunks (default: 200)
});

Chunk Size: The Golden Rule

Chunk Size | Best For | Trade-off | Use Case
128–256 | Precise fact retrieval | Loses surrounding context | FAQs, product specs
512 ✓ | General purpose (sweet spot) | Good balance | Most RAG applications
1024 | Reasoning over long context | Slower, more expensive | Research papers, books
2048+ | Full-section understanding | Very expensive, less precise | Legal contracts, transcripts

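Chunk size also drives cost: each chunk becomes one embedding call, and smaller chunks mean more of them. Here is a rough back-of-envelope; the formula and `estimateChunkCount` are my own approximation, not a LlamaIndex utility:

```typescript
// Rough estimate of how many chunks a document produces.
// Each chunk advances by (chunkSize - overlap) tokens, so smaller
// chunks mean more chunks, more embedding calls, and more stored vectors.
function estimateChunkCount(
  totalTokens: number,
  chunkSize: number,
  overlap: number,
): number {
  const step = chunkSize - overlap;
  return Math.max(1, Math.ceil((totalTokens - overlap) / step));
}

// A 200-page manual at ~500 tokens/page is ~100,000 tokens:
console.log(estimateChunkCount(100_000, 256, 50)); // 486
console.log(estimateChunkCount(100_000, 512, 50)); // 217
console.log(estimateChunkCount(100_000, 1024, 50)); // 103
```

Halving the chunk size roughly doubles the number of embeddings, which is part of why 512 is a pragmatic default.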
STEP 3 OF 6

QueryEngine vs ChatEngine — Choosing the Right Tool

This is the most important architectural decision in every LlamaIndex application. They look similar but serve completely different purposes.

🔍 QUERYENGINE
Single-Turn Q&A
Each query is independent. The engine has no memory of previous questions.
✓ Stateless — scales infinitely
✓ Best for document search
✓ Best for batch processing
✓ Returns source nodes
✗ No conversation memory
✗ Can't follow up on answers
Use when: Search bars, document lookup, batch analysis, API endpoints without sessions.
💬 CHATENGINE
Multi-Turn Conversation
Maintains full conversation history. Each reply considers everything said before.
✓ Full conversation memory
✓ "What did you just say?"
✓ "Tell me more about that"
✓ Best for chatbots
✗ Stateful — must manage sessions
✗ Grows token usage over time
Use when: Customer support chatbots, interactive document exploration, tutoring systems, any multi-turn UI.

How Both Engines Work — The Retrieval + Synthesis Flow

User Query → Embed query → Find top-K chunks → Build prompt → LLM answers
ChatEngine adds: Before retrieving, it condenses the conversation history into a standalone question. "What about the second one?" becomes "What are the details of the second pricing tier?" — then it retrieves on that full question.
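You can picture the condense step as prompt construction. The sketch below only builds the rewrite prompt; the actual rewriting is done by the LLM, and `buildCondensePrompt` is a hypothetical stand-in for what the engine does internally:

```typescript
// Sketch of the "condense question" step a ChatEngine runs before retrieval.
// The LLM receives this prompt and returns a standalone question.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

function buildCondensePrompt(history: ChatMessage[], followUp: string): string {
  const transcript = history
    .map((m) => `${m.role === "user" ? "Human" : "Assistant"}: ${m.content}`)
    .join("\n");
  return [
    "Given the conversation below, rewrite the follow-up message",
    "as a standalone question that contains all needed context.",
    "",
    transcript,
    "",
    `Follow-up: ${followUp}`,
    "Standalone question:",
  ].join("\n");
}

const prompt = buildCondensePrompt(
  [
    { role: "user", content: "What are the pricing tiers?" },
    { role: "assistant", content: "There are three tiers: Basic, Pro, Enterprise." },
  ],
  "What about the second one?",
);
console.log(prompt.includes("Follow-up: What about the second one?")); // true
```

Retrieval then runs on the rewritten standalone question, which is why follow-ups like "tell me more about that" still pull the right chunks.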

STEP 4 OF 6

Project 1 — Custom Document Q&A System

Let's build a document Q&A system that uses ChatEngine to allow follow-up questions. This is the pattern you'll use for customer support bots, internal knowledge assistants, and document explorers.

First, create your data folder with some sample content:

mkdir -p src/data/project1


cat > src/data/project1/remote-work-policy.txt << 'EOF'
Remote Work Policy — Effective March 2026

OVERVIEW
Our company supports flexible remote work arrangements for all full-time employees.
Employees may work remotely up to 4 days per week, with one mandatory in-office day on Wednesdays.

EQUIPMENT POLICY
The company provides a $1,500 equipment budget for remote work setup, renewed every 3 years.
Approved items include monitors, keyboards, ergonomic chairs, and high-speed internet equipment.

WORKING HOURS
Core hours are 10:00 AM – 3:00 PM in the employee's local timezone.
Meetings must be scheduled within core hours unless mutually agreed otherwise.

TIME OFF
Remote work does not change time-off policies. PTO requests must be submitted 2 weeks in advance.
All national holidays remain paid days off regardless of remote status.
EOF

Now write the Q&A system:

// src/project1-chat.ts
import "dotenv/config";
import * as readline from "readline";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { SentenceSplitter } from "llamaindex/node-parser";

async function main() {
  // ── Configure LlamaIndex ────────────────────────────────────────────
  Settings.llm = new OpenAI({
    model: "gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
    temperature: 0.1,
  });

  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-large",
    apiKey: process.env.OPENAI_API_KEY,
  });

  // Use SentenceSplitter for policy docs — 512 tokens works well for
  // documents with clearly separated policy sections
  Settings.nodeParser = new SentenceSplitter({
    chunkSize: 512,
    chunkOverlap: 64,
  });

  // ── Load and index documents ────────────────────────────────────────
  console.log("Loading documents...");
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./src/data/project1");

  console.log(`Indexing ${documents.length} document(s)...`);
  const index = await VectorStoreIndex.fromDocuments(documents);
  console.log("Ready! Type your question (or 'exit' to quit)\n");

  // ── Create a ChatEngine with custom system prompt ───────────────────
  const chatEngine = index.asChatEngine({
    // The system prompt shapes how the AI responds
    systemPrompt: `You are an HR assistant with deep knowledge of the company's
    policies. Answer questions accurately based ONLY on the provided policy documents.
    If the information is not in the documents, say so clearly.
    Always be helpful, professional, and concise.`,
  });

  // ── Interactive chat loop ───────────────────────────────────────────
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  const askQuestion = () => {
    rl.question("\n💬 You: ", async (input) => {
      const message = input.trim();

      if (message.toLowerCase() === "exit") {
        console.log("\nGoodbye!");
        rl.close();
        return;
      }

      if (!message) {
        askQuestion();
        return;
      }

      try {
        const response = await chatEngine.chat({ message });
        console.log(`\n🤖 Assistant: ${response.toString()}`);
      } catch (error) {
        console.error("Error:", error);
      }

      askQuestion(); // Continue the conversation loop
    });
  };

  askQuestion();
}

main().catch(console.error);

Run it and have a multi-turn conversation:

npx ts-node src/project1-chat.ts
Loading documents...
Indexing 1 document(s)...
Ready! Type your question (or 'exit' to quit)

💬 You: How many days can I work from home each week?

🤖 Assistant: You can work remotely up to 4 days per week. One day per week
(Wednesday) is mandatory in-office attendance.

💬 You: What about the equipment budget?

🤖 Assistant: The company provides a $1,500 equipment budget for your remote
work setup, which is renewed every 3 years. You can use it for monitors,
keyboards, ergonomic chairs, and high-speed internet equipment.

💬 You: When do I need to submit PTO requests?

🤖 Assistant: PTO requests must be submitted at least 2 weeks in advance.
National holidays remain paid days off regardless of your remote work status.

Notice how the second question ("What about the equipment budget?") has no explicit context — the ChatEngine figured out you were still talking about remote work policy.

STEP 5 OF 6

Project 2 — Query a PDF File

PDFs are the most common document format in the real world. LlamaIndex handles them natively. No extra libraries needed for basic PDFs — just point SimpleDirectoryReader at a folder with PDFs.

For this project, we'll query a PDF and also expose source nodes — showing exactly which page and section the answer came from.

# To test with a real PDF (e.g. an annual report), drop it into src/data/project2/.
mkdir -p src/data/project2
# For a reproducible walkthrough, we use a text file that stands in for a PDF:
# the loading and querying code below is identical for both formats.

cat > src/data/project2/investment-principles.txt << 'EOF'
Investment Principles and Strategy Guide — 2026 Edition

Chapter 1: The Foundation of Long-Term Investing
The most important quality for an investor is temperament, not intellect.
Markets are driven by fear and greed in the short term, but by fundamentals in the long term.
The best investment you can make is in yourself — your skills compound faster than any stock.

Chapter 2: Portfolio Construction
Diversification is protection against ignorance. If you know what you're doing, diversify less.
The ideal portfolio for most investors contains 5–15 positions across different sectors.
Never invest borrowed money. Leverage amplifies both gains and losses.

Chapter 3: Valuation Principles
Price is what you pay. Value is what you get.
Buy great businesses at fair prices rather than fair businesses at great prices.
A margin of safety of at least 25% below intrinsic value is prudent for any investment.

Chapter 4: Risk Management
The first rule of investing is never lose money. The second rule is never forget rule one.
Position sizing is more important than stock selection for most investors.
Cash is not trash — it gives you the ability to act when others cannot.
EOF

Now write the PDF query tool with source attribution:

// src/project2-pdf.ts
import "dotenv/config";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { SentenceSplitter } from "llamaindex/node-parser";

async function queryDocument(question: string) {
  Settings.llm = new OpenAI({
    model: "gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
    temperature: 0, // Zero temp for factual document queries
  });

  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-large",
    apiKey: process.env.OPENAI_API_KEY,
  });

  // For longer documents like annual reports, use larger chunks
  // so each retrieved chunk contains enough context
  Settings.nodeParser = new SentenceSplitter({
    chunkSize: 1024,
    chunkOverlap: 100,
  });

  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./src/data/project2");

  console.log(
    `Loaded ${documents.length} document(s) with ${documents.reduce((acc, d) => acc + d.text.length, 0).toLocaleString()} characters`,
  );

  const index = await VectorStoreIndex.fromDocuments(documents);

  // Configure the QueryEngine to return the source nodes (where the answer came from)
  const queryEngine = index.asQueryEngine({
    similarityTopK: 3, // Retrieve top 3 most relevant chunks
  });

  console.log(`\n📌 Question: ${question}\n`);

  const response = await queryEngine.query({ query: question });

  // Display the answer
  console.log("✅ Answer:");
  console.log(response.toString());

  // Display sources — this is the killer feature for trust and debugging
  console.log("\n📚 Sources Used:");
  const sourceNodes = response.sourceNodes || [];
  sourceNodes.forEach((node, i) => {
    const source = node.node.metadata?.file_name ?? "Unknown";
    const score = node.score?.toFixed(3) ?? "N/A";
    const preview = node.node
      .getContent()
      .substring(0, 100)
      .replace(/\n/g, " ");
    console.log(`  [${i + 1}] ${source} (similarity: ${score})`);
    console.log(`      "${preview}..."`);
  });
}

// Run three different types of queries to show versatility
async function main() {
  await queryDocument(
    "What is the recommended margin of safety for investments?",
  );
  console.log("\n" + "─".repeat(60) + "\n");
  await queryDocument("How should I think about portfolio size?");
  console.log("\n" + "─".repeat(60) + "\n");
  await queryDocument("What does the guide say about using leverage?");
}

main().catch(console.error);

Run it:

npx ts-node src/project2-pdf.ts
Loaded 1 document(s) with 1,847 characters

📌 Question: What is the recommended margin of safety for investments?

✅ Answer:
The guide recommends a margin of safety of at least 25% below intrinsic
value as prudent for any investment. This concept comes from Chapter 3
on valuation principles.

📚 Sources Used:
  [1] investment-principles.txt (similarity: 0.892)
      "Chapter 3: Valuation Principles Price is what you pay. Value is what..."
  [2] investment-principles.txt (similarity: 0.743)
      "Chapter 2: Portfolio Construction Diversification is protection against..."

WHY SOURCE ATTRIBUTION MATTERS

In production, source attribution is what separates a trusted RAG system from a hallucination machine. Always expose sourceNodes in your UI. Users need to verify where answers come from — especially in legal, medical, or financial contexts.

STEP 6 OF 6

Project 3 — A Production Express.js RAG API

This is where everything comes together. We'll build a proper REST API that your frontend can call — complete with CORS support, error handling, and response streaming.

First, install Express:

npm install express cors
npm install -D @types/express @types/cors

// src/project3-api.ts
import "dotenv/config";
import express, { Request, Response } from "express";
import cors from "cors";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { SentenceSplitter } from "llamaindex/node-parser";
import type { ContextChatEngine, BaseQueryEngine } from "llamaindex";

const app = express();
app.use(cors());
app.use(express.json());

// ── Global index (initialized once at startup) ──────────────────────
// This is the key architectural decision: build the index ONCE when
// the server starts, then reuse it for all requests. No re-indexing!
let queryEngine: BaseQueryEngine;
let chatEngine: ContextChatEngine;

async function initializeRAG() {
  console.log("Initializing RAG system...");

  Settings.llm = new OpenAI({
    model: "gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
    temperature: 0.1,
  });

  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-large",
    apiKey: process.env.OPENAI_API_KEY,
  });

  Settings.nodeParser = new SentenceSplitter({
    chunkSize: 512,
    chunkOverlap: 64,
  });

  const reader = new SimpleDirectoryReader();
  // Load all documents from the data folder — add files, restart server
  const documents = await reader.loadData("./src/data");

  const index = await VectorStoreIndex.fromDocuments(documents);

  queryEngine = index.asQueryEngine({ similarityTopK: 4 });
  chatEngine = index.asChatEngine({
    systemPrompt:
      "You are a helpful assistant. Answer questions based only on the provided documents. Be concise and accurate.",
  });

  console.log(`RAG initialized with documents from ./src/data`);
}

// ── POST /api/query — Single-turn question answering ─────────────────
// Use this for search bars, document lookup, batch processing
app.post("/api/query", async (req: Request, res: Response) => {
  const { query } = req.body;

  if (!query || typeof query !== "string") {
    res
      .status(400)
      .json({ error: "query field is required and must be a string" });
    return;
  }

  try {
    const response = await queryEngine.query({ query });

    const sources = (response.sourceNodes || []).map((node) => ({
      text: node.node.getContent().substring(0, 200),
      fileName: node.node.metadata?.file_name,
      score: node.score,
    }));

    res.json({
      answer: response.toString(),
      sources,
      query,
    });
  } catch (error) {
    console.error("Query error:", error);
    res.status(500).json({ error: "Failed to process query" });
  }
});

// ── POST /api/chat — Multi-turn conversation ─────────────────────────
// Use this for chatbots — NOTE: this is stateful per instance.
// For production with multiple users, implement session-based engines.
app.post("/api/chat", async (req: Request, res: Response) => {
  const { message } = req.body;

  if (!message || typeof message !== "string") {
    res
      .status(400)
      .json({ error: "message field is required and must be a string" });
    return;
  }

  try {
    const response = await chatEngine.chat({ message });

    res.json({
      response: response.toString(),
      message,
    });
  } catch (error) {
    console.error("Chat error:", error);
    res.status(500).json({ error: "Failed to process message" });
  }
});

// ── GET /api/health — Health check ───────────────────────────────────
app.get("/api/health", (_req: Request, res: Response) => {
  res.json({ status: "ok", timestamp: new Date().toISOString() });
});

// ── Start the server ──────────────────────────────────────────────────
const PORT = process.env.PORT || 3001;

initializeRAG()
  .then(() => {
    app.listen(PORT, () => {
      console.log(`\n✅ RAG API running on http://localhost:${PORT}`);
      console.log(`   POST /api/query  — Single-turn Q&A`);
      console.log(`   POST /api/chat   — Multi-turn chat`);
      console.log(`   GET  /api/health — Health check`);
    });
  })
  .catch((err) => {
    console.error("Failed to initialize RAG:", err);
    process.exit(1);
  });

Run your API server:

npx ts-node src/project3-api.ts

Test it with curl:

# Single-turn query
curl -X POST http://localhost:3001/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How many days can I work from home?"}'

# Multi-turn chat
curl -X POST http://localhost:3001/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the equipment budget?"}'

# Health check
curl http://localhost:3001/api/health

Example response:

{
  "answer": "Employees may work remotely up to 4 days per week, with Wednesday as the mandatory in-office day.",
  "sources": [
    {
      "text": "Remote Work Policy — Effective March 2026\n\nOVERVIEW\nOur company supports flexible remote work...",
      "fileName": "remote-work-policy.txt",
      "score": 0.891
    }
  ],
  "query": "How many days can I work from home?"
}
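One way to make /api/chat safe for multiple users is a per-session engine registry, sketched below with a plain Map. The `SessionRegistry` class and its factory are hypothetical; in the real server the factory would call `index.asChatEngine(...)` so each session keeps its own history:

```typescript
// Hypothetical per-session registry for the /api/chat route.
// The factory creates one engine (with its own history) per session ID;
// idle sessions are evicted after a TTL so memory doesn't grow forever.
interface SessionEntry<T> {
  engine: T;
  lastUsed: number;
}

class SessionRegistry<T> {
  private sessions = new Map<string, SessionEntry<T>>();

  constructor(
    private createEngine: () => T,
    private ttlMs: number = 30 * 60 * 1000, // evict after 30 min idle
  ) {}

  get(sessionId: string): T {
    this.evictExpired();
    let entry = this.sessions.get(sessionId);
    if (!entry) {
      entry = { engine: this.createEngine(), lastUsed: Date.now() };
      this.sessions.set(sessionId, entry);
    }
    entry.lastUsed = Date.now();
    return entry.engine;
  }

  private evictExpired() {
    const now = Date.now();
    for (const [id, entry] of this.sessions) {
      if (now - entry.lastUsed > this.ttlMs) this.sessions.delete(id);
    }
  }

  get size() {
    return this.sessions.size;
  }
}

// Usage: the factory below is a placeholder for index.asChatEngine(...).
const registry = new SessionRegistry(() => ({ history: [] as string[] }));
const a = registry.get("session-a");
const b = registry.get("session-b");
console.log(a === registry.get("session-a")); // true — same engine reused
console.log(a === b); // false — isolated per session
```

The route would then read a session ID from the request (header or body) and call `registry.get(sessionId).chat({ message })`.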

Your Express.js RAG API — Request Flow

📱 Client (frontend / curl / Postman) → 🚀 Express (POST /api/query) → 🦙 LlamaIndex (QueryEngine) → 🤖 OpenAI (GPT-4o)
Key architecture decision: The VectorStoreIndex is built once when the server starts (not on every request). This is why your API can respond in milliseconds instead of minutes — the heavy embedding work has already been done.

Key Takeaways

SimpleDirectoryReader handles all common file types automatically. Drop any mix of .txt, .pdf, .csv, .md, .docx into a folder and call loadData() — LlamaIndex figures out the rest.
512 tokens is the sweet spot for chunk size. Start here, then tune up (for complex reasoning) or down (for precise fact retrieval) based on your use case and test results.
QueryEngine for search, ChatEngine for conversation. The choice is architectural — it determines your session management strategy, API design, and UI behavior.
Always expose sourceNodes in production. Showing users where answers come from is the difference between a system people trust and one they don't. It also makes debugging dramatically easier.
Build the index once at server startup, reuse on every request. This is the single most important performance optimization — re-indexing on every request is 100–1000x slower and completely unnecessary.
metadata is your audit trail. LlamaIndex preserves metadata through every stage — from Document to Node to source citation. Use it to track file names, page numbers, authors, and timestamps.

Try It Yourself

3 Experiments to Go Deeper

LAB 1
Add a CSV file to your data folder
Create a CSV with product names and prices. Run the API and query it. Does LlamaIndex handle tabular data correctly? Can you ask "What's the price of product X?" and get a correct answer?
LAB 2
Change similarityTopK and measure accuracy
Try similarityTopK: 1 (fastest, least context) vs similarityTopK: 8 (more context but slower and more expensive). Ask the same question both ways. Which gives better answers for your use case?
LAB 3
Add a custom metadata field and filter on it
Create two documents: one with metadata: { department: "HR" } and one with metadata: { department: "Legal" }. Then explore how to filter queries to only return results from one department. This is the foundation of permission-based RAG.
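A starting point for Lab 3: the simplest approach filters documents before indexing, shown below in plain TypeScript. LlamaIndex also supports retriever-level metadata filters; check the docs for the exact 0.12 API, since this standalone sketch only does the pre-filtering:

```typescript
// Lab 3 starting point: select documents by metadata before building an index.
// DocLike mirrors the shape of the Document objects from Step 1.
interface DocLike {
  text: string;
  metadata: Record<string, string>;
}

function filterByDepartment(docs: DocLike[], department: string): DocLike[] {
  return docs.filter((d) => d.metadata.department === department);
}

const docs: DocLike[] = [
  { text: "PTO policy...", metadata: { department: "HR" } },
  { text: "NDA template...", metadata: { department: "Legal" } },
];

const hrDocs = filterByDepartment(docs, "HR");
console.log(hrDocs.length); // 1
console.log(hrDocs[0].metadata.department); // "HR"
// Then VectorStoreIndex.fromDocuments(hrDocs) builds an HR-only index —
// the foundation of permission-based RAG.
```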

What's Next in the Series

PART 3 OF 4 — UP NEXT
LlamaIndex Agents: RouterQueryEngine & Multi-Source RAG
What if your RAG system could decide WHICH knowledge base to search — without you writing a single if-statement? We build AI agents that use tools, route queries intelligently across multiple data sources, and reason through complex multi-step questions.
✦ OpenAIAgent deep dive
✦ FunctionTool creation
✦ RouterQueryEngine
✦ Multi-source RAG

Mohamed Hamed

20 years building production systems — the last several deep in AI integration, LLMs, and full-stack architecture. I write what I've actually built and broken. If this was useful, the next one goes to LinkedIn first.

Follow on LinkedIn →