
Part 2 of 4 — Build a Real RAG System with LlamaIndex 0.12: PDFs, Chat, and a Live API

You understand what LlamaIndex does. Now let's build something real. In this article you'll build three complete RAG applications in TypeScript with LlamaIndex 0.12: a document Q&A system, a PDF querying tool, and a full Express.js API.

March 24, 2026
18 min read
#LlamaIndex, #RAG, #TypeScript, #Express.js, #PDF, #QueryEngine, #ChatEngine, #AI Engineering, #OpenAI

From 3,000 Support Tickets to a Searchable Brain.

Your company has thousands of tickets and 200-page manuals, and no one has time to read them. Build a TypeScript RAG system that answers questions about them with page-level citations.

Primary Objective
LlamaIndex 0.12 | Express.js | PDF Support | Source Attribution
💡
Series Navigation

This is Part 2 of 4. If you missed the foundation, check out Part 1: The LlamaIndex 3-Phase Architecture.


What We're Building

The Project Roadmap

📄PROJECT 1
  • Goal: Internal Knowledge Q&A.
  • Tech: ChatEngine (Multi-turn).
  • Data: Markdown/Text policies.
📚PROJECT 2
  • Goal: PDF Deep Search.
  • Tech: QueryEngine + Source Attribution.
  • Data: Real 200-page reports.
🌐PROJECT 3
  • Goal: Production RAG API.
  • Tech: Express.js + TS.
  • Data: Your indexed documents, served through live endpoints to your frontend.

The Core Abstraction: The Document Object

Before building, you need to understand how LlamaIndex represents data internally.

Anatomy of a LlamaIndex Document
  • ID: Unique identifier (doc-123-abc).
  • Text: The raw content string.
  • Metadata: Key-value pairs like file_name, page_label, or department.
  • Impact: Every chunk (node) split from a document inherits its metadata, so each retrieved chunk can be traced back to its source file and page.
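Here is a library-free TypeScript sketch of that inheritance. The `DocumentLike` and `NodeLike` types and the `splitIntoNodes` helper are hypothetical stand-ins, not LlamaIndex classes; in LlamaIndex itself you would pass `text` and `metadata` to `new Document({ ... })` and let the node parser produce the nodes. The idea it illustrates is the same: every chunk carries a copy of its parent's metadata plus a link back to the parent ID.

```typescript
// Hypothetical, library-free stand-ins for LlamaIndex's Document and node types.
interface DocumentLike {
  id: string;
  text: string;
  metadata: Record<string, string>;
}

interface NodeLike {
  id: string;
  text: string;
  metadata: Record<string, string>; // copied from the parent document
  sourceDocumentId: string; // link back to the parent, used for citations
}

// Naive splitter: fixed-size character slices. It exists only to show that
// metadata travels with every chunk; real splitters work on sentences/tokens.
function splitIntoNodes(doc: DocumentLike, chunkChars: number): NodeLike[] {
  if (chunkChars <= 0) throw new Error("chunkChars must be positive");
  const nodes: NodeLike[] = [];
  for (let start = 0, i = 0; start < doc.text.length; start += chunkChars, i++) {
    nodes.push({
      id: `${doc.id}-node-${i}`,
      text: doc.text.slice(start, start + chunkChars),
      metadata: { ...doc.metadata }, // a copy, so editing one node can't affect others
      sourceDocumentId: doc.id,
    });
  }
  return nodes;
}

const policy: DocumentLike = {
  id: "doc-123-abc",
  text: "Employees accrue 1.5 vacation days per month. Unused days roll over up to 10 days.",
  metadata: { file_name: "vacation-policy.md", department: "HR" },
};

const nodes = splitIntoNodes(policy, 40);
for (const node of nodes) {
  // Each node can be cited by file name, even though it holds only a fragment.
  console.log(node.id, node.metadata.file_name, JSON.stringify(node.text));
}
```

With 40-character slices the policy text becomes three nodes (`doc-123-abc-node-0` through `doc-123-abc-node-2`), and every one of them still reports `vacation-policy.md` and `HR`.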

Chunking & Tuning

The SentenceSplitter Workflow
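  • Input: A 2,000-token document.
  • Process: Split it into nodes using chunkSize=512 and chunkOverlap=50.
  • Output: About five overlapping nodes (each new chunk starts roughly 512 - 50 = 462 tokens after the previous one), with neighbors sharing up to 50 tokens so context is preserved at the boundaries.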
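The overlap arithmetic can be checked with a fixed-size sliding window over token positions. This is a simplified model, not LlamaIndex's algorithm: SentenceSplitter also aligns chunks to sentence boundaries, so its real chunk counts are usually the same or higher than what this sketch produces. `slidingWindows` and `TokenWindow` are hypothetical names.

```typescript
// Simplified model: fixed-size windows over token positions, stepping by
// chunkSize - chunkOverlap. SentenceSplitter additionally aligns chunks to
// sentence boundaries, so its real counts are usually the same or higher.
interface TokenWindow {
  start: number; // inclusive token index
  end: number; // exclusive token index
}

function slidingWindows(totalTokens: number, chunkSize: number, chunkOverlap: number): TokenWindow[] {
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const stride = chunkSize - chunkOverlap;
  const windows: TokenWindow[] = [];
  for (let start = 0; ; start += stride) {
    const end = Math.min(start + chunkSize, totalTokens);
    windows.push({ start, end });
    if (end >= totalTokens) break; // this window reached the end of the document
  }
  return windows;
}

const windows = slidingWindows(2000, 512, 50);
console.log(windows.length); // 5
console.log(windows.map(w => `${w.start}-${w.end}`).join(", "));
// 0-512, 462-974, 924-1436, 1386-1898, 1848-2000
```

Four windows only reach token 1,898, so the remaining 102 tokens need a fifth window; each pair of neighbors shares exactly 50 tokens.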

Chunk Size Sweet Spots

PRECISE (128-256 tokens)

Best for: FAQ lookup, specific data points. Trade-off: Minimal surrounding context.

GENERAL (512 tokens)

Best for: Most RAG apps (a sensible default to start from). Trade-off: Balanced speed and context.

REASONING (1024+ tokens)

Best for: Legal contracts, research papers. Trade-off: Slower, higher token cost.
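Much of this trade-off is plain arithmetic: the context sent to the LLM grows with chunk size times the number of chunks retrieved per query. The preset table below is a hypothetical sketch; the overlap values (about 10% of chunk size) and the `topK` of 3 are assumptions to tune, not LlamaIndex defaults.

```typescript
// Hypothetical presets for the three bands above. The overlap values are
// assumptions chosen at roughly 10% of chunk size, not LlamaIndex defaults.
type Profile = "precise" | "general" | "reasoning";

interface ChunkPreset {
  chunkSize: number; // max tokens per chunk
  chunkOverlap: number; // tokens shared with the neighboring chunk
}

const PRESETS: Record<Profile, ChunkPreset> = {
  precise: { chunkSize: 256, chunkOverlap: 25 },
  general: { chunkSize: 512, chunkOverlap: 50 },
  reasoning: { chunkSize: 1024, chunkOverlap: 100 },
};

// Upper bound on retrieved context per query: topK chunks of at most chunkSize tokens.
function contextTokens(profile: Profile, topK: number): number {
  return PRESETS[profile].chunkSize * topK;
}

for (const profile of Object.keys(PRESETS) as Profile[]) {
  console.log(`${profile}: up to ${contextTokens(profile, 3)} context tokens with topK=3`);
}
// precise: up to 768 context tokens with topK=3
// general: up to 1536 context tokens with topK=3
// reasoning: up to 3072 context tokens with topK=3
```

Doubling the chunk size doubles the retrieved context (and the prompt cost) for the same `topK`; in exchange, each retrieved chunk carries more of the surrounding text the LLM may need for reasoning.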


Choosing Your Engine

QueryEngine vs. ChatEngine

🔍QUERYENGINE
  • Mode: Single-Turn Q&A.
  • Memory: None (Stateless).
  • Use Case: Search bars, batch processing.
💬CHATENGINE
  • Mode: Multi-Turn Conversation.
  • Memory: Conversation history (Stateful).
  • Use Case: Support bots, interactive tutors.
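The stateless/stateful difference can be sketched without the library. In this TypeScript sketch `makeQueryEngine`, `ToyChatEngine`, and `Retriever` are hypothetical names, and the "LLM" is just a string template. The chat engine's trick of prepending the previous question is a naive stand-in for how a real chat engine uses history (a condense-question style engine, for example, rewrites a follow-up into a standalone question).

```typescript
// Library-free sketch of the QueryEngine/ChatEngine split. The names are
// hypothetical and the "LLM" is just a string template.
type Retriever = (query: string) => string[];

// Stateless: each call sees only the question it was given.
function makeQueryEngine(retrieve: Retriever): (question: string) => string {
  return question => {
    const chunks = retrieve(question);
    return `Answer to "${question}" from ${chunks.length} chunk(s)`;
  };
}

interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Stateful: remembers the conversation and uses it to expand follow-ups,
// a naive stand-in for the question-condensing step of a chat engine.
class ToyChatEngine {
  private readonly retrieve: Retriever;
  private readonly history: Turn[] = [];
  private lastUserMessage: string | undefined;

  constructor(retrieve: Retriever) {
    this.retrieve = retrieve;
  }

  chat(message: string): string {
    const query = this.lastUserMessage ? `${this.lastUserMessage} ${message}` : message;
    const chunks = this.retrieve(query);
    const reply = `Answer to "${query}" from ${chunks.length} chunk(s)`;
    this.history.push({ role: "user", content: message }, { role: "assistant", content: reply });
    this.lastUserMessage = message;
    return reply;
  }

  get turnCount(): number {
    return this.history.length;
  }
}

const retrieve: Retriever = () => ["chunk-1", "chunk-2"]; // fixed fake retrieval
const ask = makeQueryEngine(retrieve);
const chatEngine = new ToyChatEngine(retrieve);

console.log(ask("What about contractors?")); // no idea what "about" refers to
console.log(chatEngine.chat("What is the vacation policy?"));
console.log(chatEngine.chat("What about contractors?")); // still carries the vacation topic
```

The stateless engine cannot tell what "contractors" relates to, while the chat engine's second query still mentions the vacation policy. That context is what the per-session state buys you.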

Implementation Roadmap

6 Steps to Production

📂
LOAD DATA

Use SimpleDirectoryReader to ingest PDF, Markdown, and CSV files.

✂️
NODE PARSING

Configure SentenceSplitter with a chunk size and overlap suited to your content.

INDEXING

Create a VectorStoreIndex from your documents.

⚙️
ENGINE CONFIG

Choose between a ChatEngine and a QueryEngine.

🌐
API LAYER

Wrap the engine in an Express.js server for frontend access.

📚
CITATIONS

Expose sourceNodes so users can verify every answer.
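The six steps can be traced end to end in a toy, in-memory TypeScript sketch. Everything in it is hypothetical: word-count vectors stand in for an embedding model, per-sentence splitting stands in for SentenceSplitter, the "answer" is the best-matching chunk rather than an LLM completion, and `query` is not LlamaIndex's API. What it does mirror is the data flow: metadata survives chunking, the index is built once, retrieval can be narrowed by a metadata filter, and the retrieved chunks come back as citations next to the answer.

```typescript
// Toy, in-memory walk-through of the six steps. All names are hypothetical;
// word-count vectors stand in for a real embedding model and no LLM is called.
type Metadata = Record<string, string>;
type Vector = Map<string, number>;

interface Doc { text: string; metadata: Metadata }
interface Chunk { text: string; metadata: Metadata; vector: Vector }
interface SourceNode { text: string; metadata: Metadata; score: number }

// Step 2 (node parsing): one chunk per sentence, each inheriting the parent's metadata.
function parse(doc: Doc): { text: string; metadata: Metadata }[] {
  const sentences = doc.text.match(/[^.!?]+[.!?]*/g) || [];
  return sentences
    .map(s => s.trim())
    .filter(s => s.length > 0)
    .map(text => ({ text, metadata: { ...doc.metadata } }));
}

// Stand-in for an embedding model: lowercase word counts.
function embed(text: string): Vector {
  const v: Vector = new Map();
  for (const w of text.toLowerCase().match(/[a-z0-9]+/g) || []) v.set(w, (v.get(w) || 0) + 1);
  return v;
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0, na = 0, nb = 0;
  a.forEach((x, w) => { na += x * x; dot += x * (b.get(w) || 0); });
  b.forEach(y => { nb += y * y; });
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

// Step 3 (indexing): embed every chunk once, up front.
function buildIndex(docs: Doc[]): Chunk[] {
  const chunks: Chunk[] = [];
  for (const doc of docs) {
    for (const c of parse(doc)) chunks.push({ text: c.text, metadata: c.metadata, vector: embed(c.text) });
  }
  return chunks;
}

// Steps 4 and 6 (engine + citations): rank chunks, optionally filtered on
// metadata, and return the top matches as sourceNodes alongside the answer.
function query(index: Chunk[], question: string, topK = 2, filter: Metadata = {}) {
  const q = embed(question);
  const sourceNodes: SourceNode[] = index
    .filter(c => Object.keys(filter).every(k => c.metadata[k] === filter[k]))
    .map(c => ({ text: c.text, metadata: c.metadata, score: cosine(q, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  // A real engine would pass these chunks to an LLM; this toy echoes the best match.
  const answer = sourceNodes.length > 0 ? sourceNodes[0].text : "No relevant context found.";
  return { answer, sourceNodes };
}

// Step 1 (load data): in-memory documents instead of files on disk.
const index = buildIndex([
  { text: "Employees get 20 vacation days per year. Unused vacation days expire in March.",
    metadata: { file_name: "hr-policy.md", department: "HR" } },
  { text: "Laptops are replaced every three years. Report lost laptops to IT.",
    metadata: { file_name: "it-policy.md", department: "IT" } },
]);

// Step 5 (API layer) would wrap this call in an HTTP handler and return JSON.
const result = query(index, "How many vacation days do employees get?");
console.log(result.answer); // Employees get 20 vacation days per year.
for (const s of result.sourceNodes) {
  console.log(`  [${s.metadata.file_name}] score=${s.score.toFixed(2)}`);
}
// [hr-policy.md] score=0.57
// [hr-policy.md] score=0.31
```

Passing `{ department: "IT" }` as the fourth argument restricts retrieval to the IT policy, which is the kind of metadata filtering that custom fields such as `department` make possible.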


Key Takeaways

01
Metadata is Leverage

Don't just load text. Add custom metadata such as department or last_updated to your documents; it lets you narrow retrieval with metadata filters (for example, only HR documents).

02
Source Attribution is Trust

In production, never show an answer without a 'Source' link. LlamaIndex returns the retrieved chunks on each response as sourceNodes, so surfacing them (file name, page label, score) takes only a few lines.

03
Initialize Once

Build your index once when the server starts and reuse it for every request. Rebuilding it per request re-reads and re-embeds every document, wasting embedding tokens and adding latency to each answer.
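One way to follow "initialize once" in a Node server is to memoize the index build as a promise. This is a sketch: `buildIndexFromDisk`, `FakeIndex`, `getIndex`, and `handleQuestion` are hypothetical placeholders for wherever you load documents and build your VectorStoreIndex, and for your Express route handler.

```typescript
// Hypothetical stand-in for an expensive index build (loading, chunking,
// embedding). In a real server this is where the vector index would be built.
interface FakeIndex {
  chunkCount: number;
}

let buildCount = 0;

async function buildIndexFromDisk(): Promise<FakeIndex> {
  buildCount++; // counts how many times the expensive path actually runs
  await Promise.resolve(); // stand-in for slow file reads and embedding calls
  return { chunkCount: 42 };
}

// Cache the promise, not the resolved value: requests that arrive while the
// first build is still running all await that same build.
let indexPromise: Promise<FakeIndex> | undefined;

function getIndex(): Promise<FakeIndex> {
  if (!indexPromise) {
    indexPromise = buildIndexFromDisk().catch(err => {
      indexPromise = undefined; // allow a retry after a failed build
      throw err;
    });
  }
  return indexPromise;
}

// Simulated request handler: every request reuses the same index.
async function handleQuestion(question: string): Promise<string> {
  const index = await getIndex();
  return `Answering "${question}" against ${index.chunkCount} chunks`;
}

Promise.all([handleQuestion("What is our refund policy?"), handleQuestion("Who approves travel?")])
  .then(answers => {
    answers.forEach(a => console.log(a));
    console.log(`index built ${buildCount} time(s)`); // index built 1 time(s)
  });
```

Calling `getIndex()` at startup warms the cache before the first request arrives; clearing `indexPromise` on failure keeps a transient error (for example, an embedding API timeout) from being cached forever.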


Mohamed Hamed

20 years building production systems — the last several deep in AI integration, LLMs, and full-stack architecture. I write what I've actually built and broken. If this was useful, the next one goes to LinkedIn first.
