
Part 1 of 4 — LlamaIndex 0.12: The Framework That Makes RAG Actually Buildable

You shipped an AI chatbot. It's smart — but it knows nothing about your own data. Fine-tuning costs $10,000. Re-training takes weeks. There's a better way. LlamaIndex lets you connect ANY data source to ANY LLM in under 20 lines of code.

March 24, 2026
14 min read
#LlamaIndex · #RAG · #LLM · #Node.js · #TypeScript · #AI Engineering · #Vector Database · #OpenAI

You ship an AI support chatbot for your company. Week 1: it's brilliant. Customers love it.

Week 3: a customer asks about the new product you launched last Tuesday. The AI confidently says "that product does not exist."

Week 5: a customer asks for your refund policy. The AI gives the old policy — the one you updated two months ago.

The problem isn't the AI. The problem is that the AI has no idea your data even exists.

LlamaIndex is used in production by more than 50,000 developers and powers AI applications serving hundreds of millions of end users — from solo builders to engineering teams at Salesforce, Airbnb, and DoorDash. This is the framework they all use to connect their data to any LLM.

If you've read our RAG fundamentals article, you already understand the concept of Retrieval Augmented Generation. You know that RAG = retrieve relevant context + augment the prompt + generate a better answer.
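Those three steps are easy to see in code. The sketch below is a deliberately naive stand-in: keyword overlap instead of real embeddings, and no LLM call at the end. None of these function names are LlamaIndex API; they exist only to make retrieve, augment, generate concrete:

```typescript
// Toy RAG: retrieve by naive keyword overlap, then augment the prompt.
// Illustrative only — real systems use embeddings and send the prompt to an LLM.
const docs = [
  "Our refund policy allows returns within 30 days.",
  "The new product launched last Tuesday.",
];

function retrieve(query: string, corpus: string[]): string {
  const words = query.toLowerCase().split(/\W+/);
  // Score each doc by how many query words it contains; keep the best match.
  const scored = corpus.map((d) => ({
    d,
    score: words.filter((w) => w && d.toLowerCase().includes(w)).length,
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored[0].d;
}

function augment(query: string, context: string): string {
  return `Answer using ONLY this context:\n${context}\n\nQuestion: ${query}`;
}

const query = "What is the refund policy?";
const prompt = augment(query, retrieve(query, docs));
// `prompt` now carries the relevant document; an LLM call would complete the pipeline.
```

Swap the keyword scoring for embeddings and the final string for an actual model call, and you have RAG. Everything else is plumbing.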

But there's a massive gap between understanding the concept and actually building it. Who handles the chunking? Who manages the vector store? Who orchestrates the retrieval? How do you handle PDFs, CSVs, and SQL databases all at once?

LlamaIndex handles all of that. This is Part 1 of a 4-part series that takes you from zero to a deployed, production-ready RAG app.

STEP 1 OF 5

The Two Problems LlamaIndex Was Built to Solve

Before we write a single line of code, you need to understand exactly what problem we're solving. There are two fundamental gaps between what LLMs know and what your application actually needs.

PROBLEM 1: THE KNOWLEDGE CUTOFF

Every LLM was trained on data up to a specific date. GPT-4o's training data ends at a point in the past. It has no idea what happened after that.

✗ Doesn't know about last week's product launch
✗ Doesn't know about last quarter's earnings report
✗ Doesn't know about the policy update from last month

PROBLEM 2: YOUR PRIVATE DATA

Even if the model is fully up to date, it was never trained on your data. Your internal docs, your codebase, your contracts — none of it exists to the model.

✗ Doesn't know your internal HR policies
✗ Doesn't know your engineering documentation
✗ Doesn't know your customer contracts

THE KEY INSIGHT

Fine-tuning and retraining are NOT the solution.

Fine-tuning teaches the model new behavior. RAG gives the model new knowledge. For most applications — chatbots, Q&A systems, document assistants — you need knowledge, not behavior. RAG is 10–100x cheaper and faster to update.

The Knowledge Gap — Visualized

[Timeline] LLM training begins → GPT-4o cutoff → your app goes live → today.
Everything between the cutoff and today — your private data and recent events — is a blind spot for the model without RAG.

LlamaIndex solves both problems by creating a bridge between your live, private data and the LLM. Not by retraining. By retrieval.

STEP 2 OF 5

The 3-Phase Architecture: Load → Index → Query

LlamaIndex is built around three phases that every RAG application goes through. Understand these three phases and you understand LlamaIndex.

The LlamaIndex Architecture — Every RAG App Uses This

📥
PHASE 1
LOAD
Ingest your data from any source, in any format
📄 PDFs & Word Docs
📊 CSV & JSON
🌐 Web pages & APIs
🗄️ SQL Databases
📁 Directories
🧠
PHASE 2
INDEX
Transform your data for intelligent retrieval
✂️ Chunk into pieces
🔢 Embed each chunk
💾 Store in VectorDB
🗂️ Build metadata
📋 Create summary index
💬
PHASE 3
QUERY
Answer questions using your data
🔍 QueryEngine (Q&A)
💬 ChatEngine (memory)
🤖 Agents (autonomous)
🔀 RouterQueryEngine
📡 Streaming support

This exact 3-phase pipeline is what every production RAG system uses — LlamaIndex just handles all three for you.

Here's the beauty of this architecture: Phases 1 and 2 run once. You load and index your documents, save the result to disk, and then every query after that goes straight to Phase 3. No re-embedding. No re-indexing. Just instant answers.
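Persisting that work looks roughly like the sketch below, based on LlamaIndex.TS's storageContextFromDefaults and VectorStoreIndex.init. Treat the exact signatures as something to confirm against the docs for your installed version; note that building the index calls the embedding API, so OPENAI_API_KEY must be set:

```typescript
import { Document, VectorStoreIndex, storageContextFromDefaults } from "llamaindex";

// First run: build the index and persist every artifact to ./storage on disk.
// (fromDocuments embeds your chunks, so this step costs API calls — once.)
const documents = [new Document({ text: "LlamaIndex persists indexes to disk." })];
const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
await VectorStoreIndex.fromDocuments(documents, { storageContext });

// Every later run: reload from ./storage instead — no re-chunking, no re-embedding.
const index = await VectorStoreIndex.init({ storageContext });
const engine = index.asQueryEngine();
```

Delete the ./storage directory (or rebuild on a schedule) whenever your source documents change.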

STEP 3 OF 5

Why Use a Framework? What LlamaIndex Does for You

Let's be honest about what it takes to build RAG from scratch without LlamaIndex.

✗ WITHOUT LLAMAINDEX (~400 lines)
1. Write a PDF parser for each format
2. Write a text chunking algorithm
3. Handle chunk overlap manually
4. Call embedding API, manage rate limits
5. Set up and query a vector database
6. Write prompt template with context injection
7. Handle multi-turn conversation memory
8. Manage token limits per request
9. Write error handling for all of the above
10. Test everything and debug edge cases
✓ WITH LLAMAINDEX (~15 lines)
// 1. Load
const docs = await reader.loadData();

// 2. Index
const index = await VectorStoreIndex.fromDocuments(docs);

// 3. Query
const engine = index.asQueryEngine();
const res = await engine.query({ query });

LlamaIndex handles over 160 data loaders, 40+ vector store integrations, automatic chunking strategies, embedding management, prompt templates, memory management, and streaming — all out of the box.

LlamaIndex 0.12 — Supported Data Sources (Selection)

📄 PDF
📝 DOCX / TXT
📊 CSV / Excel
🔧 JSON / YAML
🌐 Web Pages
📚 Wikipedia
🗄️ SQL Database
📧 Email / Notion
🐙 GitHub
💬 Slack
🎥 YouTube
🔗 REST APIs

STEP 4 OF 5

Environment Setup — Everything You Need

Let's get your machine ready. LlamaIndex 0.12 requires Node.js 20.9.0 or higher (the same requirement as Next.js 16, which we'll use in Part 4).

Prerequisites Check


node --version
# Expected: v20.x.x or v22.x.x

# If you need to update, use nvm:
nvm install 22
nvm use 22

Create Your Project

mkdir my-rag-app && cd my-rag-app
npm init -y

Install LlamaIndex 0.12 + Dependencies

# Core framework
npm install [email protected]

# OpenAI integration (LLM + embeddings)
npm install [email protected]

# TypeScript support (recommended)
npm install -D [email protected] @types/[email protected] [email protected]

# dotenv for environment variables
npm install [email protected]

Configure TypeScript

// tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "esModuleInterop": true,
    "strict": true,
    "outDir": "./dist",
    "rootDir": "./src"
  },
  "include": ["src/**/*"]
}

Set Up Your Environment Variables

# .env
OPENAI_API_KEY=sk-proj-...your-key-here...

⚠️ NEVER COMMIT YOUR API KEY

Add .env to your .gitignore immediately. Your OpenAI key starts with sk-proj- and charges per token. A leaked key can cost you thousands of dollars.
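A one-liner handles it (appends the entry and creates .gitignore if it doesn't exist yet):

```shell
# Keep the key out of version control — run this before your first commit
echo ".env" >> .gitignore
```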

Project Structure

my-rag-app/
├── src/
│   ├── index.ts          ← Your main entry point
│   └── data/
│       └── sample.txt    ← Your test document
├── .env
├── .gitignore
├── tsconfig.json
└── package.json

Create your first data file:

mkdir -p src/data
echo "LlamaIndex is a data framework for building LLM applications.
It was created by Jerry Liu and Simon Suo in 2022.
The TypeScript version (LlamaIndex.TS) was released in 2023.
LlamaIndex supports over 160 data connectors, 40+ vector stores,
and a comprehensive suite of agentic tools." > src/data/sample.txt

STEP 5 OF 5

Your First RAG Query — Hello, LlamaIndex

Let's write your first working RAG system. This exact code will load a document, index it, and answer questions about it. Every single line is explained.

// src/index.ts
import "dotenv/config";
import { Settings, SimpleDirectoryReader, VectorStoreIndex } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

async function main() {
  // ── Step 1: Configure your LLM and embedding model ──────────────────
  // Settings is LlamaIndex's global config object. Set it once, use everywhere.
  Settings.llm = new OpenAI({
    model: "gpt-4o", // Most capable OpenAI model
    apiKey: process.env.OPENAI_API_KEY,
    temperature: 0.1, // Low temp = more deterministic, grounded answers
  });

  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-large", // Best OpenAI embedding model
    apiKey: process.env.OPENAI_API_KEY,
  });

  // Chunking config: how big each text piece is
  Settings.chunkSize = 512; // tokens per chunk
  Settings.chunkOverlap = 50; // overlap between chunks

  // ── Step 2: Load your documents ──────────────────────────────────────
  // SimpleDirectoryReader reads every file in the folder automatically.
  // Supports .txt, .pdf, .docx, .csv, .json, .md, and more.
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./src/data");
  console.log(`Loaded ${documents.length} document(s)`);

  // ── Step 3: Build the index ───────────────────────────────────────────
  // This chunks the documents, embeds each chunk, and stores the vectors.
  // Under the hood: document → chunks → OpenAI embeddings → in-memory store.
  console.log("Indexing documents...");
  const index = await VectorStoreIndex.fromDocuments(documents);
  console.log("Index ready!");

  // ── Step 4: Create a query engine and ask a question ─────────────────
  // asQueryEngine() wraps the index with retrieval + synthesis logic.
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "Who created LlamaIndex and when was it founded?",
  });

  console.log("\n── Answer ─────────────────────────────");
  console.log(response.toString());
  console.log("────────────────────────────────────────");
}

main().catch(console.error);

Run it:

npx tsx src/index.ts

Expected output:

Loaded 1 document(s)
Indexing documents...
Index ready!

── Answer ─────────────────────────────
LlamaIndex was created by Jerry Liu and Simon Suo in 2022.
The TypeScript version was released in 2023.
────────────────────────────────────────

What Just Happened? (Under The Hood)


1. Your text file was loaded into a Document object — SimpleDirectoryReader read sample.txt and created a Document with the full text + metadata (filename, creation date, etc.).

2. The document was split into chunks (Nodes) — SentenceSplitter (LlamaIndex's default parser) divided the text into 512-token chunks with 50-token overlap. Each chunk becomes a Node object.

3. Each chunk was sent to OpenAI's embedding API — text-embedding-3-large converted each text chunk into a 3072-dimensional vector. This is your document's "digital fingerprint" in semantic space.

4. Your query was also embedded and matched — the query "Who created LlamaIndex?" was embedded the same way. LlamaIndex found the most similar chunks by cosine similarity, then sent them plus your query to GPT-4o to generate the final answer.
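That "cosine similarity" in the last step is just a dot product normalized by the two vectors' lengths. A minimal sketch, using toy 3-dimensional vectors in place of the real 3072-dimensional embeddings:

```typescript
// Cosine similarity: ~1.0 = pointing the same way (semantically close), ~0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings" — real ones have 3072 dimensions.
const queryVec = [0.9, 0.1, 0.0];
const chunkAboutFounders = [0.8, 0.2, 0.1];
const chunkAboutPricing = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(queryVec, chunkAboutFounders)); // high → retrieved
console.log(cosineSimilarity(queryVec, chunkAboutPricing));  // low → skipped
```

The retriever computes this score between your query vector and every stored chunk vector, then hands the top-k chunks to the LLM.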

PRO TIP: USE GPT-4O-MINI TO SAVE COSTS WHILE LEARNING

During development, swap gpt-4o for gpt-4o-mini. It's more than an order of magnitude cheaper and fast enough for testing. Switch to gpt-4o only for production. For embeddings, text-embedding-3-small is roughly 6x cheaper than text-embedding-3-large, with minimal accuracy loss for most use cases.


Key Takeaways

LLMs have two blind spots: private data and recent events. Fine-tuning fixes behavior but not knowledge. RAG fixes knowledge without retraining anything.
LlamaIndex uses a 3-phase pipeline: Load → Index → Query. Phases 1 and 2 run once (or when your data updates). Phase 3 runs on every user request.
LlamaIndex 0.12 is modular. The core llamaindex package provides Settings, readers, and indices, while provider packages like @llamaindex/openai supply the LLM and embedding integrations. This keeps your dependencies explicit and your bundles small.
Settings is your global config object. Set your LLM, embedding model, chunk size, and chunk overlap once on Settings and it applies everywhere in your app.
Node 20.9.0+ is required. LlamaIndex 0.12 and Next.js 16 both require Node 20.9.0 as the minimum version. Update your Node before starting any new project.
Use gpt-4o-mini during development. It's more than an order of magnitude cheaper than gpt-4o and plenty capable for testing your RAG logic. Your architecture stays the same when you switch to production.

Try It Yourself

3 Experiments to Build Your Intuition

LAB 1
Ask a question about something NOT in your document
Query: "What is the current stock price of LlamaIndex?" — Notice how the model responds. Does it hallucinate? Does it say it doesn't know? This shows you the power of grounding responses in your data.
LAB 2
Add a second document with conflicting information
Create a second .txt file that says "LlamaIndex was created in 2021". Ask the same question. Watch how LlamaIndex handles conflicting sources. This teaches you about multi-document retrieval.
LAB 3
Change chunk size and see what happens
Set Settings.chunkSize = 128 (very small), then Settings.chunkSize = 2048 (very large). Ask complex questions and notice how answer quality and relevance change. This is the most important RAG parameter to tune.
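To build intuition for Lab 3, here is a toy character-level splitter that mimics the chunkSize/chunkOverlap mechanics. LlamaIndex's SentenceSplitter counts tokens and respects sentence boundaries, but the sliding-window idea is the same:

```typescript
// Toy splitter: fixed-size windows that overlap, like chunkSize/chunkOverlap.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final window reached the end
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // ["abcd", "defg", "ghij"]
```

Small chunks give precise retrieval but starve the LLM of context; large chunks carry more context but retrieve less precisely. The overlap exists so a sentence cut at a boundary still appears whole in the neighboring chunk.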

What's Next in the Series

PART 2 OF 4 — UP NEXT
Build Your First Production RAG System with LlamaIndex 0.12
We go deep on real-world RAG: loading PDFs, CSVs, and directories simultaneously, the QueryEngine vs ChatEngine decision, building a full Express.js API for your RAG system, and 3 complete hands-on projects.
✦ SentenceSplitter deep dive
✦ QueryEngine vs ChatEngine
✦ PDF + CSV ingestion
✦ Express.js RAG API

Mohamed Hamed

20 years building production systems — the last several deep in AI integration, LLMs, and full-stack architecture. I write what I've actually built and broken. If this was useful, the next one goes to LinkedIn first.

Follow on LinkedIn →