You ship an AI support chatbot for your company. Week 1: it's brilliant. Customers love it.
Week 3: a customer asks about the new product you launched last Tuesday. The AI confidently says "that product does not exist."
Week 5: a customer asks for your refund policy. The AI gives the old policy — the one you updated two months ago.
The problem isn't the AI. The problem is that the AI has no idea your data even exists.
LlamaIndex is used in production by over 50,000 developers and powers AI applications serving hundreds of millions of end users, from solo builders to engineering teams at Salesforce, Airbnb, and DoorDash. This is the framework they all use to connect their data to any LLM.
If you've read our RAG fundamentals article, you already understand the concept of Retrieval Augmented Generation. You know that RAG = retrieve relevant context + augment the prompt + generate a better answer.
But there's a massive gap between understanding the concept and actually building it. Who handles the chunking? Who manages the vector store? Who orchestrates the retrieval? How do you handle PDFs, CSVs, and SQL databases all at once?
LlamaIndex handles all of that. This is Part 1 of a 4-part series that takes you from zero to a deployed, production-ready RAG app.
The Two Problems LlamaIndex Was Built to Solve
Before we write a single line of code, you need to understand exactly what problem we're solving. There are two fundamental gaps between what LLMs know and what your application actually needs.
Every LLM was trained on data up to a specific date. GPT-4o's training data ends at a point in the past. It has no idea what happened after that.
Even if the model is fully up to date, it was never trained on your data. Your internal docs, your codebase, your contracts — none of it exists to the model.
THE KEY INSIGHT
Fine-tuning and retraining are NOT the solution.
Fine-tuning teaches the model new behavior. RAG gives the model new knowledge. For most applications — chatbots, Q&A systems, document assistants — you need knowledge, not behavior. RAG is 10–100x cheaper and faster to update.
The Knowledge Gap — Visualized
LlamaIndex solves both problems by creating a bridge between your live, private data and the LLM. Not by retraining. By retrieval.
The 3-Phase Architecture: Load → Index → Query
LlamaIndex is built around three phases that every RAG application goes through. Understand these three phases and you understand LlamaIndex.
Here's the beauty of this architecture: Phases 1 and 2 run once. You load and index your documents, save the result to disk, and then every query after that goes straight to Phase 3. No re-embedding. No re-indexing. Just instant answers.
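To make the three phases concrete, here's a toy, self-contained sketch: a fake bag-of-words "embedding" stands in for a real model, and a plain array stands in for a vector store. This is not the LlamaIndex API, just the shape of the pipeline — index once, then answer many queries against the index:

```typescript
// Toy sketch of Load → Index → Query. NOT the LlamaIndex API — a fake
// word-frequency "embedding" replaces the real model so you can see the
// pipeline shape: phases 1–2 run once, phase 3 runs per query.

type Chunk = { text: string; vector: Map<string, number> };

// "Embed" a string as a bag-of-words frequency map (stand-in for a real model).
function embed(text: string): Map<string, number> {
  const vec = new Map<string, number>();
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    vec.set(word, (vec.get(word) ?? 0) + 1);
  }
  return vec;
}

// Cosine similarity between two sparse vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) ?? 0);
  const norm = (v: Map<string, number>) =>
    Math.sqrt([...v.values()].reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
}

// Phases 1 + 2: load and index — runs ONCE.
const documents = [
  "LlamaIndex was created by Jerry Liu and Simon Suo in 2022.",
  "The refund policy allows returns within 30 days.",
];
const index: Chunk[] = documents.map((text) => ({ text, vector: embed(text) }));

// Phase 3: query — runs on EVERY request, no re-indexing.
function retrieve(query: string): string {
  const qVec = embed(query);
  return index
    .map((c) => ({ c, score: cosine(qVec, c.vector) }))
    .sort((a, b) => b.score - a.score)[0].c.text;
}

console.log(retrieve("Who created LlamaIndex?"));
// → "LlamaIndex was created by Jerry Liu and Simon Suo in 2022."
```

Even the toy preserves the key property: `index` is built once up front, and `retrieve` runs cheaply per query.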
Why Use a Framework? What LlamaIndex Does for You
Let's be honest about what it takes to build RAG from scratch without LlamaIndex.
LlamaIndex ships with over 160 data loaders, 40+ vector store integrations, automatic chunking strategies, embedding management, prompt templates, memory management, and streaming, all out of the box.
LlamaIndex 0.12 — Supported Data Sources (Selection)
Environment Setup — Everything You Need
Let's get your machine ready. LlamaIndex 0.12 requires Node.js 20.9.0 or higher (the same requirement as Next.js 16, which we'll use in Part 4).
Prerequisites Check
node --version
# Expected: v20.x.x or v22.x.x
# If you need to update, use nvm:
nvm install 22
nvm use 22
Create Your Project
mkdir my-rag-app && cd my-rag-app
npm init -y
Install LlamaIndex 0.12 + Dependencies
# Core framework
npm install [email protected]
# OpenAI integration (LLM + embeddings)
npm install [email protected]
# TypeScript support (recommended)
npm install -D [email protected] @types/[email protected] [email protected]
# dotenv for environment variables
npm install [email protected]
Configure TypeScript
// tsconfig.json
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"esModuleInterop": true,
"strict": true,
"outDir": "./dist",
"rootDir": "./src"
},
"include": ["src/**/*"]
}
Set Up Your Environment Variables
# .env
OPENAI_API_KEY=sk-proj-...your-key-here...
⚠️ NEVER COMMIT YOUR API KEY
Add .env to your .gitignore immediately. Your OpenAI key starts with sk-proj- and charges per token. A leaked key can cost you thousands of dollars.
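A quick way to do that from the project root (this creates `.gitignore` if it doesn't exist yet):

```shell
# Keep secrets out of version control: git will now skip .env entirely
echo ".env" >> .gitignore
```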
Project Structure
my-rag-app/
├── src/
│ ├── index.ts ← Your main entry point
│ └── data/
│ └── sample.txt ← Your test document
├── .env
├── .gitignore
├── tsconfig.json
└── package.json
Create your first data file:
mkdir -p src/data
echo "LlamaIndex is a data framework for building LLM applications.
It was created by Jerry Liu and Simon Suo in 2022.
The TypeScript version (LlamaIndex.TS) was released in 2023.
LlamaIndex supports over 160 data connectors, 40+ vector stores,
and a comprehensive suite of agentic tools." > src/data/sample.txt
Your First RAG Query — Hello, LlamaIndex
Let's write your first working RAG system. This exact code will load a document, index it, and answer questions about it. Every single line is explained.
// src/index.ts
import "dotenv/config";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
async function main() {
// ── Step 1: Configure your LLM and embedding model ──────────────────
// Settings is LlamaIndex's global config object. Set it once, use everywhere.
Settings.llm = new OpenAI({
model: "gpt-4o", // Most capable OpenAI model
apiKey: process.env.OPENAI_API_KEY,
temperature: 0.1, // Low temp = more factual answers
});
Settings.embedModel = new OpenAIEmbedding({
model: "text-embedding-3-large", // Best OpenAI embedding model
apiKey: process.env.OPENAI_API_KEY,
});
// Chunking config: how big each text piece is
Settings.chunkSize = 512; // tokens per chunk
Settings.chunkOverlap = 50; // overlap between chunks
// ── Step 2: Load your documents ──────────────────────────────────────
// SimpleDirectoryReader reads every file in the folder automatically.
// Supports .txt, .pdf, .docx, .csv, .json, .md, and more.
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("./src/data");
console.log(`Loaded ${documents.length} document(s)`);
// ── Step 3: Build the index ───────────────────────────────────────────
// This chunks the documents, embeds each chunk, and stores the vectors.
// Under the hood: document → chunks → OpenAI embeddings → in-memory store.
console.log("Indexing documents...");
const index = await VectorStoreIndex.fromDocuments(documents);
console.log("Index ready!");
// ── Step 4: Create a query engine and ask a question ─────────────────
// asQueryEngine() wraps the index with retrieval + synthesis logic.
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({
query: "Who created LlamaIndex and when was it founded?",
});
console.log("\n── Answer ─────────────────────────────");
console.log(response.toString());
console.log("────────────────────────────────────────");
}
main().catch(console.error);
Run it:
npx ts-node src/index.ts
If Node throws an ESM-related error (e.g. ERR_UNKNOWN_FILE_EXTENSION), your ts-node setup may need the --esm flag (npx ts-node --esm src/index.ts) or "type": "module" in package.json.
Expected output:
Loaded 1 document(s)
Indexing documents...
Index ready!
── Answer ─────────────────────────────
LlamaIndex was created by Jerry Liu and Simon Suo in 2022.
The TypeScript version was released in 2023.
────────────────────────────────────────
What Just Happened? (Under The Hood)
What happened in those 20 lines:
1. Load — SimpleDirectoryReader read sample.txt and created a Document object with the full text plus metadata (filename, creation date, etc.).
2. Chunk — the document was split into Node objects according to Settings.chunkSize and Settings.chunkOverlap.
3. Embed — text-embedding-3-large converted each text chunk into a 3072-dimensional vector. This is your document's "digital fingerprint" in semantic space.
4. Retrieve + synthesize — your question was embedded the same way, the closest chunks were retrieved, and gpt-4o generated the answer from them.
PRO TIP: USE GPT-4O-MINI TO SAVE COSTS WHILE LEARNING
During development, swap gpt-4o for gpt-4o-mini. It's ~40x cheaper and fast enough for testing. Switch to gpt-4o only for production. For embeddings, text-embedding-3-small is 5x cheaper with minimal accuracy loss for most use cases.
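Step 4's "retrieval + synthesis" is less magical than it sounds: the query engine retrieves the top-scoring chunks and injects them into a prompt template before calling the LLM. A simplified, self-contained sketch of that assembly (the real LlamaIndex default template is more elaborate):

```typescript
// Simplified sketch of response synthesis: retrieved chunks are injected
// into a prompt template before the LLM is called. The template below is
// illustrative only — LlamaIndex's actual default prompt differs.

function buildRagPrompt(contextChunks: string[], query: string): string {
  const context = contextChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  return [
    "Context information is below.",
    "---------------------",
    context,
    "---------------------",
    "Given the context information and not prior knowledge,",
    `answer the query: ${query}`,
  ].join("\n");
}

const prompt = buildRagPrompt(
  ["LlamaIndex was created by Jerry Liu and Simon Suo in 2022."],
  "Who created LlamaIndex?"
);
console.log(prompt);
```

This is why RAG answers stay grounded: the model is explicitly told to answer from the supplied context rather than from its training data.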
Key Takeaways
- Use scoped imports: import from "llamaindex/llms", "llamaindex/indices", etc. This enables tree-shaking and keeps your bundles small. @llamaindex/core is deprecated, so don't use it.
- Settings is your global config object. Set your LLM, embedding model, chunk size, and chunk overlap once on Settings and it applies everywhere in your app.
Try It Yourself
Experiments to Build Your Intuition
1. Conflicting sources: add a second .txt file that says "LlamaIndex was created in 2021". Ask the same question and watch how LlamaIndex handles conflicting sources. This teaches you about multi-document retrieval.
2. Chunk size extremes: set Settings.chunkSize = 128 (very small), then Settings.chunkSize = 2048 (very large). Ask complex questions and notice how answer quality and relevance change. This is the most important RAG parameter to tune.
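To build intuition for the chunk-size experiment before you rerun the pipeline, here's a toy character-based splitter showing how chunkSize and chunkOverlap interact. LlamaIndex actually splits on token counts and sentence boundaries, but the overlap mechanics are the same:

```typescript
// Toy fixed-size splitter with overlap. Adjacent chunks share `overlap`
// characters, so a fact that straddles a chunk boundary still appears
// whole in at least one chunk — that's the point of chunkOverlap.

function splitWithOverlap(
  text: string,
  chunkSize: number,
  overlap: number
): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// With chunkSize 512 and overlap 50, a 1200-char text yields 3 chunks,
// and each chunk repeats the final 50 chars of the previous one.
const text = Array.from({ length: 1200 }, (_, i) => String(i % 10)).join("");
const chunks = splitWithOverlap(text, 512, 50);
console.log(chunks.map((c) => c.length));
console.log(chunks[0].slice(-50) === chunks[1].slice(0, 50)); // overlap check
```

Smaller chunks give more precise retrieval but less context per hit; larger chunks give richer context but blur the similarity signal. That trade-off is exactly what experiment 2 lets you feel first-hand.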