
Part 3 of 4 — LlamaIndex Agents: Build an AI That Thinks Before It Answers

Your RAG system is smart. But what if a user asks a question that spans three different data sources? What if the AI needs to decide — by itself — whether to search the product docs, query the SQL database, or call an external API? That's where LlamaIndex Agents come in.

March 24, 2026
20 min read
#LlamaIndex · #AI Agents · #RouterQueryEngine · #FunctionTool · #OpenAIAgent · #ReAct · #Multi-Source RAG · #TypeScript

A user types: "Compare our pricing for Enterprise plans with what our competitors charge, and then summarize what our support tickets say about price complaints."

That's three different data sources: your pricing docs, a web search, and your support ticket database. A normal QueryEngine can handle one. An Agent can handle all three — automatically deciding which to use and in what order.

You're about to build an AI that thinks before it answers — not just retrieves.

🤖

WHAT YOU'LL BUILD IN THIS ARTICLE

🔷 Project 1: An OpenAIAgent with custom FunctionTools (fetch data on demand)
🔷 Project 2: A RouterQueryEngine that routes to the right knowledge base
🔷 Project 3: A full multi-source RAG agent (PDF + live data + custom logic)

This is Part 3 of a 4-part series. You need Part 1 for setup and Part 2 for the RAG fundamentals. The concepts here build directly on what we built in Part 2.

STEP 1 OF 6

What Is an Agent? The ReAct Loop Explained

An Agent is an LLM that can use tools. That's it. The sophistication comes from the loop it uses: Reason → Act → Observe → Repeat.

This loop is called ReAct (Reasoning + Acting), and it's the foundation of every modern AI agent — from LlamaIndex agents to AutoGPT to Claude Agents.

The ReAct Loop — How Every LlamaIndex Agent Thinks

🧠
① REASON
"What does the user want? Which tool should I use? What information do I need?"
② ACT
"Call the chosen tool with the right parameters. Execute the action."
👁️
③ OBSERVE
"What did the tool return? Does this answer the question or do I need more?"
④ DECIDE
"Is this enough to answer? If yes → respond. If no → loop back to ①"
The agent loops through Reason → Act → Observe until it has enough information to give a complete, accurate answer. It can call 1 tool or 20 tools — the same loop handles both.
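Stripped of framework machinery, this loop is just a bounded while-loop. Here's a minimal sketch; `fakeLlm` and the `tools` map are hypothetical stand-ins for the real LLM call and FunctionTools, not LlamaIndex APIs:

```typescript
// Minimal ReAct-style loop: a conceptual sketch, not the LlamaIndex internals.
type ToolCall = { tool: string; input: string };
type LlmStep = { kind: "call"; call: ToolCall } | { kind: "answer"; text: string };

// Stand-in for the LLM: decides to call a tool once, then answers.
function fakeLlm(question: string, observations: string[]): LlmStep {
  if (observations.length === 0) {
    // ① Reason → ② Act: no observations yet, so request a tool call
    return { kind: "call", call: { tool: "get_current_date", input: "" } };
  }
  // ④ Decide: the observation is enough, so answer
  return { kind: "answer", text: `Today is ${observations[0]}.` };
}

const tools: Record<string, (input: string) => string> = {
  get_current_date: () => "2026-03-24",
};

function runAgent(question: string): string {
  const observations: string[] = [];
  for (let i = 0; i < 10; i++) { // cap iterations to avoid infinite loops
    const step = fakeLlm(question, observations);
    if (step.kind === "answer") return step.text;
    observations.push(tools[step.call.tool](step.call.input)); // ③ Observe
  }
  return "Gave up after 10 steps.";
}

console.log(runAgent("What day is it?")); // → "Today is 2026-03-24."
```

The real frameworks add tool schemas, streaming, and error handling, but the control flow is exactly this shape.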

The Key Difference: QueryEngine vs Agent

QUERYENGINE — LINEAR RETRIEVAL
Query → Embed → Retrieve → Answer
One data source. One retrieval step. Always the same flow. Simple and fast.
AGENT — DYNAMIC REASONING
Query → Think → Tool? → Observe
→ Think → Tool? → Observe → Answer
Any number of sources. Dynamic decision-making. Can retry, combine, and reason.
STEP 2 OF 6

FunctionTool — The Foundation of Every Agent

A FunctionTool wraps any JavaScript function and gives the agent the ability to call it. The agent reads the tool's description and parameters, decides when to use it, calls it with the right inputs, and reads the result.

This is the most important building block you'll learn in this article.

import { FunctionTool } from "llamaindex/tools";

// ── Simple example: a tool that gets the current date ───────────────
const getCurrentDateTool = FunctionTool.from(
  // 1. The actual function (can be async, can call APIs, databases, etc.)
  () => {
    return new Date().toISOString().split("T")[0]; // Returns "2026-03-24"
  },
  // 2. The metadata — this is what the LLM reads to decide when/how to use it
  {
    name: "get_current_date",
    description:
      "Returns the current date. Use this when the user asks about today's date, recent events, or needs to know what day it is.",
    parameters: {
      type: "object",
      properties: {}, // No parameters needed for this tool
      required: [],
    },
  },
);

// ── A more complex tool: search your product database ───────────────
const searchProductsTool = FunctionTool.from(
  async (params: { query: string; category?: string }) => {
    // This could be a real database query, an API call, anything
    const results = await fetchFromProductDB(params.query, params.category);
    return JSON.stringify(results);
  },
  {
    name: "search_products",
    description:
      "Search the product catalog for items matching a query. Returns product names, prices, and availability. Use when users ask about specific products, pricing, or inventory.",
    parameters: {
      type: "object",
      properties: {
        query: {
          type: "string",
          description:
            "The search query, e.g., 'wireless headphones' or 'enterprise plan'",
        },
        category: {
          type: "string",
          description:
            "Optional product category to filter by, e.g., 'electronics', 'software'",
        },
      },
      required: ["query"], // query is required, category is optional
    },
  },
);

THE SECRET TO GREAT TOOLS: THE DESCRIPTION

The description field is the most important part of a FunctionTool. The LLM reads it to decide whether to call your tool. Write it like you're explaining to a colleague: what does this tool do, when should you use it, what does it return? A bad description = the agent never calls your tool (or calls it at the wrong time).
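To make that concrete, compare two metadata objects for the same hypothetical tool (the names and wording here are invented for illustration, not from any real codebase):

```typescript
// BAD: the LLM has no idea when to call this or what comes back.
const badMetadata = {
  name: "lookup",
  description: "Looks up stuff.",
};

// GOOD: says what the tool does, when to use it, and what it returns.
const goodMetadata = {
  name: "lookup_order_status",
  description:
    "Look up the shipping status of a customer order by order ID. " +
    "Use when the user asks where their order is or when it will arrive. " +
    "Returns the current status, carrier, and estimated delivery date.",
};
```

The second version gives the LLM three signals the first one lacks: what it does, when to reach for it, and what shape of result to expect.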

STEP 3 OF 6

Project 1 — OpenAIAgent with Custom Tools

Let's build an AI assistant that can answer questions using a combination of: your internal policy documents (RAG), live calculations, and external data lookups.

// src/project1-agent.ts
import "dotenv/config";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { OpenAIAgent } from "llamaindex/agent";
import { FunctionTool } from "llamaindex/tools";

async function main() {
  // ── Configure LlamaIndex ───────────────────────────────────────────
  const llm = new OpenAI({
    model: "gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
    temperature: 0.1,
  });

  Settings.llm = llm;
  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-large",
    apiKey: process.env.OPENAI_API_KEY,
  });

  // ── Build a QueryEngine from your documents ────────────────────────
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./src/data");
  const index = await VectorStoreIndex.fromDocuments(documents);
  const queryEngine = index.asQueryEngine({ similarityTopK: 4 });

  // ── Define your tools ──────────────────────────────────────────────

  // Tool 1: Search internal documents (wraps our QueryEngine as a tool)
  const searchDocumentsTool = FunctionTool.from(
    async (params: { query: string }) => {
      const response = await queryEngine.query({ query: params.query });
      return response.toString();
    },
    {
      name: "search_internal_docs",
      description:
        "Search internal company documents including HR policies, product manuals, support guides, and FAQs. Use this for any question about company policies, product features, pricing, or procedures.",
      parameters: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "The specific question or topic to search for",
          },
        },
        required: ["query"],
      },
    },
  );

  // Tool 2: Calculate (for math/date questions the LLM might get wrong)
  const calculateTool = FunctionTool.from(
    (params: { expression: string }) => {
      // Reject anything but digits, whitespace, and basic math operators;
      // Function() would otherwise execute arbitrary JavaScript.
      if (!/^[\d\s+\-*/%().]+$/.test(params.expression)) {
        return "Error: Invalid expression";
      }
      try {
        const result = Function(
          '"use strict"; return (' + params.expression + ")",
        )();
        return `Result: ${result}`;
      } catch {
        return "Error: Invalid expression";
      }
    },
    {
      name: "calculate",
      description:
        "Evaluate a mathematical expression. Use this for calculations involving numbers, percentages, dates, or any arithmetic. Input must be a valid JavaScript math expression.",
      parameters: {
        type: "object",
        properties: {
          expression: {
            type: "string",
            description:
              "A JavaScript math expression, e.g., '1500 * 0.85' or '365 - 90'",
          },
        },
        required: ["expression"],
      },
    },
  );

  // Tool 3: Get current date (so the agent knows "today")
  const getCurrentDateTool = FunctionTool.from(
    () => {
      const now = new Date();
      return `Current date: ${now.toLocaleDateString("en-US", {
        weekday: "long",
        year: "numeric",
        month: "long",
        day: "numeric",
      })}`;
    },
    {
      name: "get_current_date",
      description:
        "Returns today's date. Use this whenever the user refers to 'today', 'now', 'this week', or any relative time reference.",
      parameters: {
        type: "object",
        properties: {},
        required: [],
      },
    },
  );

  // ── Create the agent with all tools ───────────────────────────────
  const agent = new OpenAIAgent({
    tools: [searchDocumentsTool, calculateTool, getCurrentDateTool],
    llm,
    verbose: true, // Shows the agent's reasoning steps
    systemPrompt: `You are an intelligent assistant with access to internal company documents and calculation tools.
    Always search the internal docs before answering policy questions.
    Be accurate and cite which tool you used when relevant.
    If you're not sure, say so.`,
  });

  // ── Ask complex questions that require multiple tools ─────────────
  const questions = [
    "If my equipment budget is $1,500 and a monitor costs $450, what percentage of my budget would be left after buying it?",
    "What's our remote work policy and how does it relate to today's date?",
    "How many days until the end of this year?",
  ];

  for (const question of questions) {
    console.log("\n" + "─".repeat(60));
    console.log(`❓ Question: ${question}`);
    console.log("─".repeat(60));

    const response = await agent.chat({ message: question });
    console.log(`\n✅ Answer: ${response.toString()}`);
  }
}

main().catch(console.error);

Run it with verbose: true to see the agent's internal reasoning:

npx ts-node src/project1-agent.ts
────────────────────────────────────────────────────────────
❓ Question: If my equipment budget is $1,500 and a monitor costs $450...
────────────────────────────────────────────────────────────

[Agent] Thinking: The user wants to know the percentage left after buying a monitor.
        I'll use the calculate tool for this.
[Tool Call] calculate({ expression: "(1500 - 450) / 1500 * 100" })
[Tool Result] Result: 70

✅ Answer: After buying a $450 monitor, you would have $1,050 remaining,
which is 70% of your total $1,500 equipment budget.
STEP 4 OF 6

The RouterQueryEngine — Intelligent Multi-Source Routing

The RouterQueryEngine solves one of the hardest problems in production RAG: when you have multiple knowledge bases, how does the AI decide which one to search?

Imagine you have:

  • A product manual (for "how does feature X work?" questions)
  • A pricing document (for "how much does plan Y cost?" questions)
  • An FAQ database (for common support questions)

Instead of searching all three every time (expensive, noisy), the RouterQueryEngine asks the LLM to read the descriptions of each engine and route to the best one.

RouterQueryEngine — Intelligent Routing Architecture

USER QUERY
"What's the price of the Enterprise plan?"
🧠 LLM SELECTOR (the "router")
Reads each engine's description → decides: "This is a pricing question. Route to the Pricing Engine."
📖 PRODUCT ENGINE — "Answers questions about features, how-to, and technical specs" → Not selected
💰 PRICING ENGINE ✓ — "Answers questions about pricing, plans, discounts, and costs" → SELECTED
❓ FAQ ENGINE — "Answers common support questions and troubleshooting" → Not selected
✅ Precise answer from the Pricing Engine — no noise from other sources
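The selection step itself is conceptually simple: score each engine's description against the query and pick the winner. The sketch below fakes that decision with keyword-stem overlap; the real LLMSingleSelector asks the LLM to choose instead, but the input and output are the same shape:

```typescript
// Conceptual sketch of routing. Keyword-stem overlap stands in for the
// LLM call that LLMSingleSelector actually makes.
type EngineChoice = { name: string; description: string };

function routeQuery(query: string, engines: EngineChoice[]): EngineChoice {
  const queryWords = query.toLowerCase().match(/[a-z]+/g) ?? [];
  let best = engines[0];
  let bestScore = -1;
  for (const engine of engines) {
    const descWords = engine.description.toLowerCase().match(/[a-z]+/g) ?? [];
    // Count description words sharing a stem (>=4-char prefix) with a query word,
    // so "price" matches "pricing" and "plan" matches "plans".
    const score = descWords.filter((w) =>
      queryWords.some(
        (q) => w.length >= 4 && q.length >= 4 && (w.startsWith(q) || q.startsWith(w)),
      ),
    ).length;
    if (score > bestScore) {
      best = engine;
      bestScore = score;
    }
  }
  return best;
}

const engines: EngineChoice[] = [
  { name: "product", description: "Answers questions about features, integrations, and technical specs" },
  { name: "pricing", description: "Answers questions about pricing, plans, discounts, and costs" },
];

console.log(routeQuery("What's the price of the Enterprise plan?", engines).name); // → "pricing"
```

This is also why the engine descriptions matter so much: they are the only signal the selector has.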
STEP 5 OF 6

Project 2 — Build a RouterQueryEngine

Let's build the RouterQueryEngine with two real knowledge bases. First, create your data:

mkdir -p src/data/product-docs
mkdir -p src/data/pricing-docs

cat > src/data/product-docs/features.txt << 'EOF'
Product Features Guide — CloudSync Pro 2026

CORE FEATURES
Real-time collaboration: Multiple users can edit documents simultaneously with sub-100ms sync.
Version history: Every change is tracked. Restore any version from the last 365 days.
File types: Supports 150+ file formats including Office, Google Docs, Figma, and CAD files.
Storage: Starts at 100GB, scales to unlimited on Enterprise.
Offline mode: Full offline access on desktop and mobile apps (iOS 17+, Android 14+).

INTEGRATIONS
Native integrations with: Slack, GitHub, Jira, Notion, Salesforce, and 200+ apps via Zapier.
API: REST API with SDKs for JavaScript, Python, Go, and Ruby.
SSO: Supports SAML 2.0, LDAP, and OAuth 2.0 for enterprise authentication.

SECURITY
End-to-end encryption at rest and in transit.
SOC 2 Type II certified.
GDPR, HIPAA, and CCPA compliant on Enterprise plan.
Custom data residency: Choose US, EU, or APAC region on Enterprise.
EOF

cat > src/data/pricing-docs/plans.txt << 'EOF'
Pricing Guide — CloudSync Pro 2026

STARTER PLAN — $12/user/month (billed annually) or $15/user/month (monthly)
- Up to 10 users
- 100GB storage per user
- Core collaboration features
- Email support (48h response)
- 30-day version history

PROFESSIONAL PLAN — $28/user/month (billed annually) or $35/user/month (monthly)
- Unlimited users
- 1TB storage per user
- All Starter features + advanced analytics
- Priority support (4h response)
- 365-day version history
- Custom branding

ENTERPRISE PLAN — Custom pricing (contact sales)
- Unlimited users and storage
- All Professional features
- Dedicated account manager
- SLA: 99.99% uptime guarantee
- HIPAA/GDPR compliance modules
- Custom data residency
- SSO with Active Directory
- 24/7 phone support

DISCOUNTS
Annual billing: 20% off monthly rates
Non-profit: 50% off Professional plan
Students/Education: Free Starter for up to 30 users
EOF

Now build the RouterQueryEngine:

// src/project2-router.ts
import "dotenv/config";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { RouterQueryEngine } from "llamaindex/engines";
import { LLMSingleSelector } from "llamaindex/selectors";
import { SentenceSplitter } from "llamaindex/node-parser";

async function buildIndex(dataPath: string) {
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData(dataPath);
  return VectorStoreIndex.fromDocuments(documents);
}

async function main() {
  const llm = new OpenAI({
    model: "gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
    temperature: 0,
  });

  Settings.llm = llm;
  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-large",
    apiKey: process.env.OPENAI_API_KEY,
  });
  Settings.nodeParser = new SentenceSplitter({
    chunkSize: 512,
    chunkOverlap: 50,
  });

  console.log("Building knowledge bases...");

  // Build separate indexes for each data source
  const [productIndex, pricingIndex] = await Promise.all([
    buildIndex("./src/data/product-docs"),
    buildIndex("./src/data/pricing-docs"),
  ]);

  console.log("Creating RouterQueryEngine...");

  // RouterQueryEngine wraps multiple query engines with descriptions
  // The LLMSingleSelector reads the descriptions to pick the right one
  const routerEngine = RouterQueryEngine.fromDefaults({
    queryEngineTools: [
      {
        queryEngine: productIndex.asQueryEngine({ similarityTopK: 3 }),
        description:
          "Use this for questions about product features, functionality, integrations, security, supported file types, offline mode, APIs, and technical specifications. Best for 'how does X work?' questions.",
      },
      {
        queryEngine: pricingIndex.asQueryEngine({ similarityTopK: 3 }),
        description:
          "Use this for questions about pricing, subscription plans, billing cycles, costs, discounts, plan comparisons, and what each tier includes. Best for 'how much does X cost?' questions.",
      },
    ],
    selector: new LLMSingleSelector({ llm }),
    verbose: true, // Shows routing decisions
  });

  // Test with questions that should route to different engines
  const testQueries = [
    {
      query: "Does the product support offline access on mobile?",
      expectedEngine: "Product Engine",
    },
    {
      query: "What is the price of the Professional plan billed annually?",
      expectedEngine: "Pricing Engine",
    },
    {
      query: "Is there a discount for non-profits?",
      expectedEngine: "Pricing Engine",
    },
    {
      query: "What authentication methods are supported for enterprise SSO?",
      expectedEngine: "Product Engine",
    },
  ];

  for (const { query, expectedEngine } of testQueries) {
    console.log("\n" + "═".repeat(60));
    console.log(`📌 Query: ${query}`);
    console.log(`   Expected: ${expectedEngine}`);
    console.log("─".repeat(60));

    const response = await routerEngine.query({ query });
    console.log(`✅ Answer: ${response.toString()}`);
  }
}

main().catch(console.error);
npx ts-node src/project2-router.ts
Building knowledge bases...
Creating RouterQueryEngine...

════════════════════════════════════════════════════════════
📌 Query: What is the price of the Professional plan billed annually?
   Expected: Pricing Engine
────────────────────────────────────────────────────────────
[Router] Selected: Pricing Engine (score: 0.95)
✅ Answer: The Professional plan costs $28 per user per month when billed
annually. This represents a 20% discount compared to the monthly rate of $35.

════════════════════════════════════════════════════════════
📌 Query: Does the product support offline access on mobile?
   Expected: Product Engine
────────────────────────────────────────────────────────────
[Router] Selected: Product Engine (score: 0.93)
✅ Answer: Yes. CloudSync Pro offers full offline access on desktop and
mobile apps running iOS 17+ and Android 14+.
STEP 6 OF 6

Project 3 — Multi-Source RAG Agent: The Full Picture

Now let's combine everything: an agent that has access to BOTH the RouterQueryEngine (your knowledge base) AND external tools (calculations, live data). This is the pattern used in production enterprise AI applications.

// src/project3-multi-agent.ts
import "dotenv/config";
import { Settings } from "llamaindex";
import { OpenAI } from "llamaindex/llms";
import { OpenAIEmbedding } from "llamaindex/embeddings";
import { SimpleDirectoryReader } from "llamaindex/ingestion";
import { VectorStoreIndex } from "llamaindex/indices";
import { RouterQueryEngine } from "llamaindex/engines";
import { LLMSingleSelector } from "llamaindex/selectors";
import { OpenAIAgent } from "llamaindex/agent";
import { FunctionTool } from "llamaindex/tools";
import { SentenceSplitter } from "llamaindex/node-parser";

async function buildIndex(path: string) {
  const reader = new SimpleDirectoryReader();
  const docs = await reader.loadData(path);
  return VectorStoreIndex.fromDocuments(docs);
}

async function main() {
  const llm = new OpenAI({
    model: "gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
    temperature: 0.1,
  });

  Settings.llm = llm;
  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-large",
    apiKey: process.env.OPENAI_API_KEY,
  });
  Settings.nodeParser = new SentenceSplitter({
    chunkSize: 512,
    chunkOverlap: 64,
  });

  console.log("Initializing knowledge bases...");
  const [productIndex, pricingIndex] = await Promise.all([
    buildIndex("./src/data/product-docs"),
    buildIndex("./src/data/pricing-docs"),
  ]);

  // Create the RouterQueryEngine (combined knowledge base)
  const routerEngine = RouterQueryEngine.fromDefaults({
    queryEngineTools: [
      {
        queryEngine: productIndex.asQueryEngine({ similarityTopK: 3 }),
        description:
          "Product features, integrations, security, technical specs",
      },
      {
        queryEngine: pricingIndex.asQueryEngine({ similarityTopK: 3 }),
        description:
          "Pricing, subscription plans, billing, discounts, plan comparisons",
      },
    ],
    selector: new LLMSingleSelector({ llm }),
  });

  // ── Define Agent Tools ─────────────────────────────────────────────

  // Tool 1: Knowledge base (wraps the entire RouterQueryEngine)
  const knowledgeBaseTool = FunctionTool.from(
    async (params: { question: string }) => {
      const response = await routerEngine.query({ query: params.question });
      return response.toString();
    },
    {
      name: "search_knowledge_base",
      description:
        "Search the complete knowledge base for answers about products, features, pricing, integrations, security, and plans. This should be your FIRST tool for most questions.",
      parameters: {
        type: "object",
        properties: {
          question: {
            type: "string",
            description: "The question to search the knowledge base for",
          },
        },
        required: ["question"],
      },
    },
  );

  // Tool 2: Cost calculator (uses real pricing data from knowledge base)
  const costCalculatorTool = FunctionTool.from(
    (params: {
      users: number;
      plan: string;
      billing: "monthly" | "annual";
    }) => {
      const pricing: Record<string, Record<string, number>> = {
        starter: { monthly: 15, annual: 12 },
        professional: { monthly: 35, annual: 28 },
      };

      const planKey = params.plan.toLowerCase();
      if (!pricing[planKey]) {
        return "Enterprise plan requires custom pricing. Please contact sales.";
      }

      const ratePerUser = pricing[planKey][params.billing];
      const totalMonthly = ratePerUser * params.users;
      const totalAnnual = totalMonthly * 12;
      const savings =
        params.billing === "annual"
          ? (pricing[planKey].monthly - pricing[planKey].annual) *
            params.users *
            12
          : 0;

      return JSON.stringify({
        plan: params.plan,
        users: params.users,
        billing: params.billing,
        ratePerUser,
        totalMonthly,
        totalAnnual,
        annualSavings: savings,
      });
    },
    {
      name: "calculate_plan_cost",
      description:
        "Calculate the exact cost of a CloudSync plan for a given number of users and billing period. Use this when users want to know total cost, not just per-user rates.",
      parameters: {
        type: "object",
        properties: {
          users: { type: "number", description: "Number of users" },
          plan: {
            type: "string",
            enum: ["starter", "professional", "enterprise"],
            description: "The plan name",
          },
          billing: {
            type: "string",
            enum: ["monthly", "annual"],
            description: "Billing frequency",
          },
        },
        required: ["users", "plan", "billing"],
      },
    },
  );

  // ── Create the final Agent ─────────────────────────────────────────
  const agent = new OpenAIAgent({
    tools: [knowledgeBaseTool, costCalculatorTool],
    llm,
    verbose: true,
    systemPrompt: `You are a CloudSync Pro sales and support assistant.
    Always use the search_knowledge_base tool first to ground your answers in official documentation.
    Use the calculate_plan_cost tool when users ask about total costs for their team size.
    Be helpful, accurate, and proactive about suggesting relevant plans.`,
  });

  // ── Complex multi-step queries ────────────────────────────────────
  const complexQueries = [
    "We have a 25-person team and need HIPAA compliance. Which plan should we choose and how much would it cost annually?",
    "Does CloudSync integrate with GitHub? And if we went with Professional for 10 users billed annually, what's the total yearly cost?",
  ];

  for (const query of complexQueries) {
    console.log("\n" + "═".repeat(70));
    console.log(`❓ ${query}`);
    console.log("═".repeat(70));

    const response = await agent.chat({ message: query });
    console.log(`\n✅ ${response.toString()}`);
  }
}

main().catch(console.error);
═══════════════════════════════════════════════════════════════════════
❓ We have a 25-person team and need HIPAA compliance. Which plan?
═══════════════════════════════════════════════════════════════════════

[Agent] Thinking: I need to find which plan includes HIPAA compliance, then calculate cost for 25 users.
[Tool] search_knowledge_base({ question: "Which plan includes HIPAA compliance?" })
[Result] HIPAA compliance is available exclusively on the Enterprise plan.
[Agent] Now I know the plan. Enterprise has custom pricing, so I can't calculate — I'll note that.

✅ For a 25-person team requiring HIPAA compliance, you need the Enterprise plan.
HIPAA compliance is only available on Enterprise, which also includes unlimited
storage, 99.99% SLA, and a dedicated account manager. Enterprise pricing is custom
— I recommend contacting the sales team for a quote tailored to your 25-user setup.

When to Use Each Approach

QueryEngine
Single knowledge source, deterministic flow, no reasoning needed. Fast and cheap. Use for search bars and APIs.
RouterQueryEngine
Multiple knowledge sources, smart routing, no external tools. Great for multi-domain document Q&A systems.
Agent
Complex multi-step reasoning, external tools (APIs, databases, calculations), dynamic decision-making. Use for assistants and copilots.

Key Takeaways

An Agent is an LLM + Tools + the ReAct loop. Reason → Act → Observe → repeat. The same loop powers everything from simple task runners to autonomous AI systems.
The tool description is the most important field. The LLM reads it to decide whether and how to use the tool. Write it like documentation for a new engineer — what does it do, when should they use it, what does it return?
RouterQueryEngine eliminates the multi-source problem. Instead of searching all knowledge bases on every query, the router selects the best one. This reduces cost, reduces noise, and dramatically improves answer precision.
Use verbose: true during development. Seeing the agent's reasoning steps is essential for debugging. You'll immediately see if it's choosing the wrong tool, using wrong parameters, or reasoning incorrectly.
Wrap any QueryEngine as a FunctionTool to give agents RAG capability. The pattern is: build index → create queryEngine → wrap in FunctionTool → give to agent. Your entire knowledge base becomes just another tool the agent can choose to use.
Agents are non-deterministic — test them rigorously. Unlike a QueryEngine with a fixed flow, agents can take different paths to the same answer. Write test queries with known expected answers and run them regularly.
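That last takeaway is worth a sketch. A regression harness for an agent checks that required facts appear in the answer, rather than matching exact strings, since wording varies between runs. `askAgent` here is a hypothetical stand-in for `agent.chat()`:

```typescript
// Sketch of a regression harness for a non-deterministic agent.
type EvalCase = { question: string; mustContain: string[] };

async function askAgent(question: string): Promise<string> {
  // Stand-in: a real harness would call agent.chat({ message: question }).
  return "The Professional plan costs $28/user/month billed annually.";
}

async function runEvals(cases: EvalCase[]): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const answer = (await askAgent(c.question)).toLowerCase();
    // Pass if every required fragment appears somewhere in the answer.
    const ok = c.mustContain.every((frag) => answer.includes(frag.toLowerCase()));
    console.log(`${ok ? "PASS" : "FAIL"} - ${c.question}`);
    if (ok) passed++;
  }
  return passed;
}

const cases: EvalCase[] = [
  { question: "Professional plan annual price?", mustContain: ["$28", "annually"] },
];

runEvals(cases).then((n) => console.log(`${n}/${cases.length} passed`));
```

Run a suite like this after every prompt or tool change; agents regress silently in ways a type checker will never catch.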

What's Next in the Series

PART 4 OF 4 — THE FINAL CHAPTER
Production-Ready RAG: Persistence, Streaming & Next.js 16 Deployment
Your server restarts and everything re-indexes from scratch. Your API responds after 10 seconds of "thinking". These are the two biggest production RAG killers — and we fix both. Then we build a complete full-stack Next.js 16 chatbot and deploy it to Vercel.
✦ Persistent index storage
✦ Real-time streaming responses
✦ create-llama CLI
✦ Next.js 16 full-stack app
✦ Vercel deployment

Mohamed Hamed

20 years building production systems — the last several deep in AI integration, LLMs, and full-stack architecture. I write what I've actually built and broken. If this was useful, the next one goes to LinkedIn first.

Follow on LinkedIn →