Imagine a catalog with 30 million products. A customer types: "I want something lightweight for translation that isn't too expensive."
How does AI find the right product — without scanning every description one by one?
In our last article on Embeddings, we learned that AI converts words into lists of numbers called vectors — digital fingerprints of meaning. Now we go one level further: once you have millions of those fingerprints, how do you find the one that matches your question?
That's Similarity Search — and it's the engine behind Netflix, TikTok, ChatGPT's memory, and every RAG application in production today.
The Problem with the Old Internet
Before 2015 or so, search meant one thing: find pages that contain the exact words you typed.
- Keyword Search: matches exact letters, nothing more. "glasses" doesn't contain the letters d-e-v-i-c-e, so a search for "device" misses it.
- Semantic Search: converts the query to math and finds the nearest meaning. The vectors for the two terms are 95% similar in meaning-space.
The difference isn't better engineering. It's a fundamentally different question being asked:
- Keyword search asks: Does this page contain these letters?
- Semantic search asks: Does this page mean the same thing as my query?
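The keyword question fits in one line of Python, and that one line shows exactly where it fails. The product descriptions here are toy examples, not a real catalog:

```python
# Keyword search is a literal character match — nothing more.
docs = [
    "Smart glasses with 40-language translation",
    "Sports watch for adventure and running",
]

# A shopper searches for a "device". Neither description contains that word.
hits = [d for d in docs if "device" in d.lower()]
print(hits)  # → [] — both products ARE devices, but the letters never match
```

Semantic search exists to close exactly this gap: it compares meanings, not characters.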
How Similarity Search Works in 3 Steps
Every semantic search system — from e-commerce to RAG — follows the same three-step process:
- Embed your documents once, offline, and store the resulting vectors.
- Embed the user's query into a vector at request time.
- Score every stored vector against the query vector and return the top matches.
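A minimal sketch of that three-step process, using hand-made 3-dimensional vectors in place of a real embedding model's output:

```python
import numpy as np

# Step 1 (offline): "embed" the catalog — toy stand-ins for real 384-dim vectors
catalog = {
    "translation earbuds": np.array([0.9, 0.1, 0.2]),
    "sports watch":        np.array([0.1, 0.9, 0.3]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 2 (online): embed the user's query the same way
query_vec = np.array([0.8, 0.2, 0.1])  # pretend: "cheap translation gadget"

# Step 3: score every stored vector and rank by similarity
ranked = sorted(catalog, key=lambda name: cosine(query_vec, catalog[name]), reverse=True)
print(ranked[0])  # → translation earbuds
```

A real system swaps the hand-made arrays for model output and the `sorted` call for a vector database query; the shape of the pipeline stays the same.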
The Three Ways to Measure Similarity
Now comes the math. Given two vectors, how do you measure how "close" they are? There are three standard methods.
Method 1: Cosine Similarity — The Flashlight
Think of every sentence as an arrow pointing in some direction in high-dimensional space. Two people both facing north are pointing the same direction — even if one is 2 meters tall and the other is a child. Cosine Similarity measures the angle between two arrows — if two sentences point in nearly the same direction, they mean nearly the same thing. The direction tells you the meaning — not the length.
Why Cosine for text? A short tweet and a long article about the same topic will point in the same direction even though their lengths are completely different. Cosine only cares about direction — which is exactly what we want for meaning.
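A quick sanity check of that claim: a vector and a copy ten times its length still score a perfect match, because only the angle counts.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

tweet   = np.array([1.0, 2.0, 3.0])  # a short text's (toy) embedding
article = 10 * tweet                 # same direction, 10x the magnitude

print(round(cosine(tweet, article), 6))  # → 1.0
```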
Method 2: Euclidean Distance — The Ruler
Instead of measuring the angle, measure the straight-line distance between two points in space. If two items are close together geometrically, they're similar.
- Cosine (The Flashlight) asks: "Are you pointing the same direction as me?" Best for: text, documents, queries.
- Euclidean (The Ruler) asks: "How far apart are you from me?" Best for: images, coordinates, numbers.
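The two instruments disagree on purpose. The same pair of toy vectors can be a perfect cosine match and still sit far apart on the Euclidean ruler:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])  # same direction, 10x longer

euclidean = float(np.linalg.norm(a - b))                      # the ruler
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # the flashlight

print(f"euclidean ≈ {euclidean:.2f}, cosine ≈ {cos:.4f}")
```

For text, the flashlight's verdict (same topic) matters more than the ruler's (different lengths); for pixel data or coordinates, magnitude carries real information and the ruler wins.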
Method 3: Dot Product — The Shortcut
Dot product multiplies matching elements of two vectors and sums the results. It combines direction and magnitude in one operation.
Example:
A = [1, 2, 3]
B = [2, 1, 1]
(1×2) + (2×1) + (3×1) = 2 + 2 + 3
= 7
The catch: if vectors are very long (large magnitude), the dot product inflates even for loosely related pairs. This is why it works best with normalized vectors (scaled to length = 1, so magnitude no longer affects the score). When you normalize, dot product becomes mathematically equivalent to cosine similarity — but faster to compute at scale.
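The worked example above, plus the normalization trick, in a few lines of NumPy:

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([2.0, 1.0, 1.0])

dot = float(A @ B)  # (1×2) + (2×1) + (3×1) = 7
print(dot)          # → 7.0

# Normalize both vectors to unit length: now dot product == cosine similarity
A_hat = A / np.linalg.norm(A)
B_hat = B / np.linalg.norm(B)
cosine = float(A @ B / (np.linalg.norm(A) * np.linalg.norm(B)))
print(abs(float(A_hat @ B_hat) - cosine) < 1e-12)  # → True
```

This is why many vector databases store pre-normalized vectors: a single dot product per comparison, with cosine's behavior.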
The Three Similarity Methods in Practice
Take this real query: "I want something lightweight for translation that is affordable"
Run it against our 7-device catalog with the embedding model (the full code and scores follow below) and the AI ranks the cheapest translation devices first — without a single hand-written filter rule.
Notice: The Garmin sports watch scores negative — the AI correctly identified it as pointing in the opposite direction from "lightweight translation." We didn't write any rules. The math did it.
Where Do You Store 30 Million Vectors?
A regular SQL database can't efficiently search millions of vectors. You need a vector database (a database purpose-built for storing and searching high-dimensional vectors) — designed to find the closest matches using ANN (Approximate Nearest Neighbor) algorithms.
The Complete Python Implementation
Here's the full working pipeline — from raw text to ranked results:
# pip install sentence-transformers scikit-learn numpy
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: Load the embedding model
# 50+ languages, 384-dimensional output, runs on CPU
encoder = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# Step 2: Define your catalog
devices = [
    {"name": "Ray-Ban Meta Ultra", "desc": "Smart glasses — 48MP camera, 40-language translation", "price": 549},
    {"name": "Xreal Air 3 Ultra", "desc": "AR glasses with 4K display and AR apps", "price": 449},
    {"name": "Samsung Galaxy Ring v2", "desc": "Smart ring for health and sleep tracking", "price": 349},
    {"name": "RingConverse Translate", "desc": "Translation ring — 30 languages, instant", "price": 199},
    {"name": "Apple AirPods Pro 3", "desc": "Smart earbuds with real-time translation", "price": 299},
    {"name": "Sony LinkBuds Open 2", "desc": "Open earbuds with lightweight translation", "price": 199},
    {"name": "Garmin Fenix 9 Solar", "desc": "Sports watch for adventure and running", "price": 799},
]

# Step 3: Embed the catalog (OFFLINE — done once, stored)
device_texts = [f"{d['name']} - {d['desc']}" for d in devices]
device_embeddings = encoder.encode(device_texts)
print(f"Stored {len(devices)} devices × {device_embeddings[0].shape[0]} dimensions")
# → Stored 7 devices × 384 dimensions

# Step 4: Embed the user query (ONLINE — every request)
query = "I want something lightweight for translation that is affordable"
query_embedding = encoder.encode(query)

# Step 5: Score and rank
scores = cosine_similarity([query_embedding], device_embeddings)[0]
ranked = scores.argsort()[::-1]  # highest first

print(f'\nQuery: "{query}"\n')
print(f"{'#':<4} {'Score':>7} {'Device':<28} {'Price':>6}")
print("─" * 55)
for rank, i in enumerate(ranked, 1):
    flag = "✅" if scores[i] > 0.35 else "⚠️ " if scores[i] > 0.2 else "❌"
    print(f" {rank}. {scores[i]:>6.3f} {flag} {devices[i]['name']:<25} ${devices[i]['price']}")
Output:
Stored 7 devices × 384 dimensions
Query: "I want something lightweight for translation that is affordable"
# Score Device Price
───────────────────────────────────────────────────────
1. 0.486 ✅ RingConverse Translate $199
2. 0.394 ✅ Apple AirPods Pro 3 $299
3. 0.382 ✅ Sony LinkBuds Open 2 $199
4. 0.301 ⚠️ Ray-Ban Meta Ultra $549
5. 0.271 ⚠️ Xreal Air 3 Ultra $449
6. 0.108 ❌ Samsung Galaxy Ring v2 $349
7. -0.027 ❌ Garmin Fenix 9 Solar $799
The AI found the two cheapest translation devices as top matches, without any price filter. It understood that "affordable" and "$199" are semantically linked.
In production, you'd swap the cosine_similarity + argsort lines for a vector DB query — e.g., collection.query(query_embeddings=[query_embedding], n_results=10) in ChromaDB. The math is identical; the vector DB handles the speed.
How It Finds 30 Million Answers in Under 1 Second
You might be wondering: computing cosine similarity between your query and every stored vector sounds slow. For 7 devices it's instant. For 30 million vectors at 384 dimensions each, a brute-force scan takes seconds per query — too slow for a real app.
This is solved by ANN (Approximate Nearest Neighbor) algorithms. Instead of checking every vector, they build smart index structures that let the database skip most of the search space.
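For intuition about the baseline ANN is beating, here is the exact brute-force scan on 100,000 synthetic unit vectors. The data is random, and the sizes are illustrative; a real index such as HNSW or IVF avoids touching most of these rows:

```python
import numpy as np

rng = np.random.default_rng(0)

# 100,000 synthetic 384-dim vectors, normalized once, offline
db = rng.normal(size=(100_000, 384)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

q = rng.normal(size=384).astype(np.float32)
q /= np.linalg.norm(q)

scores = db @ q                             # cosine via dot product (unit vectors)
top10 = np.argpartition(scores, -10)[-10:]  # exact top-10, but still scans all N
```

An ANN index trades a tiny amount of recall for skipping most of that matrix-vector product; at 30 million vectors, that trade is what keeps latency under a second.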
Real-World Case Studies
Netflix — ~$1B in Reduced Churn Per Year
Netflix doesn't recommend movies by matching genre tags. It converts your watch history into a vector — a mathematical fingerprint of your taste. Then it finds movie vectors that are closest to that fingerprint.
The result: "Because you watched X, you'll like Y" — even when X and Y share zero keywords or genres. Netflix estimates this recommendation engine saves them approximately $1 billion annually in reduced churn — users who find content they love don't cancel.
TikTok's For You Page
Each video is embedded based on its transcript, audio patterns, and visual content. Each user has a preference vector updated in real time. Behind the scenes, the system runs similarity comparisons billions of times per second across its user base to decide what appears next.
Spotify's Discover Weekly
Spotify embeds both songs (audio features + lyrics) and users (listening patterns). Your Monday morning playlist is the result of finding the 30 song vectors closest to your personal taste vector.
Why This Is the Heart of RAG
If you've heard of RAG (Retrieval-Augmented Generation) — the technique that lets AI answer questions about your private documents — Similarity Search is its core engine.
Similarity Search powers the retrieval step — finding the relevant paragraphs before the LLM generates an answer.
When you ask an AI "What's our refund policy?" about a company's documentation:
- Your question is converted to a vector
- Similarity Search finds the most relevant paragraphs in the knowledge base
- Those paragraphs are handed to the LLM as context
- The LLM answers using only that retrieved information
Without Similarity Search, there is no RAG.
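The retrieval step in that list is the same ranking loop from earlier. A toy sketch, with hand-made 3-dimensional vectors standing in for real embeddings and the prompt format purely illustrative:

```python
import numpy as np

# Toy knowledge base: chunk text → (hand-made) embedding
chunks = {
    "Refunds are accepted within 30 days of purchase.": np.array([0.9, 0.1, 0.1]),
    "Our office is closed on public holidays.":         np.array([0.1, 0.9, 0.2]),
}
query_vec = np.array([0.8, 0.2, 0.1])  # pretend: "What's our refund policy?"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieval: pick the most similar chunk, then hand it to the LLM as context
context = max(chunks, key=lambda c: cosine(query_vec, chunks[c]))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What's our refund policy?"
print(context)  # → Refunds are accepted within 30 days of purchase.
```

Real RAG systems retrieve the top-k chunks rather than a single one, but the core operation is identical: rank by similarity, pass the winners to the model.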
The Core Insight
Similarity Search doesn't find matching words.
It finds matching meaning.
The query and the result can share zero words — and still be a perfect match.
That's why searching "something for my eyes" returns smart glasses. Why TikTok shows you videos you didn't know you wanted. Why your company's AI chatbot finds the right policy clause even when you phrase the question in ten different ways.
The math doesn't care about letters. It only cares about where two things land in meaning-space.
Try It Yourself
Run the Python code above on your local machine — it downloads the model automatically on first run. Then try these experiments:
- Cross-language search: change the query to Arabic — "أريد شيئاً للترجمة وسعره معقول" ("I want something for translation at a reasonable price"). The multilingual model should return nearly the same ranking as the English query.
- Synonym test: try "inexpensive" vs "affordable" vs "cheap" as the query. Watch how stable the top-3 ranking stays — the model understands they mean the same thing.
- Opposite test: try "I want the most expensive flagship device with no budget limit" — watch the Garmin Fenix jump from last place to first.
All three experiments reveal the same truth: the model understands meaning, not just words.
Next in AI Fundamentals
The Artificial Neuron
The embedding model that powered everything in this article is built from billions of tiny decision-making units. We'll open the black box and build one from scratch in Python.