Embeddings: The Mathematical Fingerprint of Meaning
How does a computer understand that 'Apple' the fruit is different from 'Apple' the tech giant? It uses high-dimensional GPS coordinates called Embeddings.
Type "apple" into a grocery app → you get fruit. Type "apple" into a stock app → you get a $3 trillion company. Same five letters. Completely different meanings. How does the AI tell them apart without actually "reading" the word?
The answer is a mathematical trick called Embeddings — and it's the silent engine powering every AI tool you use today.
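A 2016 paper by Netflix researchers estimated that personalization and recommendations together save the company more than $1 billion per year in reduced churn, by showing users content they actually love. Embeddings are one of the core techniques behind recommenders of this kind.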
In the last article, Part 1: Tokens, we saw how AI breaks text into numbered tokens. But Token #27438 (" Meta") is just a label. It has no meaning. So how does the AI actually understand meaning? Welcome to the concept that makes modern AI possible: Embeddings and Vectors.
To understand how AI encodes meaning, we first need to understand how numbers can represent similarity.
The Coordinate System
To understand how AI thinks, we have to look at how humans classify the world. Imagine describing a person using only three numbers: Height (cm), Weight (kg), and Age (years).
If we represent three people using this exact format:
- Alex: [180, 75, 25]
- Marcus: [178, 74, 26]
- Sarah: [165, 60, 30]
This is only a three-number toy example. In reality, AI works in 384+ dimensions, making these clusters far more precise. But even here, Alex and Marcus land right next to each other, while Sarah sits far away.
The computer didn't need to look at their pictures. It didn't need to interview them. By seeing that their numbers are mathematically close, it deduced that they share similar physical characteristics. This flat list of numbers [180, 75, 25] is called a Vector.
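To make "mathematically close" concrete, here is a minimal sketch that measures the straight-line (Euclidean) distance between the three people above. Using Euclidean distance here is an assumption made for illustration; the article switches to cosine similarity further down.

```python
import math

# Each person is a vector: [height_cm, weight_kg, age_years]
people = {
    "Alex":   [180, 75, 25],
    "Marcus": [178, 74, 26],
    "Sarah":  [165, 60, 30],
}


def distance(a, b):
    """Straight-line (Euclidean) distance between two vectors."""
    return math.dist(a, b)


alex_marcus = distance(people["Alex"], people["Marcus"])
alex_sarah = distance(people["Alex"], people["Sarah"])

print(f"Alex vs Marcus: {alex_marcus:.2f}")  # 2.45
print(f"Alex vs Sarah:  {alex_sarah:.2f}")   # 21.79
```

The smaller the number, the more alike the two people are on these three measurements.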
"Cairo sits at (30.04°N, 31.23°E). Paris sits at (48.85°N, 2.35°E). Nearby cities have similar coordinates, while Cairo and Tokyo are far apart."
Words with similar meaning have similar coordinates (vectors) — just mapped in a 384-dimensional space instead of 2. The closer the coordinates, the closer the meaning.
Vectors work — but three dimensions aren't nearly enough to capture the complexity of real-world meaning.
The 3D Limit
Vectors are great, but three numbers can't describe the complexity of the real world. Suppose we represent smart devices in a 3D vector space using [Price ($), Weight (g), Features Count]:
- Vector: [549, 48, 5] - Price: $549, Weight: 48g, Features: 5
- Vector: [449, 72, 4] - Price: $449, Weight: 72g, Features: 4
- Vector: [349, 3, 4] - Price: $349, Weight: 3g, Features: 4
The AI looks at the math: the Ray-Bans and the Xreal glasses cluster close together, while the Galaxy Ring is far away. It has successfully grouped the eyewear together — without ever being told what "glasses" are.
But here is the fatal flaw: What if we need to encode battery life? Camera quality? Multilingual translation capability? We can't capture all of that in just three dimensions. To truly capture the meaning and context of a word or an object, we need more numbers. A lot more.
This is where it gets interesting — instead of humans picking the dimensions, we let the AI discover them automatically.
From Vector to Embedding: The Masterpiece
When humans pick the properties (Height, Weight, Price), it's a basic Vector — subjective and limited. When the AI reads the entire internet and mathematically decides on the properties itself? That's an Embedding.
Instead of 3 properties, the model assigns a dense list of 384 to 3,072 numbers to form a digital fingerprint of meaning:
- Small Models: 384 dimensions
- Medium Models: 768 dimensions
- Large Models: 3,072 dimensions
So "Smart Glasses" might become [0.012, -0.443, 0.891, 0.112 ... 380 more dimensions]. Each of those hundreds of numbers captures some aspect of meaning the AI learned from reading the internet. We humans don't even know exactly what dimension #247 represents. It might encode "electronic-ness", while #89 might encode "wearable-ness".
We don't actually know these labels — the AI invented these concepts itself. We only see the numbers. But by analyzing patterns across billions of pages, the model mathematically organizes words so that related concepts align.
How AI Learns Meaning: Contrastive Learning
You might wonder: who decides what dimension #89 means? Nobody. The model learns these dimensions automatically through a process called Contrastive Learning. During training, it's shown millions of sentence pairs with labels:
The Contrastive Training Loop
- Shown paired examples: "I need coffee" + "Necesito café" or "Smart glasses" + "AR eyewear".
- Action: The AI pulls their vector coordinates closer together in space.
- Shown opposing examples: "I need coffee" + "Je veux dormir" (I want to sleep) or "Smart glasses" + "Running shoes".
- Action: The AI pushes their vector coordinates far apart in space.
After billions of such iterations, the model develops internal dimensions that capture meaning — not because anyone defined them, but because that geometry is the only way to satisfy all the constraints simultaneously. The coordinates stabilize, creating a highly organized map of human concepts.
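The pull and push idea can be sketched in a few lines. This is a toy, not a real training loop: actual models update neural network weights by gradient descent on a contrastive loss, whereas this sketch moves three points directly in a 2-D space. The starting coordinates, `LEARNING_RATE`, and `MARGIN` are all hypothetical values picked for illustration.

```python
import math

# Toy 2-D starting positions (hypothetical values chosen for illustration).
# Note that the "wrong" pair starts out closer together than the "right" pair.
vectors = {
    "I need coffee":  [0.0, 0.0],
    "Necesito café":  [1.0, 0.0],
    "Je veux dormir": [0.0, 0.5],
}

positive_pairs = [("I need coffee", "Necesito café")]    # same meaning
negative_pairs = [("I need coffee", "Je veux dormir")]   # different meaning

LEARNING_RATE = 0.1
MARGIN = 2.0  # stop pushing a negative pair once it is this far apart


def update(a, b, pull):
    """Nudge the vectors for phrases a and b together (pull) or apart (push)."""
    va, vb = vectors[a], vectors[b]
    if not pull and math.dist(va, vb) >= MARGIN:
        return  # negative pair is already far enough apart
    sign = 1.0 if pull else -1.0
    for i in range(len(va)):
        delta = LEARNING_RATE * sign * (vb[i] - va[i])
        va[i] += delta  # a moves toward b when pulling, away when pushing
        vb[i] -= delta  # b mirrors the move


for _ in range(200):
    for a, b in positive_pairs:
        update(a, b, pull=True)
    for a, b in negative_pairs:
        update(a, b, pull=False)

positive_distance = math.dist(vectors["I need coffee"], vectors["Necesito café"])
negative_distance = math.dist(vectors["I need coffee"], vectors["Je veux dormir"])
print(f"coffee vs café:   {positive_distance:.3f}")
print(f"coffee vs dormir: {negative_distance:.3f}")
```

After the loop, the matching pair sits almost on top of each other while the mismatched pair has been pushed out past the margin, even though it started closer.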
Now we can see what that learned geometry actually looks like across languages.
The Cross-Language Magic: Cosine Similarity
To compare vectors in high-dimensional space, we use a formula called Cosine Similarity. It measures the cosine of the angle between two vectors, so it depends on the direction they point in and not on their length:
- 1.0: Pointing in the exact same direction (identical meaning).
- 0.0: Orthogonal/unrelated.
- -1.0: Pointing in opposite directions.
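The formula is short enough to write by hand: divide the dot product of the two vectors by the product of their lengths. Below is a minimal pure-Python sketch. The function name `cosine` and the sample vectors are chosen for illustration only; in practice you would use a library routine.

```python
import math


def cosine(a, b):
    """cos(angle) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same = cosine([3, 4], [6, 8])        # same direction, different length
unrelated = cosine([1, 0], [0, 1])   # at right angles
opposite = cosine([3, 4], [-3, -4])  # pointing the other way

print(same)       # 1.0
print(unrelated)  # 0.0
print(opposite)   # -1.0
```

Notice that [3, 4] and [6, 8] score 1.0 even though one is twice as long: only the direction matters.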
Here are real similarity scores calculated by a multilingual embedding model:
Notice the last pair: they share zero keywords, yet the similarity is 91% because the semantic intent is identical. This is the superpower of semantic search.
Same meaning, nearly the same numbers, in any language. The AI isn't translating "café" to "coffee." It learned from its training data that "café" appears in the same kinds of contexts as "coffee" does, so their digital fingerprints end up nearly identical. This works because the model was trained on millions of multilingual sentence pairs where the same idea appears side by side, and the contrastive learning described above forces those equivalents toward the same point in embedding space.
💡 Quick Cosine Similarity Reference
| Score | Meaning | Example |
|---|---|---|
| 0.95+ | Identical meaning in different words/languages | "I need coffee" vs "Necesito café" |
| 0.70 – 0.90 | Highly related concepts | "Smart glasses" vs "AR eyewear" |
| 0.30 – 0.60 | Loosely related concepts | "Smart glasses" vs "Wearable tech" |
| < 0.30 | Unrelated concepts | "I need coffee" vs "Je veux dormir" |
Now that we understand how meaning is encoded in numbers, let's see why this shattered traditional search.
Why This Changed the Internet: Semantic Search
Having a map of meaning changes how we search. This single idea killed traditional keyword search. Think about the old internet — if you searched an online store for "device for my eyes":
Search Evolution
Keyword search (the old way):
- Query: "device for my eyes"
- Logic: Find exact letter match.
- Result: ❌ 0 Matches (the word 'Glasses' doesn't contain 'device').
Semantic search (the new way):
- Query: "device for my eyes"
- Logic: Find nearest vector neighbors.
- Result: ✅ Ray-Ban Glasses (Similarity: 95%).
This is the same basic idea behind recommendation feeds like TikTok's and Netflix's: your watch history is converted into a vector, and the system finds the content whose vectors are nearest to yours in the coordinate space.
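Here is a rough sketch of that recommendation idea. Everything in it is hypothetical: the titles are made up, the three-number vectors are hand-written stand-ins for learned embeddings, and averaging the watch history into one "taste" vector is just one simple way to represent a user.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical titles with hand-made vectors (real systems use learned embeddings).
watched = {
    "Space Battle 3": [0.9, 0.1, 0.0],
    "Heist Night":    [0.7, 0.3, 0.0],
}
candidates = {
    "Robot Uprising": [0.8, 0.1, 0.1],
    "Love in Paris":  [0.1, 0.9, 0.0],
    "Ocean Life":     [0.0, 0.1, 0.9],
}

# Turn the watch history into one "taste" vector by averaging its items.
history = list(watched.values())
user_vector = [sum(dim) / len(history) for dim in zip(*history)]

# Rank unseen titles by how close they sit to the user's taste vector.
recommendations = sorted(
    candidates,
    key=lambda title: cosine(user_vector, candidates[title]),
    reverse=True,
)
print(recommendations[0])  # Robot Uprising
```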
Code Example: Running Semantic Search in Python
Are you a developer? You can run a full multilingual embedding model and query it locally on your machine for free. We'll use an open-source model called paraphrase-multilingual-MiniLM-L12-v2 — it supports 50 languages and outputs a 384-dimensional vector.
```python
# Install dependencies first:
# %pip install sentence-transformers scikit-learn

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load a lightweight, multilingual embedding model
encoder = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# Our catalog of devices
devices = [
    "Ray-Ban Meta Ultra - Smart glasses with camera and translation",
    "Sony LinkBuds Open - Lightweight earbuds with translation",
    "Garmin Fenix 9 - Rugged sports watch for running",
]

# Step 1: Convert catalog sentences into vectors (embeddings)
device_embeddings = encoder.encode(devices)

# Step 2: Encode the user search query
query = "I want something very light for translating that isn't too expensive"
query_embedding = encoder.encode(query)

# Step 3: Compute similarity scores between query and catalog
scores = cosine_similarity([query_embedding], device_embeddings)[0]

# Step 4: Output the ranked results, highest score first
print(f"Query: \"{query}\"\n")
ranked = scores.argsort()[::-1]
for i in ranked:
    print(f" {scores[i]:.3f} {devices[i]}")
```

Output:

```
Query: "I want something very light for translating that isn't too expensive"

 0.421 Sony LinkBuds Open - Lightweight earbuds with translation
 0.287 Ray-Ban Meta Ultra - Smart glasses with camera and translation
 0.031 Garmin Fenix 9 - Rugged sports watch for running
```

The AI ranked the Sony LinkBuds first even though the user never typed "Sony", "earbuds", or "LinkBuds". The model aligned the query to the correct product because of the semantic relationship between light/translating and lightweight/translation, and the Garmin sports watch scored almost zero.
The Heart of RAG
If you are building AI applications today, you are likely building something called RAG (Retrieval-Augmented Generation) — the process of giving an LLM access to your private data (PDFs, databases) so it can answer questions grounded in your facts. Embeddings are the silent engine behind it.
You can't do RAG without embeddings. You can't do modern search without embeddings. The AI does not understand words — it understands the distance between numbers in a high-dimensional space. And that turns out to be a far more powerful way to understand the universe.
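To show where embeddings sit in a RAG pipeline, here is a minimal sketch of the retrieval step followed by prompt assembly. The chunk texts, the hand-written vectors, and the helper names `retrieve` and `build_prompt` are all hypothetical. In a real system the vectors would come from an embedding model, and the final prompt would be sent to an LLM, which is left out here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(query_vector, chunks, k=1):
    """Return the texts of the k chunks most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vector, chunk["vector"]),
        reverse=True,
    )
    return [chunk["text"] for chunk in ranked[:k]]


def build_prompt(question, context_chunks):
    """Put the retrieved text in front of the question for the LLM."""
    context = "\n".join(f"- {text}" for text in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical private documents with pre-computed (hand-made) vectors.
chunks = [
    {"text": "Refunds are accepted within 30 days.",     "vector": [0.9, 0.1, 0.0]},
    {"text": "Shipping takes 3 to 5 business days.",     "vector": [0.1, 0.9, 0.1]},
    {"text": "Our office is closed on public holidays.", "vector": [0.0, 0.2, 0.9]},
]

question = "Can I return my order?"
question_vector = [0.8, 0.2, 0.1]  # hypothetical embedding of the question

prompt = build_prompt(question, retrieve(question_vector, chunks, k=1))
print(prompt)
```

The question shares no keywords with the refund policy, yet that chunk is the one retrieved, because its vector points in nearly the same direction.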
Try It in Your Head
If we pass two words to an embedding model:
- "Apple" (the fruit)
- "Apple" (the tech company)
Will their vectors be close or far apart in the embedding space? Think about the context.
Answer: They will be far apart, as long as the model sees each word in context.
In modern Transformer models, embeddings are contextual. On its own, the bare word "Apple" produces a single vector, so the model needs surrounding text to tell the two senses apart. The word "Apple" (fruit) appears near words like orchard, eat, tree, pie, juice, red. The word "Apple" (company) appears near MacBook, stock, iPhone, App Store, CEO, Tim Cook.
Because the surrounding contexts are completely different, they map to distant points in the coordinate system, despite being spelled exactly the same. This is why semantic search handles ambiguous words far better than keyword search.
Pro Tips for Builders
- ✅ Use the same model for indexing and querying. If you embed your documents with model A and your queries with model B, the scores are meaningless — they live in different spaces.
- ✅ Match the model to your domain. paraphrase-multilingual-MiniLM-L12-v2 is great for general text. For legal, medical, or code, use domain-specific models.
- ✅ Embed at the chunk level, not the document level. A 50-page PDF embedded as one vector loses all its detail. Split into paragraphs first.
- ⚠️ Cosine similarity ≠ probability. A score of 0.85 doesn't mean "85% match." Calibrate thresholds on your own data.
- ⚠️ Bigger isn't always better. A 3,072-dimensional model costs ~8× more to store and query than a 384-dim one, with often marginal accuracy gains for simple retrieval.
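The ~8× figure in the last tip follows from simple arithmetic, sketched below. It assumes each dimension is stored as a 32-bit float and counts only the raw vectors, ignoring any index overhead. The one-million-document corpus is a made-up example size.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each dimension is stored as a 32-bit float


def index_size_gb(num_vectors, dimensions):
    """Raw storage for the vectors alone, in gigabytes (10**9 bytes)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


small = index_size_gb(1_000_000, 384)    # small model
large = index_size_gb(1_000_000, 3072)   # large model

print(f"384-dim:   {small:.3f} GB")       # 1.536 GB
print(f"3,072-dim: {large:.3f} GB")       # 12.288 GB
print(f"ratio:     {large / small:.0f}x") # 8x
```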
Key Takeaways
- ✓ Embeddings convert text, images, and other data into lists of numbers called vectors.
- ✓ Words and phrases with similar semantic meaning sit close together in the vector space.
- ✓ AI learns these dimensions automatically via Contrastive Learning on billions of text examples.
- ✓ Cosine Similarity measures the cosine of the angle between vectors (1.0 = identical meaning, 0.0 = unrelated).
- ✓ Embeddings power Semantic Search, Recommendation Systems, and Retrieval-Augmented Generation (RAG).
- ✓ You can download and run multilingual embedding models locally on your computer for free.
Try It Yourself
Run the code above and try these three experiments to build your intuition:
- Spanish query test: Change the query to "Quiero algo ligero para traducir que no sea muy caro". Do the Sony LinkBuds still rank first? They should: same meaning, different language.
- The Apple test: Add two new devices, "Apple fruit - fresh red apple from the orchard" and "Apple MacBook - laptop computer". Then query "I want something to eat". Which Apple lands closer?
- Swap the model: Replace paraphrase-multilingual-MiniLM-L12-v2 with all-MiniLM-L6-v2 (English only, faster). Run the Spanish query and watch the score collapse, proving the multilingual model is doing real cross-language work, not keyword matching.
The complete Jupyter notebook for this article — all examples, the cosine similarity visualiser, and the semantic search engine — lives in the AI Fundamentals repository. Open Vectors.ipynb →
Up Next in the Series
Part 3 — Vector Databases: Now that you know what embeddings are, discover how Vector Databases index and query millions of high-dimensional vectors in under a millisecond. Read Part 3 →
Coming later — RAG: Give Your AI a Memory: How to connect any LLM to your private data using the embedding pipeline you just learned.