Type "apple" into a grocery app → you get fruit.
Type "apple" into a stock app → you get a $3 trillion company.
Same four letters. Completely different meanings. How does the AI tell them apart — without actually "reading" the word?
The answer is a mathematical trick called Embeddings — and it's the silent engine powering every AI tool you use today.
Netflix's own 2016 research estimated $1 billion per year in reduced churn from its recommendation engine — all powered by the same math you're about to learn.
In our last article, we learned that AI breaks words down into numbers called Tokens. But tokens are just IDs: meaningless labels. Token #27438 (" Meta") doesn't inherently mean anything to the computer.
So how does the AI actually understand meaning? Welcome to the concept that makes modern AI possible: Embeddings and Vectors.
To understand how AI encodes meaning, we first need to understand how numbers can represent similarity.
Welcome to the Coordinate System
To understand how AI thinks, we have to look at how humans classify the world.
If I ask you to describe a person using only three numbers, you might choose:
- Height (cm)
- Weight (kg)
- Age (years)
If we plot three people using these exact metrics:

- Alex: [180, 75, 25]
- Sarah: [165, 60, 30]
- Marcus: [178, 74, 26]

[Chart: HEIGHT vs WEIGHT (2D slice of their vectors). Alex [180, 75] and Marcus [178, 74] sit close together; Sarah [165, 60] is far away.]

Alex and Marcus land near each other in the space. Sarah is far away. The computer knows they're similar — without seeing a single photo.
This is a 2D slice — in reality, AI works in 384+ dimensions, making these clusters far more precise.
If you throw these numbers onto a 3D graph, Alex and Marcus will end up right next to each other. Sarah will be further away.
The computer didn't need to look at their pictures. It didn't need to interview them. By seeing that their numbers are mathematically close, the computer deduced that they share similar physical characteristics.
This list of numbers [180, 75, 25] is called a Vector.
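The "closeness" the computer sees is just the distance formula. A minimal sketch in plain Python, using the numbers from the example above:

```python
import math

# The three people from above, as plain Python lists (vectors)
alex   = [180, 75, 25]
sarah  = [165, 60, 30]
marcus = [178, 74, 26]

def distance(a, b):
    """Straight-line (Euclidean) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(round(distance(alex, marcus), 2))  # ≈ 2.45  (small gap: similar people)
print(round(distance(alex, sarah), 2))   # ≈ 21.79 (large gap: different people)
```

The smaller the distance, the more similar the two vectors. That one idea scales all the way up to real embedding models.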
THE MENTAL MODEL YOU'LL NEVER FORGET
Embeddings are GPS coordinates for meaning.
Cairo sits at (30.04°N, 31.23°E). Paris sits at (48.85°N, 2.35°E). Cities that are geographically close have similar coordinates. Words and sentences with similar meaning have similar embedding coordinates — just in 384 dimensions instead of 2.
Vectors work — but 3 dimensions aren't nearly enough to capture the complexity of real-world meaning.
The Problem with 3 Dimensions
Vectors are great, but 3 numbers aren't enough to describe the complexity of the real world.
Let's say we are describing smart devices to an AI using a 3D Vector: [Price, Weight, Number of Features]
- Ray-Ban Smart Glasses: [549, 48, 5]
- Xreal Air 3 Glasses: [449, 72, 4]
- Galaxy Smart Ring: [349, 3, 4]
The AI looks at the math: The Ray-Bans and the Xreals are close together in the vector space. The Galaxy Ring is far away. The AI correctly groups the glasses together.
But here is the fatal flaw: What if we need to know about battery life? What about camera quality? Does it translate languages? We can't capture all of that in just 3 dimensions.
To truly capture the meaning and context of a word or an object, we need more numbers. A lot more.
This is where it gets interesting — instead of humans picking the dimensions, we let the AI discover them automatically.
From Vector to Embedding: The Masterpiece
When humans pick the properties (Price, Weight), it's a basic Vector. Subjective and limited.
When the AI reads billions of documents and mathematically decides on the properties itself? That is called an Embedding.
Each of those hundreds of dimensions represents a hidden layer of meaning that the AI learned from reading the internet. We humans don't even know exactly what dimension #247 represents. It might represent "electronic-ness". Dimension #89 might represent "wearable-ness".
Together, these hundreds of numbers form a Digital Fingerprint of Meaning.
WHAT MIGHT THOSE HIDDEN DIMENSIONS MEAN?
- "wearable-ness": 0.92
- "electronic-ness": 0.84
- "visual-ness": 0.71
- "edible-ness": 0.04
We don't actually know the labels — the AI invented these concepts itself. We only see the numbers.
How Does the AI Actually Learn These Numbers?
You might wonder: who decides what dimension #89 means? Nobody. The model learns these dimensions automatically through a process called contrastive learning.
During training, the model is shown millions of sentence pairs, each labeled as similar or dissimilar (think of a pair like "I love dogs" / "I adore puppies" labeled similar, and "I love dogs" / "Interest rates rose" labeled dissimilar). The model is rewarded for placing similar pairs close together in the vector space and dissimilar pairs far apart.
After seeing billions of such pairs, the model develops internal dimensions that capture meaning — not because anyone defined them, but because that geometry is the only way to satisfy all the constraints simultaneously.
Now we can see what that learned geometry actually looks like across languages.
The Cross-Language Magic Trick
Let's look inside the brain of an Embedding Model (specifically, a multilingual one).
If we ask the model to convert three sentences into their 384-dimensional Embeddings, and then we check how mathematically similar those lists of numbers are (using Cosine Similarity — a formula that measures the angle between two vectors: 1.0 = identical direction, 0 = unrelated, -1 = opposite):
COSINE SIMILARITY — HOW CLOSE ARE THESE MEANINGS?

- "I need coffee" vs "Necesito café" → ≈ 0.97. Near-perfect match — same meaning, different language.
- Two unrelated sentences → a score near zero. Completely unrelated — the math knows it.
- Two sentences with the same intent but zero shared words → a high score. This is why AI search beats keyword search every time.
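The formula itself fits in one line. A minimal NumPy sketch with toy 3-dimensional vectors (real embeddings simply have many more components):

```python
import numpy as np

def cosine_similarity(a, b):
    """1.0 = same direction, 0 = unrelated (orthogonal), -1 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])

print(cosine_similarity(a, a))                           # ≈ 1.0  (identical direction)
print(cosine_similarity(a, -a))                          # ≈ -1.0 (opposite direction)
print(cosine_similarity(a, np.array([3.0, -1.5, 0.0])))  # 0.0   (orthogonal)
```

Note that cosine similarity only cares about the *angle* between vectors, not their length, which is why it works so well for comparing meanings.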
Same Meaning = Same Numbers. In ANY Language.
The AI isn't translating "café" to "coffee". It just noticed that in its training data, "café" appears in the exact same contexts, next to the same words, as "coffee" does. Therefore, their digital fingerprints are nearly identical.
This cross-language matching works because the model was trained on millions of multilingual sentence pairs — human-translated text where the same idea appears in dozens of languages side by side. The contrastive learning process described above forces "Necesito café" and "I need coffee" toward the same point in embedding space, since they were always labeled as equivalent.
Now that we understand how meaning is encoded in numbers, let's see why this shattered traditional search.
Why This Changed the Internet: Semantic Search
This math equation changed the world. It killed the traditional "Keyword Search".
Think about the old internet. If you searched an online store for "Device for my eyes", here is what happens under each approach:

- Keyword Search: looks strictly for exact letter matches. No product is literally named "device for my eyes", so you get nothing useful.
- Semantic Search: turns your query into math and finds the nearest neighbors in embedding space, so smart glasses surface even though you never typed the word "glasses".
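To see the failure mode concretely, here is a toy keyword matcher over a hypothetical three-item catalog (the product names are made up for illustration):

```python
products = [
    "smart glasses with built-in camera",
    "wireless earbuds with noise cancelling",
    "rugged sports watch",
]
query = "device for my eyes"

# Keyword search: a product matches only if it shares an exact word
# with the query. "eyes" never literally appears next to "glasses".
hits = [p for p in products if set(p.split()) & set(query.split())]
print(hits)  # prints []: zero results, even though the glasses are a perfect fit
```

A semantic search engine, by contrast, would place "eyes" and "glasses" near each other in embedding space and return the glasses first.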
This is exactly how TikTok's algorithm knows what you want to watch. It's how Netflix recommends movies. Your watch history is turned into a Vector, and they just find the movie Vectors that are closest to yours in the mathematical space.
See The Math in Action (Python)
Are you a developer? You can run this on your laptop right now for free. We will use an open-source model called paraphrase-multilingual-MiniLM-L12-v2. It supports 50 languages and outputs a 384-dimensional vector.
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the "Compass" (the Embedding Model)
encoder = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# Let's say we have a catalog of devices
devices = [
    "Ray-Ban Meta Ultra - Smart glasses with camera and translation",
    "Sony LinkBuds Open - Lightweight earbuds with translation",
    "Garmin Fenix 9 - Rugged sports watch for running",
]

# Step 1: Convert our catalog into numbers (Embeddings).
# Now we have our 384-number coordinates for each product.
device_embeddings = encoder.encode(devices)

# Step 2: The user searches for something tricky
query = "I want something very light for translating that isn't too expensive"
query_embedding = encoder.encode(query)

# Step 3: Math time! Find the closest matching numbers
scores = cosine_similarity([query_embedding], device_embeddings)[0]

# Step 4: Print ranked results
print(f'Query: "{query}"\n')
ranked = scores.argsort()[::-1]
for i in ranked:
    print(f"  {scores[i]:.3f}  {devices[i]}")
```
Output:

```
Query: "I want something very light for translating that isn't too expensive"

  0.421  Sony LinkBuds Open - Lightweight earbuds with translation
  0.287  Ray-Ban Meta Ultra - Smart glasses with camera and translation
  0.031  Garmin Fenix 9 - Rugged sports watch for running
```
The AI ranked the Sony LinkBuds first even though the user didn't type "Sony", "Earbuds", or "LinkBuds". The AI just mathematically aligned "light translating" with "lightweight translation". And the Garmin — a sports watch — scored almost zero.
The Heart of RAG
If you are building AI applications today, you are likely building something called RAG (Retrieval-Augmented Generation). It's the process of giving an AI access to your private company data (PDFs, Databases) so it can answer questions based on your facts.
Embeddings are the engine of RAG. Here's the full pipeline: your documents are split into chunks, each chunk is converted into an embedding, and the embeddings are stored in a vector database. At question time, the user's question is embedded too, the closest chunks are retrieved by cosine similarity, and those chunks are handed to the LLM as context for its answer.

The RAG Pipeline — Embeddings power every arrow in this chain.
You can't do RAG without Embeddings. You can't do modern Search without Embeddings.
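Stripped to its core, the retrieval half of RAG is just the nearest-neighbor search from earlier. In this sketch the embedding model is replaced by a toy bag-of-words vectorizer over a made-up vocabulary, so it runs with no downloads; a real pipeline would call an embedding model like the one in the code above:

```python
import numpy as np

# Toy stand-in for an embedding model: word counts over a tiny, made-up
# vocabulary. A real pipeline would call encoder.encode() instead.
VOCAB = ["refund", "policy", "shipping", "days", "battery", "warranty"]

def embed(text):
    words = text.lower().split()
    v = np.array([words.count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

# Step 1 (indexing): embed every document chunk once, up front
docs = [
    "our refund policy allows returns within 30 days",
    "shipping takes 5 business days",
    "the battery carries a two year warranty",
]
doc_vecs = np.stack([embed(d) for d in docs])

# Step 2 (retrieval): embed the question, take the nearest chunk
query = "what is the refund policy"
scores = doc_vecs @ embed(query)  # cosine similarity (vectors are unit length)
best = docs[int(np.argmax(scores))]

# Step 3 (generation): hand the retrieved chunk to the LLM as context
prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
print(best)  # the refund-policy chunk wins
```

Swap the toy `embed` for a real embedding model and the `docs` list for a vector database, and this is the skeleton of every RAG system.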
The AI does not understand words. It understands the distance between numbers in a space of hundreds of dimensions. And that turns out to be a far more powerful way to understand the universe.
Try It In Your Head
If you sent two words to an Embedding Model:
"Apple" (the fruit) and "Apple" (the tech company).
Will their Embeddings be mathematically close, or far apart?
Think about the Context!
They would be FAR apart — because context matters.
The word "Apple" (fruit) appears in training data next to words like fruit, tree, juice, red, eat, orchard. The word "Apple" (company) appears next to iPhone, MacBook, Tim Cook, stock, CEO, App Store.
Because the surrounding contexts are completely different, the AI assigns them very different Embeddings — even though the word is spelled identically.
This is the superpower of Embeddings over simple keyword matching.
The same word can have multiple Embeddings depending on its context in the sentence.
This is why semantic search beats keyword search every time — same spelling, completely different meaning, completely different embedding.
PRO TIPS FOR BUILDERS
paraphrase-multilingual-MiniLM-L12-v2 is great for general text. For legal, medical, or code, use domain-specific models.

Common Misconceptions About Embeddings
THINGS THAT SOUND RIGHT BUT AREN'T
❌ "The AI understands meaning"
Embeddings approximate meaning through statistical patterns learned during training. The model has no understanding — it just learned that words appearing in similar contexts should have similar coordinates. It's sophisticated pattern matching, not comprehension.
❌ "The same word always gets the same embedding"
In modern transformer-based models, context changes the embedding. "Apple" in "I ate an apple" and "Apple" in "Apple's stock dropped" produce completely different vectors — as our opening example showed.
❌ "You need embeddings in the same language to compare them"
Multilingual embedding models map all languages into a single shared space. "I need coffee" and "Necesito café" land at nearly identical coordinates, with a cosine similarity of 0.97 — the language is irrelevant; only the meaning matters.
Key Takeaways
WHAT YOU LEARNED IN THIS ARTICLE
- A Vector is a list of numbers that places something in a coordinate space; things that are close in that space are similar.
- An Embedding is a vector whose dimensions the AI learned itself from billions of documents, instead of humans picking them.
- Cosine Similarity measures how close two meanings are, in any language.
- Semantic search, recommendations, and RAG all run on this same math.
- You can build with it today using the open-source sentence-transformers library.

Try It Yourself
Run the code above and try these three experiments to build your intuition:
- Spanish query test: Change the query to "Quiero algo ligero para traducir que no sea muy caro". Does the Sony LinkBuds still rank first? It should — same meaning, different language.
- The Apple test: Add two new devices: "Apple fruit - fresh red apple from the orchard" and "Apple MacBook - laptop computer". Then query "I want something to eat" — which Apple lands closer?
- Swap the model: Replace paraphrase-multilingual-MiniLM-L12-v2 with all-MiniLM-L6-v2 (English only, faster). Run the Spanish query — watch the score collapse. This proves the multilingual model is doing real cross-language work, not just keyword matching.
What's Next in the Series
Embeddings are the foundation. Here's where they lead:
NEXT IN SERIES · #3 OF 14
How AI Finds Your Answer in 30 Million Documents in Under a Second
You know what embeddings are. Now learn how vector databases search through millions of them in milliseconds.
COMING LATER · #14 OF 14
RAG: Give Your AI a Memory
How to connect any LLM to your private data using the embedding pipeline you just learned.
Embeddings are one of the most important building blocks in modern AI. Understanding them unlocks the ability to build semantic search engines, intelligent chatbots, and AI assistants that reason over your own data. Every AI application you will build from here will use this math — you now understand what's happening under the hood.
The complete Jupyter notebook for this article — all examples, the cosine similarity visualiser, and the semantic search engine — is in the AI Fundamentals repository.
Open Vectors.ipynb →

Next in AI Fundamentals
Part 3 — How AI Finds Your Answer in 30 Million Documents
Similarity search: how AI uses embedding vectors to find the most relevant results from millions of documents — in under a second. Coming next week.