AI Fundamentals · Part 2 of 4

Part 2 — Embeddings: How AI Understands the World Without Knowing a Single Word

If AI doesn't read words, how does it know that "Apple" the fruit is different from "Apple" the company? The secret is in a galaxy of numbers, hundreds of dimensions deep, called Embeddings.

March 12, 2026
12 min read
#AI · #Embeddings · #Vectors · #Semantic Search · #LLM · #Meaning

Type "apple" into a grocery app → you get fruit.

Type "apple" into a stock app → you get a $3 trillion company.

Same four letters. Completely different meanings. How does the AI tell them apart — without actually "reading" the word?

The answer is a mathematical trick called Embeddings — and it's the silent engine powering every AI tool you use today.

💡

Netflix's own 2016 research estimated $1 billion per year in reduced churn from its recommendation engine — all powered by the same math you're about to learn.

In our last article, we learned that AI breaks down words into numbers called Tokens. But tokens are just IDs. They are meaningless labels. Token #27438 (" Meta") doesn't inherently mean anything to the computer.

So how does the AI actually understand meaning? Welcome to the concept that makes modern AI possible: Embeddings and Vectors.

STEP 1 OF 5

To understand how AI encodes meaning, we first need to understand how numbers can represent similarity.

Welcome to the Coordinate System

To understand how AI thinks, we have to look at how humans classify the world.

If I ask you to describe a person using only three numbers, you might choose:

  1. Height (cm)
  2. Weight (kg)
  3. Age (years)

If we plot three people using these exact metrics:

  • Alex: [180, 75, 25]

  • Sarah: [165, 60, 30]

  • Marcus: [178, 74, 26]

    [Figure: HEIGHT vs WEIGHT, a 2D slice of the three vectors. Sarah plots at [165, 60], Marcus at [178, 74], and Alex at [180, 75], with Alex and Marcus close together.]

    Alex and Marcus land near each other in the space. Sarah is far away. The computer knows they're similar — without seeing a single photo.

    This is a 2D slice — in reality, AI works in 384+ dimensions, making these clusters far more precise.

If you throw these numbers onto a 3D graph, Alex and Marcus will end up right next to each other. Sarah will be further away.

The computer didn't need to look at their pictures. It didn't need to interview them. By seeing that their numbers are mathematically close, the computer deduced that they share similar physical characteristics.

This list of numbers [180, 75, 25] is called a Vector.
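You can verify this "closeness" yourself with nothing but the standard library. A minimal sketch that computes the straight-line (Euclidean) distance between the three vectors above:

```python
import math

# [height_cm, weight_kg, age_years] from the example above
alex   = [180, 75, 25]
sarah  = [165, 60, 30]
marcus = [178, 74, 26]

def distance(a, b):
    """Euclidean distance: how far apart two points sit in the space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(f"Alex vs Marcus: {distance(alex, marcus):.1f}")  # ~2.4 (close)
print(f"Alex vs Sarah:  {distance(alex, sarah):.1f}")   # ~21.8 (far)
```

Alex and Marcus sit roughly 2.4 units apart; Alex and Sarah nearly 22. The numbers alone tell you who is physically similar.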

THE MENTAL MODEL YOU'LL NEVER FORGET

Embeddings are GPS coordinates for meaning.

Cairo sits at (30.04°N, 31.23°E). Paris sits at (48.85°N, 2.35°E). Cities that are geographically close have similar coordinates. Words and sentences with similar meaning have similar embedding coordinates — just in 384 dimensions instead of 2.

STEP 2 OF 5

Vectors work — but 3 dimensions aren't nearly enough to capture the complexity of real-world meaning.

The Problem with 3 Dimensions

Vectors are great, but 3 numbers aren't enough to describe the complexity of the real world.

Let's say we are describing smart devices to an AI using a 3D Vector: [Price, Weight, Number of Features]

  • Ray-Ban Smart Glasses: [549, 48, 5]

  • Xreal Air 3 Glasses: [449, 72, 4]

  • Galaxy Smart Ring: [349, 3, 4]

The AI looks at the math: The Ray-Bans and the Xreals are close together in the vector space. The Galaxy Ring is far away. The AI correctly groups the glasses together.
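Running the same distance check on these three gadget vectors shows the grouping fall out of the math. This is only a sketch; in a real system you would normalize each dimension first so the large price numbers don't drown out the small weight numbers:

```python
import math

# [price_usd, weight_g, feature_count] for the three devices above
devices = {
    "Ray-Ban Smart Glasses": [549, 48, 5],
    "Xreal Air 3 Glasses":   [449, 72, 4],
    "Galaxy Smart Ring":     [349, 3, 4],
}

def distance(a, b):
    """Euclidean distance between two device vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Print the distance for every pair of devices
names = list(devices)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} <-> {b}: {distance(devices[a], devices[b]):.0f}")
```

The two pairs of glasses come out as the closest pair; the ring is far from both.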

But here is the fatal flaw: What if we need to know about battery life? What about camera quality? Does it translate languages? We can't capture all of that in just 3 dimensions.

To truly capture the meaning and context of a word or an object, we need more numbers. A lot more.

STEP 3 OF 5

This is where it gets interesting — instead of humans picking the dimensions, we let the AI discover them automatically.

From Vector to Embedding: The Masterpiece

When humans pick the properties (Price, Weight), it's a basic Vector. Subjective and limited.

When the AI reads billions of documents and mathematically decides on the properties itself? That is called an Embedding.

🧬
"Smart Glasses"
[0.012, -0.443, 0.891, 0.112 ... 380 More Dimensions]

The AI assigns a dense list of 384 to 3072 numbers to form a Digital Fingerprint.

(small models: 384 dims · medium: 768 dims · large: 3072 dims)

Each of those hundreds of dimensions represents a hidden layer of meaning that the AI learned from reading the internet. We humans don't even know exactly what dimension #247 represents. It might represent "electronic-ness". Dimension #89 might represent "wearable-ness".

Together, these hundreds of numbers form a Digital Fingerprint of Meaning.

WHAT MIGHT THOSE HIDDEN DIMENSIONS MEAN?

  • Dim #89 ("wearable-ness"): 0.92

  • Dim #247 ("electronic-ness"): 0.84

  • Dim #512 ("visual-ness"): 0.71

  • Dim #33 ("edible-ness"): 0.04

We don't actually know the labels — the AI invented these concepts itself. We only see the numbers.

How Does the AI Actually Learn These Numbers?

You might wonder: who decides what dimension #89 means? Nobody. The model learns these dimensions automatically through a process called contrastive learning.

During training, the model is shown millions of sentence pairs with labels:

SIMILAR PAIRS (pull together)

  • "I need coffee" + "Necesito café"
  • "Smart glasses" + "AR eyewear"
  • "How are you?" + "¿Cómo estás?"

→ The model learns to pull their embeddings close together

DIFFERENT PAIRS (push apart)

  • "I need coffee" + "Je veux dormir"
  • "Smart glasses" + "Running shoes"
  • "Buy now" + "Philosophy book"

→ The model learns to push their embeddings far apart

After seeing billions of such pairs, the model develops internal dimensions that capture meaning — not because anyone defined them, but because that geometry is the only way to satisfy all the constraints simultaneously.
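You can caricature that pull/push dynamic in a few lines of plain Python. This is only a geometric toy (real models minimize a contrastive loss such as InfoNCE with gradient descent over millions of parameters); the 2-D starting points are invented, and deliberately start in the "wrong" places:

```python
# Toy 2-D embeddings: the similar pair starts FAR apart,
# the different pair starts CLOSE together
emb = {
    "I need coffee":  [0.0, 0.0],
    "Necesito café":  [1.0, 1.0],
    "Je veux dormir": [0.1, 0.0],
}

def nudge(a, b, lr):
    # lr > 0 moves point a toward point b; lr < 0 moves it away
    emb[a] = [x + lr * (y - x) for x, y in zip(emb[a], emb[b])]

for _ in range(50):  # one crude "training" loop
    nudge("I need coffee", "Necesito café", 0.1)    # pull similar pair together
    nudge("Necesito café", "I need coffee", 0.1)
    nudge("Je veux dormir", "I need coffee", -0.1)  # push different pair apart

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(emb[a], emb[b])) ** 0.5

print(dist("I need coffee", "Necesito café"))   # shrinks toward zero
print(dist("I need coffee", "Je veux dormir"))  # grows with every loop
```

After 50 loops, the geometry reflects the labels, not the starting positions: same-meaning points end up nearly on top of each other.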

STEP 4 OF 5

Now we can see what that learned geometry actually looks like across languages.

The Cross-Language Magic Trick

Let's look inside the brain of an Embedding Model (specifically, a multilingual one).

If we ask the model to convert three sentences into their 384-dimensional Embeddings, and then we check how mathematically similar those lists of numbers are (using Cosine Similarity — a formula that measures the angle between two vectors: 1.0 = identical direction, 0 = unrelated, -1 = opposite):
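Cosine Similarity is simple enough to write by hand. A minimal stdlib version of the formula (the dot product divided by the product of the two vector lengths):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) between vectors a and b:
    1.0 = same direction, 0 = unrelated, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (identical direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal, unrelated)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```

Real embedding libraries run the same formula, just over 384 numbers per vector instead of 2.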

COSINE SIMILARITY — HOW CLOSE ARE THESE MEANINGS?

"Necesito café" vs "I need coffee"
0.97 / 1.0 🟢

Near-perfect match — same meaning, different language

"I need coffee" vs "Je veux dormir" (I want to sleep)
0.12 / 1.0 🔴

Completely unrelated — the math knows it

"reset password" vs "forgot login credentials"
0.91 / 1.0 🟢

Same intent, zero shared words — this is why AI search beats keyword search every time

Same Meaning = Same Numbers. In ANY Language.

The AI isn't translating "café" to "coffee". It just noticed that in its training data, "café" appears in the exact same contexts, next to the same words, as "coffee" does. Therefore, their digital fingerprints are nearly identical.

This cross-language matching works because the model was trained on millions of multilingual sentence pairs — human-translated text where the same idea appears in dozens of languages side by side. The contrastive learning process we described in Step 3 forces "Necesito café" and "I need coffee" toward the same point in embedding space, since they were always labeled as equivalent.

COSINE SCORE QUICK REFERENCE

Score      Meaning                  Example
0.95+      Near-identical meaning   "I need coffee" vs "Necesito café"
0.7–0.9    Closely related          "Smart glasses" vs "AR eyewear"
0.3–0.6    Loosely related          "Smart glasses" vs "Wearable tech"
< 0.3      Unrelated                "I need coffee" vs "Je veux dormir"

STEP 5 OF 5

Now that we understand how meaning is encoded in numbers, let's see why this shattered traditional search.

Why This Changed the Internet: Semantic Search

This math equation changed the world. It killed the traditional "Keyword Search".

Think about the old internet. If you searched an online store for "Device for my eyes".

The Old Way: Keyword Search

Looks strictly for exact letter matches.

🔍 "Device for my eyes"
❌ Result: "0 Matches Found"
(Fails because the word 'Glasses' does not contain 'D-e-v-i-c-e')

The AI Way: Semantic Search

Turns the query into math, then finds the nearest neighbors.

🔍 "Device for my eyes"
↳ [0.88, -0.42, 0.91...]
✅ Result: "Smart Glasses"
(Succeeds because their Vectors are >95% similar)

This is exactly how TikTok's algorithm knows what you want to watch. It's how Netflix recommends movies. Your watch history is turned into a Vector, and they just find the movie Vectors that are closest to yours in the mathematical space.


See The Math in Action (Python)

Are you a developer? You can run this on your laptop right now for free. We will use an open-source model called paraphrase-multilingual-MiniLM-L12-v2. It supports 50 languages and outputs a 384-dimensional vector.


from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the "Compass" (The Embedding Model)
encoder = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# Let's say we have a catalog of devices
devices = [
    "Ray-Ban Meta Ultra - Smart glasses with camera and translation",
    "Sony LinkBuds Open - Lightweight earbuds with translation",
    "Garmin Fenix 9 - Rugged sports watch for running",
]

# Step 1: Convert our catalog into Numbers (Embeddings)
device_embeddings = encoder.encode(devices)
# Now we have our 384-number coordinates for each product

# Step 2: The User searches for something tricky
query = "I want something very light for translating that isn't too expensive"
query_embedding = encoder.encode(query)

# Step 3: Math time! Find the closest matching numbers
scores = cosine_similarity([query_embedding], device_embeddings)[0]

# Step 4: Print ranked results
print(f"Query: \"{query}\"\n")
ranked = scores.argsort()[::-1]
for i in ranked:
    print(f"  {scores[i]:.3f}  {devices[i]}")

Output:

Query: "I want something very light for translating that isn't too expensive"

  0.421  Sony LinkBuds Open - Lightweight earbuds with translation
  0.287  Ray-Ban Meta Ultra - Smart glasses with camera and translation
  0.031  Garmin Fenix 9 - Rugged sports watch for running

The AI ranked the Sony LinkBuds first even though the user didn't type "Sony", "Earbuds", or "LinkBuds". The AI just mathematically aligned "light translating" with "lightweight translation". And the Garmin — a sports watch — scored almost zero.


The Heart of RAG

If you are building AI applications today, you are likely building something called RAG (Retrieval-Augmented Generation). It's the process of giving an AI access to your private company data (PDFs, Databases) so it can answer questions based on your facts.

Embeddings are the engine of RAG. Here's the full pipeline:

📄 Your PDFs (company docs, manuals, data)
  ↓
🧬 Embeddings (convert text → vectors)
  ↓
🗄️ Vector DB (store & index the vectors)
  ↓
📐 Cosine Match (find the closest paragraph)
  ↓
🤖 LLM (GPT-4 writes the answer)
  ↓
✅ Answer (grounded in your data)

The RAG Pipeline — Embeddings power every arrow in this chain.

You can't do RAG without Embeddings. You can't do modern Search without Embeddings.
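Here is the shape of that pipeline in miniature. To keep it runnable with zero downloads, the embed() function below is a deliberately dumb bag-of-words stand-in for a real embedding model (swap in SentenceTransformer.encode() from the code above to get actual semantic matching), and the documents and question are invented:

```python
import math
from collections import Counter

# Toy "documents": in a real RAG system these are chunks of your PDFs
chunks = [
    "Refunds are processed within 14 days of the return request.",
    "The warranty covers hardware defects for two years.",
    "Support is available by email on weekdays.",
]

def embed(text):
    """Stand-in for a real embedding model: a bag-of-words word-count vector.
    A production system would call an embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index step: embed every chunk once and store it (the "vector DB")
index = [(embed(c), c) for c in chunks]

# Retrieve step: embed the question, grab the closest chunk
question = "how long do refunds take"
q_vec = embed(question)
best = max(index, key=lambda pair: cosine(q_vec, pair[0]))[1]

# Generate step: hand the retrieved chunk to the LLM as grounding context
prompt = f"Answer using only this context:\n{best}\n\nQuestion: {question}"
print(prompt)
```

The refund chunk wins the retrieval, and the LLM only ever sees that one paragraph. Embeddings decide what the model is allowed to read.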

The AI does not understand words. It understands the distance between numbers in a space hundreds or thousands of dimensions deep. And that turns out to be a far more powerful way to understand the universe.


Try It In Your Head

If you sent two sentences to an Embedding Model:
"I ate an apple" (Apple the fruit) and "Apple's stock dropped" (Apple the tech company).

Will their Embeddings be mathematically close, or far apart?

Think about the Context!

▶ Reveal the Answer

They would be FAR apart — because context matters.

The word "Apple" (fruit) appears in training data next to words like fruit, tree, juice, red, eat, orchard. The word "Apple" (company) appears next to iPhone, MacBook, Tim Cook, stock, CEO, App Store.

Because the surrounding contexts are completely different, the AI assigns them very different Embeddings — even though the word is spelled identically.

This is the superpower of Embeddings over simple keyword matching.

The same word can have multiple Embeddings depending on its context in the sentence.

This is why semantic search beats keyword search every time — same spelling, completely different meaning, completely different embedding.

PRO TIPS FOR BUILDERS

  • Use the same model for indexing and querying. If you embed your documents with model A and your queries with model B, the scores are meaningless — they live in different spaces.

  • Match the model to your domain. paraphrase-multilingual-MiniLM-L12-v2 is great for general text. For legal, medical, or code, use domain-specific models.

  • Embed at the chunk level, not the document level. A 50-page PDF embedded as one vector loses all its detail. Split it into paragraphs first.

  • ⚠️ Cosine similarity ≠ probability. A score of 0.85 doesn't mean "85% match." Calibrate thresholds on your own data.

  • ⚠️ Bigger isn't always better. A 3072-dimensional model costs 8× more to store and query than a 384-dim model, often with marginal accuracy gains for simple retrieval tasks.
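For the chunking tip, a naive paragraph-level splitter takes only a few lines. This is a sketch: the max_chars limit and the merge-short-paragraphs rule are arbitrary choices, and a single oversized paragraph is passed through unsplit rather than broken mid-sentence:

```python
def chunk_paragraphs(text, max_chars=500):
    """Split a document on blank lines, merging short paragraphs
    together until a chunk would exceed max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if len(current) + len(para) + 2 <= max_chars:
            current = f"{current}\n\n{para}".strip()
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

doc = "Intro paragraph.\n\nSecond paragraph with details.\n\n" + "X" * 600
print(chunk_paragraphs(doc, max_chars=100))
```

Each chunk then gets its own embedding, so retrieval can point at the exact paragraph that answers a question instead of a whole document.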

Common Misconceptions About Embeddings

THINGS THAT SOUND RIGHT BUT AREN'T

❌ "The AI understands meaning"

Embeddings approximate meaning through statistical patterns learned during training. The model has no understanding — it just learned that words appearing in similar contexts should have similar coordinates. It's sophisticated pattern matching, not comprehension.

❌ "The same word always gets the same embedding"

In modern transformer-based models, context changes the embedding. "Apple" in "I ate an apple" and "Apple" in "Apple's stock dropped" produce completely different vectors — as our opening example showed.

❌ "You need embeddings in the same language to compare them"

Multilingual embedding models map all languages into a single shared space. "I need coffee" and "Necesito café" land at nearly the same coordinates (cosine similarity 0.97); the language is irrelevant, only the meaning matters.

Key Takeaways

WHAT YOU LEARNED IN THIS ARTICLE

✅ Embeddings convert text, images, or any data into lists of numbers — vectors
✅ Concepts with similar meaning land close together in vector space — regardless of language
✅ The AI learns what each dimension means automatically through contrastive training on billions of pairs
✅ Cosine similarity measures the angle between two vectors — 0.97 means nearly identical meaning, 0.12 means unrelated
✅ Embeddings power semantic search (find meaning, not keywords), RAG (give AI your private data), and recommendation systems
✅ You can run a full embedding model locally for free with sentence-transformers

Try It Yourself

Run the code above and try these three experiments to build your intuition:

  1. Spanish query test: Change the query to "Quiero algo ligero para traducir que no sea muy caro". Does the Sony LinkBuds still rank first? It should — same meaning, different language.
  2. The Apple test: Add two new devices: "Apple fruit - fresh red apple from the orchard" and "Apple MacBook - laptop computer". Then query "I want something to eat" — which Apple lands closer?
  3. Swap the model: Replace paraphrase-multilingual-MiniLM-L12-v2 with all-MiniLM-L6-v2 (English only, faster). Run the Spanish query — watch the score collapse. This proves the multilingual model is doing real cross-language work, not just keyword matching.

What's Next in the Series

Embeddings are the foundation, and one of the most important building blocks in modern AI. Understanding them unlocks the ability to build semantic search engines, intelligent chatbots, and AI assistants that reason over your own data. Every AI application you build from here will use this math — you now understand what's happening under the hood.

Full Code on GitHub

The complete Jupyter notebook for this article — all examples, the cosine similarity visualiser, and the semantic search engine — is in the AI Fundamentals repository.

Open Vectors.ipynb →

Next in AI Fundamentals

Part 3 — How AI Finds Your Answer in 30 Million Documents

Similarity search: how AI uses embedding vectors to find the most relevant results from millions of documents — in under a second. Coming next week.


Mohamed Hamed

20 years building production systems — the last several deep in AI integration, LLMs, and full-stack architecture. I write what I've actually built and broken. If this was useful, the next one goes to LinkedIn first.

Follow on LinkedIn →