THE LANGUAGE OF AI
Tokens
"The AI doesn't read words. It reads numbers."
Did you know that AI doesn't actually read your words at all? To be more precise: it doesn't see the letters or words you type. It sees something else entirely.
That something is called a Token.
The Journey of Your Words Inside an AI
Before we dive into what a token is, let's trace what actually happens the moment you send a message to an AI.
The moment you send any text to an AI, it runs a process called Tokenization — splitting your text into small units called tokens. Those tokens are converted into numbers. The model processes those numbers, understands the relationships between them, figures out what response fits, and converts its output back into words.
That's the complete pipeline. Every single time.
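Here is that round trip in miniature. This toy sketch uses a made-up five-entry vocabulary and invented IDs (real models have vocabularies of 100K+ entries), but the shape of the pipeline is the same:

```python
# Toy pipeline: text -> tokens -> numeric IDs -> back to text.
# Note the leading spaces: in real tokenizers the space before a word
# is usually part of the token itself.
vocab = {"smart": 101, " glasses": 102, " for": 103, " $": 104, "549": 105}
id_to_token = {i: t for t, i in vocab.items()}

def encode(tokens):
    """Map each token string to its numeric ID."""
    return [vocab[t] for t in tokens]

def decode(ids):
    """Map numeric IDs back to token strings and join them."""
    return "".join(id_to_token[i] for i in ids)

tokens = ["smart", " glasses", " for", " $", "549"]
ids = encode(tokens)
print(ids)          # [101, 102, 103, 104, 105]  <- what the model sees
print(decode(ids))  # smart glasses for $549     <- what you see
```

The model only ever operates on the middle representation, the list of numbers.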
What Is a Token?
When you read a sentence, you read it word by word. The AI doesn't work that way — it cuts the text into these small token pieces.
A Real Example: 6 Words, 10 Tokens
Take this sentence: "Ray-Ban Meta Ultra smart glasses for $549"
You might count 6 words. But here's what the AI actually sees:
Even spaces before words, hyphens, and currency symbols become their own tokens.
| Token | What it is |
|---|---|
| Ray | Part of a compound word |
| - | The hyphen, treated separately |
| Ban | The second part |
| Meta | A word (note the leading space) |
| Ultra | A word |
| smart | A word |
| glasses | A word |
| for | A word |
| $ | The currency symbol, its own token |
| 549 | The number |
Why Does It Cut Text This Way?
The technique is called Subword Tokenization. The model learned which sequences appear most frequently in its training data, and common ones get their own single token. Rare or complex ones get split.
✅ Simple words → 1 token
⚡ Complex words → many tokens
Arabic words fall into the complex category because of the lack of Arabic training data compared to English. Less data → fewer complete Arabic words in the vocabulary → more aggressive splitting.
The exact splits depend on the specific tokenizer's learned vocabulary. The splits shown here are illustrations — the actual tokenizer may differ by model.
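The splitting itself can be sketched as a greedy longest-match against a vocabulary. The vocabulary below is made up for illustration, not any real model's, but it shows why common sequences survive whole while rare ones get shredded:

```python
# Toy subword tokenizer: greedy longest-match against a tiny hypothetical
# vocabulary. Common sequences stay whole; anything the vocabulary doesn't
# cover falls back to single characters.
VOCAB = {"token", "ization", "iz", "ation"}

def tokenize(text):
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at i that is in the vocabulary
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                pieces.append(text[i:j])
                i = j
                break
        else:
            pieces.append(text[i])  # unknown: fall back to one character
            i += 1
    return pieces

print(tokenize("tokenization"))  # ['token', 'ization']      <- 2 tokens
print(tokenize("ionization"))    # ['i', 'o', 'n', 'ization'] <- shredded
```

A word well covered by the vocabulary costs 2 tokens; a rarer word of similar length costs 4. That is exactly the English-vs-Arabic effect in miniature.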
How the Vocabulary Was Actually Built: Byte-Pair Encoding
The vocabulary of tens of thousands of tokens didn't appear by magic. It was learned from data using an algorithm called Byte-Pair Encoding (BPE) — and understanding it explains why some words are a single token and others get shredded into pieces.
The BPE Algorithm — 3 Steps
BPE in action — building "the" from characters
This is why English has a richer vocabulary than Arabic in any given tokenizer — the BPE algorithm found and encoded more frequent English patterns because there was more English text to learn from.
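The three steps can be sketched in a few lines of Python. This is a minimal illustration on a six-word toy corpus, not a production implementation:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Minimal BPE sketch: (1) start from single characters,
    (2) count adjacent pairs across the corpus,
    (3) merge the most frequent pair, then repeat."""
    corpus = [tuple(w) for w in words]   # each word as a tuple of chars
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word in corpus:
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = best[0] + best[1]
        # Replace every occurrence of the winning pair with the merged symbol
        new_corpus = []
        for word in corpus:
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_corpus.append(tuple(out))
        corpus = new_corpus
    return merges, corpus

# "the" dominates this toy corpus, so its characters get merged first
words = ["the", "the", "the", "then", "they", "than"]
merges, corpus = bpe_train(words, 2)
print(merges)   # [('t', 'h'), ('th', 'e')]  -> "the" becomes one token
print(corpus)
```

Run the same loop over billions of words instead of six, and the merges that win are exactly the frequent patterns of the dominant language in the training data.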
Impact #1: Cost
Every AI model charges you per token — not per message, not per word. Per token.
Here's what the major models cost as of early 2026:
Pricing as of March 2026. Gemini 2.5 Pro costs double for prompts exceeding 200K tokens. Always check official provider pages for latest rates.
Notice how every model charges separately for Input (what you send) and Output (what the AI replies). You're paying for both directions. And notice the range — GPT-4o mini at $0.15 input vs Claude Opus at $5.00. Understanding tokens lets you pick the right model for the right job.
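To make that range concrete, here is a quick sketch pricing the same input workload at the two input rates quoted above. Output pricing is omitted for simplicity, and you should always check the providers' pages for current rates:

```python
# Input prices per 1M tokens, taken from the comparison above.
PRICE_PER_1M_INPUT = {
    "gpt-4o-mini": 0.15,
    "claude-opus": 5.00,
}

def input_cost(tokens, model):
    """Dollar cost of sending `tokens` input tokens to `model`."""
    return tokens / 1_000_000 * PRICE_PER_1M_INPUT[model]

# Same 1M-token workload, two very different bills:
for model in PRICE_PER_1M_INPUT:
    print(f"{model}: ${input_cost(1_000_000, model):.2f}")
# gpt-4o-mini: $0.15
# claude-opus: $5.00
```

A 33x price gap on identical input is why model choice is a token question, not just a quality question.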
The English vs. Arabic Gap
Here's where it gets real for Arabic speakers: the same content typically consumes roughly 50% more tokens in Arabic than in English.
Not because Arabic is "worse", but purely because the tokenizer has seen less Arabic text and hasn't built as rich a vocabulary for it.
This gap has been improving with newer models. GPT-4o's tokenizer significantly improved Arabic efficiency vs. older models. The 50% figure is a reasonable average — actual overhead depends on the model and text type.
What This Means in Real Money
Say you send 100 messages a day for a month on a mid-range model:
🇺🇸 English
100 msg × 30 days × ~1,500 tokens
= 4.5 million tokens / month
🇪🇬 Arabic
100 msg × 30 days × ~2,300 tokens
= 6.9 million tokens / month
Scale that to 500–1000 messages/day and it becomes hundreds of dollars.
Calculation assumes GPT-4o pricing with ~70% input / 30% output token split. Actual costs vary by message structure and model.
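The arithmetic in that note can be written out directly. The per-million rates below are illustrative assumptions for a mid-range model, not official pricing:

```python
# Reproduce the monthly-volume math above, then price it.
MESSAGES_PER_DAY, DAYS = 100, 30
PRICE_IN, PRICE_OUT = 2.50, 10.00        # $ per 1M tokens (assumed rates)
INPUT_SHARE, OUTPUT_SHARE = 0.70, 0.30   # ~70% input / 30% output split

def monthly_cost(tokens_per_msg):
    total = MESSAGES_PER_DAY * DAYS * tokens_per_msg
    cost = (total * INPUT_SHARE * PRICE_IN +
            total * OUTPUT_SHARE * PRICE_OUT) / 1_000_000
    return total, cost

for lang, per_msg in [("English", 1_500), ("Arabic", 2_300)]:
    total, cost = monthly_cost(per_msg)
    print(f"{lang}: {total / 1e6:.1f}M tokens -> ${cost:.2f}/month")
```

Same 100 messages a day, but the Arabic bill runs roughly 50% higher, purely from the token overhead.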
Impact #2: Context Window
Every AI model has a maximum number of tokens it can see at one time. This is called the Context Window — think of it like the model's working RAM.
As of 2026, most frontier models (GPT-4o, Claude Sonnet, Gemini 2.5) have 128K–1M context windows by default.
When the context window fills up, the AI can't process anything beyond it — it forgets the older parts of your conversation.
A bigger context window = the AI remembers more of your conversation. But the more you send, the more tokens you consume, the more you pay.
Impact #3: Speed
Here's something most people don't think about:
The AI doesn't generate its entire response at once. It produces one token at a time, and each token feeds into the next.
This is why you see text streaming word by word in chat interfaces. The model literally can't generate the full response in one shot — it's always token by token.
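A toy version of that loop, using a hypothetical lookup table instead of a real model's probability distribution over the whole vocabulary:

```python
# Toy autoregressive generation: a fake "model" that only knows which
# token tends to follow which. Real models compute probabilities over
# 100K+ tokens at every step, but the one-at-a-time loop is the same.
NEXT = {"smart": "glasses", "glasses": "for", "for": "$549"}

def generate(prompt_token, max_tokens=10):
    output = [prompt_token]
    while output[-1] in NEXT and len(output) < max_tokens:
        output.append(NEXT[output[-1]])  # each token feeds into the next
    return output

print(" ".join(generate("smart")))  # smart glasses for $549
```

Because step N+1 cannot start until token N exists, response time grows with output length, which is also why capping output tokens speeds things up.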
Every Model Has Its Own Vocabulary
Different AI models don't share the same tokenizer. Each has its own vocabulary — a dictionary of tens of thousands to hundreds of thousands of word-pieces, each mapped to a unique numeric ID.
GPT-4o uses ~200,000 vocabulary entries. Older GPT models used ~100,000.
So the word "smart" might map to completely different numbers depending on the model:
Model A
"smart" = #10,119
Model B
"smart" = #28,644
This means token counts aren't directly comparable across models. A "1M token" context in one model doesn't hold the same amount of real text as a "1M token" context in another.
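A small illustration with two invented vocabularies (the IDs echo the examples above but are otherwise made up):

```python
# Two hypothetical model vocabularies. Same word, different ID; and a word
# that is one token in model A must be split into two in model B.
VOCAB_A = {"smart": 10_119, "glasses": 7_021}
VOCAB_B = {"smart": 28_644, "glass": 3_002, "es": 415}

print(VOCAB_A["smart"], VOCAB_B["smart"])   # different IDs for "smart"
print([VOCAB_A["glasses"]])                 # 1 token in model A
print([VOCAB_B["glass"], VOCAB_B["es"]])    # 2 tokens in model B
```

So "glasses" costs one token on one model and two on the other; multiplied over a long document, that is why the same text yields different token bills on different models.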
Special Tokens: The Hidden Language
Beyond regular text tokens, every model uses special tokens — reserved entries that signal structure, boundaries, and roles. You don't type them, but they're always there, and they count against your token budget.
Common Special Tokens
<|endoftext|> : marks the end of a document or sequence (GPT-family models)
[CLS] / [SEP] : classification and separator markers (BERT-family models)
<|im_start|> : opens a chat message in the ChatML conversation format
[PAD] : filler used to pad sequences to a uniform length
When you send a message with role: "user" or write a system prompt, boundary tokens like these are automatically inserted around your text, and they count.
See It in Code (Python)
%pip install -q tiktoken
import tiktoken
tokenizer = tiktoken.encoding_for_model("gpt-4o")
text_en = "Ray-Ban Meta Ultra smart glasses with 48MP camera"
tokens_en = tokenizer.encode(text_en)
print(f"English token count: {len(tokens_en)}") # ~12
text_ar = "نظارة ذكية خفيفة بكاميرا وترجمة فورية"
tokens_ar = tokenizer.encode(text_ar)
print(f"Arabic token count: {len(tokens_ar)}") # ~15
And to see exactly what the AI sees — the numeric IDs behind each token:
for t in tokens_en:
    print(f" ID {t:>6} → '{tokenizer.decode([t])}'")
Output:
ID 27438 → ' Meta' ID 38414 → ' Ultra' ID 8847 → ' smart'
ID 40081 → ' glasses' ID 483 → ' with' ID 3519 → '48'
ID 9125 → 'MP' ID 9427 → ' camera'
Compare the same concept across languages:
concept = {
"English": "smart glasses with camera",
"Arabic": "نظارة ذكية بكاميرا",
"Chinese": "带摄像头的智能眼镜",
"Japanese": "カメラ付きスマートグラス",
}
print(f"{'Language':<10} {'Tokens':>7}")
print("-" * 20)
for lang, text in concept.items():
    n = len(tokenizer.encode(text))
    print(f"{lang:<10} {n:>7}")
# English 5 ← most efficient (largest training corpus)
# Arabic 9 ← ~80% overhead
# Chinese 7 ← moderate (improved in recent models)
# Japanese 10 ← kanji + katakana = complex splitting
3 Ways to Save Tokens (and Money)
1. Write your prompts in English, ask for a reply in Arabic
You save tokens on the input side — usually the larger part of a conversation. Your answer still comes back in Arabic, but your question used far fewer tokens to get there.
2. Be direct and concise
A short, specific prompt outperforms a long, rambling one — and uses fewer tokens. Every extra sentence costs money. A well-crafted prompt can save you tens of dollars a month at scale.
3. Specify the response length
Tell the AI exactly how long you want the answer: "respond in 3 bullet points" or "summarize in 2 sentences." This directly controls how many output tokens the model generates.
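As a rough sanity check before sending, the common heuristic of about 4 characters per token for English can flag a bloated prompt. This is an approximation only; use the model's real tokenizer for exact counts:

```python
# Rough token estimate via the ~4-characters-per-token heuristic (English).
def estimate_tokens(text):
    return max(1, len(text) // 4)

rambling = ("I was wondering if you could maybe help me out, when you have "
            "a moment, with putting together some kind of summary of the "
            "article that I am going to paste below, thanks so much!")
concise = "Summarize the article below in 3 bullet points."

print(estimate_tokens(rambling), "vs", estimate_tokens(concise))
```

Both prompts ask for the same thing; the concise one costs a fraction of the tokens and, as a bonus, usually gets a more focused answer.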
Pro Tips for Builders
💡 What Knowing Tokens Changes For You as a Developer
Count tokens before you send, not after you pay. Use tiktoken (OpenAI) or the model provider's tokenizer to pre-count your prompts in code. Build token budgeting into your app's logic from day one — it's far harder to retrofit later.
Use max_tokens to cap your output costs. The API parameter max_tokens (or max_completion_tokens) hard-limits the response length. Without it, an open-ended question can return a 2,000-token essay when you needed 50 words.
System prompts are not free. A detailed system prompt of 500 words costs 600–800 tokens — on every single request. Distill it to its most essential constraints, or use prompt caching where the provider supports it.
Match model to task, not to default. A summarization task that costs $0.50/day on Claude Opus costs $0.015/day on GPT-4o mini — with similar quality for simple tasks. Token pricing is the key variable in that decision.
Conversation history multiplies your input token cost. In a chat app, every message in the history is re-sent with each new request. A 20-turn conversation means the 20th message costs 20× more input tokens than the 1st. Implement a rolling window or summarization strategy early.
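The rolling-window idea from the last tip can be sketched like this. In production the per-message token counts would come from the model's real tokenizer rather than being hardcoded:

```python
# Rolling context window: keep only the newest messages whose combined
# token count fits within the budget.
def fit_to_budget(history, budget):
    """history: list of (message, token_count) pairs, oldest first.
    Returns the most recent messages that fit within `budget` tokens."""
    kept, used = [], 0
    for msg, n in reversed(history):     # walk newest -> oldest
        if used + n > budget:
            break
        kept.append((msg, n))
        used += n
    return list(reversed(kept))          # restore chronological order

history = [("turn 1", 400), ("turn 2", 300), ("turn 3", 500), ("turn 4", 200)]
print(fit_to_budget(history, 800))  # [('turn 3', 500), ('turn 4', 200)]
```

A common refinement is to pin the system prompt outside the window and summarize the dropped turns into a single short message instead of discarding them entirely.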
The Core Insight
The AI doesn't understand language.
It understands numbers.
Every word you type gets converted to numbers. The model finds the right numbers to respond with. Those numbers become words. That's the entire process — a very sophisticated number-matching machine. Once you understand that, the cost, speed, and memory limits all make intuitive sense: they're all just constraints on how many numbers the model can process at once.
Try It Yourself
The best way to make this click is to see it live. Use OpenAI's free tokenizer tool at platform.openai.com/tokenizer — paste any text and watch it get split token by token, with each token highlighted in a different color.
Try this experiment: enter your full name first in Arabic, then in English. See how many tokens each version produces. If your name is on the longer side, the difference might surprise you.
Feel the cost difference across languages:
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
messages = {
"English": "Please summarize this article in three bullet points.",
"Arabic": "من فضلك لخص هذا المقال في ثلاث نقاط.",
"French": "Veuillez résumer cet article en trois points.",
"Spanish": "Por favor, resume este artículo en tres puntos.",
}
print(f"{'Language':<10} {'Tokens':>8} {'Relative cost':>14}")
print("-" * 34)
base = len(enc.encode(messages["English"]))
for lang, text in messages.items():
    n = len(enc.encode(text))
    print(f"{lang:<10} {n:>8} {n/base*100:>13.0f}%")
# Language Tokens Relative cost
# English 10 100%
# Arabic 11 110%
# French 13 130%
# Spanish 12 120%
Next in AI Fundamentals
Embeddings
The magic that lets the AI understand meaning from those numbers — and why "نظارة" and "glasses" end up representing the same concept even though they look nothing alike.