The AI Doesn't Read Words. It Reads Numbers.
Every message you send to an AI is shredded into small numeric units called tokens. Learn why this technical abstraction creates a 'language tax' for Arabic speakers and how to optimize your AI budget.
Did you know that AI doesn't actually read your words at all? To be precise: it doesn't see the letters or words you type. It sees something else entirely — and that something is called a Token.
Text → Tokenize (small pieces) → Encode (numbers) → ⚡ AI Brain → Decode → Final Response.
The moment you send any text to an AI, a fixed pipeline runs. Tokenization splits your text into small units and converts them into numbers; the model processes those numbers and predicts the numbers of a fitting response; decoding converts that output back into words. That's the complete pipeline. Every single time.
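To make the pipeline concrete, here is a minimal sketch with a five-entry toy vocabulary. Everything in it is an assumption for illustration: `TOY_VOCAB`, the greedy longest-match splitting, and `fake_model` are hypothetical stand-ins, not how any production tokenizer or model is implemented.

```python
# Toy illustration of the pipeline. TOY_VOCAB and fake_model are hypothetical
# stand-ins, not a real tokenizer or model.
TOY_VOCAB = {"smart": 0, " glasses": 1, " are": 2, " cool": 3, "<unk>": 4}
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}
PIECES = sorted((t for t in TOY_VOCAB if t != "<unk>"), key=len, reverse=True)


def tokenize(text):
    """Split text by greedily matching the longest known piece at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        for piece in PIECES:
            if text.startswith(piece, pos):
                tokens.append(piece)
                pos += len(piece)
                break
        else:  # no piece matched: emit an unknown marker for this character
            tokens.append("<unk>")
            pos += 1
    return tokens


def encode(text):
    """Tokens -> numeric IDs."""
    return [TOY_VOCAB[token] for token in tokenize(text)]


def decode(ids):
    """Numeric IDs -> text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


def fake_model(prompt_ids):
    """Placeholder for the model: always answers with the IDs for ' are cool'."""
    return [2, 3]


prompt_ids = encode("smart glasses")
reply_ids = fake_model(prompt_ids)
print(prompt_ids)          # [0, 1]
print(decode(reply_ids))   # prints " are cool" (leading space included)
```

The model in the middle only ever receives and returns lists of integers; text exists only at the two ends.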
What is a Token?
A token is the smallest unit an AI processes. It's not necessarily a word — it can be a whole word, part of a word, a punctuation mark, a space, or a number. When you read a sentence, you read it word by word. The AI doesn't work that way — it cuts text into these small token pieces.
Sentence: Ray-Ban Meta Ultra smart glasses for $549
Tokens: "Ray", "-", "Ban", " Meta", " Ultra", " smart", " glasses", " for", "$", "549"
Insight: Spaces, hyphens, and currency symbols are tokens too.
You might count 7 words, price included. The AI sees 10 tokens:
| Token | What it is |
|---|---|
| "Ray" | Part of a compound word |
| "-" | The hyphen, treated separately |
| "Ban" | The second part |
| " Meta" | A word (note the leading space) |
| " Ultra" | A word |
| " smart" | A word |
| " glasses" | A word |
| " for" | A word |
| "$" | The currency symbol, its own token |
| "549" | The number |
Why Does It Cut Text This Way?
The technique is called Subword Tokenization. The model learned which sequences appear most frequently in its training data; common ones get their own single token, while rare or complex ones get split.
Simple vs. Complex Words
Simple words:
- the → 1 token
- cat → 1 token
- hello → 1 token
- Apple → 1 token
Complex words:
- un·forget·table → 3 tokens
- inter·national·ization → 3 tokens
- نظ·ارة → 2 tokens
- ال·ذك·اء → 3 tokens
Arabic words fall into the complex category because tokenizers are trained on far less Arabic text than English text. Less data → fewer complete Arabic words in the vocabulary → more aggressive splitting. (The exact splits depend on each model's learned vocabulary; these are illustrations.)
How the Vocabulary Was Built: Byte-Pair Encoding
The vocabulary of tens to hundreds of thousands of tokens didn't appear by magic. It was learned using an algorithm called Byte-Pair Encoding (BPE), and understanding it explains why some words are a single token while others get shredded.
The BPE Algorithm
1. The initial vocabulary is every single character in the training data: a, b, c… أ, ب, ت… 0, 1, 2…
2. Scan all text, find the two consecutive tokens that appear together most often, and merge them. If "t" + "h" is most frequent, it becomes "th".
3. Keep merging ("th"+"e" → "the", "the"+" " → "the ") until you hit the target vocabulary size (~100K–200K entries).
BPE building "the" from characters:
Start: "t" "h" "e" " " "c" "a" "t"
Merge "t"+"h" → "th" "e" " " "c" "a" "t"
Merge "th"+"e" → "the" " " "c" "a" "t"
Merge "the"+" " → "the " "c" "a" "t"
Result: "the " is ONE token, because it appears millions of times in English.
This is why any given tokenizer has a richer vocabulary for English than for Arabic: BPE found and encoded more frequent English patterns simply because there was more English text to learn from.
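The three steps above can be sketched in a few lines of Python. This is a simplified, character-level toy under stated assumptions: production tokenizers typically work on bytes, pre-split text into words, and train on enormous corpora. The function names `train_bpe` and `apply_merge` and the tiny corpus are made up for this illustration.

```python
from collections import Counter


def apply_merge(tokens, left, right):
    """Replace every adjacent (left, right) pair with the merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to num_merges merge rules from a string (toy version)."""
    tokens = list(corpus)  # step 1: start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))  # step 2: count adjacent pairs
        if not pairs:
            break
        # Ties go to the pair seen first (documented Counter.most_common behavior).
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so further merges are pointless
        merges.append((left, right))
        tokens = apply_merge(tokens, left, right)  # step 3: merge, then repeat
    return merges, tokens


corpus = "the cat the hat the bat"
merges, tokens = train_bpe(corpus, num_merges=3)
print(merges)   # [('t', 'h'), ('th', 'e'), ('the', ' ')]
print(tokens)
```

On this corpus the learned merges reproduce the trace above: "the " ends up as a single token after three merges, while "cat", "hat", and "bat" are still spelled out character by character.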
Impact #1: Cost
AI providers bill API usage per token, not per message and not per word. Input (what you send) and output (what the model replies) are billed separately, and output usually costs more. Here's what the major models cost as of early 2026:
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| GPT-4o mini (budget) | $0.15 | $0.60 |
| Gemini 2.5 Flash | $0.30 | $2.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Opus 4.6 (premium) | $5.00 | $25.00 |
Pricing as of March 2026 — always check official provider pages for current rates. Notice the range: GPT-4o mini at $0.15 input vs Claude Opus at $5.00. Understanding tokens lets you pick the right model for the right job.
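As a rough sketch of how per-token billing works out for a single request, the snippet below applies three rows of the table above. The dictionary keys are informal labels chosen for this example, not official API model identifiers, and the token counts are hypothetical.

```python
# USD per 1M tokens as (input, output), copied from the table above.
# Keys are informal labels for this sketch, not official API model IDs.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-opus-4.6": (5.00, 25.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request; input and output are priced separately."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 1,000 tokens in, 500 tokens out.
for model in PRICES:
    print(f"{model:<16} ${request_cost(model, 1_000, 500):.6f}")
```

For that hypothetical request the cost works out to $0.00045 on GPT-4o mini, $0.0075 on GPT-4o, and $0.0175 on Claude Opus: the same text, roughly a 39× spread.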
The English vs. Arabic Gap
Here's where it gets real for Arabic speakers. The same 100 words of content can consume very different numbers of tokens depending on the language.
You're paying ~50% more for the same content in Arabic — not because Arabic is "worse," but purely because the tokenizer has seen less Arabic text and hasn't built as rich a vocabulary for it. (This gap is improving; GPT-4o's tokenizer is notably more Arabic-efficient than older models.)
What This Means in Real Money
Say you send 100 messages a day for a month on a mid-range model:
Scale that to 500–1000 messages/day and the premium becomes hundreds of dollars — purely from the script your text is written in. (Assumes GPT-4o pricing, ~70% input / 30% output split.)
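A back-of-the-envelope version of that calculation might look like the sketch below. It uses the assumptions stated above (GPT-4o pricing, 70% input / 30% output, ~50% more tokens for Arabic). The figure of 3,000 tokens per exchange is a hypothetical placeholder chosen for this example, so substitute a number measured from your own traffic.

```python
GPT4O_INPUT, GPT4O_OUTPUT = 2.50, 10.00  # USD per 1M tokens, from the pricing table
MESSAGES_PER_DAY = 100
DAYS = 30
INPUT_SHARE = 0.70                 # ~70% input / 30% output split
ARABIC_TOKEN_RATIO = 1.5           # ~50% more tokens for the same content
TOKENS_PER_EXCHANGE_EN = 3_000     # hypothetical: prompt plus reply, in English


def monthly_cost(tokens_per_exchange, messages_per_day=MESSAGES_PER_DAY):
    """Monthly cost in USD for a given average exchange size."""
    total = tokens_per_exchange * messages_per_day * DAYS
    input_tokens = total * INPUT_SHARE
    output_tokens = total - input_tokens
    return (input_tokens * GPT4O_INPUT + output_tokens * GPT4O_OUTPUT) / 1_000_000


english = monthly_cost(TOKENS_PER_EXCHANGE_EN)
arabic = monthly_cost(TOKENS_PER_EXCHANGE_EN * ARABIC_TOKEN_RATIO)
print(f"English: ${english:.2f}/month")
print(f"Arabic:  ${arabic:.2f}/month")
print(f"Premium: ${arabic - english:.2f}/month")
```

Under these placeholder numbers the premium is about $21 a month at 100 messages a day, and it grows linearly with volume, so ten times the traffic means ten times the premium.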
Impact #2: Context Window
Every AI model has a maximum number of tokens it can see at once — the Context Window, the model's working RAM.
- Regular models: 8K–32K tokens ≈ 20–70 pages
- Advanced models: 128K–200K tokens ≈ 300–460 pages
- Largest models: 1M–2M tokens ≈ 2,300–4,600 pages, i.e. several full-length books
When the context window fills up, the AI can't process anything beyond it — it forgets the older parts of your conversation. A bigger window means it remembers more, but the more you send, the more tokens you consume, and the more you pay.
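One common way to live within a fixed window is to keep only the most recent messages that fit. The sketch below illustrates that idea; `fit_to_window` is a hypothetical helper, and `rough_count` (one token per whitespace-separated word) is a deliberately crude stand-in for a real tokenizer.

```python
def rough_count(text):
    """Hypothetical stand-in for a tokenizer: ~1 token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(messages, max_tokens, count_tokens=rough_count):
    """Keep the newest messages whose combined token count fits in max_tokens."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older is "forgotten"
        kept.append(message)
        used += cost
    return list(reversed(kept)), used


history = [
    "hello there",
    "tell me about smart glasses",
    "what about battery life",
    "and the price",
]
window, used = fit_to_window(history, max_tokens=8)
print(window)  # ['what about battery life', 'and the price']
print(used)    # 7
```

With a budget of 8 "tokens", the two oldest messages fall out of the window, which is exactly the forgetting behavior described above.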
Impact #3: Speed
Here's something most people don't think about: the AI doesn't generate its entire response at once. It produces one token at a time, and each token feeds into the next:
"smart" → " glasses" → " with" → " camera" → "..."
This is why you see text streaming token by token in chat interfaces: the model literally can't produce the full response in one shot. Golden rule: a longer response takes longer to arrive; every extra token adds to your wait time.
Every Model Has Its Own Vocabulary
Different AI models don't share the same tokenizer. Each has its own vocabulary — a dictionary of tens to hundreds of thousands of word-pieces, each mapped to a unique numeric ID. (GPT-4o uses ~200,000 entries; older GPT models used ~100,000.) So "smart" maps to completely different numbers depending on the model:
Same Word, Different IDs
"smart" = #10,119
"smart" = #28,644
This means token counts aren't directly comparable across models — a "1M token" context in one model doesn't hold the same amount of real text as in another.
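A small sketch of why counts differ: two hypothetical vocabularies encode the same text into different IDs and a different number of tokens. The IDs for "smart" are the two shown above; the model labels, the other entries, and the greedy longest-match encoder are invented for illustration.

```python
# Two hypothetical vocabularies. Only the "smart" IDs come from the article;
# every other entry is made up for this example.
MODEL_A = {"smart": 10119, " glasses": 4021}
MODEL_B = {"smart": 28644, " glass": 15007, "es": 274}


def encode(text, vocab):
    """Greedy longest-match encoding against a given vocabulary (toy version)."""
    ids, pos = [], 0
    while pos < len(text):
        match = max(
            (piece for piece in vocab if text.startswith(piece, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no vocabulary entry matches at position {pos}")
        ids.append(vocab[match])
        pos += len(match)
    return ids


text = "smart glasses"
print(encode(text, MODEL_A))  # [10119, 4021]
print(encode(text, MODEL_B))  # [28644, 15007, 274]
```

The same 13 characters cost 2 tokens under one vocabulary and 3 under the other, so a token budget only has meaning relative to a specific tokenizer.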
Special Tokens: The Hidden Language
Beyond regular text tokens, every model uses special tokens — reserved entries that signal structure, boundaries, and roles. You don't type them, but they're always there, and they count against your budget:
| Token | Purpose |
|---|---|
| <\|endoftext\|> | End-of-document marker. Prevents context "bleeding" across separate inputs. |
| [CLS] / [SEP] | BERT classification and separator tokens (e.g. question vs. context in Q&A). |
| <\|im_start\|> / <\|im_end\|> | Role boundary tokens that mark who is speaking. Inserted whenever you set role: "user" or a system prompt. |
| [PAD] | Padding token. Fills batches to equal length; masked during attention. |
A typical system prompt in the OpenAI chat format adds ~4–10 special tokens on top of your text. Negligible in a long chat, but they add up fast in a tight budget or batched production pipeline.
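To see where that overhead comes from, here is a sketch that wraps messages in ChatML-style role markers and counts the markers added. The exact layout a provider uses internally varies by model and is not always published, so treat `render_chat` as an approximation for illustration rather than the real wire format.

```python
def render_chat(messages):
    """Wrap each message in role markers, then open the assistant's turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")  # cue for the model to start replying
    return "".join(parts)


def count_special_markers(rendered):
    """Count the role-boundary markers that were added around the text."""
    return rendered.count("<|im_start|>") + rendered.count("<|im_end|>")


messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is a token?"},
]
rendered = render_chat(messages)
print(rendered)
print(count_special_markers(rendered))  # 5
```

Two short messages already carry five markers that you never typed, before counting the role names and newlines that are tokenized alongside them.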
See It in Code (Python)
# Notebook magic; from a shell, run: pip install tiktoken
%pip install -q tiktoken
import tiktoken
tokenizer = tiktoken.encoding_for_model("gpt-4o")
text_en = "Ray-Ban Meta Ultra smart glasses with 48MP camera"
print(f"English token count: {len(tokenizer.encode(text_en))}") # ~12
text_ar = "نظارة ذكية خفيفة بكاميرا وترجمة فورية"
print(f"Arabic token count: {len(tokenizer.encode(text_ar))}") # ~15
# See exactly what the AI sees: the numeric IDs behind each token.
for t in tokenizer.encode(text_en):
    print(f" ID {t:>6} → '{tokenizer.decode([t])}'")
# ID 37513 → 'Ray' ID 8287 → '-B' ID 270 → 'an'
# ID 27438 → ' Meta' ID 38414 → ' Ultra' ...
# (Real splits can differ from the illustrative breakdown earlier in the article.)
The AI doesn't see "Ray-Ban"; it sees [37513, 8287, 270].
3 Ways to Save Tokens (and Money)
Write prompts in English but request the answer in Arabic. You save on the input side — usually the larger part of a conversation.
A short, specific prompt outperforms a long, rambling one — and every extra sentence is a literal cost.
Specify length: "Respond in 3 bullet points" steers the model toward fewer output tokens. It is a request rather than a hard limit, so pair it with an explicit output cap when the budget is strict.
Pro Tips for Builders
- 1. Count before you pay. Use tiktoken (OpenAI) or the provider's tokenizer to pre-count prompts in code. Build token budgeting in from day one; it's hard to retrofit.
- 2. Cap output with max_tokens. Without it, an open-ended question can return a 2,000-token essay when you needed 50 words.
- 3. System prompts aren't free. A 500-word system prompt costs 600–800 tokens on every request. Distill it, or use prompt caching where supported.
- 4. Match model to task, not to default. A summarization task that costs $0.50/day on Opus costs $0.015/day on GPT-4o mini — similar quality for simple work.
- 5. History multiplies cost. In chat, every prior message is re-sent each turn. The 20th message costs ~20× the input tokens of the 1st. Use a rolling window or summarization early.
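Tip 5 is easy to verify with arithmetic. The sketch below compares re-sending the full history with a rolling window, under simplifying assumptions made for this example: every message is the same size (100 tokens, a hypothetical figure) and replies are ignored.

```python
def full_history_input(turns, tokens_per_message):
    """Input tokens billed at each turn when the entire history is re-sent."""
    return [turn * tokens_per_message for turn in range(1, turns + 1)]


def rolling_window_input(turns, tokens_per_message, window):
    """Input tokens per turn when only the last `window` messages are sent."""
    return [min(turn, window) * tokens_per_message for turn in range(1, turns + 1)]


full = full_history_input(turns=20, tokens_per_message=100)
windowed = rolling_window_input(turns=20, tokens_per_message=100, window=5)

print(full[0], full[-1])  # 100 2000
print(sum(full))          # 21000
print(sum(windowed))      # 9000
```

The 20th turn sends 20× the input of the first, and the whole conversation bills 21,000 input tokens. A five-message window cuts that to 9,000, at the price of the model no longer seeing older turns.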
The Core Insight
The AI doesn't understand language. It understands numbers. Every word you type becomes numbers; the model finds the right numbers to respond with; those numbers become words. Once you grasp that, the cost, speed, and memory limits all make intuitive sense — they're all just constraints on how many numbers the model can process at once.
Try It Yourself
The best way to make this click is to see it live. Use OpenAI's free tokenizer at platform.openai.com/tokenizer — paste any text and watch it split token by token. Try entering your full name in Arabic, then in English, and compare the counts.
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
# The same request in four languages
messages = {
    "English": "Please summarize this article in three bullet points.",
    "Arabic": "من فضلك لخص هذا المقال في ثلاث نقاط.",
    "French": "Veuillez résumer cet article en trois points.",
    "Spanish": "Por favor, resume este artículo en tres puntos.",
}
# Use the English token count as the 100% baseline
base = len(enc.encode(messages["English"]))
for lang, text in messages.items():
    n = len(enc.encode(text))
    print(f"{lang:<10} {n:>3} tokens ({n/base*100:.0f}%)")
# Sample output: English 10 (100%) · Arabic 11 (110%) · French 13 (130%) · Spanish 12 (120%)
Key Takeaways
Use tiktoken in Python to pre-calculate costs before sending prompts to the API.
A 500-word system prompt is charged on every single request. Keep them lean.
In chat, the 20th message re-sends all 19 previous messages. Summarize history often to stay within budget.
Up Next in the Series
The magic that lets the AI understand meaning from those numbers, and why "نظارة" and "glasses" end up representing the same concept even though they look nothing alike.