AI Fundamentals · Part 1 of 4

Tokens: Why Your Language Costs More Than English When You Use AI

The AI doesn't read your words — it reads numbers. And those numbers aren't equal across languages. Here's what tokens are, why they cost money, and why non-English speakers pay more.

March 7, 2026
12 min read
#AI · #Tokens · #LLM · #Cost · #Arabic · #Context Window · #BPE · #Tokenization

THE LANGUAGE OF AI

Tokens

"The AI doesn't read words. It reads numbers."

Did you know that AI doesn't actually read your words at all? To be more precise: it doesn't see the letters or words you type. It sees something else entirely.

That something is called a Token.


The Journey of Your Words Inside an AI

Before we dive into what a token is, let's trace what actually happens the moment you send a message to an AI.

Human Text
Tokenize → Small pieces (tokens)
Encode → Numbers
⚡ AI Model Brain
Decode → Final Response

The moment you send any text to an AI, it runs a process called Tokenization — splitting your text into small units called tokens. Those tokens are converted into numbers. The model processes those numbers, understands the relationships between them, figures out what response fits, and converts its output back into words.

That's the complete pipeline. Every single time.


What Is a Token?

A token is the smallest unit that any AI reads and works with. It's not necessarily a word — it can be a word, part of a word, a punctuation mark, a space, or a number.

When you read a sentence, you read it word by word. The AI doesn't work that way — it cuts the text into these small token pieces.


A Real Example: 6 Words, 10 Tokens

Take this sentence:

Ray-Ban Meta Ultra smart glasses for $549

You might count 6 words. But here's what the AI actually sees:

"Ray" "-" "Ban" " Meta" " Ultra" " smart" " glasses" " for" "$" "549"
10 Tokens — not 6 words.

Even spaces before words, hyphens, and currency symbols become their own tokens.

Token        What it is
"Ray"        The first part of a compound word
"-"          The hyphen, treated separately
"Ban"        The second part of the compound
" Meta"      A word (note the leading space)
" Ultra"     A word
" smart"     A word
" glasses"   A word
" for"       A word
"$"          The currency symbol, its own token
"549"        The number

Why Does It Cut Text This Way?

The technique is called Subword Tokenization. The model learned which sequences appear most frequently in its training data, and common ones get their own single token. Rare or complex ones get split.

✅ Simple words → 1 token

the → 1 token
cat → 1 token
hello → 1 token
Apple → 1 token

⚡ Complex words → many tokens

unforgettable → 3 tokens
internationalization → 3 tokens
نظارة (glasses) → 2 tokens
الذكاء (intelligence) → 3 tokens

Arabic words often land in the complex category, largely because tokenizers are trained on far less Arabic text than English. Less data → fewer complete Arabic words in the vocabulary → more aggressive splitting.

The exact splits depend on the specific tokenizer's learned vocabulary. The splits shown here are illustrations — the actual tokenizer may differ by model.

The bottom line: Less Arabic training data = more splitting = higher cost.
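To see why vocabulary coverage drives splitting, here's a toy tokenizer. It uses greedy longest-match over a made-up vocabulary instead of real BPE merges, but the effect it demonstrates is the same: words the vocabulary knows stay whole, everything else gets cut into pieces.

```python
# Hypothetical mini-vocabulary. Real vocabularies hold 100K-200K entries.
VOCAB = {"the", "cat", "un", "forget", "table", "hello"}

def subword_tokenize(word, vocab):
    """Greedily take the longest vocabulary entry at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown: fall back to one character
            i += 1
    return tokens

print(subword_tokenize("cat", VOCAB))            # ['cat']  -> 1 token
print(subword_tokenize("unforgettable", VOCAB))  # ['un', 'forget', 'table']  -> 3 tokens
```

A word missing from the vocabulary costs three tokens where a covered word costs one, which is exactly the English-vs-Arabic gap in miniature.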

How the Vocabulary Was Actually Built: Byte-Pair Encoding

The vocabulary of tens of thousands of tokens didn't appear by magic. It was learned from data using an algorithm called Byte-Pair Encoding (BPE) — and understanding it explains why some words are a single token and others get shredded into pieces.

The BPE Algorithm — 3 Steps

1. Start with individual characters. The initial vocabulary is every single character that appears in the training data: a, b, c… أ, ب, ت… 0, 1, 2…

2. Find the most common adjacent pair. Scan all training text and find which two consecutive tokens appear together most often. If "t" + "h" is the most frequent pair, merge them into "th".

3. Repeat until the vocabulary is full. Keep merging the most frequent pairs — "th"+"e" → "the", then "the"+" " → "the " — until you hit the desired vocabulary size (~100K–200K entries).

BPE in action — building "the" from characters

Start: "t" "h" "e" " " "c" "a" "t"
Merge "t"+"h" → "th" "e" " " "c" "a" "t"
Merge "th"+"e" → "the" " " "c" "a" "t"
Merge "the"+" " → "the " "c" "a" "t"
Result: "the " is one token — it appears millions of times in English text.

This is why English has a richer vocabulary than Arabic in any given tokenizer — the BPE algorithm found and encoded more frequent English patterns because there was more English text to learn from.
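The three steps can be sketched as a toy trainer in a few lines. The corpus here is made up and tiny; real BPE runs over terabytes of text:

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Toy BPE: start from characters, repeatedly merge the most
    frequent adjacent pair. Illustrative sketch only."""
    tokens = list(corpus)          # step 1: individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))   # step 2: count adjacent pairs
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        merged, i = [], 0          # step 3: apply the merge everywhere, repeat
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

corpus = "the cat the hat the bat"
tokens, merges = bpe_train(corpus, num_merges=4)
print(merges)   # ['th', 'the', 'the ', 'at'] -- frequent patterns merge first
print(tokens)
```

Because "the" dominates the toy corpus, it becomes a single token within three merges, exactly the dynamic that gives high-resource languages their compact vocabularies.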


Impact #1: Cost

Every AI model charges you per token — not per message, not per word. Per token.

Here's what the major models cost as of early 2026:

Model                       Input / 1M tokens   Output / 1M tokens
GPT-4o mini (budget)        $0.15               $0.60
Gemini 2.5 Flash            $0.30               $2.50
Claude Haiku 4.5            $1.00               $5.00
Gemini 2.5 Pro              $1.25               $10.00
GPT-4o                      $2.50               $10.00
Claude Sonnet 4.5           $3.00               $15.00
Claude Opus 4.6 (premium)   $5.00               $25.00

Pricing as of March 2026. Gemini 2.5 Pro costs double for prompts exceeding 200K tokens. Always check official provider pages for latest rates.

Notice how every model charges separately for Input (what you send) and Output (what the AI replies). You're paying for both directions. And notice the range — GPT-4o mini at $0.15 input vs Claude Opus at $5.00. Understanding tokens lets you pick the right model for the right job.


The English vs. Arabic Gap

Here's where it gets real for Arabic speakers:

🇺🇸 English — 100 words ~130 tokens
🇪🇬 Arabic — 100 words ~200 tokens
You're paying ~50% more for the same content in Arabic.

Not because Arabic is "worse" — purely because the tokenizer has seen less Arabic text and hasn't built as rich a vocabulary for it.

This gap has been improving with newer models. GPT-4o's tokenizer significantly improved Arabic efficiency vs. older models. The 50% figure is a reasonable average — actual overhead depends on the model and text type.


What This Means in Real Money

Say you send 100 messages a day for a month on a mid-range model:

🇺🇸 English

100 msg × 30 days × ~1,500 tokens

= 4.5 million tokens / month

~$22
vs

🇪🇬 Arabic

100 msg × 30 days × ~2,300 tokens

= 6.9 million tokens / month

~$34
~$12 per month lost — just from the language you write in.
Scale that to 500–1000 messages/day and it becomes hundreds of dollars.

Calculation assumes GPT-4o pricing with ~70% input / 30% output token split. Actual costs vary by message structure and model.
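You can re-derive those figures yourself. This sketch assumes GPT-4o pricing ($2.50 input / $10.00 output per 1M tokens) and the ~70/30 split from the note above, and lands close to the rounded numbers in the comparison:

```python
def monthly_cost(tokens_per_msg, msgs_per_day=100, days=30,
                 price_in=2.50, price_out=10.00, input_share=0.70):
    """Estimate monthly spend in USD for a given per-message token count."""
    total = tokens_per_msg * msgs_per_day * days            # tokens per month
    cost_in = total * input_share * price_in / 1_000_000
    cost_out = total * (1 - input_share) * price_out / 1_000_000
    return cost_in + cost_out

en = monthly_cost(1_500)   # ~4.5M tokens/month
ar = monthly_cost(2_300)   # ~6.9M tokens/month
print(f"English: ${en:.2f}  Arabic: ${ar:.2f}  gap: ${ar - en:.2f}")
# English ≈ $21.4, Arabic ≈ $32.8, gap ≈ $11 per month
```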


Impact #2: Context Window

Every AI model has a maximum number of tokens it can see at one time. This is called the Context Window — think of it like the model's working RAM.

Regular models 8K – 32K tokens ≈ 20 – 70 pages
Advanced models 128K – 200K tokens ≈ 300 – 460 pages
Largest models 1M – 2M tokens ≈ an entire book… or two!

As of 2026, most frontier models (GPT-4o, Claude Sonnet, Gemini 2.5) have 128K–1M context windows by default.

When the context window fills up, the AI can't process anything beyond it — it forgets the older parts of your conversation.

A bigger context window = the AI remembers more of your conversation. But the more you send, the more tokens you consume, the more you pay.
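In code, staying inside the window usually means trimming oldest-first. A minimal sketch, assuming you already have a token count per message and using an illustrative 8K limit:

```python
def fit_to_window(messages, token_counts, limit=8_000):
    """Keep the most recent messages whose combined tokens fit the window."""
    kept, total = [], 0
    for msg, n in zip(reversed(messages), reversed(token_counts)):
        if total + n > limit:
            break                 # everything older than this is forgotten
        kept.append(msg)
        total += n
    return list(reversed(kept)), total

msgs = ["msg1", "msg2", "msg3", "msg4"]
counts = [3_000, 3_000, 3_000, 2_000]
kept, used = fit_to_window(msgs, counts)
print(kept, used)   # ['msg2', 'msg3', 'msg4'] 8000 -- msg1 fell out of the window
```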


Impact #3: Speed

Here's something most people don't think about:

The AI doesn't generate its entire response at once. It produces one token at a time, and each token feeds into the next.

"smart" " glasses" " with" " camera" "..."

This is why you see text streaming word by word in chat interfaces. The model literally can't generate the full response in one shot — it's always token by token.

Golden rule: A longer response takes longer to arrive. Every extra token adds to your wait time.
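The loop below simulates that shape. The "next tokens" are scripted here (a real model samples each one from probabilities), but the autoregressive structure is the same: append a token, feed it back, repeat.

```python
# Scripted stand-in for the model's sampled next tokens.
scripted_next = ["smart", " glasses", " with", " camera", "..."]

context = []                      # everything generated so far
stream = []                       # what the user sees at each step
for token in scripted_next:
    context.append(token)         # the new token joins the next step's input
    stream.append("".join(context))

for partial in stream:
    print(partial)                # the visible text grows one token at a time
```

Each line printed is one generation step, which is exactly why longer responses take proportionally longer to arrive.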

Every Model Has Its Own Vocabulary

Different AI models don't share the same tokenizer. Each has its own vocabulary — a dictionary of tens of thousands to hundreds of thousands of word-pieces, each mapped to a unique numeric ID.

GPT-4o uses ~200,000 vocabulary entries. Older GPT models used ~100,000.

So the word "smart" might map to completely different numbers depending on the model:

Model A

"smart" = #10,119

Model B

"smart" = #28,644

This means token counts aren't directly comparable across models. A "1M token" context in one model doesn't hold the same amount of real text as a "1M token" context in another.


Special Tokens: The Hidden Language

Beyond regular text tokens, every model uses special tokens — reserved entries that signal structure, boundaries, and roles. You don't type them, but they're always there, and they count against your token budget.

Common Special Tokens

<|endoftext|>
End of document marker. Tells the model a document or conversation has ended. Prevents "bleeding" of context across separate inputs.
[CLS] / [SEP]
BERT classification & separator tokens. [CLS] marks the start of a sequence; [SEP] separates two inputs (e.g. question from context in Q&A tasks).
<|im_start|>
Role boundary tokens. Used in chat models to mark who is speaking. When you set role: "user" or write a system prompt, these tokens are automatically inserted — and they count.
[PAD]
Padding token. When processing a batch of sequences, shorter ones are padded to match the longest. These tokens are masked out during attention — invisible to the model, but occupying memory.
Practical impact: A typical system prompt in the OpenAI chat format adds ~4–10 special tokens on top of your actual text. In a long conversation these are negligible — but in a tight token budget or a batched production pipeline, they add up fast.
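A rough way to budget for that overhead, assuming ChatML-style wrapping costs about 4 special tokens per message (the exact count varies by model and chat format):

```python
def chat_overhead(messages, tokens_per_wrapper=4):
    """Estimate hidden special-token cost for a list of (role, text) messages.
    tokens_per_wrapper is an assumption, not an exact per-model figure."""
    return len(messages) * tokens_per_wrapper

convo = [
    ("system", "You are a helpful assistant."),
    ("user", "Summarize this article."),
    ("assistant", "Sure, here are the key points..."),
]
print(chat_overhead(convo))   # 12 hidden tokens before any actual text
```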

See It in Code (Python)

# pip install tiktoken
import tiktoken

tokenizer = tiktoken.encoding_for_model("gpt-4o")

text_en = "Ray-Ban Meta Ultra smart glasses with 48MP camera"
tokens_en = tokenizer.encode(text_en)
print(f"English token count: {len(tokens_en)}")   # ~12

text_ar = "نظارة ذكية خفيفة بكاميرا وترجمة فورية"
tokens_ar = tokenizer.encode(text_ar)
print(f"Arabic token count: {len(tokens_ar)}")    # ~15

And to see exactly what the AI sees — the numeric IDs behind each token:

for t in tokens_en:
    print(f"  ID {t:>6} → '{tokenizer.decode([t])}'")

Output:

  ID  37513 → 'Ray'
  ID   8287 → '-B'
  ID    270 → 'an'
  ID  27438 → ' Meta'
  ID  38414 → ' Ultra'
  ID   8847 → ' smart'
  ID  40081 → ' glasses'
  ID    483 → ' with'
  ID   3519 → '48'
  ID   9125 → 'MP'
  ID   9427 → ' camera'
The AI doesn't see "Ray-Ban" — it sees [37513, 8287, 270]

Compare the same concept across languages:


concept = {
    "English":  "smart glasses with camera",
    "Arabic":   "نظارة ذكية بكاميرا",
    "Chinese":  "带摄像头的智能眼镜",
    "Japanese": "カメラ付きスマートグラス",
}

print(f"{'Language':<10} {'Tokens':>7}")
print("-" * 20)
for lang, text in concept.items():
    n = len(tokenizer.encode(text))
    print(f"{lang:<10} {n:>7}")

# English      5  ← most efficient (largest training corpus)
# Arabic       9  ← ~80% overhead
# Chinese      7  ← moderate (improved in recent models)
# Japanese    10  ← kanji + katakana = complex splitting

3 Ways to Save Tokens (and Money)

1. Write your prompts in English, ask for a reply in Arabic

You save tokens on the input side — usually the larger part of a conversation. Your answer still comes back in Arabic, but your question used far fewer tokens to get there.

2. Be direct and concise

A short, specific prompt outperforms a long, rambling one — and uses fewer tokens. Every extra sentence costs money. A well-crafted prompt can save you tens of dollars a month at scale.

3. Specify the response length

Tell the AI exactly how long you want the answer: "respond in 3 bullet points" or "summarize in 2 sentences." This directly controls how many output tokens the model generates.


Pro Tips for Builders

💡 What Knowing Tokens Changes For You as a Developer

1. Count tokens before you send, not after you pay. Use tiktoken (OpenAI) or the model provider's tokenizer to pre-count your prompts in code. Build token budgeting into your app's logic from day one — it's far harder to retrofit later.

2. Use max_tokens to cap your output costs. The API parameter max_tokens (or max_completion_tokens) hard-limits the response length. Without it, an open-ended question can return a 2,000-token essay when you needed 50 words.

3. System prompts are not free. A detailed system prompt of 500 words costs 600–800 tokens — on every single request. Distill it to its most essential constraints, or use prompt caching where the provider supports it.

4. Match model to task, not to default. A summarization task that costs $0.50/day on Claude Opus costs $0.015/day on GPT-4o mini — with similar quality for simple tasks. Token pricing is the key variable in that decision.

5. Conversation history multiplies your input token cost. In a chat app, every message in the history is re-sent with each new request. A 20-turn conversation means the 20th message costs 20× more input tokens than the 1st. Implement a rolling window or summarization strategy early.
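Tip 5's rolling window fits in a few lines. `max_turns` is an illustrative knob; production apps often pair this with summarizing the dropped turns:

```python
def rolling_window(history, new_message, max_turns=6):
    """Send only the last `max_turns` past turns plus the new message."""
    return history[-max_turns:] + [new_message]

history = [f"turn {i}" for i in range(1, 20)]   # a 19-turn conversation
to_send = rolling_window(history, "turn 20")
print(len(to_send))   # 7 messages instead of 20 -> far fewer input tokens
print(to_send[0])     # 'turn 14' -- everything earlier is dropped
```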


The Core Insight

The AI doesn't understand language.

It understands numbers.

Every word you type gets converted to numbers. The model finds the right numbers to respond with. Those numbers become words. That's the entire process — a very sophisticated number-matching machine. Once you understand that, the cost, speed, and memory limits all make intuitive sense: they're all just constraints on how many numbers the model can process at once.


Try It Yourself

The best way to make this click is to see it live. Use OpenAI's free tokenizer tool at platform.openai.com/tokenizer — paste any text and watch it get split token by token, with each token highlighted in a different color.

Try this experiment: enter your full name first in Arabic, then in English. See how many tokens each version produces. If your name is on the longer side, the difference might surprise you.

Feel the cost difference across languages:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

messages = {
    "English":  "Please summarize this article in three bullet points.",
    "Arabic":   "من فضلك لخص هذا المقال في ثلاث نقاط.",
    "French":   "Veuillez résumer cet article en trois points.",
    "Spanish":  "Por favor, resume este artículo en tres puntos.",
}

print(f"{'Language':<10} {'Tokens':>8}  {'Relative cost':>14}")
print("-" * 38)
base = len(enc.encode(messages["English"]))
for lang, text in messages.items():
    n = len(enc.encode(text))
    print(f"{lang:<10} {n:>8}  {n/base*100:>12.0f}%")

# Language    Tokens  Relative cost
# English         10          100%
# Arabic          11          110%
# French          13          130%
# Spanish         12          120%

Next in AI Fundamentals

Embeddings

The magic that lets the AI understand meaning from those numbers — and why "نظارة" and "glasses" end up representing the same concept even though they look nothing alike.


Mohamed Hamed

20 years building production systems — the last several deep in AI integration, LLMs, and full-stack architecture. I write what I've actually built and broken. If this was useful, the next one goes to LinkedIn first.

Follow on LinkedIn →