The Language Tax: Why Different Languages Consume More Tokens
AI models break text into tokens. Spanish needs more than English to say the same thing. Here's why and what impact it has on your projects.
Contributors: Manu Rubio
Imagine you call a taxi with a meter. The destination is the same for all passengers, but the final price varies depending on the language you give the address in. English speakers pay 6 euros. Spanish speakers pay 9. And you, as the developer of the application, pay the difference.
That’s exactly what happens when you build something with AI for Spanish-speaking users. Not because the models discriminate against languages, but because of how they divide text before processing it.
Prerequisites
This post doesn’t assume previous AI knowledge. If you know what an API is (an endpoint to send requests to a service), you have enough. If you don’t, that’s fine too: what matters here is understanding the concept, not implementing it.
What is a Token
Before talking about inefficiencies, you need to understand what a token is, because it’s the central concept for everything that follows.
An AI model doesn’t read text the way you do. It doesn’t see the word “casa” as a unit. It receives a sequence of characters, breaks it into pieces called tokens, processes each piece separately, and then generates the response. Think of it like a puzzle: the model disassembles the text into pieces, works with them, and builds the response piece by piece as well.
Those pieces don’t follow a fixed rule. A short, common word can be 1 token: “casa”, “gato”, “the”, “cat”. A long or infrequent word can split into 2 or more: “anticonstitucional” could become “anti” + “constitu” + “cional”. Spaces, punctuation marks, and emojis also consume tokens.
As a rough rule for English: 1 token equals about 4 characters, or roughly three-quarters of an average word. For other languages, that proportion changes quite a bit.
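That rule of thumb can be turned into a quick estimator. To be clear, this is a heuristic, not a real tokenizer: the 4-characters-per-token ratio is a rough English average, and other languages usually pack fewer characters into each token.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb.

    A heuristic, not a tokenizer. For non-English text, real token counts
    tend to be higher than this English-calibrated estimate suggests.
    """
    return math.ceil(len(text) / chars_per_token)

print(estimate_tokens("I love walking in the rain"))          # → 7
print(estimate_tokens("Me encanta caminar bajo la lluvia"))   # → 9
```

Note that the heuristic already understates the gap: the Spanish phrase is longer in characters, and on top of that its words fragment into more tokens per character.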
Why English Has an Advantage: The BPE Bias
The process of creating tokens isn’t manual. There’s an algorithm called Byte-Pair Encoding (BPE) that learns to divide text automatically from large amounts of real text.
It works like this: the algorithm scans millions of texts and looks for sequences of characters that appear together most frequently. When it finds that “c-a-t” appears millions of times, it learns that that’s a complete unit and assigns it its own token. The same with “house”, “computer”, “the”. The result is a vocabulary of predefined tokens that the model can use directly, without fragmenting.
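That merge step can be sketched in a few lines of Python. This is a deliberately simplified toy (single-character symbols, naive string replacement, one merge), not a production BPE implementation:

```python
from collections import Counter

def most_frequent_pair(words: dict) -> tuple:
    """Count adjacent symbol pairs across a tiny corpus, BPE-style,
    and return the most frequent one."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair: tuple, words: dict) -> dict:
    """Merge the chosen pair into a single new symbol everywhere it appears."""
    old = " ".join(pair)
    new = "".join(pair)
    return {word.replace(old, new): freq for word, freq in words.items()}

# Toy corpus: words pre-split into characters, with their frequencies.
corpus = {"c a t": 10, "c a r": 5, "c a s a": 2}
pair = most_frequent_pair(corpus)   # ('c', 'a') appears 17 times
corpus = merge_pair(pair, corpus)
print(corpus)  # → {'ca t': 10, 'ca r': 5, 'ca s a': 2}
```

Real BPE repeats this loop thousands of times, so frequent words like "cat" or "the" eventually become single tokens, while rare words keep only partial merges.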
The problem is in the training data.
The vast majority of the text used to train these algorithms is in English. So BPE learned English very well: it memorized almost all its common words as complete tokens. When it encounters Spanish, German, or Arabic, it recognizes far fewer words as complete units and is forced to break them into syllables or even individual letters. It’s not that it can’t read Spanish. It’s that it reads it less efficiently because it didn’t study it as thoroughly.
The Same Phrase, Different Tokens
To make this concrete, look at this example. The same idea in two languages:
| Phrase | Approximate Tokens |
|---|---|
| “I love walking in the rain” | ~6 |
| “Me encanta caminar bajo la lluvia” | ~9-11 |
“Encanta” and “caminar” tend to split because the tokenizer doesn’t recognize them as complete units as frequently as their English equivalents.
The extreme case is languages with non-Latin alphabets: Japanese, Arabic, Korean. The tokenizer doesn’t have predefined units for many of those characters and processes them practically one by one. A phrase that in English is 6 tokens can become 15 or more. It’s not a bug. It’s a direct consequence of what data the system was trained on.
The Real Cost: Money, Memory, and Speed
This isn’t just a curious technical fact. It has concrete effects when you program with AI.
The Economic Cost
Model APIs (like those from Anthropic or OpenAI) charge based on processed tokens. Each message you send and each response you receive is counted in tokens. If your application serves Spanish-language users and average messages consume significantly more tokens than English, your monthly bill will reflect that difference. For personal experiments, the impact is minimal. In production with thousands of users, it becomes relevant from the first month.
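A back-of-the-envelope calculation makes the difference concrete. The price and volumes below are hypothetical, purely for illustration; check your provider’s actual pricing:

```python
# Hypothetical pricing, purely illustrative: $3 per million input tokens.
PRICE_PER_MILLION = 3.00

def monthly_cost(messages_per_month: int, tokens_per_message: int) -> float:
    """Cost of processing all input messages for a month, in dollars."""
    total_tokens = messages_per_month * tokens_per_message
    return total_tokens / 1_000_000 * PRICE_PER_MILLION

english = monthly_cost(1_000_000, 200)   # ~200 tokens per message
spanish = monthly_cost(1_000_000, 280)   # same content, ~40% more tokens
print(english, spanish)  # → 600.0 840.0
```

Same users, same conversations, $240 more per month in this scenario, just from the language.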
The Context Window
Models have a limit on how many tokens they can “remember” in a single conversation. That limit is called the context window (if you want to understand how to manage it well, the post on context window and best practices covers that in detail).
Imagine the model can handle 100,000 tokens total. In English, that could equal a very long conversation or an extensive document. In Spanish, that same limit is reached earlier because each page of text consumes more tokens. The model doesn’t lose capacity, but its “available memory” for your content shrinks.
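The same arithmetic applies to the window. The tokens-per-page figures below are illustrative assumptions, not measured values:

```python
def pages_that_fit(context_window: int, tokens_per_page: int) -> int:
    """How many pages of text fit in a given context window."""
    return context_window // tokens_per_page

# Illustrative assumption: a page of ~500 words of prose.
english_pages = pages_that_fit(100_000, 650)   # ~650 tokens/page in English
spanish_pages = pages_that_fit(100_000, 900)   # same page, translated
print(english_pages, spanish_pages)  # → 153 111
```

With these assumed figures, the same window holds roughly 40 fewer pages of Spanish: the limit is identical, but your usable room is not.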
Speed
The model generates text token by token. If saying the same thing in Spanish requires producing more tokens than English, the response takes proportionally longer to appear. In applications where user experience depends on response speed, this is noticeable.
What You Can Do About It
It depends on what part of the text you control.
If you control the instructions you send to the model (the prompt or system prompt, which is the initial message that tells the model how to behave), you can write them in English even if the final response is in Spanish. English instructions consume fewer tokens, and the model responds perfectly in the user’s language if you tell it to. Adding “respond in Spanish” at the end of an English system prompt works without issue. The post on prompt engineering for developers explores more techniques to optimize how you construct those instructions.
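As a sketch, a chat-style request combining both languages might look like this. The payload shape below is illustrative; the exact field names and model identifiers vary by provider:

```python
# Illustrative chat-style payload; field names vary by provider.
request = {
    "model": "some-model",  # hypothetical model name
    "messages": [
        {
            "role": "system",
            # Instructions in English (fewer tokens), response in Spanish.
            "content": "You are a concise support assistant. Respond in Spanish.",
        },
        {"role": "user", "content": "¿Cómo cambio mi contraseña?"},
    ],
}
```

Only the system prompt benefits from the English discount here; the user message and the Spanish response still pay the full rate.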
If the content the model processes comes from users (messages, documents, questions in Spanish), you don’t control the language. The tax simply exists and needs to be budgeted. What you can do is reduce the total volume: summarize documents before sending them to the model, eliminate repetitions, remove boilerplate. That helps regardless of language.
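A minimal preprocessing step along those lines, assuming exact duplicate lines (repeated boilerplate, signatures, headers) are safe to drop before sending text to the model:

```python
def shrink(text: str) -> str:
    """Cut token volume before sending text to a model: normalize runs of
    whitespace and drop exact duplicate lines, keeping the first occurrence.
    Assumes duplicates carry no extra meaning for your use case."""
    seen = set()
    kept = []
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse internal whitespace
        if line in seen:
            continue  # exact duplicate, skip it
        if line:
            seen.add(line)
        kept.append(line)
    return "\n".join(kept)

print(shrink("hello  world\nhello world\nfoo"))  # → "hello world\nfoo"
```

This kind of cleanup saves tokens in any language, which is why it is worth doing even though it does not touch the structural gap itself.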
What doesn’t have a direct solution today is the structural gap between languages. It exists, and you need to design with it in mind from the beginning.
What’s Improving (and What’s Not)
Newer models have expanded their token vocabularies to include more languages. There are real improvements: Spanish today tokenizes better than it did two years ago. The gap with English has shrunk.
What hasn’t changed is the root cause: training data is still overwhelmingly in English. As long as that doesn’t change at scale, English will remain the most efficient language to work with these models.
For those of us building applications for Spanish-speaking users, this isn’t an abstract problem. It’s a cost that needs to be measured, anticipated in design, and considered when choosing architecture and context limits from the start.
Checklist: What You Should Be Clear On Now
- A token is not a word: it can be a word, a syllable, or a character
- Models process and generate text token by token, not character by character or word by word
- Spanish generates more tokens than English to convey the same content
- The cause is the bias of the BPE algorithm’s training corpus toward English
- More tokens = higher API cost, context window that depletes sooner, slower responses
- System prompts can be written in English to reduce instruction tokenization cost
- Recent models have improved multilingual efficiency, but English remains the most efficient
Frequently Asked Questions
Why don’t models learn to read Spanish words completely from the start?
Newer models actually do it better. The problem is inherited: tokenization algorithms were trained on data that has far more English than any other language. Retraining them with more balanced corpora requires considerable resources and takes time. It’s being done, but English has a head start of years.
If I write the prompt in English but want the response in Spanish, does it work well?
Yes. You can write instructions in English and add at the end “respond in Spanish” or “responde en español”. The model follows the instruction without issue. English instructions consume fewer tokens, and the Spanish response will still consume the tokens that correspond to that language. It doesn’t eliminate the tax on the response, but it does on the instructions.
How much does this affect me if I’m learning?
Almost nothing economically: when you experiment with small volumes, the difference in tokens is cents. What matters is understanding the concept before you reach production. When you scale or process long documents, this factor becomes real and it’s worth having anticipated it.
Does Japanese or Arabic pay more tokens than Spanish?
Yes, quite a bit more. Languages with non-Latin alphabets have worse tokenization efficiency than European languages. In some cases the same content can need twice as many tokens as English, or more. It’s the same structural problem, but amplified by the distance between the alphabet and what BPE learned.