In 2006, we started with machine learning-based translations between English and Arabic, Chinese, and Russian. Almost 10 years later, with today’s update, we now offer 103 languages that cover 99% of the online population.
The 13 new languages — Amharic, Corsican, Frisian, Kyrgyz, Hawaiian, Kurdish (Kurmanji), Luxembourgish, Samoan, Scots Gaelic, Shona, Sindhi, Pashto and Xhosa — help bring a combined 120 million new people to the billions who can already communicate with Translate all over the world.
So what goes into adding a new language? Beyond the basic criterion that it must be a written language, we also need a significant number of translations in the new language to be available on the web. From there, we use a combination of machine learning, licensed content and Translate Community.
As we scan the web for billions of already translated texts, we use machine learning to identify statistical patterns at enormous scale, so our machines can “learn” the language. But because existing documents can’t cover the full breadth of a language, we also rely on people like you in Translate Community to help improve current Google Translate languages and add new ones, like Frisian and Kyrgyz. So far, over 3 million people have contributed approximately 200 million translated words.
Before you dive into translating, here are a few fun facts about the new languages:
- Amharic (Ethiopia) is the second most widely spoken Semitic language after Arabic
- Corsican (Island of Corsica, France) is closely related to Italian and was Napoleon’s first language
- Frisian (Netherlands and Germany) is the native language of over half the inhabitants of the Friesland province of the Netherlands
- Kyrgyz (Kyrgyzstan) is the language of the Epic of Manas, which is 20x longer than the Iliad and the Odyssey put together
- Hawaiian (Hawaii) has lent several words to the English language, such as ukulele and wiki
- Kurdish (Kurmanji) (Turkey, Iraq, Iran and Syria) is written with Latin letters, while the other two varieties of Kurdish are written with Arabic script
- Luxembourgish (Luxembourg) completes Translate’s coverage of the official languages of EU member states
- Samoan (Samoa and American Samoa) is written using only 14 letters
- Scots Gaelic (Scottish highlands, UK) was introduced by Irish settlers in the 4th century AD
- Shona (Zimbabwe) is the most widely spoken of the hundreds of languages in the Bantu family
- Sindhi (Pakistan and India) was the native language of Muhammad Ali Jinnah, the “Father of the Nation” of Pakistan
- Pashto (Afghanistan and Pakistan) is written in Perso-Arabic script with an additional 12 letters, for a total of 44
- Xhosa (South Africa) is the second most common native language in the country after Zulu, and it features three kinds of clicks, represented by the letters x, q and c
We’ve come a long way with over 100 languages, but we aren’t done yet. If you want to help, International Mother Language Day, just around the corner on February 21, is a great time to get involved in Translate Community. To start, just select the languages you speak, then choose to either translate phrases on your own or validate existing translations. Every contribution helps improve the quality of translation over time. You can also share feedback directly from Translate.Google.com, so as you try out the new languages, we’d love to hear your suggestions.
For each new language, we make our translations better over time, both by improving our algorithms and systems and by learning from your translations with Translate Community. Today’s update will be rolling out over the coming days.
No matter what language you speak, we hope today’s update makes it easier to communicate with millions of new friends and break down language barriers one conversation at a time.