Game Localization - Multilingual Project Management
Don't localize. Loekalize.

Other language pairs available upon request
SEGA: "You are faster than Sonic! It's easy to read, and you clearly have experience with these types of texts."

Recent projects

  • Dota 2 (Japanese)
  • Beat Cop (Japanese and Chinese)
  • Motorsport Manager (Dutch)
  • SEGA's official website (Dutch)
  • Multiple AAA titles for Electronic Arts (Dutch)
  • Gremlins Inc. (Japanese and Chinese)
  • Punch Club (Japanese)
  • Arma 3 and Argo (Japanese)
  • Satellite Reign (6 languages)
  • Mario & Sonic at the Olympic Games (Dutch)

  • Main Dutch language consultant for Electronic Arts 2010-2017.
  • Translation of all press releases, game packaging and national TV commercials for SEGA.
  • Localization of help files for Valve Corporation.
  • Localization of games for Bohemia Interactive.
  • Localization of games for Charlie Oscar.
  • Translation of all press releases and game manuals for Bigben Interactive (Square Enix/Turtle Beach).
  • Localization of websites, games and Xbox LIVE for Microsoft.
  • Translation of websites, games and national TV commercials for Electronic Arts.
  • Translation of package texts for Nordic Games.
  • Translation of website texts for Codemasters.
  • Translation of all game cards for NCSoft.
Why deep learning sucks

After statistical machine translation, translators are now being threatened by a new technology called deep learning. It's very scary, mysterious stuff, conjuring up images of Matrix-like worlds in which human beings are enslaved by computers and robots that have completely outsmarted us in every conceivable way. Purchasing departments of translation agencies like to tell us that our end is near and that the only way to survive is to lower our rates, because otherwise the machines will take over.

"Know thyself, know thy enemy", Sun Tzu said, and the guy was right. Also: dinosaurs went extinct for a reason. The only way to survive is to adapt.

Though I'm a bit of a hobbyist programmer myself, using PHP, MySQL and even a bit of C# to find my way in life, I by no means consider myself an expert on IT-related matters. However, the very fact that I earned my black belt in judo despite being a total loser when it comes to sports proves that you can achieve a lot by mere perseverance. Or, as an artist friend of mine once put it: "If you throw enough shit at the wall, eventually some of it will stick."


So I've been throwing shit around a little, with interesting results. Take DeepL, a neural network that has been trained with billions of sentences. Unlike Google Translate's data, these sentences have all been verified by actual human beings, ensuring that the input DeepL bases its translations on is of high quality.

"DeepL runs on a supercomputer in Iceland, that is capable of performing more than 5,100,000,000,000,000 floating point operations per second. This would rank 23rd in the current list of the world's top 500 supercomputers. The super computer has enough power to translate 1,000,000 words in under a second.

In blind tests pitting DeepL Translator against the competition, translators preferred DeepL's results by a factor of 3:1. Automated tests bear this out, as well. Within the field of machine translation, the gold standard for measuring system performance is the bilingual evaluation understudy (BLEU) score, which compares machine-translated texts with those produced by a translator. DeepL Translator achieves record BLEU scores." End quote.

And indeed, if you tell DeepL to translate newspaper articles, maybe even scientific papers or other rigid texts, the results are very impressive. Still absolutely unacceptable for publication, but definitely good enough to give a very good impression of what a text means. Scarily good, even: some translations are absolutely flawless.

So much for DeepL's advertising. As I said, these examples, plus the examples DeepL advertises, are all very rigid, robotic texts. And of course robots are very good at doing robotic things; that's why they're called robots in the first place.

In the end, however, only one thing matters: how do machines like DeepL perform in your own domain? Is DeepL a threat to me and should I start looking for another job, or is this just another scheme to lower my rates and make me work more for less?

Hayabusa: my gaming and translation set-up

About nerds and supernerds

First things first: it's important to realize that the brilliant scientists behind machine learning technology are not after our jobs. They're normal people like you and me, fascinated by what they can accomplish: they want to build machines that can do what humans do, only better, sometimes out of mere curiosity and fascination, and sometimes because they think it will improve the world. I know, because I happen to know a few people who work in this industry. When I told them that project managers of translation agencies expect us to give discounts on the output of the machines they invented, they were flabbergasted: a bit like Einstein, who regretted his part in getting the atomic bomb built for the rest of his life once he saw what it was actually used for. Naive? Maybe. Fact is that discounts on semi-finished products are not what the makers had in mind when designing these systems.

I also know that nearly every MT-based product or platform I've come across is built on open-source software that is freely available to everyone. Slick managers slap a nice sticker on this software and package it in a browser-friendly or Windows-friendly format, and suddenly you find yourself with a flashy new product costing hundreds or thousands of dollars.

So why doesn't anybody use the free open-source software? Because it's all made by nerds, Unix-based and very complicated. More on that later.

Statistical Machine Translation vs. Deep Learning

Back to the main question: indeed, there are things that computers do a lot better than human beings; driving cars and diagnosing diseases are often cited as examples. But how does that hold up for translation?

Machine translation as we know it, statistical machine translation, is based on statistics (surprise!). Basically, it compares data sets in two languages, sentence by sentence, to calculate probabilities for word sequences. In this particular article, for example, the chance that the word machine will be followed by translation and not, say, gingerbread, is pretty high. Do this for every conceivable sequence of words in your corpus, and you've got machine translation in a nutshell. The technology can also do this for sequences of multiple words, called n-grams (sequences of n words), to make translations sound even more natural. So instead of calculating the probability of what comes after the word machine, it can also calculate the probability of what comes after the words machine translation. On this particular website, the chance that that particular sequence of two words (a so-called 2-gram) will be followed by a word like sucks is pretty high. And so on.
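The counting idea above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not how Moses or any other SMT toolkit is implemented: the toy corpus is invented, and real systems combine such language-model probabilities with phrase translation tables, longer histories and smoothing.

```python
from collections import Counter

def bigram_probabilities(tokens):
    """Estimate P(next word | previous word) by counting: count(prev, next) / count(prev)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Only words that are followed by something count as a "history".
    history_counts = Counter(tokens[:-1])
    return {
        (prev, nxt): count / history_counts[prev]
        for (prev, nxt), count in pair_counts.items()
    }

# A made-up toy corpus; real systems use millions of sentences.
corpus = ("machine translation is fast . "
          "machine translation sucks . "
          "machine learning is hard .").split()
probs = bigram_probabilities(corpus)

print(probs[("machine", "translation")])           # 2 of the 3 "machine"s
print(probs.get(("machine", "gingerbread"), 0.0))  # never seen, so 0.0
```

Moving from 2-grams to longer n-grams means using the previous two or more words as the key instead of one. The zero for gingerbread also shows why real systems need smoothing: a sequence that never occurs in the corpus would otherwise be treated as impossible.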

About white sheep and pink dildos

Thing is, these systems need to be trained for every conceivable language pair in the world and take a lot of time to maintain. Enter deep learning: you just throw a corpus of examples at it and tell the machine to figure things out by itself, by learning to recognize patterns. A simple example would be to feed it thousands of images, a few hundred of which contain sheep, each with a description. After a few dozen pictures, the machine will conclude that most sheep are white, and that a picture of a pink dildo is therefore probably not a sheep. Of course it's a lot more complicated than that, but we translators were never very good at the hard sciences, and this is more or less what the system boils down to. It's therefore important to understand that AI is nothing to be scared of: there is no consciousness or intelligence involved. The machine does what anybody would have done after seeing dozens of pictures of sheep. The difference from human beings is the enormous scale and speed at which this happens.

The results are surprisingly good, and before you know it, the computer can tell you exactly which pictures contain sheep and which do not. To a certain extent, that is: contrary to popular belief, these systems are anything but smart and can easily be fooled. In fact, the balance on which the machine's conclusions rest is pretty shaky, which has given rise to a new kind of sport called How To Fool The Computer: scientists have been able to change the outcome by altering just one pixel in an image of a sheep, a change hardly noticeable to the human eye, making the machine believe that the sheep was actually not a sheep, but a pink dildo after all. This is a mistake that a human being would never make, and it also makes the technology very easy to hack for criminals or foreign governments. So if I were you, I wouldn't put my life in its hands just yet.
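To make the sheep example concrete, here is a deliberately tiny classifier in Python. It is not a neural network (it just remembers the average "sheep" and the average "not sheep" and picks whichever an image is closer to), and the three-pixel "images" and their brightness values are invented. It does show both halves of the story, though: learning from labeled examples, and how an image that sits close to the decision boundary flips when a single pixel changes slightly.

```python
def centroid(images):
    """Average the pixel brightness values of a list of equally sized images."""
    return [sum(values) / len(values) for values in zip(*images)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(sheep, not_sheep):
    # "Learning" here is just remembering the average sheep and the average non-sheep.
    return centroid(sheep), centroid(not_sheep)

def looks_like_sheep(image, model):
    sheep_centre, other_centre = model
    return squared_distance(image, sheep_centre) < squared_distance(image, other_centre)

# Hypothetical 3-pixel "images": brightness from 0.0 (dark) to 1.0 (white).
sheep = [[1.0, 0.8, 0.9], [0.8, 1.0, 0.9]]
not_sheep = [[0.2, 0.4, 0.3], [0.4, 0.2, 0.3]]  # darker, non-white objects
model = train(sheep, not_sheep)

borderline = [0.6, 0.6, 0.61]
tweaked = [0.6, 0.6, 0.59]  # only the last "pixel" changed, by 0.02

print(looks_like_sheep(borderline, model))  # True
print(looks_like_sheep(tweaked, model))     # False
```

Real adversarial attacks on deep networks search for such changes deliberately; here the borderline image is simply chosen by hand.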

Another big difference between neural networks and statistical machine translation is the former's flexibility. Neural translation systems, typically built from recurrent neural networks (RNNs), consist of a so-called encoder and decoder. The encoder is fed a sequence of words, and every successive word influences the direction in which the machine is thinking (this internal state is called a vector or tensor). Once the sentence has ended, we get a vector: a long list of very precise numbers, such as 0.938459983183754. This vector is then fed to the decoder, which has learned to turn these numbers into an actual sentence in the target language. This way you can encode very long sequences of words in a very precise way without depending on the above-mentioned n-grams (which rarely get longer than 9 words, so-called 9-grams, because of the sheer number of possible combinations). This can then be refined with a so-called attention mechanism: letting certain words in the sentence influence the outcome more or less by automatically assigning weights to them.
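To make the encoder and attention ideas above a little more tangible, here is a toy Python sketch. Everything in it is hypothetical: the two-dimensional word vectors and the weight matrices are hand-picked instead of learned, it uses a plain RNN cell where real toolkits typically use LSTM or GRU cells, and it only shows one attention step with a made-up decoder state. The point is simply that the hidden state after the last word is a list of numbers that depends on word order, and that attention turns match scores into weights that sum to 1.

```python
import math

def rnn_encode(word_vectors, w_in, w_rec):
    """Plain RNN encoder: each word updates the hidden state h = tanh(W_in x + W_rec h).
    Returns the hidden state after every word; the last one summarizes the sentence."""
    size = len(w_rec)
    h = [0.0] * size
    states = []
    for x in word_vectors:
        h = [math.tanh(sum(w_in[i][j] * x[j] for j in range(len(x)))
                       + sum(w_rec[i][j] * h[j] for j in range(size)))
             for i in range(size)]
        states.append(h)
    return states

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, states):
    """Dot-product attention: score every encoder state against the decoder's query,
    then blend the states using the softmax weights."""
    weights = softmax([sum(q * s for q, s in zip(query, state)) for state in states])
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(len(query))]
    return weights, context

# Hypothetical 2-dimensional word vectors and weights; real models learn hundreds of dimensions.
words = {"machine": [1.0, 0.0], "translation": [0.0, 1.0], "sucks": [1.0, 1.0]}
W_IN = [[1.0, 0.0], [0.0, 1.0]]
W_REC = [[0.5, 0.0], [0.0, 0.5]]

states = rnn_encode([words[w] for w in ["machine", "translation", "sucks"]], W_IN, W_REC)
weights, context = attention([2.0, 0.0], states)  # hypothetical decoder state as the query
print(states[-1])
print(weights)
```

Feeding the same three words in reverse order produces a different final state, which is what "every successive word influences the direction" amounts to in practice.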

This is all very interesting and fascinating stuff, but in the end there's only one thing we want to know: what will this do to our profession? Are we going to believe the translation agencies screaming that our end is near and that we need to drop our rates, or are we going to learn more about our enemy? I opt for the latter.

The DeepL that failed

The first thing I did was feed a few of my game texts to DeepL, but the results were catastrophic (see the addendum at the bottom of my previous article about machine translation). No sane person would ever dare send the result to a client. Game translations are a mix of smooth marketing talk and slick dialogues on one end and incredibly rigid, robotic texts on the other, interspersed with tags, tags and more tags. Surprisingly, or actually not, neural networks turn out to be very unreliable when it comes to translating the robotic parts too (robotic as in: far more robotic than newspaper texts). Neural networks (and statistical machine translation engines) are fuzzy by nature and therefore not fit for work like this (the most notorious example being the word two that was translated as three in my previous article, because the machine found that this sounded more "natural"). Translating words one by one using your terminology database is a far more reliable method for texts like that, for reasons already explained in said article. So what's left is the smooth marketing talk and slick dialogue: the fuzzy stuff neural networks are supposed to be good at. However, DeepL failed here too. When I consulted a few of my neural network friends, they told me the cause was probably that DeepL was not tuned to game translations, and they recommended that I start training my very own game translation dog. We'll call him Benji.
Benji is not as scary as you think

Installing Benji, my new dog

Now, installing Benji (or OpenNMT, the actual name of the open-source software from Harvard University that enables you to make your own little DeepL) is anything but plug-and-play. First of all, scientists are a bit nerdy, and Windows is of course far too common to be classified as nerdy. Real nerds use Unix, and therefore most software for building neural networks is written for Unix first.

At first I experimented a bit with virtual Unix machines that can be installed on top of Windows, but being systems on top of other systems, these are pretty slow and cannot use the extra capacity of the GPUs (graphics cards) in my top-notch gaming laptop. Okay, they can, via a technology called GPU passthrough, but that technology is still in its infancy, incredibly complicated to configure and not really worth it when you can achieve the same, and probably more reliable, result by simply partitioning one of your hard disks and turning your system into a fully-fledged Unix machine by making it dual-boot.

Now, I know nothing about Unix, so during the first few evenings all I learned was how to open files, install applications, navigate folders and all the other stuff even the most humanities-minded translator can do on Windows in the blink of an eye. Unix is... complicated. Probably a lot better and safer than Windows, but still complicated. There are like five ways to install a simple application, each more intricate than the last, and especially once you enter the world of neural networks, it's not a matter of clicking a logo in your app store and then pressing OK. I mean, neural networks are the realm of supernerds, and supernerds don't need explanations about... anything, as they know everything already.

So, you'll need to verify the make and model of your graphics cards (my system uses a so-called SLI configuration consisting of two graphics cards working in tandem) and install the latest stable drivers for them by typing several commands at the command prompt (using a mouse is a dirty concept in Unix environments). To give you an idea of how that works, you'll have to find out, all by yourself, that you need to open something called a terminal and then type the following:

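# Remove any NVIDIA driver packages that are already installed
sudo apt-get purge 'nvidia*'
# Add the graphics-drivers PPA, which offers newer NVIDIA drivers than Ubuntu's defaults
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
# Install the 387 driver series used here
sudo apt-get install nvidia-387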

Easy peasy, for nerds

And that's just the first step of an incredibly complicated process. You'll be reading documents and asking for help on forums for days on end, trying to install matching versions of CUDA (NVIDIA's parallel computing architecture that enables increases in computing performance by harnessing the power of the GPU), cuDNN (the CUDA Deep Neural Network library, a GPU-accelerated library of primitives for deep neural networks), Python (a widely used high-level programming language for general-purpose programming), TensorFlow (an open-source software library for machine intelligence) and OpenNMT (an open-source initiative for neural machine translation and neural sequence modeling).

Then, finally, after you have gone through all this, this is what you get: a nice command prompt, soon coming to a theatre near you.
All versions of these need to match, meaning that you can start all over again once it turns out that OpenNMT only plays nicely with TensorFlow 1.2, which only plays nicely with CUDA 6.0, not CUDA 9.0 and definitely not CUDA 8.0 as advertised, as CUDA 6.0 fancies cuDNN 7.0 and not cuDNN 7.1, because cuDNN 7.0 introduced a bug that can only be circumvented by editing Python scripts in Python 2.7 (not Python 3.6) using software called gedit, which needs to be installed from the command prompt. Of course, as the Unix enthusiasts will tell you, you need to call these scripts from the OpenNMT directory and not the usr/home directory (unless you have edited your PATH variable using gedit ~/.profile from the terminal). Of course, of course! But this applies only if you have Ubuntu 16.04 and not Ubuntu 14.04, and in all other cases the world will explode unless you have a purple desktop background or a Hebrew-sounding name ending in K. Or something like that. By the time you're done, you have probably become a fully-fledged programmer and have learned Tagalog and ten other exotic languages while you were at it. (I've made up some version numbers and directory names, as I forgot all the stuff I had to do to get this working, but you get the general idea, I hope.)
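Version juggling of this kind is exactly what a small check script can help with. The sketch below compares installed versions against a table of pins. The pin table is hypothetical (it echoes the deliberately made-up numbers above, not real compatibility data), and apart from the Python version, which it reads with platform.python_version(), the installed versions are typed in by hand rather than detected.

```python
import platform

def version_tuple(version):
    """'1.2.1' -> (1, 2, 1). Stops at the first piece that isn't a plain number,
    so '1.2.0rc1' -> (1, 2)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def matches(installed, pinned):
    """True if the installed version starts with the pinned components:
    '1.2.1' matches '1.2', but '1.20.0' does not."""
    pin = version_tuple(pinned)
    return version_tuple(installed)[:len(pin)] == pin

def mismatches(installed, pins):
    """List (component, installed, wanted) for every pin that isn't satisfied."""
    problems = []
    for name, wanted in pins.items():
        found = installed.get(name)
        if found is None or not matches(found, wanted):
            problems.append((name, found, wanted))
    return problems

# Hypothetical pins echoing the (admittedly made-up) versions above; not real compatibility data.
PINS = {"python": "2.7", "tensorflow": "1.2", "cuda": "6.0", "cudnn": "7.0"}
installed = {
    "python": platform.python_version(),  # the interpreter running this script
    "tensorflow": "1.2.1",                # typed in by hand for this example
    "cuda": "9.0",
    "cudnn": "7.0.5",
}
for name, found, wanted in mismatches(installed, PINS):
    print(f"{name}: found {found}, need {wanted}.x")
```

Comparing numeric components rather than raw strings matters: as text, "1.20" starts with "1.2", but as a version it is a different release.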

Now, this rocket science is all basic stuff for nerds and programmers, who can do in ten minutes what took me a whole fricking week, but I think the above is way out of reach for 99.9% of the translator community, unless they really have too much time on their hands like me and find climbing Mount Everest not enough of a challenge. This is where the slick managers step in, selling you those flashy Windows packages for hundreds of euros, all based on free software that no one else dares install, for good reason.

Being the gamer that I am, after all the above I expected at least the newest installment of Lord of the Rings – The Mother of All Games (A 700-hour Epic about Love and Lost Romances, Forgotten Wars and Elven Kingdoms) in Ultra-HD resolution, but the above purple screenshot was as exciting as it got.

The network training itself


You then need to prepare your corpus: in my case a database containing 399,684 sentences with 3,933,677 English words and 3,969,731 Dutch words, all pertaining to one client and one field (games). Preparing means getting rid of everything that makes your text dirty (capitals, layout tags) and tokenizing the text (making sure that all words and all symbols are separated by spaces). There are probably utilities for this, but I wrote myself a simple PHP script that took care of it.
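The PHP script itself isn't reproduced here. As an assumption about what such a step typically looks like, here is a minimal Python sketch that strips HTML-style tags, lowercases the text and puts spaces around every word and symbol. Game files use many other tag and placeholder formats (curly-brace variables, bracketed color codes and so on), which the TAG pattern would need to be extended for.

```python
import re

TAG = re.compile(r"<[^>]+>")        # HTML/XML-style tags such as <b> or </color>
TOKEN = re.compile(r"\w+|[^\w\s]")  # a run of letters/digits, or a single symbol

def prepare(line):
    """Remove layout tags, lowercase, and put spaces around every word and symbol."""
    without_tags = TAG.sub(" ", line)  # a space, so words on either side of a tag stay apart
    return " ".join(TOKEN.findall(without_tags.lower()))

source = "<b>NBA LIVE Mobile</b> brings all the spectacle of NBA basketball to Android and iOS platforms."
print(prepare(source))
# nba live mobile brings all the spectacle of nba basketball to android and ios platforms .
```

The function has to be applied line by line to both the English and the Dutch file, so that line n on one side still pairs with line n on the other. Lowercasing throws away some information (iOS becomes ios), which is the price for a smaller vocabulary.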

In the world of neural networks, the above is a medium-sized data set. I have far more data in my translation memories, but I chose a client-specific memory containing translations for games released by that client only. Sportsmanlike as I am, I used the resulting network only on texts from the very same client (okay, and a few general game texts from another client, just to compare the results). It doesn't get fairer than that.

There are multiple platforms Benji can run on. I chose TensorFlow, a library that comes with TensorBoard, a tool that visualizes the data and gives you some nice eye candy while the network is training. During training, the network evaluates itself every so many minutes using so-called validation data: an extra, much smaller corpus which it tries to translate using the patterns it has learned, after which it compares its output with the actual human translation and scores itself. Benji will continue learning until it's satisfied with its score or until you tell it to stop.
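The validate-and-stop cycle described above is commonly implemented as early stopping. Here is a generic Python sketch, not OpenNMT's or TensorFlow's actual training code: train_step and validate are hypothetical stand-ins, and the validation losses are invented to show when training would stop.

```python
def train_with_validation(train_step, validate, max_steps, validate_every, patience):
    """Run train_step() repeatedly; every validate_every steps compute a validation
    loss, and stop once it has failed to improve `patience` checks in a row."""
    best = float("inf")
    bad_checks = 0
    history = []
    for step in range(1, max_steps + 1):
        train_step()
        if step % validate_every == 0:
            loss = validate()
            history.append((step, loss))
            if loss < best:
                best = loss
                bad_checks = 0
            else:
                bad_checks += 1
                if bad_checks >= patience:
                    break
    return best, history

# Hypothetical validation losses: quick progress at first, then a plateau.
fake_losses = iter([5.0, 3.2, 2.5, 2.4, 2.45, 2.41, 2.6])
best, history = train_with_validation(
    train_step=lambda: None,            # stand-in for one real training update
    validate=lambda: next(fake_losses), # stand-in for translating the validation set
    max_steps=10_000, validate_every=100, patience=2)
print(best, history[-1])  # stops at step 600 with best loss 2.4
```

The patience setting is the "satisfied with its score" part: one bad check may be noise, several in a row suggest the network has stopped improving on this data.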

Benji's brains. You can search for words and measure their distance to other words: roughly, how similar the contexts are in which the network has seen them.
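Distances like these are commonly measured with cosine similarity between word vectors; TensorBoard's projector offers cosine and Euclidean distance. The sketch below uses hypothetical three-dimensional vectors; a trained network's vectors have hundreds of dimensions and come from the training data rather than being typed in.

```python
import math

def cosine_similarity(a, b):
    """1.0 = same direction, 0.0 = unrelated (perpendicular), -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, embeddings, k=2):
    """The k words whose vectors are most similar to the given word's vector."""
    target = embeddings[word]
    scored = [(other, cosine_similarity(target, vector))
              for other, vector in embeddings.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 3-dimensional embeddings.
embeddings = {
    "destroyer": [0.9, 0.1, 0.0],
    "airship":   [0.8, 0.3, 0.1],
    "rookie":    [0.0, 0.9, 0.4],
    "veteran":   [0.1, 0.8, 0.5],
}
print(nearest("destroyer", embeddings))  # airship first, then veteran
```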

Wax on, wax off

Now this training is the stuff of legends. Training Slate with statistical machine translation in my previous article took 4 to 5 hours. Training my own neural network with OpenNMT on the very same system took more than 59 (!) hours (in a virtual machine, without using the GPUs). During this time your system is fully occupied and slows down to a crawl, to the extent that even opening a simple browser window can take up to 10 seconds. Now you may think I use a hopelessly old, dusty system, but we're talking about Hayabusa, my 7,000-euro gaming laptop with an i7-6700K processor, two GTX 980 graphics cards and 64 GB of RAM. If that sounds like Greek to you: currently (2017) this is still a bit of a monster and a very high-end laptop. This explains why neural networks are trained on dedicated systems. In plain language: you'll need to buy yourself the fastest and most expensive system you can think of, separately, just to be able to train your own networks on a regular basis.

All of this was done in a virtual Unix machine on top of Windows. Only afterwards did I switch to the dual-boot Unix system mentioned earlier, which takes full advantage of my GPUs. I haven't finished training on that configuration yet, but it's clear that after 8 hours or so the system still has a very long way to go. Activating your GPUs is not going to work miracles, I'm afraid, but it may speed things up a bit.

An inside look

Playing around with TensorBoard is fun. The tool offers you a direct look into Benji's brain: you can see how words relate to other words while the visualization changes and morphs on the fly and the network is learning new things and improving itself. This, indeed, is a scary sight as it's almost like an organism growing, until it eventually becomes so smart that it can stretch its tentacles out of the computer to strangle its owner and take over the world.

The longer it trains, the less at a loss Benji is
Another interesting thing to watch is the so-called loss value, which shows how much the network is at a loss trying to understand the data it is analyzing (the word loss actually refers to something else: how far the network's predictions are from the human translations in the data, but this explanation makes it easier to understand). The lower the score, the better the translation results should be. Every so many minutes a preliminary translation model (a checkpoint) is saved, which you can immediately use to translate new texts.
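In most neural translation toolkits the loss is the cross-entropy: the average negative log-probability the network assigns to each correct word of the human translation. Training logs often also show perplexity, which is simply e raised to that loss. A minimal Python sketch with invented probabilities for a hypothetical two-word Dutch reference:

```python
import math

def cross_entropy(predictions, reference):
    """Average negative log-probability the model assigned to each word of the
    human reference translation. Lower is better; 0 would mean 100% certainty
    on every correct word."""
    total = 0.0
    for distribution, word in zip(predictions, reference):
        # Tiny floor so a word the model gave no probability doesn't cause log(0).
        total -= math.log(max(distribution.get(word, 0.0), 1e-9))
    return total / len(reference)

reference = ["het", "luchtschip"]  # hypothetical Dutch reference tokens

# Early in training: equal odds for four candidate words at each position.
clueless = {"het": 0.25, "de": 0.25, "luchtschip": 0.25, "schip": 0.25}
early = [clueless, clueless]

# Later: most of the probability goes to the right words.
later = [{"het": 0.9, "de": 0.1}, {"luchtschip": 0.8, "schip": 0.2}]

for name, predictions in [("early", early), ("later", later)]:
    loss = cross_entropy(predictions, reference)
    print(name, round(loss, 3), "perplexity", round(math.exp(loss), 2))
```

A model that spreads its bets evenly over four words gets a perplexity of 4 (it is as confused as if it were picking among four options), while the more confident model gets close to 1.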

The results

By comparing the results of the different translation models, you can see the network gradually improving itself in a very tangible way, but as my data below shows, even the final result is far from usable. It does save you some typing, though. In fact, even though it took the computer more than ten times as long to train itself as Slate did (basically a flashy Windows interface for the Unix-based Moses, see my previous article on statistical MT), the results were far worse. It's interesting and reassuring at the same time to see that no matter how incredibly sophisticated the technology behind neural networks is, it is still easily beaten by something as simple and stupid as Dumb Assembly (again, see my previous article).

Sample results. Both machine translations have been back-translated from Dutch to English:
  • Neural network: after 59 hours of training in OpenNMT on a 4-million-word domain-specific and client-specific corpus.
  • Statistical machine translation: after 1.5 hours of training in Moses on a 4-million-word domain-specific and client-specific corpus.

English source: NBA LIVE Mobile brings all the spectacle of NBA basketball to Android and iOS platforms.
Neural network: NBA Live Mobile brings the ? from ? to Android and ?
Statistical MT: NBA LIVE Mobile brings all the spectacle of NBA basketball to Android and iOS platforms.

English source: Complete challenges with players who are at a crossroads in their careers, like a rookie, a veteran on a new team, a star joining a super team, and more.
Neural network: Complete challenges with players who are on a ? in their ? as a ? a ? a ? a star to a ? and ?
Statistical MT: Complete challenges with players who at a crossroads in their careers, like a rookie in a veteran, a new team to join a super team a star), and more.

English source: Genghis Khan's alliance is probably the best known example, thanks largely to the remarkable document, the Secret History of the Mongols, which chronicled the Khan's rise from lone fugitive to the ruler of one of the greatest empires the world has seen.
Neural network: ? ? alliance is probably the best ? thanks to the incredible ? of the ? the secret history of the ? that the ? ? of one of the biggest ? that the world has seen.
Statistical MT: Genghis khan is probably the best known alliance: Thank you for the documenting, especially the incredible history of the secret chronicled mongols, that is the only khan from fugitive to the ruler of one of the greatest achievements on an empire. The world is

English source: From small fires to structural fires to gas explosions, the factory fire defence has everything that a fire department faces at a modern industrial site.
Neural network: From small fires to fire, fires to ? fires the ? to ? has everything what a ? ? on a modern industrial ?
Statistical MT: Has to fire from small structural factory to the fire, gas explosions to fire. That is an industrial Department do on a modern site.

English source: This nimble airship has room for a pilot with three gunners and will be a frightening sight on the battlefield.
Neural network: This nimble ? for a pilot with three gunners and are a frightening sight on the ?
Statistical MT: This nimble airship has room for a pilot with three gunners and will be a frightening sight on the battlefield.

English source: Turning Tides extends the arsenal by introducing the new L-Class Destroyer ready to rule the waves in naval combat and the new C-Class Airship delivering death from above.
Neural network: Turning Tides extends the arsenal by introducing the new ? Destroyer for the waves in naval combat and the new ? Airship ? death from ?
Statistical MT: Turning Tides extends the arsenal of the new L-Class Destroyer Are you ready to rule in the naval combat and the waves on the new C-Class Airship the death coming from above.

Above you can see some samples from the results. The question marks in the neural network's output are words or patterns it still had not figured out after 59 hours: a nice English translation would be WTF!

Statistical machine translation (Moses) managed to get two sentences precisely right, but I carefully selected these from a total of 151 pairs: in fact, Moses got only 1.3% of all sentences exactly right (2 out of 151). For a more careful analysis of the results, I used the Levenshtein distance, which counts the minimum number of edits (insertions, deletions and substitutions) needed to turn a machine translation into the human translation I came up with myself (the desired result, so to speak). The larger this number, the worse the result; the ideal is 0. There's no comparison with my own Dumb Assembly system (explained in my previous article) this time, because we already know it performs way better than statistical machine translation and way worse than human translation. As training takes time, I ran the same text through the evolving model every 5 hours or so, so that you can see the quality of the machine translations improve over time, until it reaches a point where further improvement is no longer possible unless you throw even more data at it (apparently 4.5 million words is not enough to make Benji understand). In other words, the final result is the best you can get using the current data set.

Each number on the X-axis stands for approximately 4.5 hours. Within the first 4.5 hours, the statistical machine translation and my own human translation were already finished. Even after 59 hours (number 13 on the X-axis), deep learning was far behind. The orange line on top is deep learning, the grey line below it is statistical machine translation, and the yellow line at the bottom is human translation.
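The Levenshtein distance used above is a short dynamic-programming routine. Here is a minimal Python sketch. Whether the scores in the chart were computed per character or per word isn't stated, so the function accepts both strings and lists of words; averaging the per-sentence distances, as average_distance does, is likewise an assumption about how one number per model could be obtained.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.
    Works on strings (character edits) or on lists of words (word edits)."""
    previous = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def average_distance(machine_output, human_output):
    """Mean edit distance over aligned sentence pairs; 0 means identical to the human version."""
    distances = [levenshtein(m, h) for m, h in zip(machine_output, human_output)]
    return sum(distances) / len(distances)

print(levenshtein("kitten", "sitting"))                             # 3
print(levenshtein("three gunners".split(), "two gunners".split()))  # 1
```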

Ice ice baby

Yes, DeepL, after being fed billions of sentences and harnessing the power of a computer in Iceland so big it needs to be cooled with glaciers, does a pretty impressive job when it comes to rigid texts like newspaper articles and scientific papers. However, it is absolutely unfit for legal texts, as neural networks will sometimes skip or misinterpret words to make the result sound more natural (which is exactly what you do not want in a legal translation). It is also unfit for texts that are too robotic, like tables in technical user manuals, for exactly the same reason. And it is also unfit for smooth marketing texts and slick dialogues, which are so fuzzy that you'd need to feed the whole planet to the system to make sense of all the data, and that is exactly the problem, as material in this field is so scarce. Also, how much more data does a system need until it has seen just about every possible sentence in the world? Hasn't a system like that already been in use for years in a much simpler form? Exact and fuzzy matches, anyone? Neural networks seem like an extremely complicated way to generate answers to questions you have already answered yourself, if an exact match is all you need.
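Fuzzy matching, the simpler system referred to above, can be sketched with Python's standard difflib module. This is a stand-in: memoQ and Trados calculate their match percentages with their own algorithms, and the one-entry memory and its Dutch translation below are hypothetical.

```python
from difflib import SequenceMatcher

def best_match(segment, memory, threshold=0.75):
    """Look up the stored source segment most similar to `segment`.
    Returns (score, stored_source, stored_target), or None below the threshold.
    A score of 1.0 is an exact match."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is None or best[0] < threshold:
        return None
    return best

# A one-entry hypothetical translation memory (English source -> Dutch target).
memory = {
    "This nimble airship has room for a pilot with three gunners.":
        "Dit wendbare luchtschip heeft plaats voor een piloot met drie schutters.",
}

print(best_match("This nimble airship has room for a pilot with three gunners.", memory)[0])  # 1.0
score, source, target = best_match("This nimble airship has room for a pilot with two gunners.", memory)
print(round(score, 2), "->", target)  # roughly 0.95
```

The fuzzy match is found instantly and the translator only has to change drie into twee; a new entry added to the memory can be matched immediately, without any retraining.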


TensorBoard's dataflow graph representing all computations in terms of the dependencies between individual operations in the neural network. I have no idea what that means, but it looks really cool.

Incremental updates

Is it fun? Oh boy, is it fun! This is the stuff we dreamed of as children: our own cute little pet that tries to be intelligent, fumbling and stumbling around and making us roar with laughter. Benji can save you typing work, yes, but we already have other tools for that, like Nuance's Dragon speech recognition software and Dumb Assembly. Also, in its current state you can't feed new incremental data to the system. So if, for example, you receive a new batch from your client introducing a new concept (in my case a new game), the network will perform even worse than usual, as new concepts introduce new words and patterns the network hasn't seen yet. You'd think you could easily solve this by feeding the network a few extra words, but that won't work: in a neural network everything is linked to everything, so the only way to expand on it is to retrain the entire network. And that took... oh yeah, 59 hours. 59 hours to learn one fricking new word.
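One concrete reason new words are such a problem: many neural translation systems build a fixed vocabulary once, before training, and every word outside it is mapped to a single unknown-word token. The sketch below assumes a simple word-level vocabulary; it is a simplified picture, not OpenNMT's actual code.

```python
from collections import Counter

UNK = "<unk>"  # stands in for every word the vocabulary doesn't contain

def build_vocab(sentences, max_size):
    """Map the (max_size - 1) most frequent words to ids; id 0 is reserved for UNK."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    words = [UNK] + [word for word, _ in counts.most_common(max_size - 1)]
    return {word: index for index, word in enumerate(words)}

def to_ids(sentence, vocab):
    """Convert a sentence to ids; unseen words all collapse into the UNK id."""
    return [vocab.get(word, vocab[UNK]) for word in sentence.split()]

vocab = build_vocab(["the new destroyer", "the new airship"], max_size=50)
print(to_ids("the new submarine", vocab))  # [1, 2, 0]: "submarine" became <unk>
print(len(vocab))                          # 5
# The network's input embedding table and output layer have one row per vocabulary entry,
# so adding "submarine" changes their shapes: the trained weights no longer fit as-is.
```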

There are new concepts being introduced, like ModernMT, that do support incremental updates, but this technology is still in its infancy. Even the makers of OpenNMT called this kind of training new and very experimental, and no one knows the outcome yet.

Fact is that to actually be able to use neural networks in our workflow, they need to be fully integrated into tools like memoQ or Trados, and we'd need to be able to constantly feed them new data, just like we feed our current memories with every single sentence we add to them. Feedback to and from the neural network needs to happen on the fly, and with current training times of more than 59 hours for a medium-sized client-specific and domain-specific network, instead of the 0.1 second we need, we still have a very long way to go.


For now, our enemy turns out not to be our enemy, but a helpful friend who is being framed by slick sales managers: sales managers who abuse our lack of knowledge about certain subjects so that they can earn more while doing less. Next time they tell you to give them a discount because they were kind enough to pretranslate their texts with their machines (I'm not talking about exact matches here!), point them to this article.

DeepL is currently free, and that is exactly the amount you should give in discounts on pretranslations done by machines. Why would you ever pay a project manager for pressing a button you can press yourself? Now, if client-specific and domain-specific data sets generated far better results, the story might be different, but at least for game translations (and all other translations requiring the slightest amount of creativity) this is absolutely not the case. It's not like deep learning lost to humans by a narrow margin. Deep learning was crushed, annihilated and utterly destroyed by a mere mortal like me. Apparently translation is a totally different beast from games like chess.

Don't get me wrong. I love Benji and have found a new friend to play with. We love each other, and I have tremendous respect for all the nerds who helped me on my journey through this system. But giving paws and playing dead is all Benji will do for the next few years, and, I expect, for many years to come.

Loek van Kooten
Your English/Japanese-Dutch game translator


©2006-2017 Loek van Kooten