Why deep learning sucks

After statistical machine translation, translators are now threatened by a new technology called deep learning. It's very scary, mysterious stuff, conjuring up images of Matrix-like worlds in which human beings are enslaved by computers and robots that have completely outsmarted us in every conceivable way. Purchasing departments of translation agencies like to tell us that our end is near and that the only way to survive is to lower our rates, because otherwise the machines will take over. "Know thyself, know thy enemy," Sun Tzu said, and the guy was right. Also: dinosaurs went extinct for a reason. The only way to survive is to adapt. Though I'm a bit of a hobbyist programmer myself, using languages like PHP, MySQL and even a bit of C# to find my way in life, I consider myself by no means an expert on IT-related matters. However, the very fact that I earned my black belt in judo despite being a total loser when it comes to sports proves that you can achieve a lot by mere perseverance. Or, as an artist friend of mine once stated: "If you throw enough shit at the wall, eventually some of it will stick."
Above you can see some samples from the results. The question marks in the neural network column are words or patterns the neural network still had not figured out after 59 hours:
a nice English translation would be WTF!
Statistical machine translation (Moses) managed to get two sentences exactly right, but I have carefully selected these from a total of 151 pairs: in fact, Moses got only 1.3% of all sentences exactly right. For a more careful analysis of the results, I have used the Levenshtein formula, which basically calculates the edit distance (difference) between the machine translations and the human translation I came up with myself (the desired result, so to speak). The larger this number, the worse the result; the desired result is 0.
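To make that scoring concrete, here's a minimal sketch of the Levenshtein edit distance in Python. The two Dutch sentences are made-up examples, not from my actual test set:

```python
def levenshtein(a, b):
    """Edit distance: the number of insertions, deletions and substitutions
    needed to turn string a into string b. 0 means a perfect translation."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

machine = "de kat zit op de mat"
human   = "de kat ligt op de mat"
print(levenshtein(machine, human))  # 2: the machine is 2 edits off
```

The lower the number, the closer the machine got to the human translation; identical strings score 0.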
There's no comparison with my own Dumb Assembly system (explained in my previous article) this time, because we already know it performs way better than statistical machine translation, and way worse than human translation. As training a system takes time, I have run the same text through the evolving model every 5 hours or so, so that you can see the quality of the machine translations improve over time, until it arrives at a point where further improvement is no longer possible unless you throw even more data at it (apparently 4.5 million words is not enough to make Benji understand). Id est:
the final result is the best you can get with the current data set.
DeepL

So I've been throwing shit around a little, with interesting results. On this website you'll find DeepL, a neural network that has been trained on billions of sentences from linguee.com. Contrary to Google Translate, these sentences have all been verified by actual human beings, ensuring that the input DeepL bases its translations on is of high quality. "DeepL runs on a supercomputer in Iceland that is capable of performing more than 5,100,000,000,000,000 floating point operations per second. This would rank 23rd in the current list of the world's top 500 supercomputers. The supercomputer has enough power to translate 1,000,000 words in under a second. In blind tests pitting DeepL Translator against the competition, translators preferred DeepL's results by a factor of 3:1. Automated tests bear this out as well. Within the field of machine translation, the gold standard for measuring system performance is the bilingual evaluation understudy (BLEU) score, which compares machine-translated texts with those produced by a translator. DeepL Translator achieves record BLEU scores." End quote. And indeed, if you tell DeepL to translate newspaper articles, maybe even scientific papers or other rigid texts, the results are very impressive. Absolutely unacceptable when it comes to publishing, but definitely good enough to give a very good impression of the meaning of a text. Scarily good, even: some translations are absolutely flawless. So far for DeepL's advertisement. As said, these examples, plus the examples DeepL advertises with, are based on very rigid, robotic texts. And of course robots are very good at doing robotic things; that's why they're called robots in the first place. In the end, however, only one thing matters: how do machines like DeepL perform in your own domain? Is DeepL a threat to me and should I start looking for another job, or is this just another scheme to make me lower my rates and work more for less?
About nerds and supernerds

First things first: it's important to realize that the brilliant scientists behind machine learning technology are not after our jobs. They're normal people like you and me, who are fascinated by the stuff they can accomplish: they want to build machines that can do what humans do, and then better, sometimes out of mere curiosity and fascination, and sometimes because they think it will improve the world. I know, because I happen to know a few people who work in this industry. When I told them that project managers of translation agencies expect us to give discounts on the results produced by the machines they invented, they were flabbergasted: a bit like Einstein, who regretted his contribution to the atom bomb for the rest of his life when he found out what people were actually planning to do with it. Naive? Maybe. Fact is that discounts on semi-manufactured products are not what the makers had in mind when designing these systems. I also know for a fact that every single MT-based product or platform I know of is based on open-source software that is freely available to everyone. This software then gets a nice sticker slapped on it and is packaged in a browser-friendly or Windows-friendly format by slick managers, and suddenly you find yourself with a flashy new product costing hundreds or thousands of dollars. So why doesn't anybody use the free open-source software? Because it's all made by nerds, Unix-based and very complicated. More on that later.
Statistical Machine Translation vs. Deep Learning

Back to the main question: indeed, there are things that computers do a lot better than human beings, driving cars and diagnosing diseases being just a few examples. But how does that hold up for translation? Machine translation as we know it, statistical machine translation, is based on statistics (surprise!). Basically it compares data sets of two languages, sentence by sentence, to calculate probabilities for word sequences. In this particular article, for example, the chance that the word machine will be followed by translation and not, say, gingerbread, is pretty big. Do this for every conceivable sequence of words in your corpus, and you've got machine translation in a nutshell. The technology can do this even for sequences of multiple words, called n-grams, to make translations sound even more natural. So instead of calculating the probability for what comes after the word machine, it could also calculate the probability for what comes after the words machine translation. In this particular article, the chance that that particular sequence of two words (a so-called 2-gram) will be followed by a word like sucks is pretty big. And so on.
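To make the n-gram idea concrete, here is a minimal Python sketch of how such probabilities are counted. The three-sentence corpus is a toy example, not real training data:

```python
from collections import Counter, defaultdict

corpus = ("machine translation sucks . machine translation rocks . "
          "machine translation sucks .").split()

# Count how often each word follows a given 2-gram (the context).
counts = defaultdict(Counter)
for i in range(len(corpus) - 2):
    context = (corpus[i], corpus[i + 1])
    counts[context][corpus[i + 2]] += 1

def prob(context, word):
    """P(word | context): relative frequency in the corpus."""
    total = sum(counts[context].values())
    return counts[context][word] / total if total else 0.0

print(prob(("machine", "translation"), "sucks"))  # 2 of the 3 continuations
```

That's really all "statistics" means here: sucks follows machine translation in two out of three cases, so its probability is about 0.67, while gingerbread never occurs and scores 0.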
About white sheep and pink dildos

Thing is that these sets need to be trained for every conceivable language pair in the world and take a lot of time to maintain. Enter deep learning: you just throw a corpus of patterns at it and tell the machine to figure things out by itself, by learning to recognize patterns. A simple example would be to throw in thousands of images, a few hundred of which contain sheep, together with descriptions of the images. After a few dozen pictures, the machine will conclude that most sheep are white, and that hence a picture of a pink dildo is probably not a sheep. Of course it's a lot more complicated than that, but us translators were never very good with beta stuff, and this is more or less what the system boils down to. It's therefore important to understand that AI is nothing to be scared of: there is no consciousness or intelligence involved. The machine does what everybody would have done after seeing dozens of pictures with sheep. The difference with human beings is the enormous scale and speed at which this happens. The results are surprisingly good, and before you know it, the computer can tell you exactly which pictures contain sheep and which do not. To a certain extent, that is, as contrary to popular belief these systems are anything but smart and can be easily fooled. In fact, the balance on which the machine's conclusions are based is pretty shaky, spawning a new kind of sport named How To Fool The Computer: scientists have been able to influence the outcome by changing just one pixel in images of sheep, invisible to the human eye, making the machine believe that the sheep was actually not a sheep, but a pink dildo after all. This is a mistake a human being would never make, and it also makes the technology very easy to hack by criminals or foreign governments. So if I were you, I wouldn't put my life in its hands just yet.
Another big difference between machine learning on neural networks and statistical machine translation is the former's flexibility. Neural networks, specifically recurrent neural networks or RNNs, consist of a so-called encoder and decoder. The encoder is fed with sequences of words, and every successive word influences the direction in which the machine is thinking (expressed as a so-called vector or tensor). Once the sentence has ended, we get a unique vector of very precise numbers, say 0.938459983183754. This vector is then fed to the decoder, which has learned to translate these numbers into an actual sentence in the target language. This way you can encode very long sequences of words in a unique and very precise way, without depending on the above-mentioned n-grams (which mostly never become longer than sequences of 9 words or so-called 9-grams, due to the sheer number of possible combinations). This can then be refined with so-called attention mechanisms: having certain groups of words in said sentence influence the numeric outcome more or less by automatically assigning weights to them. This is all very interesting and fascinating stuff, but in the end there's only one thing we want to know: what will this do to our profession? Are we going to believe the translation agencies screaming that our end is near and that we need to drop our rates, or are we going to learn more about our enemy? I opt for the latter.
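For the curious, here's a heavily simplified, untrained Python sketch of what an encoder does: fold a sentence of any length into one fixed-size vector. Real systems learn these weights from data; here they are random, so the numbers mean nothing, but the shape of the idea is the same:

```python
import math
import random

random.seed(0)
VOCAB, HIDDEN = 50, 8

# Toy word embeddings and recurrent weights (random: an untrained sketch).
E = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]
W = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

def encode(token_ids):
    """Fold a sentence of any length into one fixed-size vector."""
    h = [0.0] * HIDDEN
    for t in token_ids:
        # Each successive word nudges the hidden state in a new direction.
        h = [math.tanh(E[t][i] + sum(W[i][j] * h[j] for j in range(HIDDEN)))
             for i in range(HIDDEN)]
    return h

short = encode([3, 7])                       # a 2-word "sentence"
long_ = encode([3, 7, 12, 41, 5, 9, 9, 30])  # an 8-word "sentence"
print(len(short), len(long_))  # both 8: same size, whatever the length
```

Note how the output vector always has the same size no matter how long the input is; that's exactly the property that frees neural MT from the 9-gram ceiling.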
The DeepL that failed

The first thing I did was feed a few game texts of mine to DeepL, but the results were catastrophic (see the addendum at the bottom of my previous article about machine translation). No sane person would ever dare send the result to a client. Game translations are a mix of smooth marketing talk and slick dialogues on one end and incredibly rigid and robotic texts on the other, interspersed with tags, tags and more tags. Surprisingly, or actually not, neural networks are very unreliable when it comes to translating the robotic parts too (robotic as in: far more robotic than newspaper texts). Neural networks (and statistical machine translation engines) are fuzzy by nature and therefore not fit for work like this (the most notorious example being the word two that was translated as three in my previous article, because the machine found that this sounded more "natural"). Translating words one by one using your terminology database is a far more reliable method for texts like that, for reasons already explained in said article. So what's left is the smooth marketing talk and slick dialogue. This is the fuzzy stuff neural networks are supposed to be good at. DeepL failed here too, however. When I consulted a few of my neural network friends, they told me the cause was probably that DeepL was not tuned to game translations, and they recommended that I start training my very own game translation dog. We'll call him Benji.
Installing Benji, my new dog

Now, installing Benji (or OpenNMT, the actual name of the open-source software from Harvard University that enables you to make your own little DeepL) is anything but plug-and-play. First, scientists are a bit nerdy, and Windows is of course far too common to be classified as nerdy. Real nerds use Unix, and therefore all software for building neural networks is Unix software. First I experimented a bit with virtual Unix machines that can be installed on top of Windows, but being systems on top of other systems, these are pretty slow and cannot use the extra capacity of the GPUs (graphics cards) in my top-notch game laptop. Okay, they can, via a technology called passthrough, but that technology is still in its infancy, incredibly complicated to configure and not really worth it if you know that you can achieve the same and probably a more reliable result by simply partitioning one of your hard disks and turning your system into a fully-fledged dual-boot Unix machine. Now, I knew nothing about Unix, so in the first few evenings all I learned was how to open files, install applications, navigate folders and do all the stuff even the most alpha-minded translator can do on Windows in the blink of an eye. Unix is... complicated. Probably a lot better and safer than Windows, but still complicated. There are like five ways to install a simple application, each more intricate than the last, and especially once you enter the world of neural networks, it's not a matter of clicking a logo in your app store and then pressing OK. I mean, neural networks are the realm of supernerds, and supernerds don't need explanations about... anything, as they know everything already.
So, you'll need to verify the make and model of your graphics cards (my system uses a so-called SLI configuration consisting of two graphics cards working in tandem) and install the latest stable drivers for them using several instructions from your command prompt (using a mouse is a dirty concept in Unix environments). To give you an idea of how that works: you'll have to find out, all by yourself, that you need to open something called a terminal and then type the following:

sudo apt-get purge nvidia*
sudo add-apt-repository ppa:graphics-drivers
sudo apt-get update
sudo apt-get install nvidia-387
Easy peasy, for nerds

And that's just the first step of an incredibly complicated process. You'll be reading documents and asking for help on forums for days at a stretch, trying to install matching versions of CUDA (NVIDIA's parallel computing architecture that enables increases in computing performance by harnessing the power of the GPU), cuDNN (the CUDA Deep Neural Network library, a GPU-accelerated library of primitives for deep neural networks), Python (a widely used high-level programming language for general-purpose programming), TensorFlow (an open-source software library for machine intelligence) and OpenNMT (an open-source initiative for neural machine translation and neural sequence modeling). All versions of these need to be matched, meaning that you can start all over again once it turns out that OpenNMT only plays nicely with TensorFlow 1.2, which only plays nicely with CUDA 6.0 and not CUDA 9.0 and definitely not CUDA 8.0 as advertised, as CUDA 6.0 fancies cuDNN 7.0 and not cuDNN 7.1, because cuDNN 7.0 introduced a bug that can be circumvented only by editing Python scripts in Python 2.7 (and not Python 3.6) using software called gedit, which needs to be installed from the command prompt. Of course, as the Unix enthusiasts will tell you, you need to call these scripts from the OpenNMT directory and not the usr/home directory (unless you have edited your PATH variable using gedit ~/.profile from the terminal). Of course, of course! But this applies only if you have Ubuntu 16.04 and not Ubuntu 14.04, and in all other cases the world will explode unless you have a purple desktop background or a Hebrew-sounding name ending with K. Or something like that. By the time you're done, you have probably become a fully-fledged programmer and learned Tagalog and ten other exotic languages while you were at it.
(I've made up some version numbers and directory names, as I've forgotten all the stuff I had to do to get this working, but you get the general idea, I hope.) Now, this rocket science is all basic stuff for nerds and programmers, who can do in ten minutes what took me a whole fricking week, but I think the above is way out of reach for 99.9% of the translator community, unless people really have too much time on their hands, like me, and find climbing Mount Everest not enough of a challenge. This is where the slick managers step in, selling you those flashy Windows packages for hundreds of euros that are all based on free software no one else dares to install, for good reason. Being the gamer that I am, after all the above I expected at least the newest installment of Lord of the Rings – The Mother of All Games (a 700-hour epic about love and lost romances, forgotten wars and elven kingdoms) in Ultra-HD resolution, but the above purple screenshot was as exciting as it got.
Preparation

You then need to prepare your corpus, in my case a database containing 399,684 sentences with 3,933,677 English words and 3,969,731 Dutch words, all pertaining to one client and one field (games). Preparing means getting rid of everything that makes your text dirty (capitals, layout tags) and tokenizing the text (making sure that all words, but also all symbols, are separated by spaces). There are probably utilities for this, but I programmed a simple PHP script myself that solved this issue. In the world of neural networks, the above is a medium-sized data set. I have far more data in my memories, but I chose a client-specific memory containing translations for games released by that client only. Sportsmanlike as I am, I'd use the resulting network only on texts from the very same client (okay, and a few general game texts from another client, just to compare the results). It doesn't get fairer than that. There are multiple platforms Benji can run on. I chose TensorFlow, a library that includes TensorBoard, a tool that visualizes the data and gives you some nice eye candy while the network is learning and training itself. While training, the network will evaluate itself every so many minutes using so-called validation data: an extra and much smaller corpus which it will try to translate using the patterns it has learned, to then compare the result with the actual human translation and score itself. Benji will continue learning until it's satisfied with its score or until you tell it to stop.
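My PHP script isn't reproduced here, but a cleanup-and-tokenize step along these lines could look like this minimal Python sketch. The regexes and the sample sentence are illustrative assumptions, not my actual code:

```python
import re

def prepare(line):
    """Lowercase, strip layout tags, and put spaces around symbols
    so every word and punctuation mark becomes its own token."""
    line = line.lower()
    line = re.sub(r"<[^>]+>", " ", line)        # drop tags like <b>...</b>
    line = re.sub(r"\{[^}]+\}", " ", line)      # drop placeholders like {0}
    line = re.sub(r"([^\w\s])", r" \1 ", line)  # space out punctuation
    return " ".join(line.split())               # collapse extra whitespace

print(prepare("Press <b>START</b> to begin!"))  # press start to begin !
```

Run this over both the English and the Dutch side of the corpus and you end up with the clean, space-separated token streams the trainer expects.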
Wax on, wax off

Now, this training is the stuff of legends. Training Slate with statistical machine translation in my previous article took 4 to 5 hours. Training my own neural network with OpenNMT on the very same system took more than 59 (!) hours (on a virtual system without use of the extra GPUs). During this time, your system is fully occupied and slows down to a crawl, to the extent that even opening a simple browser window can take up to 10 seconds. Now you may think I use a hopelessly old and dusty system, but we're talking about Hayabusa, my 7000-euro game laptop with an i7-6700K processor, 2 GTX 980 graphics cards and 64 GB of on-board memory. If that sounds like Chinese to you: currently (2017) this is still a bit of a monster and a very high-end laptop. This explains why neural networks are trained on dedicated systems. In normal language: you'll need to buy yourself the fastest and most expensive system you can think of, separately, just to be able to structurally train your own networks. Now, this was all done in a virtual Unix on top of Windows. Only after that did I switch to the previously mentioned dual-boot Unix system that takes full advantage of my GPUs, but even though I haven't finished training my system on that configuration yet, it's clear that after 8 hours or so the system still has a very long way to go. Activating your GPUs is not going to work miracles, I'm afraid, but it may speed things up a bit.
An inside look

Playing around with TensorBoard is fun. The tool offers you a direct look into Benji's brain: you can see how words relate to other words while the visualization changes and morphs on the fly as the network learns new things and improves itself. This, indeed, is a scary sight, as it's almost like an organism growing, until it eventually becomes so smart that it can stretch its tentacles out of the computer to strangle its owner and take over the world. Another interesting thing to watch is the so-called loss parameter, which shows how much the network is at a loss trying to understand the data it is analyzing (the word loss actually refers to something else, but this explanation makes it easier to understand). The lower the score, the better the translation results should be. Every so many minutes a preliminary translation model is saved, which you can then immediately use to translate new texts.
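A toy illustration of what that loss number actually measures, assuming the usual cross-entropy definition (the word probabilities below are invented for the example):

```python
import math

# The "loss" TensorBoard plots is essentially cross-entropy: how surprised
# the network is by the word that actually came next in the training data.
def cross_entropy(predicted_probs, true_word):
    return -math.log(predicted_probs[true_word])

confident = {"sucks": 0.9, "rocks": 0.1}  # network is fairly sure
clueless  = {"sucks": 0.5, "rocks": 0.5}  # network is just guessing

print(cross_entropy(confident, "sucks"))  # low loss: network "gets it"
print(cross_entropy(clueless, "sucks"))   # higher loss: network at a loss
```

The training curve you watch in TensorBoard is this number averaged over the whole validation set, sinking (hopefully) as the model improves.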
The results

By comparing the results of the different translation models, you can see the network gradually improving itself in a very tangible way, but as my data shows, even the final result is far from usable. It does save you some typing work, though. In fact, even though it took the computer ten times as long to train itself compared to Slate (basically a flashy Windows interface for the Unix-based Moses, see my previous article on statistical MT), the results were far worse, and it's interesting and reassuring at the same time to see that no matter how incredibly sophisticated the technology behind neural networks is, it is still easily beaten by something as simple and stupid as Dumb Assembly (again, see my previous article).
Ice ice baby

Yes, DeepL, after being fed billions of sentences and harnessing the power of a computer in Iceland so big it needs to be cooled with glaciers, does a pretty impressive job when it comes to rigid texts like newspaper articles and scientific papers. However, it is absolutely unfit for legal texts, as neural networks will sometimes skip or misinterpret words on purpose to make the result sound more natural (which is exactly what you do not want in a legal translation). It is also unfit for texts that are too robotic, like tables in technical user manuals, for exactly the same reason. And it is also unfit for smooth marketing texts and slick dialogues, which are so fuzzy that you'd need to feed the whole planet to the system to make sense of all the data, which is exactly the problem, as material in this field is so scarce. Also, how much more data does a system need until it has seen about every possible sentence in the world? Hasn't a system like that already been implemented for years, in a much simpler form? Exact and fuzzy matches, anyone? Neural networks seem like an extremely complicated way to generate answers to questions you have already answered yourself, if an exact match is all you need.
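For comparison, here's how little machinery an exact or fuzzy match lookup actually needs. This is a toy Python sketch with an invented two-sentence translation memory, using difflib's similarity ratio as a stand-in for a CAT tool's fuzzy-match score:

```python
from difflib import SequenceMatcher

# A toy translation memory: source sentences with approved translations.
memory = {
    "press start to begin": "druk op start om te beginnen",
    "you have found a sword": "je hebt een zwaard gevonden",
}

def lookup(sentence, threshold=0.75):
    """Return the best exact or fuzzy match from the memory, if any."""
    best, best_score = None, 0.0
    for source, target in memory.items():
        score = SequenceMatcher(None, sentence, source).ratio()
        if score > best_score:
            best, best_score = (source, target), score
    return (best, best_score) if best_score >= threshold else (None, best_score)

match, score = lookup("press start to begin")      # exact match: score 1.0
fuzzy, fscore = lookup("you have found a shield")  # fuzzy match on "sword"
```

Decades-old technology, zero training time, and for sentences you have already translated it gives the answer you actually approved, not a statistical guess.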
Incremental updates

Is it fun? Oh boy, is it fun! This is the stuff we dreamed of when we were children: our own cute little pet that tries to be intelligent, fumbling and stumbling around and making us roar with laughter. Benji can save you typing work, yes, but we've already got other tools for that, like Nuance Dragon Speech and Dumb Assembly. Also, in its current state you can't feed new incremental data to the system. So for example, if you receive a new batch from your client introducing a new concept (in my case a new game), the network will perform even worse than usual, as new concepts introduce new words and patterns the network hasn't seen yet. You'd think you could easily solve this by feeding the network a few extra words, but that won't work: as everything in a neural network is linked to everything else, the only way to expand it is to retrain the entire network. And that took... oh yeah, 59 hours. 59 hours to learn one fricking new word. There are new concepts being introduced, like ModernMT, that do support incremental updates, but this technology is still in its infancy. Even the makers of OpenNMT called it new and very experimental, and no one knows the outcome yet. Fact is that to actually be able to use neural networks in our workflow, they need to be completely integrated into tools like memoQ or Trados, and we'd need to be able to constantly feed them new data, like we're feeding our current memories with every single sentence we add to them. Feedback to and from the neural network needs to happen on the fly, and with current training times for medium-sized client-specific and domain-specific networks of more than 59 hours instead of the 0.1 second we need, we still have a very long way to go.
Conclusion

Currently, our enemy turns out not to be our enemy, but a helping friend who is being framed by slick sales managers. Sales managers who abuse our lack of knowledge about certain subjects so that they can earn more while doing less. Next time they tell you to give them a discount because they were kind enough to pretranslate their texts with their machines (I'm not talking exact matches here!), point them to this article. DeepL is currently free, and that is exactly the amount you should invest in discounts on pretranslations done by machines. Why would you ever pay a project manager for pressing a button you can press yourself? Now, if client-specific and domain-specific data sets generated far better results, the story might have been different, but at least for game translations (and all other translations requiring the slightest amount of creativity) this is absolutely not the case. It's not like deep learning lost to humans by a hair. Deep learning was crushed, annihilated and utterly destroyed by a mere mortal like me. Apparently translation is a totally different beast than games like chess. Don't get me wrong. I love Benji and have found a new friend to play with. We love each other, and I have tremendous respect for all the nerds who helped me during my journey through this system. But giving paw and playing dead is all Benji will do for the next few years, and I expect for many years to come.

Loek van Kooten
Your English/Japanese-Dutch game translator