As many of you know, I’m very sceptical when it comes to machine translation. However, I’ve always been a huge fan of gadgets and software, and if something can be done faster by automating stuff, I will do so. I'm not a Luddite: it's in my own interest to automate, as it can raise my productivity. On the one hand, I don’t believe the lies sold to us by marketing managers who often have no idea what their own product does; on the other hand, I’m realistic enough to know that technology does not stand still and that we need to use it whenever it can help us advance. So even though the results of my experiments with machine translation had been very disappointing until now, I had not given up and kept on looking. The main issues I encountered were as follows:
if you are familiar with Linux-based systems (I’m afraid this already rules out most freelance translators), the real challenge is to get it to work with the CAT tool you’re using. Trados offers a solution for ModernMT via a plug-in called Memory.net, but this is once more a set-up that locks you into the cloud and uses one of those baseline memories that are of no use for game translation. As soon as a suggestion addresses the gamer politely, which is what such a baseline memory has “learned” during its training, I’m forced to delete the entire suggestion, as gamers are never addressed politely in Dutch unless they are reading an End User License Agreement or a similar legal document.
other code (often a black box) works. Also, once you’re locked in, you depend completely on said code, and your plug-in may stop working as soon as your favorite CAT tool is updated, while dependency is exactly what you don’t want as a privateering freelancer. So writing a plug-in may have been impossible, but writing an entirely new CAT tool was an option, no matter how daunting the task.
Just when I was about to start programming, disaster struck: the boot drive of Hayabusa, my 7,000 euro main laptop featured in previous articles, decided to fail, just like that. During my frantic attempts to check whether the drive was still connected, I pulled the keyboard a bit too enthusiastically, permanently damaging the motherboard. Yes, incredibly stupid, but it happened, and once it had happened, I had no choice but to send the laptop back for repairs. As this was also my main production system and Hayabusa would be gone for the next four weeks or so, I was forced to buy a new production system immediately. Because my purse had already bottomed out, I settled for a 4,000 euro desktop (cheaper, but better at the same time), as I didn’t see myself traveling as much as before in the next few years.
Thanks to the tremendous efforts of Dragon Computers (who also delivered Doraemon), I got their top-of-the-line system the very next morning, so that I could start translating again right away. I called it Nozomi, Japanese for “hope”, because hope was exactly what we needed in this situation.
The next few days were chaotic. I had to install all the tools and data I needed (thank God for back-ups: I always keep two, one in my SOHO and one in my actual office in an incubation center nearby) while still meeting all deadlines (20-30 deadlines per day are pretty common in the game translation industry).
Meanwhile, Doraemon had been relegated to a second-class system. Even though the results of its neural network were very promising, I had to painstakingly copy and paste the lines one by one, because there was no direct interface between memoQ and ModernMT. Any time saved by machine translation was immediately lost again to copying and pasting all the lines back and forth.
Once the dust had settled a bit and Nozomi was in full production mode, I could finally start realizing my dream: building a direct connection between ModernMT on Doraemon and my very own CAT tool, as the memoQ plug-in route had become a dead end.
I never thought it would work: it really was a desperate attempt to get some return on the 2,000 euro and months of research we had invested in Doraemon. But things went a lot more smoothly than I expected, and after one night I had a basic tool that could extract text from TTX/XLF/SDLXLIFF/MQXLIFF files, display it for translation and pull suggestions from Doraemon. Later I extended it: I installed XAMPP on Nozomi to turn it into an Apache/MySQL server so that translations could be saved in a translation memory. If I could pull a match above 90% from that memory, the CAT tool would display it; if not, it would display Doraemon’s MT suggestion. This gave me the best of both worlds: very reliable matches from my own memory, and far less reliable but still useful suggestions from Doraemon that, even in a worst-case scenario, saved me a lot of typing.
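The lookup order described above (human match first, machine translation as a fallback) can be sketched in Python. This is not the code of the tool itself: it assumes a plain dict in place of the MySQL memory, uses `difflib.SequenceMatcher.ratio()` as a stand-in fuzzy-match score, and takes the MT engine as a hypothetical callable `machine_translate` rather than an actual request to ModernMT.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for the MySQL translation memory: source -> human translation.
TranslationMemory = Dict[str, str]


def best_tm_match(source: str, tm: TranslationMemory) -> Tuple[Optional[str], float]:
    """Return the stored translation whose source is most similar to `source`,
    plus that similarity as a score between 0.0 and 1.0."""
    best_target, best_score = None, 0.0
    for tm_source, tm_target in tm.items():
        # SequenceMatcher.ratio() serves as a simple stand-in fuzzy-match score.
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_target, best_score = tm_target, score
    return best_target, best_score


def suggest(source: str, tm: TranslationMemory,
            machine_translate: Callable[[str], str],
            threshold: float = 0.90) -> Tuple[str, str]:
    """Show a human match above the threshold; otherwise ask the MT engine."""
    target, score = best_tm_match(source, tm)
    if target is not None and score > threshold:
        return "tm", target
    return "mt", machine_translate(source)


if __name__ == "__main__":
    tm = {"Drop a gun": "Een pistool laten vallen"}
    fake_mt = lambda s: "[MT] " + s  # placeholder for a request to the local MT server
    print(suggest("Drop a gun!", tm, fake_mt))         # ('tm', 'Een pistool laten vallen')
    print(suggest("Reload your weapon", tm, fake_mt))  # ('mt', '[MT] Reload your weapon')
```

The linear scan keeps the sketch short; with a memory of millions of segments, a real implementation would first need some index (for example a full-text search in the database) to shortlist candidates before scoring them.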
You see, the Dumb Assembly method I had used until now (see previous articles) was pretty great, but it had one huge disadvantage: it was very good at literal word-for-word translation, but failed horribly when words had to be reordered. To give you one example: “shoot with a gun” is “schieten met een pistool” in Dutch, with exactly the same word order, but “drop a gun” becomes “een pistool laten vallen”, with a totally different word order (and a different number of words). This is where neural networks really shine: they can at least put words in more or less the correct order, and as long as the suggestions are somewhat reliable and don’t contain any unknowns, they can actually save the translator time.
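To see why a purely literal approach breaks on the second example, here is a toy word-for-word substitution in Python. It is not the Dumb Assembly implementation from the previous articles, only an illustration of the general idea with a hypothetical five-entry glossary:

```python
# Toy word list (illustrative only): English word -> Dutch word(s).
GLOSSARY = {
    "shoot": "schieten",
    "with": "met",
    "a": "een",
    "gun": "pistool",
    "drop": "laten vallen",
}


def word_for_word(sentence: str) -> str:
    """Replace every word by its glossary entry, keeping the English word order.
    Unknown words are passed through unchanged."""
    return " ".join(GLOSSARY.get(word, word) for word in sentence.lower().split())


print(word_for_word("shoot with a gun"))  # schieten met een pistool  (correct)
print(word_for_word("drop a gun"))        # laten vallen een pistool  (correct: een pistool laten vallen)
```

Every word is translated correctly, yet the second result is wrong, because the substitution cannot move “laten vallen” to the end of the sentence.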
After four days of constant debugging and some total disasters during production (files that failed to export correctly, rows that were skipped, rows not being added to memory, communication failures with both the memory database and Doraemon), slowly but steadily a system evolved that was actually becoming really, really useful.
So, what are the current features of my own CAT tool?
costs time, a lot more time in fact. The same applies to English-Korean, another pair we often handle. English-Chinese may be an interesting candidate though, as Chinese grammar has hardly any inflection.
According to Davide Caroselli, the big man behind ModernMT, you need around 1 billion source words of parallel text to obtain good results with these languages. If your data is very "in-domain", a tenth of that may do, but these are quantities no translator will ever be able to translate in a lifetime. It may be an interesting option for really large LSPs though (which is exactly why we translators should not hand over the data we so painstakingly built too easily), and even then, it will never solve the issue of the politeness registers you encounter in these languages. Basically, an 80-year-old grandpa speaks a language that is grammatically totally different from that of a 16-year-old girl, which is unique to Korean and Japanese; Japanese even features male and female language, resulting in something like 18 different registers.
Things also depend a lot on your line of work. In my case, I was able to feed the network a 30-million-word corpus consisting only of my own translations. This means that all translations use more or less the same style, and since the vast majority of them pertain to games, also the same colloquial register. Compare that to Japanese with its 18 registers, where a similar memory with just as many words would be totally ineffective: while translating a line uttered by a 90-year-old grandpa, you could get neural suggestions containing a mix of 3-year-old-girl speak and 40-year-old-aristocrat speak. That’s not gonna work.
Also, of course you get the best results on lines that require the least creativity: the boring lines, so to speak, containing general descriptions and not-so-Shakespearian marketing texts. Dialogues with flavor will always be human territory, and the neural network's suggestions are absolutely useless there. But standard dialogues, the basic stuff you can find in any cheap Android game: yes, adaptive machine translation really shines there, as long as it's constantly being steered by a human being.
Anyway, you probably want to see numbers. The best way to analyze the performance of a neural network, I think, is the Levenshtein distance: we count the number of single-character operations (substitutions, deletions and insertions) needed to turn Doraemon’s suggestion into the translation I, as a human being, would deliver. For standard engines like Google/DeepL, this number hovers around 22 edits per sentence in my field. Combine this with the fact that suggestions from neural networks are unreliable by definition (they tend to sound incredibly natural even if they contain grave errors, like mixed-up numbers or positives that should actually be negatives), and this basically does not pass the magical threshold needed to make the suggestions useful and time-saving. I can't prove this: the only way to find out is to try it yourself in practice, as everybody has different default working speeds, working methods and working environments.
A major cause of this may be the fact that these engines are based on EU texts and other corpora (to which neither Google nor DeepL holds any copyrights, so as soon as some lawyer smells money, these companies may get into huge trouble). Very recently, Google has enabled users to upload their own memories, but I’m not sure whether these will be combined with Google’s baseline memory, nor what will happen to the contents of these memories; clients definitely won’t like it when the translations they ordered from you end up in Silicon Valley. One thing is clear though: no matter what, this feature will by no means be free.
The same Levenshtein number for my local ModernMT hovers around 11 operations per sentence (including very long sentences), and because ModernMT is adaptive and doesn’t need to be retrained to learn from my corrections, its accuracy rises by about 2 percentage points per 10,000 translated words (from 67% to 69%). Of course this curve will flatten after a while, but the results are promising to say the least: this is basically a somewhat acceptable fuzzy match where normally no fuzzy match would have been shown.
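For readers who want to compute this kind of number themselves, below is a standard dynamic-programming (Wagner-Fischer) implementation of the Levenshtein distance in Python, plus a helper that averages it over sentence pairs. The helper name `average_edits` is illustrative; exactly how the sentences behind the figures above were paired and averaged is not specified here.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `a` into `b` (Wagner-Fischer dynamic programming)."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute char_a by char_b (free if equal)
            ))
        previous = current
    return previous[-1]


def average_edits(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average distance between MT suggestions and delivered translations.
    Expects at least one (suggestion, final_translation) pair."""
    distances = [levenshtein(suggestion, final) for suggestion, final in pairs]
    return sum(distances) / len(distances)


print(levenshtein("een pistool laten vallen", "het pistool laten vallen"))  # 2
```

Only two rows of the table are kept in memory at a time, which is enough because each cell depends only on the current and the previous row.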
Interestingly, a new version of ModernMT will be released very soon that runs on steroids, or rather: on TensorFlow from Google. TensorFlow is a freely available software library for neural networks, resulting in even more fuzzy logic (read: smoother translations). I’ve already tested it (addendum: it's now in full production mode) and noticed a quality increase of several percent. Installing TensorFlow on Ubuntu is quite a daunting task though, as it has lots of dependencies that need to have exactly the right versions to work together.
Despite the above, results vary greatly: when I tried the very same strategy on a 5-million-word German-Dutch corpus from my dad, who also happens to be a translator, the Levenshtein distance hovered around 20. In the end, my dad decided not to invest the money needed for a neural network, as he found too little added value in it. So yes, it’s a gamble, and I wouldn’t immediately spend 2,000 euro on a new system; instead, I would experiment first in virtual machines (be ready for training times of weeks, as these cannot leverage the power of your GPU) or on an old second system that hopefully features something resembling a GPU that is not too dusty.
So, what do you need to make this really work?
want to release it.
You’ve read how much I suffered; I think this set-up, in which I invested a huge amount of time and money, gives me a competitive advantage, and it would be foolish to just give that away (not only to freelancers, but also to the big corporations that would undoubtedly love a free ride). However, I do hope that this article will make you more sceptical about the parties in this industry: there’s no need to give discounts on machine translations generated by clients if you can generate these translations yourself with much higher quality, and there may be no need to pay thousands of euros for CAT tools if the basic functionality can be programmed within four days by – mind you – a total amateur like me (I never had any formal training as a programmer, I can only write spaghetti code and often I don’t even understand why my software works).
In fact, I suspect that most software suppliers in the market – for any kind of software – hardly do any programming themselves. They devise a strategy for the next year and have the new functionality implemented by a team of programmers they hire in India. Only that would explain the total lack of control they seem to have over their own software, and the sluggishness with which they respond to requests, if they respond at all.
I’ve also lurked in many machine translation user groups, and what I commonly see is that these guys often have no idea what the quality of their own translation engines is. They experiment with languages they don’t even speak themselves and compare the results using something similar to back-translation, without even knowing whether the suggestions are actually useful (very often, numbers don’t tell the whole story). Without a capable translator at the helm, these systems are absolutely useless: as long as we translators stay in control – for example by feeding local systems only and not uploading our translations to the cloud – we have nothing to fear.
I owe a huge thanks to Davide Caroselli, who has always been very nice and patient with me (and yeah, he's a real programmer, who is directly responsible for the code in his product). He could have sold me a 20,000 euro license for an installation on premise. But he didn’t. He went out of his way to help a user of his free software, no matter how noobish the questions I asked him. Thank you, Davide. I am much obliged to you.
Be lean, be mean. Be like the indie game developers in our industry. They know what they’re doing, they don’t outsource and they have full control over their own assets. This makes them very powerful.
So, without further ado, the final demonstration in the video below. Enjoy!
Loek van Kooten
Your English/Japanese-Dutch game translator
- The long training time needed to build a neural network (several days), combined with the fact that the network had to be retrained every few weeks to keep it up to date. What I really needed was a network that could learn on the fly, especially because games tend to have new features added every few days, which then become the talk of the week (and that week only). Previous translations help only so much with that, so you really need a system that picks up on “newspeak” pretty fast (within minutes) so that you can actually leverage it while it still counts.
- The enormous number of tags in standard memory exports from tools like memoQ and Trados. They confuse neural networks and make the eventual output a lot worse. (In the end I wrote my own PHP utility that gets rid of all tags, though any software that can do advanced search and replace using RegEx should be able to do the job.)
- The fact that to really enjoy the benefits of a neural network, it needed to be put on a separate (Ubuntu) system, as it consumes many resources and really demands at least one dedicated GPU.
- The fact that none of the available CAT tools (memoQ, Trados) enabled me to tap directly into the neural network. They only support neural networks in the cloud, which are based on huge baseline memories from subjects that have nothing to do with games (like texts from the European Union). And even if they let you upload your own memories, these are either mixed with said baseline memory, or you are obliged to use another CAT tool (MateCAT, Lilt) that locks you in and makes you pay for every so many words translated with that system.
- The fact that the output of MT until now, though usable, was unreliable, so that every single sentence, no matter how natural it sounded, had to be checked very carefully, word by word, to make sure that the offered translations contained no pitfalls.
- The fact that all solutions offered work in the cloud only, forcing the translator to upload confidential translations to the internet.
- The fact that none of the cloud solutions available is free and will therefore cost a lot of money in the long run.
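The PHP tag-stripping utility mentioned above is not published, but the idea can be sketched in Python with regular expressions. The sketch assumes TMX-style inline markup, where elements such as `<bpt>`, `<ept>`, `<ph>` and `<it>` wrap escaped native codes, so the whole element has to go and not just the tags around it; exports with other markup would need other patterns.

```python
import html
import re

# TMX-style inline elements (<bpt>, <ept>, <ph>, <it>) hold escaped native codes
# such as "&lt;b&gt;", so the whole element is removed, content included.
# (?<!/)> skips self-closing tags like <ph x="1"/>, which have no content.
INLINE_ELEMENT = re.compile(r"<(bpt|ept|ph|it)\b[^>]*(?<!/)>.*?</\1>", re.DOTALL)
# Whatever tags remain afterwards (e.g. <hi>, </hi>, <ph x="1"/>).
ANY_TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def strip_tags(segment: str) -> str:
    """Return the plain text of one segment, without inline markup."""
    text = INLINE_ELEMENT.sub("", segment)
    text = ANY_TAG.sub("", text)
    text = html.unescape(text)  # &amp; -> &, &lt; -> <, ...
    return WHITESPACE.sub(" ", text).strip()


print(strip_tags('Press <bpt i="1">&lt;b&gt;</bpt>Start<ept i="1">&lt;/b&gt;</ept> to play'))
# Press Start to play
```

Tags are removed before entities are unescaped, so a literal “&lt;” in the actual text is never mistaken for markup, and whitespace is collapsed so that removed tags don't leave double spaces behind.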
Hope of deliverance
I told the developers of memoQ about ModernMT, and they were very enthusiastic. So enthusiastic, even, that they promised to work on a new plug-in for their software that would enable on-premise use of ModernMT. This made me very happy, of course; so happy, even, that I immediately invested 2,000 euro in Doraemon, our dedicated Ubuntu-based neural network to be (Doraemon is a robot resembling a CAT and a famous Japanese manga character). A rushed decision maybe, and imagine my disappointment when, two weeks later, I was told that writing a plug-in for memoQ seemed so hard that even memoQ’s own developers had given up on it. I really thank them for trying though. So there I was, sitting beside a totally useless 2,000 euro investment.
Now, there are manuals describing how to write your own plug-ins for memoQ, but if even memoQ’s own developers failed, how would I ever be able to come up with something useful myself? Some very good friends of mine work at Open Systems Development. These guys eat C++ for breakfast and offered to solve my problem for free. Of course I wanted to pay for it, but they told me that I should decide the value of their solution myself. Though I really appreciated the offer, this was something I didn't want to burden them with, so in the end I politely declined. Still, I appreciate their offer so much that I will just mention them for the sake of it. If you ever need something coded, contact these guys. Their website looks terrible, but that is because they're too busy coding stuff. They're good. They're really, really good.
And then it dawned on me. Maybe the plug-in wasn’t the problem. Maybe CAT tools were the problem: the hardest part of writing code that needs to work with other code is finding out how the
- Display exact and fuzzy matches from a human translation memory (using MySQL)
- Indicate differences between the current source and the displayed fuzzy match, using the excellent Diff class written by Kate Morley
- Display suggestions from a directly linked neural network if there are no good human matches (using ModernMT’s REST interface)
- Process tags (hiding the information they contain and re-inserting it during export)
- Combine tags (avoid multiple tags in a row by combining them in one "supertag")
- Insert tags at exactly the right places in ModernMT's suggestions
- Check tags (refusing to go to the next row if not all tags are present in the translation)
- Add and display terminology (using MySQL)
- Autopropagate sentences that are exactly the same
- Auto Assembly ("Dumb Assembly")
- Insert terms using shortcuts
- Live display of the number of characters used in the current target line
- Live display of the number of translated words in the current file
- Copy source to target
- Direct support for TTX/XLF/SDLXLIFF/MQXLIFF/SRT/TXT (other formats can be imported/exported to these formats via existing CAT tools, so there’s no urgent need to invest any more time in that yet).
- Locked rows (SRT format only)
- Automated address transcription for Japanese addresses
- Automated date conversion for Japanese dates in imperial format
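Several of the tag features above (hiding tags, combining adjacent tags into one “supertag”, and refusing to move on while tags are missing) can be illustrated with a short Python sketch. The `{1}`, `{2}` placeholder syntax and the function names are assumptions made for this example, not the tool's actual format, and placing the placeholders at the right spots in an MT suggestion is not covered.

```python
import re
from typing import Dict, List, Tuple

# One or more tags directly next to each other form a single "supertag".
TAG_RUN = re.compile(r"(?:<[^>]+>)+")
# Limitation of this sketch: a literal "{1}" in the source text would be
# mistaken for a placeholder.
PLACEHOLDER = re.compile(r"\{\d+\}")


def protect_tags(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace each run of adjacent tags with a numbered placeholder like {1}.
    Returns the masked text and a placeholder -> original-tags mapping."""
    mapping: Dict[str, str] = {}

    def to_placeholder(match: re.Match) -> str:
        key = "{%d}" % (len(mapping) + 1)
        mapping[key] = match.group(0)
        return key

    return TAG_RUN.sub(to_placeholder, source), mapping


def missing_tags(target: str, mapping: Dict[str, str]) -> List[str]:
    """Placeholders from the source that are absent from the translation;
    a non-empty result means the row should not be confirmed yet."""
    return [key for key in mapping if key not in target]


def restore_tags(target: str, mapping: Dict[str, str]) -> str:
    """Swap every known placeholder back for the tags it stands for."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), target)


masked, tags = protect_tags('Tap <g id="1"><b>Start</b></g> to play')
print(masked)                                              # Tap {1}Start{2} to play
print(missing_tags("Tik op Start{2} om te spelen", tags))  # ['{1}']
print(restore_tags("Tik op {1}Start{2} om te spelen", tags))
# Tik op <g id="1"><b>Start</b></g> om te spelen
```

Merging `<g id="1"><b>` into a single placeholder means the translator (or the MT engine) only has to position one token instead of two.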
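The era-date conversion in the last item comes down to simple arithmetic: year n of an era corresponds to the era's first Gregorian year plus n minus 1, and 元年 denotes year 1. A Python sketch, assuming dates written like 令和3年5月1日 and a Dutch-style day-month-year output (both are assumptions for this example). Meiji is left out because Japan only switched to the Gregorian calendar in Meiji 6 (1873), so earlier month and day values cannot simply be copied over.

```python
import re

# First Gregorian year of each era (the era's year 1).
ERA_START = {"大正": 1912, "昭和": 1926, "平成": 1989, "令和": 2019}

# Era name, year (digits or 元 for "first year"), month, day.
# \d also matches full-width digits such as ３, which int() accepts as well.
ERA_DATE = re.compile(r"(大正|昭和|平成|令和)(元|\d+)年(\d+)月(\d+)日")


def era_dates_to_gregorian(text: str) -> str:
    """Rewrite every era date in `text` as day-month-year, e.g. 1-5-2021."""
    def convert(match: re.Match) -> str:
        era, year, month, day = match.groups()
        era_year = 1 if year == "元" else int(year)
        return f"{int(day)}-{int(month)}-{ERA_START[era] + era_year - 1}"

    return ERA_DATE.sub(convert, text)


print(era_dates_to_gregorian("令和3年5月1日"))   # 1-5-2021
print(era_dates_to_gregorian("平成元年1月8日"))  # 8-1-1989
```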
- A dedicated Ubuntu system with at least one fast GPU
- Knowledge about Linux: enough to install ModernMT and get it running
- Your own CAT tool, as no existing tool supports local neural networks at this moment
- A production system that can run Apache/MySQL (XAMPP) in Windows
- A 30-million-word memory of high quality, preferably pertaining to your favourite subject only
- A pair of languages that are closely related to each other
- A good translator to immediately recognize the right strategy to use for each sentence (using a match from the human translation memory, correcting the output from the neural network or coming up with something new from scratch) and to constantly feed both the human memory and the neural network with human input