The Project from Hell
Learn from the most challenging project I have ever undertaken and avoid these common pitfalls.
This is an insightful case study for both developers and localizers. In this project, the toughest I've ever tackled, every localization rule was broken, leading to a subpar product. The story offers two key lessons. First, it shows how *not* to organize your project. Second, it proves that even in the worst-case scenario, I can still work miracles with the tools available.
John and Bill Go Fishing
Obviously, I can't disclose the real names of the products or people involved, so let's use "John and Bill Go Fishing" as the project title. It was a highly anticipated release by a well-known publisher on a famous console. But first, let’s dive into some background.
Computer-Assisted Translation
There are two main types of tools for translation: machine translation tools and computer-assisted translation (CAT) tools. Machine translation works on a word-by-word basis and applies grammar rules to generate translations. However, computers often fail at this because language is filled with "fuzzy logic." This is why experienced translators dismiss these tools outright (try translating a long sentence to French and back to English using Google Translate, and you'll see why). [EDIT: Note that this was written before the advent of neural networks]
CAT tools, on the other hand, have become a near necessity. They store all the strings you've translated and provide suggestions when similar strings appear. For instance, a CAT tool might say:
"I see you're translating 'John is fishing.' On May 5th, 2009, at 21:38, you translated 'Bill is fishing' as 'Bill is aan het vissen.' Shall I replace 'Bill' with 'John' and suggest 'John is aan het vissen'?"
For this to work effectively, the string in question must be complete. The less complete it is, the less reliable the suggestions become. This is because parts of a sentence often depend on each other: changing one word might require other parts of the sentence to change as well. For example, if 'Bill goes fishing' becomes 'We goes fishing,' you'll need to change 'goes' to 'go' too.
For the CAT tool to provide intelligent suggestions, it's crucial that 'Bill' and 'goes' are in the same string. Here's why:
As you can see, Scenario 2 results in fewer false positives (in reality, it results in far fewer). CAT tools may seem like they only save a little time in these examples, but when you consider that games can easily contain 50,000 to 250,000 words, you realize the potential for efficiency, time-saving, and cost-saving.
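To make the matching idea concrete, here is a minimal sketch of a translation memory lookup. It is an illustration only: real CAT tools use their own matching algorithms, and Python's `difflib.SequenceMatcher`, the `best_match` helper, and the 0.7 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def best_match(memory, source, threshold=0.7):
    """Return (stored_source, stored_target, score) for the closest entry, or None."""
    best = None
    for stored_source, stored_target in memory.items():
        # ratio() is 1.0 for identical strings and drops towards 0.0 as they diverge
        score = SequenceMatcher(None, stored_source, source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (stored_source, stored_target, score)
    return best

# Hypothetical memory holding one earlier translation
memory = {"Bill is fishing.": "Bill is aan het vissen."}

complete = best_match(memory, "John is fishing.")  # complete string: usable match
fragment = best_match(memory, "John is")           # chopped-up string: no match
```

The complete string scores high enough to pull up the earlier translation, while the fragment falls below the threshold and returns nothing.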
For this to work, all strings must be complete. A second key rule: strings should be clean, meaning code should be separated from text. Developers may recall the Model-View-Controller (MVC) pattern, which separates data, interface, and logic. Translation is similar: the text is the model, and tags that don't need translating are the controller. They should be kept apart. Here's why:
If you use a common format, like HTML, CAT tools can automatically filter out tags, retain their positions, and only process the text between them. This means the "dirty" HTML is treated as "clean" text, allowing you to leverage repetitions efficiently. However, the more obscure your format, the less likely standard filters will exist. You can program custom filters for CAT tools, but that takes time—and money.
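As a rough sketch of what such a filter does, the snippet below swaps each HTML-like tag for a numbered placeholder, so that only the text is matched and translated, and puts the tags back afterwards. The `{1}`, `{2}` placeholder syntax and both helper names are assumptions made for this example. Real CAT filters are more thorough, for instance when the text itself contains braces.

```python
import re

TAG = re.compile(r"<[^>]+>")  # naive pattern for HTML-like tags

def split_tags(segment):
    """Replace each tag with a numbered placeholder; return (clean_text, tags)."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG.sub(stash, segment), tags

def restore_tags(text, tags):
    """Put the original tags back in place of their placeholders."""
    for number, tag in enumerate(tags, start=1):
        text = text.replace("{%d}" % number, tag)
    return text

clean, tags = split_tags("<b>Bill</b> is fishing.")
translated = restore_tags("{1}Bill{2} is aan het vissen.", tags)
```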
The Format from Hell
Now, let’s get back to the story. The game was developed in Japan, using a development logic that is common there (and, unfortunately, not the best). Written Japanese uses no spaces between words and has no upper or lower case, which makes it hard for software to identify word boundaries. Because automatic wrapping could break a word in the middle, Japanese game text is usually not wrapped automatically. Their "brilliant" solution? They wrap every line of text manually.
Yes, you read that right. If a game contains 250,000 words and the average string (sentence) has 7 words, that comes to about 35,714 strings (250,000 / 7), so they manually insert at least that many line breaks to ensure proper word wrapping. Since the text is already manually wrapped, dialogue sizes are fixed, which sets strict limits on the characters per line and the number of lines per dialogue.
Now imagine[enter]
happens if this[enter]
small dialogue[enter]
needs editing[enter]
because someone[enter]
forgot to add the[enter]
word 'what' after[enter]
'imagine'.
Right. Insert the word "what," and every line needs to be manually rewrapped. The first line now exceeds the maximum length, causing a chain reaction where subsequent lines also need adjustments. Developers often make numerous text changes during development, which means repeating this process over and over.
It gets worse. This "logic" carries over to translations in other languages. Even though languages like Dutch use spaces, allowing for automatic word wrapping, Japanese developers don’t grasp this. Their games can't process soft-wrapped translations, so they expect the translated text to be manually wrapped as well. And, because dialogue sizes are fixed, translations must adhere to specific line lengths and character counts per line.
I’ve tried explaining to Japanese developers how unsuitable this process is for Western languages, but they just won't listen. So, here's hoping someone with more influence can get through to them.
Look at the "Now imagine" text above. Are these complete strings? Definitely not. This makes it impossible for CAT tools to function effectively, which means wasted time and money.
The format we were working with was an Excel file packed with numerous tabs, each containing countless cells. Each cell held a manually wrapped dialogue with hard line breaks, along with two numbers indicating the maximum characters per line and the total number of lines allowed for each cell.
The Solution
Thankfully, I have some programming experience. I realized that if the Japanese developers wouldn’t fix the problem, I had to find a workaround. So, I created a PHP program that retained three key pieces of information for each string: the maximum characters per line, the maximum number of lines, and the string itself. It then stripped out all the hard line breaks, turning the text into something like:
Now imagine what happens if this small dialogue needs editing because someone forgot to add the word 'what' after 'imagine'.
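The original tool was written in PHP and is not reproduced here. As an illustrative Python sketch of this first step, the snippet below keeps the three pieces of information together and removes the hard line breaks. The `Dialogue` record and `unwrap` helper are hypothetical names, and joining lines with a space assumes a space-delimited source text such as English.

```python
from dataclasses import dataclass

@dataclass
class Dialogue:
    max_chars: int  # maximum characters per line
    max_lines: int  # maximum number of lines
    text: str       # dialogue text with hard line breaks removed

def unwrap(cell_text, max_chars, max_lines):
    """Join manually wrapped lines into a single string, keeping the limits."""
    # split() with no arguments splits on any whitespace, including newlines
    return Dialogue(max_chars, max_lines, " ".join(cell_text.split()))

cell = "Now imagine what\nhappens if this\nsmall dialogue"
dialogue = unwrap(cell, 34, 8)
```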
This clean text was exported to an XML file, which I imported into my CAT tool. I made sure that the character/line information was hidden (moved to separate XML tags), so the CAT tool processed only the plain text. The XML looked something like this:
[donottranslate]34*8[/donottranslate]
[translate]Now imagine what happens if this small dialogue needs editing because someone forgot to add the word 'what' after 'imagine'.[/translate]
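For illustration, a file of this shape could be produced as follows with Python's standard `xml.etree.ElementTree`. This is a sketch under assumptions, not the original exporter: the `dialogues` and `dialogue` wrapper elements are invented for the example, and angle-bracket XML is used in place of the bracket notation shown above.

```python
import xml.etree.ElementTree as ET

def export_xml(dialogues):
    """Build an XML string from (max_chars, max_lines, text) tuples."""
    root = ET.Element("dialogues")
    for max_chars, max_lines, text in dialogues:
        entry = ET.SubElement(root, "dialogue")
        # The limits live in their own element, which the CAT tool skips
        ET.SubElement(entry, "donottranslate").text = f"{max_chars}*{max_lines}"
        # Only this element is offered for translation
        ET.SubElement(entry, "translate").text = text
    return ET.tostring(root, encoding="unicode")

xml_text = export_xml([(34, 8, "Now imagine what happens if this small dialogue needs editing.")])
```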
I instructed the CAT tool to import only the text within the translate tags. After translating, my software analyzed the XML and rewrapped each string according to the specified character and line limits. If the Dutch translation was too long, the software flagged it.
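The rewrapping step can be sketched with Python's standard `textwrap` module. Again, this is only an approximation of the idea, not the original PHP code: `rewrap` is a hypothetical helper that applies a greedy word wrap and reports whether the result respects both limits.

```python
import textwrap

def rewrap(text, max_chars, max_lines):
    """Return (lines, fits): wrapped lines plus a flag for limit violations."""
    # break_long_words=False keeps over-long words intact so they can be flagged
    lines = textwrap.wrap(text, width=max_chars, break_long_words=False)
    fits = len(lines) <= max_lines and all(len(line) <= max_chars for line in lines)
    return lines, fits

text = ("Now imagine what happens if this small dialogue needs editing "
        "because someone forgot to add the word 'what' after 'imagine'.")
lines, fits = rewrap(text, 34, 8)

# A translation containing a word longer than the line limit gets flagged
too_long_lines, too_long_fits = rewrap("aan het vissen", 5, 3)
```

A translation for which `fits` is false would go back to the translator for shortening instead of being cut off silently.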
In short, I wrote software to trick my CAT tool into processing hard-wrapped strings as if they were soft-wrapped, allowing me to reuse translations efficiently even when the line breaks differed from those in previous versions.
Then Things Got Worse
To complicate matters, the strings were also color-coded. The developers defined dozens of keywords (e.g., movements, item names, character names) and decided that each needed a unique color in the text, which was provided in Excel.
This caused three problems:
1. Extracting color information from Excel into XML requires Visual Basic, which I don't know.
2. Adding color tags to the text complicates the wrapping process, since tags don't count towards character limits.
3. Using tags in the text makes it "dirty," reducing translation leverage.
So, I created a keyword list and applied colors after exporting the translation back to Excel. I found a Visual Basic search-and-replace script online to automate 90% of this process, handling the remaining 10% manually.
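The keyword approach can be sketched as follows. The actual coloring was done in Excel with a Visual Basic script that is not shown here. This hypothetical Python helper only finds where each keyword occurs, so that colors can be applied afterwards without adding tags to the translatable text. The keywords and color names are made up for the example.

```python
import re

def color_spans(text, keyword_colors):
    """Return sorted (start, end, color) spans for whole-word keyword hits."""
    spans = []
    for keyword, color in keyword_colors.items():
        # re.escape guards against keywords containing regex metacharacters
        for match in re.finditer(r"\b%s\b" % re.escape(keyword), text):
            spans.append((match.start(), match.end(), color))
    return sorted(spans)

spans = color_spans("Bill vangt een forel.", {"Bill": "blue", "forel": "green"})
```

Because the colors are stored as positions rather than inline tags, the text stays clean for the CAT tool and the character counts used for wrapping are unaffected.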
At this point, I finally had a way to process these files (without charging extra, mind you) and could start the translation.
Too Much Leniency
That lasted until my client, in this case a translation agency sitting between me and the end client, told me that part of the project would go to someone else. The reason? No, they didn’t hate me. No, they didn’t have any problems with my style. No, I hadn’t missed any deadlines.
So... what was it?
“They had actually promised another translator that he would get part of the project too, and they felt sorry for him.”
So much for keeping business and personal feelings separate. Even though I was perfectly able to meet the client's deadlines, the agency insisted that part of the project go to someone else. This not only ensured that the game would be translated by two different translators with different styles, but also made it impossible to match terminology, as the other translator didn't have their own format-from-hell filter and couldn't use CAT tools at all.
This translator had to revert to the old manual method of looking up every potential term (anything resembling an item name, weapon name, character name, etc.) from the translations I had done so far. If you consider that a 25-page manual can contain up to 2,500 unique terms, you’ll understand how slow and error-prone this approach is. With a CAT tool, I could add terminology like item names to a database on the fly, suggesting consistent translations automatically.
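To illustrate what a terminology database automates, here is a minimal consistency check. It is a sketch, not how any particular CAT tool works: `missing_terms` is a hypothetical helper, the glossary entry is invented, and plain substring matching ignores inflection and word boundaries.

```python
def missing_terms(source, target, glossary):
    """List glossary pairs whose source term occurs but whose translation does not."""
    src, tgt = source.lower(), target.lower()
    return [
        (term, translation)
        for term, translation in glossary.items()
        if term.lower() in src and translation.lower() not in tgt
    ]

glossary = {"fishing rod": "hengel"}  # hypothetical approved term

ok = missing_terms("Bill buys a fishing rod.", "Bill koopt een hengel.", glossary)
bad = missing_terms("Bill buys a fishing rod.", "Bill koopt een vislijn.", glossary)
```

The first call returns an empty list. The second returns the glossary pair that was ignored, which is the kind of check a translator working by hand has to perform for every term.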
Of course, I wasn’t about to give my colleague—who had now become a competitor—my proprietary software. Not only would he not be able to operate or run it (it requires a PHP server and on-the-fly debugging), but giving away my tools would mean competing against myself. While translating games is my passion, this is still a business.
Figure: The wrapping software in action.
It Takes Two to Tango
Once I realized that I wasn’t the only one working on this project, and therefore, no longer had full control over the quality, I decided not to invest any extra unpaid time. I had been putting other clients on hold, refusing other projects, and sacrificing weekends to accommodate unexpected batches. But that time was over. The circumstances forced me to treat this project like any other. I could no longer guarantee delivery dates for texts with unknown hand-off dates. If the client wanted guaranteed turnaround, they needed to book me and reserve my time.
However, the client wanted guaranteed delivery dates without guaranteeing hand-off dates. Since it takes two to tango, I politely refused. The client could only outsource text to me if I had no other projects at that moment, which was impossible to predict.
Meanwhile, I wasn’t the only one losing interest. The other translator also seemed to have moved on—likely for similar reasons (or maybe because translating and manually rewrapping text for the standard rate isn’t exactly profitable). Soon, the project was being handled by dozens of translators, each using their own terminology and style. None of them could use CAT tools due to the "format-from-hell," resulting in six different translations for some in-game items. The project spiraled out of control.
Figure: Part of the wrapper code, which has more than 1,000 lines and is pretty complex.
More Chaos
Complaints started coming in from the console manufacturer (most in-game translations are reviewed by the console manufacturer before they can actually be published): terms did not adhere to their glossaries, the translation was a mix of Dutch and Flemish (apparently the agency had used translators from Belgium too), terms were inconsistent, the language was inconsistent, and the style was inconsistent.
Additionally, the translations contained numerous errors. Since most translators working on the project—except me—were manually wrapping the lines, dozens of spelling and grammar errors crept in. Many lines didn’t even meet the line and length restrictions, something that is easy to manage with software like mine.
To make matters worse, no one knew who had translated what, how much had been completed, or how much still needed to be done. Finally, the developer (the end client) had added new strings and deleted old ones in the same version of the Excel file being translated, so by the time the translations were ready, no one knew where to place them in the updated file.
The developer had tried (in vain) to color-code all cells in Excel: using different background colors for cells where the Japanese source text had been updated (but not the English version), cells for which both the Japanese and English texts had been updated, cells currently being translated into Dutch that would need further updates, and others where the Japanese team had pending questions. By now, the chaos was complete.
There’s a reason why version control is essential. CAT tools can handle it, tracking who translated what, what’s new, and what’s outdated. However, to leverage this functionality, you need to use these CAT tools—something that had become impossible due to the format and disorganized project management.
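The core of such tracking is a comparison between two versions of the string table. The sketch below assumes that every string has a stable identifier, which the Excel file in this project may not have had. The `diff_strings` helper and the sample IDs are hypothetical.

```python
def diff_strings(old, new):
    """Compare two {string_id: source_text} snapshots of the string table."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

old = {"dlg_001": "Bill is fishing.", "dlg_002": "John is fishing."}
new = {"dlg_001": "Bill goes fishing.", "dlg_003": "We go fishing."}
report = diff_strings(old, new)
```

With a report like this, new strings can be sent for translation, removed strings dropped, and changed strings flagged for review, without anyone having to color-code cells by hand.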
Just when we thought it couldn't get worse, a new complication arose: in addition to the console manufacturer (let’s say Sony) defining platform terminology and the developer (let’s say Konami) defining brand terminology, there was also a licensor (say Warner Brothers) defining the license terminology. Suddenly, halfway through the project, they all began issuing their own instructions on which terms to use, creating conflicts. This probably explained at least half of the colors in the Excel cells, adding to the chaos.
The Voice of Reason
My client (the translation agency) realized things couldn’t continue like this and had a chat with the end client (the developer). They admitted that splitting the project among multiple translators wasn’t their best idea, and the end client acknowledged it was unfair to expect a translator to reserve time for batches that might never arrive. I, on the other hand, realized that if they were willing to compromise, I should be more flexible too.
The developer gave clear instructions on which hierarchy to follow: first the licensor, then the console manufacturer, and finally the developer. While this helped, we still had to deal with some odd translations due to the licensor’s shaky grasp of Dutch.
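A hierarchy like this is easy to express as an ordered lookup. The following is a minimal sketch with invented glossary contents. `PRIORITY` and `resolve_term` are names chosen for the example, not part of any real tool.

```python
# Highest authority first, as instructed by the developer
PRIORITY = ["licensor", "console manufacturer", "developer"]

def resolve_term(term, glossaries):
    """Return (translation, authority) from the highest-priority glossary, or None."""
    for authority in PRIORITY:
        translation = glossaries.get(authority, {}).get(term)
        if translation is not None:
            return translation, authority
    return None

# Hypothetical conflicting glossaries
glossaries = {
    "developer": {"rod": "hengel"},
    "licensor": {"rod": "vishengel"},
}
choice = resolve_term("rod", glossaries)  # the licensor's term wins
```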
Now, the client kept me informed about how many words would be arriving when and ensured all work related to this project came to me exclusively. The project became manageable again. However, the damage had been done, and I realized that no matter how hard we tried, we’d never be able to deliver a perfect product. The final version would remain a patchwork of different translators' styles.
Conclusion
As I'm writing this, the project is still not finished. The majority of the work has been completed, but new batches keep coming in. For now, however, it seems the project has stabilized.
The game will likely be published near the end of this year. Whether the Dutch translation will be as good as it could have been remains to be seen. I’ve given it my best effort, but I could have done more if the project had been organized differently (or if the client’s budget hadn’t been so tight, so that I wouldn’t have been forced to divert my attention to other projects). What surprises me most is that this is a major title from big names in the industry—you’d think they’d know how to streamline their localization. Apparently not.
So what can we learn? Quite a few things, I guess:
- Make sure your strings are complete and clean. Separate text from codes as much as possible.
- Convince Japanese developers to auto-wrap texts in languages with spaces.
- Have your game color/tag keywords on the fly instead of coloring/tagging them manually. If you must do the latter, use actual tags instead of invisible color codes in Excel that can only be extracted with Visual Basic.
- Use as few translators as possible (1 translator is best).
- Do not use Flemish translators for the Dutch market. Do not, I repeat, do not believe them if they claim they can write Dutch suitable for the Dutch market. That would be a first. The difference between Dutch and Flemish is much greater than, for example, the difference between American and British English. Let the Flemish handle Flemish and the Dutch handle Dutch.
- Define responsibilities, ensure instructions don’t conflict, and specify who has the final say.
- Define terminology before the translation starts, not while it is underway.
- Have terminology defined by a linguist, not by someone who thinks they’re a linguist, regardless of their position in the chain of command.
- Don’t expect translators to reserve time for batches that may never arrive. Be ready to compensate if you can’t guarantee the hand-off date but still want the translator to guarantee the delivery date, or be less strict with your deadlines.
- Realize that proofreaders at console manufacturers are relatively inexperienced and sometimes merely try to justify their jobs.
- If you must work with length restrictions, at least define them fairly. Don't rely on automated algorithms. Remember: the less space you reserve for the translation, the worse it will be. Most languages are not as compact as English, so reserve at least 50% extra space for sentences and 400% extra space (or more) for single words.
- It is not cool to use dozens of tabs in your Excel file for "easy navigation," unless you have a sadistic nature.
- Consider hiring a freelancer directly instead of using an agency as a middleman. It gives you far more control over decision-making.
- Try to have the translation done directly from the source language. Avoid translations of translations. Ever played the whisper game with 30 colleagues? You’ll realize that every translation deviates a bit more from the original.
- Be realistic about deadlines and pricing. Don’t expect translators to work for free, nor should you expect them to be wizards who can translate 20,000 words a day.
- If your project is interesting and you grant me exclusivity, I’m definitely willing to go the extra mile and sacrifice a few weekends.
- Save money and time by implementing an efficient workflow from the start, rather than compromising quality to cut costs later. Well begun is half done! I’m convinced half of the client’s budget evaporated due to the impossible format of the source text.
- Microsoft Project is great, but sticking to rigid deadlines for batches, knowing changes will occur anyway, makes little sense. Project milestones are a means, not an end!
- Listen to your translator. Sometimes they might have good suggestions to streamline your localization process.
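The space-reservation guideline in the list above (at least 50% extra for sentences, 400% extra for single words) can be turned into a quick calculation. This is a sketch of that rule of thumb only. `reserved_length` is a hypothetical helper, and treating any source text without spaces as a "single word" is an assumption.

```python
import math

def reserved_length(source_text):
    """Characters to reserve: +400% for a single word, +50% for a sentence."""
    is_single_word = len(source_text.split()) == 1
    factor = 5.0 if is_single_word else 1.5  # 400% extra means five times the length
    return math.ceil(len(source_text) * factor)

button_budget = reserved_length("Save")                # single word
sentence_budget = reserved_length("Bill is fishing.")  # full sentence
```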
Good luck!
Loek van Kooten
Your English/Japanese-Dutch game translator