Never Send a Human to Do a Machine's Job.

Andrei Marks · September 13, 2007

I tried out Google’s translation feature for kicks yesterday. This is the little link that Google puts next to search results for non-English websites, which automatically translates the page for you using whatever machine translation algorithm Google uses.

So yesterday there was a hit that had an excerpt from a novel, the first paragraph of which I would translate as follows:
It’s said that Lu Bu went from being a hired hand to being the boss, quickly recovering the status of his past. But not two days after these good times came about, Lu Bu felt an inexplicable agitation arise. Even though he was already in complete control of the Xuzhou company, he always felt as if there was something amiss. His mind inwardly strange, he said to Diao Chan, “How can a business and its owner not have the same affection as that between a man and his first wife?”

Also of note, Lu Bu and Diao Chan are, respectively, the names of a general and his wife from the Romance of the Three Kingdoms, one of the most famous novels in China. I think this story might have been placing the characters in a modern setting, but I didn’t read any of it aside from what I’ve translated.

That being said, here’s the Google translation:
Wage earners from the house saying to the boss, quickly restored the scenery of the past. But no two days days, the house was a baffling irritable up. Although he has full control of the company in Xuzhou, but always feel that what wrong. His heart secretly strange, Chan said: “Is this the boss and between enterprises, but also pay attention to the former black fragmentation?”

I love reading this stuff. I bet you if someone would just bother to look they’d find a goldmine of little Shakespearean gems just sitting there ready for use. “It was a baffling irritable up.” That’s my favorite line from that one.

An interesting feature of Google’s system is that when you mouse over a sentence, a bubble pops up showing the text in the original language, and you can offer a better translation. I think Google might be working with a statistical method of translation here. The last time I studied abroad in China I met a postgraduate from Harvard (or MIT? I can’t remember) whose specialty was linguistics and machine translation. He was American, and was just in town for a conference being held at Beijing University. But he was telling me about an MT project he was working on, and the philosophy behind it.

You can think of language as having a set of rules, a syntax, that a speaker must conform to in order to make sense. The syntax itself is meaning-neutral: it’s just the grammar rules, the word classes, the relations between them, etc. The words, largely, carry the meaning. So there’s the idea that one can take an English sentence, tag all the parts of the sentence with their proper grammatical function and meaning, then strip the English pronunciations off, rearrange what remains into the appropriate syntax of, say, Chinese, and then slap the Chinese pronunciations onto the corresponding semantic tags. And you’ve got a Chinese sentence!
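To make the tag, strip, rearrange, and relabel idea concrete, here is a minimal Python sketch. Everything in it is my own toy assumption rather than anything the postgrad showed me: the five-word lexicon, the concept names like SPEAKER and LOCATED_AT, and the single reordering rule (Chinese usually puts a location phrase in front of the verb) are made up for illustration, and a real system would need vastly more than this.

```python
# Toy lexicon (hypothetical): English word -> (grammatical tag, language-neutral concept).
EN_LEXICON = {
    "i": ("PRON", "SPEAKER"),
    "eat": ("VERB", "EAT"),
    "apples": ("NOUN", "APPLE"),
    "at": ("PREP", "LOCATED_AT"),
    "home": ("NOUN", "HOME"),
}

# Toy target dictionary (hypothetical): concept -> Chinese word.
ZH_WORDS = {
    "SPEAKER": "我",
    "EAT": "吃",
    "APPLE": "苹果",
    "LOCATED_AT": "在",
    "HOME": "家",
}


def tag(sentence):
    """Replace each English word with its (tag, concept) pair."""
    return [EN_LEXICON[word] for word in sentence.lower().split()]


def reorder_for_chinese(tagged):
    """Move a trailing 'PREP NOUN' phrase in front of the verb."""
    tags = [t for t, _ in tagged]
    if len(tagged) >= 3 and tags[-2:] == ["PREP", "NOUN"] and "VERB" in tags[:-2]:
        verb_at = tags.index("VERB")
        phrase, rest = tagged[-2:], tagged[:-2]
        return rest[:verb_at] + phrase + rest[verb_at:]
    return tagged


def translate_by_rules(sentence):
    """Tag, reorder, then attach the Chinese words to the concepts."""
    reordered = reorder_for_chinese(tag(sentence))
    return "".join(ZH_WORDS[concept] for _, concept in reordered)


print(translate_by_rules("I eat apples at home"))
```

Any word outside the lexicon raises a KeyError, and the one rule covers exactly one sentence shape, which hints at why this approach gets hard to scale.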

In reality, there are a lot of fundamental problems with this method, and so people who work on machine translation have been developing other translation systems. What the postgrad told me was that he was working on a statistical translation model, which largely relies on a huge database of English-Chinese equivalent sentences/phrases. The input sentence is entered, and the database is searched for the corresponding data; if the system doesn’t find an exact match, it makes a guess about how the sentence should come out based on statistical similarities. So it wouldn’t be as streamlined as the aforementioned system, which would just need the syntax and a dictionary, but it would arguably be more accurate. Or maybe have a greater probability of being accurate.
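Here is a rough Python sketch of that lookup-then-guess behavior as I understood it. Big caveat: this is closer to a nearest-match lookup in a phrase book than to a real statistical model, which would estimate probabilities over phrases from millions of sentence pairs. The three-entry PARALLEL table and the function names are invented for illustration, and the similarity score comes from the standard library’s difflib.SequenceMatcher, not from anything the postgrad described.

```python
from difflib import SequenceMatcher

# Tiny stand-in (hypothetical) for the huge database of equivalent sentences.
PARALLEL = {
    "where is the train station": "火车站在哪里",
    "where is the bathroom": "洗手间在哪里",
    "how much does this cost": "这个多少钱",
}


def similarity(a, b):
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def translate_by_lookup(sentence):
    """Return (translation, confidence) for the closest known sentence."""
    key = sentence.lower().strip(" ?.!")
    if key in PARALLEL:
        # Exact match: use the stored translation as-is.
        return PARALLEL[key], 1.0
    # No exact match: guess using the most similar sentence in the table.
    best = max(PARALLEL, key=lambda known: similarity(key, known))
    return PARALLEL[best], similarity(key, best)


print(translate_by_lookup("Where is the bathroom?"))
print(translate_by_lookup("Where is the station?"))
```

The second call has no exact match, so it falls back to the closest stored sentence and returns a confidence below 1.0, which is the “greater probability of being accurate” trade-off in miniature.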

I don’t know how far this method has been developed since then. It’s been two years, and a lot can happen in that short a time, especially in the world of computing. But perhaps Google is now using user input to help build up its own language database. And seeing how poor the above translation was, I wonder if there aren’t roving bands of ‘translators-by-day, linguistic-e-saboteurs-by-night’! I was tempted, but then I just let it be. What a folly honor is!

So it’s still Man-1:Machine-0. For now…
