Translation can be really unsexy at times. Sure, it can be good for a viral video that pokes fun at the Germans. (Wat are you zinking about?) And, yeah, occasionally some diplomat really flubs it, resulting in a whole lot of angry Russians. (We're looking at you, Hillary.) But mostly, well, translation is about processing a whole lot of data through tools so advanced that even Gene Roddenberry's Star Trek writers couldn't have imagined them.
So when you're charged with getting customers excited about words, it's at least cool to gather them around some pretty visualizations of the data. Ooh. Ahh. Nice.
Curious myself, I asked the folks in our Production Group to give me a sense of the numbers. With clients that include Microsoft, there's no doubt we're processing (and gratefully getting paid for) a lot of words each year. The answer: 371,162,387 words translated into 122 languages.
I'm a talkative guy. Really talkative. If, as Scientific American reports, the average male speaks 16,000 words per day, I'm probably doing twice that. Even at my pace, it would take my mouth more than 30 years to push out that kind of volume.
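The "more than 30 years" figure holds up to simple arithmetic. Here's a quick sketch in Python, assuming a pace of exactly twice the 16,000-word daily average and a plain 365-day year:

```python
# Figures from the post: total words translated, and a speaking pace of
# twice the 16,000-words-per-day average (an assumption about "my pace").
TOTAL_WORDS = 371_162_387
WORDS_PER_DAY = 2 * 16_000
DAYS_PER_YEAR = 365  # ignores leap years; this is a rough estimate

years = TOTAL_WORDS / (WORDS_PER_DAY * DAYS_PER_YEAR)
print(f"{years:.1f} years")  # prints 31.8 years
```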
Look A Little Closer
I asked my buddies at the Process and Technology Group (PTG) to take this deeper: to look at the body of source data and do some word frequency analysis. Here's what they did:
PTG took translation memories containing approximately 24 million English words, of which 206,000 were unique (distinct word forms). They split that total into Microsoft projects and non-Microsoft projects, then broke each set down into the most frequent words, the least frequent words and, for kicks, the longest words.
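We don't describe PTG's actual tooling here, so what follows is only a minimal Python sketch of the same kind of analysis. It rests on a few assumptions: the translation memories have already been reduced to plain English source segments, each tagged as Microsoft or non-Microsoft; a "word" is a run of letters; and case is ignored. The tiny `SEGMENTS` list is hypothetical sample data standing in for the roughly 24 million words, and `tokenize`, `count_words`, `analyze` and `rate_per_thousand` are illustrative names, not PTG's.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical stand-in for the translation memories: each entry is
# (English source segment, came_from_a_microsoft_project). Real TMs (e.g.
# TMX files) would need to be parsed into segments like these first.
SEGMENTS = [
    ("Click OK to save your changes.", True),
    ("Click the Start button, then click Settings.", True),
    ("The warranty covers parts and labor.", False),
    ("Please read the warranty before you click Accept.", False),
]

# A "word" is a run of letters; runs joined by an apostrophe or hyphen
# count as one word.
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def tokenize(text):
    # NFC composes "u" + combining diaeresis into a single "ü",
    # so letter counts stay honest for words with umlauts.
    text = unicodedata.normalize("NFC", text).lower()
    return WORD_RE.findall(text)


def count_words(segments):
    return Counter(word for text, _ in segments for word in tokenize(text))


def analyze(counts, top_n=3):
    most_frequent = counts.most_common(top_n)
    # Words that occur exactly once, sorted for stable output.
    seen_once = sorted(w for w, c in counts.items() if c == 1)
    longest_len = max(len(w) for w in counts)
    longest = sorted(w for w in counts if len(w) == longest_len)
    return most_frequent, seen_once, longest


def rate_per_thousand(counts, word):
    # Normalize by set size so sets of different sizes can be compared.
    return 1000 * counts[word] / sum(counts.values())


microsoft = count_words([s for s in SEGMENTS if s[1]])
other = count_words([s for s in SEGMENTS if not s[1]])
everything = microsoft + other

print("total words:", sum(everything.values()))
print("unique words:", len(everything))
for label, counts in (("Microsoft", microsoft), ("non-Microsoft", other)):
    top, once, longest = analyze(counts)
    print(f"{label}: top={top} once={len(once)} longest={longest}")
    print(f"  'click' per 1,000 words: {rate_per_thousand(counts, 'click'):.1f}")
```

Dividing by each set's size, as `rate_per_thousand` does, matters when you compare a word like "click" across two sets of different sizes. Raw counts alone would favor whichever set is bigger.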
At The Top
Perhaps unsurprisingly for Microsoft data, "click" appeared most frequently, and far more often than in non-Microsoft projects.
At The Bottom
Of the 206,000 unique words, many, of course, appeared just once in our projects. The results were humorous ... or telling. We'll leave that assessment to you.
The Longest Words
We were looking at English words, of course. Were this an analysis of German words, we'd have to wrestle with the likes of Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz: 63 letters and, curiously enough, dropped from the German lexicon earlier this month when the law it named was repealed. (Farewell, friend!) At just 22 letters, magnetoencephalography is a linguistic bargain.
Industry colleagues, you're welcome to use the comments for a little size comparison. I've shown you mine; how about you show me yours?