A Language of All Languages?

By: Ashiq Khondker

On: September 13, 2010


About Ashiq Khondker
M.A - New Media, Universiteit van Amsterdam; B.S - Liberal Arts, The New School, NY. I forrestgump about the creative world, like where's waldo in the art fair circuit. I should write more, but I'm absorbing it all until my closet artist's got enough ammo.

Website
http://defunct.

New technologies have always made the world progressively smaller: ships, rail and air travel, electricity, the telegraph, the telephone, and so on. At the same time, they have made worlds larger, because individuals can explore areas new to them, whether spatial, temporal or psychic. The very fact that I’m here in Amsterdam owes to being able to apply to UvA online from New York, fly over the ocean, and postpone having to learn de Nederlandse taal through my utter and stubborn reliance on Google Translate.

Living in a converted shipping container is novel, and I’m glad the internet is available and fast. Sure, out in the friendly outside world of this beautiful city, not knowing Dutch is a non-issue. But in this metal box with its prefabricated plastic bathroom, prefabricated metal kitchenette, and aura-less IKEA furniture, I often find myself having to read important webpages, letters from the bank and university, and various email blasts about things going on in the city. I have my browser (Chrome) set up to automatically translate, using the Google service, any Dutch text that shows up.

The New York Times ran an article about half a year ago on how Google began its translation project. Although several other companies have been developing translation software for decades, Google has leaped past its competition, primarily because it can access, aggregate and process massive amounts of global data. Prior to the ’90s, translation software relied on a rule-based approach, built on programmed guidelines for grammar and syntax. But language is too fluid and dynamic to be contained by a static set of rules, so the approach shifted to an associative, statistical one, for which Google has a natural advantage.

When I worked at the Linguistic Data Consortium at the University of Pennsylvania, I would think about the life expectancy of different languages in the digital age. Much as the printing press homogenized regional languages, the digital age would see different languages achieve different degrees of digital migration. My job there involved data collection (recording speech from Bengali phone conversations, which would be fed into the statistical machine) as well as telling the computer the rules for specific morphological differences, so that the final translation system would combine both approaches.

So I’m considering these:

(i) Speech (aural) and language recognition technologies that, with increasing accuracy, understand, then translate and convey an idea or ideas.

(ii) A growing body of translation data, starting with the most digitally migrated languages, and potentially a translated version (a multiplication) of every page that already exists.

(iii) The emergence of a standard, digital “in-between” language, not for people but for computers — an arrangement of binary code that would serve as core idea-components, as the language between languages, a point from which any language can be reached.
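One practical reason an “in-between” language is attractive is simple arithmetic. The sketch below assumes that every translation direction needs its own system and that a shared representation needs one way in and one way out per language; both are simplifying assumptions of mine, and the function names are made up for illustration.

```python
def direct_systems(n_languages: int) -> int:
    """Directed pairs needed if every language translates straight into every other."""
    return n_languages * (n_languages - 1)


def in_between_systems(n_languages: int) -> int:
    """One encoder into and one decoder out of the shared representation per language."""
    return 2 * n_languages


# 5 is the number of languages in the test path used later in this post;
# 50 is an arbitrary larger figure for contrast.
for n in (5, 50):
    print(n, direct_systems(n), in_between_systems(n))
```

With five languages that is 20 direct systems against 10, and with fifty it is 2,450 against 100, so the gap widens quickly as languages are added.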

When a phrase is translated from Chinese to English, is that pair handled on its own terms, separately from, say, Chinese to Spanish? Or would statistical differences be apparent if I translated a phrase from English to Dutch to Chinese to Portuguese to Arabic and back to English? Would I get back the same phrase?

I typed in, “This phrase is to test the effects of a hypothesis about this translation service,” as a test, sending it through the five translation steps mentioned above. The result was: “This was the sentence to test the hypothesis about the impact of this translation.”

Okay, pretty close, but also pretty dry so maybe that example was hard to mess up.

Through the translation circuit, “Living in a converted shipping container is novel, and I’m glad the internet is available and fast.” became: “Container once a new life, and I’m glad that the Internet is fast and”

Fail.

Running, “I love browsing the books and the dames at the library,” returned: “I would like to browse books and the library lady.”

So it doesn’t seem to be totally there yet. If it had a common intermediate idea-component for each translated part, wouldn’t my library activity remain something I love doing, instead of something I would like to do? Why is my attention now directed to one librarian instead of multiple dames?
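For anyone who wants to repeat the experiment less by hand, here is a minimal sketch. The `translate` argument is a hypothetical stand-in for whatever service you plug in (it is not Google's API), the language codes are just my labels for the path above, and the score only measures word overlap, so it cannot tell that “love” drifting into “would like” is a change of meaning.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence

# Hypothetical signature: translate(text, source_language, target_language) -> text
Translate = Callable[[str, str, str], str]

# The path used in this post: English, Dutch, Chinese, Portuguese, Arabic, English.
PATH = ["en", "nl", "zh", "pt", "ar", "en"]


def round_trip(text: str, path: Sequence[str], translate: Translate) -> str:
    """Send text through each consecutive pair of languages in path."""
    for source, target in zip(path, path[1:]):
        text = translate(text, source, target)
    return text


def words(sentence: str) -> list[str]:
    """Lowercase words with surrounding punctuation stripped."""
    stripped = (w.strip(".,!?\"'").lower() for w in sentence.split())
    return [w for w in stripped if w]


def word_similarity(original: str, returned: str) -> float:
    """Score from 0 to 1 for how much of the word sequence survived."""
    return SequenceMatcher(None, words(original), words(returned)).ratio()


# The three before/after pairs reported in this post.
PAIRS = [
    ("This phrase is to test the effects of a hypothesis about this translation service",
     "This was the sentence to test the hypothesis about the impact of this translation."),
    ("Living in a converted shipping container is novel, and I'm glad the internet is available and fast.",
     "Container once a new life, and I'm glad that the Internet is fast and"),
    ("I love browsing the books and the dames at the library,",
     "I would like to browse books and the library lady."),
]

for before, after in PAIRS:
    print(round(word_similarity(before, after), 2), "|", after)
```

To run a fresh test, pass your own function as `translate`, for example `round_trip(sentence, PATH, my_translate)`, and compare the result with `word_similarity`.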

(Any experts out there? Please shed some light and correct me if I’m totally wrong!)

(iv) If the ability to form thoughts requires language… The internet already works like a huge nerve center, so could the development of a language on its own terms allow the internet to think?

Tags: google translate, language, speech recognition, translation
