Chinese to Surpass English on the Web
Running into an article in Ars Technica on US broadband policy got me thinking again about the shape of language on the web. Today, the web is predominantly English. Nobody knows exactly how many of the world’s 15 billion web pages are in English, but estimates range from 65 to 80%.
Most readers of this webpage will probably have had slim experience with the web in a foreign language. Maybe a few of us have tried playing with Google Translate or even AltaVista’s Babelfish back in the day. But far fewer will have considered purchasing Systran or Language Weaver translation tools to close an international deal or read an academic paper in Portuguese.
As Anglophones, we’re the lucky ones. Wry observer Momus says we’ve benefited from the emergent structure of global cultural exchange. That structure routes knowledge through English akin to the way airlines route flights through regional hubs. “If culture were like an aviation model,” he asks, “would Poles be able to fly to Tokyo without having to stop at LAX?”
So we Anglophones have enjoyed a positive-feedback loop as speakers of the lingua franca of the Web, riding virtual shotgun to English’s preponderance in global cultural and economic spheres. Although we make up only 31.7% of Internet users, we account for 65 to 80% of web content.
(Of course, the ride has not been all gung-ho for us. Ever wonder why today’s blockbusters lack the sophistication of a Billy Wilder masterpiece? That’s because Hollywood studios recoup costs on expensive special effects by exporting movies to foreign audiences; thus, the easily translated dialog and plot.)
But, in the immortal words of Bob Dylan, “the times they are a-changin’.” In the 21st century, America will share global leadership with four rapidly emerging economies, known as the BRICs: Brazil, Russia, India and China.
What does this mean for the Web? Massive shifts in the origin of web content from English to other languages. Chinese, in particular, will overtake English as the dominant Internet vernacular. Today, Chinese is growing 2.6 times faster than English on the Web. “By 2011,” says JupiterResearch, “Asians will make up 42% of the world’s Internet population.”
Yikes. With reliable machine translation still decades away, I guess now is the time to brush up on my Mandarin.



I disagree that reliable machine translation is decades away. Language Weaver, which you mentioned, is doing amazing things already with its statistical translation methods. Check out this blog post from one of their globalization customers, who uses it to translate manuals for their big automotive clients: http://code-is-art.blogspot.com/2007/06/i-was-astonished-about-language-weaver.html
If you want to see for yourself whether you can get the gist of news stories written in other languages, including Chinese, check out http://www.kontrib.com. It’s a beta social bookmarking site that Language Weaver has posted for consumers, and you can translate to and from French, Spanish, Arabic, Chinese, English and Romanian. It’s not perfect, but you definitely get the gist of the stories and comments.
Beth
July 30, 2007
Hi Beth,
Thanks for your reply - the first comment of substance here on Metamash!
So I like both Language Weaver and Kontrib, but I still think that reliable MT is decades away.
With the two translated stories that I could find on Kontrib (it took some digging), there was a high gobbledygook quotient, which I think would frustrate regular users.
As to the blog post, I would hazard a guess that this client’s hurdle for acceptable translation (for software localization) is substantively lower than a regular user’s.
So, to clarify a bit, I think that reliable MT for the regular user is still decades away. Hopefully companies like LW bring us there sooner.
Alex
apaq
July 30, 2007