The Internet’s Hidden Boundaries

November 29, 2017

The Internet isn’t the same for everyone. How do we unify it?

In 1969, when scientists turned on the first Internet router, it was the size of a telephone booth. The Interface Message Processor, built on a Honeywell minicomputer, served just a handful of academics. They may have seen the potential in a network of computers, but they couldn’t have anticipated just how much it would change the world.

Internet pioneer Leonard Kleinrock stands with an Interface Message Processor, which transmitted the first message between computers hundreds of miles apart.

Today, 53% of households around the globe have Internet access, according to the International Telecommunication Union (ITU). The Internet has been a levelling economic, social and political force, enabling information to flow around the world more readily than ever before.

A global Internet may be a wonderful thing, but it doesn’t eliminate all communication barriers. Boundaries still exist, and they’re growing more problematic by the day.

One of the biggest hurdles on the Internet is the same one that humankind has faced for millennia: we don’t all speak the same language. If your website or online application works only in English, the statistics suggest that you are missing out on a large potential population of users.


Are you talking my language?

Foundation Networks & Development (FUNDRES), an NGO specializing in ICT for development, worked with the International Organization of Francophonie and Maaya, the world network for linguistic diversity, to explore the state of language on the web.

Its June 2017 analysis showed that while English was still very popular, it was no longer dominant: 22.2% of Internet users spoke English as their native language, but 20.5% spoke Chinese and 9.1% spoke Spanish. Other popular languages included French (5.6%) and German (3.1%).

One thing that stands out in the FUNDRES statistics is the productivity ratio: the share of online content in a given language divided by the share of Internet users who speak that language natively.

While only slightly more than one in five Internet users spoke English as their native language, almost one third (32%) of the content available online was in English. That gives English the highest ratio of content to speakers, at 1.44. A productivity ratio greater than one means that a language is overrepresented, because its share of content outweighs its share of speakers online.

Predictably, the other overrepresented languages are spoken primarily in developed Western countries, which adopted the Internet and the web first. French, German and Italian all had productivity ratios higher than one.

Chinese, by comparison, scored slightly lower. While 20.5% of Internet users are Chinese speakers, only 18% of online content is in Chinese. Portuguese, Bengali, Urdu and Hindi are similarly underrepresented. Among the top 15 languages on the Internet, the two with the lowest productivity ratios are Arabic and Russian.
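The productivity ratio is simple arithmetic: a language's share of online content divided by its share of native-speaking Internet users. The short Python sketch below reproduces the two ratios implied by the percentages quoted in this article for English and Chinese; the function name and the dictionary layout are illustrative only, not part of the FUNDRES methodology.

```python
def productivity_ratio(content_share: float, speaker_share: float) -> float:
    """Share of online content in a language divided by the share of
    Internet users who speak it natively (both given as percentages).
    Above 1: overrepresented online. Below 1: underrepresented."""
    if speaker_share <= 0:
        raise ValueError("speaker_share must be positive")
    return content_share / speaker_share


# (share of online content %, share of Internet users %), as quoted above
shares = {
    "English": (32.0, 22.2),
    "Chinese": (18.0, 20.5),
}

for language, (content, speakers) in shares.items():
    ratio = productivity_ratio(content, speakers)
    label = "overrepresented" if ratio > 1 else "underrepresented"
    print(f"{language}: {ratio:.2f} ({label})")
```

Run as written, this prints a ratio of 1.44 for English and 0.88 for Chinese, matching the figures discussed above.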

These disparities will become increasingly important as more people around the world come online. The Pew Research Center reports that users in developing countries are flocking to the Internet. In 2013, a median of 45% of people surveyed across 21 emerging and developing countries reported using the Internet at least occasionally or owning a smartphone.

That figure rose to 54% in 2015, with the biggest proportion of new users coming from large emerging economies such as Malaysia, Brazil and China.


National and international diversity

Don’t think that this is only a problem if you’re tackling those emerging markets with your online product or service, though. In an increasingly globalized economy, linguistic differences are surfacing inside countries, too.

A 2014 Slate magazine analysis of U.S. Census data mapped the most commonly spoken household language other than English in every U.S. state. The overwhelming result was Spanish (which has an anemic 0.88 productivity ratio in the FUNDRES analysis).

The same Slate maps, showing each state’s most commonly spoken language other than English and Spanish, produced a bewildering array of results. Native American languages appeared, as did notable concentrations of German and Vietnamese speakers.

And who knew that the most commonly spoken language other than English or Spanish in California was Tagalog?

Try accessing BuzzFeed in that language.

Coping with a growing number of languages online can be challenging enough, but things get even more difficult when native English-speaking companies must tackle non-English character sets.

Japanese users typically read and write a mix of three scripts: kanji (characters borrowed from Chinese) and the two syllabic scripts, hiragana and katakana. They might also input text in romaji, which writes Japanese in the Latin alphabet.

Characters from these scripts are included in international standards such as Unicode, but a web page must still declare its character encoding in its markup so that browsers display them correctly. It all adds to the workload when targeting new non-English-speaking markets.
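Why the declaration matters can be seen in a few lines of Python. The sketch below (not tied to any particular web framework) encodes a short Japanese phrase mixing kanji and hiragana under two real encodings, UTF-8 and Shift_JIS, and then decodes the UTF-8 bytes with the wrong one, which is roughly what a browser does when a page's declared encoding (for example via `<meta charset="utf-8">` in HTML) is missing or wrong.

```python
# A short Japanese phrase mixing kanji (日本語) and hiragana (です).
text = "日本語です"

# The same five characters become different byte sequences
# depending on the encoding used to store or transmit them.
utf8_bytes = text.encode("utf-8")      # 3 bytes per character for these
sjis_bytes = text.encode("shift_jis")  # 2 bytes per character for these

print(len(utf8_bytes), len(sjis_bytes))

# If a page is served as UTF-8 but read as Shift_JIS, the reader sees
# garbled text ("mojibake") instead of the original phrase.
misread = utf8_bytes.decode("shift_jis", errors="replace")
print(misread == text)
```

This prints `15 10` and then `False`: the characters are the same, but without the correct encoding declaration the bytes cannot be turned back into them.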


Levelling the Internet playing field

The Internet may seem at first glance like the great leveler, but in practice it is uneven territory, with many barriers and hurdles to the free flow of information. What can organizations do to combat this, and get their message across to everyone?

Companies often play ball with governments seeking to interfere with information flows. They give in to government demands for censorship in return for access to new markets, in moves that have angered digital rights activists.

One way they do this is by helping governments censor the anti-censorship tools themselves. Activist groups and for-profit companies often produce virtual private networks (VPNs) and other programs designed to route around censorship.

Apple – which relies increasingly on emerging economies to boost its revenues in the heavily saturated smartphone market – made concessions to the Chinese government, pulling VPN apps from its App Store in China in summer 2017.

At roughly the same time, a company responsible for running Amazon cloud-based services in China banned the use of software on its infrastructure that would help customers to circumvent Chinese censorship measures.

Some have taken a contrary approach, abandoning repressive regimes. Google pulled out of China in 2010 after attributing a major hack of its infrastructure to the government there. However, the company has more recently prepared for re-entry, negotiating for an app store that would be friendly to Chinese government policies. It also launched a version of its YouTube video service designed for Pakistan to appease the government there.

So, most commercial approaches to censorious governments seem to involve appeasing them somehow. When they can’t make that work, they may try getting in through the back door. Facebook went so far as to launch an app under an entirely different name to subvert a Chinese ban on its services.


Smoothing the Internet’s information flows

How will we bridge these barriers? Some options include raising issues of Internet censorship at an international level and trying to build consensus among stakeholders from multiple governments. Regulating Western companies to stop them abetting Internet censorship is another potential option. Others will rely on the ability to continually subvert increasingly sophisticated censorship technology with their own tools, in a perpetual game of cat and mouse.

In 2008, the German hacking group Chaos Computer Club released an electronic toolkit designed to help journalists covering the Beijing Olympic Games get uncensored access to Western websites. The toolkit used Tor, the onion routing network that has since become a gateway to the dark web, and journalists carried it physically on a USB stick that the group called the ‘freedom stick.’

Censorship is an inherently complex problem, with so many moving parts that it will be difficult to solve in the short term, if at all.


Overcoming language barriers

Surmounting language barriers online is a more tractable problem, and one that technology can help with. Artificial intelligence has driven great strides in machine translation, making language less and less of a barrier, but we are still far from the “universal translator” of science fiction, as recent news illustrates:

• Facebook apologizes after wrong translation sees Palestinian man arrested for posting ‘good morning’ — Facebook had to apologize after its fully automatic translation service mistranslated “يصبحهم”, or “yusbihuhum” (“good morning”), as “attack them”, resulting in a construction worker being arrested in Jerusalem.

• China’s WeChat app translates ‘black foreigner’ to N-word — China’s 900-million-user-strong platform blamed its artificial intelligence software for WeChat translating the Chinese for “black foreigner” to That Word in English. “The company uses AI and machine learning, feeding computers huge amounts of data to train it to pick the best translations based on context. But the system also removes human oversight, leading to incorrect and even offensive words being used.”

• Google Translate Thinks “oga Booga Wooga” Is Somali And People Are Confused AF — all the data and billions spent on R&D over many years show that machine translation alone is nowhere near trustworthy enough for anything beyond getting the general gist.

Still, machine translation points to an exciting future in which the Internet not only brings people together from different parts of the world but also enables them to communicate seamlessly regardless of their native tongue. As we move toward that future, though, there are looming shadows on the Internet landscape.


Stormclouds ahead

As the world’s governments seem to drift apart politically and become less collaborative, the danger is that the Internet as we know it will fragment, balkanizing still further into separate networks.

Not only are governments restricting the free flow of information online, but some of them are even subverting the Internet’s underlying network infrastructure to create alternative networks altogether.

For example, Iran has been building an entirely separate Internet of its own for several years, cordoned off from the broader global web. Germany has also floated the idea of a walled-off national Internet following allegations of US spying, and both China and Russia have explored similar ideas.

Facebook, eager to access emerging markets, has posited its own corporate version of the existing Internet, delivered to – and mediated for – those in developing countries for free.

It seems the real boundaries of the Internet are not natural horizons such as language, but the choices people have always faced: to build walls or to build bridges between cultures around the world.

The post The Internet’s Hidden Boundaries appeared first on Unbabel.

About the Author

Content Team

Unbabel’s Content Team is responsible for showcasing Unbabel’s continuous growth and incredible pool of in-house experts. It delivers Unbabel’s unique brand across channels and produces accessible, compelling content on translation, localization, language, tech, CS, marketing, and more.