Where Google Translate fails: how web’s English dominance leaves minorities out, and the people determined to break down linguistic barriers
  • Minorities in countries like the US can feel excluded from politics, health care and more because of the primacy of English on the web and social media

In 2020, Jennifer Xiong spent her summer helping Hmong people in California register to vote in the United States presidential election.

The Hmong are an ethnic group from the mountains of China, Vietnam, Laos and Thailand who do not have a country of their own. Xiong was a volunteer organiser at Hmong Innovating Politics, in Fresno. There are about 300,000 Hmong people in the US, and she spent hours phone-banking and working on advertisements to run on Hmong radio and television channels. It was inspiring work.

“This was an entirely new thing for me to see,” she says. “Young, progressive, primarily women doing this work in our community was just so rare, and I knew it was going to be a huge feat.”

And by all accounts, it was. Asian-American turnout in the 2020 election was extraordinary, and observers say turnout among Hmong citizens was the highest they can remember.

But Xiong says it was also incredibly disheartening.

While Hmong people have long ties to the US – many were encouraged to migrate across the Pacific after being recruited to support the US during the Vietnam war – they are often left out of mainstream political discourse.

One example? On the website of Fresno’s county clerk, the government landing page for voter registration has an option to translate the page into Hmong – but, Xiong says, much of the information is mistranslated.

And it starts right at the beginning. Instead of the Hmong word for “hello” or “welcome”, she says, there is “something else that said, like, ‘your honour’ or ‘the queen’ or ‘the king’ instead”.

Seeing something so simple done incorrectly was both frustrating and off-putting. “Not only was it just probably churned through Google Translate, it wasn’t even peer edited and reviewed to ensure that there was fluency and coherence,” she says.

Xiong says this kind of carelessness is common online, and it is one reason she and others in the Hmong community can feel excluded from politics.

They are not the only ones with the sense that the digital world wasn’t made for them. The web itself is built on an English-first architecture, and most of the big social media platforms that host public discourse in the US put English first, too.

As technologies have become proxies for civic spaces in the US, the primacy of English has been magnified. For Asian-Americans, the move to digital means that access to democratic institutions – everything from voting registration to local news – is impeded by linguistic barriers.

It is an issue in health care as well. During the pandemic, when black, Hispanic and Native patients have been two to three times more likely to be hospitalised or die than white patients, these barriers add another burden: Brigham and Women’s Hospital, in Boston, found that non-English-speaking patients were 35 per cent more likely to die of Covid-19 than those who spoke English.


Translation problems are not the only issue. Xiong says that when Hmong users were trying to make vaccine appointments, they were asked for their zodiac sign as a security question – despite the fact that many in the community are unfamiliar with Western astrology.

In normal times, overcoming such challenges would be complicated enough, since Asian-Americans are the most linguistically diverse ethnic group in America. But after a year that has seen a dramatic increase in real-world and online attacks on Asian-Americans, the situation has become urgent in a different way.

Christine Chen, executive director of APIAVote, a non-profit group that promotes civic engagement among Asian people and Pacific Islanders, says political life has always been “exclusionary” for Asian people in the US, but “with digital spaces, it’s even more challenging. It’s so much easier to be siloed”.

Big platforms such as Facebook, Twitter and YouTube are popular among Asian-Americans, as are messaging apps like WeChat, WhatsApp and Line.

Which communication channels people use often depends on their ethnicity. During the election campaign, Chen focused on building a volunteer network that could move in and out of those silos to achieve maximum impact.

At the time, disinformation targeting Asian-Americans ran rampant in WeChat groups and on Facebook and Twitter, where content moderation is less effective in non-English languages.

APIAVote volunteers would join different groups on the various platforms to monitor them for disinformation while encouraging members to vote.

Volunteers found Vietnamese-Americans, for example, were being targeted with claims that Joe Biden was a socialist – similar to political messages pushed at Cuban-Americans – preying on fears of communism.

Chen says that while content-moderation policies of Facebook, Twitter and others succeeded in filtering out some of the most obvious English-language disinformation, the system often misses such content when it is in other languages.

That work instead had to be done by volunteers like those on her team, who searched for disinformation and were trained to defuse it and minimise its spread. “Those mechanisms meant to catch certain words and stuff don’t necessarily catch that dis- and misinformation when it’s in a different language,” she says.

Google’s translation services and technologies such as Translatotron and real-time translation headphones use artificial intelligence to convert between languages. But Xiong finds these tools inadequate for Hmong, a complex language in which context is extremely important.

“I think we’ve become really complacent and dependent on advanced systems like Google,” she says. “They claim to be ‘language accessible’ and then I read [a translation] and it says something totally different.”

A Google representative admitted that smaller languages “pose a more difficult translation task” but said the company has “invested in research that particularly benefits low-resource language translations”, using machine learning and community feedback.

The challenges of language online go beyond the US. Yudhanjaya Wijeratne is a researcher and data scientist at Sri Lankan think tank LIRNEasia.

In 2018, he started tracking bot networks whose activity on social media encouraged violence against Muslims: in February and March that year, a string of riots by Sinhalese Buddhists targeted Muslims and mosques in the cities of Ampara and Kandy.

His team documented “the hunting logic” of the bots, catalogued hundreds of thousands of Sinhalese social media posts, and took the findings to Twitter and Facebook. “They’d say all sorts of nice and well-meaning things – basically canned statements,” he says.

In a statement, Twitter says it uses human review and automated systems to “apply our rules impartially for all people in the service, regardless of background, ideology, or placement on the political spectrum”.

A Facebook representative said the company commissioned an independent human rights assessment of the platform’s role in the violence in Sri Lanka, which was published in May 2020, and made changes in the wake of the attacks, including hiring dozens of Sinhala and Tamil-speaking content moderators.

“We deployed proactive hate-speech detection technology in Sinhala to help us more quickly and effectively identify potentially violating content,” the representative said.


When the bot behaviour continued, Wijeratne grew sceptical of the platitudes. He decided to look at the code libraries and software tools the companies were using, and found that the mechanisms to monitor hate speech in most non-English languages had not yet been built.

“Much of the research, in fact, for a lot of languages like ours has simply not been done yet,” Wijeratne says.

“What I can do with three lines of code in Python in English literally took me two years of looking at 28 million words of Sinhala to build the core corpuses for, to build the core tools for, and then get things up to that level where I could potentially do that level of text analysis.”

After suicide bombers targeted churches in Colombo, the Sri Lankan capital, in April 2019, Wijeratne built a tool to analyse hate speech and misinformation in Sinhala and Tamil.

The system, called Watchdog, is a free mobile application that aggregates news and attaches warnings to false stories. The warnings come from volunteers who are trained in fact-checking.

Wijeratne stresses that this work goes far beyond translation.

“Many of the algorithms that we take for granted that are often cited in research, in particular in natural-language processing, show excellent results for English,” he says.

“And yet many identical algorithms, even used on languages that are only a few degrees of difference apart – whether they’re West German or from the Romance tree of languages – may return completely different results.”


Natural-language processing is the basis of automated content moderation systems, and Wijeratne published a paper in 2019 that examined the discrepancies between their accuracy in different languages.

He argues that the more computational resources that exist for a language, such as data sets and web pages, the better the algorithms can work. Languages from poorer countries or communities are disadvantaged.

“If you’re building, say, the Empire State Building for English, you have the blueprints. You have the materials,” he says. “You have everything on hand and all you have to do is put this stuff together.

“For every other language, you don’t have the blueprints. You have no idea where the concrete is going to come from. You don’t have steel and you don’t have the workers, either. So you’re going to be sitting there tapping away one brick at a time and hoping that maybe your grandson or your granddaughter might complete the project.”

The movement to provide those blueprints is known as language justice, and it is not new. The American Bar Association describes language justice as a framework that preserves people’s rights to communicate, understand, and be understood in the language which they prefer and in which they feel most articulate and powerful.

The path to language justice is tenuous. For its realisation, technology companies and government service providers would have to make it a higher priority and invest more resources.

And, Wijeratne points out, racism, hate speech and exclusion targeting Asian people, especially in the US, existed long before the internet. Even if language justice could be achieved, it is not going to fix these deep-seated issues.

But for Xiong, language justice is an important goal that she believes is crucial for the Hmong community.

After the election, Xiong took on a new role with her organisation, seeking to connect California’s Hmong community with public services such as the Census Bureau, the county clerk and vaccine registration.

Her main objective is to “meet the community where they are”, whether that is on Hmong radio or in English via Facebook Live, and then amplify the perspective of Hmong people to the broader public. Every day, she says, she has to face the imbalances in technology that shut people out of the conversation and block them from access to resources.

Equality would mean “operating in a world where interpretation and translation is just the norm”, she says.

“We don’t ask whether there’s enough budgeting for it, we don’t question if it’s important or it’s valuable, because we prioritise it when it comes to the legislative table and public spaces.”

MIT Technology Review