China’s popular social media platform Weibo is cracking down on the use of homonyms and deliberately misspelt words to evade the country’s strict online censorship.
The Twitter-like service said in a Weibo post on Wednesday that it would launch a campaign to regulate “the illegal behaviour of using homophone characters, variants of words, and other ‘misspelt words’ to spread harmful information”. “We will strengthen the platform’s mechanism of language [wording] supervision, and will refine the keyword identification model,” the post said.
The move comes as Chinese regulators ramp up ongoing efforts to “clean up” the country’s cyberspace. Because certain keywords are often censored, Chinese internet users have long relied on homonyms and “misspelt” words to get around the restrictions.
This week, for example, some users are using “Helan”, two Chinese characters that mean the Netherlands, to discuss the bank protests happening in China’s central Henan province. As discussions on the topic are strictly censored, in the new language system created by internet users, Zhengzhou, the capital of Henan province, becomes Amsterdam, with some asking on Weibo, “What’s the latest situation of the Netherlands’ Amsterdam bank?”
Since April, hundreds of thousands of people across China who deposited money in four rural banks based in Henan have been trying to get their funds back, generating wide attention in China.
“Helan” has been a particularly useful meme for Henan province because some Mandarin speakers in southern China tend to confuse the pronunciation of “l” with “n”, so Henan becomes “Helan”.
Chinese characters can also be split up into their component parts in an attempt to avoid detection by censors. For example, the name of Li Peng, who was the Chinese premier during the 1989 Tiananmen Square crackdown, is often written on Chinese blogs as five Chinese characters instead of two.
Some internet users are unhappy about Weibo’s new move. “But if you don’t spell the word incorrectly, you won’t let me post it, right?” a user commented on Weibo. “Do you want us to post on Weibo in Morse code?” another commented.
Wong Kam-fai, a professor at the Chinese University of Hong Kong who specialises in natural language processing, said there are multiple methods for platforms to track popular homonyms and misspelt words. For example, platforms can add new keywords to their dictionary, and train the model to understand the author’s intention based on contextual information. “It’s a very mature technology, but each time you want to apply it [to a new scenario], you still need to gather some data to train the model,” he said.
Taking Henan as an example, the most frequently used method is discourse analysis, according to Wong. When users are discussing bank protests online, the system can understand the intention of the author by studying the previous sentence and relying on contextual information. “When you are speaking, you are not saying single words. [Artificial intelligence] can identify what you mean by analysing the [context],” he said.
For the Li Peng case, “morphological analysis” also works, according to Wong. “If this sequence of characters appears many times they will be grouped together as a new [bad] word and introduced to the dictionary,” he said.
How the technology will be implemented depends on the government, Wong added. “Authorities know that something is happening and many internet users are discussing it. But at some point, the discussion crosses the line, so they will tighten the regulation,” he said.
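Wong’s remark about recurring character sequences being grouped into a new dictionary word can be illustrated with a toy frequency count. The Python sketch below is only an illustration under stated assumptions: the function name `frequent_new_sequences`, its parameters and the placeholder posts are hypothetical, and nothing here reflects how Weibo or any real platform implements its system.

```python
from collections import Counter


def frequent_new_sequences(posts, known_terms, n=3, min_count=3):
    """Return character n-grams seen in at least `min_count` posts that are
    not already in `known_terms`, most frequent first (hypothetical helper)."""
    counts = Counter()
    for post in posts:
        # Count each n-gram once per post so a single repetitive post
        # cannot push a sequence over the threshold on its own.
        grams = {post[i:i + n] for i in range(len(post) - n + 1)}
        counts.update(grams)
    candidates = [
        (gram, count)
        for gram, count in counts.items()
        if count >= min_count and gram not in known_terms
    ]
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Placeholder posts: "xqz" stands in for any unfamiliar recurring sequence.
posts = ["xqz news", "more xqz", "xqz again", "hello"]
print(frequent_new_sequences(posts, known_terms=set()))
# prints [('xqz', 3)]
```

A real system would, as Wong notes, also need training data and contextual analysis; raw frequency alone cannot tell a coded term from an ordinary common phrase.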