
Baidu creates ‘world’s largest’ Chinese natural language processing database

  • Baidu has been seeking to diversify its revenue mix, as changing internet usage patterns chip away at its search engine dominance
  • It launched Qian Yan, which it says is the largest Chinese database for natural language processing, among other AI-related products on Tuesday

Chinese search engine giant Baidu has launched what it says is the world’s largest Chinese natural language processing (NLP) database, among several other artificial intelligence (AI) products, as it seeks to diversify its revenue sources.

NLP is a branch of AI involved in making computers understand the way humans naturally talk and type online, turning such information into structured data for further analysis.

The project, called Qian Yan – or “thousand words” in Chinese – is a collaboration with the China Computer Federation, an industry group. It is meant to help the industry cope with a lack of computing capacity and linguistic data, both of which are barriers to the development of NLP technology, Baidu said in a press release on Tuesday.

Data scientists from 11 local universities and enterprises contributed to Qian Yan’s first phase, which includes 20 open source Chinese data sets and covers seven major machine learning tasks such as reading comprehension and open-domain dialogue systems used in chatbots, the press release said.

“In the future, we hope that more data scientists can participate in Qian Yan, jointly promote the progress of Chinese information processing technology, and build a worldwide Chinese information processing influence,” said Wu Hua, chairman of the Baidu Technical Committee. She added that the company aimed to construct at least 100 Chinese NLP data sets that can carry out more than 20 tasks within the next three years.

The ambitious project comes as Baidu seeks to diversify its revenue mix, amid a shift in internet usage patterns that has chipped away at its dominance in the search engine industry. The company reported a 5 per cent drop in online marketing revenue year-on-year in 2019 to 78.1 billion yuan (US$11.22 billion), as it faced rising competition from self-contained, super-app ecosystems like Tencent Holdings’ WeChat as well as short-video platforms like Tencent-backed Kuaishou and ByteDance’s Douyin.