Baidu creates ‘world’s largest’ Chinese natural language processing database
- Baidu has been seeking to diversify its revenue mix, as changing internet usage patterns chip away at its search engine dominance
- On Tuesday it launched Qian Yan, which it says is the largest Chinese database for natural language processing, alongside other AI-related products

NLP is a branch of AI involved in making computers understand the way humans naturally talk and type online, turning such information into structured data for further analysis.
The project, called Qian Yan – or “thousand words” in Chinese – is a collaboration with the industry group China Computer Federation. It is meant to help the industry cope with a lack of computing capacity and linguistic data, both of which are barriers to the development of NLP technology, Baidu said in a press release on Tuesday.
Data scientists from 11 local universities and enterprises contributed to Qian Yan’s first phase, which includes 20 open source Chinese data sets and covers seven major machine learning tasks such as reading comprehension and open-domain dialogue systems used in chatbots, the press release said.
“In the future, we hope that more data scientists can participate in Qian Yan, jointly promote the progress of Chinese information processing technology, and build a worldwide Chinese information processing influence,” said Wu Hua, chairman of the Baidu Technical Committee. She added that the company aimed to construct at least 100 Chinese NLP data sets that can carry out more than 20 tasks within the next three years.