
DeepSeek speculation swirls online over Chinese AI start-up’s much-anticipated R2 model

The latest speculation includes R2’s imminent launch and the new benchmarks that it set in terms of cost-efficiency and performance

DeepSeek’s R2 artificial intelligence model was said to have been trained on a server cluster of Huawei Technologies’ Ascend 910B chips, according to Chinese social-media posts. Photo: Shutterstock
Ben Jiang in Beijing
Chinese start-up DeepSeek is seeing rampant speculation on social media, raising anticipation for its next open-source artificial intelligence (AI) model, as the company continues to keep the industry guessing about its progress amid an intensifying US-China tech war.
The latest speculation about DeepSeek-R2 – the successor to the R1 reasoning model released in January – surfaced over the weekend and included claims of the product’s imminent launch and the new benchmarks it had purportedly set for cost-efficiency and performance.
That reflects heightened online interest in DeepSeek after it generated worldwide attention from late December 2024 to January by consecutively releasing two advanced open-source AI models, V3 and R1, which were built at a fraction of the cost and computing power that major tech companies typically require for large language model (LLM) projects. LLM refers to the technology underpinning generative AI services such as ChatGPT.
According to posts on Chinese stock-trading social-media platform Jiuyangongshe, R2 was said to have been developed with a so-called hybrid mixture-of-experts (MoE) architecture, with a total of 1.2 trillion parameters, making it 97.3 per cent cheaper to build than OpenAI’s GPT-4o.

MoE is a machine-learning approach that divides an AI model into separate sub-networks, or experts – each focused on a subset of the input data – that jointly perform a task. This is said to greatly reduce computation costs during pre-training and to speed up performance at inference time.
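The routing idea described above can be sketched in a few lines of toy code. This is purely illustrative of how a gate activates only one expert per input – the router rule and experts here are made-up assumptions, not DeepSeek's actual architecture.

```python
# Minimal mixture-of-experts (MoE) sketch: a gate routes each input to
# one expert sub-network, so only a fraction of the model's total
# parameters are exercised per input. Illustrative assumptions only.

def gate(x):
    # Hypothetical router: negative inputs go to expert 0, the rest to expert 1.
    return 0 if x < 0 else 1

EXPERTS = [
    lambda x: -x,      # expert 0: handles negative inputs
    lambda x: x * 2,   # expert 1: handles non-negative inputs
]

def moe_layer(x):
    # Only the routed expert runs, which is why compute cost per input
    # can stay low even as the combined parameter count of all experts grows.
    return EXPERTS[gate(x)](x)

print([moe_layer(x) for x in (-3, 4)])
```

In a real LLM the gate is itself a small learned network and typically routes each token to the top few of dozens of experts, but the cost-saving principle is the same: total parameters grow while per-token computation does not.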


In machine learning, parameters are the internal variables an AI system adjusts during training; they determine how input prompts are mapped to the desired output.
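A tiny worked example makes the definition concrete. Here a single parameter `w` is adjusted by gradient descent to fit made-up data – the data, learning rate, and model are illustrative assumptions, unrelated to any DeepSeek system.

```python
# Toy illustration of a "parameter": fitting w in y = w * x to data by
# gradient descent. Training nudges w until predictions match targets.
# All values here are made up for illustration.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # each target is 2 * input

w = 0.0    # the single trainable parameter, before training
lr = 0.05  # learning rate

for _ in range(200):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # training updates the parameter

print(round(w, 2))  # the learned parameter settles near 2.0
```

An LLM works on the same principle, only with billions (or, as claimed for R2, over a trillion) of such parameters adjusted simultaneously.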

Video: South Korea says DeepSeek sent data to ByteDance-owned servers in China without consent