The new DAMO Academy model is an enhancement over previous vision-LLMs as it can tackle two challenges in video understanding. Photo: Shutterstock

Alibaba reveals progress with large language model research as Chinese Big Tech firms continue to push for ChatGPT rival

  • A group of researchers from DAMO Academy have unveiled a new audiovisual language model called Video-LLaMA
  • The new DAMO Academy model is an enhancement over previous vision-LLMs as it can tackle two challenges in video understanding

Alibaba Group Holding’s in-house research unit is making progress with its own large language models (LLMs), as Chinese Big Tech companies continue to pile into the artificial intelligence (AI) space in an attempt to come up with a rival to OpenAI’s ChatGPT.

A group of researchers from DAMO Academy unveiled a new audiovisual language model called Video-LLaMA, which enables the system to understand both visual and auditory content in videos, in a research paper published last week on arXiv, an online repository for scientific papers.

The code has also been open-sourced by the researchers on the online developer community GitHub. Alibaba owns the South China Morning Post.

LLMs, which are trained through machine learning, underpin AI-powered chatbots like ChatGPT, allowing them to answer sophisticated queries and generate detailed writing, code and other content.


The new DAMO Academy model is an enhancement over previous vision-LLMs, as it can tackle two challenges in video understanding: capturing the temporal changes in visual scenes and integrating audiovisual signals, according to the three researchers, Zhang Hang, Li Xin and Bing Lidong.

In a case demonstrated by the researchers, when given a video of a man playing a saxophone on stage, the model was able to describe in text both the background sound of applause and the visual content of the video. By comparison, previous models such as MiniGPT-4 and LLaVA mainly focused on static image comprehension, the researchers said.

Meanwhile, the researchers noted that the model is still “an early-stage prototype” with several limitations, including difficulty handling long videos such as films and TV shows.

The move comes as a part of broader efforts by Alibaba, which is in the midst of its largest-ever corporate restructuring, to double down on its investment in the development and application of LLMs.

Alibaba’s cloud unit in April unveiled its own alternative to ChatGPT, Tongyi Qianwen, which is based on DAMO’s LLMs, making Alibaba one of the earliest Chinese companies to join the ChatGPT bandwagon, along with search engine giant Baidu, which launched its Ernie Bot in March. The service had received more than 200,000 applications for beta testing from corporate clients, Alibaba chairman and CEO Daniel Zhang Yong said in a conference call with analysts last month.

DAMO first introduced a large AI model called Tongyi last September, when deputy head Zhou Jingren unveiled it at the World AI Conference in Shanghai. He described it as a multimodal pre-trained language model that is able to process different types of inputs including text, images, audio, and video.

Alibaba has started to work with partners to develop industry-specific AI models, Zhang said. For instance, it is planning to launch cloud products and enterprise solutions based on its AI model, and integrate AI capabilities into various products, including its workplace collaboration tool DingTalk.
