Web scraping is an automated task that copies data and information from web pages in bulk. Photo: Reuters

Nearly 235 million social media profiles from Instagram, TikTok and YouTube exposed in data leak

  • A database including email addresses and phone numbers of Instagram, TikTok and YouTube users was left exposed on the web, according to a report
  • Hong Kong-registered Social Data says it only uses publicly available data, but web scraping is strictly against most social media platforms’ terms of use

A Hong Kong-registered company that sells data on social media influencers left as many as 235 million user profiles scraped from Instagram, TikTok and YouTube exposed on the web, with no password or other authentication required to access them, according to a report by British research firm Comparitech.

Security researcher Bob Diachenko, who leads Comparitech’s cybersecurity research team, uncovered on August 1 three identical copies of a database that included names, contact information, images and follower statistics, Comparitech said in the report on Wednesday.

The data was from a company called Social Data, which helps businesses “find influencers and get in-depth insights into demographic and psychographic data of influencers and their audience throughout different types of social media on the web”, according to its website.

The vast majority of the profiles were scraped from Facebook-owned Instagram, and the largest data sets included two with data from more than 95 million Instagram profiles each. At least 42 million records from TikTok and nearly 4 million from Google-owned YouTube were also in the database, according to the Comparitech report, which added that about one in five records contained either a phone number or an email address.

The breach comes at a time when both Western and Chinese social media giants are coming under heavy scrutiny from governments over their data protection policies. Last year, Facebook agreed to pay a fine over the Cambridge Analytica scandal, in which millions of Facebook users’ personal data was harvested without their consent and used for political campaigns, including those related to the 2016 US presidential election and the UK’s referendum the same year on leaving the European Union.

TikTok has also been criticised by governments in countries including the US, India and France for its data collection practices. The short video app is now blocked in India and faces a similar ban in the US if it does not divest its American operations within 90 days, US President Donald Trump said last Friday.

Much of the data originated from another now-defunct firm called Deep Social, with which Social Data denies any connection, said Comparitech. It added in the report that Social Data’s chief technology officer acknowledged the exposure and the servers hosting the data were taken down about three hours later.

Web scraping is an automated task that copies data and information from web pages in bulk. It can be difficult to distinguish the automated bots from normal website visitors, so it is hard for social media platforms to prevent them from accessing user profiles, according to the research firm.

Comparitech’s report said Social Data has insisted it only scrapes what is publicly accessible, but the practice is against the terms of use for Facebook, Instagram, TikTok and YouTube.

Such scraping and storing of information is “vulnerable to spam marketing and phishing campaigns”, Comparitech warned in its report, adding that “even though the information is publicly available, the size and scope of an aggregated database makes it more vulnerable to mass attack than it would be in isolation”.

Facebook spokeswoman Stephanie Otway said that scraping people's information from Instagram is a clear violation of the company’s policies.

“We revoked Deep Social's access to our platform in June 2018 and sent a legal notice prohibiting any further data collection,” Otway said.

A TikTok representative said the short video app places the “highest priority on user privacy” and has anti-scraping policies in place.

“Our Terms of Service prohibit third parties from running automated scripts to collect information from our platform, including public profile information,” the representative said. “If we identify any such practices, we will take rapid action, including seeking legal redress.”

A YouTube representative said that the video platform’s terms of service explicitly forbid collecting data that can be used to identify a person.

“We are currently investigating the specific issue, and will send Social Data a cease and desist letter if the scraping activity is verified or otherwise we believe it necessary,” the representative said.

Social Data did not immediately respond to the Post’s request for comment. According to the Comparitech report, a spokesperson from Social Data told the research firm that “all of the data is available freely to anyone with internet access” and that “social networks themselves expose the data to outsiders – that is their business”.

“Those users who do not wish to provide information, make their accounts private,” the spokesperson reportedly said.

Michael Gazeley, managing director of Hong Kong cybersecurity firm Network Box, said that despite the size of the leak, he did not think that it was a particularly serious breach.

“I don't think it's really a breach of privacy, if the data is already public,” he said. “It's far more worrying when critical, private, data is leaked. For example: passwords, bank details, health records.”

He added: “It becomes more serious if it's possible to do data analysis, for say political manipulation, but the key data, in this case as far as I understand it, isn't critical private data.”

Nathaniel Rushforth, a US-qualified lawyer and cybersecurity specialist at Shanghai-based DaWo Law Firm, also said that scraping public profile information is a legal “grey area”, and whether it amounts to a real breach of privacy is “highly debatable”.

“Scraping itself is not necessarily illegal, and it probably doesn’t really breach anybody’s privacy in any significant way,” he said, although he added that some countries penalise offences such as misusing scraped data to inappropriately target people for financial gain or exploiting the data in anticompetitive ways.

“The only real way to prevent a determined data-gatherer from obtaining information on you is to limit what information you put online,” Rushforth said.

This article appeared in the South China Morning Post print edition as: 235m social media profiles exposed in huge data leak