
Rise of the robot journalist

PUBLISHED : Friday, 20 July, 2012, 12:00am
UPDATED : Friday, 20 July, 2012, 12:00am
 

Next week, after the London Olympics have begun, you will read stories about athletes pushing human physiology to new limits. What you may not realise, however, is that many of these stories may not be written by humans.

Until now, robots that can write and reason have belonged to the realm of science fiction. The ability to master complex language, to process information and express it clearly with meaningful ideas, has been deemed a trait unique to mankind.

However, at the offices of Narrative Science, a Chicago-based company, computers are churning out flawless sports and financial reports for subscribers including the websites of Forbes magazine and the Big Ten Network, a broadcaster dedicated to university American football.

These robot journalists take no holidays, miss no deadlines and produce clean, well-researched copy for about HK$90 an article. On top of that, the algorithms that power these machines are designed to catch errors and learn from their mistakes.

Here is an excerpt of an American football report by Narrative Science published on the Big Ten Network website this year: 'Wisconsin jumped out to an early lead and never looked back in a 51-17 win over UNLV on Thursday at Camp Randall Stadium. The Badgers scored 20 points in the first quarter on a Russell Wilson touchdown pass and a James White touchdown run. Wisconsin's offence dominated the Rebels' defence ...'

Not something to impress the judges of the Pulitzer Prizes for journalism, but it is perfectly readable. Few can tell the difference between this and many other match reports.

Furthermore, Narrative Science finished the report within seconds of the final whistle - even before the players had left the field.

While automated journalism is still a novel concept, it is becoming the rage in America. Last year, dozens of websites published millions of news articles that had been written entirely by computer. Narrative Science, a start-up formed in 2010 with the trademark slogan 'we transform data into stories and insight,' already has more than 40 clients and the list is growing. And there are several similar competitors in the US.

Before long it may arrive in this part of the world, according to Katy De Leon, Narrative Science's marketing director. She told the South China Morning Post that, while her company did not have any Asian clients yet, it had plans to expand outside the US. 'Our long-term plans do include expansion into multi-lingual capabilities,' De Leon said.

Hong Kong University of Science and Technology Associate Professor Wu Dekai - the only Chinese person honoured as a founding fellow of the Association for Computational Linguistics - agrees that the technology is now mature enough for large-scale commercial applications.

'We are getting pretty good at [teaching computers] to mine a large collection of data and automatically discover interesting patterns,' said Wu, a pioneer in automated translation between English and Chinese and one of the world's leading experts in computational linguistics.

'Generating news stories in plain language following a certain template is not difficult for computers. There is no reason why we can't do it in Chinese as well.'

In an interview with Wired magazine in April, Narrative Science co-founder Kris Hammond predicted that 'more than 90 per cent' of the news in the US would be written by computer programmes in 15 years.

Scott Frederick, chief operating officer of rival company Automated Insights, told Agence France-Presse this month that he believed every American media outlet would need 'some automation strategy' in a year or two.

That may sound boastful but Wu agrees that automated journalism is not just a passing fad.

Even for the most knowledge-intensive work, a lot of effort and time are spent on repetitive drudgery, he says. 'Most lawyers' writing involves simple copy-and-paste. Similarly, most day-to-day news reporting is also pretty mundane,' Wu said. Technology can relieve reporters from these 'mechanical' aspects of journalism and allow them to focus on important stories.

To make the programme work, software engineers build complicated computational linguistic algorithms with the help of journalists.

The system first needs to gather large quantities of high-quality data. By studying the data, the algorithm learns to identify 'turning points', the most dramatic moments in a sports game or a business transaction, and to highlight them. Trained journalists are hired to build different sets of templates and to coach the computer to identify 'story angles' in the raw material.
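
To make the idea of a 'turning point' concrete, here is a minimal sketch of one simple rule. It takes the play after which the eventual winner stayed ahead for the rest of the game, which is the moment a match report would describe as 'never looked back'. This is an illustrative assumption, not Narrative Science's actual method. The `ScoringPlay` type, the `find_turning_point` function and the game data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoringPlay:
    """One scoring event (hypothetical data model for illustration)."""
    quarter: int
    team: str
    points: int


def find_turning_point(plays: List[ScoringPlay], home: str, away: str) -> Optional[ScoringPlay]:
    """Return the play after which the eventual winner led for the rest of the game.

    Returns None for an empty game or a tie, where there is no decisive moment.
    """
    if not plays:
        return None
    score = {home: 0, away: 0}
    margins = []  # home team's margin after each play
    for play in plays:
        score[play.team] += play.points
        margins.append(score[home] - score[away])

    final = margins[-1]
    if final == 0:
        return None
    winner_sign = 1 if final > 0 else -1

    # Walk backwards while the winner is strictly ahead; the earliest such
    # play is where the winner went ahead for good.
    turning = len(plays) - 1
    for i in range(len(plays) - 1, -1, -1):
        if margins[i] * winner_sign > 0:
            turning = i
        else:
            break
    return plays[turning]


# Hypothetical comeback: the away side leads early, the home side wins late.
game = [
    ScoringPlay(1, "Away", 7),
    ScoringPlay(2, "Home", 3),
    ScoringPlay(2, "Home", 7),
    ScoringPlay(3, "Away", 3),
    ScoringPlay(4, "Home", 7),
]
print(find_turning_point(game, "Home", "Away"))
# prints ScoringPlay(quarter=4, team='Home', points=7)
```

In this game the home side leads in the second quarter but is pulled level in the third. So the rule picks the fourth-quarter score, not the first time the home side went in front.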

The secret is to identify the 'surprise' factors in an event, Wu explains. What makes something newsworthy is often a development that deviates from people's normal expectations. A listed company's share price rising or falling by one or two per cent may not be news, particularly if the fluctuation is in line with general market movements. But if a company's share price suddenly shoots up by 10 per cent, that signals that something extraordinary has happened.
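
Wu's share-price example can be expressed as a simple rule. Measure how far a stock's move departs from the market's move over the same period, and treat it as a surprise only when that gap is large. The sketch below assumes a hypothetical 5 per cent threshold, and the function names are illustrative. Real systems may measure 'normal expectations' quite differently.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


def share_move_surprise(stock_old: float, stock_new: float,
                        index_old: float, index_new: float,
                        threshold: float = 5.0):
    """Flag a share move as surprising if it beats or trails the market by >= threshold per cent.

    The 5 per cent default is a hypothetical cut-off, chosen for illustration.
    Returns (is_surprising, excess_move_in_per_cent).
    """
    excess = pct_change(stock_old, stock_new) - pct_change(index_old, index_new)
    return abs(excess) >= threshold, excess


# A 1.5 per cent rise on a day the market rose 1.2 per cent: in line, not news.
print(share_move_surprise(100.0, 101.5, 1000.0, 1012.0))
# A 10 per cent jump on a nearly flat market: a surprise worth a story.
print(share_move_surprise(100.0, 110.0, 1000.0, 1005.0))
```

Comparing against the market, rather than looking at the raw move alone, keeps a broad sell-off from being reported as news about every single company.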

'When we build templates for programmes like these, we teach them to look out for certain types of surprises,' Wu said. 'We can build hundreds or even thousands of templates for different types of surprises - from sudden share movements to merger-and-acquisition deals. The system, equipped with these templates, can see through large sets of data and identify the right types of surprises.'
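
The template approach Wu describes can be sketched as a lookup table. Each type of surprise maps to a sentence pattern, and a classifier decides which pattern, if any, a piece of data deserves. The three templates, the event fields (`kind`, `move`, `market_move` and so on), the 5 per cent threshold and the company names are all hypothetical. A commercial system would hold far more templates and richer rules.

```python
from typing import Dict, Optional

# Hypothetical templates, one per type of surprise.
TEMPLATES: Dict[str, str] = {
    "share_jump": "{company} shares jumped {abs_move:.1f} per cent, well ahead of the wider market.",
    "share_slump": "{company} shares fell {abs_move:.1f} per cent, far worse than the wider market.",
    "acquisition": "{company} agreed to buy {target} for {price}.",
}


def classify(event: dict, threshold: float = 5.0) -> Optional[str]:
    """Map a raw data record to a surprise type, or None if nothing is unexpected."""
    if event.get("kind") == "acquisition":
        return "acquisition"
    excess = event["move"] - event["market_move"]  # per cent, relative to the market
    if excess >= threshold:
        return "share_jump"
    if excess <= -threshold:
        return "share_slump"
    return None


def write_lead(event: dict) -> Optional[str]:
    """Fill the template matching the event's surprise type; None means no story."""
    surprise = classify(event)
    if surprise is None:
        return None
    fields = dict(event)
    if "move" in event:
        fields["abs_move"] = abs(event["move"])
    return TEMPLATES[surprise].format(**fields)


print(write_lead({"company": "Acme", "move": 10.0, "market_move": 0.5}))
print(write_lead({"company": "Acme", "move": 1.5, "market_move": 1.2}))
print(write_lead({"kind": "acquisition", "company": "Acme",
                  "target": "Widget Co", "price": "US$2 billion"}))
```

The second call returns `None`. A move that tracks the market produces no story at all, which mirrors Wu's point that templates only fire on the surprises they were built to look for.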

Most news reports, particularly on subjects like finance or a recap of a sports match, tend to have a fixed structure. News agencies like Bloomberg have developed a set of simple rules to teach their new recruits to put together a story quickly when facing deadlines. By using formulas like these, designers are able to create a framework for computers to produce articles.

The software also allows clients to customise the tone and angle of a story. 'You can get anything, from something that sounds like a breathless financial reporter screaming from a trading floor to a dry sell-side researcher pedantically walking you through it,' Jonathan Morris, the COO of analysis firm Data Explorers, which is using Narrative Science's technology to run a securities newswire, told Wired.

The algorithm can also mimic a certain writing style by systematically studying a particular writer's work. By going through the writer's published articles, the algorithm learns the writer's favourite expressions and preferred way of structuring an argument. While the computer may never have the insight of the Post's Tom Holland or the waspish wit of our Alex Lo, it can produce copy eerily reminiscent of their styles.
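
A very simple way to find a writer's 'favourite expressions' is to count which short word sequences (n-grams) recur most often across their articles. The sketch below shows only that counting step, using Python's standard `re` and `collections.Counter`. The sample sentences are invented. How commercial systems actually model style is not described in the article, so treat this as one illustrative technique among many.

```python
import re
from collections import Counter
from typing import List, Tuple


def favourite_phrases(articles: List[str], n: int = 2, top: int = 3) -> List[Tuple[str, int]]:
    """Return the writer's most frequent n-word phrases across their articles."""
    counts: Counter = Counter()
    for text in articles:
        words = re.findall(r"[a-z']+", text.lower())
        # Count every run of n consecutive words (an n-gram) in this article.
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(" ".join(gram), count) for gram, count in counts.most_common(top)]


# Invented sample of a columnist's copy.
sample = [
    "To be sure, the market is nervous.",
    "The market, to be sure, has rallied.",
]
print(favourite_phrases(sample))
```

A generator could then favour these phrases when filling its templates. Capturing how a writer structures an argument would need far more than word counts.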

The rapid development of such technology gives Narrative Science's Hammond the confidence to aim for glory. In 20 years' time, he believes, there will be no areas in which the company does not write stories. Soon, Narrative Science will offer a news service to cover how the US presidential election is being reflected in social media.

When asked by Wired how long it would take for his robot to win a Pulitzer, Hammond, a professor of computer science and journalism at Northwestern University, unblinkingly answered: 'Five years'.

Technology like this may be a boon for media companies looking to cut costs. But at a time when the future of traditional media is hanging by a thread, the last thing journalists want is to compete with machines for their jobs.

It is not just journalists who are concerned. The use of automated journalism has given rise to a debate about its ethics, after a scandal rocked the American media industry and aroused much indignation about false bylines and plagiarism. News provider Journatic - a partner company of the Chicago Tribune - had been found to have used a combination of human editors in the United States and Philippines as well as computer algorithms to generate 'local' news under 'assumed bylines' - an act that violates ethical policies for the dailies, according to AFP.

Journatic clients include big organisations like the San Francisco Chronicle, Houston Chronicle, Chicago Sun-Times as well as the Chicago Tribune. Apart from using fake bylines, Journatic was also accused of plagiarising and fabricating some information. Journatic editorial director Mike Fourcher subsequently resigned and its CEO, Brad Timpone, apologised. But the anger continues. Many journalists and academics see this as an example of business sacrificing high-standard news reporting for the sake of money.

Questions have also been raised about how automated journalism will get good quality data in the first place. For financial and sports stories, there is lots of highly structured, reliable data readily available. But for human-interest stories, data and numbers are less important and more difficult to verify.

That does not mean the system is useless in these areas. The algorithms can scan hundreds of thousands of tweets and Facebook messages at lightning speed and come up with angles that would take a human days of searching. They can even pick out the most popular quotes from tweets to give a story a 'human touch'.
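
Picking the 'most popular quotes' can be as simple as filtering posts that mention a topic and ranking them by how often they were shared. In the sketch below, the `Post` type, the share counts used as a popularity measure and the sample posts are hypothetical. Real social media data and ranking signals would be considerably messier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    """A social media post (hypothetical fields for illustration)."""
    author: str
    text: str
    shares: int


def most_popular_quotes(posts: List[Post], keyword: str, top: int = 2) -> List[Post]:
    """Return the most-shared posts mentioning keyword, as candidate quotes for a story."""
    needle = keyword.lower()
    matching = [p for p in posts if needle in p.text.lower()]
    return sorted(matching, key=lambda p: p.shares, reverse=True)[:top]


# Invented posts about a hypothetical Olympic relay race.
posts = [
    Post("@fan_a", "What a swim! Gold for the relay team!", 120),
    Post("@fan_b", "Traffic near the stadium is terrible", 300),
    Post("@fan_c", "Still can't believe that relay finish", 950),
    Post("@fan_d", "Relay team, you made the whole country proud", 410),
]
for post in most_popular_quotes(posts, "relay"):
    print(f'{post.author}: "{post.text}" ({post.shares} shares)')
```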

Many bloggers also openly question whether the widespread use of automated journalism will lead to rampant 'personalised advertorials' in cyberspace. Big corporates sitting on piles of valuable personal data can easily use such programmes to produce tailor-made advertorials.

Even worse, authoritarian governments could use the technology to drown out opposition voices.

On the mainland, the authorities have already employed hundreds of thousands of ghost writers to mass-produce 'commentaries' in important forums to drum up support for the government.

'They [censors] are already watching your every move online, monitoring your every word. They have access to every piece of information,' said a Shenzhen-based apps developer who refused to be named. 'If the mainland government also masters this technology, forums and tweeters will soon be flooded with computer-generated wumao [pro-government] messages.' Wu mao, or 50 mainland cents, is said to be the fee the government pays for each favourable comment posted online by its army of up to 300,000 hirelings.

Wu, however, said it would take decades, if ever, before computer algorithms could really compete with humans, and algorithms would supplement, not supplant, professional reporters. 'Algorithms can have some superficial understanding and are good at rapidly processing large amounts of data,' he said. 'But they are not able to go deep. For true translation or news reporting, you have to have a deep knowledge of the real world and the context to understand all the subtleties, not just mechanically putting things together.

'A common complaint of journalists is that tedious run-of-the-mill reportage stifles their creativity. We can now use technology to alleviate that drudgery. This is where this kind of technology will be useful.'

In the end, journalism, just like language, is not about vocabulary or data. It is about the subtle and intricate relations between people and the world they inhabit.

'The algorithm can only identify surprises that you design it to look for,' he said. 'But the best surprises are those you [least] expect. ... Those are the really big stories. And those are the things that this automated journalism cannot catch unless we solve the entire AI [artificial intelligence] problem.'

So does he, too, believe computer algorithms can win a Pulitzer in five years' time? 'I do think we are making progress on that,' Wu said. 'But the only way to solve this problem is to build a model that can learn to solve problems as human beings do. Is this a five-year problem? No. This is one of the open grand challenges for scientists - just like the origins of the universe.

'To understand what a surprise is requires an understanding of how the human mind works. Until we really build a machine to understand our mind, we cannot build a machine that knows what is truly a journalistic surprise.'

1m: the number of robots Foxconn, the electronics manufacturer, expects to have in less than three years; its human workforce is 1.2m.

In spite of an expected dip in profit, most analysts are positive about VF Corporation (VFC) before it reports its second-quarter earnings on Thursday, July 19, 2012.

Analysts are projecting V.F to come in with earnings of 95 cents per share, down 15.2% from a year ago when it reported earnings of $1.12 per share.

The consensus estimate hasn't changed over the past month, but it's down from three months ago when it was $1.10. For the fiscal year, analysts are projecting earnings of $9.49 per share. Revenue is projected to eclipse the year-earlier total of $1.84 billion by 18.5%, finishing at $2.18 billion for the quarter. For the year, revenue is projected to come in at $11.07 billion.

Intel Corp., the world's largest chipmaker, said Tuesday that the weak global economy is slowing its growth, and revenue for the current quarter is likely to come in below Wall Street forecasts.

Intel's second-quarter net income was $2.83 billion, or 54 cents per share. That was down 4.3 percent from $2.95 billion, or 54 cents per share, a year earlier, as operating expenses rose faster than revenue. Intel has been buying back shares, accounting for the flat earnings per share.

Question: which of these stories was written by a computer?

Answer: the top one
