TECHNOLOGY

World Bank study shows that many pdf computer files are never read

PUBLISHED : Monday, 12 May, 2014, 3:12am
UPDATED : Monday, 12 May, 2014, 4:16am

It's the standard file format for academic papers, political briefings and research notes. But a report by the World Bank suggests the venerable pdf is keeping valuable information buried in servers, unread.

The working paper, which was released as a pdf, examines which reports released by the organisation were widely read, or even read at all.

Of the 1,611 reports the study looked at, only 25 were downloaded more than 1,000 times between 2008 and 2012.

At the other end of the scale, more than 31 per cent of the reports the group looked at, 517 papers in all, were not downloaded a single time.
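
As a rough check on those proportions, the figures quoted above can be worked through in a few lines of Python. This is only an illustrative sketch: the variable and function names are assumptions made for the example, and the percentages are derived from the numbers in this article rather than taken from the working paper itself.

```python
# Figures as quoted in the article. The shares computed below are derived
# here for illustration and are not quoted from the World Bank paper.
total_reports = 1611
over_1000_downloads = 25
never_downloaded = 517


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


popular_share = share(over_1000_downloads, total_reports)
unread_share = share(never_downloaded, total_reports)

print(f"Downloaded more than 1,000 times: {popular_share}%")
print(f"Never downloaded: {unread_share}%")
```

On these numbers, about 1.6 per cent of the reports passed 1,000 downloads, while roughly 32.1 per cent were never downloaded, which is consistent with the "more than 31 per cent" figure.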

"It is, however, important to keep in mind that many policy reports were not intended to reach a large audience," note the report's authors, Doerte Doemeland and James Trevino, "but prepared to assess very specific technical questions or inform the design of lending operations."

The portable document format, or "pdf", was invented by Adobe in 1993 as a way of rendering documents with rich text formatting and inline images in a consistent way across multiple computing platforms and various software packages. But owing to the way such documents are rendered, pdfs often give up machine readability in favour of human readability. The basic format doesn't require that text be selectable or searchable.

Where the text cannot be selected or searched, it becomes impossible to mine the documents automatically for the data they contain, and so to create databases of new information that pull together disparate sources.

Despite efforts to create "pdf to html" converters, such tools still need human oversight to check for errors of interpretation.