UK spy agency played catch-up to 'master the internet'
It appeared to Britain's spy agency in 2009 that technology was leaving it behind and it needed to get moving on 'mastering the internet'
The memo was finished at 9.32am on Tuesday, May 19, 2009, and was written jointly by the director in charge of the top-secret Mastering the Internet (MTI) project at Britain's Government Communications Headquarters (GCHQ) and a senior member of GCHQ's cyber-defence team.
The internal e-mail set out a "prioritisation and tasking initiative" for another senior member of staff, describing the problems facing the British spy agency at a time when technology seemed to be racing ahead of the intelligence community.
The authors wanted new ideas - and fast.
"It is becoming increasingly difficult for GCHQ to acquire the rich sources of traffic needed to enable our support to partners within HMG [Her Majesty's government], the armed forces, and overseas," they wrote.
"The rapid development of different technologies, types of traffic, service providers and networks, and the growth in sheer volumes that accompany particularly the expansion and use of the internet, present an unprecedented challenge to the success of GCHQ's mission."
The memo continued: "We would like you to lead a small team to fully define this shortfall in tasking capability [and] identify all the necessary changes needed to rectify it."
According to papers leaked by the National Security Agency (NSA) whistle-blower Edward Snowden, GCHQ's overarching project to "master the internet" was under way. But one of its core programmes, Tempora, was still being tested and developed, and the agency's principal customers, namely the government, MI5 and MI6 (Britain's domestic and foreign intelligence agencies), remained hungry for more and better-quality information.
The MTI programme seems to have begun in early 2007. A year later, work started on an experimental research project run out of GCHQ's outpost at Bude in Cornwall, southwest England.
Its aim was to establish the practical uses of an "internet buffer", the first of which was referred to as CPC, or Cheltenham Processing Centre.
By March 2010, analysts from the NSA had been allowed some preliminary access to the project, which, at the time, appears to have been codenamed TINT, and was being referred to in official documents as a "joint GCHQ/NSA research initiative".
TINT, the documents explain, "uniquely allows retrospective analysis for attribution" - a storage system of sorts, which allowed analysts to capture traffic on the internet and then review it.
Historically, the spy agencies have intercepted international communications by focusing on microwave towers and satellites. The NSA's intercept station at Menwith Hill in North Yorkshire played a leading role in this.
The papers make clear that at some point, though exactly when is not known, GCHQ began to plug into the cables that carry internet traffic into and out of the country, and to gather material through a process repeatedly referred to as SSE. This is thought to stand for special source exploitation.
The capability, which was authorised by legal warrants, gave GCHQ access to a vast amount of raw information, and the TINT programme offered a potential way to store it.
A year after the plaintive e-mail asking for new ideas, GCHQ reported significant progress on a number of fronts.
One document noted that there were two billion internet users worldwide, that Facebook had more than 400 million regular users and that mobile internet traffic had grown by 600 per cent the year before.
"But we are starting to 'master' the internet," the author claimed. "And our current capability is quite impressive."
The report said the UK now had the "biggest internet access in Five Eyes" - the group of intelligence organisations from the US, the UK, Canada, New Zealand and Australia. "We are in the golden age," the report added.
However, the paper warned that US internet service providers were moving to Malaysia and India, and the NSA was "buying up real estate in these places". The author suggested Britain should do the same and play the "US at [their] own game - and buy facilities overseas".
GCHQ's mid-year 2010-11 review revealed another startling fact about Mastering the Internet. "MTI delivered the next big step in the access, processing and storage journey, hitting a new high of over 39 billion events in a 24-hour period, dramatically increasing our capability to produce unique intelligence from our targets' use of the internet and made major contributions to recent operations."
This suggests GCHQ had managed to record 39 billion separate pieces of information in a single day.
The NSA remarked on the success of GCHQ in a "Joint Collaboration Activity" report in February 2011.
In a startling admission, it said Cheltenham now "produces larger amounts of metadata collection than the NSA", metadata being the bare details of calls made and messages sent rather than the content within them.
By May last year, GCHQ reported that it now had "internet buffering" capability running from its headquarters in Cheltenham, Bude, and a location abroad. The programme was now capable of collecting, a memo explained with excited understatement, "a lot of data!"
What British spy agency GCHQ does to monitor the web
What is an internet buffer?
British eavesdropping agency GCHQ, helped by the National Security Agency in the US, intercepts and collects a large fraction of web traffic coming into and out of Britain. This is then filtered to get rid of uninteresting content, and what remains is stored for a period of time - three days for content and 30 days for metadata. The result is that GCHQ and NSA analysts have a vast pool of material to look back on if they are not watching a particular person in real time.
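The buffer described above is essentially a rolling store with two retention windows. The Python sketch below illustrates that idea under stated assumptions: only the three-day and 30-day periods come from the documents; the class and field names are hypothetical, the filtering step is left out, and nothing here describes how GCHQ's systems are actually built. It assumes Python 3.9 or later.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta

# Retention periods as reported; everything else is a hypothetical illustration.
CONTENT_RETENTION = timedelta(days=3)
METADATA_RETENTION = timedelta(days=30)


@dataclass
class Record:
    captured_at: datetime
    payload: str


class RollingBuffer:
    """Hypothetical two-tier buffer: each record is evicted once it outlives its tier's window."""

    def __init__(self) -> None:
        self.content: deque[Record] = deque()
        self.metadata: deque[Record] = deque()

    def add(self, captured_at: datetime, metadata: str, content: str) -> None:
        # Assumes records arrive in capture-time order, so each deque stays sorted by age.
        self.metadata.append(Record(captured_at, metadata))
        self.content.append(Record(captured_at, content))

    def expire(self, now: datetime) -> None:
        # Pop the oldest records from the front of each deque until the rest are inside the window.
        for store, window in ((self.content, CONTENT_RETENTION),
                              (self.metadata, METADATA_RETENTION)):
            while store and now - store[0].captured_at > window:
                store.popleft()


start = datetime(2012, 5, 1)
buf = RollingBuffer()
buf.add(start, "from=a to=b", "message body")
buf.expire(start + timedelta(days=4))
print(len(buf.content), len(buf.metadata))  # prints: 0 1 (content gone, metadata kept)
```

Four days after capture the content has already been discarded while the metadata remains searchable, which is why an analyst looking back weeks later would typically find only who-contacted-whom, not what was said.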
How is it done?
GCHQ appears to have intercepts placed on most of the fibre-optic communications cables in and out of Britain. This seems to involve some degree of co-operation from companies operating either the cables or the stations at which they come into the country. These agreements, and the exact identities of the companies that have signed up, are seen as extremely sensitive and are classified as top secret.
How does it operate?
The system seems to operate by letting GCHQ survey web traffic flowing through different cables at regular intervals, automatically detecting which are the most interesting and harvesting the information from those. The documents suggest GCHQ is able to survey about 1,500 of the 1,600 or so high-capacity cables in and out of Britain at any one time, and aspires to harvest information from 400 or so at once - a quarter of all traffic. As of last year, the agency had gone halfway, attaching probes to 200 fibre-optic cables, each with a capacity of 10 gigabits per second. In theory, that gave GCHQ access to a flow of 21.6 petabytes in a day, equivalent to 192 times the British Library's entire book collection. GCHQ documents say efforts are made to filter out Britain-to-Britain communication automatically, but it is unclear how this is defined, or whether it is even possible in many cases.
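The 21.6-petabyte figure follows from simple arithmetic: 200 links at 10 gigabits per second carry 2 terabits per second, and over the 86,400 seconds in a day that comes to 21.6 × 10^15 bytes. The short sketch below reproduces the calculation. It assumes every link runs at its full rated capacity around the clock and uses decimal units (1 petabyte = 10^15 bytes); the documents do not spell out either assumption.

```python
# Figures from the documents: 200 probed cables, each rated at 10 gigabits per second.
CABLES = 200
GBIT_PER_SECOND_PER_CABLE = 10
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_capacity_petabytes(cables: int, gbit_per_second: float) -> float:
    """Theoretical maximum data per day, assuming every link is saturated all day."""
    bits_per_day = cables * gbit_per_second * 1e9 * SECONDS_PER_DAY
    bytes_per_day = bits_per_day / 8  # 8 bits per byte
    return bytes_per_day / 1e15       # decimal petabytes


print(daily_capacity_petabytes(CABLES, GBIT_PER_SECOND_PER_CABLE))  # prints 21.6
```

Reaching the stated aspiration of 400 cables would, on the same assumptions, double the theoretical flow to 43.2 petabytes a day. Real links rarely run at full capacity, so these are upper bounds rather than volumes actually captured.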
What does this let GCHQ do?
GCHQ and NSA analysts, who share direct access to the system, are repeatedly told that they need a justification to look for information on targets in the system and cannot simply go on fishing trips - under the Human Rights Act, searches must be necessary and proportionate. However, when they do search the data, they have lots of specialist tools that let them obtain a huge amount of information from it: e-mail addresses, IP addresses, who contacts whom, and what search terms they use.
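The phrase "who contacts whom" hints at how much can be read from metadata alone. As a rough, hypothetical illustration, and not a description of any actual GCHQ or NSA tool, a handful of sender and recipient pairs, with no message content at all, is enough to build a map of who is connected to whom:

```python
from collections import defaultdict


def contact_graph(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected 'who contacts whom' map from (sender, recipient) pairs.

    Hypothetical illustration: only addressing metadata is used, never message content.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for sender, recipient in records:
        # Record the link in both directions so either party's contacts can be looked up.
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return dict(graph)


graph = contact_graph([("alice", "bob"), ("bob", "carol"), ("alice", "bob")])
print(sorted(graph["bob"]))  # prints ['alice', 'carol']
```

Repeated messages between the same pair add no new links here. A real system would presumably also count frequency and timing, which reveals still more.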
What is the difference between content and metadata?
A simple analogy is that content is a letter, and metadata is the envelope. However, internet metadata can reveal much more than that: where you are, what you are searching for, who you are messaging and more. One of the documents sets out how GCHQ defines metadata, noting that "we lean on legal and policy interpretations that are not always intuitive". It notes that in an e-mail, the "to", "from" and "cc" fields are metadata, but the subject line is content. It also sets out how even passwords can be seen as metadata.
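The e-mail rule described above can be expressed as a simple split over a message's headers. The sketch below uses Python's standard `email` package and classifies only the fields the document mentions: "to", "from" and "cc" as metadata, the subject line as content. Treating the body as content is an assumption consistent with the letter-and-envelope analogy. Other headers (Date, Received and so on) are deliberately left out, since how they are classified is not stated.

```python
from email import message_from_string

# Classification taken from the GCHQ definition as reported; the body-as-content rule is assumed.
METADATA_FIELDS = ("To", "From", "Cc")
CONTENT_FIELDS = ("Subject",)


def split_email(raw: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split a raw RFC 5322 message into (metadata, content) under that definition."""
    msg = message_from_string(raw)
    metadata = {name: msg[name] for name in METADATA_FIELDS if msg[name] is not None}
    content = {name: msg[name] for name in CONTENT_FIELDS if msg[name] is not None}
    # Only single-part bodies are handled in this sketch; multipart messages are skipped.
    if not msg.is_multipart():
        content["Body"] = msg.get_payload()
    return metadata, content


raw = ("From: alice@example.com\nTo: bob@example.com\nCc: carol@example.com\n"
       "Subject: Lunch\n\nSee you at noon.\n")
meta, content = split_email(raw)
print(meta["From"], "->", meta["To"])  # prints alice@example.com -> bob@example.com
```

Under this definition, the addresses of everyone on the message fall on the metadata side, even though the subject line, which often summarises the message, counts as content.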