Imagine learning how to translate from Chinese to English by reading millions of sentences from Hong Kong's bilingual Legislative Council transcripts.
You look at the Chinese. You look at the English below.
Actually, you don't know either language: you are a cluster of 75 computers in Professor De Kai Wu's computational linguistics and musicology lab at the Hong Kong University of Science and Technology.
But as a machine, you are not limited to comparing the unfamiliar sentences two at a time, as a person would. Instead, you use statistics to relate huge heaps of data to one another all at once.
You notice that in thousands of instances, the English 'government building' appears in the same chunk of text as the Chinese phrase for it, so it is highly probable that these chunks mean the same thing.
You study these bilingual patterns, billions of them, cranking away at your algorithms.
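To make the idea of co-occurrence concrete, here is a minimal sketch in Python. It is not the lab's actual system: the four sentence pairs are invented for illustration, the Chinese is assumed to be already split into words, and the scoring rule (the Dice coefficient, which rewards words that show up together more often than apart) is just one simple choice among many. The function names `cooccurrence_scores` and `best_match` are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(pairs):
    """Score every (source word, target word) pair by how often the two
    words appear in the same aligned sentence pair (Dice coefficient)."""
    src_count = Counter()   # number of sentence pairs containing each source word
    tgt_count = Counter()   # number of sentence pairs containing each target word
    joint = Counter()       # number of sentence pairs containing both words
    for src, tgt in pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint.update(product(src_set, tgt_set))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_match(scores, src_word):
    """Return the target word with the highest score for src_word."""
    candidates = {t: v for (s, t), v in scores.items() if s == src_word}
    return max(candidates, key=candidates.get)


# Toy, hand-made parallel corpus (an assumption, not real transcript data).
pairs = [
    (["政府", "大樓"], ["government", "building"]),
    (["政府", "政策"], ["government", "policy"]),
    (["新", "大樓"], ["new", "building"]),
    (["新", "政策"], ["new", "policy"]),
]

scores = cooccurrence_scores(pairs)
print(best_match(scores, "政府"))
print(best_match(scores, "大樓"))
```

With only four sentence pairs the statistics are trivial, but the principle is the one described above: a word pair that keeps turning up in the same chunks scores higher than a pair that meets only by chance, and the signal sharpens as the heap of text grows.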