Data Mining

Into the data mine's depths

PUBLISHED : Sunday, 23 October, 2011, 12:00am
UPDATED : Sunday, 23 October, 2011, 12:00am


So you've lost money in the stock market lately? Join the club. Many people I know have lost between 10 per cent and 30 per cent of their portfolios since April. We did our research, tracked stock price movements, spotted trends and perused available information to try to predict which way the markets were heading. To no avail.

In the past six months, we've been buffeted by some of the greatest volatility in global financial history. During the week of August 4-11, for instance, the S&P 500 index recorded an average daily movement of 4.25 per cent (compared with an average of only slightly above 0.5 per cent since 1950).
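For readers who want to check such a figure themselves, here is a minimal sketch of one way an "average daily movement" might be computed: the mean of the absolute day-on-day percentage changes in closing levels. The function name and the closing levels are illustrative assumptions, not actual S&P 500 data, and the figure quoted above may have been calculated differently.

```python
def average_daily_move(closes):
    """Mean absolute day-on-day change, in per cent, for a list of closing levels."""
    if len(closes) < 2:
        raise ValueError("need at least two closing levels")
    moves = [
        abs(today - yesterday) / yesterday * 100
        for yesterday, today in zip(closes, closes[1:])
    ]
    return sum(moves) / len(moves)


# Made-up closing levels for illustration only (hypothetical, not real index data).
closes = [100.0, 95.0, 99.75, 94.7625]
print(round(average_daily_move(closes), 2))
```

Taking absolute values matters: a market that falls 5 per cent one day and rises 5 per cent the next has moved a great deal, even though it ends up close to where it started.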

A key contributor to this extreme volatility is the dominance of computer-driven high-frequency trading, which makes up more than 65 per cent of transactions in the stock market today. And key to such trading is something we all do: data mining. But while small investors like you and me do a primitive sort of data mining by Googling price-to-earnings ratios, high-frequency traders apply sophisticated mathematical techniques to sift through colossal amounts of information. Buying and selling billions of dollars of stock in nanoseconds, such traders drive the markets, leaving ordinary investors at the mercy of the tsunami they generate.

Data mining is the wave of the future in many other fields too. As its name implies, data mining entails extracting hidden predictive information from massive databases to identify patterns and detect relationships. This is useful to governments, corporations, just about anyone. For example, if a telecoms company can map relationships between the age, sex, income, hobbies and education levels of a large population of mobile-phone users, it can use that to predict the potential size of a new market.

Since antiquity, humans have made connections between natural phenomena, such as the changing of the seasons and the movement of planets, and activities in their daily lives, such as planting and harvesting crops or picking auspicious dates for important events. Observing phenomena, interpreting information and establishing relationships and patterns were early attempts at what we call data mining today. Now, with powerful computers and ultra-high-capacity storage devices, immense amounts of complex information are collated and analysed, and highly sophisticated applications are developed. Whenever you use an internet search engine, you are invoking its data-mining capabilities. For example, Googling my name, Tom Yam, will produce over 3.2 million results, albeit mostly spicy soup recipes and Thai restaurants. That is a lot of information mined from just two words.

Imagine, then, the capability of a vast complex of computer farms unceasingly monitoring and analysing all significant global events for high-frequency traders. This goes well beyond traditional analysis of stock price movements.

Companies with a strong consumer focus, in retail, financial services, communications and marketing, increasingly use data mining to reveal relationships among 'internal' factors they can control (e.g. price, product positioning, staff skills) and 'external' factors they can't control (e.g. economic indicators, competition, customer demographics). Data mining enables them to 'drill down' into transactional details such as products purchased, dollar value and time of purchase. As a rule, the larger the number of customers analysed, the more accurate the predictive capability of the models.

A more ominous application of data mining is in security and surveillance. In the United States, the Pentagon reportedly pays a private contractor to mine data on teenagers it can recruit into the military. The Homeland Security Department buys consumer information that it uses to screen people at borders and detect immigration fraud. Many governments are delving into the vast commercial market for consumer information, such as buying habits and financial records. They are tapping into data that would be difficult for governments to amass because most government agencies do not have direct transactions with individuals that would uncover their behaviour.

Strolling in the streets of London, Manhattan, or Shenzhen, you are constantly watched by ubiquitous surveillance cameras. Images from thousands of these cameras are fed into powerful computers and displayed in control centres. A face can be matched with its biometric description in the database and precisely located in a sea of humanity within minutes. In Hong Kong, the acid-thrower whose grainy image was captured by surveillance cameras in Causeway Bay was apprehended by using data-mining techniques.

Data mining raises concerns about the protection of personal information and privacy.

There was the recent controversy over Octopus selling the personal details of MTR passengers to marketing companies for use in product promotion. There is also a risk of the garbage-in, garbage-out syndrome: extracting information from massive amounts of data can throw up wrong connections between unrelated events.
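How easily wrong connections turn up can be shown with a small experiment. The sketch below is an illustration only, not a description of any real trading or marketing system: it generates purely random, unrelated series and then "mines" them for the most strongly correlated pair. The function names, the number of series, their length and the seed are all assumptions chosen for the example.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_link(n_series=100, n_points=10, seed=0):
    """Largest absolute correlation found among independent random series."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    # Every pair is unrelated by construction, so any "link" is pure chance.
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


print(strongest_chance_link())
```

With 100 short series there are 4,950 pairs to compare, and the best of them will look impressively correlated even though nothing connects them. The more data one sifts, the more such coincidences there are to find.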

The vast amounts of data and the complexity of the mathematics, all buried in the bowels of massive computers, give new meaning to a popular saying: 'To err is human. To really screw up, you need a computer'.

Tom Yam, a Hong Kong-based management consultant, holds a doctorate in electrical engineering and an MBA from the Wharton School of the University of Pennsylvania. He has worked at AT&T, Ernst & Young and IBM.