Great things come from small samples

PUBLISHED : Sunday, 29 April, 2012, 12:00am
UPDATED : Sunday, 29 April, 2012, 12:00am


Opinion polling regularly hit the headlines in the run-up to the chief-executive election, and might have influenced the outcome. It's a science that should be better understood, because such 'sample surveys', as statisticians call them, are an important part of democracy.

Regular opinion polls are conducted among samples of Hongkongers. These polls typically survey 600 to 1,200 people, a tiny fraction in a population of seven million. It's no small irony that in an age of massive data mining and analysis by supercomputers for optimum results, such a miniature model can influence momentous decisions affecting entire populations.

'By a small sample we may judge the whole piece,' wrote Miguel de Cervantes (1547-1616), the Spanish author of Don Quixote. Sampling is an age-old method most familiar to us in the chef taking a spoonful of soup to determine its taste, or the brewer needing only a sip of beer to test its quality.

But how can a small sample represent the whole? How can we be sure of its reliability and accuracy? That depends on the statistical method that is used and the degree of confidence that can be estimated.

The most reliable method is random sampling, used by the University of Hong Kong's public opinion programme in conducting surveys prior to the chief-executive election. Random sampling, done properly, gives every Hongkonger of voting age an equal chance of being surveyed. Random selection can be done with various statistical models, the simplest being drawing names from a bin, as in a lottery. Another method is to divide Hong Kong into districts, and randomly draw respondents in each.
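The two selection methods just described can be sketched in a few lines of Python. This is only an illustration, not the procedure used by any polling organisation: the voter IDs, the district names and the choice to size each district's sample in proportion to its population are all assumptions made for the example.

```python
import random

def simple_random_sample(people, k, seed=None):
    """Draw k distinct people, each equally likely, like names from a bin."""
    rng = random.Random(seed)
    return rng.sample(people, k)

def stratified_sample(people_by_district, k_total, seed=None):
    """Draw respondents at random within each district, sized by population."""
    rng = random.Random(seed)
    population_size = sum(len(p) for p in people_by_district.values())
    sample = []
    for district, people in people_by_district.items():
        # Proportional allocation is an assumption; rounding can shift
        # the overall total by a few respondents.
        k = round(k_total * len(people) / population_size)
        sample.extend(rng.sample(people, k))
    return sample

# Hypothetical population of 10,000 voter IDs in two made-up districts.
voters = list(range(10_000))
districts = {"District A": voters[:6_000], "District B": voters[6_000:]}

lottery = simple_random_sample(voters, 800, seed=1)
by_district = stratified_sample(districts, 800, seed=1)
print(len(lottery), len(by_district))
```

In the lottery version every voter has the same chance of selection; in the district version that holds only as long as each district's share of the sample matches its share of the population.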

In random sampling, the reliability of the result is measured by the margin of error and the confidence level. Say, for example, 800 people are surveyed in Hong Kong about a politician and 50 per cent of them support him, with a margin of error of 3 percentage points at 95 per cent confidence. This means that if the survey were repeated by randomly choosing another 800 respondents in Hong Kong, 95 per cent of the time the result would fall within 3 percentage points of 50 per cent, i.e. between 47 per cent and 53 per cent. In other words, if the exercise is repeated the results will be similar - the essence of well-designed scientific testing.
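For readers who want to see where such a figure comes from, here is a minimal sketch using the standard textbook approximation for a simple random sample, in which the margin of error is z times the square root of p(1 - p)/n. The function name is invented for this example, and z = 1.96 is the usual multiplier for 95 per cent confidence. For 800 respondents and 50 per cent support the formula gives roughly 3.5 percentage points, in the same range as the rounded figure in the example; a margin of almost exactly 3 points needs about 1,067 respondents.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of n people (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (600, 800, 1067, 1200):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>4}: plus or minus {moe * 100:.1f} percentage points")
```

The size of the population barely enters the calculation: with seven million people and a sample of around a thousand, it is the number of respondents that drives the margin, which is why such small samples can work.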

So when you look at the results of a survey, look also for the margin of error and the confidence level - without them, a rise or fall of a fraction of a percentage point could be statistically meaningless. Take the opinion polls that track the popularity of government leaders. Suppose popularity rises from 50 per cent in January to 50.7 per cent in February, with a margin of error of 3 percentage points at 95 per cent confidence. The January result could lie anywhere between 47 per cent and 53 per cent, while the February result could lie between 47.7 per cent and 53.7 per cent. The two ranges overlap from 47.7 per cent to 53 per cent. Effectively, there is no change of statistical significance.
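The January and February comparison can be written out as a quick check. The helper names below are made up for illustration, and comparing ranges for overlap is only a rough screen, not a formal significance test.

```python
def poll_range(estimate, margin):
    """Lowest and highest plausible values, in per cent."""
    return (estimate - margin, estimate + margin)

def ranges_overlap(a, b):
    """True when the two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

january = poll_range(50.0, 3.0)
february = poll_range(50.7, 3.0)

shared_low = max(january[0], february[0])
shared_high = min(january[1], february[1])

print(ranges_overlap(january, february))
print(f"shared range: {shared_low:.1f} to {shared_high:.1f} per cent")
```

When the ranges overlap this heavily, a 0.7-point move is well within what sampling chance alone could produce.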

Opinion polls using statistical sampling are good for revealing an explicit preference, such as which electoral candidate is preferred. In surveys prior to the chief executive election, for example, Leung Chun-ying consistently outstripped his rivals by a significant margin.

However, this quantitative approach does not reveal the reasons behind the preference for Leung. To find out why people prefer him, pollsters would have to use a qualitative approach involving non-statistical methods of inquiry and analysis. These methods identify the attitudes, values and concerns of a sample of society, and the results are then extrapolated to reflect those of society at large.

In qualitative surveys, themes and categories emerge through analysis of data collected from focus-group discussions, interviews, observations, videotapes and case studies. The people surveyed are not randomly selected, but are chosen on the basis of criteria set by the researcher, such as sex, income, profession or age. As such, no margin of error or confidence interval can be calculated.

Qualitative approaches have the advantages of flexibility, in-depth analysis, and the potential to observe various aspects of a social situation. By asking follow-up questions on the spot, a qualitative researcher can gain a deeper understanding of the respondent's beliefs, attitudes or situation. As such, qualitative surveys may be better at studying human systems such as families, organisations and communities, or examining popular attitudes towards social issues, such as the right of abode for domestic workers, mainland women giving birth in Hong Kong, or Article 23 of the Basic Law.

Whether using qualitative or quantitative methods, a large sample size does not by itself mean a more accurate survey. There's a famous example from the US presidential election in 1936, when the magazine Literary Digest surveyed its two million readers and forecast that Democratic candidate Franklin D. Roosevelt would lose. The up-and-coming pollster George Gallup devised a statistically more valid but much smaller sample for his survey, and correctly predicted Roosevelt's victory.

The magazine's fatal error was that its readers were typically middle and upper class, who tended to vote Republican. They were over-represented in the magazine's survey. The Literary Digest ceased publication in 1937. George Gallup went on to become the most famous and successful pollster, with the Gallup poll.
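The same effect can be reproduced with a toy simulation. Every number in it is invented for illustration and none is taken from the 1936 election: a hypothetical electorate in which about 60 per cent support a candidate, and a group of magazine readers, a quarter of all voters, among whom support is assumed to be only 40 per cent.

```python
import random

rng = random.Random(2012)

# Hypothetical electorate: 1 = supports the candidate, 0 = does not.
# Magazine readers (a quarter of voters) are assumed to lean against him.
readers = [1 if rng.random() < 0.40 else 0 for _ in range(50_000)]
non_readers = [1 if rng.random() < 2 / 3 else 0 for _ in range(150_000)]
electorate = readers + non_readers

true_share = sum(electorate) / len(electorate)

# Large but biased: 20,000 respondents, all drawn from readers.
biased_sample = rng.sample(readers, 20_000)
biased_estimate = sum(biased_sample) / len(biased_sample)

# Small but random: 1,000 respondents drawn from the whole electorate.
random_sample = rng.sample(electorate, 1_000)
random_estimate = sum(random_sample) / len(random_sample)

print(f"true support:       {true_share:.1%}")
print(f"20,000 readers:     {biased_estimate:.1%}")
print(f"1,000 random voters: {random_estimate:.1%}")
```

Under these assumptions the large readers-only sample settles close to 40 per cent however many people it includes, while the small random sample should land within a few points of the true figure.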

As Hong Kong approaches 2017 and universal suffrage in the next election for chief executive, it is all the more important that popular opinion is reflected scientifically. Just as important are the qualitative methods used to identify the values, attitudes and concerns of Hong Kong as a society.

Tom Yam is a Hong Kong-based management consultant. He holds a doctorate in electrical engineering and an MBA from the Wharton School of the University of Pennsylvania. He has worked at AT&T, Ernst & Young and IBM.

