Survey Quality: You gets what you pay for
Surveys are tricky: how do you know they are really representative of public opinion? Patrick Sturgis, Professor of Research Methodology at the University of Southampton, considers survey methodology and why the 2012 Wellcome Trust Monitor took pains to ensure “gold-standard random sampling”.
A few weeks ago the UK Education Secretary, Michael Gove, got himself into a spot of bother about the evidential basis of his claims that young people in Britain know next to nothing about key historical facts. According to Mr Gove, 20 per cent of young people believe Winston Churchill to be a fictional character, while a slightly higher proportion think Sherlock Holmes was a real person. It subsequently transpired (after some persistence) that these figures were taken from a ‘PR’ poll carried out by the well-known research power-house, UKTV Gold.
And, although no alternative figures have yet been produced to show that Mr Gove is wrong in his diagnosis of youthful historical ignorance, we were left in little doubt by the commentariat about what we should make of the robustness of his evidence. The public is generally unconcerned about dodgy survey methodology when polls are used to inform the marketing strategy of a cable TV channel. Citizens are rightly alarmed, though, if the same evidence is used to justify important changes in public policy.
The hoo-ha around this story raises the question of what we mean by a ‘dodgy’ poll and, by the same token, what constitutes a good one. With the 2012 Wellcome Trust Monitor survey report now published, my aim in this short post is to shed some light on how potential users of the Monitor might evaluate the quality and robustness of the survey as an evidence base for policy-making, both within the Trust and externally.
Survey methodologists have given a good deal of thought to the question of how to tell a good survey from a bad one (this is, after all, what we are paid to do) and have come up with a number of different dimensions of survey quality. Unfortunately, the most obvious criterion of whether a survey is any good – how accurate its estimates are – is almost always impossible to assess. This is because, for many population characteristics (such as the proportion of young people who think Churchill is a fictional character), there is no external criterion against which the survey can be validated. Surveys are all we have. And, in the relatively few cases where the ‘true’ value of a population characteristic is independently known, it is difficult to justify the expense of undertaking a survey to estimate it.
Beyond the holy grail of accuracy, then, an important quality criterion is transparency; if you can’t find any information about how the survey was conducted, this is a pretty good indication in itself that you should be wary about its findings. It is notable in this context that the available information about Mr Gove’s historical facts survey does not appear to extend beyond the name of the organization that conducted it. However, what I suspect primarily underlies unease about the UKTV Gold numbers is that the sample interviewed may not have been representative of all young people in the UK. In particular, it may have over-represented the less well-informed.
How then do we collect a sample in a manner that makes it representative of a target population? The gold-standard approach is to draw the sample at random, so that everyone in the population has an equal (or at least known) probability of being interviewed. When done in this way, we can use long-established principles of statistical theory to draw accurate inferences about the characteristics of the entire population, based just on the sample of people we actually interviewed.
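To make the idea concrete, here is a minimal Python sketch of the two cases just described, assuming a toy population coded as 0/1 (1 = has the characteristic of interest). The function names `srs_estimate` and `weighted_proportion` are hypothetical illustrations, not part of the Monitor's actual methodology. The first draws a simple random sample, where everyone has the same chance n/N of selection, and attaches an approximate 95 per cent confidence interval; the second weights each respondent by the inverse of a known, unequal selection probability.

```python
import math
import random


def srs_estimate(population, n, seed=0):
    """Estimate a population proportion from a simple random sample.

    population: list of 0/1 values (1 = has the characteristic).
    Every member has the same selection probability, n / N.
    Returns (estimate, (lower, upper)) for an approximate 95% interval.
    """
    N = len(population)
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # selection without replacement
    p_hat = sum(sample) / n
    # Standard error, including the finite population correction (1 - n/N)
    se = math.sqrt(p_hat * (1 - p_hat) / n * (1 - n / N))
    # Normal approximation: estimate +/- 1.96 standard errors
    return p_hat, (p_hat - 1.96 * se, p_hat + 1.96 * se)


def weighted_proportion(values, inclusion_probs):
    """Estimate a proportion when selection probabilities are unequal but known.

    Each respondent is weighted by 1 / (their selection probability), so
    people who were less likely to be picked count for more (Hajek estimator).
    """
    weights = [1 / p for p in inclusion_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical population of 100,000 in which 20% have the characteristic
population = [1] * 20_000 + [0] * 80_000
estimate, interval = srs_estimate(population, n=1000)
```

The point of the sketch is that the interval comes from probability theory alone: because the selection mechanism is known, no assumptions about who was interviewed are needed to quantify the likely error.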
The term ‘gold-standard’ is apposite here in more than one sense, because interviewer time is very expensive when scaled up over the thousands of households selected in a conventional survey sample. Interviewer time is so expensive in random surveys because interviewers have to keep calling back at sampled addresses until they obtain an interview with, or a refusal from, the particular individual selected. They can’t simply switch to the more compliant next-door neighbour, because this would violate the principle of random selection; only the selected respondent will do. As a result, it is not at all uncommon for interviewers on random surveys to make more than ten calls at a single address over a period of six to eight weeks or more. Random sampling is therefore a labour-intensive, and so expensive, data collection strategy.
And, even with this high level of effort and expense, we usually obtain interviews with only around a half to three quarters of those selected to be in the sample. This ‘nonresponse’ leaves open the possibility that the estimates from our survey will be inaccurate, as a result of differences between the responders and the non-responders.
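The size of the problem can be sketched with a simple identity. If the full selected sample is a mix of responders and non-responders, the estimate based on responders alone is off by (1 − response rate) × (responder mean − non-responder mean). The Python function below, `nonresponse_bias`, is a hypothetical illustration of that arithmetic, and the figures in the usage note are invented for the example rather than drawn from any survey.

```python
def nonresponse_bias(response_rate, mean_responders, mean_nonresponders):
    """Bias of a respondent-only estimate relative to the full selected sample.

    The full-sample value is a weighted mix of the two groups:
        full = rr * mean_r + (1 - rr) * mean_nr
    so the bias of using responders alone equals (1 - rr) * (mean_r - mean_nr).
    """
    full = (response_rate * mean_responders
            + (1 - response_rate) * mean_nonresponders)
    return mean_responders - full
```

For example, if 30 per cent of responders but 50 per cent of non-responders hold some view, a 60 per cent response rate gives a bias of 0.4 × (0.3 − 0.5) = −8 percentage points, whereas a 10 per cent response rate with the same gap gives −18 points. The bias is zero whenever the two groups do not differ, which is why nonresponse leaves open, rather than guarantees, the possibility of inaccuracy.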
Given the high cost of probability sampling, then, research commissioners reasonably ask whether a different (and less costly) approach could be used instead. The answer is that, yes, surveys can be done more cheaply, primarily by using non-probability sampling methods. However, any cost savings come at the price of an increased risk of inaccurate estimates.
The conventional approach to non-random sampling, whether done online, on the telephone or face-to-face, is to apply so-called ‘quota-controls’. For face-to-face quota samples, interviews are undertaken within a defined geographical area with anyone willing to be interviewed, subject to the constraint that the final sample must match the population on a set of known characteristics. These characteristics are usually gender, age group and employment status. By conducting interviews until the quotas are filled, we end up with a sample that is representative of the target population on the variables used to set the quotas. Quota sampling can yield very substantial cost-savings relative to probability methods, due to the reduction in expensive interviewer time that it produces.
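The quota procedure can be sketched in a few lines of Python. This is a simplified, hypothetical illustration: `fill_quotas` is an invented name, and the cells use only gender and age group (real designs often add employment status). Interviews are accepted in whatever order willing people are encountered until every cell is full.

```python
from collections import Counter


def fill_quotas(willing_people, quotas):
    """Accept willing people, in the order met, until every quota cell is full.

    willing_people: iterable of dicts such as {"gender": "F", "age": "16-24"}
    quotas: dict mapping a (gender, age) cell to the number of interviews wanted.
    Cells not listed in `quotas` get no interviews. If the stream runs out
    first, the partially filled sample is returned.
    """
    filled = Counter()
    sample = []
    for person in willing_people:
        cell = (person["gender"], person["age"])
        if filled[cell] < quotas.get(cell, 0):
            filled[cell] += 1
            sample.append(person)
        # Stop as soon as every cell has reached its target
        if all(filled[c] >= q for c, q in quotas.items()):
            break
    return sample
```

Note what the sketch never computes: a selection probability. The final sample matches the population on the quota variables by construction, but who ends up in each cell depends on who happened to be encountered and willing, which is the gap the next paragraph turns to.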
So, why should survey funders like the Wellcome Trust resist the siren calls of those who argue that the same end product – survey estimates – can be achieved at substantially lower cost by ditching random sampling? Well, for one thing, if a sample is not drawn randomly then we cannot make use of the accepted statistical theory that enables inference from samples to populations. Instead, we must rely on a rather ill-defined set of assumptions about the relationship between the quota control characteristics and the variables measured in the survey. This lack of an underlying theory of inference is not very satisfactory.
Of more practical importance, however, is the greater potential for inaccurate estimates in a quota sample. If we are concerned about the accuracy of estimates from surveys with response rates in the range of 50 per cent to 60 per cent, we should be considerably more troubled by a sample for which a response rate is not even recorded (remember the point about transparency?). I am not aware of any published evidence on response rates for face-to-face quota samples, and in many respects it does not make sense to try to calculate one. However, anecdotal evidence suggests that, if one were approximated, it would likely fall somewhere between 5 per cent and 10 per cent. Not very impressive.
The 2012 Wellcome Trust Monitor uses a gold-standard random sampling methodology, which comes at considerably greater cost than comparable endeavours such as the RCUK Public Attitudes to Science (PAS) survey, which is non-random and based on quota controls. Does its random sample design guarantee that an estimate from the Monitor will be more accurate than the same variable measured in the PAS? Unfortunately, we will almost certainly never know. But, if asked to wager, I know where my money would be.
Patrick Sturgis is the Principal Investigator for the Wellcome Trust Monitor, Professor of Research Methodology at the University of Southampton and Director of the ESRC National Centre for Research Methods.
Image credit: Flickr/Marco De Cesaris