Detecting Overdispersion in Large Scale Surveys: Application to a Study of Education and Social Class in Britain (with Comments)
A practical problem with large-scale survey data is the potential for overdispersion. Overdispersion occurs when the data display more variability than is predicted by the variance–mean relationship for the assumed sampling model. This paper describes a simple strategy for detecting and adjusting for overdispersion in large-scale survey data. The method is primarily motivated by data on the relationship between social class and educational attainment obtained from a 2% sample of the 1991 census of the population of Great Britain. Overdispersion can be detected by first grouping the data into a number of strata of approximately equal size. Under the assumption that the observations are independent and there is no variability in the parameter of interest, there is a direct relationship between the nominal standard errors and the empirical (sample) standard deviation of the parameter estimates obtained from the separate strata. In the 2% sample from the British census, a clearly discernible departure from this relationship was found, indicating overdispersion. After allowing for overdispersion, more realistic measures of the precision of the estimated social class–education associations were obtained.
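The strata-based check described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the parameter of interest is a simple proportion (the paper concerns social class–education associations), it uses simulated rather than census data, and the names `dispersion_factor`, `adjusted_se`, and `simulate` are hypothetical. The idea is to compare the empirical variance of the per-stratum estimates with the nominal binomial variance; a ratio well above 1 suggests overdispersion, and the nominal standard error can then be inflated by the square root of that ratio.

```python
import math
import random
import statistics


def dispersion_factor(strata):
    """Ratio of empirical to nominal variance of per-stratum proportions.

    `strata` is a list of lists of 0/1 outcomes of roughly equal size.
    Under independent sampling with a common proportion, the ratio
    should be close to 1.
    """
    estimates = [sum(s) / len(s) for s in strata]
    n_total = sum(len(s) for s in strata)
    pooled = sum(sum(s) for s in strata) / n_total
    mean_size = n_total / len(strata)
    # Nominal variance of one stratum estimate under binomial sampling.
    nominal_var = pooled * (1 - pooled) / mean_size
    # Empirical (sample) variance of the stratum estimates.
    empirical_var = statistics.variance(estimates)
    return empirical_var / nominal_var


def adjusted_se(strata):
    """Nominal SE of the pooled proportion, inflated by sqrt(dispersion)."""
    n_total = sum(len(s) for s in strata)
    pooled = sum(sum(s) for s in strata) / n_total
    nominal_se = math.sqrt(pooled * (1 - pooled) / n_total)
    # Assumption: never deflate the SE when the ratio falls below 1.
    phi = max(dispersion_factor(strata), 1.0)
    return nominal_se * math.sqrt(phi)


def simulate(num_strata, size, base_p, spread, rng):
    """Simulated strata; `spread` > 0 lets the proportion vary by stratum."""
    strata = []
    for _ in range(num_strata):
        p = min(max(rng.gauss(base_p, spread), 0.0), 1.0)
        strata.append([1 if rng.random() < p else 0 for _ in range(size)])
    return strata


rng = random.Random(0)
homogeneous = simulate(50, 2000, 0.3, 0.0, rng)     # common proportion
heterogeneous = simulate(50, 2000, 0.3, 0.03, rng)  # proportion varies

phi_h = dispersion_factor(homogeneous)
phi_o = dispersion_factor(heterogeneous)
print(f"dispersion (homogeneous):   {phi_h:.2f}")
print(f"dispersion (heterogeneous): {phi_o:.2f}")
print(f"adjusted SE (heterogeneous): {adjusted_se(heterogeneous):.5f}")
```

In this sketch the homogeneous data should give a ratio near 1, while the heterogeneous data should give a ratio well above 1. The same comparison could in principle be applied to other estimators, such as log odds ratios, by replacing the per-stratum estimate and its nominal variance.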