Fundamentals of AI and Data Science + Health & Biomed Communities: Unrepresentative Big Surveys Significantly Overestimate US Vaccine Uptake

Seth Flaxman, University of Oxford | In collaboration with a statistics seminar

 

29 November 2022, 10:30 
Checkpoint Building 002 

 

Abstract:

Surveys are a crucial tool for understanding public opinion and behavior, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox.

Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi–Facebook (about 250,000 responses per week) and Census Household Pulse (about 75,000 every two weeks). In May 2021, Delphi–Facebook overestimated uptake by 17 percentage points (14–20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 percentage points (11–17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021.
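To see why "bigger" makes a biased survey more confidently wrong rather than less wrong (a standard bias-variance sketch, not the derivation used in the paper): if the survey mean \bar{Y}_n carries a fixed selection bias b on top of its sampling variance \sigma^2/n, its mean squared error decomposes as

    E[(\bar{Y}_n - \mu)^2] = b^2 + \sigma^2/n,

so increasing n shrinks the reported interval \bar{Y}_n \pm 1.96 \sigma/\sqrt{n} around a point that stays a distance b from the truth \mu: the interval becomes ever narrower while its coverage of \mu collapses.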

 

Moreover, their large sample sizes led to minuscule margins of error on these incorrect estimates. By contrast, an Axios–Ipsos online panel with about 1,000 responses per week, which followed survey research best practices, provided reliable estimates and uncertainty quantification.
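As a rough illustration (using the textbook worst-case margin of error for a proportion, 1.96 \sqrt{0.25/n}, rather than the intervals reported in the paper): a sample of 250,000 gives a nominal margin of error of about 0.2 percentage points, while a sample of 1,000 gives about 3.1 percentage points. The first figure is meaningless when the estimate itself is off by 17 points; the second honestly reflects the uncertainty of a well-designed small sample.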

 

We decompose observed error using a recent analytic framework to explain the inaccuracy in the three surveys. We then analyze the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.
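The effective-sample-size claim can be checked with a toy simulation. The sketch below is illustrative only: the population size, true uptake level, and response rates are assumptions, chosen so that a modest self-selection effect (vaccinated people roughly three times as likely to respond) produces an overestimate of about 17 percentage points; it is not the paper's analysis, which, as we understand it, decomposes error via the data defect correlation framework of Meng (2018).

    # Toy illustration of the Big Data Paradox; all numbers are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    N = 10_000_000              # illustrative population, far smaller than US adults, for speed
    p_true = 0.70               # assumed true first-dose uptake
    Y = rng.random(N) < p_true  # True = vaccinated

    # Self-selection: vaccinated people are assumed ~3x more likely to answer the big survey,
    # calibrated so the expected number of respondents is roughly 250,000.
    resp_prob = np.where(Y, 0.0308, 0.0108)

    def one_round():
        respond = rng.random(N) < resp_prob   # big, unrepresentative survey
        big_est = Y[respond].mean()
        srs = rng.integers(0, N, size=10)     # simple random sample of 10 (with replacement;
        srs_est = Y[srs].mean()               # collisions are negligible at this N)
        return big_est, srs_est

    estimates = np.array([one_round() for _ in range(100)])
    rmse = np.sqrt(((estimates - p_true) ** 2).mean(axis=0))
    print(f"biased survey of ~250,000 respondents, RMSE: {rmse[0]:.3f}")
    print(f"simple random sample of 10 respondents, RMSE: {rmse[1]:.3f}")
    # Under these assumptions the huge survey lands near 0.87 (a ~17-point overestimate),
    # so its RMSE is comparable to, or worse than, that of a random sample of just 10 people.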

 

(Bradley et al., Nature 2021, https://www.nature.com/articles/s41586-021-04198-4)

 

Bio: I am an associate professor at the University of Oxford in the Department of Computer Science. My research is on scalable methods and flexible models for spatiotemporal statistics and Bayesian machine learning, applied to public policy and social science. Active application areas include public and global health, and machine learning for science. I co-founded the Machine Learning & Global Health Network (MLGH.net) and I help run the WHO-associated "Global Reference Group on Children Affected by COVID-19." My research is currently supported by an EPSRC Fellowship, "Spatiotemporal Statistical Machine Learning (ST-SML): Theory, Methods, and Applications."
