Estimating disease prevalence from preferentially sampled, pooled data

Journal of Data Science

Abstract

Since the onset of the COVID-19 pandemic, scientific interest in coronaviruses endemic in animal populations has increased dramatically. However, investigating the prevalence of disease in animal populations across the landscape, which requires finding and capturing animals, can be difficult. Spatial random sampling over a grid can be extremely inefficient because animals are hard to locate and the total number of samples may be small. Alternatively, preferential sampling, which uses existing knowledge to choose sampling locations, can yield larger numbers of samples, but estimates derived from this scheme may be biased if the locations more likely to be sampled are related to disease prevalence. Specimens are also commonly grouped and tested in pools, which adds a further challenge when combined with preferential sampling. Here we present a Bayesian method for estimating disease prevalence from preferentially sampled, pooled presence-absence data, motivated by the problem of estimating factors related to coronavirus infection among Mexican free-tailed bats (Tadarida brasiliensis) in California. We demonstrate the efficacy of our approach in a simulation study in which a naive model that does not account for preferential sampling returns biased parameter estimates, whereas our model returns unbiased results regardless of the degree of preferential sampling. We then apply our model framework to data from California to estimate factors related to coronavirus prevalence. After accounting for the effects of preferential sampling, our model suggests small differences in prevalence between male and female bats.
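
To illustrate why pooled presence-absence data need their own likelihood, the sketch below shows the standard relationship between individual-level prevalence and the probability that a pool tests positive. It is not the Bayesian model described in the abstract: it assumes a perfect test, independent specimens, and equal pool sizes, and it ignores preferential sampling and covariates entirely. The function names are hypothetical and chosen only for this example.

```python
def pool_positive_prob(prevalence: float, pool_size: int) -> float:
    """Probability that a pool of `pool_size` specimens tests positive.

    Assumes (for illustration only) a perfect test and independent
    specimens that share the same individual-level prevalence.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    # A pool is negative only if every specimen in it is negative.
    return 1.0 - (1.0 - prevalence) ** pool_size


def prevalence_mle(n_positive_pools: int, n_pools: int, pool_size: int) -> float:
    """Maximum-likelihood prevalence estimate from equal-size pools.

    Inverts the pool-level relationship above: if a fraction f of pools
    is positive, the estimate is 1 - (1 - f) ** (1 / pool_size).
    """
    if n_pools <= 0 or pool_size <= 0:
        raise ValueError("n_pools and pool_size must be positive")
    if not 0 <= n_positive_pools <= n_pools:
        raise ValueError("n_positive_pools must be between 0 and n_pools")
    frac_positive = n_positive_pools / n_pools
    return 1.0 - (1.0 - frac_positive) ** (1.0 / pool_size)


if __name__ == "__main__":
    # Individual prevalence of 10% with pools of 5 specimens.
    print(pool_positive_prob(0.10, 5))
    # 12 positive pools out of 40, each pool holding 5 specimens.
    print(prevalence_mle(12, 40, 5))
```

Because the fraction of positive pools always exceeds the individual-level prevalence when pools hold more than one specimen, treating pool results as individual results would overstate prevalence, which is one reason a pooled-data model is needed.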

Publication type Article
Publication Subtype Journal Article
Title Estimating disease prevalence from preferentially sampled, pooled data
Series title Journal of Data Science
DOI 10.6339/25-JDS1191
Volume 23
Issue 3
Publication Date June 11, 2025
Year Published 2025
Language English
Publisher School of Statistics and the Center for Applied Statistics, Renmin University of China
Contributing office(s) Northern Rocky Mountain Science Center
Description 18 p.
First page 542
Last page 559
Country United States
State California