A novel approach to evaluate data integrity: evidence from COVID-19 in China

Lucas Emanuel de Oliveira Silva
https://orcid.org/0000-0002-5013-6278
Dalson Figueiredo
https://orcid.org/0000-0001-6982-2262

Abstract

The COVID-19 pandemic has generated an unprecedented amount of epidemiological data. Yet concerns about the validity and reliability of the information reported by health surveillance systems have emerged worldwide. In this paper, we develop a novel approach to evaluating data integrity that combines the Newcomb-Benford Law with outlier detection methods. We demonstrate the advantages of our framework using a case study from China. To ensure more robust findings, we employ multiple diagnostic procedures, including three conformity estimates, four goodness-of-fit tests, and two distance measures (Cook and Mahalanobis). To promote transparency, we have made all computational scripts publicly available. Our findings indicate a significant deviation in the distribution of new deaths from the theoretical expectations of Benford's Law. Importantly, these results hold under alternative model specifications and across a range of statistical tests. Furthermore, the procedures developed here can be readily applied in other areas of knowledge and scaled to assess data quality in both the public and private sectors.
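
As a rough illustration of the kind of first-digit check the abstract refers to, the sketch below compares observed leading digits with the Newcomb-Benford expectation P(d) = log10(1 + 1/d) and reports a chi-square statistic and a mean absolute deviation. It is a minimal sketch and not the authors' published scripts: the function names (`benford_expected`, `first_digit`, `first_digit_fit`) and the toy series are assumptions made here for illustration, and the outlier step (Cook and Mahalanobis distances) is not covered.

```python
import math
from collections import Counter


def benford_expected():
    """Theoretical first-digit probabilities: P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading digit of a positive integer."""
    return int(str(n)[0])


def first_digit_fit(counts):
    """Return (chi-square statistic, mean absolute deviation) against Benford.

    Hypothetical helper for illustration; zeros are dropped because they
    have no leading digit.
    """
    values = [c for c in counts if c > 0]
    n = len(values)
    observed = Counter(first_digit(v) for v in values)
    expected = benford_expected()
    # Pearson chi-square over the nine digit classes (8 degrees of freedom).
    chi2 = sum((observed[d] - n * p) ** 2 / (n * p) for d, p in expected.items())
    # Mean absolute deviation between observed and expected proportions.
    mad = sum(abs(observed[d] / n - p) for d, p in expected.items()) / 9
    return chi2, mad


# Toy data only (assumption): powers of two, not real surveillance counts.
toy_counts = [2 ** k for k in range(1, 41)]
chi2, mad = first_digit_fit(toy_counts)
print(f"chi-square = {chi2:.2f}, MAD = {mad:.4f}")
```

With eight degrees of freedom, a chi-square statistic above roughly 15.51 would reject conformity at the 5% level. A real analysis would replace `toy_counts` with a reported daily series such as new deaths.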

How to Cite
Emanuel de Oliveira Silva, L., & Figueiredo, D. (2024). A novel approach to evaluate data integrity: evidence from COVID-19 in China. Brazilian Journal of Biometrics, 42(1), 78–87. https://doi.org/10.28951/bjb.v42i1.659
