A novel approach to evaluate data integrity: evidence from COVID-19 in China
Abstract
The COVID-19 pandemic has generated an unprecedented amount of epidemiological data. Yet concerns about the validity and reliability of the information reported by health surveillance systems have emerged worldwide. In this paper, we develop a novel approach to evaluating data integrity by combining the Newcomb-Benford Law with outlier detection methods. We demonstrate the advantages of our framework using a case study from China. To ensure more robust findings, we employ multiple diagnostic procedures, including three conformity estimates, four goodness-of-fit tests, and two distance measures (Cook and Mahalanobis). To promote transparency, we have made all computational scripts publicly available. Our findings indicate that the distribution of new deaths deviates significantly from the theoretical expectations of Benford's Law. Importantly, these results hold under alternative model specifications and across the various statistical tests. Furthermore, the procedures developed here are easily applicable in other areas of knowledge and can be scaled to assess data quality in both the public and private sectors.
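As a rough illustration of the first-digit logic behind the Newcomb-Benford Law, the sketch below computes the expected leading-digit probabilities, P(d) = log10(1 + 1/d), and a Pearson chi-square statistic for a series of counts. It is not the authors' published script: the function names and the sample counts are hypothetical, and it covers only one goodness-of-fit test, not the conformity estimates or the Cook and Mahalanobis distance measures mentioned in the abstract.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under the Newcomb-Benford Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading digit of a positive integer count."""
    return int(str(abs(int(n)))[0])


def chi_square_first_digit(counts):
    """Pearson chi-square statistic of observed first digits vs. Benford.

    Zero counts are dropped because they have no leading non-zero digit.
    Returns the statistic and the number of observations used.
    """
    digits = [first_digit(c) for c in counts if int(c) > 0]
    n = len(digits)
    observed = Counter(digits)
    stat = sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )
    return stat, n


# Hypothetical daily counts, for illustration only (not the paper's data).
sample = [12, 105, 23, 1, 17, 34, 190, 2, 11, 8, 46, 130, 27, 3, 15]
stat, n = chi_square_first_digit(sample)

# 15.507 is the 5% critical value of a chi-square with 8 degrees of freedom.
print(f"n={n}, chi2={stat:.2f}, exceeds 5% critical value: {stat > 15.507}")
```

With a series this short the chi-square approximation is unreliable, so the example only shows the mechanics; a real assessment would use the full reported series and, as the abstract describes, several tests side by side.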
This work is licensed under a Creative Commons Attribution 4.0 International License.