INCIDENCE AND LETHALITY OF COVID-19 CLUSTERS IN BRAZIL VIA CIRCULAR SCAN METHOD

The COVID-19 pandemic has spread rapidly around the world. In Brazil, the country with the third-highest number of infections and deaths from the disease, government health authorities need to identify the federative units that stand out in cases and deaths in order to target resources. The circular scan statistic proposed by Martin Kulldorff makes it possible to identify, with an assessment of statistical significance, the federative units that stand out with respect to the number of COVID-19 cases and deaths in Brazil. Such groups of federative units are known as clusters. Once these clusters were identified, we used the incidence and lethality coefficients to describe their behavior during three phases of the pandemic: the initial phase, the peak phase, and the phase of stability and fall. We observed changes in the location of the clusters across these three phases. We used the R software and the SaTScan software to obtain the maps and results, which were consistent with reports in the Brazilian media.


Introduction
At the end of 2019, a local outbreak of pneumonia caused by a new coronavirus, SARS-CoV-2, was identified in Wuhan (Hubei, China). Since then, the infection has spread to all provinces of mainland China and to 27 other countries and regions, with more than 70,000 confirmed cases as of 17 February 2020 (DONG et al., 2020).
Coronaviruses (CoV) are a large viral family, known since the mid-1960s, that cause respiratory infections in humans and animals. They are large, enveloped, positive-stranded RNA viruses divided into four genera, alpha, beta, delta, and gamma, of which the alpha and beta CoVs infect humans (WILD et al., 2017; FERNANDES et al., 2020). COVID-19 causes significant damage to public health, aggravated by the lack of specific vaccines and medications to treat it (KHAILANY et al., 2020; NASSIRI, 2020; VELAVAN AND MEYER, 2020; SALATHÉ et al., 2020). In addition, many critically ill patients need hospital care and mechanical ventilation, which creates the need for social distancing; these characteristics led the WHO to quickly declare COVID-19 a pandemic (SALATHÉ et al., 2020). Cordes et al. (2020) used the scan statistic to identify clusters in which COVID-19 stood out in New York City, looking for clusters with high and low positivity rates as well as a high proportion of positive COVID-19 tests. Hohl et al. (2020) applied the circular scan statistic proposed by Martin Kulldorff in the United States, using the space-time Poisson model.
In Brazil, the first documented case occurred in the city of São Paulo on 26 February 2020, and the disease then spread to all 27 federative units (F.U.'s). By 16 October 2020, 5,169,386 cases and 152,460 deaths had been confirmed in the national territory (SAÚDE, 2020). The situation in Brazil is therefore worrying: actions to curb rapid contagion and a detailed survey of how the pandemic unfolded in the country are needed. Local measures to combat COVID-19 can and should be implemented by the responsible bodies in each Brazilian F.U. However, the heterogeneity of population across the F.U.'s, together with differences in the number of intensive care unit (ICU) beds available for emergency care and in socio-economic vulnerability, makes it difficult to indicate which F.U.'s need more resources to combat COVID-19.
One strategy is to identify F.U.'s in which the occurrence of some pandemic-related factor is inconsistent with the others; such a group may be a single F.U. or a set of connected F.U.'s, as in the work of Alves et al. (2020). In this sense, spatial cluster detection methods are useful tools, since they aim to identify distinct patterns of spatial association (PINTO et al., 2014). Some of these methods can be found in Choynowsky (1959); Naus (1965a,b); Clayton and Kaldor (1987); Whittemore et al. Among them, we highlight the circular scan method proposed by Kulldorff (1997). Our objective is to use this method to find the most plausible COVID-19 cluster among the Brazilian F.U.'s in three phases: the initial stage, the peak period, and the period of stability and fall of the disease in Brazil. We also calculate the incidence and lethality coefficients for each cluster detected in each phase.

Kulldorff's circular scan method
To attach statistical significance to a cluster detection method, Kulldorff (1997) proposed a statistical test that has since been widely used. The idea is to build circles and gradually increase their radius. For each new circle, the number of cases and the population at risk are updated, and a likelihood ratio comparing the relative risk inside and outside the zone is calculated under a pre-established probability model. The circle radius varies "pseudo-arbitrarily", limited so that the circle contains at most 50% of the total population of the map. The most likely cluster is the zone whose likelihood ratio is maximum. Several probability models can be associated with this test statistic: Poisson, Bernoulli, Binomial, Hypergeometric, Normal, and space-time permutation. Since we are dealing with counts of new cases and new deaths from COVID-19, the Poisson model is a natural choice.

The Poisson model
Consider the problem of testing the following hypotheses for a zone z:

$$H_0: p = q \quad \text{versus} \quad H_1: p > q,$$

where $p$ is the probability that an individual within the zone is a case of some specific characteristic and $q$ is the probability that an individual outside the zone is a case of that same characteristic. Thus, the null hypothesis states that cases are distributed at random over the region of interest, in proportion to the population. The alternative hypothesis states that there is at least one zone in the region where the risk is higher than outside it. Let $N$ be the total number of individuals in the region of interest and $C$ the total number of cases in that region. Also, let $Z$ be the set of all possible zones that can be formed in the region of interest, and let $z \in Z$ be a specific zone. Let $n_z$ and $c_z$ be the number of individuals and the number of cases within zone $z$, respectively, and let $\mu_z$ be the number of cases expected within zone $z$. Under $H_0$, the expected number of cases within zone $z$ is $\mu_z = p\,n_z$ and, consequently, the expected number outside zone $z$ is $\bar{\mu}_z = p(N - n_z)$. Following Kulldorff (1997), with some adapted notation, the likelihood function under the restricted parameter space is

$$L_0(p) = \frac{e^{-p n_z}(p n_z)^{c_z}}{c_z!}\cdot\frac{e^{-p(N-n_z)}\,[p(N-n_z)]^{C-c_z}}{(C-c_z)!}. \tag{1}$$

In this case, the log-likelihood is

$$\ell_0(p) = -pN + C\log p + c_z\log n_z + (C-c_z)\log(N-n_z) - \log\big[c_z!\,(C-c_z)!\big], \tag{2}$$

whose maximum likelihood estimator is $\hat{p} = C/N$. Substituting into expression (2), we get

$$\ell_0(\hat{p}) = -C + C\log\frac{C}{N} + c_z\log n_z + (C-c_z)\log(N-n_z) - \log\big[c_z!\,(C-c_z)!\big]. \tag{3}$$

Under $H_1$, where a case is more likely to occur inside zone $z$ than outside it, the expected number of cases within zone $z$ is $\mu_z = p\,n_z$ and the expected number outside it is $\bar{\mu}_z = q(N - n_z)$. The likelihood function under the unrestricted parameter space is

$$L_1(p,q) = \frac{e^{-p n_z}(p n_z)^{c_z}}{c_z!}\cdot\frac{e^{-q(N-n_z)}\,[q(N-n_z)]^{C-c_z}}{(C-c_z)!}, \tag{4}$$

and the log-likelihood is

$$\ell_1(p,q) = -p n_z + c_z\log(p n_z) - q(N-n_z) + (C-c_z)\log\big[q(N-n_z)\big] - \log\big[c_z!\,(C-c_z)!\big], \tag{5}$$

whose estimators are $\hat{p} = c_z/n_z$ and $\hat{q} = (C - c_z)/(N - n_z)$.
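Substituting these estimators into expression (5), we get

$$\ell_1(\hat{p},\hat{q}) = -C + c_z\log c_z + (C-c_z)\log(C-c_z) - \log\big[c_z!\,(C-c_z)!\big]. \tag{6}$$

Let $\mu_z = \hat{p}\,n_z = C n_z/N$ be the number of cases expected in zone $z$ under $H_0$, and define $RR = c_z/\mu_z$ and $rr = (C - c_z)/(C - \mu_z)$. Subtracting expression (3) from expression (6), the log-likelihood ratio is

$$\log\lambda(z) = \ell_1(\hat{p},\hat{q}) - \ell_0(\hat{p}) = c_z\log\frac{c_z}{\mu_z} + (C-c_z)\log\frac{C-c_z}{C-\mu_z}, \tag{7}$$

so that, following Kulldorff (1997) with some adapted notation, the likelihood ratio is

$$\lambda(z) = \begin{cases} RR^{\,c_z}\, rr^{\,C-c_z}, & \text{if } c_z > \mu_z,\\ 1, & \text{otherwise.}\end{cases} \tag{8}$$

We are only interested in high-rate clusters, which is why $\lambda$ is set to 1 whenever $c_z \le \mu_z$. The zone that maximizes $\lambda$ is the most likely cluster; it does not always coincide with the actual cluster, so the test statistic provides an estimate of the position and radius of the actual cluster.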
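The log of the likelihood ratio $\lambda$ defined above, $c_z\log RR + (C-c_z)\log rr$, can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the SaTScan implementation; the function name `poisson_log_lambda` is hypothetical. It works on the log scale because $\lambda$ itself overflows floating point quickly for realistic case counts.

```python
import math


def poisson_log_lambda(c_z: int, n_z: float, C: int, N: float) -> float:
    """Log of Kulldorff's Poisson likelihood ratio for one zone z (hypothetical helper).

    c_z, n_z: cases and population inside the zone; C, N: totals for the whole map.
    Returns 0.0 (lambda = 1) when the zone is not a high-rate zone.
    """
    if not 0 < n_z <= N:
        raise ValueError("zone population must satisfy 0 < n_z <= N")
    mu_z = C * n_z / N                          # expected cases in z under H0 (p_hat = C/N)
    if c_z <= mu_z:                             # only high-rate clusters are of interest
        return 0.0
    inside = c_z * math.log(c_z / mu_z)         # c_z * log(RR)
    c_out = C - c_z
    # (C - c_z) * log(rr), using the convention 0 * log(0) = 0
    outside = c_out * math.log(c_out / (C - mu_z)) if c_out > 0 else 0.0
    return inside + outside


# Hypothetical zone: 30 of 50 cases among 100 of 1,000 inhabitants
print(poisson_log_lambda(30, 100, 50, 1000))
```

Exponentiating the result recovers $\lambda$ itself when the numbers are small enough; for ranking zones the log value is sufficient, since the logarithm is monotone.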

Algorithm for cluster detection using the Kulldorff method
To detect the most likely cluster, we choose the zone $\hat{z}$ that maximizes $\lambda$, as in expression (8). However, the distribution of this maximum under $H_0$ is unknown, so it is obtained via Monte Carlo simulation. The cluster detection algorithm has the following steps:
1. a centroid is defined for each region of the map;
2. the distances between the chosen centroid and all other centroids are calculated, sorted in ascending order, and stored in a vector;
3. steps 1 and 2 are repeated for each centroid of the study region;
4. a circular window is created around the chosen centroid, and its radius is increased continuously according to the distances obtained in step 2;
5. each time a new centroid is reached, the number of cases and the population at risk within the circle (zone) $z$ are updated;
6. the test statistic $\lambda$ is calculated, as in expression (8), for each pair $(c_z, n_z)$;
7. the zone with the highest value of $\lambda$ is stored, and the process continues until the radius threshold is reached;
8. steps 4 to 7 are repeated for each centroid on the map;
9. the significance of the test is obtained via Monte Carlo simulation: the $C$ cases are redistributed at random among the regions under $H_0$ many times, the maximum $\lambda$ is recomputed for each replicate, and the p-value is the rank of the observed maximum among the simulated maxima and the observed one, divided by the number of replicates plus one.
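The steps above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the SaTScan implementation, and all function names are ours. It assumes centroids are planar (x, y) coordinates, so Euclidean distance is used (real maps would need projected or great-circle distances); it recomputes the sums for each candidate zone rather than updating them incrementally as in step 5; and, under $H_0$, it redistributes the $C$ cases multinomially in proportion to population, the usual conditional Monte Carlo scheme for the Poisson model.

```python
import math
import random


def log_lambda(c_z, n_z, C, N):
    """Log Poisson likelihood ratio of a zone; 0.0 (lambda = 1) unless c_z exceeds its expectation."""
    mu_z = C * n_z / N
    if c_z <= mu_z:
        return 0.0
    c_out = C - c_z
    return c_z * math.log(c_z / mu_z) + (c_out * math.log(c_out / (C - mu_z)) if c_out else 0.0)


def circular_zones(coords, pops, max_frac=0.5):
    """Steps 1-4 and 8: grow a circle around every centroid, adding neighbours in order of
    distance, while the zone holds at most max_frac of the total population."""
    N = sum(pops)
    zones = set()
    for xi, yi in coords:
        # distances from this centroid to all centroids, sorted ascending (ties by index)
        order = sorted(range(len(coords)),
                       key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi))
        zone, pop = [], 0
        for j in order:
            if pop + pops[j] > max_frac * N:    # radius limit reached
                break
            zone.append(j)
            pop += pops[j]
            zones.add(tuple(sorted(zone)))      # each distinct zone is kept once
    return sorted(zones)


def scan(cases, pops, zones):
    """Steps 5-7: return (max log-lambda, zone attaining it) over all candidate zones."""
    C, N = sum(cases), sum(pops)
    best = (0.0, ())
    for z in zones:
        llr = log_lambda(sum(cases[j] for j in z), sum(pops[j] for j in z), C, N)
        if llr > best[0]:
            best = (llr, z)
    return best


def monte_carlo_p(cases, pops, zones, n_sim=999, seed=1):
    """Step 9: under H0, scatter the C cases over regions in proportion to population,
    recompute the maximum, and rank the observed maximum among the replicates."""
    rng = random.Random(seed)
    observed, best_zone = scan(cases, pops, zones)
    rank = 1                                    # the observed statistic counts itself
    for _ in range(n_sim):
        sim = [0] * len(pops)
        for j in rng.choices(range(len(pops)), weights=pops, k=sum(cases)):
            sim[j] += 1
        if scan(sim, pops, zones)[0] >= observed:
            rank += 1
    return best_zone, observed, rank / (n_sim + 1)


# Hypothetical toy map: six regions on a line, the first three with elevated counts
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
pops = [100] * 6
cases = [20, 18, 22, 2, 3, 1]
zones = circular_zones(coords, pops)
print(monte_carlo_p(cases, pops, zones, n_sim=199))
```

The candidate zones depend only on geography and population, not on the case counts, so they are enumerated once and reused in every Monte Carlo replicate.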
The spatial scan statistic has some limitations, such as overlapping clusters and the choice of the optimal size of the scanning window (the Gini coefficient is one possible tool for addressing the latter; see HAN et al., 2016). Nevertheless, according to Assunção (COSTA et al., 2012), there is no better technique for detecting spatial clusters than the one described here.
Although the study considers a meaningful temporal division, it is deliberately simple. Several alternatives exist, such as the Bernoulli model with space-time interaction or the inclusion of qualitative covariates, but, following Kulldorff (1997), we believe they are not justified for these data. First, the space-time Bernoulli model requires the cases from the previous (prospective) or subsequent (retrospective) period to be attributed to the population at risk. In the prospective case, this would be inconsistent with the nature of our data, especially during the peak period, when the numbers of cases and deaths increase sharply every day; the future number of cases could then exceed the population at risk, which does not make sense. In the retrospective case, such an evaluation does not serve our purpose, which is to follow the evolution of the disease. Second, including qualitative covariates such as length of hospital stay, severe respiratory disease, smoking, or comorbidities is not possible because the competent bodies do not make this information available.

Diagnostic epidemiological measures
Public health authorities in a given locality need to know the rate at which a specific disease manifests (incidence), in order to assess cause-and-effect situations, and how many of the people affected by the disease die from it (lethality), as an estimate of its severity. We calculated these coefficients for the detected clusters, scaled per 100,000 inhabitants.

The incidence coefficient
The incidence coefficient (IC) expresses the risk of new cases of a disease occurring in a population over a period of time. It is given by

$$IC = \frac{|NC|}{|POP|}\times 100{,}000, \tag{9}$$

where $|NC|$ is the number of new cases and $|POP|$ is the population at risk.

The lethality coefficient
The lethality coefficient (LC) expresses the risk of death from a disease among those affected in a population over a period of time. It is given by

$$LC = \frac{|CD|}{|CAS|}\times 100{,}000, \tag{10}$$

where $|CD|$ is the number of accumulated deaths and $|CAS|$ is the number of accumulated cases in this population.
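Both coefficients reduce to simple scaled ratios; a minimal Python sketch is given below. The function names and the figures in the example are hypothetical, not values from this study, and the scaling factor of 100,000 follows the convention stated above.

```python
def incidence_coefficient(new_cases: int, population: int, per: int = 100_000) -> float:
    """IC = |NC| / |POP| * per: new cases per `per` individuals at risk."""
    if population <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / population * per


def lethality_coefficient(deaths: int, cases: int, per: int = 100_000) -> float:
    """LC = |CD| / |CAS| * per: accumulated deaths per `per` accumulated cases."""
    if cases <= 0:
        raise ValueError("number of cases must be positive")
    return deaths / cases * per


# Hypothetical cluster: 600,000 inhabitants and 2,310 new cases;
# 1,500 accumulated deaths among 60,000 accumulated cases
print(incidence_coefficient(2_310, 600_000))
print(lethality_coefficient(1_500, 60_000))
```

Raising an error on a zero denominator, rather than returning zero, avoids silently reporting a coefficient for a cluster with no population or no cases.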

Results
In this section, we present our results and discuss what they indicate.

The dataset
The data under study are made available by the Ministry of Health of Brazil at https://susanalitico.saude.gov.br/extensions/covid-19_html/covid-19_html.html. We chose to analyze the evolution of COVID-19 in Brazil, by F.U., in three periods: 2020-02-04 to 2020-04-01, 2020-04-02 to 2020-07-01, and 2020-07-02 to 2020-09-15. These periods correspond, respectively, to the onset of contagion, the peak of contagion, and the phase of stability and fall of the disease in Brazil (see Figure 1). For the variables number of new cases and number of new deaths, we considered only the counts recorded on the last day of each period: 2020-04-01, 2020-07-01, and 2020-09-15, respectively. We chose these dates because we want to identify, via the circular scan method, the most plausible cluster at the end of each period. We then applied the circular scan method proposed by Kulldorff (1997) to find the most plausible cluster of new cases and of new deaths from COVID-19, setting the maximum circle size to 50% of the total population of the map.
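The selection of period-end counts described above can be sketched as follows. The record layout, the period labels, and the values are hypothetical; the actual Ministry of Health file has its own columns, which would need to be mapped onto this tuple form first.

```python
from collections import defaultdict

# Last day of each study period, as given in the text
PERIOD_ENDS = {"2020-04-01": "initial", "2020-07-01": "peak", "2020-09-15": "stability_and_fall"}


def counts_on_period_ends(records):
    """Keep only the counts recorded on the last day of each period.

    records: iterable of (date "YYYY-MM-DD", F.U. code, new_cases, new_deaths); this
    layout is a hypothetical simplification of the Ministry of Health data.
    Returns {period: {fu_code: (new_cases, new_deaths)}}.
    """
    selected = defaultdict(dict)
    for date, fu_code, new_cases, new_deaths in records:
        period = PERIOD_ENDS.get(date)
        if period is not None:              # ignore every day that is not a period end
            selected[period][fu_code] = (new_cases, new_deaths)
    return dict(selected)


records = [
    ("2020-04-01", 35, 120, 7),   # hypothetical values
    ("2020-04-02", 35, 140, 9),   # not a period end: ignored
    ("2020-07-01", 14, 300, 5),
]
print(counts_on_period_ends(records))
```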

The identified clusters
Unlike Cordes et al. (2020) and Hohl et al. (2020), we seek to identify clusters of new cases and new deaths from COVID-19 among the Brazilian F.U.'s. In addition, we calculate the incidence and lethality coefficients in the detected clusters, with the objective of describing how worrying the situation is in these clusters. We believe these coefficients can indicate where the competent bodies should intervene with public health policies.
Initial period of the disease: 2020-02-04 to 2020-04-01
In the initial period of contagion, Figure 2 shows that the most plausible cluster detected by the circular scan method is the state of São Paulo (35). As mentioned earlier, we consider three distinct study periods: the initial period of contagion, the peak period of contagion, and the period of stability and fall. We therefore applied Kulldorff's circular scan method to the peak period of the disease to verify whether the cluster detected in the initial period persists.
Peak illness period: 2020-04-02 to 2020-07-01
Figures 3(a) and 3(b) show that the most plausible cluster detected by the circular scan method is now the state of Roraima (14). We then applied the same method to the third period of the disease: stability and fall.

Period of stability and fall of the disease: 2020-07-02 to 2020-09-15
In the period of stability and fall of the disease, Figures 4(a) and 4(b) show the spatial distribution of the primary cluster found for new cases and for new deaths; these clusters include the F.U.'s Rondônia (11) and Mato Grosso (51). We then calculated the incidence and lethality coefficients for each of the clusters detected in each period. We emphasize that the detection of a cluster does not mean that the manifestation (incidence) and severity (lethality) of COVID-19 are the most serious there among all Brazilian F.U.'s; we evaluated these measures only in the detected clusters.

The coefficients: incidence and lethality
Recently, Gouveia et al. (2020) analyzed the incidence and lethality of COVID-19 in Brazil using time series. Their results showed an increasing trend in incidence in all states except Ceará, and the highest lethality was found in the state of Piauí. Maciel et al. (2020) analyzed the spatial distribution of the incidence of COVID-19 in Brazil and correlated it with the municipal human development index (MHDI) of the municipalities of Ceará.
Unlike these studies, once we had identified distinct clusters in the three periods described above, we calculated the incidence and lethality coefficients (see section 3) for the respective detected clusters in each contagion period of COVID-19, in order to verify their behavior. We chose these coefficients because they measure the rate of manifestation of the disease (incidence) and its severity (lethality) (see section 3). Barbosa et al. (2020) evaluated Brazilian F.U.'s that had up to 50 deaths from COVID-19 with respect to accumulated incidence, mortality, and lethality among the elderly Brazilian population, from the beginning of the pandemic in Brazil until 2020-05-25. The variables considered were the supply of health services and professionals, and demographic, income, and development indicators. The authors concluded that Pará (15) had the highest accumulated incidence and mortality rates (763.37 and 219.06, respectively) among the elderly, per 100,000 inhabitants. The highest accumulated mortality rates were observed in Bahia (29), Rio de Janeiro (33), and Pernambuco (26).
For the same period, according to Table 1, we considered all Brazilian federative units and all individuals at risk, including children, rather than only the elderly population, and observed that São Paulo (35), the cluster detected in this period via the circular scan statistic, showed an incidence of 1.39 and a lethality of 0.06 per 100,000 inhabitants. Unlike Barbosa et al. (2020), we also evaluated incidence and lethality in the clusters detected in the other periods: peak, and stability and fall. Table 1 shows that Roraima (14), the cluster detected at the end of the peak phase of contagion (2020-07-01), had the highest coefficients per 100,000 inhabitants. The incidence coefficient for that cluster in that period was 384.9926, indicating that, for every 100,000 individuals at risk, about 385 became new cases of COVID-19. The cluster's lethality coefficient, considering only new deaths, was 6.3373, indicating that about 6 of every 100,000 individuals at risk died on that day. Both coefficients were obtained on the last day of the peak contagion period.

Conclusions
The circular scan method proposed by Martin Kulldorff detected the most plausible cluster in each period considered. These clusters were consistent with what was reported by the Brazilian media and by government health agencies throughout each period. We recommend this method for detecting clusters in georeferenced phenomena of any nature.