STATISTICAL ANALYSIS WITH A BAYESIAN APPROACH TO THE HARDY-WEINBERG EQUILIBRIUM

§ ABSTRACT: In population genetics, statistical analysis is commonly used to test whether a given population is in Hardy-Weinberg genetic equilibrium. The classical approach to this problem is the chi-square goodness-of-fit test, which often leads to acceptance of the equilibrium hypothesis. In the present work, a Bayesian analysis involving hypothesis testing, estimation and credibility intervals was developed to test this equilibrium. Data on the M, MN and N blood groups of the MNSs system were used, from samples of two populations, one of Brazilians and one of North Americans, obtained by Beiguelman (1977). The chi-square goodness-of-fit test for Hardy-Weinberg equilibrium was performed, and the equilibrium hypothesis was accepted. Under the Bayesian analysis, the Hardy-Weinberg equilibrium hypothesis was rejected, mainly on the basis of the Bayes factor. Our primary concern was to develop a Bayesian technique as an alternative for testing Hardy-Weinberg equilibrium, using the MNSs blood group sample data. The result obtained may encourage researchers, mainly in the biological sciences, to adopt Bayesian methodology as an alternative in statistical tests.


Introduction
In population genetics, statistical analysis is very commonly used to test Hardy-Weinberg genetic equilibrium. The classical approach to this problem is the chi-square goodness-of-fit test, which often leads to acceptance of the hypothesis of genetic equilibrium.
Researchers such as Hogben (1946), Levene (1949), Haldane (1954), Cannings (1969), Smith (1970), Emigh (1975), Elston (1977), and Beiguelman (1973) developed methods to accept or reject the hypothesis of Hardy-Weinberg equilibrium for a given population sample using the chi-square goodness-of-fit test. Rogatko (1985) used the Bayesian methodology of data analysis to study the estimation of penetrance from genealogies and the verification of Hardy-Weinberg equilibrium in Mendelian populations. Giannoni (1989) conducted a study on quantitative and population genetics to demonstrate and apply Hardy-Weinberg equilibrium to multiple alleles and sex-linked genes, in cases of codominance and complete dominance.
In the present work, a Bayesian analysis involving hypothesis testing, the Bayes estimator and credibility intervals was developed to test this equilibrium. Data on the M, MN and N blood groups of the MNSs system were used, from samples of two populations, one of Brazilians and one of Americans, obtained by Beiguelman (1977). The goodness-of-fit test for Hardy-Weinberg equilibrium was performed, and it led to acceptance of the equilibrium hypothesis. The Bayesian analysis, in contrast, led to rejection of the Hardy-Weinberg equilibrium hypothesis, as assessed by the Bayes factor. The fundamental objective was to develop a Bayesian analysis technique as an alternative for testing Hardy-Weinberg equilibrium, using blood group sample data from the MNSs system. The result obtained may encourage researchers, mainly in the biological sciences, to adopt Bayesian methodology as an alternative in statistical tests.
The frequencies of the M and N alleles in each population are estimated. The goodness-of-fit test for Hardy-Weinberg equilibrium is performed, and the Bayesian methodology is then applied to test the genetic equilibrium in the same samples.

Material and methods
We use data on the M, MN and N blood groups of the MNSs system, obtained from samples of two populations, one of Brazilians and one of Americans, with which the a posteriori probabilities under the Hardy-Weinberg equilibrium hypothesis are determined, as shown in Table 1 below. The frequencies of the M and N alleles in each population are estimated, and the standard deviation of $p$ is calculated using $\sigma = \sqrt{pq/(2n)}$, where $n$ is the sample size. We denote by $n_1$ the absolute frequency of the genotype $MM$, by $n_2$ the absolute frequency of the genotype $MN$, and by $n_3$ the absolute frequency of the genotype $NN$. Thus, if we call $p$ the relative frequency of the allele $M$ and $q$ that of the allele $N$, and denote the relative frequencies of individuals with genotypes $MM$, $MN$ and $NN$ by $D$, $H$ and $R$, respectively, the frequencies $p$ and $q$ of the alleles $M$ and $N$ in the generation under study are:
$$p = D + \tfrac{1}{2}H, \qquad q = R + \tfrac{1}{2}H.$$
It is known that $p + q = 1$, so one can also write $q = 1 - p$. Thus, in the sample of Brazilians, $p = 0.55$, so $q = 0.45$, and in the sample of Americans, $p = 0.562$, so $q = 0.438$.
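The estimation step above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual gene-counting estimate $p = (2n_1 + n_2)/(2n)$ and the binomial standard deviation $\sqrt{pq/(2n)}$; the function name and the genotype counts are hypothetical and are not the Beiguelman (1977) data.

```python
import math

def allele_frequencies(n1, n2, n3):
    """Gene-counting estimates from genotype counts (MM, MN, NN)."""
    n = n1 + n2 + n3                 # sample size (number of individuals)
    p = (2 * n1 + n2) / (2 * n)      # frequency of allele M, i.e. D + H/2
    q = 1.0 - p                      # frequency of allele N
    sd = math.sqrt(p * q / (2 * n))  # standard deviation of p (2n alleles sampled)
    return p, q, sd

# Hypothetical counts, chosen only for illustration.
p, q, sd = allele_frequencies(30, 50, 20)
```

With these illustrative counts the relative genotype frequencies are $D = 0.30$, $H = 0.50$ and $R = 0.20$, so the function returns $p = D + H/2$ and $q = 1 - p$ as in the text.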
Calculating the expected frequencies in these classes requires two pieces of information: the sample size and the frequency of one of the alleles. The number of degrees of freedom of the $\chi^2$ statistic is equal to the number of expected classes minus the number of pieces of information required to calculate the frequencies in these classes; thus, in the present case, d.f. $= 3 - 2 = 1$.
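A minimal sketch of the chi-square goodness-of-fit test with one degree of freedom is given below. It assumes the expected genotype counts $np^2$, $2npq$ and $nq^2$, and uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ for the p-value. The function name and the counts are hypothetical, chosen only for illustration.

```python
import math

def hwe_chi_square(n1, n2, n3):
    """Chi-square goodness-of-fit test of HWE for counts of MM, MN, NN."""
    n = n1 + n2 + n3
    p = (2 * n1 + n2) / (2 * n)   # estimated frequency of allele M
    q = 1.0 - p
    observed = (n1, n2, n3)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 classes - 2 pieces of information = 1 degree of freedom;
    # for 1 d.f. the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts, chosen only for illustration.
chi2, p_value = hwe_chi_square(30, 50, 20)
```

A large p-value, as in this illustrative case, corresponds to acceptance of the equilibrium hypothesis by the classical test.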

A posteriori distribution
The choice of the a priori distribution is restricted to the class of beta distributions, given by:
$$f(x \mid a; b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1}(1-x)^{b-1}, \qquad 0 \le x \le 1.$$
This distribution, with parameters $a = 1$ and $b = 5$, closely approximates the distribution of the homozygous genotypes $MM$ in populations with different allelic frequencies and in Hardy-Weinberg equilibrium.
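Therefore, the a priori distribution for $x_1$ is:
$$f(x_1 \mid 1; 5) = 5\,(1 - x_1)^4,$$
where $0 \le x_1 \le 1$ is the range of variation of the genotypic relative frequency of $MM$. Let $x_1$, $x_2$ and $x_3$ denote the relative frequencies of the genotypes $MM$, $MN$ and $NN$. A population is in Hardy-Weinberg equilibrium if and only if there is a number $p$ $(0 \le p \le 1)$ such that $x_1 = p^2$, $x_2 = 2p(1-p)$ and $x_3 = (1-p)^2$, where $x_1 + x_2 + x_3 = 1$. Therefore, in the Hardy-Weinberg equilibrium situation, the likelihood function for the observed genotype counts $n = (n_1, n_2, n_3)$ is given by
$$L(n \mid H_0) = \frac{(n_1+n_2+n_3)!}{n_1!\,n_2!\,n_3!}\,(p^2)^{n_1}\,[2p(1-p)]^{n_2}\,[(1-p)^2]^{n_3},$$
with the implication that $x_2^2 = 4\,x_1 x_3$. Let $\theta$ be the parameter $\theta = x_2^2/(x_1 x_3)$; as seen above, if $\theta = 4$, the Hardy-Weinberg equilibrium is satisfied.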
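Combining the a priori distribution $f(x_1 \mid 1; 5)$ with the likelihood function $L(n \mid H_0)$, the marginal probability function of the data under the Hardy-Weinberg equilibrium hypothesis is given by:
$$P(n \mid H_0) = \int_0^1 L(n \mid H_0)\, f(x_1 \mid 1; 5)\, dx_1, \qquad p = \sqrt{x_1}.$$
Under the hypothesis $H_1: \theta \neq 4$ (Hardy-Weinberg non-equilibrium), the likelihood can be expressed as:
$$L(n \mid H_1) = \frac{(n_1+n_2+n_3)!}{n_1!\,n_2!\,n_3!}\; x_1^{n_1}\, x_2^{n_2}\, x_3^{n_3},$$
where $x_1$, $x_2$ and $x_3$ are the relative frequencies of the genotypes $MM$, $MN$ and $NN$, and $n_1$, $n_2$ and $n_3$ are their observed counts. For the hypothesis of genetic equilibrium, the a priori distribution is particularized to a beta distribution with parameters $a = 1$ and $b = 5$, imposed by the Hardy-Weinberg equilibrium conditions. For the Hardy-Weinberg non-equilibrium hypothesis, we now take as the a priori distribution a generalized beta probability density function belonging to the Dirichlet class of distributions.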
The predictive probability function under $H_1$, obtained by combining the likelihood $L(n \mid H_1)$ with the a priori distribution $f(x \mid H_1)$, is given by:
$$P(n \mid H_1) = \int L(n \mid H_1)\, f(x \mid H_1)\, dx.$$
The marginal (a posteriori) probabilities of the data, determined under the hypothesis of equilibrium, $P(H_0 \mid n)$, are presented in Table 2. The marginal (a posteriori) probabilities of the data under the hypothesis of non-equilibrium $H_1$ are described in Table 3.
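The comparison of the two marginal probabilities can be sketched numerically. The Python code below is a minimal sketch under stated assumptions, not the authors' implementation. Under the equilibrium hypothesis it integrates the multinomial likelihood, with allele frequency $p$ equal to the square root of the $MM$ genotype frequency, against the beta a priori distribution with parameters 1 and 5, using a midpoint rule. Under the non-equilibrium hypothesis it uses the closed-form Dirichlet-multinomial marginal. The Dirichlet parameters (1, 1, 1), the function names and the example counts are hypothetical choices made only for illustration.

```python
import math

def log_multinomial_coef(counts):
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def marginal_h0(counts, a=1.0, b=5.0, steps=10000):
    """P(n | H0): HWE multinomial likelihood integrated over x1 ~ Beta(a, b)."""
    n1, n2, n3 = counts
    log_c = log_multinomial_coef(counts)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x1 = (i + 0.5) * h      # midpoint of the i-th subinterval of (0, 1)
        p = math.sqrt(x1)       # allele frequency implied by x1 = p^2
        log_lik = (log_c + n1 * math.log(x1)
                   + n2 * math.log(2 * p * (1 - p))
                   + 2 * n3 * math.log(1 - p))
        log_prior = log_norm + (a - 1) * math.log(x1) + (b - 1) * math.log(1 - x1)
        total += math.exp(log_lik + log_prior) * h
    return total

def marginal_h1(counts, alpha=(1.0, 1.0, 1.0)):
    """P(n | H1): Dirichlet-multinomial marginal (alpha is an assumed prior)."""
    n = sum(counts)
    a0 = sum(alpha)
    log_m = (log_multinomial_coef(counts)
             + math.lgamma(a0) - sum(math.lgamma(a) for a in alpha)
             + sum(math.lgamma(a + c) for a, c in zip(alpha, counts))
             - math.lgamma(a0 + n))
    return math.exp(log_m)

def bayes_factor_01(counts):
    """Bayes factor of H0 against H1: P(n | H0) / P(n | H1)."""
    return marginal_h0(counts) / marginal_h1(counts)

# Hypothetical counts (MM, MN, NN), chosen only for illustration.
bf = bayes_factor_01((30, 50, 20))
```

Working on the log scale keeps the multinomial coefficient from overflowing for moderate sample sizes; the result depends on both a priori distributions, so other Dirichlet parameters would give a different Bayes factor.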

Bayes Estimator
We have seen that Hardy-Weinberg equilibrium is characterized by the expression $\theta = x_2^2/(x_1 x_3)$ when $\theta$ equals 4, where $\theta$ is the parameter of interest and $x_1$, $x_2$ and $x_3$ are the relative frequencies of the genotypes $MM$, $MN$ and $NN$. The Bayes estimator for the parameter $\theta$ is simplified by two facts: If $(x_1, x_2)$ has a priori distribution $D(\alpha_1, \alpha_2, \alpha_3)$, that is, a Dirichlet distribution with parameters $(\alpha_1, \alpha_2, \alpha_3)$, then its a posteriori distribution is $D(\alpha_1 + n_1, \alpha_2 + n_2, \alpha_3 + n_3)$.
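If $y_1$, $y_2$ and $y_3$ are independent gamma variables with parameters $(\alpha_1 + n_1; \beta)$, $(\alpha_2 + n_2; \beta)$ and $(\alpha_3 + n_3; \beta)$ respectively, then $(y_1, y_2, y_3)/(y_1 + y_2 + y_3)$ has the Dirichlet distribution $D(\alpha_1 + n_1, \alpha_2 + n_2, \alpha_3 + n_3)$, so that $\theta = y_2^2/(y_1 y_3)$ and, writing $a_i = \alpha_i + n_i$,
$$E(\theta \mid n) = \frac{a_2\,(a_2 + 1)}{(a_1 - 1)(a_3 - 1)}.$$
In addition, if $(\alpha_1 + n_1) > 2$ and $(\alpha_3 + n_3) > 2$, then the a posteriori variance of $\theta$ is given by:
$$\mathrm{Var}(\theta \mid n) = \frac{a_2\,(a_2+1)(a_2+2)(a_2+3)}{(a_1-1)(a_1-2)(a_3-1)(a_3-2)} - \left[E(\theta \mid n)\right]^2.$$
This proposition is demonstrated in Rogatko (1985).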
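The a posteriori moments of the ratio of the squared $MN$ frequency to the product of the $MM$ and $NN$ frequencies can be computed directly from the moments of independent gamma variables, because the common scale cancels in the ratio. The following Python function is a sketch of that standard calculation rather than a transcription of Rogatko (1985). The default Dirichlet parameters (1, 1, 1), the function name and the example counts are assumptions made for illustration.

```python
def theta_posterior_moments(counts, alpha=(1.0, 1.0, 1.0)):
    """A posteriori mean and variance of theta = x2^2 / (x1 * x3).

    counts: observed genotype counts (MM, MN, NN).
    alpha:  Dirichlet a priori parameters (assumed uniform by default).
    """
    a1, a2, a3 = (a + n for a, n in zip(alpha, counts))
    if a1 <= 2 or a3 <= 2:
        raise ValueError("variance requires a1 > 2 and a3 > 2")
    # E[y2^2] * E[1/y1] * E[1/y3] for independent gamma variables
    mean = a2 * (a2 + 1) / ((a1 - 1) * (a3 - 1))
    # E[y2^4] * E[1/y1^2] * E[1/y3^2]
    second = (a2 * (a2 + 1) * (a2 + 2) * (a2 + 3)
              / ((a1 - 1) * (a1 - 2) * (a3 - 1) * (a3 - 2)))
    return mean, second - mean ** 2

# Hypothetical counts, chosen only for illustration.
mean, var = theta_posterior_moments((30, 50, 20))
```

The guard on the first and third parameters reflects the fact that the inverse second moments of a gamma variable exist only when its shape parameter exceeds 2.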
Taking the data on the M, MN and N blood groups of the MNSs system, obtained by Beiguelman (1977) from samples of two populations, one of Brazilians and one of Americans, we obtain the results shown in Table 4 below.

Credibility interval
A Bayesian interval with 95% credibility of containing $\theta$, considering $\theta$ normally distributed, is given by:
$$E(\theta \mid n) \pm 1.96\,\sqrt{\mathrm{Var}(\theta \mid n)}.$$
Based on the a posteriori distribution, this is a credibility interval that has a 95% chance of containing $\theta$. Then, for the sample of Brazilians, according to the results extracted from Table 4, the credibility interval is given by: $\mathrm{CI}(\theta)_{0.95} = 4.39 \pm 2.56$, that is, $1.86 \le \theta \le 6.91$.
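The interval construction can be sketched as follows. The first function applies the normal approximation described above. The second is an alternative, not taken from the source: an equal-tailed Monte Carlo interval obtained by sampling independent gamma variables, which does not rely on normality. The a posteriori parameters (31, 51, 21), the function names, the number of draws and the seed are hypothetical choices for illustration; only the values 4.39 and 2.56 come from the text.

```python
import random

def normal_interval(mean, sd, z=1.96):
    """Normal-approximation credibility interval: mean +/- z * sd."""
    half_width = z * sd
    return mean - half_width, mean + half_width

def monte_carlo_interval(posterior_params, draws=20000, level=0.95, seed=0):
    """Equal-tailed interval for theta = y2^2 / (y1 * y3) by simulation."""
    a1, a2, a3 = posterior_params
    rng = random.Random(seed)
    thetas = []
    for _ in range(draws):
        # Independent gamma draws with a common scale, which cancels in the ratio.
        y1 = rng.gammavariate(a1, 1.0)
        y2 = rng.gammavariate(a2, 1.0)
        y3 = rng.gammavariate(a3, 1.0)
        thetas.append(y2 * y2 / (y1 * y3))
    thetas.sort()
    tail = (1.0 - level) / 2.0
    lower = thetas[int(tail * draws)]
    upper = thetas[int((1.0 - tail) * draws) - 1]
    return lower, upper

# Mean and half-width reported in the text for the Brazilian sample.
low, high = normal_interval(4.39, 2.56 / 1.96)
contains_equilibrium = low <= 4.0 <= high

# Hypothetical a posteriori Dirichlet parameters, for illustration only.
mc_low, mc_high = monte_carlo_interval((31.0, 51.0, 21.0))
```

Because the a posteriori distribution of this ratio is skewed to the right, the simulated interval is generally not symmetric about the mean, unlike the normal approximation.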
It is observed that the equilibrium value of the parameter, $\theta = 4$, indeed falls within the two credibility intervals calculated above, and that the second interval is narrower, that is, more precise, since it refers to a sample that has, besides a lower a posteriori mean, a much smaller a posteriori variance than the first sample.

Results and discussion
Classical statistical methods require that the sample space of the experiment to be performed be fully known. This requirement arises because these methods are based on the probability distribution of the data for each value of the (unknown) parameter. In population genetics, however, the probability distribution of the allelic and genotypic frequencies of a population that meets the Hardy-Weinberg equilibrium conditions is not precisely determined.
In Bayesian inference, instead of taking into account all the infinitely many possible observations that could have occurred but did not, we consider only the results actually observed. Moreover, sample size does not limit its application (Cannings, 1969). Gelman (1977) agrees that all statistical methods that use probabilities are subjective, in the sense that they are based on mathematical idealizations of the world.
According to Reis (2011), it is assumed that the best Bayesian model to study Hardy-Weinberg equilibrium through the inbreeding coefficient is that which uses Dirichlet a priori distributions.
Shoemaker (2018) compares the Dirichlet, beta, uniform and step-uniform a priori distributions, and confirms the Dirichlet a priori distribution as the best option for studying Hardy-Weinberg disequilibrium.
Although the choice of the a priori distribution for the Hardy-Weinberg equilibrium hypothesis was based on the distribution of the homozygous genotypes $MM$, in populations with different allelic frequencies and within quadratic proportions, we could also have chosen an a priori distribution based on the distribution of the heterozygous genotypes $MN$, or even of the homozygous genotypes $NN$. Indeed, writing $x_1$, $x_2$ and $x_3$ for the relative frequencies of the genotypes $MM$, $MN$ and $NN$, given the value of $x_1$ we get the value of $p$, and under the conditions of the said equilibrium $x_1$, $x_2$ and $x_3$ are related by $x_1 = p^2$, $x_2 = 2pq$ and $x_3 = q^2$, with $q = 1 - p$.
The main ground for rejecting the Hardy-Weinberg equilibrium hypothesis is the fact that the a posteriori probabilities of the non-equilibrium hypothesis are greater than the a posteriori probabilities of the equilibrium hypothesis, $P(H_0 \mid n) < P(H_1 \mid n)$.
In the goodness-of-fit test for Hardy-Weinberg equilibrium, the hypothesis of equilibrium was accepted for both samples. The opposite occurs, that is, the equilibrium hypothesis is rejected, when the Bayesian methodology is used. This result is consistent with the action of the evolutionary factors, that is, of those forces capable of altering allele frequencies (mutation, natural selection, gene flow and genetic drift). Bayes factors were obtained for the Brazilian sample and for the North American sample. This fact confirms the rejection of $H_0$, because a Bayes factor equal to 1, or tending to 1, indicates a lower predisposition to accept $H_0$ rather than $H_1$.

Conclusions
The Hardy-Weinberg equilibrium hypothesis was rejected when tested by the Bayesian methodology, which suggests that the Bayesian analysis yielded results relatively closer to the reality of the observed facts.
The Bayesian analysis methods developed in this paper to verify the Hardy-Weinberg equilibrium hypothesis have the advantage of being applicable to samples of any size.
The Bayesian techniques studied led to conclusions that differ markedly from those of the chi-square goodness-of-fit test commonly used to test the Hardy-Weinberg equilibrium hypothesis.
The Bayesian methods presented were efficient for testing Hardy-Weinberg equilibrium. Their application may serve as an aid so that the researcher's decision-making is as close to reality as possible.