RESIDUAL ANALYSIS IN RASCH POISSON COUNTS MODELS

We describe a Rasch Poisson counts (RPC) model for identifying individual latent traits and item facilities in tests that record the error (or success) count in several tasks over time, instead of the correct responses to items as in the dichotomous item response theory (IRT) model. These types of tests can be more informative than traditional tests. To estimate the model parameters, we consider a Bayesian approach using the integrated nested Laplace approximation (INLA). We develop residual analysis to assess model fit by introducing randomized quantile residuals for items. The data used to illustrate the method come from 228 people who took a selective attention test. The test has 20 blocks (items), with a time limit of 15 seconds for each block. The results of the residual analysis were promising and indicated that the studied attention data are not well fitted by the RPC model.


Introduction
In the context of school evaluation, item response theory (IRT) models are a set of probabilistic models where latent characteristics of individuals who take a test and latent characteristics of the items in the test are considered to explain the answers obtained (BAZÁN, 2018). The best-known IRT models are those where the answer is dichotomous, for example the so-called three-parameter IRT model used in Brazil's National High School Exam (ENEM in the Portuguese acronym), a standard non-mandatory exam that evaluates high school students in Brazil.
The model considers three characteristics of the items: difficulty, discrimination and guessing. These are called item parameters and need to be estimated along with the characteristics of the individuals, called latent traits. Particular cases of this model are the so-called two-parameter model (only the difficulty and discrimination parameters are considered) and the one-parameter model (only the difficulty parameter is considered) (HAMBLETON and SWAMINATHAN, 2013).
The one-parameter model, called the Rasch model, was originally formulated by Georg Rasch (1960) and is derived using a different approach. In this case, the probability distribution of the response variable belongs to the exponential family (VERHELST and GLAS, 1995). Thus, the Rasch model, through its additive specification, can be viewed as a generalized linear mixed model (GLMM), an important class of regression models (WANG; YUE; FARAWAY, 2018). Therefore, methods to fit and diagnose GLMMs can also be applied to this model (DOEBLER and HOLLING, 2016).
Count responses are increasingly common in student assessment. For example, consider an assessment where the task is to identify correctly spelled words in a long list of words. Here, the responses to the items correspond to total scores (counts) or to the total number of errors. In these cases, there is also a need for IRT models for count responses. Fortunately, Georg Rasch (1960) formulated a model with these characteristics, called the RPC model. Although this model is not new, it has been increasingly used in recent evaluations, and more complex models have been formulated using it as a basis (e.g., HUNG, 2012; FORTHMANN, GÜHNE and DOEBLER, 2019).
The RPC model can be estimated using a classical approach (see BAGHAEI, RAVAND and NADRI, 2019) or a Bayesian approach (see MUTZ and DANIEL, 2018). Recently, Baghaei and Doebler (2019) showed that the RPC model can be estimated using lme4 (BATES et al., 2015) by treating it as a GLMM. In this work, we develop Bayesian estimation of the Rasch Poisson counts model considering a similar formulation.
Residual analysis is an important tool to assess a model's fit to a given dataset. In the case of IRT models, we are interested in the residual analysis for the items of the test. Thus, a tool for graphical visualization of residuals is important to assess whether an item can be considered to follow the proposed model. In the case of Rasch counts models, Pearson residuals have been studied for estimation using the maximum likelihood method and are currently available for GLMMs through the resid function of the stats package (R CORE TEAM, 2020). In this context, we propose the use of the randomized quantile residuals developed by Dunn and Smyth (1996), and more specifically, we develop residual analysis through graphical visualization using the violin plot proposed by Hintze and Nelson (1998).
The remainder of this paper is organized as follows. In Section 2, we present the Rasch Poisson counts model. The inference based on the Bayesian modeling approach is described in Section 3. Also in that section, we define residual analysis for the items of the test, in particular, Pearson and randomized quantile residuals to evaluate the fit of the items. In Section 4, we apply the RPC model to a real dataset using the Bayesian approach. Finally, in Section 5, we make some concluding remarks.

Rasch Poisson Counts Model
Consider a test of $k$ items applied to $n$ individuals, where the responses obtained correspond to counts such as the number of correct solutions or the number of errors, among others. The Rasch Poisson counts model (RPCM) assumes that the count responses $Y_{ij}$ of individual $i$ on item $j$ are independent and Poisson distributed (RASCH, 1960) with:

$$P(Y_{ij} = y_{ij}) = \frac{\exp(-\mu_{ij})\,\mu_{ij}^{y_{ij}}}{y_{ij}!}, \quad y_{ij} = 0, 1, 2, \ldots \tag{1}$$

where $\mu_{ij} > 0$ is the expected count of individual $i$ on item $j$, with $i = 1, \ldots, n$ and $j = 1, \ldots, k$. In other words, $\mu_{ij}$ is a function of the parameter associated with the "latent trait" or ability of individual $i$, denoted by $\theta_i$, and the "facility" of item $j$, denoted by $\beta_j$. Furthermore, it is assumed to have an additive composition, using the log link function, expressed by:

$$\log(\mu_{ij}) = \beta_j + \theta_i + t_j, \tag{2}$$

that is, $\mu_{ij} = \exp\{\beta_j + \theta_i + t_j\}$, where $n$ is the number of individuals (sample size) and $k$ is the number of items; $\beta_j$ is the facility parameter of item $j$, $\theta_i$ is the latent trait of individual $i$ and $t_j$ is the known time limit for item $j$. The term $t_j$ enters the expression as an offset variable, which is relevant when modeling ratios or rates because individuals do not take the same length of time to answer each item. It can be fixed at zero when there is no time limit.

In Figure 1 we illustrate the expected score (count) for some selected combinations of the item facility parameters and the latent trait values, considering no time limit ($t_j = 0$). In other words, we present the distribution curves of $y_{ij}$ for different values of $\beta_j$ and $\theta_i$. In particular, in Figure 1(a), we fix the value of the latent trait at $\theta_i = 0.5$ and vary the item facility, $\beta_j = (0.2, 0.5, 0.9)$. In contrast, in Figure 1(b), we fix the item facility at $\beta_j = 0.8$ and vary the latent trait, $\theta_i = (-1.5, 0, 1.5)$. In Figure 1(a), we observe that, for a given latent trait value, the expected value of the response variable $\mu_{ij}$ is higher if the item is easier. Likewise, in Figure 1(b), for a given item facility, the expected value of the response variable $\mu_{ij}$ is higher if the individual has a greater latent trait.
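As a minimal sketch of the relationship illustrated in Figure 1, the following Python snippet computes the expected count under the log link for the parameter values quoted above, with no time limit. The function names are illustrative assumptions and are not part of the original analysis, which was carried out in R.

```python
import math

def expected_count(beta_j, theta_i, t_j=0.0):
    """Expected count mu_ij = exp(beta_j + theta_i + t_j) under the log link."""
    return math.exp(beta_j + theta_i + t_j)

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) for mean mu, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Setting of Figure 1(a): latent trait fixed at 0.5, item facility varies.
means_a = [expected_count(b, 0.5) for b in (0.2, 0.5, 0.9)]

# Setting of Figure 1(b): item facility fixed at 0.8, latent trait varies.
means_b = [expected_count(0.8, th) for th in (-1.5, 0.0, 1.5)]

for label, means in (("(a)", means_a), ("(b)", means_b)):
    print(label, [round(m, 3) for m in means])

# Distribution of the count for the first setting of panel (a).
print([round(poisson_pmf(y, means_a[0]), 3) for y in range(6)])
```

In both settings the expected count increases with the parameter that is varied, which is the pattern described for the two panels.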
[Figure 1, panel (b): item facility is fixed and the latent traits vary.]

The RPC model assumes the property of conditional independence, that is, for individual $i$, the responses $y_{ij}$ corresponding to the items $j$ are conditionally independent given the value of the latent trait of the individual, $\theta_i$. In addition, the model assumes independence between the responses of different individuals. So, letting $\beta = (\beta_1, \ldots, \beta_k)^\top$, $\theta = (\theta_1, \ldots, \theta_n)^\top$ and $t = (t_1, \ldots, t_k)^\top$, and considering the model's assumptions and equation (1), the likelihood function is given by:

$$L(\beta, \theta \mid y, t) = \prod_{i=1}^{n} \prod_{j=1}^{k} \frac{\exp(-\mu_{ij})\,\mu_{ij}^{y_{ij}}}{y_{ij}!}, \quad \mu_{ij} = \exp\{\beta_j + \theta_i + t_j\}. \tag{3}$$

Several methods exist to estimate the parameters of the RPC model (VERHELST and KAMPHUIS, 2009), such as conditional maximum likelihood (CML), joint maximum likelihood (JML), and marginal maximum likelihood (MML), among others (BAGHAEI and DOEBLER, 2019). Some authors, like Jansen and Duijn (1992), impose the constraint $\sum_{j=1}^{k} \beta_j t_j = 1$ to identify the model when estimating by marginal maximum likelihood. Also, Jansen (1994) proposed the use of a distribution for the latent traits within an EM algorithm, $\theta_i \sim N(0, \sigma^2)$. Using this specification and the formulation of additive effects (equation 2), the RPCM can be seen as a generalized linear mixed model, in which $\theta_i$ are the individual random effects, $\beta_j$ are the fixed effects associated with the items and $t_j$ is an offset, that is, a known constant added to the regression equation.
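To make the likelihood concrete, the sketch below evaluates its logarithm for a small response matrix, using the conditional independence across items and individuals stated above. The toy data and parameter values are hypothetical and serve only to show the computation.

```python
import math

def rpc_log_likelihood(y, beta, theta, t):
    """Log-likelihood of the RPC model.

    Sums -mu_ij + y_ij * log(mu_ij) - log(y_ij!) over individuals i and
    items j, with log(mu_ij) = beta_j + theta_i + t_j.
    """
    total = 0.0
    for i, theta_i in enumerate(theta):
        for j, beta_j in enumerate(beta):
            eta = beta_j + theta_i + t[j]  # linear predictor, log(mu_ij)
            total += -math.exp(eta) + y[i][j] * eta - math.lgamma(y[i][j] + 1)
    return total

# Hypothetical toy data: 3 individuals (rows) and 2 items (columns).
y = [[2, 3], [0, 1], [4, 6]]
beta = [0.2, 0.6]          # item facilities (assumed values)
theta = [0.3, -0.8, 0.9]   # latent traits (assumed values)
t = [0.0, 0.0]             # no time limit

ll = rpc_log_likelihood(y, beta, theta, t)
print(round(ll, 4))
```

Working on the log scale avoids the numerical underflow that the product form would produce for larger datasets.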

Bayesian Inference and Residual Analysis
To obtain the estimates of the parameters of the RPC model, we consider a Bayesian approach and propose priors for $\theta_i$ and $\beta_j$ in order to obtain the posterior distribution of these parameters. In this context, we propose the use of the integrated nested Laplace approximation (INLA) approach developed by Rue, Martino and Chopin (2009). The INLA method is a deterministic approach to Bayesian inference for the wide class of latent Gaussian models, which includes generalized linear mixed models (RUE et al., 2017), in which the response variable $Y_{ij}$, with mean $\mu_{ij}$, is linked to the additive structure of the linear predictor $\eta_{ij}$ through a link function $g(\cdot)$, such that $g(\mu_{ij}) = \eta_{ij}$. The use of INLA for the RPC model is very convenient because the Poisson distribution belongs to the exponential family and the latent traits are assigned a Gaussian distribution, so the model fits into this class.
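The additive structure of the RPC model given by (2), as already mentioned, can be adapted as a mixed Poisson regression model and therefore can be written using a hierarchical structure:

$$\begin{aligned} Y_{ij} \mid \beta_j, \theta_i &\sim \text{Poisson}(\mu_{ij}), \\ \log(\mu_{ij}) &= \beta_j + \theta_i + t_j, \\ \theta_i \mid \sigma_\theta^2 &\sim N(0, \sigma_\theta^2), \end{aligned} \tag{4}$$

where $\sigma_\theta^{-2} = \tau_\theta$ is the precision parameter.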
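The hierarchical structure can be read as a data-generating recipe: draw each latent trait from a normal distribution with precision $\tau_\theta$, then draw each count from a Poisson distribution with the corresponding mean. The following sketch simulates data in this way. It is not the INLA estimation itself, and the facility values, the precision and the helper names are assumptions chosen for illustration.

```python
import math
import random

def rpois(mu, rng):
    """Draw one Poisson count by Knuth's multiplication method (small means)."""
    limit = math.exp(-mu)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_rpc(beta, tau_theta, t, n, seed=1):
    """Simulate latent traits and an n x k count matrix from the hierarchy."""
    rng = random.Random(seed)
    sigma_theta = 1.0 / math.sqrt(tau_theta)  # precision -> standard deviation
    theta = [rng.gauss(0.0, sigma_theta) for _ in range(n)]
    y = [[rpois(math.exp(b + th + tj), rng) for b, tj in zip(beta, t)]
         for th in theta]
    return theta, y

# Hypothetical setting: 3 items, 5 individuals, no time limit.
beta = [0.2, 0.4, 0.6]
theta, y = simulate_rpc(beta, tau_theta=4.0, t=[0.0, 0.0, 0.0], n=5)
for row in y:
    print(row)
```

Simulated data of this kind are useful for checking that an estimation routine recovers known parameter values before it is applied to real responses.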
Residuals carry important information for checking the assumptions that underlie statistical models, and therefore play an important role in data analysis. The use of residuals allows detecting discrepancies between the model and specific observations, in addition to providing an overview in terms of goodness-of-fit. For the Poisson model, Pearson residuals are commonly used (BAGHAEI and DOEBLER, 2019). They are defined by:

$$r_{ij} = \frac{y_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}}},$$

where $\hat{\mu}_{ij}$ is the posterior mean of $Y_{ij}$, obtained using $\hat{\mu}_{ij} = \exp\{\hat{\beta}_j + \hat{\theta}_i + t_j\}$, with $\hat{\theta}_i$ and $\hat{\beta}_j$ being the posterior means of the latent trait and the facility of the item, respectively.
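A minimal sketch of the Pearson residual computation follows. It assumes that point estimates of the item facilities and latent traits (for example, posterior means) are already available; the numeric values below are hypothetical.

```python
import math

def pearson_residuals(y, beta_hat, theta_hat, t):
    """Pearson residuals (y_ij - mu_ij) / sqrt(mu_ij) for a Poisson model."""
    residuals = []
    for i, theta_i in enumerate(theta_hat):
        row = []
        for j, beta_j in enumerate(beta_hat):
            mu = math.exp(beta_j + theta_i + t[j])  # fitted mean
            row.append((y[i][j] - mu) / math.sqrt(mu))
        residuals.append(row)
    return residuals

# Hypothetical counts and estimates: 2 individuals, 2 items, no time limit.
y = [[1, 3], [2, 0]]
beta_hat = [0.0, 0.5]
theta_hat = [0.0, -0.2]
r = pearson_residuals(y, beta_hat, theta_hat, [0.0, 0.0])
print([[round(v, 3) for v in row] for row in r])
```

The division by the square root of the fitted mean reflects the Poisson property that the variance equals the mean.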
In this paper, we propose the use of randomized quantile residuals (DUNN and SMYTH, 1996), which are defined by:

$$q_{ij} = \Phi^{-1}\left\{ F(y_{ij} - 1; \hat{\mu}_{ij}) + u_{ij}\, f(y_{ij}; \hat{\mu}_{ij}) \right\},$$

where $\Phi^{-1}(\cdot)$ is the quantile function of the standard normal distribution, the terms $f(\cdot)$ and $F(\cdot)$ represent the probability mass function and cumulative distribution function of the Poisson distribution, respectively, and $u_{ij}$ is a value drawn from the uniform distribution on the interval (0, 1). Pearson residuals have an asymptotically normal distribution (CORDEIRO and SIMAS, 2009), that is, $r_{ij} \approx N(0, 1)$, and quantile residuals have an exact normal distribution (DUNN and SMYTH, 1996), that is, $q_{ij} \sim N(0, 1)$. In a well-specified model, the residuals are expected to be concentrated around zero, with about 95% of them falling between −1.96 and 1.96. To check the fit of a particular item, we present the distribution of the residuals of the different individuals for that item using graphical methods: boxplot and violin plot.
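The construction of a randomized quantile residual for a Poisson count can be sketched as follows. The sketch uses one common form of the Dunn and Smyth (1996) definition for discrete data, in which a uniform draw spreads the probability mass of the observed count over the interval between $F(y - 1)$ and $F(y)$ before the standard normal quantile function is applied. The helper names, the seed and the mean of 2.5 are assumptions for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) for mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def poisson_cdf(y, mu):
    """Poisson P(Y <= y); returns 0 when y is negative."""
    return sum(poisson_pmf(k, mu) for k in range(y + 1))

def randomized_quantile_residual(y, mu, rng):
    """Map a count to the normal scale through a randomized CDF value."""
    u = rng.random()
    p = poisson_cdf(y - 1, mu) + u * poisson_pmf(y, mu)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep p strictly inside (0, 1)
    return NormalDist().inv_cdf(p)

def poisson_draw(mu, rng):
    """Draw a Poisson count by inverting the CDF (capped as a safeguard)."""
    u, y = rng.random(), 0
    cum = poisson_pmf(0, mu)
    while u > cum and y < 1000:
        y += 1
        cum += poisson_pmf(y, mu)
    return y

# Counts simulated from the same Poisson mean that is used for the residuals,
# so the residuals should behave like standard normal draws.
rng = random.Random(2021)
mu = 2.5
draws = [poisson_draw(mu, rng) for _ in range(2000)]
q = [randomized_quantile_residual(y, mu, rng) for y in draws]
print(round(statistics.mean(q), 2), round(statistics.stdev(q), 2))
```

When the counts really come from the assumed Poisson mean, the sample mean and standard deviation of the residuals should be close to 0 and 1, which is the reference behavior against which item residuals are judged.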
The violin plot combines the boxplot graph and density estimation in a single graph. In other words, it adds the available information from local density estimates to the basic summary statistics inherent in a boxplot (HINTZE and NELSON, 1998). Thus, this combination of the density format and statistics summarized in a single graph provides a useful tool to illustrate the model goodness-of-fit, to detect incorrect specifications of the error distribution, and to identify the behavior of the distribution of errors and potential items with problematic fit.
In the Appendix, we show the R code used in the application for the Bayesian estimation and the residual analysis of the Rasch Poisson counts model.

Application
Here we illustrate the Bayesian estimation of the RPC model considering the data presented by Baghaei and Doebler (2019), referring to a study of 228 people on the selective attention test proposed by Beyzaee (2017). The test consists of 20 blocks with a time limit of 15 seconds to perform the task in each block, in which the participants need to cross out the numbers 2 and 7 in three lines of randomly arranged digits and letters. An example block is given below.

G O X C 7 M J 7 H Z R N G A S Y W Q L H B Z G J N V 7 E T P R V M J H S T Q C 7 K L W C 7
X M T 7 K T R 2 A V P I W O C 2 G J 7 L S 2 B N V W 7 T O X R 2 P H 7 F D A B M 2 W H K A S T 2 O P H W E D 2 T R N E Q X 2 P K L 7 P K 7 Z C V 7 2 Z 7 E T G H L K S D I N 7 S 2 W I S N 7 T B M O P W

Thus, the variables in the dataset are: "ID", the student's identification number; "Item", referring to the block of letters and numbers that the students have to verify; "Hit", denoting the total number of correct checks on each item for each subject; and "TL", the time limit (in seconds) defined for each item. In the same manner as Baghaei and Doebler (2019), each block is considered an item, and the total number of correct cross-outs of 2's and 7's, recorded as "Hit", is modeled as the unit of analysis. Thus, we fit the RPC model as defined in equation (4) considering the Bayesian approach. In Table 1, we show the posterior summary of the item parameters of the test. The item facility estimates range from 0.157 to 0.648, where item 2 is the most difficult item and item 12 is the easiest, as shown in Figure 2. These values are very close to those obtained using lme4. The data can be requested directly from the authors of Baghaei and Doebler (2019).

To assess model goodness-of-fit, we conduct a residual analysis using Pearson and randomized quantile residuals. Table 2 shows summary measures of both types of residuals. Considering these results, we found outliers. Additionally, as shown in Figure 3, where lines at the values −3 and 3 have been added, the Pearson residuals identify only items 5, 13 and 15 as presenting outlying values. In contrast, considering the randomized quantile residuals in Figure 3(a), a larger number of items is identified as having outliers (2, 3, 5, 9, 13, 15 and 17). In order to clarify the distribution of the outliers detected using randomized quantile residuals, we report the distribution of the quantile residuals of these items using the violin plot (Figure 4).
We chose to report the results of the randomized quantile residuals, since these residuals are expected to follow a normal distribution, unlike Pearson residuals, which are only asymptotically normal. It is possible to show this behavior using a Q-Q plot; however, we [...]

Using the violin plot, we can identify not only the spread of the residuals but also the shape of their distribution for each item. We note that items 5, 13 and 15 show a serious departure from normality. These results indicate that the proposed RPC model may not be the best model for the data. This shows the advantages of using residual analysis. Additionally, Figure 5 shows the distribution of the individual latent trait parameters. This distribution is approximately normal around zero. Finally, the hyperparameter associated with the dispersion of the latent trait parameter presents a posterior mean of $\hat{\mu}_{\theta} = 44.310$ and a posterior standard deviation of 4.582, indicating a small variance for the latent trait.
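The item screening used in this section, in which an item is flagged when any of its residuals falls outside the reference lines at −3 and 3, can be sketched as below. The residual matrix is hypothetical and does not reproduce the attention-test results.

```python
def flag_items(residuals, threshold=3.0):
    """Return 1-based item numbers with at least one |residual| > threshold."""
    n_items = len(residuals[0])
    flagged = []
    for j in range(n_items):
        if any(abs(row[j]) > threshold for row in residuals):
            flagged.append(j + 1)
    return flagged

# Hypothetical residual matrix: 4 individuals (rows) by 3 items (columns).
residuals = [
    [0.4, -1.2, 0.3],
    [-0.9, 3.6, 1.1],
    [1.5, 0.2, -0.7],
    [-0.1, -0.5, -3.2],
]
flagged = flag_items(residuals)
print(flagged)
```

A flag of this kind only marks candidates for closer inspection; the shape of the residual distribution for each flagged item still has to be examined, for example with a violin plot.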

Conclusions
In this work, we present a Bayesian approach to estimate the parameters of the Rasch Poisson counts model using the INLA method. INLA is an alternative to other Bayesian methods; it is commonly used in the statistical literature, but not in the psychometric literature. We use a specification of the RPC model as a mixed Poisson regression model. Results, not shown here, indicated that our estimates were very close to those obtained using glmer, a function that fits generalized linear mixed models by marginal maximum likelihood, as proposed in Baghaei and Doebler (2019). With our formulation, extensions of Rasch models such as those proposed by De Boeck and Wilson (2004) can be explored and easily implemented from a Bayesian approach using the INLA method. We illustrate the estimation method with real data from the attention test presented by Baghaei and Doebler (2019).
Additionally, we introduce the use of randomized quantile residuals to evaluate the fit of each item of the test. We show that Pearson and randomized quantile residuals give discrepant information about the number of items that are not well fitted by the RPC model. We prefer randomized quantile residuals because they are normally distributed under a well-fitting model. We also show that violin plots are useful for displaying the fit of the test items. Applying the proposed residual analysis to the attention data, our results show that the RPC model is not the best model for these data, so other models, such as those of Hung (2012) and Forthmann, Gühne and Doebler (2019), can be studied in the future with this dataset.