Abstract
A three-parameter logistic (3PL) model variant, named the two-parameter logistic extension (2PLE) model, was developed. Instead of the fixed guessing parameter used in the 3PL model, the new model quantifies guessing behavior with a function that integrates item features and depends on an examinee’s ability level. The correct response probabilities from both solution behavior and guessing behavior increase as the level of ability increases. In the extreme case in which the level of ability approaches negative infinity, the 2PLE model degenerates into a 3PL model with a guessing probability at chance level (i.e., 1/m, where m is the number of options). The properties of the 2PLE model were described and compared with those of other guessing models. Then, a simulation study comparing the performance of the 2PLE model with that of the 3PL model under three scenarios was conducted. Results showed that the 2PLE model generally outperforms the 3PL model. Finally, the application of the new model in comparison with several existing models was demonstrated by using two real data sets.
Introduction and Motivation
Multiple-choice items are extensively used in standardized tests. Thus, it is reasonable to assume that respondents guess when they are in doubt about the correct response (San Martín, del Pino, & de Boeck, 2006). The most popular item response theory (IRT) model that includes guessing is the three-parameter logistic (3PL) model (Birnbaum, 1968), which has been discussed in many papers and books (e.g., Baker & Kim, 2004; Hambleton, Swaminathan, & Rogers, 1991; Han, 2012; van der Linden & Hambleton, 1997; von Davier, 2009). However, several past studies revealed that the 3PL model has technical and theoretical limitations. In particular, item parameters are often estimated poorly when the numbers of items and respondents are small (Swaminathan & Gifford, 1979), and the guessing parameter does not depend on a person’s latent trait level.
The aim of the present study is to propose a 3PL model variant including a guessing component that depends on ability levels and item characteristics. First, the authors briefly review the viewpoints in the studies of San Martín et al. (2006) and van der Maas, Molenaar, Maris, Kievit, and Borsboom (2011), which motivate the current research. Second, they summarize various existing models that quantify guessing behavior. These models form the prototype of the new model. Third, a new integrated model is introduced, followed by a discussion of model properties. Fourth, a series of simulation studies is presented. Finally, two real data examples are described.
Alternative Models With Guessing
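In conventional IRT models, the probability of a correct item response depends on the characteristics of items and respondents. For instance, in the popular two-parameter logistic (2PL) model (Birnbaum, 1968), the probability of a correct response of respondent j to item i is
$$P(Y_{ij} = 1 \mid \theta_j) = \Phi_i(\theta_j) = \frac{\exp[a_i(\theta_j - b_i)]}{1 + \exp[a_i(\theta_j - b_i)]}, \tag{1}$$
where ai is the discrimination parameter of item i, bi is the difficulty parameter of item i, and θj is the ability of respondent j.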
The 2PL model is widely used because it is more flexible than the one-parameter logistic (1PL) model (Rasch, 1960), in which all ai’s are equal (ai = 1), although it still provides a relatively parsimonious representation of the association structure in the data. The 2PL model identifies the correct response probability primarily from the solution behavior. Although the 2PL model, similar to other psychometric models, focuses on the representation of individual differences, Tuerlinckx and De Boeck (2005) showed the equivalence between the 2PL model and the diffusion process model for two-choice responses. Their conclusion was further extended in the study of van der Maas et al. (2011). According to the equivalence findings, the 2PL model is somewhat related to the response process models, and the 2PL model can reflect a response process indirectly to some extent. However, the 2PL model only models the successful problem-solving process, and it does not include guessing strategy. Thus, the item characteristic curve (ICC) of the 2PL model has an asymptote of zero. Multiple new models based on the 2PL model have been introduced for solving the guessing problem.
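As a brief illustration of the zero lower asymptote, the following is a minimal sketch of the 2PL item characteristic curve, assuming the standard logistic form without a scaling constant. The function name `icc_2pl` and the item values a = 1.2 and b = 0.5 are hypothetical and chosen only for illustration.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under the standard 2PL form."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination a = 1.2, difficulty b = 0.5.
for theta in (-8.0, -2.0, 0.5, 2.0, 8.0):
    prob = icc_2pl(theta, a=1.2, b=0.5)
    print(f"theta = {theta:5.1f}  P(correct) = {prob:.4f}")
```

The probability equals 0.5 at θ = b and tends to zero as θ decreases, which is the behavior that guessing models are designed to correct for multiple-choice items.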
Models With Guessing
Guessing is generally accommodated in IRT through (a) single-process models, such as the Q-diffusion model (van der Maas et al., 2011) and the three-parameter residual heteroscedasticity (3P-RH) model (Lee & Bolt, 2018), and (b) two-process models, which decouple the p process of searching for a correct answer from the g process of guessing (San Martín et al., 2006). One example of the two-process model is the 3PL model. This two-process interpretation, although attractive, does not include other possible serial or parallel processes, and thus single-process models are preferred when the p process and the g process are intertwined.
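The two-process reading of the 3PL model can be sketched numerically. The example below assumes the standard 3PL form c + (1 − c)·P2PL with a constant guessing probability c, and checks that it coincides with "solve first, then guess on failure" and with "guess first, then solve on failure". All function names and parameter values are hypothetical.

```python
import math


def p_solve(theta, a, b):
    """Success probability of the p (solution) process, taken here as the 2PL."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def solve_then_guess(p, g):
    """p process first; the g process runs only if the p process fails."""
    return p + (1.0 - p) * g


def guess_then_solve(p, g):
    """g process first; the p process runs only if the g process fails."""
    return g + (1.0 - g) * p


def icc_3pl(theta, a, b, c):
    """Standard 3PL form with a constant guessing parameter c."""
    return c + (1.0 - c) * p_solve(theta, a, b)


theta, a, b, c = 0.0, 1.2, 0.5, 0.25  # hypothetical values
p = p_solve(theta, a, b)
print(solve_then_guess(p, c), guess_then_solve(p, c), icc_3pl(theta, a, b, c))
```

Both orderings reduce to p + g − pg, so the order of the two processes cannot be distinguished from the response probability alone when g is a constant.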
In the two-process model,
which can also be rewritten as
The second term,
San Martín et al. (2006) argued that three arrangements are possible for the execution of the two processes. In the first one, the g process comes first, and the p process is executed when the g process fails. This arrangement is reflected in Equation 2. That is, a person first makes a guess with a success probability of
Several models that vary the specific forms of the guessing function
Pelton (2002) found that the parameter recovery accuracy for the 3PL model depends on the amount of guessing present in the data. Thus, the estimation of the guessing parameters is unstable. In fact, even when converged estimates are obtained successfully, the standard errors (SEs) of the ci parameter estimates are too large to be practical when the sample size is small (<2,000) or when the response matrix is moderately sparse (missing proportion >50%; Han, 2012). As Holland (1990) alluded, the fundamental reason may be that a one-dimensional test can support only two parameters per item. Roughly speaking, as the number of model parameters increases, the number of data points per parameter decreases, and this decrease may cause severe instability (San Martín et al., 2006). Therefore, reducing the number of parameters has been considered.
When
Maris (2002) noted that neither the 1PL-G model nor the 3PL model is identifiable.
San Martín et al. (2006) suggested that the 3PL model can be used only for large samples unless the guessing parameters are equal to a known or unknown constant. Birnbaum (1968) conjectured that low-ability subjects select a correct response by chance such that the guessing parameter is the same as the chance level
The 3PL with FLA model has a fixed asymptote of
Pelton (2002) described another limitation of the 3PL model. Calibrating the guessing parameters on samples of high-ability respondents and on samples of low-ability respondents may produce substantially different item parameter estimates. Thus, it is reasonable to presume that, in some cases, the success of guessing depends on ability level. Based on this idea, a function
Based on Equation 6, San Martín et al. (2006) proposed a model called the one-parameter logistic ability-based guessing (1PL-AG) model. The success probability of guessing in the 1PL-AG model is defined as
whereas the success of solution behavior still follows 2PL, that is,
The lower asymptote in the 1PL-AG model is zero, which may not hold for multiple-choice items because, for these items, the worst a respondent can do is guess at random. That is, he or she approaches the guessing probability of chance level rather than zero (van der Maas et al., 2011). This property is one of the main points of the Q-diffusion model, and it continues to hold in the proposed model.
Motivation of the New Model
The motivation for the new model is twofold. First, empirical evidence shows that the average estimated ci-parameter in the 3PL model is close to 1/m (Han, 2012). As a result, the assumption that low-ability examinees are attracted to incorrect responses is unsubstantiated. Therefore, in theory, the lower asymptote of an ICC should be close to 1/m, which is the success probability of a completely random guessing process.
Second, the success of “educated” guessing depends on an examinee’s ability (San Martín et al., 2006; van der Maas et al., 2011). This ability-based guessing process happens when an examinee guesses an answer from the remaining options after eliminating wrong options. In fact, “almost all formula-scoring instructions in use today, while advising avoidance of completely blind guessing, do encourage examinees to guess whenever they can eliminate a wrong choice” (Frary, 1988, p. 34).
Therefore, it is rational to assume that the probability of a correct guess is 1/m when
The 2PLE Model
Formulation of the Guessing Function,
For item i with mi options, assume that the guessing function relates only to the item parameters (mi, ai, bi) and θ. In the 2PL model, the probability of an incorrect response is
Because it is reasonable to assume that a high probability of guessing an item right corresponds to a low possibility of an incorrect response, the correct guessing probability p(g) should have a monotonically decreasing relationship with
Equation 9 can be rewritten as
Here, note that this guessing function (Equation 10) integrates information from both ability levels and item characteristics; it is called the integrated guessing (IG) function, which is denoted as
Notably, our proposed model based on this IG function preserves some axiomatic properties of the 2PL model (such as identifiability and linear invariance). Next, the authors focus on exploring these properties in detail.
Formulation of the Model
By plugging the guessing function into Equation 6 and replacing the success solution probability by the 2PL model, the item response function of the new model has the following form:
Note that the 3PL model can be viewed as a special case of this equation by replacing
The 2PLE model maintains the monotonicity feature of the original 3PL model in which the correct response probability monotonically increases with increasing
In the 2PLE model, when
The average of three numbers is V. If one of the numbers is Z, and another is Y, what is the remaining number?
Three numbers are provided, as well as their average. Thus, options (A) and (E) without 3 may be wrong. Some examinees can eliminate one or two response options by using their logical ability.
Identifiability of the 2PLE Model
The 2PLE model is identified with the following restriction:
Here,
when
Also, because Φi is a monotonic function of θ, the ICC of the 2PLE model is monotonically increasing as a function of θ. Therefore,
Interpretation of the Model Parameters
The interpretation of item parameters in the 2PLE model is similar to that in the 3PL model. This statement can be verified by applying the idea of Lord and Novick (1968).
In the 2PLE model, when
Therefore,
For the 2PLE model, taking the first-order partial derivative of
In particular, when
At point bi, the ICC of the 2PLE model has a slope that is a function of ai. The slope of the tangent line of the ICC at bi is proportional to the magnitude of ai. Thus, from a geometric perspective, ai can be viewed as the discrimination parameter.
The meaning of the ability parameter is similar to that in the 1PL-AG model, and the ability level not only plays a crucial role in the problem-solving process that is captured in the original 2PL model but also influences the guessing process.
Rationality of the 2PLE Model
For item i, the lower asymptotic line of the ICC of the 2PLE model is equal to

Figure 1. The 2PLE model where ai = 1, bi = 1, and mi = 4.
Note that for small negative ability values (e.g.,
The guessing part of the 2PLE model
is illustrated in Figure 2a and 2b, which present how the guessing component changes as a function of θ. The contribution of the guessing part in the 2PLE model to the probability of success is a single-peaked function of θ. The probability of guessing an item correctly reaches maximum value near the point of

Figure 2. Guessing part (i.e., the second term in Equation 3) of the two models for two items as a function of θ.
Figure 3 illustrates how the guessing part changes as a function of bi for a fixed θ level. The guessing part in the 2PLE model is a single-peaked function of bi with the maximum value near the point

Figure 3. Guessing part of the two models for two fixed θ levels as a function of item difficulty.
Finally, because the problem-solving process and the guessing process in the 2PLE model share the same set of item parameters (i.e.,

Figure 4. Conditional probability of solving the item correctly (“ϕ” from 2PL) and the conditional probability of guessing successfully (
Simulation and Comparison
Two evaluation indexes were used for the evaluation of the parameter estimation: the absolute bias (
where R is the number of replications. The parameter recovery simulation and details for estimating parameters by using the 2PLE model are provided in the Online Supplement.
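The two recovery indexes can be sketched as follows. This is a minimal illustration that assumes the common definitions, namely |BIAS| as the absolute value of the mean estimation error over R replications and RMSE as the square root of the mean squared error; the exact formulas used in the study may differ in detail. The function names and the estimates below are hypothetical.

```python
import math


def abs_bias(estimates, true_value):
    """Absolute value of the mean error over R replications (assumed definition)."""
    r = len(estimates)
    return abs(sum(est - true_value for est in estimates) / r)


def rmse(estimates, true_value):
    """Root mean squared error over R replications (assumed definition)."""
    r = len(estimates)
    return math.sqrt(sum((est - true_value) ** 2 for est in estimates) / r)


# Hypothetical ability estimates from R = 4 replications with true theta = 1.0.
theta_hats = [1.1, 1.3, 0.9, 1.1]
print("|BIAS| =", abs_bias(theta_hats, 1.0))
print("RMSE   =", rmse(theta_hats, 1.0))
```

Under these definitions, |BIAS| captures systematic over- or underestimation, whereas RMSE also reflects the variability of the estimates across replications.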
Model Comparison
A simulation study was conducted to investigate the performance of the 2PLE model by comparing the recovery of θ under the 2PLE and 3PL models. Because the true data-generating model cannot be determined in practice, the study consisted of three scenarios.
Scenario 1: Whether the 2PLE model is better than the 3PL model was determined by analyzing the data generated from the 2PLE model with the 2PLE and 3PL models.
Scenario 2: Mixed data generated from the 2PLE and 3PL models were analyzed by using both models for the identification of any potential advantage of the 2PLE model over the 3PL model.
Scenario 3: The feasibility of using the 2PLE model as a substitute for the 3PL model was determined by analyzing the data generated from the 3PL model with the 2PLE and 3PL models.
Two evaluation criteria for model selection were used in this simulation: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Given a collection of competing models for the data, the AIC estimates the quality of each model relative to the other models. The BIC is a criterion for model selection among a finite set of models, and the model with the lowest BIC value is preferred. The formulas are as follows:
$$\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln n,$$
where L is the maximized value of the likelihood function, k is the number of estimated parameters, and n is the sample size.
In the three scenarios, all discrimination parameters were randomly generated from the log-normal distribution log N(exp(1), 0.5). The difficulty parameters of the dichotomous items were randomly generated from the standard normal distribution N(0, 1), and the guessing parameters were set to 0.25 in these scenarios. This condition corresponds to 1/m (m = 4) in the 2PLE model. In the simulation, 11 equally spaced values for
Scenario 1
For each replication, 40 dichotomous item responses were simulated with the 2PLE model. The same item responses were fitted with the 2PLE and 3PL models.
The derivation and implementation of the MLE of θ with the 3PL model can be found in the literature (see, e.g., Baker & Kim, 2004; du Toit, 2003). The |BIAS| and RMSE are shown in Figure 5.
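To make the idea of ability estimation under the 3PL model concrete, the following is a minimal sketch that treats the item parameters as known and maximizes the log-likelihood in θ by a simple grid search. This is an illustrative substitute for the Newton-type procedures described in the cited literature, not the procedure used in the study. The item-parameter distributions in `simulate_items` (log-normal discriminations with log-scale mean 0 and standard deviation 0.5, standard normal difficulties, c = 0.25) and all function names are assumptions made for this example.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response, clamped away from 0 and 1."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return min(max(p, 1e-12), 1.0 - 1e-12)


def simulate_items(n_items, rng, c=0.25):
    """Draw hypothetical item parameters (a, b, c); distributions are assumed."""
    return [(rng.lognormvariate(0.0, 0.5), rng.gauss(0.0, 1.0), c)
            for _ in range(n_items)]


def simulate_responses(theta, items, rng):
    """Simulate 0/1 responses of one examinee with ability theta."""
    return [1 if rng.random() < p_3pl(theta, a, b, c) else 0
            for a, b, c in items]


def log_likelihood(theta, items, responses):
    """Log-likelihood of theta given known item parameters."""
    total = 0.0
    for (a, b, c), y in zip(items, responses):
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def mle_theta(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search maximizer of the log-likelihood on [lo, hi]."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n_steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


rng = random.Random(2019)          # fixed seed for reproducibility
items = simulate_items(40, rng)    # 40 items, as in the scenarios
responses = simulate_responses(0.5, items, rng)
print("estimated theta:", mle_theta(items, responses))
```

A grid search is slower than Newton-Raphson but avoids convergence problems when the 3PL likelihood is flat or has more than one local maximum; with only 40 items, the estimate can still deviate noticeably from the generating value.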

Figure 5. Comparison of the absolute bias (|BIAS|) and RMSE of the ability parameter between the 3PL and 2PLE models in Scenario 1.
In Scenario 1, the 2PLE and 3PL models were both estimated for the data set generated from the 2PLE model. Figure 5 shows that the
Scenario 2
In each replication, 20 dichotomous item responses were simulated according to the 2PLE model, and 20 responses were simulated according to the 3PL model. The

Figure 6. Comparison of the absolute bias (|BIAS|) and RMSE of the ability parameter between the 3PL model and the 2PLE model in Scenario 2.
In Scenario 2, the 2PLE and 3PL model parameters were estimated with a mixed data set. Figure 6 shows that nearly all the values of
Scenario 3
In each replication, 40 dichotomous item responses were simulated according to the 3PL model. The same item responses were used for the 2PLE and 3PL models for the estimation of ability parameters. The

Figure 7. Comparison of the absolute bias (|BIAS|) and RMSE of the ability parameter between the 3PL model and the 2PLE model in Scenario 3.
In Scenario 3, the 2PLE and 3PL model parameters were estimated by using the data generated from the 3PL model. Figure 7 shows that the
Model Selection
Table 1 provides the AIC and BIC values in Scenarios 1, 2, and 3.
Table 1. AIC and BIC Values for Scenarios 1 Through 3.
Note. The AIC and BIC values using the 2PLE model are consistently smaller than those of the 3PL model in Scenarios 1 and 2. In other words, the 2PLE model is preferred in the two scenarios. AIC = Akaike information criterion; BIC = Bayesian information criterion; 2PLE = two-parameter logistic extension; 3PL = three-parameter logistic.
In Scenario 3, the AIC from the 2PLE model is larger than that of the 3PL model, but the BIC is smaller for the 2PLE model. This result is consistent with previous findings that AIC tends to select overparameterized models (e.g., Kass & Raftery, 1995), whereas BIC tends to favor parsimonious models. Therefore, in this scenario, the 2PLE model can still be used as a substitute for the 3PL model. Table 1 also presents the number of replications in which each model is selected by AIC/BIC, and the results are consistent with the findings from the average AIC/BIC. That is, the 2PLE model is selected by both criteria except in Scenario 3, in which AIC picks the 3PL model most of the time.
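The way AIC and BIC can disagree is easy to reproduce with the standard definitions, AIC = −2 ln L + 2k and BIC = −2 ln L + k ln n. The sketch below uses purely hypothetical log-likelihoods and a hypothetical sample size of 1,000 examinees; only the parameter counts (three per item for the 3PL model and two per item for a two-parameter model, with 40 items) follow the setup described in the text.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion (standard definition)."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (standard definition)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


n_items, n_examinees = 40, 1000        # sample size is hypothetical
models = {
    # name: (hypothetical log-likelihood, number of item parameters)
    "3PL": (-20000.0, 3 * n_items),
    "2PLE": (-20050.0, 2 * n_items),
}

for name, (log_lik, k) in models.items():
    print(name, "AIC =", round(aic(log_lik, k), 1),
          "BIC =", round(bic(log_lik, k, n_examinees), 1))
```

With these illustrative numbers, the 3PL model has the smaller AIC while the two-parameter model has the smaller BIC, because the BIC penalty per parameter (ln n) exceeds the AIC penalty (2) whenever n is larger than about 7.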
Real Data Illustration
First, to study the applicability of the 2PLE model, a pilot study was reviewed on a sample of 2000 examinees. The selected test is from a recent state mathematics assessment consisting of 26 items. All multiple-choice items contain four options (
Table 2. Goodness of Fit for the Mathematics Test Data.
Note. 1PL-AG = one-parameter logistic ability-based guessing; 3PL = three-parameter logistic; FLA = fixed lower asymptote; 1PL-G = one-parameter logistic guessing; 2PL = two-parameter logistic; 2PLE = two-parameter logistic extension; LL = Log-likelihood; AIC = Akaike information criterion; BIC = Bayesian information criterion.
Table 2 summarizes the goodness of fit of the six models for the same data set according to the log-likelihood and AIC and BIC values. The goodness of fit of the 2PLE model is clearly better than those of the other models.
Second, the situation in which the number of options of items varied is considered. The Programme for International Student Assessment (PISA) is a triennial international survey that aims to evaluate educational systems worldwide by testing the skills and knowledge of 15-year-old students. More than 70 countries and economies have participated in the assessment. Data from the mathematics exam PISA 2012 in Switzerland based on a sample of 3,397 examinees were used. In this exam, one of the items (PM923Q01) contains five options (
Table 3 summarizes the goodness of fit of the six models for the data based on the log-likelihood, AIC and BIC values. The AIC and BIC values of the 2PLE model are again the smallest among all models.
Table 3. Goodness of Fit for the PISA Data.
Note. PISA = Programme for International Student Assessment; 1PL-AG = one-parameter logistic ability-based guessing; 3PL = three-parameter logistic; FLA = fixed lower asymptote; 1PL-G = one-parameter logistic guessing; 2PL = two-parameter logistic; 2PLE = two-parameter logistic extension; LL = Log-likelihood; AIC = Akaike information criterion; BIC = Bayesian information criterion.
The 1PL-AG, 3PL, 3PL with FLA, and 2PLE models have the same basic structure (Equation 2) but different guessing functions. In these models, the guessing functions of the 1PL-AG and 2PLE models are related to ability. The AIC and BIC values of the former are larger than those of the latter.
Discussion and Conclusion
San Martín et al. (2006) used a two-process theory in cognitive psychology to provide a theoretical foundation for the construction of an ability-based guessing model. Extensive research has shown that the probability of a correct guess depends not only on the item but also on the ability of an individual (Han, 2012; San Martín et al., 2006; van der Maas et al., 2011). Building on the ideas of the above works, the authors proposed the 2PLE model as an alternative to the 3PL model. The key component of the 2PLE model is the IG function, which depends on item characteristics and ability levels. Compared with the 3PL model, the 2PLE model contains one fewer parameter per item; therefore, it not only reduces the difficulty of parameter estimation but also improves its accuracy. The simulation results confirmed these advantages in terms of satisfactory parameter recovery and overall model goodness of fit.
Compared with the 1PL-AG model, the 2PLE model has two advantages. First, it allows for an item-level discrimination parameter (Lee & Bolt, 2018). Second, the 1PL-AG model contains two distinct sets of item parameters for the problem-solving (bi) and guessing processes (α, γi), respectively. As a result, the interpretation of the item “difficulty” parameter is affected by the guessing process because the overall difficulty of an item is defined by both
From a cognitive psychology perspective, Tuerlinckx and De Boeck (2005) discussed the linkage between the 2PL model and the diffusion model for two-choice response processes. Following the same line of reasoning, van der Maas et al. (2011) established the connection between general psychometric models and the Q-diffusion model. This connection provides novel interpretations for the parameters in the 2PL model and other models beyond merely interpreting parameters from a curve-fitting perspective. More importantly, this connection provides new research directions. As an extension of the 2PL model, the 2PLE model preserves the main properties of the 2PL model, such as identifiability and linear invariance. As a result, the establishment of a connection between the 2PLE and diffusion process models represents an interesting yet challenging avenue for further research.
Supplemental Material
Supplemental material, Online_Supplement-7-24 for A Two-Parameter Logistic Extension Model: An Efficient Variant of the Three-Parameter Logistic Model by Zhemin Zhu, Chun Wang and Jian Tao in Applied Psychological Measurement
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by the National Natural Science Foundation of China (Grants 11571069, 11501094) and the Fundamental Research Funds for the Central Universities (Grant 2412017FZ028).
Supplemental Material
Supplemental material is available for this article online.
References