Effects of the Quantity and Magnitude of Cross-Loading and Model Specification on MIRT Item Parameter Recovery

Abstract

In real-world situations, multidimensional data may appear on large-scale tests or psychological surveys. The purpose of this study was to investigate the effects of the quantity and magnitude of cross-loadings and model specification on item parameter recovery in multidimensional Item Response Theory (MIRT) models, especially when the model was misspecified as a simple structure, ignoring the quantity and magnitude of cross-loading. A simulation study that replicated this scenario was designed to manipulate the variables that could potentially influence the precision of item parameter estimation in the MIRT models. Item parameters were estimated using marginal maximum likelihood with the expectation-maximization algorithm. A compensatory two-parameter logistic MIRT model with two dimensions and dichotomous item responses was used to simulate and calibrate the data for each combination of conditions across 500 replications. The results indicated that misspecifying the model as a simple structure, and thereby ignoring the quantity and magnitude of cross-loading, resulted in inaccurate and biased item discrimination parameter estimates. As the quantity and magnitude of cross-loading increased, the root mean square of error and bias of the item discrimination estimates worsened.

Keywords

MIRT, item parameter recovery, cross-loading

Introduction

One of the assumptions of item–response theory (IRT) models is that the underlying latent ability being measured is unidimensional in nature (Finch, 2011). However, there are a great number of surveys, instruments, and tests that measure multiple latent abilities, which leads to a potentially multidimensional structure of item–response data. Within-item multidimensionality refers to a set of items in which some or all items on a test measure more than one underlying latent ability (Adams et al., 1997; McDonald, 1999). Educational and psychological measurement instruments are more likely to represent a multidimensional Item Response Theory (MIRT) model where more than one underlying latent ability is being measured simultaneously (Finch, 2011; Reckase, 1985). For example, in a mathematics test, some items may be hypothesized to measure algebra skills but also require some geometry or trigonometry skills in order for an examinee to respond to that particular item correctly. Figure 1 illustrates a MIRT model where two underlying abilities ( $\theta_{1}$ and $\theta_{2}$ ) with a correlation of $\rho$ are measured by 10 items. As shown in Figure 1, Item 4 and Item 7 cross-load on $\theta_{1}$ and $\theta_{2}$ , meaning some amount of ability on both dimensions is required to answer these items correctly.

Figure 1.

A MIRT Model With Two Underlying Abilities ( $\theta_{1}$ and $\theta_{2}$ ) and Correlation of $\rho$.

When data are multidimensional, the quantity and magnitude of the cross-loaded items should be taken into account to ensure the precision of item parameters and appropriate interpretation of item characteristics and examinees’ abilities. Understanding the true structure of the data and assessing dimensionality prior to applying IRT models in social sciences is imperative to make appropriate inferences about the underlying latent abilities (Finch, 2010, 2011; Strachan et al., 2020; Svetina et al., 2017; Svetina & Levy, 2016; Zhang, 2007, 2012).

In this study, cross-loading refers to the items that are associated with multiple abilities at the same time or those items that require an examinee to demonstrate knowledge on multiple underlying abilities. The magnitude of cross-loading is an indication of how strongly primary and secondary dimensions are associated with the items. The quantity of cross-loading items refers to the number, or percentage, of the items on a test, survey, or a measurement instrument that exhibit cross-loading. Previous research studies discussed dimensionality assessment and its performance precision under various conditions (Finch & Habing, 2005; Svetina, 2013; Svetina & Levy, 2016). Other researchers have taken into account multidimensionality and structure of the data to evaluate item parameter recovery (Finch, 2011; Svetina et al., 2017; Zhang, 2012). However, to our knowledge, these studies did not investigate the effects of the quantity and magnitude of the cross-loading on secondary dimension and model specification on item parameter estimation.

The purpose of this study was to investigate how the quantity and magnitude of cross-loading on the secondary dimension, together with model specification, affect item parameter estimation. The focus of this study is a compensatory, dichotomous, two-parameter logistic (2PL) MIRT model.
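For reference, a compensatory 2PL MIRT model with two dimensions is commonly written in the slope-intercept form below. The equation is not stated explicitly in this article, so the notation is an assumption chosen to match the symbols used later ( $a_{1}$ , $a_{2}$ , and $d$ ), with $i$ indexing examinees and $j$ indexing items.

$$
P(x_{ij} = 1 \mid \theta_{i1}, \theta_{i2}) = \frac{\exp(a_{j1}\theta_{i1} + a_{j2}\theta_{i2} + d_{j})}{1 + \exp(a_{j1}\theta_{i1} + a_{j2}\theta_{i2} + d_{j})}
$$

The model is compensatory because the two abilities enter the exponent additively, so a high value on one dimension can offset a low value on the other. Under this form, a simple-structure item has one of $a_{j1}$ or $a_{j2}$ fixed at zero, whereas a cross-loaded item has both discriminations nonzero.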

Literature Review

There are a number of variables that could potentially influence item parameter estimation and must be taken into account to evaluate the precision of item parameter estimates when applying a MIRT model. These variables include, but are not limited to, sample size, correlation level between the latent abilities, estimation method, number of items, distribution of the latent abilities, the quantity and magnitude of the cross-loading on the secondary dimension, and model specification. Previous studies have investigated the effects of structure complexity, the correlation between the underlying latent abilities, sample size, and distribution of examinees on item parameter recovery in complex structure MIRT models (Finch, 2011; Finch & Habing, 2005; Svetina, 2013; Svetina et al., 2017; Svetina & Levy, 2016; Zhang, 2012). Bolt and Lall (2003) investigated item parameter estimation precision of multidimensional compensatory and non-compensatory item–response models. The authors performed a simulation study to evaluate parameter recovery for the multidimensional two-parameter logistic model (M2PL) and the multidimensional latent ability model under various conditions. Results suggested that sample size, number of items, and the correlation between latent abilities had noticeable effects on item parameter estimates (Bolt & Lall, 2003). Finch (2010) investigated the accuracy of item parameter estimates in the MIRT context. The results of this study indicated that regardless of the distribution of the latent ability, bias was much higher in the 3PL than in the 2PL case. In addition, higher correlation values were associated with greater bias in item discrimination and location estimates. Finch (2011) investigated the accuracy of item location and discrimination parameter estimation in MIRT models when some items exhibited a complex structure. The results of this simulation study indicated that item parameter estimation, when items did not exhibit a simple structure, was more accurate when a multidimensional approach was utilized compared with a unidimensional approach. Item discrimination estimates demonstrated lower levels of bias when two latent abilities were present. Furthermore, item discrimination parameters were consistently underestimated when the latent abilities were non-normal. The author noted that both bias and standard error increased when item–response data did not conform to a simple structure (Finch, 2011). Zhang (2012) conducted a simulation study to compare the unidimensional and multidimensional approaches with the marginal maximum likelihood method when a test was composed of several unidimensional subtests with simple structure. Results of this study indicated that item parameter estimation utilizing a multidimensional approach was more precise than item parameter estimation utilizing a unidimensional approach when the number of items in a test or an instrument was small (Zhang, 2012).

Sometimes a complex structure can be approximated to a simple structure where each item depends predominantly and strongly on one primary underlying latent ability and relatively weakly on other secondary latent abilities (Hulin et al., 1983; Strachan et al., 2020; Svetina & Levy, 2016). Svetina et al. (2017) performed a simulation study to investigate the effects of complex structures and the distribution of examinees’ latent ability on item parameter recovery in dichotomous compensatory MIRT models. The results of this study indicated that when latent abilities were skewed, item parameter recovery was generally adversely impacted. In addition, the presence of complexity contributed to decreases in the precision of parameter recovery, particularly for discrimination parameters when at least one latent ability was generated as skewed (Svetina et al., 2017). The components of the aforementioned studies investigated the effects of sample size, model type (2PL or 3PL), the correlation between latent abilities, the distribution of examinees’ ability, and the structure complexity of the data on item parameter estimation. However, to our knowledge, none investigated the effects of the quantity and magnitude of the cross-loading on the secondary dimension and model specification on item parameter estimation.

Method

In this study, we designed a simulation study to investigate the effects of the quantity and magnitude of cross-loading on secondary dimension and model specification (misspecified model vs. correctly specified model) on the precision of item parameter estimation in the MIRT models.

Quantity of cross-loading refers to the proportion of items that display cross-loading, with the loading on the primary dimension being greater than the loading on the secondary dimension (i.e., $a_{p} > a_{s}$ and $a_{s} \neq 0$ ). Three levels of the quantity of cross-loading were included: 10% (one cross-loaded item of 10 items on each dimension), 30% (three cross-loaded items of 10 items on each dimension), or 50% (five cross-loaded items of 10 items on each dimension) of items exhibited cross-loadings, similar to the design of Svetina and Levy (2016).

The magnitude of cross-loading is an indication of how strongly primary and secondary dimensions are associated with the item. In this study, three levels (low, medium, and high) of the magnitude of cross-loading on the secondary dimension were considered. Exact values from Svetina and Levy (2016) were used for the high magnitude of cross-loading, where the discrimination on the secondary dimension of items ranged from 0.80 to 1.40. A modified medium magnitude of cross-loading was then specified, where the discrimination on the secondary dimension ranged from 0.40 to 0.70 (1/2 of the original values), followed by a modified low magnitude of cross-loading, where the discrimination on the secondary dimension ranged from 0.20 to 0.35 (1/4 of the original values).
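The relationship among the three magnitude levels can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' code: it takes the four distinct high-magnitude secondary discriminations that appear in Table 1 and rescales them. The low range of 0.20 to 0.35 corresponds to one quarter of the high values.

```python
# Distinct secondary-dimension discriminations in the high-magnitude
# condition, as listed in Table 1.
high = [0.80, 1.00, 1.20, 1.40]

# Medium and low magnitudes as fixed fractions of the high values.
medium = [round(a / 2, 2) for a in high]
low = [round(a / 4, 2) for a in high]

print(medium)
print(low)
```

The rescaled lists reproduce the medium and low secondary discriminations reported in Table 1.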

Correlation Between Dimensions

Previous studies investigated the effect of correlation between the dimensions within a variety of simulated correlation levels ranging from .0 to .95 (Bolt & Lall, 2003; Finch, 2010, 2011; Svetina et al., 2017; Zhang, 2012). Bolt and Lall (2003) utilized three levels of correlation between abilities (.0, .3, and .6) to investigate the accuracy of item parameter estimation. Finch (2010) and Finch (2011) investigated the effect of the correlation between latent abilities on item parameter estimation in the M2PL model. The two latent abilities were simulated to be correlated at .0, .3, .5, or .8. Zhang (2012) utilized three levels of correlation between abilities (.0, .5, and .8) to investigate the precision of item parameter estimation. In the Svetina et al. (2017) study, the correlations between abilities were set to .0, .4, or .7. In the current study, the data were simulated with the correlation between the abilities set to .0, .6, or .9.

Sample Size

Previous studies have investigated the effect of sample size, ranging from 250 to 5,000 examinees, on item parameter estimation in the MIRT models (Bolt & Lall, 2003; Finch, 2010, 2011; Zhang, 2012). Bolt and Lall (2003) explored the effect of sample size at two levels of 1,000 and 3,000 on item parameter estimation in compensatory and non-compensatory MIRT models. Finch (2010) and Finch (2011) simulated the number of examinees at four levels of 250, 500, 1,000, and 2,000 to investigate the item parameter precision under the influence of various sample sizes. Zhang (2012) evaluated item parameter estimation under various conditions and six levels of sample size (500, 1,000, 2,000, 3,000, 4,000, and 5,000). In this study, three sample sizes of 500, 1,000, and 2,000 examinees were selected.

Model Specification

To investigate the effect of model specification on item parameter estimation, two models were compared: a correctly specified model, in which the cross-loadings were estimated, and a misspecified model, in which a simple structure was applied even though the data exhibited cross-loadings.

Table 1 shows the item parameter specifications for a two-dimensional compensatory MIRT model with 10 items primarily measuring each dimension, for the three levels of cross-loading quantity from Svetina and Levy (2016). Three levels of cross-loading magnitude are shown: the high magnitude, taken from Svetina and Levy (2016), and the two modified levels (medium and low) of the secondary discrimination values.

Table 1.

Item Parameters for 2D Compensatory MIRT Model for 10 Items Per Dimension.

Item	d	10% cross-loaded						30% cross-loaded						50% cross-loaded
		High		Medium		Low		High		Medium		Low		High		Medium		Low
		$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$
1	−1.5	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—
2	−0.75	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—
3	0	1.00	0.80	1.00	0.40	1.00	0.20	1.00	0.80	1.00	0.40	1.00	0.20	1.00	0.80	1.00	0.40	1.00	0.20
4	0.75	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—
5	1.5	1.20	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	1.00	1.20	0.50	1.20	0.25
6	−1.5	1.20	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	1.00	1.20	0.50	1.20	0.25
7	−0.75	1.40	—	1.40	—	1.40	—	1.40	1.20	1.40	0.60	1.40	0.30	1.40	1.20	1.40	0.60	1.40	0.30
8	0	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—
9	0.75	1.60	—	1.60	—	1.60	—	1.60	1.40	1.60	0.70	1.60	0.35	1.60	1.40	1.60	0.70	1.60	0.35
10	1.5	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—
11	1.5	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80
12	0.75	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80	—	0.80
13	0	0.80	1.00	0.40	1.00	0.20	1.00	0.80	1.00	0.40	1.00	0.20	1.00	0.80	1.00	0.40	1.00	0.20	1.00
14	−0.75	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00	—	1.00
15	−1.5	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	1.00	1.20	0.50	1.20	0.25	1.20
16	1.5	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	—	1.20	1.00	1.20	0.50	1.20	0.25	1.20
17	0.75	—	1.40	—	1.40	—	1.40	1.20	1.40	0.60	1.40	0.30	1.40	1.20	1.40	0.60	1.40	0.30	1.40
18	0	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40	—	1.40
19	−0.75	—	1.60	—	1.60	—	1.60	1.40	1.60	0.70	1.60	0.35	1.60	1.40	1.60	0.70	1.60	0.35	1.60
20	−1.5	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60	—	1.60
M	0	1.16	1.16	1.13	1.13	1.11	1.11	1.18	1.18	1.05	1.05	0.99	0.99	1.16	1.16	0.98	0.98	0.89	0.89
SD	1.09	0.31	0.31	0.37	0.37	0.41	0.41	0.29	0.29	0.38	0.38	0.48	0.48	0.27	0.27	0.41	0.41	0.51	0.51

Note. MIRT = multidimensional item response theory. Item specifications of the high magnitude of cross-loading were taken from Svetina and Levy (2016).
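To make the two model specifications concrete, the sketch below builds the generating discrimination matrix for one column block of Table 1 (30% cross-loaded, high magnitude) and derives which discriminations each model would estimate. It uses NumPy rather than the R tools used in the study, and the function and variable names are hypothetical.

```python
import numpy as np

# Primary discriminations for the 10 items of each dimension (Table 1).
primary = np.array([0.80, 0.80, 1.00, 1.00, 1.20, 1.20, 1.40, 1.40, 1.60, 1.60])

# High-magnitude secondary discriminations for the 30% condition, keyed by
# 0-based position within a dimension (items 3, 7, 9 and 13, 17, 19).
secondary_high = {2: 0.80, 6: 1.20, 8: 1.40}


def build_true_a(primary, secondary, scale=1.0):
    """Return the 20 x 2 matrix of generating discriminations."""
    n = len(primary)
    a = np.zeros((2 * n, 2))
    a[:n, 0] = primary  # items 1-10 load primarily on dimension 1
    a[n:, 1] = primary  # items 11-20 load primarily on dimension 2
    for pos, value in secondary.items():
        a[pos, 1] = value * scale      # cross-loading on dimension 2
        a[n + pos, 0] = value * scale  # cross-loading on dimension 1
    return a


true_a = build_true_a(primary, secondary_high)

# Patterns of freely estimated discriminations (True = estimated).
correct_pattern = true_a != 0
misspecified_pattern = np.zeros_like(correct_pattern)
misspecified_pattern[:10, 0] = True   # simple structure: one loading per item
misspecified_pattern[10:, 1] = True

print(correct_pattern.sum(), misspecified_pattern.sum())
```

Passing `scale=0.5` or `scale=0.25` to `build_true_a` gives the medium and low magnitude versions of the same block. The misspecified pattern simply drops the six cross-loadings, which is the misspecification examined in this study.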

Analysis

The manipulated variables described above led to 162 condition combinations (3 quantities of cross-loading × 3 magnitudes of cross-loading × 3 correlations × 3 sample sizes × 2 model specifications). Each condition combination was replicated 500 times. A compensatory 2PL MIRT model with two dimensions, dichotomous item responses, and a standard bivariate normal θ distribution (Reckase, 1985) was used to simulate and calibrate the data for every replication of each condition combination. Parameters were estimated using marginal maximum likelihood with the expectation-maximization algorithm. RStudio (RStudio Team, 2018) was used for data generation, item parameter estimation, and analyses. The “mirt” package (Chalmers, 2012) was used to generate the simulated item–response patterns for sample sizes of 500, 1,000, and 2,000 examinees within each combination of conditions and for item calibration.
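The data-generation step can be sketched as follows. The study generated data with the “mirt” package in R; the NumPy version below is only an illustrative stand-in that assumes the slope-intercept form of the compensatory 2PL model. The function name, the seed, and the four example items are hypothetical.

```python
import numpy as np


def simulate_m2pl(a, d, n_examinees, rho, rng):
    """Simulate dichotomous responses from a compensatory two-dimensional 2PL model."""
    # Standard bivariate normal abilities with correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_examinees)
    # Compensatory model: abilities combine additively in the logit.
    logits = theta @ a.T + d  # shape: (n_examinees, n_items)
    prob = 1.0 / (1.0 + np.exp(-logits))
    # Draw one Bernoulli response per examinee and item.
    return (rng.random(prob.shape) < prob).astype(int)


rng = np.random.default_rng(2024)  # hypothetical seed
# Four illustrative items: rows are items, columns are (a1, a2).
a = np.array([[0.80, 0.00], [1.00, 0.80], [0.00, 1.00], [0.80, 1.00]])
d = np.array([-1.5, 0.0, 0.75, 0.0])
responses = simulate_m2pl(a, d, n_examinees=1000, rho=0.6, rng=rng)
print(responses.shape)
```

In a full replication loop, a call like this would be repeated 500 times per generating condition, with each simulated data set then calibrated under both model specifications.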

Evaluation Criteria

To investigate the effect of the manipulated variables, estimated item parameters within each condition across the 500 replications were compared with the true item parameters. Root mean square of error (RMSE) and bias (B) were calculated to evaluate the performance and accuracy of each item parameter estimated, that is, discrimination and location, within each combination of conditions across 500 replications.
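The two criteria can be computed as sketched below. The article does not print the formulas, so the usual definitions are assumed here: bias is the mean of the estimation errors across replications, and RMSE is the square root of the mean squared error. The example estimates are hypothetical.

```python
import numpy as np


def rmse(estimates, true_value):
    """Root mean square of error across replications."""
    errors = np.asarray(estimates, dtype=float) - true_value
    return float(np.sqrt(np.mean(errors ** 2)))


def bias(estimates, true_value):
    """Mean signed error across replications (positive = overestimation)."""
    errors = np.asarray(estimates, dtype=float) - true_value
    return float(np.mean(errors))


# Hypothetical estimates of one discrimination parameter over five replications.
a_hat = [0.9, 1.1, 1.2, 1.0, 1.3]
print(rmse(a_hat, 1.0), bias(a_hat, 1.0))
```

Because bias keeps the sign of the errors while RMSE does not, an estimator can show near-zero bias and still have a large RMSE when its estimates vary widely around the true value.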

Results

Across the 162 condition combinations, each replicated 500 times, the item discriminations on each dimension ( $\hat{a}_{1}$ and $\hat{a}_{2}$ ) and the item location ( $\hat{d}$ ) were estimated. Results are reported in two main sections, RMSE and bias, organized by the quantity of cross-loaded items (10%, 30%, or 50% of items cross-loaded) and incorporating the magnitude of cross-loading (low, medium, and high), the correlation between abilities (.0, .6, and .9), sample size (500, 1,000, and 2,000), and model specification (correctly specified model compared with misspecified model).
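The count of 162 follows from fully crossing the manipulated factors. A minimal sketch of that enumeration is shown below; the labels are illustrative and not taken from the authors' code.

```python
from itertools import product

quantities = ["10%", "30%", "50%"]
magnitudes = ["low", "medium", "high"]
correlations = [0.0, 0.6, 0.9]
sample_sizes = [500, 1000, 2000]
specifications = ["correct", "misspecified"]

# Fully crossed design: every level of every factor with every other.
conditions = list(
    product(quantities, magnitudes, correlations, sample_sizes, specifications)
)
print(len(conditions))  # 3 * 3 * 3 * 3 * 2 = 162
```

Note that model specification affects only calibration, so the same simulated data can be fitted under both specifications.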

Section I: RMSE

Item Discrimination Parameters

The RMSEs were averaged for the estimated item discrimination parameters and the estimated item location parameter within each combination of conditions for three sets of item cross-loading. When a model was correctly specified, some items had a cross-loading and others did not. For those items with a cross-loading, $a_{pcl}$ refers to the discrimination parameter on the primary dimension on an item that had a cross-loading and $a_{s}$ refers to the discrimination parameter on the secondary dimension. For items without a cross-loading, only $a_{pncl}$ was estimated on the primary dimension for a non-cross-loaded item. See Table 2 for additional details.

Table 2.

Discrimination Parameter Specification for Correct Model Specification and Number of Items Within Each Specification.

Items	Cross load	Discrimination		10% cross-loaded		30% cross-loaded		50% cross-loaded
		1st	2nd	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$	$a_{1}$	$a_{2}$
1–10	CL	$a_{1 pcl}$	$a_{2 s}$	1	1	3	3	5	5
1–10	NCL	$a_{1 pncl}$	—	9	—	7	—	5	—
11–20	CL	$a_{1 s}$	$a_{2 pcl}$	1	1	3	3	5	5
11–20	NCL	—	$a_{2 pncl}$	—	9	—	7	—	5

Note. CL = cross-loaded item; $pcl$ = primary dimension for a cross-loaded item; $s$ = secondary dimension; NCL = non-cross-loaded item; $pncl$ = primary dimension for a non-cross-loaded item.

Primary Item Discrimination Parameters

Table 3 shows the average RMSEs of the estimated item discrimination parameters when the models were correctly specified, for the three levels of cross-loading quantity and the low, medium, and high magnitudes of cross-loading. As shown in Table 3, the average RMSEs for the primary cross-loaded item discrimination parameters on the first dimension ( $a_{1 pcl}$ ) ranged from 0.078 to 0.334. The average RMSEs for the primary non-cross-loaded item discrimination parameters on the first dimension ( $a_{1 pncl}$ ) ranged from 0.092 to 0.221 across all combinations of conditions. Similarly, the average RMSEs for the primary cross-loaded item discrimination parameters on the second dimension ( $a_{2 pcl}$ ) ranged from 0.075 to 0.340, and the average RMSEs for the primary non-cross-loaded item discrimination parameters on the second dimension ( $a_{2 pncl}$ ) ranged from 0.091 to 0.216 across all combinations of conditions. Item discrimination estimates on the primary dimension tended to have larger RMSEs when the item was cross-loaded than when it was non-cross-loaded.

Table 3.

Average RMSE of the Primary (CL and NCL) and Secondary Discrimination on the First and Second Dimensions When the Models Were Correctly Specified.

Quantity	Correlation	N	Low						Medium						High
			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$
			$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$
10%	.0	500	0.156	0.199	0.124	0.163	0.200	0.122	0.151	0.197	0.129	0.165	0.199	0.123	0.168	0.198	0.155	0.170	0.194	0.151
		1,000	0.111	0.135	0.087	0.111	0.138	0.09	0.114	0.139	0.091	0.114	0.135	0.091	0.115	0.140	0.109	0.114	0.138	0.112
		2,000	0.081	0.095	0.062	0.075	0.096	0.061	0.078	0.097	0.065	0.079	0.096	0.064	0.083	0.095	0.077	0.084	0.098	0.077
	.6	500	0.174	0.202	0.24	0.180	0.198	0.24	0.188	0.195	0.246	0.182	0.198	0.231	0.225	0.198	0.235	0.207	0.196	0.24
		1,000	0.128	0.137	0.221	0.125	0.135	0.221	0.120	0.138	0.211	0.123	0.143	0.213	0.152	0.134	0.192	0.159	0.137	0.19
		2,000	0.094	0.100	0.207	0.091	0.098	0.207	0.086	0.094	0.197	0.083	0.096	0.19	0.125	0.096	0.164	0.125	0.097	0.168
	.9	500	0.256	0.203	0.419	0.248	0.199	0.423	0.214	0.198	0.392	0.222	0.200	0.392	0.244	0.197	0.313	0.231	0.199	0.29
		1,000	0.224	0.138	0.414	0.240	0.138	0.395	0.164	0.138	0.36	0.163	0.136	0.355	0.162	0.138	0.272	0.175	0.137	0.264
		2,000	0.216	0.098	0.393	0.214	0.095	0.389	0.142	0.096	0.337	0.138	0.098	0.341	0.123	0.096	0.239	0.130	0.096	0.242
30%	.0	500	0.211	0.189	0.14	0.207	0.196	0.142	0.204	0.194	0.149	0.202	0.196	0.15	0.210	0.195	0.194	0.213	0.195	0.189
		1,000	0.137	0.130	0.103	0.141	0.133	0.101	0.143	0.131	0.108	0.140	0.133	0.103	0.145	0.137	0.134	0.147	0.131	0.135
		2,000	0.098	0.096	0.069	0.097	0.092	0.068	0.101	0.095	0.074	0.099	0.095	0.074	0.098	0.093	0.093	0.101	0.095	0.092
	.6	500	0.215	0.195	0.345	0.216	0.194	0.339	0.225	0.201	0.334	0.224	0.194	0.327	0.293	0.191	0.335	0.300	0.198	0.328
		1,000	0.158	0.136	0.319	0.159	0.136	0.314	0.147	0.137	0.305	0.152	0.135	0.299	0.226	0.135	0.277	0.228	0.136	0.272
		2,000	0.118	0.096	0.31	0.119	0.097	0.304	0.104	0.097	0.284	0.110	0.094	0.281	0.186	0.094	0.242	0.191	0.096	0.246
	.9	500	0.308	0.201	0.596	0.318	0.206	0.593	0.253	0.203	0.543	0.241	0.199	0.524	0.315	0.196	0.422	0.325	0.199	0.422
		1,000	0.284	0.139	0.579	0.292	0.141	0.581	0.191	0.142	0.503	0.187	0.135	0.505	0.246	0.135	0.357	0.266	0.138	0.386
		2,000	0.259	0.098	0.568	0.266	0.098	0.562	0.150	0.096	0.49	0.154	0.098	0.484	0.210	0.098	0.341	0.201	0.096	0.336
50%	.0	500	0.204	0.188	0.147	0.204	0.196	0.151	0.206	0.190	0.156	0.209	0.190	0.158	0.207	0.199	0.189	0.208	0.200	0.194
		1,000	0.142	0.132	0.104	0.144	0.131	0.103	0.145	0.137	0.11	0.145	0.135	0.11	0.142	0.138	0.131	0.143	0.140	0.132
		2,000	0.101	0.093	0.073	0.103	0.091	0.073	0.100	0.092	0.078	0.100	0.091	0.077	0.100	0.097	0.095	0.101	0.092	0.092
	.6	500	0.213	0.203	0.354	0.219	0.202	0.359	0.223	0.206	0.355	0.223	0.199	0.355	0.305	0.213	0.343	0.300	0.204	0.347
		1,000	0.153	0.142	0.335	0.155	0.141	0.328	0.154	0.139	0.315	0.152	0.143	0.322	0.244	0.146	0.285	0.234	0.148	0.295
		2,000	0.117	0.098	0.324	0.116	0.098	0.323	0.102	0.101	0.305	0.108	0.103	0.304	0.193	0.104	0.262	0.193	0.108	0.262
	.9	500	0.304	0.216	0.617	0.308	0.216	0.619	0.241	0.213	0.56	0.244	0.215	0.554	0.334	0.221	0.441	0.340	0.210	0.442
		1,000	0.266	0.147	0.593	0.267	0.146	0.594	0.180	0.152	0.528	0.183	0.149	0.519	0.261	0.147	0.397	0.259	0.155	0.386
		2,000	0.247	0.108	0.58	0.246	0.108	0.579	0.142	0.108	0.509	0.143	0.109	0.506	0.222	0.110	0.367	0.223	0.112	0.365

Note. RMSE = root mean square of error; CL = cross-loaded item; NCL = non-cross-loaded item.

In the misspecified models, the misspecification occurs specifically on the cross-loaded items, whose cross-loadings are ignored. Table 4 reports the average RMSEs of the estimated item discrimination parameters when the models were misspecified, for the three levels of cross-loading quantity and the low, medium, and high magnitudes of cross-loading. As shown in Table 4, the average RMSEs for the truly primary cross-loaded item discrimination parameters on the first dimension ( $a_{1 pcl}$ ) ranged from 0.080 to 1.119. The average RMSEs for the truly primary non-cross-loaded item discrimination parameters on the first dimension ( $a_{1 pncl}$ ) ranged from 0.083 to 0.262 across all combinations of conditions. Similarly, the average RMSEs for the truly primary cross-loaded item discrimination parameters on the second dimension ( $a_{2 pcl}$ ) ranged from 0.075 to 1.144, and the average RMSEs for the truly primary non-cross-loaded item discrimination parameters on the second dimension ( $a_{2 pncl}$ ) ranged from 0.084 to 0.269 across all combinations of conditions. Item discrimination estimates on the truly primary dimension tended to have much larger RMSEs for items that should have been specified as cross-loaded than for non-cross-loaded items. In addition, very similar patterns and values of average RMSEs were observed for the truly primary item discrimination parameters on the first and second dimensions.

Table 4.

Average RMSE of the Truly Primary (CL and NCL) on the First and Second Dimensions When the Models Were Misspecified.

Quantity	Correlation	N	Low						Medium						High
			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$
			$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$
10%	.0	500	0.153	0.199	—	0.161	0.200	—	0.148	0.198	—	0.162	0.199	—	0.184	0.200	—	0.181	0.197	—
		1,000	0.110	0.135	—	0.109	0.138	—	0.111	0.140	—	0.110	0.136	—	0.145	0.141	—	0.147	0.140	—
		2,000	0.080	0.095	—	0.075	0.097	—	0.083	0.097	—	0.082	0.097	—	0.135	0.096	—	0.133	0.099	—
	.6	500	0.215	0.201	—	0.217	0.198	—	0.294	0.194	—	0.304	0.197	—	0.448	0.197	—	0.426	0.196	—
		1,000	0.170	0.137	—	0.167	0.135	—	0.248	0.138	—	0.251	0.142	—	0.392	0.134	—	0.395	0.136	—
		2,000	0.139	0.099	—	0.144	0.098	—	0.238	0.094	—	0.242	0.095	—	0.385	0.096	—	0.387	0.097	—
	.9	500	0.263	0.200	—	0.267	0.196	—	0.420	0.196	—	0.417	0.196	—	0.746	0.194	—	0.740	0.195	—
		1,000	0.220	0.137	—	0.215	0.137	—	0.387	0.136	—	0.392	0.134	—	0.711	0.135	—	0.711	0.135	—
		2,000	0.190	0.097	—	0.201	0.094	—	0.364	0.095	—	0.370	0.097	—	0.691	0.094	—	0.697	0.094	—
30%	.0	500	0.209	0.189	—	0.204	0.194	—	0.209	0.189	—	0.207	0.191	—	0.261	0.202	—	0.261	0.198	—
		1,000	0.137	0.129	—	0.141	0.132	—	0.148	0.130	—	0.143	0.131	—	0.196	0.154	—	0.199	0.149	—
		2,000	0.098	0.095	—	0.098	0.091	—	0.109	0.095	—	0.107	0.095	—	0.153	0.126	—	0.156	0.123	—
	.6	500	0.297	0.186	—	0.294	0.184	—	0.449	0.183	—	0.460	0.182	—	0.839	0.178	—	0.837	0.181	—
		1,000	0.241	0.131	—	0.237	0.130	—	0.395	0.127	—	0.402	0.127	—	0.789	0.130	—	0.791	0.128	—
		2,000	0.214	0.092	—	0.209	0.093	—	0.368	0.091	—	0.377	0.088	—	0.765	0.099	—	0.770	0.101	—
	.9	500	0.379	0.186	—	0.369	0.188	—	0.615	0.184	—	0.613	0.181	—	1.119	0.174	—	1.144	0.176	—
		1,000	0.318	0.129	—	0.307	0.130	—	0.557	0.128	—	0.564	0.124	—	1.092	0.120	—	1.093	0.122	—
		2,000	0.289	0.091	—	0.288	0.090	—	0.541	0.087	—	0.540	0.089	—	1.061	0.087	—	1.053	0.086	—
50%	.0	500	0.206	0.184	—	0.206	0.191	—	0.228	0.182	—	0.231	0.185	—	0.410	0.262	—	0.407	0.269	—
		1,000	0.145	0.130	—	0.147	0.129	—	0.171	0.136	—	0.171	0.136	—	0.367	0.237	—	0.363	0.242	—
		2,000	0.104	0.092	—	0.106	0.091	—	0.128	0.101	—	0.127	0.105	—	0.338	0.226	—	0.339	0.223	—
	.6	500	0.290	0.185	—	0.294	0.181	—	0.448	0.173	—	0.454	0.170	—	0.887	0.183	—	0.886	0.180	—
		1,000	0.233	0.128	—	0.241	0.126	—	0.414	0.122	—	0.407	0.122	—	0.850	0.146	—	0.836	0.146	—
		2,000	0.208	0.088	—	0.209	0.088	—	0.383	0.088	—	0.388	0.091	—	0.816	0.123	—	0.816	0.122	—
	.9	500	0.344	0.175	—	0.348	0.181	—	0.584	0.171	—	0.590	0.170	—	1.098	0.167	—	1.103	0.167	—
		1,000	0.301	0.124	—	0.302	0.121	—	0.537	0.125	—	0.542	0.119	—	1.054	0.114	—	1.053	0.120	—
		2,000	0.272	0.086	—	0.274	0.086	—	0.519	0.083	—	0.519	0.084	—	1.033	0.084	—	1.034	0.085	—

Note. RMSE = root mean square of error; CL = cross-loaded item; NCL = non-cross-loaded item.

Secondary Item Discrimination Parameters

As shown in Table 3, the average RMSEs for the secondary item discrimination parameter on the first dimension ( $a_{1 s}$ ) ranged from 0.062 to 0.617, and the average RMSEs for the secondary item discrimination parameter on the second dimension ( $a_{2 s}$ ) ranged from 0.061 to 0.619 across all combinations of conditions. The secondary item discrimination parameters on the first and second dimensions had very similar patterns and values of average RMSEs. Compared with the RMSEs of the corresponding items on the primary dimension ( $a_{1 pcl}$ and $a_{2 pcl}$ ), the RMSEs on the secondary dimension were larger when the abilities were correlated (.6 or .9) and somewhat smaller when the correlation was .0. It should be noted that no secondary item discrimination ( $a_{1 s}$ and $a_{2 s}$ ) was defined for either dimension in the misspecified models.

Item Location Parameter ( $d$ )

Table 5 shows the average RMSEs for the item location parameter ( $d$ ) for both correctly specified and misspecified models across all combinations of conditions. The average RMSEs for the item location parameter when the model was correctly specified ranged from 0.067 to 0.169. The lowest average RMSE for the item location parameter was associated with the condition in which the correlation between the abilities was .0 and the sample size was 2,000. The highest average RMSE for the item location parameter was associated with the condition in which the correlation between the abilities was .9 and the sample size was 500.

Table 5.

Average Root Mean Square of Error of the Item Location Parameter (d) for the Correct and Misspecified Models.

Quantity	Correlation	N	Correctly specified models			Misspecified models
			Low	Medium	High	Low	Medium	High
10%	.0	500	0.139	0.140	0.139	0.139	0.140	0.138
		1,000	0.096	0.097	0.099	0.096	0.096	0.098
		2,000	0.069	0.069	0.067	0.069	0.069	0.067
	.6	500	0.138	0.140	0.142	0.137	0.140	0.140
		1,000	0.097	0.098	0.097	0.096	0.097	0.096
		2,000	0.069	0.068	0.069	0.069	0.067	0.069
	.9	500	0.138	0.143	0.140	0.137	0.143	0.139
		1,000	0.098	0.099	0.099	0.098	0.099	0.098
		2,000	0.070	0.069	0.070	0.069	0.069	0.070
30%	.0	500	0.144	0.141	0.147	0.143	0.138	0.144
		1,000	0.097	0.098	0.102	0.097	0.096	0.106
		2,000	0.068	0.069	0.070	0.067	0.069	0.079
	.6	500	0.142	0.143	0.151	0.141	0.140	0.145
		1,000	0.100	0.101	0.108	0.099	0.099	0.105
		2,000	0.069	0.072	0.073	0.068	0.071	0.074
	.9	500	0.144	0.145	0.155	0.142	0.142	0.151
		1,000	0.100	0.104	0.108	0.099	0.102	0.105
		2,000	0.069	0.074	0.077	0.068	0.072	0.075
50%	.0	500	0.140	0.141	0.147	0.139	0.137	0.145
		1,000	0.099	0.101	0.103	0.098	0.099	0.108
		2,000	0.069	0.071	0.072	0.069	0.070	0.082
	.6	500	0.145	0.149	0.163	0.143	0.146	0.157
		1,000	0.101	0.103	0.112	0.100	0.101	0.110
		2,000	0.071	0.073	0.079	0.070	0.071	0.079
	.9	500	0.148	0.154	0.169	0.145	0.149	0.165
		1,000	0.104	0.107	0.117	0.103	0.104	0.115
		2,000	0.072	0.075	0.081	0.070	0.073	0.079

Effect of Sample Size (RMSE)

Consistent with previous studies (Bolt & Lall, 2003; Finch, 2010, 2011; Zhang, 2012), the results of this study indicated that item parameter recovery improved in terms of RMSE as the sample size increased. As shown in Figures 2 and 3, all of the item discrimination parameters, including primary and secondary item discriminations on the first and second dimensions ( $a_{1 pcl}, a_{1 pncl}, a_{1 s}, a_{2 pcl}, a_{2 pncl}, a_{2 s}$ ), showed a consistent decreasing trend in average RMSE for both correctly specified and misspecified models as the sample size increased from 500 to 2,000. The lowest average RMSEs of item discrimination for the correctly specified (0.061) and misspecified (0.075) models were associated with the sample size of 2,000, and the highest average RMSEs of item discrimination for the correctly specified (0.619) and misspecified (1.144) models were associated with the sample size of 500. The item location parameter also showed a consistent decreasing trend in average RMSE for both correctly specified and misspecified models as the sample size increased from 500 to 2,000 across all combinations of conditions. The lowest average RMSEs of item location for the correctly specified (0.067) and misspecified (0.067) models were associated with the sample size of 2,000, and the highest average RMSEs of item location for the correctly specified (0.169) and misspecified (0.165) models were associated with the sample size of 500.

Figure 2.

Average RMSE for Item Discrimination Parameters When the Models Were Correctly Specified.

Figure 3.

Average RMSE for Item Discrimination Parameters When the Models Were Misspecified.

Effect of Correlation (RMSE)

For the correctly specified models, when the sample size and quantity of cross-loading were held constant for each section while the correlation between abilities varied across the conditions, the average RMSEs increased consistently for the primary and secondary item discriminations on both dimensions ( $a_{1 pcl}, a_{1 pncl}, a_{1 s}, a_{2 pcl}, a_{2 pncl}, a_{2 s}$ ) as the correlation increased from .0 to .9 across combinations of conditions. This increasing trend was more pronounced for the secondary item discrimination parameters on both dimensions ( $a_{1 s}$ and $a_{2 s}$ ) than for the primary item discrimination parameters ( $a_{1 pcl}, a_{1 pncl}, a_{2 pcl}, a_{2 pncl}$ ). When the true correlation was zero, item discrimination on the secondary dimension was estimated with the smallest RMSE, and the RMSE of item discrimination on the primary dimension (both cross-loaded and non-cross-loaded items) was slightly larger. Among the primary item discrimination parameters, the RMSE values for the cross-loaded items ( $a_{1 pcl}, a_{2 pcl}$ ) tended to increase at a higher rate as correlation increased than the RMSE values for the non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ).

Correlation had little to no effect on the RMSE of estimated item discrimination for the primary non-cross-loaded items. As correlation increased, the RMSE of the estimated item discrimination for primary cross-loaded items tended to increase (a slight increase when correlation increased from .0 to .6 and a larger increase when correlation increased from .6 to .9). As correlation increased, the RMSE of the estimated item discrimination on the secondary dimension increased substantially. This may be due to the lack of a freely estimated correlation in the model specification when calibrating the simulated data; all models assumed a correlation of .0 (Figure 2).

For the misspecified models, when the sample size and the quantity of the cross-loaded items were held constant for each section while the correlation between abilities varied across the conditions, the average RMSEs of the item discrimination parameters for the truly cross-loaded items (i.e., the model did not account for these cross-loadings) on the first and second dimensions ( $a_{1 pcl}, a_{2 pcl}$ ) increased consistently as the correlation increased from .0 to .9 across combinations of conditions. This increase was much greater when the quantity of the cross-loaded items was the highest (i.e., 50%) and the magnitude of cross-loading was highest. However, the average RMSEs of item discrimination parameters for the truly non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ) decreased slightly as the correlation increased from .0 to .9 across combinations of conditions. This decreasing trend was more pronounced when the quantity of cross-loaded items was the highest at 50% and the magnitude of cross-loading was highest. The RMSE values for the truly cross-loaded items (i.e., those that were misspecified, $a_{1 pcl}, a_{2 pcl}$ ) were similar to the RMSE values for the non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ) when correlation was zero, but exceeded the RMSE values for the non-cross-loaded items as the correlation increased from .0 to .9 across combinations of conditions (Figure 3). In both correct and misspecified models, the average RMSE of the item location parameter remained essentially constant as the correlation between abilities increased from .0 to .9 across all combinations of conditions.

Effect of the Quantity of Cross-Loaded Items (RMSE)

When the model was correctly specified, the average RMSEs increased consistently for the primary cross-loaded and secondary item discrimination parameters on both dimensions ( $a_{1 pcl}, a_{1 s}, a_{2 pcl}, a_{2 s}$ ) as the quantity of the cross-loaded items increased from 10% to 30% to 50%. The RMSE values of the primary item discrimination parameters for the cross-loaded items ( $a_{1 pcl}, a_{2 pcl}$ ) were greater than the values for the non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ) as the quantity of the cross-loaded items increased from 10% to 50%. The RMSE values for the discrimination of the non-cross-loaded items on both dimensions ( $a_{1 pncl}, a_{2 pncl}$ ) were nearly constant, with a slightly increasing pattern, as the quantity of the cross-loaded items increased from 10% to 50%. The increasing trend was much greater for the secondary item discrimination parameters on both dimensions ( $a_{1 s}$ and $a_{2 s}$ ) than for the primary item discrimination parameters ( $a_{1 pcl}, a_{1 pncl}, a_{2 pcl}, a_{2 pncl}$ ), especially when the magnitude of cross-loading was low or medium (Figure 2). When the models were misspecified and the magnitude of cross-loading was high, the average RMSEs for the truly cross-loaded item discrimination parameters on the first and second dimensions (i.e., those that were misspecified, $a_{1 pcl}, a_{2 pcl}$ ) increased consistently across combinations of conditions as the quantity of the cross-loaded items increased from 10% to 50%. When the magnitude of cross-loading was low or medium, the RMSE of the truly cross-loaded item discrimination increased when the quantity of the cross-loaded items increased from 10% to 30%, but showed little to no change when the quantity of the cross-loaded items increased from 30% to 50%.
However, the average RMSEs for the item discrimination parameters for non-cross-loaded items (i.e., no misspecification, $a_{1 pncl}, a_{2 pncl}$ ) decreased slightly as the quantity of the cross-loaded items increased from 10% to 50% across combinations of conditions. The RMSE of estimated item discrimination was slightly higher for non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ) than for truly cross-loaded items ( $a_{1 pcl}, a_{2 pcl}$ ) (Figure 3). In both the correctly specified and misspecified models, as the quantity of the cross-loaded items increased from 10% to 50%, the average RMSE of the item location parameter remained essentially constant, with negligible changes across all combinations of conditions.

Effect of the Magnitude of Cross-Loading (RMSE)

When the models were correctly specified, changes in the magnitude of cross-loading from low to medium to high had very little effect on the average RMSEs of item discrimination on non-cross-loaded items on both dimensions ( $a_{1 pncl}, a_{2 pncl}$ ). However, the average RMSEs of the item discrimination on cross-loaded items on both dimensions ( $a_{1 pcl}, a_{2 pcl}$ ) decreased slightly as the magnitude of cross-loading increased from low to medium, and remained constant as the magnitude of cross-loading increased to high. For the secondary item discrimination parameters ( $a_{1 s}$ and $a_{2 s}$ ), the average RMSEs were constant when shifting from low to medium magnitude of cross-loading. However, the RMSE of the secondary item discrimination parameters increased as the magnitude of cross-loading shifted from medium to high on both dimensions (Figure 2). When the models were misspecified, the average RMSEs for the item discrimination on non-cross-loaded items on both dimensions ( $a_{1 pncl}, a_{2 pncl}$ ) remained constant for low, medium, and high magnitude of cross-loading when the quantity of cross-loaded items was 10% or 30%. However, when the quantity of cross-loaded items was 50%, the RMSE for the non-cross-loaded items showed an increasing pattern when shifting from low to medium to high magnitude of cross-loading. The RMSEs for the item discrimination parameters were larger for the cross-loaded items than for the non-cross-loaded items only when correlation was .9 for the low and medium degrees of cross-loading, and only when correlation was .6 or higher for the high degree of cross-loading. For the secondary item discrimination parameters ( $a_{1 s}$ and $a_{2 s}$ ), the average RMSEs were not affected by changes in the degree of cross-loading when correlation was .0 and .6; when correlation was .9, the RMSE of the secondary item discrimination parameters showed a decreasing trend on both dimensions.
The average RMSEs for the item discrimination on cross-loaded items on both dimensions (i.e., those that were misspecified, $a_{1 pcl}, a_{2 pcl}$ ) increased markedly across the combinations of conditions as the magnitude of cross-loading increased from low to high (Figure 3). In both correct and misspecified models, the average RMSEs for the item location parameter remained nearly constant, with a very slight increase as the magnitude of cross-loading shifted from low to high.

Section II: Average Bias

Average bias in each combination of conditions was calculated by comparing the true item parameter specification with the estimated item parameter. For the item discrimination parameters ( $a_{1}$ and $a_{2}$ ) in correctly specified models, the biases for the primary cross-loaded item discriminations on the first and second dimensions ( $a_{1 pcl}$ and $a_{2 pcl}$ ) were averaged over those items that had a primary item discrimination and were cross-loaded on both dimensions. Likewise, the biases for the primary non-cross-loaded item discriminations on the first and second dimensions ( $a_{1 pncl}$ and $a_{2 pncl}$ ) were averaged over the items that had a primary item discrimination but were not cross-loaded on both dimensions. For the secondary item discriminations ( $a_{1 s}$ and $a_{2 s}$ ), the biases were averaged over the items that had a secondary item discrimination.
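The grouping described above can be illustrated with a minimal sketch. It assumes that bias is taken as the estimate minus the true value (consistent with negative bias indicating underestimation), that bias and RMSE are first computed per item across replications, and that the per-item values are then averaged within an item group. The function name, array layout, and toy numbers are hypothetical and are not taken from the study's code.

```python
import numpy as np

def average_bias_and_rmse(true_a, est_a, group_mask):
    """Average bias and RMSE of a discrimination parameter within one item group.

    true_a     : (n_items,) true discrimination values
    est_a      : (n_reps, n_items) estimates, one row per replication
    group_mask : (n_items,) boolean, True for items in the group
                 (e.g., primary cross-loaded items)
    """
    # Estimation error per replication and item: estimate minus true value (assumed convention)
    errors = est_a[:, group_mask] - true_a[group_mask]
    # Per-item bias and RMSE across replications
    bias_per_item = errors.mean(axis=0)
    rmse_per_item = np.sqrt((errors ** 2).mean(axis=0))
    # Average over the items in the group
    return bias_per_item.mean(), rmse_per_item.mean()

# Toy illustration with made-up values: 4 items, 2 replications
true_a = np.array([1.0, 1.2, 0.8, 1.5])
est_a = np.array([[0.9, 1.2, 0.7, 1.6],
                  [1.1, 1.3, 0.7, 1.4]])
cross_loaded = np.array([True, False, True, False])

bias_cl, rmse_cl = average_bias_and_rmse(true_a, est_a, cross_loaded)
bias_ncl, rmse_ncl = average_bias_and_rmse(true_a, est_a, ~cross_loaded)
```

Under this convention, a negative average bias for a group means that the estimated discriminations in that group were, on average, smaller than their true values.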

Item Discrimination Parameters

Tables 6 and 7 report the average bias of the estimated item discrimination parameters when the models were correctly specified and misspecified, respectively. In the correctly specified models, the average bias for the primary cross-loaded item discrimination parameters on the first dimension ( $a_{1 pcl}$ ) ranged from −0.206 to 0.242 and on the second dimension ( $a_{2 pcl}$ ) ranged from −0.212 to 0.253. The average bias for the primary non-cross-loaded item discrimination parameters on the first dimension ( $a_{1 pncl}$ ) ranged from −0.071 to 0.001 and on the second dimension ( $a_{2 pncl}$ ) ranged from −0.060 to 0.003. The average bias for the secondary item discrimination parameter on the first dimension ( $a_{1 s}$ ) ranged from −0.583 to 0.006. Similarly, the average bias for the secondary item discrimination parameter on the second dimension ( $a_{2 s}$ ) ranged from −0.586 to 0.007 across all combinations of conditions. It should be noted that there was no secondary item discrimination in the misspecified models.

Table 6.

Average Bias of the Primary (CL and NCL) and Secondary Discrimination on the First and Second Dimensions When the Models Were Correctly Specified.

Quantity	Correlation	N	Low						Medium						High
			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$
			$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$
10%	.0	500	−0.010	−0.016	−0.010	−0.013	−0.014	0.007	−0.011	−0.015	0.006	−0.011	−0.011	0.004	−0.014	−0.014	−0.003	−0.013	−0.013	−0.016
		1,000	0.002	−0.009	0.002	−0.013	−0.005	−0.002	−0.014	−0.008	<0.001	−0.013	−0.007	−0.007	−0.009	−0.007	−0.003	−0.007	−0.008	−0.006
		2,000	−0.004	−0.002	−0.002	0.001	−0.002	<0.001	−0.001	−0.001	0.001	−0.002	−0.001	−0.005	−0.002	0.001	−0.001	−0.004	−0.004	−0.004
	.6	500	0.038	−0.016	−0.195	0.034	−0.013	−0.198	−0.005	−0.009	−0.186	−0.019	−0.018	−0.175	−0.106	−0.022	−0.151	−0.089	−0.014	−0.163
		1,000	0.046	−0.009	−0.195	0.047	−0.003	−0.199	0.004	−0.008	−0.182	0.005	−0.006	−0.184	−0.084	−0.005	−0.145	−0.084	−0.005	−0.145
		2,000	0.051	−0.003	−0.194	0.045	−0.005	−0.194	<0.001	−0.005	−0.181	−0.003	−0.002	−0.174	−0.081	−0.003	−0.139	−0.083	0.000	−0.143
	.9	500	0.181	−0.014	−0.386	0.170	−0.013	−0.388	0.093	−0.014	−0.347	0.101	−0.019	−0.346	−0.087	−0.013	−0.233	−0.080	−0.019	−0.213
		1,000	0.188	−0.006	−0.396	0.207	−0.006	−0.377	0.098	−0.006	−0.334	0.097	−0.005	−0.330	−0.079	−0.008	−0.224	−0.074	−0.007	−0.221
		2,000	0.201	−0.003	−0.382	0.194	−0.002	−0.378	0.110	−0.001	−0.324	0.103	−0.003	−0.328	−0.067	−0.005	−0.213	−0.074	−0.004	−0.217
30%	.0	500	−0.022	−0.013	−0.001	−0.017	−0.019	−0.001	−0.014	−0.016	−0.006	−0.018	−0.012	−0.002	−0.019	−0.009	−0.009	−0.022	−0.018	−0.015
		1,000	−0.006	−0.006	0.002	−0.006	−0.004	−0.007	−0.010	−0.005	<0.001	−0.006	−0.004	−0.004	−0.010	−0.009	−0.014	−0.011	−0.008	−0.006
		2,000	−0.002	−0.001	−0.001	−0.004	−0.002	0.001	−0.004	−0.002	−0.001	−0.005	−0.003	−0.003	0.001	0.001	−0.005	−0.001	−0.005	−0.003
	.6	500	0.042	−0.018	−0.306	0.053	−0.021	−0.296	−0.020	−0.029	−0.282	−0.029	−0.023	−0.272	−0.169	−0.024	−0.238	−0.165	−0.030	−0.228
		1,000	0.053	−0.013	−0.297	0.060	−0.010	−0.291	−0.008	−0.019	−0.277	−0.007	−0.017	−0.272	−0.147	−0.024	−0.225	−0.151	−0.023	−0.216
		2,000	0.059	−0.010	−0.299	0.065	−0.011	−0.293	0.003	−0.014	−0.268	−0.004	−0.014	−0.265	−0.142	−0.017	−0.215	−0.147	−0.014	−0.218
	.9	500	0.220	−0.029	−0.564	0.234	−0.035	−0.559	0.081	−0.027	−0.495	0.083	−0.030	−0.478	−0.172	−0.033	−0.331	−0.186	−0.029	−0.333
		1,000	0.242	−0.023	−0.562	0.253	−0.021	−0.564	0.110	−0.025	−0.479	0.100	−0.022	−0.482	−0.158	−0.026	−0.307	−0.184	−0.029	−0.336
		2,000	0.240	−0.015	−0.558	0.246	−0.016	−0.553	0.103	−0.018	−0.477	0.107	−0.017	−0.472	−0.161	−0.021	−0.315	−0.152	−0.021	−0.310
50%	.0	500	−0.021	−0.011	−0.004	−0.021	−0.017	0.002	−0.015	−0.016	−0.006	−0.015	−0.010	−0.006	−0.020	−0.014	−0.021	−0.008	−0.013	−0.013
		1,000	−0.007	−0.005	−0.001	−0.012	−0.005	−0.003	−0.011	−0.009	−0.003	−0.011	−0.008	0.001	−0.010	−0.010	−0.003	−0.007	−0.003	−0.011
		2,000	−0.008	−0.001	−0.002	−0.002	0.003	0.001	−0.004	−0.005	−0.003	−0.002	<0.001	<0.001	−0.003	−0.002	−0.002	−0.003	−0.004	<0.001
	.6	500	0.041	−0.036	−0.311	0.042	−0.033	−0.316	−0.016	−0.046	−0.302	−0.022	−0.042	−0.301	−0.182	−0.060	−0.255	−0.180	−0.052	−0.255
		1,000	0.054	−0.027	−0.313	0.047	−0.026	−0.305	−0.016	−0.034	−0.288	−0.012	−0.038	−0.295	−0.168	−0.044	−0.237	−0.161	−0.044	−0.246
		2,000	0.057	−0.022	−0.312	0.056	−0.021	−0.312	−0.006	−0.033	−0.290	−0.007	−0.031	−0.288	−0.153	−0.037	−0.236	−0.153	−0.041	−0.235
	.9	500	0.221	−0.056	−0.583	0.217	−0.048	−0.586	0.070	−0.058	−0.515	0.075	−0.056	−0.505	−0.206	−0.071	−0.355	−0.212	−0.060	−0.359
		1,000	0.223	−0.043	−0.575	0.223	−0.045	−0.576	0.086	−0.048	−0.505	0.089	−0.050	−0.496	−0.193	−0.053	−0.353	−0.181	−0.057	−0.343
		2,000	0.227	−0.042	−0.570	0.225	−0.041	−0.570	0.092	−0.044	−0.496	0.093	−0.046	−0.494	−0.180	−0.049	−0.343	−0.181	−0.051	−0.342

Note. CL = cross-loaded item; NCL = non-cross-loaded item.

Table 7.

Average Bias of the Truly Primary (CL and NCL) on the First and Second Dimensions When the Models Were Misspecified.

Quantity	Correlation	N	Low						Medium						High
			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$			$a_{1}$			$a_{2}$
			$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$	$a_{1 p}$ $CL$	$a_{1 p}$ $NCL$	$a_{1 s}$	$a_{2 p}$ $CL$	$a_{2 p}$ $NCL$	$a_{2 s}$
10%	.0	500	0.001	−0.016	—	0.000	−0.014	—	0.024	−0.015	—	0.025	−0.012	—	0.108	−0.014	—	0.106	−0.014	—
		1,000	0.012	−0.009	—	−0.003	−0.005	—	0.023	−0.008	—	0.022	−0.007	—	0.107	−0.008	—	0.108	−0.009	—
		2,000	0.005	−0.002	—	0.010	−0.002	—	0.034	−0.001	—	0.032	−0.001	—	0.114	0.001	—	0.112	−0.005	—
	.6	500	−0.124	−0.016	—	−0.126	−0.013	—	−0.223	−0.008	—	−0.238	−0.017	—	−0.393	−0.021	—	−0.377	−0.014	—
		1,000	−0.118	−0.008	—	−0.116	−0.002	—	−0.213	−0.007	—	−0.213	−0.005	—	−0.366	−0.005	—	−0.366	−0.004	—
		2,000	−0.110	−0.003	—	−0.117	−0.005	—	−0.218	−0.005	—	−0.223	−0.002	—	−0.370	−0.002	—	−0.373	0.000	—
	.9	500	−0.192	−0.013	—	−0.202	−0.012	—	−0.370	−0.014	—	−0.362	−0.018	—	−0.699	−0.012	—	−0.695	−0.017	—
		1,000	−0.181	−0.005	—	−0.172	−0.005	—	−0.361	−0.005	—	−0.365	−0.004	—	−0.690	−0.007	—	−0.687	−0.006	—
		2,000	−0.169	−0.002	—	−0.179	−0.001	—	−0.349	−0.001	—	−0.354	−0.003	—	−0.678	−0.004	—	−0.684	−0.003	—
30%	.0	500	−0.021	−0.007	—	−0.015	−0.013	—	−0.026	0.007	—	−0.030	0.010	—	−0.115	0.083	—	−0.111	0.072	—
		1,000	−0.008	0.000	—	−0.008	0.002	—	−0.025	0.018	—	−0.022	0.019	—	−0.102	0.081	—	−0.106	0.084	—
		2,000	−0.005	0.005	—	−0.007	0.005	—	−0.023	0.021	—	−0.023	0.021	—	−0.095	0.090	—	−0.095	0.084	—
	.6	500	−0.195	−0.005	—	−0.186	−0.008	—	−0.368	−0.001	—	−0.384	0.006	—	−0.778	0.044	—	−0.776	0.042	—
		1,000	−0.181	−0.001	—	−0.176	0.002	—	−0.356	0.008	—	−0.359	0.010	—	−0.754	0.046	—	−0.757	0.045	—
		2,000	−0.179	0.003	—	−0.176	0.002	—	−0.345	0.013	—	−0.353	0.013	—	−0.746	0.052	—	−0.751	0.054	—
	.9	500	−0.288	−0.013	—	−0.277	−0.016	—	−0.546	−0.005	—	−0.551	−0.007	—	−1.062	0.004	—	−1.080	0.008	—
		1,000	−0.270	−0.006	—	−0.259	−0.004	—	−0.521	−0.003	—	−0.528	−0.001	—	−1.059	0.011	—	−1.061	0.004	—
		2,000	−0.264	−0.001	—	−0.262	−0.001	—	−0.521	0.003	—	−0.521	0.004	—	−1.042	0.014	—	−1.035	0.015	—
50%	.0	500	−0.034	0.004	—	−0.034	−0.001	—	−0.085	0.043	—	−0.085	0.048	—	−0.329	0.204	—	−0.326	0.212	—
		1,000	−0.024	0.011	—	−0.028	0.010	—	−0.082	0.048	—	−0.082	0.050	—	−0.323	0.207	—	−0.320	0.211	—
		2,000	−0.024	0.013	—	−0.020	0.018	—	−0.074	0.052	—	−0.073	0.057	—	−0.314	0.211	—	−0.315	0.209	—
	.6	500	−0.192	−0.004	—	−0.188	0.000	—	−0.377	0.021	—	−0.385	0.024	—	−0.832	0.079	—	−0.832	0.085	—
		1,000	−0.175	0.004	—	−0.184	0.005	—	−0.376	0.028	—	−0.368	0.023	—	−0.820	0.090	—	−0.807	0.088	—
		2,000	−0.176	0.009	—	−0.177	0.010	—	−0.364	0.028	—	−0.367	0.031	—	−0.800	0.095	—	−0.800	0.092	—
	.9	500	−0.259	−0.008	—	−0.261	−0.004	—	−0.521	−0.002	—	−0.524	0.002	—	−1.045	0.010	—	−1.047	0.015	—
		1,000	−0.254	−0.001	—	−0.255	−0.001	—	−0.505	0.005	—	−0.508	0.005	—	−1.027	0.021	—	−1.024	0.021	—
		2,000	−0.246	0.000	—	−0.248	0.001	—	−0.500	0.009	—	−0.501	0.008	—	−1.017	0.026	—	−1.018	0.024	—

Note. CL = cross-loaded item; NCL = non-cross-loaded item.

In misspecified models, the average bias for the truly primary cross-loaded item discrimination parameters (i.e., items that were supposed to be specified as cross-loaded) on the first dimension ( $a_{1 pcl}$ ) ranged from −1.062 to 0.114 and on the second dimension ( $a_{2 pcl}$ ) ranged from −1.080 to 0.112. The average bias for the primary non-cross-loaded item discrimination parameters on the first dimension ( $a_{1 pncl}$ ) ranged from −0.021 to 0.211 and on the second dimension ( $a_{2 pncl}$ ) ranged from −0.018 to 0.212 across all combinations of conditions. In summary, the average bias of the item discrimination estimates tended to be negative when the item was cross-loaded ( $a_{1 pcl}$ , $a_{2 pcl}$ ), suggesting that estimated primary item discrimination parameters on the truly cross-loaded items were somewhat smaller than their true values. However, the average bias tended to be positive and generally closer to 0.000 when the item was non-cross-loaded ( $a_{1 pncl}$ , $a_{2 pncl}$ ), suggesting that estimated primary item discrimination parameters on the non-cross-loaded items were close to or slightly greater than their true values.

Item Location Parameter ( $d$ )

Table 8 reports the average bias for the item location parameter ( $d$ ) for both correctly specified and misspecified models across all combinations of conditions. As shown in Table 8, the average bias for the item location parameter when the model was correctly specified ranged from −0.012 to 0.010. Interestingly, the average bias for the item location parameter when the models were misspecified was similar, ranging from −0.011 to 0.010.

Table 8.

Average Bias of the Item Location Parameter (d) for the Correct and Misspecified Models.

Quantity	Correlation	N	Correct specified models			Misspecified models
			Low	Medium	High	Low	Medium	High
10%	.0	500	−0.003	−0.001	−0.005	−0.003	−0.001	−0.005
		1,000	−0.002	0.003	0.005	−0.002	0.003	0.005
		2,000	<0.001	0.001	−0.001	<0.001	<0.001	−0.001
	.6	500	<0.001	−0.001	−0.003	<0.001	−0.001	−0.003
		1,000	0.002	0.001	0.001	0.002	0.001	0.001
		2,000	0.001	0.002	<0.001	0.001	0.002	<0.001
	.9	500	−0.002	<0.001	0.002	−0.002	<0.001	0.002
		1,000	0.001	<0.001	−0.005	0.001	<0.001	−0.005
		2,000	−0.002	−0.003	0.005	−0.002	−0.003	0.005
30%	.0	500	0.003	−0.003	0.001	0.002	−0.003	0.001
		1,000	0.003	0.004	<0.001	0.003	0.004	<0.001
		2,000	−0.001	−0.001	0.003	−0.001	−0.001	0.003
	.6	500	0.002	0.001	0.010	0.002	<0.001	0.010
		1,000	0.005	−0.002	0.001	0.005	−0.002	0.001
		2,000	<0.001	<0.001	−0.002	0.001	0.000	−0.002
	.9	500	0.006	−0.002	0.003	0.006	−0.003	0.003
		1,000	0.001	0.001	0.002	0.001	0.001	0.002
		2,000	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
50%	.0	500	−0.005	0.010	0.001	−0.005	0.010	<0.001
		1,000	0.003	−0.001	−0.002	0.003	−0.002	−0.002
		2,000	−0.002	0.003	0.001	−0.002	0.003	<0.001
	.6	500	−0.001	0.003	0.002	−0.001	0.004	0.003
		1,000	−0.004	−0.003	0.003	−0.004	−0.002	0.002
		2,000	0.004	−0.001	0.002	0.004	−0.001	0.003
	.9	500	0.001	0.006	−0.012	0.001	0.006	−0.011
		1,000	<0.001	−0.002	0.008	0.001	−0.001	0.008
		2,000	−0.001	−0.002	<0.001	−0.001	−0.002	<0.001

Effect of Sample Size (Bias)

For both the correctly specified and misspecified models, all of the item discrimination parameters, including primary and secondary item discrimination on the first and second dimensions ( $a_{1 pcl}, a_{1 pncl}, a_{1 s}, a_{2 pcl}, a_{2 pncl}, a_{2 s}$ ), showed very small or no change in average bias as sample size changed (Figures 4 and 5). The average bias for the item location parameter for both correctly specified and misspecified models was near zero and was unaffected as the sample size increased from 500 to 2,000, suggesting that the estimated item location parameters were close to the true values (ranging from −0.012 to 0.010).
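One way to read this pattern alongside the RMSE results is through the standard decomposition of mean squared error. The notation here is introduced only for illustration: $\hat{a}$ denotes the estimate of a single item parameter with true value $a$, and expectations are taken over replications.

$$
\mathrm{RMSE}(\hat{a})^{2} = E\left[(\hat{a} - a)^{2}\right] = \left(E[\hat{a}] - a\right)^{2} + \mathrm{Var}(\hat{a}) = \mathrm{Bias}(\hat{a})^{2} + \mathrm{Var}(\hat{a})
$$

Under this identity, an RMSE that decreases with sample size while bias stays essentially unchanged is consistent with the sampling variance of the estimates shrinking as the sample grows, whereas bias that stems from model specification would not be expected to shrink. Because the reported values are averages over items and conditions, this decomposition should be read as an interpretive aid rather than as an exact relation among the tabled values.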

Figure 4.

Average Bias for Item Discrimination Parameters When the Models Were Correctly Specified.

Figure 5.

Average Bias for Item Discrimination Parameters When the Models Were Misspecified.

Effect of Correlation Between Abilities (Bias)

As shown in Figure 4, in correctly specified models, when the degree of cross-loading was low or medium, the average bias of the discrimination of cross-loaded items on both dimensions ( $a_{1 pcl}$ and $a_{2 pcl}$ ) was equal to or near zero when data were truly uncorrelated; as correlation increased, the average bias of cross-loaded items departed from zero in the positive direction. These positive values indicate that estimated cross-loaded primary item discrimination parameters on both dimensions were somewhat greater than their true values. However, when the degree of cross-loading was high, the average bias of the item discrimination of cross-loaded items on both dimensions ( $a_{1 pcl}$ and $a_{2 pcl}$ ) departed from zero in the negative direction as the correlation between abilities increased from .0 to .9. These negative values indicate that estimated item discrimination parameters of cross-loaded items on both dimensions were somewhat smaller than their true values. For the discrimination of non-cross-loaded items on both dimensions ( $a_{1 pncl}$ and $a_{2 pncl}$ ), the average bias was close to 0.000 when data were truly uncorrelated and approached −0.05 as correlation increased to .9. The average bias of the secondary item discrimination parameters on both dimensions ( $a_{1 s}, a_{2 s}$ ) became increasingly negative as the correlation increased from .0 to .9 across combinations of conditions. These negative values indicate that estimated secondary item discrimination parameters on both dimensions were smaller than their true values.

However, when the models were misspecified, the average bias of the truly cross-loaded primary item discrimination on both dimensions (i.e., misspecified items, $a_{1 pcl}$ and $a_{2 pcl}$ ) increased in absolute value (with negative values) as the correlation between abilities increased from .0 to .9. These negative values indicate that estimated cross-loaded primary item discrimination parameters on both dimensions were somewhat smaller than their true values. When the magnitude of cross-loading was low or medium, the average bias of non-cross-loaded item discriminations on both dimensions (i.e., not misspecified, $a_{1 pncl}$ and $a_{2 pncl}$ ) remained close to 0.000 as the correlation between abilities increased from .0 to .9. However, when the magnitude of cross-loading was high, the average bias of non-cross-loaded item discrimination on both dimensions ( $a_{1 pncl}$ and $a_{2 pncl}$ ) remained constant with values near zero when the quantity of cross-loaded items was 10% and increased (in the positive direction) as correlation increased when the quantity of cross-loaded items was 30% or 50% (Figure 5). The average bias for the item location parameter remained close to zero as the correlation increased from .0 to .9 for both correctly specified and misspecified models.

Effect of the Quantity of the Cross-Loaded Items (Bias)

As shown in Figure 4, for the correctly specified models, when the magnitude of cross-loading was low or medium, the average bias of the item discrimination parameters on cross-loaded items ( $a_{1 pcl}, a_{2 pcl}$ ) showed an increasing pattern with positive values as the quantity of the cross-loaded items increased from 10% to 50% (holding all other variables constant). However, when the magnitude of cross-loading was high, as the quantity of the cross-loaded items increased from 10% to 30% and 50%, the average bias of the item discrimination on cross-loaded items departed from zero in the negative direction, suggesting that the estimated item parameters were smaller than the true item parameters. The average bias for the item discrimination on non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ) remained close to 0.000 across all levels of the quantity of cross-loaded items. The average bias for the secondary item discriminations on both dimensions ( $a_{1 s}, a_{2 s}$ ) departed from zero in the negative direction as the quantity of the cross-loaded items increased from 10% to 50%, especially when the correlation between abilities was either .6 or .9. The average bias for the secondary item discrimination parameters was generally negative, suggesting that the estimated item parameters were smaller than the true item parameters. When the correlation between the abilities was .0, the average bias of the secondary item discrimination parameters in correctly specified models remained close to 0.000 as the quantity of the cross-loaded items increased from 10% to 50%.

For misspecified models, the average bias of the primary item discrimination parameters on truly cross-loaded items (i.e., misspecified items, $a_{1 pcl}, a_{2 pcl}$ ) increased in absolute value, with generally negative values, as the quantity of the cross-loaded items increased from 10% to 50%. These negative values indicate that the estimated item parameters were smaller than the true item parameters. When the magnitude of cross-loading was low, the average bias for the primary item discrimination on non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ) remained close to 0.000 as the quantity of the cross-loaded items increased from 10% to 50%. However, when the magnitude of cross-loading was medium or high, the average bias for the primary item discrimination on non-cross-loaded items ( $a_{1 pncl}, a_{2 pncl}$ ) increased with positive values as the quantity of the cross-loaded items increased from 10% to 50% when correlation was greater than zero, suggesting that estimated values were greater than the true values (Figure 5). As the quantity of the cross-loaded items increased from 10% to 50%, the average bias of the item location parameter remained steady, with values close to 0.000 across all combinations of conditions for both correct and misspecified models.

Effect of Magnitude of Cross-Loading (Bias)

For the correctly specified models, as the magnitude of cross-loading increased from low to medium, the average bias for the primary item discrimination parameters on cross-loaded items on both dimensions ( $a_{1 pcl}, a_{2 pcl}$ ) decreased from generally positive values toward zero, and it continued to decrease, departing farther from zero in the negative direction, for the high degree of cross-loading. This pattern was more pronounced when the correlation between abilities was .6 or .9 than when it was .0. The average bias of the primary item discrimination parameters on non-cross-loaded items on both dimensions ( $a_{1 pncl}, a_{2 pncl}$ ) remained constant as the magnitude of cross-loading changed. The average bias for the secondary item discriminations on both dimensions ( $a_{1 s}, a_{2 s}$ ) was near zero when correlation was .0 at all levels of cross-loading magnitude. When correlation was greater than zero, the average bias departed from zero in the negative direction, and to a greater degree when the degree of cross-loading was low or medium. The negative values suggested that the estimated parameters were smaller than the true item parameters (Figure 4). For misspecified models, as the magnitude of cross-loading increased from low to high, the average bias showed larger departures from zero in the negative direction for the primary item discrimination parameters on truly cross-loaded items on both dimensions (i.e., misspecified items, $a_{1 pcl}, a_{2 pcl}$ ). The average bias of the item discrimination parameters on non-cross-loaded items on both dimensions ( $a_{1 pncl}, a_{2 pncl}$ ) remained near zero at all levels of cross-loading (Figure 5). The average bias for the item location parameter was near zero and was unaffected as the magnitude of cross-loading changed, suggesting that the estimated item location parameters were close to the true values (ranging from −0.012 to 0.010) for both correct and misspecified models.

Discussion

Although model fit could be useful for identifying cross-loadings, the purpose of this study was to investigate the influence of the quantity and magnitude of cross-loading on secondary dimensions, together with model specification, on item parameter recovery in MIRT models, especially when the model was specified as a simple structure that ignored the cross-loading. It is difficult to compare the results of this study directly with those of Finch (2011), Zhang (2012), and Svetina et al. (2017) because of differences in the structure of the data in terms of item discrimination and item location specifications, quantity and magnitude of cross-loading, model specification, and the distribution of latent abilities. Any such comparison should be made with caution, as those studies focused on different combinations of influencing variables and conditions. For instance, Finch (2011) focused on complex MIRT models when the distribution of latent abilities was non-normal. Zhang (2012) focused on comparing the precision of item parameter estimation in unidimensional and multidimensional estimation approaches within simple structure and mixed structure environments. This study was primarily an extension of Svetina et al. (2017), in which the authors compared item parameter estimation under complex structure when the distribution of abilities was non-normal, with balanced and imbalanced item discriminations, but without incorporating the effects of the magnitude of cross-loading and model specification.

The results of this study have implications for test or instrument developers and practitioners especially for those that are involved with multidimensional item–response data. When the quantity and magnitude of cross-loadings are misspecified or ignored, item discrimination parameters are adversely affected. Under all circumstances, a larger sample size improved the item discrimination estimations, but even with a sample as large as 2,000, if cross-loadings are ignored and data are treated as having a simple structure, item discrimination estimates are severely adversely affected, especially when the magnitude of the cross-loading is high. This can ultimately result in inaccurate inferences regarding the examinees’ abilities on each dimension. Therefore, it is imperative that test designers take variables such as quantity and magnitude of cross-loading and model specification into account to have accurate inferences about the item parameter recovery and ultimately examinees’ abilities on multiple dimensions.

Test designers should consider that even if the model is correctly specified and the quantity of cross-loading is taken into account, the magnitude of cross-loading on the secondary dimension may still affect the precision of item discrimination estimates. For instance, on a mathematics test, one or more of the primarily algebra items may require a low, moderate, or high level of secondary geometry knowledge to answer the item correctly (i.e., magnitude of cross-loading). Test developers should take the degree of cross-loading into account when designing tests in order to obtain accurate item and examinee parameter estimates. In addition, test designers should be aware that using a misspecified simple structure multidimensional model to evaluate the items, ignoring the quantity and magnitude of cross-loading of some items on the secondary dimension, could have serious consequences for the accuracy of item discrimination estimates. In the mathematics example, this corresponds to misspecifying the model by ignoring the cross-loading of items that primarily measure algebra knowledge but also require some geometry knowledge. The results of this study support the conclusion that the quantity of cross-loaded items, the magnitude of cross-loading on the secondary dimension, and model specification influence the precision of item discrimination recovery, which can ultimately result in inappropriate inferences about the latent abilities of the examinees.
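As a rough illustration of the algebra and geometry example, consider a compensatory two-dimensional 2PL item written in slope-intercept form. The notation ($a_{1}$, $a_{2}$, $d$) and the numerical values below are assumptions chosen for illustration and are not taken from the study's conditions:

$$P(x = 1 \mid θ_{1}, θ_{2}) = \frac{1}{1 + \exp\left[-(a_{1}θ_{1} + a_{2}θ_{2} + d)\right]}$$

Suppose a primarily algebra item has $a_{1} = 1.0$, a secondary geometry loading of $a_{2} = 0.6$, and $d = 0$. Two examinees with the same algebra ability ($θ_{1} = 0$) but geometry abilities of $θ_{2} = 1$ and $θ_{2} = -1$ then have

$$P = \frac{1}{1 + \exp(-0.6)} \approx 0.65 \quad \text{and} \quad P = \frac{1}{1 + \exp(0.6)} \approx 0.35,$$

respectively. A simple structure model that fixes $a_{2} = 0$ would assign both examinees $P = 0.50$, so the variation that is actually driven by $θ_{2}$ has to be absorbed elsewhere in the model, which is one way to see why the primary discrimination estimate can be distorted.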

This simulation study has some limitations that should be noted for future studies. First, the item–response data were generated and analyzed using R and simulation techniques, and it is possible that results in real-world situations differ when actual data from instruments such as tests and surveys are analyzed. Examples of factors that could contribute to such differences are the distribution of the examinees’ latent traits (e.g., for a trait such as depression) and testing environment conditions. Second, item–response data in this study were simulated with examinee sample sizes similar to those of large-scale tests and surveys, and it is likely that the results differ when the number of examinees is relatively small. Third, although many variables were manipulated in this study, a number of other variables could influence the precision of item parameter recovery in MIRT models. For instance, a compensatory 2PL-MIRT model with two dimensions and dichotomous item–responses was used to simulate and calibrate the data in every replication of each combination of conditions. It would be interesting to further investigate the effect of the variables manipulated in this study on other MIRT models, such as models with more than two dimensions or bifactor and higher order models. In addition, it would be interesting to see how well item parameters would be recovered under a non-compensatory MIRT model given the variables manipulated in this study. Finally, IRT is used not only for item evaluation but also for scoring respondents. Scores may be used to diagnose depression or anxiety, to classify respondents into groups, or to benchmark students in education. In any of these situations, it is imperative that scores accurately reflect the underlying abilities. Model misspecification and the quantity and magnitude of cross-loading often affect item parameter estimates, which in turn are likely to affect estimated trait scores. Future research may also investigate the effects of these variables on estimated trait scores.
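For readers who want to experiment with the kind of data-generating setup described above, the following is a minimal sketch in Python using only the standard library. It is not the authors' code (the study itself was carried out in R), and the item parameter values, the correlation of 0.3, the seed, and all function names are hypothetical choices made for illustration. The `bias` and `rmse` helpers use the common textbook definitions, which may differ in detail from the evaluation criteria used in this study.

```python
import math
import random


def p_correct(theta1, theta2, a1, a2, d):
    """Compensatory 2PL-MIRT probability of a correct response
    (slope-intercept form; this parameterization is an assumption)."""
    return 1.0 / (1.0 + math.exp(-(a1 * theta1 + a2 * theta2 + d)))


def draw_abilities(n, rho, rng):
    """Draw n pairs of standard-normal abilities with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


def simulate_responses(abilities, items, rng):
    """Return a persons-by-items matrix of 0/1 responses."""
    return [
        [1 if rng.random() < p_correct(t1, t2, a1, a2, d) else 0
         for (a1, a2, d) in items]
        for (t1, t2) in abilities
    ]


def bias(estimates, true_value):
    """Average signed error of the estimates across replications."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root mean square error of the estimates across replications."""
    return math.sqrt(
        sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    )


# Hypothetical item parameters (a1, a2, d); values are illustrative only.
items = [
    (1.2, 0.0, 0.5),   # simple-structure item on dimension 1
    (1.0, 0.6, -0.3),  # dimension-1 item that cross-loads on dimension 2
    (0.0, 1.1, 0.0),   # simple-structure item on dimension 2
]
rng = random.Random(2024)               # fixed seed for reproducibility
abilities = draw_abilities(2000, 0.3, rng)
responses = simulate_responses(abilities, items, rng)
```

In a full recovery study, the simulated `responses` would be calibrated under both the correctly specified and the misspecified (simple structure) model in each replication, and `bias` and `rmse` would then be applied to the resulting estimates of each item parameter. The calibration step is omitted here because it depends on the estimation software.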

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Mostafa Hosseinzadeh

References

Adams, R. J., Wilson, & Wang. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1–23. https://doi.org/10.1177/0146621697211001

Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27(6), 395–414. https://doi.org/10.1177/0146621603258350

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1–29.

Finch. (2010). Item parameter estimation for the MIRT model: Bias and precision of confirmatory factor analysis-based models. Applied Psychological Measurement, 34(1), 10–26. https://doi.org/10.1177/0146621609336112

Finch. (2011). Multidimensional item response theory parameter estimation with nonsimple structure items. Applied Psychological Measurement, 35(1), 67–82.

Finch, & Habing, B. T. (2005). Comparison of NOHARM and DETECT in item cluster recovery: Counting dimensions and allocating items. Journal of Educational Measurement, 42, 149–169.

Hulin, C. L., Drasgow, & Parsons, C. K. (1983). Item response theory: Application to psychological measurement. Dow Jones-Irwin.

McDonald, R. P. (1999). Test theory: A unified treatment. Lawrence Erlbaum.

Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9(4), 401–412. https://doi.org/10.1177/014662168500900409

Reckase, M. D. (2009). Multidimensional item response theory. Springer-Verlag. https://doi.org/10.1007/978-0-387-89976-3

RStudio Team. (2018). RStudio: Integrated development for R. http://www.rstudio.com/

Strachan, Ackerman, Chen, S.-H., & Willse. (2020). Robustness of projective IRT to misspecification of the underlying multidimensional model. Applied Psychological Measurement, 44(5), 362–375. https://doi.org/10.1177/0146621620909894

Svetina. (2013). Assessing dimensionality in noncompensatory MIRT with complex structure. Educational and Psychological Measurement, 73, 312–338.

Svetina, & Levy. (2016). Dimensionality in compensatory MIRT when complex structure exists: Evaluation of DETECT and NOHARM. The Journal of Experimental Education, 84(2), 398–420.

Svetina, Valdivia, Underhill, Dai, & Wang. (2017). Parameter recovery in multidimensional item response theory models under complexity and nonnormality. Applied Psychological Measurement, 41(7), 530–544. https://doi.org/10.1177/0146621617707507

Zhang. (2007). Conditional covariance theory and DETECT for polytomous items. Psychometrika, 72, 69–91.

Zhang. (2012). Calibration of response data using MIRT models with simple and mixed structures. Applied Psychological Measurement, 36(5), 375–398. https://doi.org/10.1177/0146621612445904
