MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

Abstract

Three multiple indicators, multiple causes (MIMIC) methods were developed to assess differential item functioning (DIF) in polytomous items: the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA). In a series of simulations, all three methods yielded well-controlled Type I error rates when tests contained no DIF items. M-ST and M-SP began to yield inflated Type I error rates and deflated power when tests contained 10% and 20% DIF items, respectively. M-PA maintained the expected Type I error rate and high power even when tests contained as many as 40% DIF items. An iterative MIMIC procedure was proposed to select a small set of DIF-free items to serve as the anchor in M-PA; in a further series of simulations, this procedure yielded a very high rate of accuracy. Two simulated data sets were then analyzed to illustrate the application of these MIMIC methods to DIF assessment in polytomous items.
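For orientation, the following is a minimal sketch of the MIMIC model as it is commonly written for uniform DIF in ordered-category items. The abstract gives no equations, so every symbol here is an assumption introduced for illustration: a latent response y*_i for item i, a loading lambda_i, a latent trait theta, a group indicator z, a direct effect beta_i, thresholds tau_{i,k}, and response categories 0 to m_i.

```latex
% Measurement part: latent response to item i depends on the trait
% and, if the item shows uniform DIF, directly on group membership z.
y_i^{*} = \lambda_i \theta + \beta_i z + \varepsilon_i

% Structural part: the groups may differ in mean trait level (impact).
\theta = \gamma z + \zeta

% Polytomous link: the observed category is set by ordered thresholds,
% with \tau_{i,0} = -\infty and \tau_{i,m_i+1} = +\infty.
y_i = k \quad \text{if} \quad \tau_{i,k} < y_i^{*} \le \tau_{i,k+1},
\qquad k = 0, 1, \dots, m_i

% DIF test for the studied item:
H_0 : \beta_i = 0
```

Under this reading, the three methods would differ mainly in which items have beta fixed at zero and so define the matching variable: all other items in M-ST, the items left after iteratively removing flagged items in M-SP, and a small preselected DIF-free set in M-PA. This is a hedged interpretation of the abstract, not a statement of the authors' exact specification.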

Keywords

differential item functioning, MIMIC, scale purification, item response theory, Rasch measurement
