Rasch Investigation of the CogAT Form 8 Verbal, Quantitative, and Nonverbal Domains in Grades 3 Through 5

Abstract

This study uses dichotomous Rasch modeling to investigate the psychometric properties of the items on the CogAT Form 8, which is widely used to help determine student eligibility for gifted education programs in K-12 schools. Scores were collected from 1,689 students in grades 3–5 from a large urban public school district in Texas in the 2022–2023 school year. Results provide strong but preliminary evidence supporting the validity of CogAT Form 8 scores for assessing cognitive abilities in this diverse student population. On average, item difficulties were slightly lower than person measures, which aligns with the test’s use in gifted education programs. Item fit statistics were acceptable for most items, and the person and item reliability indices suggest the CogAT supports consistent inferences about student ability across domains and grade levels. Overall, the results of our Rasch analyses support the continued, but cautious, use of the CogAT with similar student populations.

Keywords

CogAT, Rasch, psychometrics, validity, gifted, ability testing

Introduction

Assessment plays a vital role in educational systems by offering insight into student learning and growth. Across the globe, a multitude of assessment options are available to K-12 school systems, each designed to measure different aspects of student abilities. Students are also assessed to determine if they would benefit from specialized program services, such as gifted education, special education, or bilingual education. Some assessments can be used to make student placement decisions for more than one school program.

The Cognitive Abilities Test (CogAT) offers a comprehensive, research-based assessment of student reasoning abilities and cognitive potential. This assessment can inform classroom instruction, identify students who would benefit from gifted education services, and guide other educational options that support students’ academic abilities (Lohman, 2015). The CogAT is among the most widely used assessments for identifying students who may benefit from placement in gifted education programs (Carman et al., 2018; Lakin, 2012; Ozen et al., 2025). This study investigates the most recent version of the CogAT, Form 8, using Rasch methods; because the Rasch model is designed to yield more objective measurement, such an investigation can support the goals of equitable identification and access. Further, to our knowledge, no previous research has used Rasch measurement to investigate the CogAT Form 8.

CogAT Background

The CogAT is a group-administered assessment that measures student reasoning abilities in verbal, quantitative, and nonverbal domains through cognitive tasks designed for K-12 students. For students in grades 3–12, the CogAT is separated into three distinct batteries (verbal, quantitative, and nonverbal), each with three subtests. For students in grades K-2, the verbal battery has one fewer subtest. The CogAT is grounded in the work of the Lorge-Thorndike Intelligence Tests, first published in 1954, which consisted of a Verbal Battery, a Nonverbal Battery, and a Quantitative Reasoning subtest (Lohman, 2012). Building on this legacy, the CogAT continues to provide separate scores for each of the three batteries because it was purposefully developed to measure student abilities in all three components of general fluid reasoning, which sets it apart from other ability tests used in K-12 educational settings.

Beginning with CogAT Form 6, the test developers grounded the assessment revisions in Cattell-Horn-Carroll (CHC) theory, emphasizing Carroll’s stratum II fluid reasoning skills of sequential reasoning, quantitative reasoning, and inductive reasoning (Ozen et al., 2025). CogAT Form 7 was the first version to include test directions in English and Spanish, although the items themselves were presented in English. To provide more access for English Learners, the most recent version provides test directions in nine languages (English, Spanish, French, Russian, Arabic, Vietnamese, Cantonese, Somali, and Mandarin) through audio files on the online CogAT Form 8 assessment platform. However, like the previous version of the test, Form 8 still presents items in English for grades 3–12 and uses pictorial items for grades K-2.

CogAT Forms 7 and 8 report student scores based on a National Standardization Study conducted by the test developers using Fall 2010 data (Lohman, 2012). The objective was to create a standardized set of CogAT scores that closely represented the demographic composition of student populations across the United States based on data obtained from the NCES Common Core of Data Public Elementary/Secondary School Universe Survey: School Year 2008–09 (Lohman, 2012). The Form 7 CogAT Research and Development Guide explains that considerations were made based on geographic region (Northeast, Midwest, South, and West), district size, and Title I status, with individual schools randomly selected to participate in the study. The schools that granted permission to participate in the national norming sample included Catholic and non-Catholic public and private schools with diverse backgrounds, socioeconomic status, and population sizes (Lohman, 2012).

CogAT scores provide schools with a holistic view of each student’s strengths and/or weaknesses across three distinct batteries. The CogAT verbal and quantitative batteries predict abilities in reading and mathematics, respectively, which are critical foundations of learning in traditional academic settings (Lakin, 2012). While verbal abilities involve reasoning with words and their corresponding concepts, quantitative abilities involve reasoning with abstract symbols and their corresponding mathematical concepts (Lohman, 2012). Additionally, reading and mathematics learning are commonly included in curriculum standards and gifted education programs, which supports using verbal and quantitative scores as part of a comprehensive assessment process used to determine which students would benefit from participation in gifted education programs.

The CogAT nonverbal battery includes item types not commonly encountered in traditional academic settings. However, the CogAT nonverbal “visual mental models” support verbal and quantitative reasoning skills (Lohman, 2012). A key feature of nonverbal assessments is that they do not rely on a specific language. Therefore, some scholars note that an advantage of nonverbal assessments is that they may reduce the influence of understanding a specific language or culture on student outcomes (Cao et al., 2017; Lohman et al., 2008). Some scholars believe this has led to the popularity of using nonverbal assessments to support identifying high-ability students from diverse racial, ethnic, economic, and language backgrounds (Cao et al., 2017; Carman et al., 2020; Lakin, 2012; Lohman et al., 2008; Naglieri & Ford, 2015). Although these same scholars emphasize that nonverbal assessments are not always “fairer” or “unbiased,” the consensus appears to stress the need to include nonverbal ability measures in a comprehensive approach to determining which students would benefit from gifted education services (Cao et al., 2017; Carman et al., 2020; Lakin, 2012; Lohman et al., 2008; Naglieri & Ford, 2015).

Previous Psychometric Research and Benefits of Rasch

Previous forms of the CogAT have numerous validity studies documenting evidence supporting their use, notwithstanding the relatively stronger convergent validity evidence provided by Naglieri (Lee et al., 2021) and Lohman (Ozen et al., 2025), the latter of whom has a vested interest as a CogAT co-author. Ozen et al. (2025) noted strong convergent evidence supporting the use of the CogAT across 24 studies with 33 effect sizes. In their respective reviews of the psychometric evidence for the CogAT in the Mental Measurements Yearbook (Carlson et al., 2017), both Ackerman (2017) and Miller (2017) concluded that the available evidence supports the use of CogAT Form 7, although Ackerman (2017) conditions his conclusion on the CogAT being used in conjunction with the Iowa Assessments. The psychometric research on the CogAT that has used latent variable models has been based on item response theory (IRT) or factor analysis, and the CogAT was developed using IRT, not Rasch. Although IRT and Rasch models are mathematically similar, the Rasch model offers distinct benefits that, when empirically supported, provide important additional psychometric evidence for the CogAT.

Like IRT, Rasch analysis yields person- and item-level estimates as well as item characteristic plots and diagnostic output. Unlike IRT, the Rasch model yields so-called “objective measurement,” which might also be characterized as invariant measurement: when there is adequate model-data fit, comparisons between persons do not depend on the particular items administered, and comparisons between items do not depend on the particular persons who responded. That is, the Rasch model imposes certain constraints (i.e., item discrimination parameters fixed at 1 and no guessing parameter) such that, when the data fit the model, the results are expressed on a common interval-level metric, within an allowable range of error, for any cognitive or psychological variable. Objective, or invariant, measurement is critical and widely required in the sciences. For example, a meter is the same distance everywhere, within an allowable range of error. We hope the alignment between the Rasch framework and these fundamental characteristics of scientific measurement makes it clear why demonstrating adequate model-data fit in cognitive and psychological science provides strong psychometric evidence that is not duplicated by IRT or factor analysis.
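To make the invariance property concrete, the short Python sketch below compares the log-odds difference between two persons across several items. The person measures, item difficulties, and two-parameter logistic (2PL) discriminations are invented purely for illustration. Under the Rasch model the gap equals the difference in person measures for every item; under a 2PL model with unequal discriminations, it changes from item to item.

```python
import math

def log_odds_rasch(b, d):
    """Log-odds of a correct response under the dichotomous Rasch model."""
    return b - d

def log_odds_2pl(b, d, a):
    """Log-odds under a 2PL IRT model with discrimination a (illustrative only)."""
    return a * (b - d)

# Hypothetical person measures (logits) and item parameters.
person_n, person_m = 1.5, 0.5
items = [(-1.0, 0.6), (0.0, 1.0), (1.2, 1.8)]  # (difficulty, hypothetical 2PL discrimination)

for d, a in items:
    rasch_gap = log_odds_rasch(person_n, d) - log_odds_rasch(person_m, d)
    twopl_gap = log_odds_2pl(person_n, d, a) - log_odds_2pl(person_m, d, a)
    print(f"item d={d:+.1f}: Rasch gap={rasch_gap:.2f}, 2PL gap={twopl_gap:.2f}")
```

The Rasch gap prints as 1.00 for every item, whereas the 2PL gap scales with each item’s discrimination, which is one way to see why the Rasch constraints are what make person comparisons item-independent.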

Purpose of this Study

We acknowledge that there is considerable evidence supporting the use of the CogAT; however, to our knowledge, (1) much of the psychometric evidence has been provided by one or more of the test authors, (2) such evidence has not yet been as comprehensively collected for Form 8, and (3) it has not been obtained using the Rasch model. The purpose of this study is to examine the psychometric properties of CogAT Form 8 Levels 9 through 11 using the Rasch measurement model and to present validity evidence that supports or challenges the continued use of the CogAT.

Method

Sample

School District Data

CogAT Form 8 scores were collected for 1,689 students in grades 3–5 from a large urban public school district in Texas. Students completed the online CogAT Form 8 assessment during the 2022–2023 school year. All students were referred for testing by a parent/guardian or school staff member to be considered for placement in the district’s gifted education program. Texas has outlined policies that require public school districts to provide assessments, identification, and services for gifted and talented students in grades K-12. The students attended one of 71 elementary schools in the same district, 42 of which were classified as Title I. In addition to CogAT Form 8 scores, student-level demographic data were collected, including grade level, gender, race/ethnicity, and free/reduced lunch status. The CogAT verbal, quantitative, and nonverbal scores were collected by the school district as part of a comprehensive, holistic assessment process that included other quantitative and qualitative measures to determine whether students would benefit from receiving gifted education services.

Student Background Data

The 1,689 participants were third (n = 651, 38.5%), fourth (n = 617, 36.5%), and fifth (n = 421, 24.9%) graders from a large, diverse, urban district in Texas. These students completed the CogAT Form 8 as part of the identification process for gifted and talented services. Because these students were a referred sample, there are important implications for interpreting the item and student distributions, which are discussed in the Results section. Forty-five percent (n = 753) of the sample attended a Title I school. About 31% of students (n = 528) qualified for free or reduced-price lunch, and 18% (n = 310) were emerging bilinguals. Available demographic information is summarized in Table 1.

Table 1.

Demographic Information for Student Sample

Demographic variable	n	%
Race/ethnicity
Hispanic/Latino	737	43.6
Black	82	4.9
White	641	38.0
Asian	121	7.2
American Indian or Alaska Native	3	0.2
Native Hawaiian or Pacific Islander	1	0.1
Two or more races	102	6.0
Did not report	2	0.1
Gender
Female	857	50.7
Male	832	49.3
Grade
Third	651	38.5
Fourth	617	36.5
Fifth	421	24.9
Title I school
Yes	753	44.6
No	936	55.4
Free/reduced lunch
Yes	528	31.3
No	1,161	68.7
Emerging bilingual
Yes	310	18.4
No	1,379	81.6

Instrumentation

The CogAT Form 8 was used to assess student cognitive ability. It is a standardized, norm-referenced assessment designed to measure reasoning in three domains: verbal, quantitative, and nonverbal (Lakin & Driver, 2017). It is intended to provide a comprehensive evaluation of cognitive development, independent of acquired knowledge, and is often used to identify students with high potential. The CogAT Form 8 comprises three batteries, each containing three subtests:

• Verbal Battery: Evaluates verbal reasoning and comprehension through subtests including Picture/Verbal Classification, Sentence Completion, and Picture/Verbal Analogies.

• Quantitative Battery: Evaluates quantitative reasoning and problem-solving skills using subtests including Number Series, Number Puzzles, and Number Analogies.

• Nonverbal Battery: Evaluates nonverbal reasoning abilities using subtests including Figure Classification, Paper Folding, and Figure Matrices.

The CogAT Form 8 provides several scores, including the following:

• Raw Scores: The number of items answered correctly on each subtest.

• Universal Scale Scores (USS): Scores that allow for comparisons across different levels of the test.

• Standard Age Scores (SAS): Normative scores with a mean of 100 and a standard deviation of 16, indicating a student’s performance relative to their age group.

• Percentile Ranks: Scores that indicate the percentage of students in the norming sample who scored at or below a given student’s score.

• Stanines: Standard nine scores, ranging from 1 to 9, that provide a broader categorization of performance.
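Because SAS are scaled to a mean of 100 and a standard deviation of 16, their relation to percentile ranks and stanines can be approximated under a normal-distribution assumption. The Python sketch below is only an illustrative approximation: operational CogAT percentile ranks and stanines come from the publisher’s norm tables, not from this formula, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

SAS_MEAN, SAS_SD = 100, 16  # SAS scaling described above

def approx_percentile_rank(sas: float) -> float:
    """Approximate percentile rank (0-100), assuming normally distributed norms."""
    return 100 * NormalDist(SAS_MEAN, SAS_SD).cdf(sas)

def approx_stanine(sas: float) -> int:
    """Approximate stanine: half-SD-wide bands centered on the mean, clipped to 1-9."""
    z = (sas - SAS_MEAN) / SAS_SD
    return min(9, max(1, math.floor(2 * z + 5.5)))

for sas in (84, 100, 116, 132):
    print(sas, round(approx_percentile_rank(sas), 1), approx_stanine(sas))
```

For example, under this approximation a SAS of 116 (one standard deviation above the mean) corresponds to roughly the 84th percentile and stanine 7.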

The CogAT Form 8 was selected for this study because it is a widely used measure of cognitive abilities with strong psychometric properties. The test has demonstrated evidence of reliability and validity in diverse student populations when compared with similar assessments, although further work remains (Carman et al., 2020; Lohman et al., 2008; Ozen et al., 2025). The test was administered according to standardized procedures.

Data Collection

District-assigned student identification numbers were used to assign each student the appropriate online CogAT assessment through the Riverside Insights online testing platform, riversidedatamanager.com. This ensured that each student received a test that matched their grade level and the appropriate language audio file. Once all testing was complete, the school district’s gifted education administrator downloaded a detailed score report from riversidedatamanager.com that included all students in grades 3–5 who had completed the assessment during the 2022–2023 school year.

A separate district-generated report was used to match student demographic information to the student identification number used to take the CogAT. Student data were collected, extracted, and merged into a spreadsheet. Students without complete demographic information and those who did not complete all three CogAT batteries were removed. In addition, scores for students in four schools were excluded because those students completed CogAT testing well after the district-designated testing window, which may have affected the norming of student scores at those schools. The names of individual students and schools were removed before the spreadsheet was shared with the research team. Our decision to analyze complete cases should not be taken as an indication that CogAT administration does not allow for missing observations. To the contrary, operational CogAT administration specifies a minimum number of answered items per subtest as the criterion for a complete subtest. We excluded missing observations because our focus was on the subtests as a whole and not on individual student performances or scores.
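The merge-and-filter steps described above can be sketched with the Python standard library. This is a hypothetical illustration, not the district’s or the authors’ actual workflow: the field names (`student_id`, `school`, `batteries_completed`, and the demographic keys), the student IDs, and the excluded-school label are all placeholders.

```python
# Hypothetical illustration: field names, IDs, and school labels are placeholders.
ALL_BATTERIES = {"verbal", "quantitative", "nonverbal"}
LATE_TESTING_SCHOOLS = {"School C"}  # tested after the district-designated window

score_report = [
    {"student_id": "S1", "school": "School A", "batteries_completed": {"verbal", "quantitative", "nonverbal"}},
    {"student_id": "S2", "school": "School B", "batteries_completed": {"verbal", "nonverbal"}},
    {"student_id": "S3", "school": "School C", "batteries_completed": {"verbal", "quantitative", "nonverbal"}},
    {"student_id": "S4", "school": "School A", "batteries_completed": {"verbal", "quantitative", "nonverbal"}},
]
demographics = {
    "S1": {"grade": 3, "gender": "F", "free_reduced_lunch": "Yes"},
    "S2": {"grade": 4, "gender": "M", "free_reduced_lunch": "No"},
    "S3": {"grade": 5, "gender": "F", "free_reduced_lunch": "No"},
    "S4": {"grade": 3, "gender": None, "free_reduced_lunch": "No"},  # incomplete record
}

def build_analytic_file(scores, demo):
    """Merge scores with demographics on student ID and keep complete cases only."""
    analytic = []
    for rec in scores:
        person = demo.get(rec["student_id"])
        if person is None or any(value is None for value in person.values()):
            continue  # missing or incomplete demographic information
        if not ALL_BATTERIES <= rec["batteries_completed"]:
            continue  # did not complete all three batteries
        if rec["school"] in LATE_TESTING_SCHOOLS:
            continue  # tested outside the designated window
        # School names are not carried into the shared file (de-identification).
        analytic.append({"student_id": rec["student_id"], **person})
    return analytic

print(build_analytic_file(score_report, demographics))
```

With these placeholder records only S1 survives: S2 is missing a battery, S3 attended a late-testing school, and S4 has incomplete demographics.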

Analytic Framework

Rasch Modeling

For this study, we calibrated the student response data from each domain of Levels 9, 10, and 11 to the dichotomous Rasch model. The Rasch family of mathematical models is used to establish measures of latent constructs by placing items and persons onto a common, interval-level scale expressed in logits (i.e., log-odds units). The Rasch model can be used to calculate the probability that an individual will respond correctly to an item, given the characteristics of the item and the person. These item and person characteristics are called measures. The item measure reflects the item’s difficulty, whereas the person measure reflects the person’s amount of the latent construct. The Rasch model for dichotomous data can be expressed as

\log\left(\frac{P_{ni1}}{P_{ni0}}\right) = B_n - D_i,

where P_ni1 is the probability that person n correctly answers item i, P_ni0 is the probability that person n answers item i incorrectly, B_n is the measure for person n, and D_i is the measure for item i. For the purposes of this study, the person measures reflect the verbal, quantitative, or nonverbal abilities of students taking CogAT Form 8.
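Solving the log-odds equation for P_ni1 gives the logistic form P_ni1 = exp(B_n − D_i) / (1 + exp(B_n − D_i)). A minimal Python sketch of that calculation follows; the person measure of 0.93 logits is borrowed from the Level 9 verbal results purely as an illustrative input, and the function names are our own.

```python
import math

def rasch_probability(b_n: float, d_i: float) -> float:
    """P(correct) for person measure b_n and item measure d_i (dichotomous Rasch)."""
    # 1 / (1 + exp(-(B - D))) is algebraically identical to exp(B - D) / (1 + exp(B - D)).
    return 1.0 / (1.0 + math.exp(-(b_n - d_i)))

def log_odds(p: float) -> float:
    """Inverse of the logistic function: recovers B_n - D_i from P_ni1."""
    return math.log(p / (1.0 - p))

p = rasch_probability(0.93, 0.0)  # person 0.93 logits above an item at 0 logits
print(round(p, 3))                # 0.717
print(round(log_odds(p), 2))      # 0.93
```

When B_n equals D_i the probability is exactly .50, which is why the item measure can be read as the ability level at which a correct response is as likely as an incorrect one.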

The Rasch model requires several conditions for valid inference using the parameter estimates: (1) unidimensionality, (2) monotonicity (i.e., higher scores on the CogAT reflect higher latent abilities), and (3) item fit indices (i.e., infit, outfit) within acceptable boundaries. Several approaches can be taken to evaluate the unidimensionality condition. First, a principal components analysis can be conducted on the standardized residuals to determine whether the dimension extracted by the Rasch model explains more than 50% of the variance (Linacre, 2006). Second, the Martin-Löf test can be conducted to examine whether, after the items are split into two subsets according to whether their raw scores fall above or below the median, both subsets measure the same latent dimension. To evaluate monotonicity, item-to-total correlations are used to determine whether correct responses to an item are associated with higher scores on the rest of the subtest. The fit indices are the information-weighted fit index (i.e., infit) and the outlier-sensitive fit index (i.e., outfit). Outfit is sensitive to unexpected responses far from a person’s or item’s measure, and infit is sensitive to unexpected responses near a person’s or item’s measure (Bond et al., 2020). When the data fit the Rasch model, the expected value of item infit and outfit is 1.0, and values between 0.5 and 1.5 are considered productive for measurement (Linacre, 2006).
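The fit and monotonicity checks above have standard closed forms. For item i, with standardized residual z_ni = (x_ni − P_ni) / sqrt(P_ni(1 − P_ni)), outfit is the unweighted mean of z_ni² over persons, and infit is the sum of squared residuals divided by the sum of model variances. The Python sketch below computes both, plus an item-rest correlation, from a response matrix and measures that are assumed to be already estimated. The toy data and measures are invented, and this is not the eRm or WINSTEPS implementation.

```python
import math

def rasch_p(b, d):
    return 1.0 / (1.0 + math.exp(-(b - d)))

def item_fit(responses, person_measures, item_measures):
    """Return (infit, outfit) mean-squares per item; responses[n][i] is 0 or 1."""
    fits = []
    for i, d in enumerate(item_measures):
        sq_resid = variances = sq_std_resid = 0.0
        for n, b in enumerate(person_measures):
            p = rasch_p(b, d)
            w = p * (1.0 - p)               # model variance of x_ni
            r = responses[n][i] - p         # raw residual
            sq_resid += r * r
            variances += w
            sq_std_resid += r * r / w       # squared standardized residual
        infit = sq_resid / variances        # information-weighted
        outfit = sq_std_resid / len(person_measures)  # unweighted mean
        fits.append((infit, outfit))
    return fits

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_rest_correlation(responses, i):
    """Correlation of item i with the total score on the remaining items."""
    item = [row[i] for row in responses]
    rest = [sum(row) - row[i] for row in responses]
    return pearson(item, rest)

# Toy data: 6 persons x 4 items, with hypothetical measures in logits.
responses = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
             [1, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
person_measures = [2.0, 1.0, 0.0, 0.0, -1.0, -2.0]
item_measures = [-1.5, -0.5, 0.5, 1.5]

for i, (infit, outfit) in enumerate(item_fit(responses, person_measures, item_measures)):
    r = item_rest_correlation(responses, i)
    print(f"item {i}: infit={infit:.2f} outfit={outfit:.2f} item-rest r={r:.2f}")
```

Because outfit divides each squared residual by its own variance, a single surprising response from a person far from the item inflates outfit much more than infit, which matches the outlier-sensitive versus information-weighted distinction described above.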

The Rasch model provides reliability and separation estimates for both items and persons. The Rasch reliability index is comparable to the alpha coefficient popularized by Cronbach (1951) but conceptually developed earlier by Kuder and Richardson (1937) and Hoyt (1941). When applied to persons, the Rasch reliability index can also be interpreted as the replicability of the ordering of persons if a parallel set of items could be administered (Wright & Masters, 1982). The person separation index indicates how well the items distinguish among persons, and the item separation index indicates how well the persons distinguish among items. Higher values of each index indicate stronger reliability-related measurement quality.
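The reliability (R) and separation (G) indices are two expressions of the same ratio of true to error variance: G = sqrt(R / (1 − R)) and, equivalently, R = G² / (1 + G²). The Python sketch below applies these identities and checks them against the person and item values reported in Table 2; small differences reflect rounding of the published values.

```python
import math

def separation_from_reliability(r: float) -> float:
    return math.sqrt(r / (1.0 - r))

def reliability_from_separation(g: float) -> float:
    return g * g / (1.0 + g * g)

# (reliability, separation) pairs as reported in Table 2, Grades 3, 4, 5 in order.
table2_person = [(0.91, 3.11), (0.90, 3.01), (0.87, 2.60),
                 (0.89, 2.91), (0.88, 2.76), (0.87, 2.58),
                 (0.88, 2.70), (0.91, 3.16), (0.87, 2.58)]
table2_item = [(0.99, 10.60), (0.99, 10.26), (0.99, 10.35),
               (0.99, 10.70), (0.99, 10.28), (0.99, 10.06),
               (0.99, 9.79), (0.98, 7.85), (0.99, 9.26)]

for rel, sep in table2_person + table2_item:
    implied = reliability_from_separation(sep)
    print(f"G = {sep:5.2f} -> implied R = {implied:.3f} (reported {rel:.2f})")
```

Because R approaches 1 quickly as G grows, item reliabilities of 0.98 to 0.99 in Table 2 span item separations from 7.85 to 10.70, so the separation index is the more discriminating summary when reliabilities are near ceiling.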

An additional useful piece of output from the Rasch model is the person-item map, or Wright map, which plots persons and items along the same dimension to show the degree of overlap between the person and item measure distributions (Wilson, 2023; Wright & Stone, 1979). Because Rasch models place persons and items on a common scale, the person-item map is a graphical representation of this feature. Although the expected alignment between distributions depends on the theoretical purpose of the instrument or test, close alignment between distributions is commonly desired in many testing and measurement contexts. For more details on the Rasch measurement model, see Schumaker (2004) and Smith and Smith (2004).
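As a rough illustration of how a person-item map is assembled, the Python sketch below bins person and item measures on a shared logit axis and prints a count of persons (as #) on the left and item labels on the right, highest logits first. The measures, item labels, and bin settings are invented; the maps reported in this study were produced with WINSTEPS, whose layout differs.

```python
def wright_map(person_measures, item_measures, lo=-3.0, hi=3.0, step=0.5):
    """Return text rows of a simple person-item map, highest logits first.

    Measures outside [lo, hi) are not shown.
    """
    rows = []
    n_bins = int(round((hi - lo) / step))
    for k in reversed(range(n_bins)):
        bottom = lo + k * step
        top = bottom + step
        persons = sum(bottom <= b < top for b in person_measures)
        items = [name for name, d in item_measures.items() if bottom <= d < top]
        rows.append(f"{bottom:+5.1f} | {'#' * persons:<12}| {' '.join(items)}")
    return rows

# Hypothetical measures: persons sit mostly above the items, as in a referred sample.
persons = [-0.4, 0.3, 0.6, 0.9, 1.0, 1.2, 1.4, 1.8, 2.1]
items = {"I1": -1.6, "I2": -0.7, "I3": 0.0, "I4": 0.8, "I5": 1.5}

for row in wright_map(persons, items):
    print(row)
```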

Data Analysis

Prior to interpreting the Rasch model output, we collected evidence related to the conditions for valid inference (i.e., assumptions) and present it alongside the output in the next section. For the model calibrations and most condition checking, we used the eRm package (Mair & Hatzinger, 2007) in R (v. 4.4.2; R Core Team, 2024). We used WINSTEPS (v. 3.63; Linacre, 2006) for additional dimensionality evaluation (i.e., principal components analysis of the standardized residuals), for generating the person-item maps, and for estimating the reliability and separation indices for each domain. We analyzed records for which test scores could be generated; that is, we mirrored the scoring routines of the operational CogAT but used only records in which every item was attempted.
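The calibrations in this study were run in eRm, which estimates item parameters by conditional maximum likelihood, and in WINSTEPS. Purely to illustrate what calibrating dichotomous responses to the Rasch model involves, the Python sketch below implements a bare-bones joint maximum likelihood (JMLE) routine on simulated data. It is a simplified stand-in, not the eRm or WINSTEPS algorithm: it applies no bias correction, simply drops persons with zero or perfect scores, and assumes no item is answered correctly (or incorrectly) by everyone.

```python
import math
import random

def rasch_p(b, d):
    return 1.0 / (1.0 + math.exp(-(b - d)))

def jmle(responses, max_iter=200, tol=1e-6):
    """Bare-bones JMLE for a complete 0/1 matrix (persons x items)."""
    n_items = len(responses[0])
    # Zero and perfect scores have infinite ML measures, so they are dropped.
    data = [row for row in responses if 0 < sum(row) < n_items]
    person_scores = [sum(row) for row in data]
    item_scores = [sum(row[i] for row in data) for i in range(n_items)]
    n_persons = len(data)
    # Start values: log-odds of the raw proportions.
    b = [math.log(r / (n_items - r)) for r in person_scores]
    d = [math.log((n_persons - s) / s) for s in item_scores]
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n_items):  # Newton step so expected item score matches observed
            probs = [rasch_p(b[n], d[i]) for n in range(n_persons)]
            info = sum(p * (1 - p) for p in probs)
            step = max(-1.0, min(1.0, (sum(probs) - item_scores[i]) / info))
            d[i] += step
            max_change = max(max_change, abs(step))
        mean_d = sum(d) / n_items
        d = [x - mean_d for x in d]  # identify the scale: item mean = 0 logits
        for n in range(n_persons):  # Newton step so expected person score matches observed
            probs = [rasch_p(b[n], d[i]) for i in range(n_items)]
            info = sum(p * (1 - p) for p in probs)
            step = max(-1.0, min(1.0, (person_scores[n] - sum(probs)) / info))
            b[n] += step
            max_change = max(max_change, abs(step))
        if max_change < tol:
            break
    return b, d

# Simulated data: persons centered above the items, loosely mimicking a referred sample.
random.seed(1)
true_d = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
true_b = [random.gauss(1.0, 1.0) for _ in range(500)]
sim = [[int(random.random() < rasch_p(b, d)) for d in true_d] for b in true_b]
_, est_d = jmle(sim)
print([round(x, 2) for x in est_d])
```

The recovered item measures track the generating values closely, although JMLE estimates are known to be somewhat spread out relative to the truth when there are few items, which is one reason conditional estimation (as in eRm) is often preferred.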

Results

The following subsections are deliberately all but identical in content, structure, and language to ease reading and maintain parallel structure. The full output from all analyses, including classical test theory-based item analyses not presented in this paper, can be found in the accompanying supplemental materials. The analytic sample size for each domain at each level is reported within the corresponding subsection below. The demographic characteristics of each analytic subsample are also available in the supplemental materials. A summary of the fit information for each model is provided in Table 2.

Table 2.

Summary of Rasch Model Results for Verbal, Quantitative, and Nonverbal Domains across Grades

Domain	Variance explained by measures (%)	Item-to-total correlation range (median)	Infit MNSQ mean (SD)	Outfit MNSQ mean (SD)	Person reliability	Person separation	Item reliability	Item separation
Grade 3
Verbal	57.8	0.15–0.59 (0.44)	0.97 (0.13)	0.99 (0.30)	0.91	3.11	0.99	10.60
Quantitative	74.8	0.01–0.67 (0.48)	0.97 (0.12)	1.00 (0.52)	0.90	3.01	0.99	10.26
Nonverbal	62.2	0.12–0.60 (0.39)	0.98 (0.11)	0.95 (0.26)	0.87	2.60	0.99	10.35
Grade 4
Verbal	62.6	0.04–0.62 (0.43)	0.96 (0.14)	1.01 (0.38)	0.89	2.91	0.99	10.70
Quantitative	80.5	0.11–0.57 (0.40)	0.96 (0.11)	1.03 (0.54)	0.88	2.76	0.99	10.28
Nonverbal	60.0	−0.07 to 0.53 (0.36)	0.98 (0.08)	0.95 (0.22)	0.87	2.58	0.99	10.06
Grade 5
Verbal	66.1	0.02–0.66 (0.37)	0.97 (0.12)	0.96 (0.30)	0.88	2.70	0.99	9.79
Quantitative	81.5	0.11–0.68 (0.44)	0.97 (0.17)	1.06 (0.45)	0.91	3.16	0.98	7.85
Nonverbal	59.5	0.03–0.55 (0.34)	0.97 (0.09)	0.97 (0.27)	0.87	2.58	0.99	9.26

Common Findings Across Levels and Domains

Generally, there was adequate to good model-data fit for each domain according to the Rasch-based dimensionality analyses and the item infit and outfit statistics, which are specific to Rasch models. Several misfitting items had high outfit statistics, which indicated unexpected responses when students and items were far apart on the scale (i.e., higher-scoring students answering easier items incorrectly or lower-scoring students answering harder items correctly). The misfitting items also tended to be the most difficult items on the subtests, which suggests that even the students with the highest scores tended to answer these items incorrectly. A few items had low outfit statistics, which indicates model overfit: higher-scoring students made almost no mistakes on these items, and lower-scoring students almost never answered the difficult ones correctly (e.g., by guessing), although the model expects both to occur occasionally. At the level of the whole subtest, these two types of item misfit likely offset each other. The CogAT data showed good person separation (≥ 2.58) and reliability (≥ 0.87) and excellent item separation (≥ 7.85) and reliability (≥ 0.98). Person separation was very slightly lower for Level 10 (Grade 4), and item separation was very slightly lower for Level 11 (Grade 5). Finally, the scale was identified by setting the distribution of item measures, arbitrarily, to be standard normal. The person measure distribution was centered about one standard deviation above the mean of the item measure distribution, which indicates that a student at the mean of the person distribution had a probability greater than .50 of correctly answering an item of average difficulty. The person distribution being high relative to the item distribution was expected because all students had been referred for potential identification for gifted education services.
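The statement about the average student can be checked directly from the Rasch equation: with the item mean fixed at 0 logits, a person at the mean of the person distribution answers an item of average difficulty correctly with probability 1 / (1 + exp(−mean)). The Python sketch below applies this to the person-measure means reported in the subsections that follow. Note that this is the probability for a person at the mean facing an item at the mean, not the average probability across all persons and items.

```python
import math

# Person-measure means (logits) reported in the Results subsections; item mean = 0.
person_means = {
    "Level 9 verbal": 0.93,
    "Level 9 quantitative": 1.24,
    "Level 9 nonverbal": 1.13,
    "Level 10 nonverbal": 0.91,
    "Level 11 verbal": 1.05,
    "Level 11 quantitative": 0.80,
}

def p_mean_person_mean_item(mean_logit: float, item_mean: float = 0.0) -> float:
    """Rasch probability that a person at mean_logit answers an item at item_mean correctly."""
    return 1.0 / (1.0 + math.exp(-(mean_logit - item_mean)))

for label, m in person_means.items():
    print(f"{label}: P = {p_mean_person_mean_item(m):.2f}")
```

For these inputs the probabilities range from about .69 to .78, all comfortably above .50.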

Grade 3 – Level 9

Verbal

The analytic sample for the Level 9 verbal domain was 563 students. The dimensionality evaluation showed that the dimension extracted by the Rasch model accounted for 57.8% of the total variability, which exceeds the recommended 50% criterion (Linacre, 2006). No other extracted contrast had unexplained variance greater than 4.2%. The Martin-Löf test supported the unidimensionality condition (χ² = 558, p = .99). The item-to-total correlations ranged from 0.15 to 0.59, with a median of 0.44, which suggests that correct item responses were associated with higher scores on the rest of the subtest. Most fit indices were within acceptable boundaries, although several item outfit indices suggested model-data misfit. The items with the most extreme misfit also tended to be the easiest or most difficult items on the subtest. The means (SD) of the infit and outfit distributions were 0.97 (0.13) and 0.99 (0.30), respectively.

For the verbal domain, the person reliability and separation estimates were 0.91 and 3.11, and the item reliability and separation estimates were 0.99 and 10.60. Given the focus of these analyses, these estimates strongly support the use of the CogAT. The item distribution was scaled (arbitrarily) to be standard normal, which served as a reference distribution, and the distribution of the person measures had a mean of 0.93 (SD = 1.14, sk = −0.49). The distributions are shown in the person-item map in Figure 1(A). Thus, the students who took the Level 9 verbal subtest were, on average, high in ability relative to the items, with the average person measure located about one standard deviation above the mean of the item measure distribution. The item characteristic curves for 10 randomly selected items from the verbal subtest are provided in Figure 2.

Figure 1.

Person-Item Map of CogAT Form 8 Level 9 (Grade 3)

Figure 2.

Item Characteristic Curves from Verbal Subtest of CogAT Form 8 Level 9

Quantitative

The analytic sample for the Level 9 quantitative domain was 479 students. The dimensionality evaluation indicated a single dominant dimension accounting for 74.8% of the total variability. No secondary contrast exceeded 5.0% in unexplained variance. The Martin-Löf test supported the unidimensionality condition (χ² = 460.00, p = .99). Item-to-total correlations ranged from 0.01 to 0.67, with a median of 0.48. Fit indices demonstrated acceptable fit, with mean (SD) infit and outfit values of 0.97 (0.12) and 1.00 (0.52), respectively. Misfit occurred primarily in extremely easy or difficult items. Person reliability and separation estimates were 0.90 and 3.01, respectively; item reliability and separation were 0.99 and 10.26, respectively.

The person-item map revealed that, on average, student abilities extended beyond the range of item difficulties, with a person measure mean of 1.24 (SD = 1.37, sk = 0.11) (see Figure 1(B)). This limited overlap between the person and item distributions indicates possible ceiling effects and a limited ability to distinguish among the highest-achieving students on this subtest.

Nonverbal

The analytic sample for the Level 9 nonverbal domain was 570 students. The dimensionality evaluation supported a single dimension, accounting for 62.2% of the total variability. No secondary contrast had unexplained variance greater than 4.3%. The unidimensionality condition was supported by the Martin-Löf test (χ² = 455.00, p = .99). Item-to-total correlations ranged from 0.12 to 0.60, with a median of 0.39. Fit indices were generally acceptable, with mean (SD) infit and outfit values of 0.98 (0.11) and 0.95 (0.26), respectively. Misfit was primarily observed among very easy or difficult items. Person reliability and separation were 0.87 and 2.60, respectively, and item reliability and separation were 0.99 and 10.35, respectively. The person-item map revealed that the distribution of student abilities was shifted slightly above the average item difficulty, with a person measure mean of 1.13 (SD = 1.10, sk = −0.13), suggesting that most students were able to perform well on the nonverbal items (see Figure 1(C)).

Grade 4 – Level 10

Verbal

The analytic sample for the Level 10 verbal domain was 565 students. The dimensionality evaluation showed that the extracted dimension accounted for 62.6% of the total variability. No other extracted contrast had unexplained variance greater than 3.7%. The Martin-Löf test also supported the unidimensionality condition (χ² = 465.10, p = .99). The item-to-total correlations ranged from 0.04 to 0.62, with a median of 0.43. Most fit indices were within acceptable boundaries, although several item outfit indices suggested model-data misfit. The means (SD) of the infit and outfit distributions were 0.96 (0.14) and 1.01 (0.38), respectively.

The person reliability and separation estimates were 0.89 and 2.91, and the item reliability and separation estimates were 0.99 and 10.70. On average, the students’ ability was high relative to the item difficulties, as indicated by the person distribution being located about one standard deviation above the mean of the item measure distribution.

Quantitative

The analytic sample for the Level 10 quantitative domain was 373 students. The primary extracted dimension accounted for 80.5% of the variance. No secondary contrast had unexplained variance exceeding 5.5%. The Martin-Löf test also supported the unidimensionality condition (χ² = 367.50, p = .99). Item-to-total correlations ranged from 0.11 to 0.57, with a median of 0.40. Most of the fit indices were acceptable; however, several outfit indices suggested slight model-data misfit. The infit and outfit distributions had means (SD) of 0.96 (0.11) and 1.03 (0.54), respectively. The person reliability and separation were 0.88 and 2.76, respectively, and the item reliability and separation were 0.99 and 10.28, respectively. Based on the person-item map, the person distribution was slightly higher, on average, than the item distribution.

Nonverbal

The analytic sample for the Form 8 Level 10 nonverbal domain was 547 students. The primary extracted dimension accounted for 60.0% of the total variability. No other extracted contrast had unexplained variance exceeding 3.8%. The Martin-Löf test also supported the unidimensionality condition (χ² = 456.90, p = .99). The item-to-total correlations ranged from −0.07 to 0.53, with a median of 0.36; the negative minimum indicates at least one item whose responses were not positively associated with scores on the rest of the subtest. Most fit indices were within acceptable ranges, although several item outfit indices indicated slight model-data misfit. The means (SD) of the infit and outfit distributions were 0.98 (0.08) and 0.95 (0.22), respectively.

The person reliability and separation estimates were 0.87 and 2.58, and item reliability and separation estimates were 0.99 and 10.06, respectively. The person-item map showed slight misalignment between the person and item distributions. The person measure distribution had a mean of 0.91 (SD = 0.95, sk = −0.11).

Grade 5 – Level 11

Verbal

The analytic sample for the Form 8 Level 11 verbal domain was 391 students. The dimension extracted by the model accounted for 66.1% of the total variability. No other extracted contrast had unexplained variance greater than 4.2%. The Martin-Löf test supported the unidimensionality condition ($χ^{2}$ = 433.62, p = .99). The item-to-total correlations ranged from 0.02 to 0.66, with a median of 0.37. Most fit indices were within acceptable boundaries, although several item outfit indices suggested model-data misfit. The means (SD) of the infit and outfit distributions were 0.97 (0.12) and 0.96 (0.30), respectively.

The person reliability and separation estimates were 0.88 and 2.7, and item reliability and separation estimates were 0.99 and 9.79. The person-item map showed slight misalignment between the person and item distributions. The person measure distribution had a mean of 1.05 (SD = 1.01, sk = −0.45).
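The dimensionality results in each section appear to come from a principal components analysis of standardized Rasch residuals, as reported by Winsteps. These results are the variance explained by the Rasch dimension and the unexplained variance in the first contrast. The Python sketch below illustrates only the core idea on simulated data and does not reproduce that output. It generates responses that fit a unidimensional Rasch model and computes standardized residuals from the generating (true) parameters rather than estimated ones. It then finds the largest eigenvalue of the residual correlation matrix by power iteration. A commonly cited rule of thumb treats a first-contrast eigenvalue below about 2 (in item units) as showing no meaningful secondary dimension. All sizes, the seed, and the function names are hypothetical.

```python
import math
import random
from typing import List

def rasch_p(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate(n_persons: int, bs: List[float], seed: int = 1):
    """Simulate unidimensional Rasch data (hypothetical sizes and distributions)."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.5, 1.0) for _ in range(n_persons)]
    x = [[1 if rng.random() < rasch_p(t, b) else 0 for b in bs] for t in thetas]
    return thetas, x

def residual_correlations(x, thetas, bs) -> List[List[float]]:
    """Items-by-items correlation matrix of standardized residuals (x - P) / sqrt(P(1 - P))."""
    z = []
    for row, t in zip(x, thetas):
        ps = [rasch_p(t, b) for b in bs]
        z.append([(xi - p) / math.sqrt(p * (1 - p)) for xi, p in zip(row, ps)])
    n, k = len(z), len(bs)
    std = []
    for j in range(k):
        col = [row[j] for row in z]
        m = sum(col) / n
        s = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        std.append([(v - m) / s for v in col])
    return [[sum(a * c for a, c in zip(std[i], std[j])) / n for j in range(k)] for i in range(k)]

def largest_eigenvalue(m: List[List[float]], iters: int = 500) -> float:
    """Power iteration; valid here because a correlation matrix is positive semidefinite."""
    k = len(m)
    v = [1.0 / math.sqrt(k)] * k
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        lam = math.sqrt(sum(a * a for a in w))
        v = [a / lam for a in w]
    return lam

bs = [-2.0 + 4.0 * i / 19 for i in range(20)]   # 20 items from -2 to +2 logits
thetas, x = simulate(400, bs)
corr = residual_correlations(x, thetas, bs)
first_contrast = largest_eigenvalue(corr)
print(f"first-contrast eigenvalue: {first_contrast:.2f} (of {len(bs)} items)")
```

Because the simulated data are unidimensional by construction, the residual correlations are close to zero. The first contrast therefore stays near the level expected from sampling noise alone.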

Quantitative

The analytic sample for the Form 8 Level 11 quantitative domain was 232 students. The primary extracted dimension accounted for 81.5% of the total variability. No other contrast had unexplained variance greater than 5.4%. Furthermore, the Martin-Löf test supported the unidimensionality condition ($χ^{2}$ = 246.86, p = .99). The item-to-total correlations ranged from 0.11 to 0.68, with a median of 0.44. Most fit indices were within acceptable boundaries, although several item outfit indices suggested model-data misfit. The means (SD) of the infit and outfit distributions were 0.97 (0.17) and 1.06 (0.45), respectively.

The person reliability and separation estimates were 0.91 and 3.16, and item reliability and separation estimates were 0.98 and 7.85. The person-item map showed slight misalignment between the person and item distributions. The person measure distribution had a mean of 0.80 (SD = 1.34, sk = 0.33).

Nonverbal

The analytic sample for the Form 8 Level 11 nonverbal domain was 360 students. The extracted dimension accounted for 59.5% of the total variability. No other extracted contrast had unexplained variance of more than 4.2%. Furthermore, the Martin-Löf test supported the unidimensionality condition ($χ^{2}$ = 444.70, p = .99). The item-to-total correlations ranged from 0.03 to 0.55, with a median of 0.34. Most fit indices were well within acceptable boundaries, although several item outfit indices suggested model-data misfit. The means (SD) of the infit and outfit distributions were 0.97 (0.09) and 0.97 (0.27), respectively.

The person reliability and separation estimates were 0.87 and 2.58, and the item reliability and separation estimates were 0.99 and 9.26. The person-item map showed slight misalignment between the person and item distributions. The person measure distribution had a mean of 0.83 (SD = 0.94, sk = −0.24).

Discussion

The purpose of this study was to evaluate the psychometric properties of CogAT Form 8 Levels 9–11 using the Rasch model, with the aim of providing additional validity and reliability evidence to support the test’s use. Given the history and popularity of the CogAT, it is unsurprising that considerable psychometric evidence is available. That said, there appears to be a lack of independent psychometric research on Form 8 of the CogAT, a gap we sought to address. The results of our analysis provide strong evidence supporting the construct validity and overall utility of the instrument for assessing cognitive ability in this population.

The Rasch family of models offers numerous features that were useful for these investigations, including fit indices, person-item maps, reliability estimates, and item and person measure estimates. The Rasch calibrations for each domain indicated that item difficulty measures were slightly lower, on average, than the person measure distributions. This was expected, given that the sample consisted of students referred for testing as part of the district’s gifted education program. The person-item map for each calibration demonstrated appropriate targeting, with test items spanning the range of ability levels observed in the sample. This suggests that the instrument can distinguish among students with varying levels of cognitive ability, which is a critical characteristic for a test like the CogAT.
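The dichotomous Rasch model itself makes the targeting argument concrete: $P(\text{correct}) = \exp(θ - b)/(1 + \exp(θ - b))$, where $θ$ is a person measure and $b$ an item difficulty in logits. The Python sketch below is illustrative only. It assumes an item located at the item mean (0 logits, the usual centering convention). It then evaluates students at roughly −1, 0, and +1 person SD around the Level 10 nonverbal person mean (0.91, SD = 0.95). It also reports item information, $P(1 - P)$, which peaks at 0.25 when $θ = b$. The further the person distribution sits above the items, the less information those items provide about the most able students.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model (theta, b in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at theta: P(1 - P), maximal (0.25) at theta = b."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

item_b = 0.0  # item at the item-mean difficulty (assumed centering)
for theta in (-0.04, 0.91, 1.86):  # person mean -1 SD, mean, +1 SD
    print(f"theta={theta:+.2f}  P={rasch_p(theta, item_b):.2f}  "
          f"info={item_information(theta, item_b):.3f}")
```

The success probabilities come out at roughly 0.49, 0.71, and 0.87. An average student in this referred sample therefore answers an average-difficulty item correctly about 71% of the time.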

Item fit statistics (infit and outfit mean square values) for most items across domains and levels were within acceptable ranges. This indicates that the responses conformed well to the expectations of the Rasch model and that each item contributed meaningfully to the measurement of its respective domain. Where some degree of item misfit was observed, it was related to unexpected responses from students whose measures were far from the item’s difficulty, the pattern to which the outfit statistic is most sensitive. Across items within domains and levels, the overfit (i.e., outfit < 0.5) and underfit (i.e., outfit > 1.5) likely offset each other in terms of overall measurement quality.
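The screening rule described above can be written down directly. The Python sketch below bins items by outfit mean square using the 0.5 and 1.5 bounds named in the text. The item labels and values are hypothetical. The sketch also shows why the claim that overfit and underfit offset each other is best read at the level of the whole test. In this example, one overfitting item and one underfitting item leave the mean outfit close to 1.0, even though both items are still flagged individually.

```python
from statistics import mean
from typing import Dict, List

def classify_outfit(outfit: Dict[str, float], low: float = 0.5,
                    high: float = 1.5) -> Dict[str, List[str]]:
    """Bin items into overfit (< low), acceptable, and underfit (> high) by outfit MNSQ."""
    bins: Dict[str, List[str]] = {"overfit": [], "acceptable": [], "underfit": []}
    for item, mnsq in outfit.items():
        if mnsq < low:
            bins["overfit"].append(item)
        elif mnsq > high:
            bins["underfit"].append(item)
        else:
            bins["acceptable"].append(item)
    return bins

outfit = {"item_a": 0.42, "item_b": 0.97, "item_c": 1.08, "item_d": 1.73}  # hypothetical values
print(classify_outfit(outfit))
print(f"mean outfit = {mean(outfit.values()):.2f}")
```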

The Rasch calibrations yielded satisfactory person and item reliability indices, suggesting that the instrument has the precision necessary to support reliable inferences about student ability across domains and levels. The separation indices across the calibrations are of particular importance. The person and item separation estimates were approximately 3 and 10, respectively, across calibrations. The person separation estimate suggests the persons are well separated by the items, which supports the use of the items, because separating persons across the latent dimension is a priority of testing programs like CogAT. The item separation strongly suggests that the items are well separated by the people, which also supports the appropriateness of the items among this diverse population of students.
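One way to read separation values of about 3 and 10 is through the number of statistically distinct strata, $H = (4G + 1)/3$. This heuristic is widely used in the Rasch literature but is not reported by the authors. The Python sketch below applies it to the approximate separation values given in the text. Under this heuristic, person separation near 3 implies about 4.3 distinguishable ability levels, and item separation near 10 implies about 13.7 difficulty levels.

```python
def strata(separation: float) -> float:
    """Number of statistically distinct levels implied by a separation index G: (4G + 1) / 3."""
    if separation < 0:
        raise ValueError("separation must be non-negative")
    return (4.0 * separation + 1.0) / 3.0

for label, g in (("person", 3.0), ("item", 10.0)):
    print(f"{label}: G = {g:.1f} -> about {strata(g):.1f} strata")
```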

Overall, the results of our Rasch analyses support the continued but cautious use of the CogAT among students in grades 3–5 in diverse student populations similar to the one in our study. Our analyses yielded extensive output and model-data fit information, the vast majority of which can reasonably be interpreted as providing strong validity and reliability evidence for the CogAT. Despite the strength of this evidence, caution is warranted for three reasons: we observed some item misfit, responses came from a referred sample and may therefore show ceiling effects, and the sample's geographic restriction may limit generalizability.

Limitations and Future Research

As with any study, ours has limitations. The primary limitation concerns the generalizability of our findings beyond our sample. As noted in the Method section, only complete records are scorable under the CogAT scoring rules, which means that dozens of students were excluded from our analyses. The demographic characteristics of completers and non-completers were not identical. This may limit the generalizability of our findings to the extent that these demographic differences, and the contextual or cultural variables they represent, are not accounted for. Additional research is needed to evaluate these potential contributors to CogAT performance.

Additionally, although our sample had considerable variability across an array of demographic variables and was drawn from 71 schools, it was geographically restricted to one large urban school district in one state. Given the demographic diversity of our sample, we are optimistic that our findings will be relevant to other diverse urban settings, but we cannot make any definitive claims along these lines. We believe that (1) Rasch analyses offer significant benefits and (2) replicating the analyses with a more geographically diverse sample, while maintaining demographic representation, would provide important additional validity evidence. Although each analysis included several hundred student responses across three grade levels, replication with larger samples would increase confidence in the results and improve the precision of the parameter estimates. Replication across additional grade levels, ability levels, settings, and locations is also recommended.

A second area for future research is examining potential differential item functioning (DIF) across demographic variables, which can provide insights into fairness. If items show no evidence of DIF, then comparisons of CogAT performance across demographic groups become defensible and could be of substantive interest. Longitudinal investigations, including evaluation of longitudinal invariance, would also be of interest for continued validation of the CogAT.

As noted above, the responses collected and analyzed were from a referred sample. The student measure distribution was above the item measure distribution for all subtests, which raises the question of ceiling effects, a common concern in the assessment of gifted or high-ability individuals. However, for cognitive and psychological assessments that use a cutoff score for classification, ceiling (or floor) effects are rarely of primary concern. Individuals in the extreme tails of the distribution exceed (or fall below) any cutoff score. The distribution of scores around the cutoff is of much greater concern and consequence. Future research could examine classification accuracy at different cutoff points, or thresholds, in the distribution to determine the utility of the CogAT for making identification decisions at differing levels of cognitive ability. Such studies could also quantify the extent to which ceiling effects negatively affect identification.

This study was not intended to justify using the CogAT to increase the identification of traditionally underserved student populations in gifted education programs. Determining which students qualify for gifted education services involves many considerations beyond a single assessment, including local or state policy requirements, where applicable, and decisions about the assessments and score types used. Additional research incorporating students’ gifted education status would improve understanding of how well Rasch-based CogAT scores differentiate between students who qualify for gifted education services and students who do not. Comparing Rasch-based CogAT scores for twice-exceptional students, meaning those who qualify for gifted education and for other specialized programs such as bilingual education and special education, would also improve understanding of the benefits of using this assessment with diverse student populations.

Supplemental Material

Supplemental Material - Rasch Investigation of the CogAT Form 8 Verbal, Quantitative, and Nonverbal Domains in Grades 3 Through 5

Supplemental Material for Rasch Investigation of the CogAT Form 8 Verbal, Quantitative, and Nonverbal Domains in Grades 3 Through 5 by Debi K. Torres, Liesel A. Lutz, Jianwen Song, Fatih Ozkan, Tim Spitsberg, Tricia Filippini, Hiroki Matsuo, and Grant B. Morgan in Journal of Psychoeducational Assessment

Footnotes

ORCID iDs

Liesel A. Lutz

Fatih Ozkan

Tim Spitsberg

Grant B. Morgan

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

References

Ackerman, P. L. (2017). Review of the cognitive abilities test, form 7. In J. F. Carlson, K. F. Geisinger, & J. L. Jonson (Eds.), The twentieth mental measurements yearbook. Buros Center for Testing.

Bond, T. G., Yan, & Heene (2020). Applying the Rasch model: Fundamental measurement in the human sciences. Routledge.

Cao, T. H., Jung, J. Y., & Lee (2017). Assessment in gifted education: A review of the literature from 2005 to 2016. Journal of Advanced Academics, 28(3), 163–203. https://doi.org/10.1177/1932202x17714572

Carlson, J. F., Geisinger, K. F., & Jonson, J. L. (Eds.). (2017). The twentieth mental measurements yearbook. Buros Center for Testing.

Carman, C. A., Walther, C. A. P., & Bartsch, R. A. (2018). Using the cognitive abilities test (CogAT) 7 nonverbal battery to identify the gifted/talented: An investigation of demographic effects and norming plans. Gifted Child Quarterly, 62(2), 193–209. https://doi.org/10.1177/0016986217752097

Carman, C. A., Walther, C. A. P., & Bartsch, R. A. (2020). Differences in using the cognitive abilities test (CogAT) 7 nonverbal battery versus the Naglieri nonverbal ability test (NNAT) 2 to identify the gifted/talented. Gifted Child Quarterly, 64(3), 171–191. https://doi.org/10.1177/0016986220921164

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/bf02310555

Hoyt, C. J. (1941). Note on a simplified method of computing test reliability. Educational and Psychological Measurement, 1(1), 91–103. https://doi.org/10.1177/001316444100100109

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151–160. https://doi.org/10.1007/bf02288391

Lakin (2012). Assessing the cognitive abilities of culturally and linguistically diverse students: Predictive validity of verbal, quantitative, and nonverbal tests. Psychology in the Schools, 49(8), 756–768. https://doi.org/10.1002/pits.21630

Lakin & Driver (2017). CogAT: Introducing form 8. Riverside Insights, LLC. https://info.riversideinsights.com/hubfs/CogSpeakings/CognitivelySpeaking-IntroducingCogATForm8.pdf

Lee, Karakis, Olcay Akce, Azzam Tuzgen, Karami, Gentry, & Maeda (2021). A meta-analytic evaluation of Naglieri nonverbal ability test: Exploring its validity evidence and effectiveness in equitably identifying gifted students. Gifted Child Quarterly, 65(3), 199–219. https://doi.org/10.1177/0016986221997800

Linacre, J. M. (2006). WINSTEPS: Rasch measurement software. https://www.winsteps.com/winsteps.htm

Lohman, D. F. (2012). Cognitive abilities test form 7 research and development guide. Riverside Insights, LLC.

Lohman, D. F. (2015). Test review. Journal of Psychoeducational Assessment, 33(2), 188–192.

Lohman, D. F., Korb, K. A., & Lakin, J. M. (2008). Identifying academically gifted English-language learners using nonverbal tests: A comparison of the Raven, NNAT, and CogAT. Gifted Child Quarterly, 52(4), 275–296. https://doi.org/10.1177/0016986208321808

Mair & Hatzinger (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1–20. https://doi.org/10.18637/jss.v020.i09

Miller, M. D. (2017). Review of the cognitive abilities test, form 7. In J. F. Carlson, K. F. Geisinger, & J. L. Jonson (Eds.), The twentieth mental measurements yearbook. Buros Center for Testing.

Naglieri, J. A., & Ford, D. Y. (2015). Misconceptions about the Naglieri nonverbal ability test: A commentary of concerns and disagreements. Roeper Review, 37(4), 234–240. https://doi.org/10.1080/02783193.2015.1077497

Ozen, Pereira, Karatas, Castillo-Hermosilla, & Maeda (2025). A meta-analytic evaluation: Investigating evidence for the validity of the cognitive abilities test. Gifted Child Quarterly, 69(1), 3–15. https://doi.org/10.1177/00169862241285593

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

Schumacker, R. E. (2004). Rasch measurement: The dichotomous model. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 226–257). JAM Press.

Smith, E. V., & Smith, R. M. (2004). Introduction to Rasch measurement. JAM Press.

Wilson (2023). Constructing measures: An item response modeling approach (2nd ed.). Routledge.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. MESA Press.

Wright, B. D., & Stone, M. H. (1979). Best test design. MESA Press.
