Confirmatory Factor Analysis of the WAIS-IV and WMS-IV in Older Adults

Abstract

New editions of the Wechsler Adult Intelligence and Memory scales are now available. Yet, given the significant changes in these new releases and the skepticism that has met them, independent evidence on their psychometric properties is much needed but currently lacking. We administered the WAIS-IV and the Older Adult version of the WMS-IV to 145 older adults. We examined how closely our data matched the normative sample by comparing our scaled scores with those of the publisher and by evaluating interrelations among subtests using confirmatory factor analysis. Not surprisingly, scaled scores from our sample were somewhat higher than those from the normative sample on some tests. Factor analysis on our sample provided support for a higher-order model of the WAIS-IV/WMS-IV Older Adults battery combined. In addition, allowing some subtests to load on more than one factor significantly improved model fit. The best fitting model for our sample was also the best for the normative sample. Overall, the data suggest that the factor analysis models generated from the normative samples for the new WAIS-IV and WMS-IV are reliable.

Keywords

memory, intelligence, neuropsychological tests, normative data, confirmatory factor analysis

The Wechsler Adult Intelligence and Memory scales are among the instruments most commonly used by neuropsychologists (Butler, Retzlaff, & Vanderploeg, 1991; Rabin, Barr, & Burton, 2005; Sullivan & Bowden, 1997) and have been considered by many to be the gold standard (Hartman, 2009; Stanos, 2004). Recently, new editions of both tests have been released: the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008) and the Wechsler Memory Scale, Fourth Edition (WMS-IV; Wechsler, 2009). These new versions were deemed necessary to improve the match with the psychological constructs they are purported to measure and to provide updated norms. Yet, given the significant changes in these new releases and the questions that have met them (e.g., Loring & Bauer, 2010), independent evidence on their psychometric properties is much needed but currently lacking. Here, we present data from 145 older adults who completed the WAIS-IV and the Older Adult version of the WMS-IV. We examined how closely our data matched the normative sample by comparing our scaled scores with those of the publisher and by evaluating interrelations among subtests in our data using covariances and factor analyses. We then examined the factor structure of the WAIS-IV and the WMS-IV Older Adult battery in the normative sample. In addition, we tested for measurement invariance in the covariance structure across the two samples.

Construction of the WAIS and WMS

All versions of the WAIS and WMS combine scores from multiple subtests into factors or indexes, with the goals of improving reliability and validity and of providing an interpretative framework for the observed measures. Although the WAIS and WMS have practical (but not atheoretical; see Coalson, Raiford, Saklofske, & Weiss, 2010; Kaufman, 2010) origins, their evolution has been influenced not only by factor analyses of previous versions, but also by current theories of intelligence, cognition, and neuropsychology. The WAIS-IV subtests are similar to those in the WAIS-III, with two core additions: visual puzzles (included in the perceptual reasoning index) and digit span sequencing (included in the working memory index; Wechsler, 2008).

Compared with its previous version, the WMS-IV contains several major changes, including a new visual designs subtest and two new working memory subtests (spatial addition and symbol span). In addition, an abbreviated battery is now recommended for older adults (age 65 to 90). For a schematic presentation of the evolution of the two batteries, see Table 1.

Table 1.

Evolution of Factors in the Various Versions of the WAIS and WMS.

WAIS	WAIS-R	WAIS-III	WAIS-IV
(1955)	(1981)	(1997)	(2008)
FSIQ	FSIQ	FSIQ	FSIQ
Verbal scale	VIQ	VIQ	VCI
• Information	• Information	VCI	• Similarities
• Comprehension	• Comprehension	• Vocabulary	• Vocabulary
• Arithmetic	• Arithmetic	• Similarities	• Information
• Similarities	• Digit span	• Information	PRI
• Digit span	• Similarities	WMI	• Block design
• Vocabulary	• Vocabulary	• Arithmetic	• Matrix reasoning
Performance scale	PIQ	• Digit span	• Visual puzzles
• Digit symbol	• Picture arrangement	• Letter number sequencing	WMI
• Picture completion	• Picture completion	PIQ	• Arithmetic
• Block design	• Block Design	POI	• Digit span
• Picture arrangement	• Object Assembly	• Picture completion	PSI
• Object assembly	• Digit symbol	• Block design	• Coding
		• Matrix reasoning	• Symbol search
		PSI
		• Digit symbol coding
		• Symbol search
WMS	WMS-R	WMS-III	WMS-IV
(1945)	(1987)	(1997)	(2009)
Memory scale score	General memory composite	Immediate memory	Auditory memory
• Information	• Figural memory	Auditory immediate	• Logical memory I and II
• Orientation	• Logical memory I	• Logical Memory I	• Verbal paired associates I and II
• Mental control	• Visual paired associates I	• Verbal Paired Associates I	Visual memory
• Logical memory	• Verbal paired associates I	Visual Immediate	• Visual reproduction I and II
• Digit span (forward and backward)	• Visual reproduction I	• Faces I	• Designs I and II (adult battery only)
• Visual reproduction	Attention/concentration composite	• Family pictures I	Visual working memory (adult battery only)
• Associate learning	• Mental control	General memory	• Spatial addition (adult battery only)
	• Digit span	Auditory (delayed)	• Symbol span
	• Visual memory span	• Logical memory II	• Logical memory I
	Verbal memory	• Verbal paired associates II	• Verbal paired associates I
	• Logical memory	Auditory Recognition Delayed	• Visual reproduction I
	• Verbal paired associates	• Logical Memory II Recognition	Delayed memory
	Visual memory	• Verbal paired associates II	• Logical memory II
	• Figural memory	Visual (Delayed)	• Verbal paired associates II
	• Visual paired associates	• Faces II	• Visual reproduction II
	• Visual reproduction	• Family pictures II
	Delayed recall index	Working memory
	• Logical memory II	Auditory
	• Visual paired associates II	• Letter-number sequencing
	• Verbal paired associates II	Visual
	• Visual reproduction II	• Spatial span

Note. FSIQ = full scale IQ; PIQ = performance IQ; VIQ = verbal IQ; PSI = processing speed index; VCI = verbal comprehension index; PRI = perceptual reasoning index; POI = perceptual organization index; WMI = working memory index.

One recent strategy has been to examine the WAIS and WMS simultaneously, in part because the newer versions have been co-normed. Such studies of previous editions of the WAIS and WMS have found evidence for five- and/or six-factor models. For example, Bowden and coworkers (Bowden, Carstairs, & Shores, 1999; Bowden et al., 2001) advanced a five-factor model for the WAIS-R and WMS-R consisting of verbal comprehension, perceptual organization, attention-concentration/working memory, verbal memory, and visual memory. Tulsky and Price (2003) proposed a similar model for the third edition of the tests, except that they added a processing speed factor. Allowing some subtests to load on more than one factor significantly improved model fit.

Research on the WAIS-IV and WMS-IV

To date, only one study has examined the WAIS-IV and WMS-IV together. Holdnack, Xiaobin, Larrabee, Millis, and Salthouse (2011) found support for a higher-order six-factor model with a second-order general ability factor and first-order verbal comprehension (consisting of the vocabulary, similarities, and information subtests), perceptual reasoning (block design, visual puzzles, and matrix reasoning subtests), working memory (digit span, arithmetic, spatial addition, and symbol span subtests), processing speed (coding and symbol search subtests), and memory (logical memory 2, verbal paired associates 2, designs 2, and visual reproduction 2 subtests) factors. Allowing the arithmetic, symbol span, logical memory 2, and visual reproduction 2 subtests to load on more than one factor significantly improved model fit.

We built on Holdnack et al.’s (2011) report in two ways. First, they used the publisher’s normative dataset, as have all the other extant reports on WAIS-IV and/or WMS-IV (Benson, Hulac, & Kranzler, 2010; Bowden, Saklofske, & Weiss, 2011; Brooks, Holdnack, & Iverson, 2011; Canivez & Watkins, 2010; Drozdick & Cullum, 2011; Hoelzle, Nelson, & Smith, 2011; Salthouse & Saklofske, 2010). Thus, to date, the WAIS-IV and WMS-IV have not been evaluated in an independent sample. Second, Holdnack et al. excluded participants who were over the age of 65, because they completed the WMS-IV Older Adult battery only. Thus, to date, no study has evaluated the factor structure of the WAIS-IV and WMS-IV in older adults.

Method

Participants

The study presented here was approved by the Ethics Committee of the University of Ottawa. One hundred and forty-five (94 females: 65%) community-dwelling people between 65 and 92 years of age (mean = 73.17 years, SD = 6.50) were recruited from diverse socioeconomic backgrounds, using advertisements in two free magazines for seniors and flyers in community centers and subsidized housing buildings. Participants’ education ranged from 7 to 22 years (mean = 13.96 years, SD = 2.83); 2.1% of participants had Grade 8 or less, 13.8% had between Grade 9 and Grade 12, 33.1% had a high school diploma, 17.9% had some college or university, and 33.1% had a bachelor’s, graduate, or professional degree. The exclusion criteria were age younger than 65, lack of proficiency in English, diabetes, brain disease, chronic hepatitis, and presence of mental health problems such as anxiety and depression. Participants were compensated CAN$100. In the sample, 87.6% were Caucasian, 0.7% African American, 3.4% Asian, 4.8% South Asian, 0.7% Hispanic, and 2.8% were from a mixed background. Sixty-six percent of the sample reported experiencing memory problems.

The publisher’s normative sample consisted of 286 participants who completed both the WAIS-IV and the WMS-IV Older Adults battery. The mean age of participants in this subset of the normative sample was 78.78 years (SD = 6.91). In this sample, 17% of people had Grade 8 or less, 13% had between Grade 9 and Grade 12, 38% had a high school diploma, 19% had some college or university, and 13% had a bachelor’s, graduate, or professional degree.¹

Measures

Wechsler Adult Intelligence Scale Fourth Edition

The 10 core subtests yield four index scores (verbal comprehension, perceptual reasoning, working memory, and processing speed), as well as Full-Scale IQ. The WAIS-IV was normed on 2,200 people aged 16 to 90 years old, 600 of whom were over the age of 65 (mean age of 75.68 years, SD = 7.68). In that sample, 14% of people had Grade 8 or less, 12% had between Grade 9 and Grade 12, 35% had a high school diploma, 20% had some college or some university education, and 19% had a bachelor’s, graduate, or professional degree. Full Scale IQ construct validity was assessed by the publisher using a number of other cognitive measures including the WAIS-III (r = 0.94) and the subtests of the WMS-III (rs range from r = 0.34 to r = 0.69). For people 65 years of age and older, reliability coefficients for the WAIS-IV subtests range from r = 0.78 to r = 0.96 and for the WAIS-IV composite scores range from r = 0.91 to r = 0.98. The reliability coefficient for Full Scale IQ is r = 0.98 (Wechsler, 2008).

Wechsler Memory Scale Fourth Edition

The Older Adult battery (for people 65 to 90 years old) consists of seven subtests: logical memory 1 and 2, verbal paired associates 1 and 2, visual reproduction 1 and 2, and symbol span, yielding four indexes: auditory memory, visual memory, immediate memory, and delayed memory. The WMS-IV Older Adult battery was normed on 500 people aged 65 to 90 (mean age of 77.35 years, SD = 7.11). In that sample, 13% of people had Grade 8 or less, 13% had between Grade 9 and Grade 12, 35% had a high school diploma, 19% had some college or some university education, and 20% had a bachelor’s, graduate, or professional degree. According to the publisher, the WAIS-IV FSIQ index’s correlations with the different subtests of the WMS-IV Older Adult battery range from r = 0.44 to r = 0.62, and with the WMS-IV index scores range from r = 0.57 to r = 0.71. The reliability coefficients for the WMS-IV Older Adult battery subtests range from r = 0.74 to r = 0.96, and for the indexes range from r = 0.92 to r = 0.97.

Analyses

Analyses of Variance

In order to determine how similar the normative data were to our new sample, we obtained the scaled scores (i.e., age-adjusted; mean = 10, SD = 3) for healthy older adults from the normative samples for the WAIS-IV (n = 600) and WMS-IV (n = 500) from the publisher. We compared their data against ours using a pair of mixed analyses of variance (ANOVAs), one for WAIS-IV subtests and the other for WMS-IV subtests that were included in the factor analyses. We report effect sizes (Cohen’s d) in all post hoc comparisons to help interpret the practical significance of these findings: d = 0.2 is considered small, d = 0.5 moderate, and d = 0.8 is considered large (Cohen, 1988).
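As an illustration of the effect-size metric, the following is a minimal sketch of Cohen's d computed from summary statistics with a pooled standard deviation. The function name `cohens_d` is hypothetical, and the normative group is assumed here to sit at exactly mean = 10 and SD = 3; the actual normative subsample statistics may differ slightly, so the results need not reproduce every reported d.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Coding subtest: present sample (M = 11.44, SD = 2.60, n = 145; see Table 2)
# versus an ASSUMED normative group at exactly M = 10, SD = 3 (n = 600).
d_coding = cohens_d(11.44, 2.60, 145, 10.0, 3.0, 600)
print(f"Cohen's d for coding: {d_coding:.2f}")
```

Under these assumptions the coding value lands near the moderate benchmark (d = 0.5) described above.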

Correlations

Before proceeding with our factor analyses, we ran exploratory Pearson correlations among subtest scores (shown in Table 2). Following the procedure employed by Holdnack et al. (2011), we omitted the immediate versions of the WMS-IV subtests (e.g., logical memory 1) from our analyses.

Table 2.

Correlation Matrix With Means and Standard Deviations of Subtest Scaled Scores for Our Sample.

	BD	Si	Dsp	Matrix	Voc	Arith	SS	VPuz	In	CD	LM1	LM2	VP1	VP2	VR1	VR2	SSp
Si	0.308	1
Dsp	0.201	0.216	1
Matrix	0.514	0.411	0.319	1
Voc	0.193	0.638	0.308	0.394	1
Arith	0.333	0.332	0.436	0.395	0.351	1
SS	0.360	0.257	0.260	0.273	0.273	0.213	1
VPuz	0.483	0.199	0.208	0.433	0.152	0.355	0.389	1
In	0.283	0.429	0.163	0.419	0.494	0.307	0.183	0.268	1
CD	0.459	0.311	0.242	0.445	0.320	0.292	0.589	0.353	0.258	1
LM1	−0.065	0.049	0.010	−0.107	0.107	−0.069	−0.081	−0.068	−0.061	−0.026	1
LM2	0.160	0.249	0.120	0.228	0.344	0.328	0.218	0.189	0.361	0.152	0.017	1
VP1	−0.007	0.131	0.033	0.009	0.018	0.025	−0.113	−0.028	0.011	0.014	0.338	0.132	1
VP2	0.159	0.236	0.253	0.240	0.316	0.170	0.311	0.182	0.281	0.202	0.283	0.451	0.042	1
VR1	0.221	0.009	0.041	−0.029	0.005	0.036	0.166	0.063	−0.039	0.137	0.342	0.109	0.245	0.121	1
VR2	0.246	0.145	0.077	0.332	0.015	0.063	0.252	0.221	0.202	0.141	−0.065	0.268	0.053	0.301	0.109	1
SSp	0.338	0.220	0.372	0.498	0.310	0.207	0.345	0.351	0.320	0.405	−0.038	0.310	0.084	0.388	0.155	0.256	1
Means	9.96	9.91	10.01	10.25	10.68	10.35	10.66	9.74	10.68	11.44	11.01	11.01	11.07	11.37	9.91	9.03	10.22
SD	2.63	3.03	3.07	3.24	3.01	2.77	2.55	2.52	2.53	2.60	2.79	2.73	2.81	2.85	3.15	3.23	2.72

Note. BD = block design; Si = similarities; Dsp = digit span; Matrix = matrix reasoning; Voc = vocabulary; Arith = arithmetic; SS = symbol search; VPuz = visual puzzles; In = information; CD = coding; LM1 = logical memory 1; LM2 = logical memory 2; VP1 = verbal pairs 1; VP2 = verbal pairs 2; VR1 = visual reproduction 1; VR2 = visual reproduction 2; SSp = symbol span.

Confirmatory Factor Analyses

We used AMOS-18 and AMOS-19 to identify the best-fitting of our four main a priori specified models. Confirmatory factor analysis (CFA) is preferred over exploratory factor analysis when a specific theoretical model exists (Tabachnick & Fidell, 2007).

Invariance Analyses

We used AMOS-19 to test for strong factorial invariance across the two groups by specifying that factor loadings and intercepts be equal (constraints were imposed on all factor loadings and latent factors in the model).

Models

We began by replicating the typical model for WAIS-IV alone, given that the WAIS-IV model is very similar to its previous versions, and has been relatively well accepted. Higher-order models presented below include general ability as an overarching second-order factor, whereas first-order models do not. The typical WAIS model (shown in Figure 1) is a higher-order model (HO WAIS-IV), that includes a second-order general ability factor and first-order verbal comprehension (similarities, vocabulary, and information subtests), perceptual reasoning (block design, matrix reasoning, and visual puzzles subtests), working memory (arithmetic and digit span subtests), and processing speed (coding and symbol search subtests) factors. We also evaluated a first-order model of the WAIS-IV (FO WAIS-IV), which was identical to the higher-order model except that it did not include the second-order general ability factor. We examined the modification indices for potential cross-loading paths that would improve the model fit.
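The structure of the two WAIS-IV models can be made concrete by counting their degrees of freedom. The sketch below assumes a standard marker-variable identification (one loading per factor fixed to 1, and one second-order loading fixed to 1); this scaling choice is an assumption for illustration, not something stated in the text, and the names `WAIS_IV_FACTORS` and `count_df` are hypothetical.

```python
# Factor-to-subtest assignment for the WAIS-IV models described above.
WAIS_IV_FACTORS = {
    "verbal_comprehension": ["similarities", "vocabulary", "information"],
    "perceptual_reasoning": ["block_design", "matrix_reasoning", "visual_puzzles"],
    "working_memory": ["arithmetic", "digit_span"],
    "processing_speed": ["coding", "symbol_search"],
}


def count_df(factors, higher_order):
    """Degrees of freedom = unique (co)variances minus free parameters."""
    n_ind = sum(len(subtests) for subtests in factors.values())
    n_fac = len(factors)
    moments = n_ind * (n_ind + 1) // 2   # unique variances and covariances
    free_loadings = n_ind - n_fac        # ASSUMED: one marker loading per factor fixed to 1
    error_variances = n_ind
    if higher_order:
        # First-order disturbances, second-order loadings (one fixed to 1),
        # and the variance of the general ability factor.
        structural = n_fac + (n_fac - 1) + 1
    else:
        # Factor variances plus all pairwise factor covariances.
        structural = n_fac + n_fac * (n_fac - 1) // 2
    return moments - (free_loadings + error_variances + structural)


print("FO WAIS-IV df:", count_df(WAIS_IV_FACTORS, higher_order=False))
print("HO WAIS-IV df:", count_df(WAIS_IV_FACTORS, higher_order=True))
```

The higher-order model replaces six factor covariances with four second-order parameters, which is why it is the more parsimonious of the two.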

Figure 1.

Higher-order model for the WAIS-IV using the present sample.

We then added scores from the WMS-IV to evaluate the best-fitting model advanced by Holdnack et al. (2011). First, we tested the first-order model (FO WAIS/WMS-IV), which consisted of the same verbal comprehension, perceptual reasoning, working memory, and processing speed factors as the WAIS-IV-only models, but also included the publisher’s delayed memory factor from the WMS-IV (logical memory 2, verbal pairs 2, and visual reproduction 2 subtests) and added the symbol span subtest to the working memory factor. We examined the modification indices for cross-loading paths that would improve the model fit. In addition, we examined whether the cross-loadings described in Holdnack et al. (2011) would also improve the fit of our models. The variants were as follows: freeing the correlated uniqueness of error terms 8 and 9, a change that was retained in all subsequent variants (FOa. WAIS/WMS-IV); allowing the arithmetic subtest to cross-load on the verbal comprehension and working memory factors (FOb. WAIS/WMS-IV); allowing the logical memory 2 subtest to cross-load on the delayed memory and verbal comprehension factors (FOc. WAIS/WMS-IV); allowing the visual reproduction 2 subtest to cross-load on the perceptual reasoning and delayed memory factors (FOd. WAIS/WMS-IV); and allowing the visual reproduction 2 subtest to cross-load on the perceptual reasoning and delayed memory factors and the symbol span subtest to cross-load on the delayed memory and working memory factors (FOe. WAIS/WMS-IV; see Table 4).

We then evaluated a higher-order model for our sample by adding a second-order general ability factor (HOa WAIS/WMS-IV; shown in Figure 2). This provided information regarding the statistical contribution of the general ability factor to the model fit.

Figure 2.

Higher-order model for the combined WAIS-IV and WMS-IV batteries for the present sample.

Next, we conducted the same factor analyses on the normative sample, using the publisher’s data on the 286 older adults who completed both the WAIS-IV and the Older Adult battery of the WMS-IV (see Figure 3).

Figure 3.

Higher-order model for the combined WAIS-IV and WMS-IV batteries for the normative sample.

For all models, we used a χ² test to evaluate goodness of fit (Byrne, 2001). However, because χ² is potentially oversensitive with larger sample sizes, we examined additional fit indices, as suggested by Barrett (2007) and Byrne (2001): the adjusted goodness-of-fit index (AGFI; Bentler, 1983), root mean squared error of approximation (RMSEA; Steiger, 1990), standardized root mean square residual (SRMR; Bentler & Wu, 1995), Tucker–Lewis nonnormed fit index (TLI; Tucker & Lewis, 1973), comparative fit index (CFI; Bentler, 1990), and Schwarz’s Bayesian information criterion (BIC; Schwarz, 1978). RMSEA indicates the extent of fit between the model and the population covariance matrix under optimal parameter values; adequate fit is indicated by RMSEA values of 0.05 or less. SRMR indicates the match between the observed and model-implied covariance matrices; smaller residuals indicate better fit, and values less than 0.08 are considered a good fit (Hu & Bentler, 1999; Meade, Johnson, & Braddy, 2008). CFI reflects how much better the hypothesized model fits than the independence model, in which all correlations among variables are zero; a good fit occurs when CFI is 0.95 or higher (Hu & Bentler, 1999). Smaller BIC values are preferred, and a difference of more than 10 points between models suggests a better fit for the model with the lower value (Raftery, 1993).
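To show how one of these indices relates to the χ² statistic, here is a minimal sketch of the usual RMSEA point estimate, sqrt(max(χ² − df, 0) / (df · (N − 1))). The function name `rmsea` is hypothetical, and the use of N − 1 in the denominator is an assumption about the software's convention; with the χ², df, and sample sizes reported in Tables 3 to 5, it gives values in line with the tabled RMSEA figures.

```python
import math


def rmsea(chi2, df, n_obs):
    """RMSEA point estimate from a model chi-square (single-group case)."""
    # ASSUMED convention: N - 1 in the denominator.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


# (label, chi-square, df, N) taken from Tables 3, 4, and 5.
models = [
    ("FO WAIS-IV, present sample", 39.139, 29, 145),
    ("HO WAIS-IV, present sample", 42.860, 31, 145),
    ("FO WAIS/WMS-IV, present sample", 123.693, 67, 145),
    ("FO WAIS/WMS-IV, normative sample", 145.974, 67, 286),
]
for label, chi2, df, n_obs in models:
    print(f"{label}: RMSEA = {rmsea(chi2, df, n_obs):.3f}")
```

Because the estimate is floored at zero, a model whose χ² falls below its degrees of freedom yields RMSEA = 0 rather than an undefined value.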

Results

ANOVA

Our sample’s WAIS-IV and WMS-IV scores are shown in Table 2. For WAIS-IV, a mixed 2 (sample: ours vs. normative) × 10 (subtest) ANOVA yielded no significant main effect of sample (F[1, 743] = 1.91, MSE = 46.47, p = .17), but a significant effect of subtest (F[9, 6687] = 6.50, MSE = 4.63, p < .001), and a significant interaction between sample and subtest (F[9, 6687] = 9.71, MSE = 4.63, p < .001). Post hoc independent t-tests with α Bonferroni corrected to 0.005 indicated that two of our sample’s subtest scores were significantly above the normative means (for all normative WAIS and WMS scaled scores, mean = 10 and SD = 3): information (t[743] = 3.07, p = .002, d = 0.31) and coding (t[743] = 5.10, p < .001, d = 0.49). Our sample’s vocabulary scores were marginally higher than the normative group’s (t[743] = 2.78, p = .006, d = 0.26). The Cohen’s d values suggested that the differences between our sample and the normative data were small (on vocabulary and information) to moderate (on coding).

For WMS-IV, a mixed 2 (sample: ours vs. normative) × 7 (subtest) ANOVA indicated a main effect of sample (F[1, 642] = 6.29, MSE = 30.33, p = .01), a main effect of subtest (F[6, 3852] = 17.70, MSE = 5.37, p < .001), and a significant interaction between sample and subtest (F[6, 3852] = 16.62, MSE = 5.37, p < .001). Post hoc independent t-tests (α Bonferroni corrected to 0.007) showed that four of our scores were significantly above the normative mean: logical memory 1 (t[643] = 4.05, p < .001, d = 0.38), logical memory 2 (t[643] = 2.91, p = .004, d = 0.28), verbal paired associates 1 (t[643] = 4.45, p < .001, d = 0.43), and verbal paired associates 2 (t[643] = 4.12, p < .001, d = 0.39); these differences were in the small-to-moderate range. One of our scores was significantly below the normative mean: visual reproduction 2 (t[643] = −4.27, p < .001, d = −0.39).
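The corrected thresholds used in the post hoc tests can be sketched as follows. The familywise α of 0.05 is an assumption (it is implied by, but not stated alongside, the reported thresholds), and the helper name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-comparison alpha under a Bonferroni correction."""
    return family_alpha / n_tests


FAMILY_ALPHA = 0.05  # ASSUMED familywise error rate

wais_alpha = bonferroni_alpha(FAMILY_ALPHA, 10)  # 10 WAIS-IV subtests
wms_alpha = bonferroni_alpha(FAMILY_ALPHA, 7)    # 7 WMS-IV subtests
print(f"WAIS-IV threshold: {wais_alpha:.3f}")
print(f"WMS-IV threshold: {wms_alpha:.3f}")

# Reported WAIS-IV p values from the text, checked against the threshold.
reported_p = {"information": 0.002, "vocabulary": 0.006}
for subtest, p in reported_p.items():
    verdict = "significant" if p < wais_alpha else "not significant"
    print(f"{subtest}: p = {p} is {verdict} at the corrected threshold")
```

This makes explicit why vocabulary (p = .006) is described as only marginally higher: it falls just above the corrected WAIS-IV threshold.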

Correlations

As expected, all correlations among the subtests retained for the factor analyses were positive and almost all were statistically significant (even when we used a stringent alpha level of 0.005, to adjust for multiple correlations), as shown in Table 2. Particularly high correlations occurred between scores that are part of the same index. For example, vocabulary and similarities both load on the verbal comprehension index and yielded r = 0.64, and symbol search and coding both load on the processing speed index and yielded r = 0.59.
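For readers who want to check a given correlation against the 0.005 alpha level, the following is a rough sketch using the Fisher z transformation with a normal approximation. The text does not say which significance test was used, so this is a stand-in large-sample approximation rather than the authors' procedure, and `fisher_z_p_value` is a hypothetical helper name.

```python
import math


def fisher_z_p_value(r, n_obs):
    """Approximate two-sided p value for H0: rho = 0 via Fisher's z."""
    z = math.atanh(r) * math.sqrt(n_obs - 3)
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


N_OBS = 145   # present sample size
ALPHA = 0.005  # stringent alpha level mentioned in the text

# The two within-index correlations highlighted above (values from Table 2).
examples = {
    "vocabulary-similarities": 0.638,
    "symbol search-coding": 0.589,
}
for pair, r in examples.items():
    p = fisher_z_p_value(r, N_OBS)
    print(f"{pair}: r = {r}, significant at {ALPHA}: {p < ALPHA}")
```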

Confirmatory Factor Analysis

Examination of the WAIS-IV higher-order and first-order (HO WAIS-IV and FO WAIS-IV) models of our sample data (Figure 1 and Table 3) revealed a similar pattern. The fit statistics indicated a good fit for both models, as evidenced by CFI values close to 1, high TLI values, and RMSEA values close to or lower than 0.05. Evaluation of the modification indices for both models did not suggest that allowing cross-loadings would significantly improve the model fit. The lower BIC value of the higher-order model indicated better fit for the more parsimonious model that included general ability as a second-order factor; thus, we preferred the higher-order model.

Table 3.

First-Order and Higher-Order Models for the WAIS-IV Using the Present Sample.

Model	χ²	df	AGFI	RMSEA	SRMR	CFI	TLI	BIC
FO WAIS-IV our sample	39.139	29	0.905	0.049	0.049	0.976	0.963	168.534
HO WAIS-IV our sample	42.860	31	0.901	0.052	0.052	0.972	0.960	162.301

Note. AGFI = adjusted goodness-of-fit index; RMSEA = root mean squared error of approximation; SRMR = standardized root mean square residual; CFI = comparative fit index; TLI = Tucker–Lewis nonnormed fit index; BIC = Schwarz’s Bayesian information criterion.

When we examined the combined WAIS-IV/WMS-IV first-order model (Table 4), the fit statistics were less than satisfactory, as indicated by CFI and TLI values of less than 0.95 and an RMSEA of 0.077, which was higher than the suggested 0.05 or less. Examination of the modification indices suggested that freeing the correlation between the unique variances of two subtests within the same factor (symbol span and arithmetic, which loaded on the working memory factor) would significantly improve the model fit. Freeing the two unique error variances led to a χ² reduction of 22 points (df = 1, p < .001), a higher CFI (0.940), a higher TLI (0.917), and a lower RMSEA (0.061). In addition, we examined whether the cross-loadings described in Holdnack et al. (2011) would also improve the fit of our models. Only one of the cross-loadings (allowing the visual reproduction 2 subtest to cross-load on the perceptual reasoning and delayed memory factors) had a significant χ² value; however, the factor loading was small (less than 0.25), so we did not retain this path in our final model. We then evaluated a higher-order model for our sample by adding a second-order general ability factor (HOa WAIS/WMS-IV; shown in Figure 2). The fit statistics of the two models were comparable; however, as with the WAIS-IV-only model, the BIC value favored the more parsimonious higher-order model. Thus, in the end we retained the higher-order model with freed unique error variances of the arithmetic and symbol span subtests (HOa WAIS/WMS-IV).

Table 4.

First-Order and Higher-Order Models for the WAIS-IV/WMS-IV Using the Present Sample.

Model	χ²	df	AGFI	RMSEA	SRMR	CFI	TLI	BIC	Δχ²	df	p
FO WAIS/WMS-IV	123.693	67	0.835	0.077	0.064	0.904	0.870	312.809
FOa. WAIS/WMS-IV	101.767	66	0.856	0.061	0.061	0.940	0.917	295.859	21.926	1	.000
FOb. WAIS/WMS-IV	101.525	65	0.854	0.062	0.061	0.938	0.914	300.594	0.242	1	.623
FOc. WAIS/WMS-IV	101.212	65	0.855	0.062	0.060	0.939	0.914	300.281	0.555	1	.456
FOd. WAIS/WMS-IV	97.317	65	0.860	0.059	0.058	0.945	0.924	296.386	4.450	1	.035
FOe. WAIS/WMS-IV	96.348	64	0.859	0.059	0.057	0.945	0.922	300.394	0.969	1	.325
HOa. WAIS/WMS-IV	110.216	71	0.855	0.062	0.062	0.934	0.915	279.425

a. Freeing up the correlated uniqueness of e8 and e9.

b. Freeing up the correlated uniqueness of e8 and e9 and arithmetic cross-loads on working memory and verbal comprehension.

c. Freeing up the correlated uniqueness of e8 and e9 and logical memory 2 cross-loads on delayed memory and verbal comprehension.

d. Freeing up the correlated uniqueness of e8 and e9 and visual reproduction 2 cross-loads on delayed memory and perceptual reasoning.

e. Freeing up the correlated uniqueness of e8 and e9 and visual reproduction 2 cross-loads on delayed memory and perceptual reasoning and symbol span cross-loads on working memory and delayed memory.
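The BIC values in Tables 3 and 4 appear consistent with the form BIC = χ² + q · ln(N), where q is the number of free parameters (unique variances and covariances minus df). This is an inference from the reported numbers, not a formula given in the text, and the 14-indicator count for the combined battery is likewise inferred; the function name `bic_from_chi2` is hypothetical. A minimal sketch:

```python
import math


def bic_from_chi2(chi2, df, n_indicators, n_obs):
    """BIC in the ASSUMED form chi2 + q * ln(N), q = free parameters."""
    moments = n_indicators * (n_indicators + 1) // 2
    free_params = moments - df
    return chi2 + free_params * math.log(n_obs)


N_OBS = 145  # present sample

# WAIS-IV only: 10 subtests (Table 3).
bic_fo_wais = bic_from_chi2(39.139, 29, 10, N_OBS)
bic_ho_wais = bic_from_chi2(42.860, 31, 10, N_OBS)

# Combined battery: 14 subtests, inferred from the factor definitions (Table 4).
bic_foa = bic_from_chi2(101.767, 66, 14, N_OBS)
bic_hoa = bic_from_chi2(110.216, 71, 14, N_OBS)

print(f"FO WAIS-IV: {bic_fo_wais:.3f}, HO WAIS-IV: {bic_ho_wais:.3f}")
print(f"FOa: {bic_foa:.3f}, HOa: {bic_hoa:.3f}")
print(f"BIC difference, FOa minus HOa: {bic_foa - bic_hoa:.3f}")
```

The difference between two models is what matters when applying the 10-point guideline cited earlier, so the comparison line at the end is the quantity of interest.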

Once we had completed the analyses for our sample, we returned to the normative dataset and replicated our initial analyses with it (Figure 3 and Table 5). We found essentially the same patterns in the normative dataset as in our sample. That is, freeing the unique error variances of the same two subtests significantly improved the model fit. In the normative sample model, the same cross-loading path led to a significant χ² value (allowing the visual reproduction 2 subtest to cross-load on the perceptual reasoning and delayed memory factors). In this model, however, the factor loading was higher (0.38), so we retained the cross-loading path. Adding the general ability factor as a second-order factor led to a better-fitting model, as indicated by the lower BIC value. The model that we retained was the higher-order model with freed unique error variances for the arithmetic and symbol span subtests and a cross-loading of the visual reproduction 2 subtest on the perceptual reasoning and delayed memory factors (HOd. WAIS/WMS-IV normative sample).

Table 5.

First-Order and Higher-Order Models for the WAIS-IV/WMS-IV Using the Normative Sample.

Model	χ²	df	AGFI	RMSEA	SRMR	CFI	TLI	BIC	Δχ²	df	p
FO WAIS/WMS-IV	145.974	67	0.891	0.064	0.050	0.956	0.940	360.902
FOa. WAIS/WMS-IV	134.731	66	0.898	0.060	0.048	0.961	0.947	355.314	11.243	1	.000
FOb. WAIS/WMS-IV	129.773	65	0.900	0.059	0.047	0.964	0.949	356.013	4.958	1	.026	r = 0.24
FOc. WAIS/WMS-IV	128.572	65	0.902	0.059	0.047	0.964	0.950	354.812	6.159	1	.013	r = 0.27
FOd. WAIS/WMS-IV	109.829	65	0.913	0.049	0.040	0.975	0.965	336.069	24.902	1	.000	r = 0.40
FOe. WAIS/WMS-IV	102.439	64	0.918	0.046	0.038	0.978	0.969	334.335	7.390	1	.007	r = 0.28
HOd. WAIS/WMS-IV	121.757	70	0.914	0.051	0.044	0.971	0.962	319.717

a. Freeing up the correlated uniqueness of e8 and e9.

b. Freeing up the correlated uniqueness of e8 and e9 and arithmetic cross-loads on working memory and verbal comprehension.

c. Freeing up the correlated uniqueness of e8 and e9 and logical memory 2 cross-loads on delayed memory and verbal comprehension.

d. Freeing up the correlated uniqueness of e8 and e9 and visual reproduction 2 cross-loads on delayed memory and perceptual reasoning.

e. Freeing up the correlated uniqueness of e8 and e9 and visual reproduction 2 cross-loads on delayed memory and perceptual reasoning and symbol span cross-loads on working memory and delayed memory.

In addition to examining the factor structure of the combined WAIS-IV and WMS-IV batteries in our sample and the normative sample, we conducted invariance analyses to test the assumption of equal variance across the two samples. We imposed equality constraints on all factor loadings and all latent variables, including the second-order general ability factor (Model 2; see Table 6). The model we used for this test was the most parsimonious higher-order model with freed unique error variances for the arithmetic and symbol span subtests. Results indicated a statistically significant difference between Model 1 (the model with no constraints imposed) and Model 2, Δχ² = 57.002, df = 16, p < .001; thus, we failed to establish strong measurement invariance. Next, we removed the constraints imposed on the second-order general ability factor to evaluate the contribution of that factor to the noninvariance across the two samples and compared this model (Model 3) with Model 2. Results indicated statistically significant noninvariance across the two samples on the general ability factor, Δχ² = 5.164, df = 1, p = .023. Next, we systematically evaluated the other five factors by removing the equality constraints for one factor at a time (Models 4 to 8; see Table 7). The only other factor for which results were statistically significant was the perceptual reasoning factor, Δχ² = 7.624, df = 2, p = .022. Thus, we established weak measurement invariance between our sample and the normative sample, but failed to establish strong measurement invariance. The two factors contributing to the noninvariance in the most constrained model were the general ability factor and the perceptual reasoning factor.

Table 6.

Measurement Invariance Testing Between the Present and the Normative Sample.

Model	Specifications	χ²	df	p
Model 1	No constraint	251.983	142	.000
Model 2	Strong Invariance testing	308.985	158	.000
Model 3	General ability factor constraint removed	303.821	157	.000
Model 4	Verbal comprehension factor constraint removed	303.372	156	.000
Model 5	Perceptual reasoning factor constraint removed	301.361	156	.000
Model 6	Working memory factor constraint removed	303.510	156	.000
Model 7	Processing speed factor constraint removed	303.813	156	.000
Model 8	Delayed memory factor constraint removed	303.762	156	.000

Table 7.

Invariance Testing, Comparisons Between Models.

Comparisons	Δχ²	df	p
Comparing Model 1 and Model 2	57.002	16	.000
Comparing Model 3 and Model 2	5.164	1	.023
Comparing Model 4 and Model 2	5.613	2	.060
Comparing Model 5 and Model 2	7.624	2	.022
Comparing Model 6 and Model 2	5.475	2	.065
Comparing Model 7 and Model 2	5.172	2	.075
Comparing Model 8 and Model 2	5.223	2	.073
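
The comparisons in Tables 6 and 7 are chi-square difference tests between nested models: the Δχ² is the difference between the two model chi-squares, and its degrees of freedom are the difference between the two model df values. The following is a minimal sketch, not the analysis code used in the study, that recomputes those differences and their p values from the Table 6 entries using only the Python standard library. The helper names (chi2_sf, chi2_difference, comparison_table) are illustrative assumptions, and the tail probability uses the standard closed-form expressions for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variate, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total


def chi2_difference(restricted, free):
    """Compare two nested models, each given as a (chi-square, df) tuple."""
    delta = restricted[0] - free[0]
    delta_df = restricted[1] - free[1]
    return delta, delta_df, chi2_sf(delta, delta_df)


# (chi-square, df) values copied from Table 6.
MODELS = {
    1: (251.983, 142),
    2: (308.985, 158),
    3: (303.821, 157),
    4: (303.372, 156),
    5: (301.361, 156),
    6: (303.510, 156),
    7: (303.813, 156),
    8: (303.762, 156),
}


def comparison_table():
    """Return (model, delta chi-square, delta df, p) for each Table 7 comparison."""
    rows = []
    # Model 2 is the more constrained model in every comparison.
    for other in (1, 3, 4, 5, 6, 7, 8):
        delta, delta_df, p = chi2_difference(MODELS[2], MODELS[other])
        rows.append((other, round(delta, 3), delta_df, p))
    return rows


if __name__ == "__main__":
    for other, delta, delta_df, p in comparison_table():
        print(f"Model {other} vs. Model 2: delta chi2 = {delta:.3f}, df = {delta_df}, p = {p:.3f}")
```

Because Model 2 carries every equality constraint, it serves as the restricted model in each comparison, so a significant Δχ² indicates that releasing the constraints in question improves fit.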

Discussion

We collected independent evidence on the new WAIS-IV and WMS-IV for older adults. Our scaled scores for the WAIS-IV and WMS-IV subtests were relatively close to the published norms, albeit slightly higher. This finding is similar to those of many previous studies that brought community members into a university for testing (e.g., Glisky, Rubin, & Davidson, 2001; Salthouse, 2010; Soubelet & Salthouse, 2011; Tucker-Drob, 2011). We had a slightly younger, more highly educated sample than the WAIS and WMS normative groups. The largest differences between our sample and the normative one were for coding (d = 0.49) and verbal paired associates 1 (d = 0.43), but these were still only approximately half a standard deviation in size. Note too that our mean scaled scores were not always above the norm: visual reproduction 2 was significantly below the normative mean.

Our best-fitting models were very similar to the ones previously published in young and middle-aged people from the normative sample (Holdnack et al., 2011). This was the case even though we had to omit the designs and spatial addition WMS-IV subtests used by those authors, because those subtests are not part of the WMS-IV Older Adult battery. In addition, the freeing of the same unique error variance led to a significant improvement in the model fit in our sample and the normative sample. Once we had ascertained the best-fitting model for our data, we found that it also fit well with the normative data.

Thus, even though cognition declines with age (especially in memory and processing speed), the interrelations among the factors that make up the WAIS-IV and WMS-IV appear to remain relatively stable in aging. Consistent with this idea, Salthouse and Saklofske (2010) reported that the factor structure of the WAIS-IV normative sample data was similar in younger and older adults (see also Bowden, Weiss, Holdnack, & Lloyd, 2006).

We were able to establish weak measurement invariance between our sample and the normative sample, indicating that the factor loading variances remained the same across the two samples. However, we failed to establish strong measurement invariance; the two factors contributing to the variability across the two samples were general ability and perceptual reasoning. This finding makes clear the need for new, independent samples to be collected and compared against the normative one.

Future Work

The present study is the first to independently examine the factor structure of the combined WAIS-IV and WMS-IV Older Adult batteries. In future work, along with further replication of the aging findings in new datasets, we must also examine performance in dementia as well as in other (e.g., developmental and psychiatric) disorders. When constructing the new norms, the publisher screened out possibly impaired participants using a new brief cognitive status test. The publisher also provides normative data from people with Alzheimer’s disease and mild cognitive impairment, but these need to be supplemented by researchers in the field. For instance, only 36 people with MCI (collapsed across subtype) were administered the Older Adult WMS-IV (Wechsler, 2009). Arguably the best strategy would be to follow cognitively normal and mildly impaired participants for a few years and then retroactively exclude those who end up showing signs of dementia. Of course, very few studies do this, for reasons of feasibility.

Finally, further theoretical and empirical work is needed on the WAIS-IV and WMS-IV. On a theoretical level, although both the WAIS and WMS have evolved to better conform to current theories of intelligence, cognition, and neuropsychology (Coalson et al., 2010; Drozdick, Wahlstrom, Zhu, & Weiss, 2012; Kaufman, 2010), the WAIS in particular remains the focus of considerable controversy. For example, many researchers have argued that the WAIS is better described by the Cattell–Horn–Carroll theory than by the model outlined in the Wechsler manual (e.g., Benson et al., 2010; Ward, Bergman, & Hebert, 2012; for a review, see McGrew, 2009), but there is still disagreement over this issue, and competing theories and measures of intelligence exist (e.g., Reynolds & Kamphaus, 2003).

On an empirical level, our confirmatory analyses were guided by Holdnack et al. (2011), but we needed to adjust our models because the WMS-IV Older Adult battery does not include two of the subtests in the general WMS-IV battery that Holdnack et al. used in their study. We therefore could not test all of their possible models, and future work using both the Older Adult and the standard WMS-IV batteries is potentially fruitful. Not only confirmatory but also further exploratory factor analyses (especially with cognitively impaired groups) will likely be useful. Exploratory factor analyses have yielded several interesting findings with previous editions of the WAIS and WMS (e.g., Bowden et al., 1999; Bowden et al., 2001; Burton, Ryan, Axelrod, Schellenberger, & Richards, 2003; Millis, Malina, Bowers, & Ricker, 1999; Price, Tulsky, Millis, & Weiss, 2002; Tulsky & Price, 2003), and it is likely that such work with the new versions will too.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada (D. M., P. D., C. M.).

References

Barrett (2007). Structural equation modeling: Adjusting model fit. Personality and Individual Differences, 42, 815-824.

Benson, Hulac, D. M., & Kranzler, J. H. (2010). Independent examination of the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV): What does the WAIS-IV measure? Psychological Assessment, 22(1), 121-130.

Bentler, P. M. (1983). Some contributions to efficient statistics for structural models: Specification and estimation of moment structures. Psychometrika, 48, 493-571.

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.

Bentler, P. M., & E. J. C. (1995). EQS for Windows user’s guide. Encino, CA: Multivariate Software.

Bowden, S. C., Carstairs, J. R., & Shores, E. A. (1999). Confirmatory factor analysis of combined Wechsler Adult Intelligence Scale-Revised and Wechsler Memory Scale-Revised scores in a healthy community sample. Psychological Assessment, 11, 339-344.

Bowden, S. C., Ritter, A. J., Carstairs, J. R., Shores, E. A., Pead, Greeley, J. D., & Clifford, C. C. (2001). Factorial invariance for combined WAIS-R and WMS-R scores in a sample of patients with alcohol dependency. Clinical Neuropsychologist, 15, 69-80.

Bowden, S. C., Saklofske, D. H., & Weiss, L. G. (2011). Augmenting the core battery with supplementary subtests: Wechsler Adult Intelligence Scale-IV measurement invariance across the United States and Canada. Assessment, 18(2), 133-140.

Bowden, S. C., Weiss, L. G., Holdnack, J. A., & Lloyd (2006). Age-related invariance of abilities measured with the Wechsler Adult Intelligence Scale-III. Psychological Assessment, 18, 334-339.

Brooks, B. L., Holdnack, J. A., & Iverson, G. L. (2011). Advanced clinical interpretation of the WAIS-IV and WMS-IV: Prevalence of low scores varies by level of intelligence and years of education. Assessment, 18, 156-167.

Burton, D. B., Ryan, J. J., Axelrod, B. N., Schellenberger, & Richards, H. M. (2003). A confirmatory factor analysis of the WMS-III in a clinical sample with cross-validation in the standardized sample. Archives of Clinical Neuropsychology, 18, 629-641.

Butler, Retzlaff, & Vanderploeg (1991). Neuropsychological test usage. Professional Psychology: Research and Practice, 22, 510-512.

Byrne, B. M. (2001). Structural equation modeling: Perspectives on the present and the future. International Journal of Testing, 1, 327-334.

Canivez, G. L., & Watkins, M. W. (2010). Investigation of the factor structure of the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV): Exploratory and higher order factor analyses. Psychological Assessment, 22, 827-836.

Coalson, Raiford, S. E., Saklofske, D. H., & Weiss (2010). Advances in the assessment of intelligence. In Weiss, Saklofske, D. H., Coalson, & Raiford, S. E. (Eds.), WAIS-IV clinical use and interpretation: Scientist-practitioner perspectives (pp. 3-24). San Diego, CA: Academic Press.

Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Drozdick, L. W., & Cullum, C. M. (2011). Expanding the ecological validity of WAIS-IV and WMS-IV with the Texas Functional Living Scale. Assessment, 18(2), 141-155.

Drozdick, L. W., Wahlstrom, Zhu, & Weiss, L. G. (2012). The Wechsler Adult Intelligence Scale-Fourth Edition and the Wechsler Memory Scale-Fourth Edition. In Flanagan, D. P., & Harrison, P. L. (Eds.), Contemporary intellectual assessment (3rd ed., pp. 197-223). New York, NY: Guilford Press.

Glisky, E. L., Rubin, S. R., & Davidson, P. S. (2001). Source memory in older adults: An encoding or retrieval problem? Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1131-1146.

Hartman, D. E. (2009). Test review Wechsler Adult Intelligence Scale IV (WAIS IV): Return of the gold standard. Applied Neuropsychology, 16, 85-87.

Hoelzle, J. B., Nelson, N. W., & Smith, C. A. (2011). Comparison of Wechsler Memory Scale-Fourth Edition (WMS-IV) and Third Edition (WMS-III) dimensional structures: Improved ability to evaluate auditory and visual constructs. Journal of Clinical and Experimental Neuropsychology, 33, 283-291.

Holdnack, J. A., Xiaobin, Larrabee, G. J., Millis, S. R., & Salthouse, T. A. (2011). Confirmatory factor analysis of the WAIS-IV/WMS-IV. Assessment, 18, 178-191.

Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55.

Kaufman, A. S. (2010). Foreword. In Weiss, Saklofske, D. H., Coalson, & Raiford, S. E. (Eds.), WAIS-IV clinical use and interpretation: Scientist-practitioner perspectives (pp. xiii-xxi). San Diego, CA: Academic Press.

Loring, D. W., & Bauer, R. M. (2010). Testing the limits: Cautions and concerns regarding the new Wechsler IQ and memory scales. Neurology, 74, 685-690.

McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1-10.

Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568-592.

Millis, S. R., Malina, A. C., Bowers, D. A., & Ricker, J. H. (1999). Confirmatory factor analysis of the WMS-III. Journal of Clinical and Experimental Neuropsychology, 21, 87-93.

Price, Tulsky, Millis, & Weiss (2002). Redefining the factor structure of the Wechsler Memory Scale-III: Confirmatory factor analysis with cross-validation. Journal of Clinical and Experimental Neuropsychology, 24, 574-585.

Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology: The Official Journal of the National Academy of Neuropsychologists, 20(1), 33-65.

Raftery, A. E. (1993). Bayesian model selection in structural equation models. In Bollen, K. A., & Long, J. S. (Eds.), Testing structural equation models (pp. 163-180). Newbury Park, CA: SAGE.

Reynolds, C. R., & Kamphaus, R. W. (2003). Reynolds Intellectual Assessment Scales and the Reynolds Intellectual Screening Test. Lutz, FL: Psychological Assessment Resources.

Salthouse, T. A. (2010). Does the meaning of neurocognitive change change with age? Neuropsychology, 24, 273-278.

Salthouse, T. A., & Saklofske, D. H. (2010). Do the WAIS-IV tests measure the same aspects of cognitive functioning in adults under and over 65? In Weiss, L. G., Saklofske, D. H., Coalson, D. L., & Raiford, S. E. (Eds.), WAIS-IV: Clinical use and interpretation (pp. 217-235). San Diego, CA: Elsevier.

Schwartz (1978). Estimating the dimensions of a model. Annals of Statistics, 6, 461-464.

Soubelet, & Salthouse, T. A. (2011). Personality-cognition relations across adulthood. Developmental Psychology, 47, 303-310.

Stanos, J. F. (2004). Test review: Wechsler Abbreviated Scale of Intelligence. Rehabilitation Counseling Bulletin, 48, 56-57.

Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173-180.

Sullivan, & Bowden, S. C. (1997). Which tests do neuropsychologists use? Journal of Clinical Psychology, 53, 657-661.

Tabachnick, B. G., & Fidel, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn and Bacon.

Tucker, L. R., & Lewis (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.

Tucker-Drob, E. M. (2011). Global and domain-specific changes in cognition throughout adulthood. Developmental Psychology, 47, 331-343.

Tulsky, D. S., & Price, L. R. (2003). The joint WAIS-III and WMS-III factor structure: Development and cross-validation of a six-factor model of cognitive functioning. Psychological Assessment, 15(2), 149-162.

Ward, L. C., Bergman, M. A., & Hebert, K. R. (2012). WAIS-IV subtest covariance structure: Conceptual and statistical considerations. Psychological Assessment, 24, 328-340.

Wechsler (2008). Wechsler Adult Intelligence Scale (4th ed.). San Antonio, TX: Pearson.

Wechsler (2009). Wechsler Memory Scale (4th ed.). San Antonio, TX: Pearson.