Abstract
Functional assessment methods are an important element in multidimensional neuropsychological evaluations, particularly in older adults. The Adults and Older Adults Functional Assessment Inventory is a new measure of basic and instrumental activities of daily living. Rasch model analyses were used to analyze the psychometric characteristics of the instrument in a sample of 803 participants. The original categories did not provide an optimal assessment of functional incapacity. The scale was dichotomized to achieve a better reliability score and item fit. The final 50 items revealed a moderately high variability in item difficulty, acceptable fits to items and persons, and a good Person Separation Reliability score. The scores were able to discriminate between normal controls and clinical patients. None of the items showed Differential Item Functioning associated with age, gender, or education. The instrument is able to achieve measures of functional incapacity with the useful properties of the Rasch model.
Introduction
Age projections reveal that the proportion of individuals over 60 years old is growing rapidly (World Health Organization, 2002). Projections from the Portuguese census estimate that by 2060, 32.3% of the total Portuguese population will be older than 65 years and 13.3% will be older than 80 years (Instituto Nacional de Estatística, 2009). This increase in the proportion of elderly individuals in the population has been associated with a higher prevalence of chronic conditions such as cardiovascular disease, stroke, diabetes, and musculoskeletal conditions as well as mental health conditions such as dementia and depression (World Health Organization, 2002). These conditions also have a considerable incidence in the adult population, although they become more relevant in the elderly population. These medical conditions, as well as the normative cognitive changes that occur as part of the aging process, are associated with important impairments in the capacity to perform basic activities of daily living (BADLs) and instrumental activities of daily living (IADLs; Wood et al., 2005). Therefore, the elderly population experiences a higher level of dependency and a lower quality of life (e.g., Bourdel-Marchasson, Helmer, Fagot-Campagna, Dehail, & Joseph, 2007; Wada et al., 2005).
In 1990, the World Health Organization introduced the term “Active Aging,” which was defined as “ … the process of optimizing opportunities for health, participation and security in order to enhance quality of life as people age” (World Health Organization, 2002, p. 12). The year 2012 was the international year of Active Aging, in which the fundamental goal was to maintain and/or increase the quality of life in older adults as well as to ensure their autonomy and independence. Autonomy refers to the ability to make decisions according to one’s own values and preferences, and independence refers to the ability to perform the BADLs and IADLs with little or no help from others (World Health Organization, 2002). To help an individual achieve autonomy and independence in clinical practice, a multidimensional evaluation of the individual must include an assessment of that individual’s ability to conduct BADLs and IADLs (e.g., Burns, Lawlor, & Craig, 2004; Potter & Attix, 2006).
This functional assessment should be integrated with neuropsychological evaluations for several reasons. First, functional limitations are considered strong predictors of overall health (Marengoni et al., 2004) and death as well as of the likelihood of an admission to a nursing home (Gill, 2010). Furthermore, although cognitive function and daily living function are closely associated, neuropsychological tests of cognitive function do not explain all of the variance in the ability to perform daily living activities (Potter & Attix, 2006). In addition, functional decline is also used as a separate criterion from cognitive impairment to identify neurodegenerative conditions such as mild cognitive impairment and dementia (Marson & Hebert, 2006).
Functional Assessment
Functional capacity encompasses a wide range of specific abilities that are required to function independently in daily living. BADLs refer to self-care tasks, including feeding, dressing, bathing, continence, mobility, and transference. These activities normally involve automatic procedural memory processes and basic motor functions but do not require attentional processes. In contrast, IADLs require higher level cognitive functions (memory, attention, and executive function) and refer to complex tasks that are necessary to function independently in the home and the community. These IADLs include the preparation of meals, housekeeping tasks, and home security (IADLs-Household; IADLs-H) as well as comprehension and communication skills to make medical or financial decisions (IADLs-Advanced; IADLs-A; Marson & Hebert, 2006). There are several existing methods to determine functional capacity. Although self-report measures and reports by a third party (e.g., a family member or caretaker) are the most common methods in clinical practice, direct observation and performance-based methods show greater benefits than self-report measures (Moore, Palmer, Patterson, & Jeste, 2007). However, some studies have concluded that there are no significant differences between these methods (Hoeymans, Feskens, van den Bos, & Kromhout, 1996; West et al., 1997), meaning that the association between performance and self-report is strong.
Several instruments are available to examine functional status. Some instruments have been designed specifically to examine the BADLs, including the Katz (Katz, Ford, Moskowitz, Jackson, & Jaffe, 1963) and Barthel Indexes (Mahoney & Barthel, 1965). Other instruments, such as the Lawton and Brody Instrumental Activities of Daily Living Scale (Lawton & Brody, 1969), have been designed to assess the IADLs. However, some instruments include items to assess both BADLs and IADLs tasks, for example the Functional Independence Measure (FIM; Keith, Granger, Hamilton, & Sherwin, 1987) and the Functional Activities Questionnaire (FAQ; Pfeffer, Kurosaki, Harrah, Chance, & Filos, 1982). In addition, several instruments have been developed to assess functional capacity in specific medical conditions, such as the Disability Assessment for Dementia Scale (DAD; Gélinas, Gauthier, McIntyre, & Gauthier, 1999) for dementia, the Alzheimer’s Disease Cooperative Study Scale for ADL in Mild Cognitive Impairment (ADCS MCI ADL; Galasko et al., 1997) for mild cognitive impairment, and the World Health Organization Disability Assessment Schedule–II (WHODAS-II; World Health Organization, 2000) for general mental health conditions.
These functional assessment instruments use several methods to determine functional capacity, including by difficulty level (e.g., the WHODAS-II includes four difficulty levels, namely, “mild,” “moderate,” “severe,” and “extreme”), by dependence/independence level (e.g., the Barthel Index includes options for “dependent” or “independent” as well as “needs help” for a subset of items), and by execution level (e.g., the laundry item of the Lawton & Brody scale includes three execution levels, that is, “does personal laundry completely,” “launders small items; rinses stockings, etc.,” and “all laundry must be done by others”). These methods offer distinct ways to assess functional capacity. However, this variety of methods also hinders the ability to develop a comprehensive and integrative functional assessment instrument. Therefore, an Item Response Theory (IRT) procedure such as the Rasch model may provide an optimal way to develop new functional measures.
Applying the Rasch Model to Functional Assessment Instruments
In Classical Test Theory, interpretations are based on group-referenced norms. The main advantage of the Rasch model is that it considers interactions between persons and items using the same logit interval scale for both (Hobart & Cano, 2009). Therefore, the Rasch model facilitates the interpretation of the relationship between latent variables and items (Thomas, 2011). Although Rasch models were originally developed for the analysis of dichotomous items (two response categories), these models have been adapted to analyze polytomous items (more than two response categories). For dichotomous items, the Rasch model is represented as:
$$\ln\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i$$
where $P_{ni}$ is the probability that person n passes item i, $B_n$ is the person ability level, and $D_i$ is the item location (Rasch, 1960).
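As a minimal sketch of the dichotomous model just described, the log-odds form can be inverted to give the probability directly. The function names below are illustrative only and do not come from the study or from any Rasch software.

```python
import math


def rasch_probability(b_n: float, d_i: float) -> float:
    """Probability that person n scores 1 on item i.

    Inverts ln(P / (1 - P)) = B_n - D_i, so P = 1 / (1 + exp(-(B_n - D_i))).
    """
    return 1.0 / (1.0 + math.exp(-(b_n - d_i)))


def log_odds(p: float) -> float:
    """Log-odds (logit) of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # When ability equals item location, the probability is exactly 0.5.
    print(rasch_probability(0.0, 0.0))
    # The logit of the probability recovers the difference B_n - D_i.
    print(log_odds(rasch_probability(1.0, -0.5)))
```

Because persons and items share the same logit scale, only the difference between ability and item location enters the probability.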
For polytomous items, the Rating Scale Model (RSM) is the most widely used model (Andrich, 1978). The RSM, which is an extension of the Rasch model (Thomas, 2011) with good metric properties (Prieto & Delgado, 2007), is represented as:
$$\ln\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k$$
where $P_{nik}$ is the probability that person n chooses category k for the response to item i, $P_{ni(k-1)}$ is the probability that person n chooses category k − 1 for the response to item i, $B_n$ is the overall ability level of person n, $D_i$ is the difficulty of item i, and $F_k$ is the likelihood of choosing a response from category k relative to k − 1. This step calibration is a rating scale threshold that is defined as the location that corresponds to an equal probability of a response in adjacent categories k − 1 and k (Andrich, 1978; Bond & Fox, 2007).
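The adjacent-category form of the RSM implies a full set of category probabilities. The sketch below derives them by accumulating the adjacent log-odds and normalizing. It is an illustration under the stated model only; the function name and the threshold values in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence


def rsm_category_probabilities(
    b_n: float, d_i: float, thresholds: Sequence[float]
) -> List[float]:
    """Category probabilities P_ni0 .. P_nim under the Rating Scale Model.

    thresholds holds the step calibrations F_1 .. F_m. Each adjacent pair
    satisfies ln(P_nik / P_ni(k-1)) = B_n - D_i - F_k.
    """
    # Cumulative log-numerators: 0 for category 0, then running sums.
    cumulative = [0.0]
    for f_k in thresholds:
        cumulative.append(cumulative[-1] + (b_n - d_i - f_k))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical person, item, and step calibrations (four categories).
    probs = rsm_category_probabilities(0.5, -0.2, [-1.0, 0.3, 0.7])
    print(probs, sum(probs))
```

With a single threshold fixed at zero, the function reduces to the dichotomous Rasch model, which is one way to see the RSM as its extension.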
The RSM has been important for instrument development (Walker, Böhnke, Cerny, & Strasser, 2010) because it enables an empirical study of the response categories (Knutsson, Rydstrom, Reimer, Nyberg, & Hagell, 2010). An analysis of the response categories is important for instrument development because response categories must reflect the construct to be assessed and should not produce ambiguous responses (Bond & Fox, 2007). The RSM has been applied to several functional assessment instruments (Lindeboom, Vermeulen, Holman, & de Haan, 2003). Rasch models and other IRT procedures have also been useful in the psychometric characterization of functional assessment measures, including reliability, construct validity, content validity (Fieo, Austin, Starr, & Deary, 2011), and dimensionality (Breithaupt & McDowell, 2001). The psychometric characteristics of several well-known functional assessment instruments, including the Lawton IADL scale (McGrory, Shenkin, Austin, & Starr, 2013), Barthel Index (Morton, Keating, & Davidson, 2008), and the FIM (Granger, Deutsch, & Linn, 1998), have been assessed by IRT procedures.
The main purpose of this study is to apply the RSM to the Adults and Older Adults Functional Assessment Instrument–experimental version (Inventário de Avaliação Funcional de Adultos e Idosos [IAFAI]; Sousa, Simões, Pires, Vilar, & Freitas, 2008), which is a new instrument to assess the functional incapacity of adults and older adults that includes BADL, IADL-H, and IADL-A items. The RSM is used to study the original response categories of the IAFAI as well as its dimensionality, reliability, item difficulty, fit indexes for items and persons, ability to differentiate normal controls from patients with several clinical conditions, effects of age, gender, and education as well as Differential Item Functioning (DIF).
Method
Participants and Procedures
The sample of 803 participants (Table 1) included a comparison group of 567 community-dwelling adults and older adults and a clinical group of 236 patients with several neurological (mild cognitive impairment, dementia, epilepsy, traumatic brain injury, and stroke) or psychiatric diagnoses (depression, anxiety, and schizophrenia). Participants were excluded from the comparison group if they had a current or previous neurological, psychiatric, or psychological disease or an orthopedic or other medical condition that affects functional status. Informed consent was obtained from all participants. A trained psychologist administered to each participant a neuropsychological assessment that included not only the IAFAI but also instruments for cognitive and depressive symptom screening. Participants in the clinical group were referred to the study by their doctors and examined in a hospital setting. Only participants with a recognized diagnosis were considered. Participants in the comparison group were assessed in the community (through the presentation of the study in day care centers and parish councils) or in general medical centers (in the context of routine medical examinations), all over the country.
Table 1. Sample Characteristics.
Note. M = Mean; SD = Standard Deviation; N = 803 participants. ᵃ 18 missing values in clinical group.
Measure
The IAFAI (Sousa et al., 2008) is a new comprehensive instrument to assess the functional incapacity of adults and older adults. The IAFAI was developed to provide a useful and specialized tool for neuropsychological assessment in both clinical and forensic settings (Sousa, Simões, Firmino, & Peisah, 2013). During the development of the IAFAI, the conceptual model considered was that proposed by Marson and Hebert (2006), who differentiated daily living activities into three main groups: BADL, IADL-H, and IADL-A. The International Classification of Functioning, Disability, and Health (World Health Organization, 2001) was also considered to integrate contextual factors in the definition of functional incapacity.
The IAFAI includes both BADL and IADL items to enable a comprehensive assessment of functional incapacity in neuropsychological evaluations. A prior study has shown that including both BADL and IADL items in the same scale improves measurement sensitivity (Spector & Fleishman, 1998). The first experimental version of the IAFAI was composed of 84 items; however, the final experimental version (Sousa, Vilar, Pires, Freitas, & Simões, in press) that will be studied with IRT analysis is composed of 53 items, including 18 BADL items, 18 IADL-H items, and 17 IADL-A items. The BADL items encompass four domains (feeding, dressing, bathing and continence, and mobility and transference), the IADL-H items encompass four domains (conversation and telephone use, meal preparation, housekeeping, and home security), and the IADL-A items encompass five domains (comprehension and communication, health-related decision making, finances, going out and transportation use, and leisure and interpersonal relationships).
Existing functional assessment instruments use several methods to determine functional incapacity, including by difficulty level, by dependence/independence level, or by execution level. The IAFAI attempts to combine these measurement approaches to generate a more reliable indicator of functional incapacity. Nine distinct response categories were developed for the IAFAI to assess the dynamic process of functional decline, namely, the independence levels include “independence without difficulty,” “independence with little difficulty,” “independence with much difficulty,” and “modified independence” (i.e., independence that involves some external devices); the dependence levels include “supervision without difficulty,” “supervision with little difficulty or help without difficulty,” “supervision with much difficulty or help with little difficulty,” “help with much difficulty,” and “incapable/unable to do” (i.e., the extreme level of functional incapacity; Sousa et al., in press).
Statistical Analysis
Descriptive statistics were computed with the Statistical Package for Social Sciences 20.0 (SPSS 20.0; IBM SPSS, Chicago, IL). Rasch analyses were performed in WINSTEPS (Linacre, 2012). The RSM was used because all of the IAFAI items include multiple response categories (Wright, 1999). To analyze the functionality of the response categories, the Linacre (2002) criteria were used, that is, (i) a minimum of 10 observations from each response category, (ii) a regular distribution of the observations among the categories, (iii) a monotonic increase in the average measure in each category, (iv) an average residual (infit and/or outfit) with a value less than 2.0, and (v) a monotonic increase in the step calibration between categories. When some of these criteria are not met, adjacent categories should be combined, and the data should be reanalyzed (Andrich, de Jong, & Sheridan, 1997). The IAFAI category responses were also analyzed visually with category characteristic curves. After the category response analysis was conducted, the model fit was analyzed for persons and items. In our study, fit analysis was done using outfit and infit indexes. Outfit is the mean of the squared standardized residuals (differences between the observed responses and those predicted by the model) and infit is the mean of the squared standardized residuals weighted by the information function. The interpretation of the misfit values (infit and/or outfit) followed the criteria that were established by Linacre (2012), that is, (i) values between 0.5 and 1.5 indicate that the items are important for the measure, (ii) values between 1.5 and 2.0 indicate that the items produce a moderate misfit to the measure, and (iii) values higher than 2.0 indicate that the items produce a severe misfit to the measure (and should be excluded from the measure).
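The fit indexes described above can be illustrated for a single dichotomous item. The sketch below follows the verbal definitions in the text (outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted mean) and applies the interpretation ranges attributed to Linacre (2012). It is not the WINSTEPS implementation; the function names and the label for values below 0.5 are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def rasch_probability(b: float, d: float) -> float:
    """Model probability of a score of 1 for ability b and difficulty d."""
    return 1.0 / (1.0 + math.exp(-(b - d)))


def item_fit(
    responses: Sequence[int], abilities: Sequence[float], difficulty: float
) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = []
    information = []
    for x, b in zip(responses, abilities):
        p = rasch_probability(b, difficulty)
        squared_residuals.append((x - p) ** 2)
        information.append(p * (1.0 - p))  # model variance of the response
    # Outfit: plain mean of squared standardized residuals.
    outfit = sum(
        r / w for r, w in zip(squared_residuals, information)
    ) / len(squared_residuals)
    # Infit: the same residuals weighted by the information function.
    infit = sum(squared_residuals) / sum(information)
    return infit, outfit


def classify_misfit(mean_square: float) -> str:
    """Apply the ranges summarized in the text to one mean-square value."""
    if mean_square > 2.0:
        return "severe"
    if mean_square > 1.5:
        return "moderate"
    if mean_square >= 0.5:
        return "acceptable"
    return "below 0.5 (not classified in the text)"


if __name__ == "__main__":
    # Hypothetical responses and abilities, for illustration only.
    print(item_fit([1, 0, 1, 0], [0.5, -0.5, 1.0, -1.0], 0.0))
    print(classify_misfit(1.98))
```

Because outfit is unweighted, it is more sensitive to unexpected responses from persons far from the item location, whereas infit emphasizes persons near it.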
Because Rasch models are highly dependent on unidimensionality (Tennant & Pallant, 2006), the dimensionality of the IAFAI was analyzed using a principal component analysis (PCA) of the residuals. This analysis looks for patterns in the residuals, which represent the portion of the data that does not agree with the Rasch measures. PCAs attempt to find a component that explains the largest amount of variance in the residuals under the assumption that the residuals do not represent random noise. Linacre (2012) proposes that a fundamental unidimensionality exists if the eigenvalue of the first component of the residuals is small (usually less than 2.0) and the percentage of the raw explained variance is large (usually over 50% as a rule of thumb).
Contrasts between the normal control and clinical group means were tested using Welch’s t, which is an adaptation of Student’s t-test intended for use with two samples that may have unequal variances. The same test was used to explore the effects of age, gender, and education on the functional incapacity score measured by the IAFAI. The ability of the IAFAI items to discriminate between the normal controls and the clinical group was assessed through the probability difference between the two groups (P N − P C; P N is the probability for a person with the mean ability of the control group, P C is the probability for a person with the mean ability of the clinical group).
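Both quantities in this paragraph can be sketched in a few lines. The group means used as defaults (−2.67 and −1.56 logits) and the item difficulty in the usage line (1.95 logits, Item 15) are the values reported later in the article; the function names are illustrative, and the Welch statistic is computed from summary statistics, which is an assumption about how one might reproduce it rather than a description of the authors' software.

```python
import math


def rasch_probability(b: float, d: float) -> float:
    """Model probability of a score of 1 for ability b and difficulty d."""
    return 1.0 / (1.0 + math.exp(-(b - d)))


def probability_difference(
    d_item: float,
    mean_b_control: float = -2.67,
    mean_b_clinical: float = -1.56,
) -> float:
    """P_N - P_C for one item, evaluated at the two group mean measures."""
    p_n = rasch_probability(mean_b_control, d_item)
    p_c = rasch_probability(mean_b_clinical, d_item)
    return p_n - p_c


def welch_t(
    mean_1: float, sd_1: float, n_1: int,
    mean_2: float, sd_2: float, n_2: int,
) -> float:
    """Welch's t from summary statistics (variances are not pooled)."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


if __name__ == "__main__":
    # Item 15 difficulty as reported in the Results (1.95 logits).
    print(probability_difference(1.95))
    # Hypothetical summary statistics, for illustration only.
    print(welch_t(1.0, 1.0, 50, 2.0, 1.0, 50))
```

Since a score of 1 denotes functional incapacity, the clinical group mean yields the larger probability, so P N − P C is negative and its magnitude indicates how strongly the item separates the groups.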
The most important property of the Rasch model, known as specific objectivity (Andrich, 1988), means that individuals with the same ability (B) will have the same likelihood of correctly answering an item, regardless of whether they belong to groups with different cultures, gender, or native language. The DIF detection procedure in the Rasch model is based on the item characteristic curve (ICC), the proportion of individuals at the same ability level who answer a given item correctly. If the item measures the same ability across groups then, except for random variations, the same proportion is found irrespective of the nature of the group, that is, in the absence of DIF, the ICC in the different groups and the item parameter of difficulty (D) will be invariant. Thus, the hypothesis of the absence of DIF was tested by calculating the difference between the estimators of the item parameter of difficulty for each group ($D_f - D_r$), thus controlling for the possible differences between the groups (focal and reference) in the latent variable. Wright and Douglas (1976) found that differences lower than 0.50 logits had negligible consequences regarding the validity of the measure. The t-test with the Bonferroni adjustment (Benjamini & Hochberg, 1995) was used to test the significance and is described as:
$$t = \frac{D_f - D_r}{\sqrt{SE_{D_f}^2 + SE_{D_r}^2}}$$
where $D_f$ is the difficulty parameter in the focal group, $D_r$ is the difficulty parameter in the reference group, $SE_{D_f}$ is the standard error of the difficulty parameter in the focal group, and $SE_{D_r}$ is the standard error of the difficulty parameter in the reference group.
According to this method, the presence of DIF is detected by a difference greater than 0.50 logits and statistically significant (Bonferroni’s correction: p = .05/50 = .001) between the difficulty parameters of the reference group and the focal group.
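The decision rule can be sketched as follows, assuming the usual form of the contrast (the difference between the two difficulty estimates divided by the square root of the sum of their squared standard errors). The two-sided p value here uses a normal approximation, which is an assumption chosen to keep the example dependency-free; the function names and input values are hypothetical.

```python
import math


def dif_t(d_focal: float, d_ref: float, se_focal: float, se_ref: float) -> float:
    """Contrast between focal and reference item difficulties."""
    return (d_focal - d_ref) / math.sqrt(se_focal ** 2 + se_ref ** 2)


def two_sided_p_normal(t_value: float) -> float:
    """Two-sided p value under a normal approximation (an assumption)."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))


def shows_dif(
    d_focal: float,
    d_ref: float,
    se_focal: float,
    se_ref: float,
    n_items: int = 50,
    alpha: float = 0.05,
    min_logits: float = 0.50,
) -> bool:
    """Flag DIF only if the contrast is both large and significant."""
    large = abs(d_focal - d_ref) > min_logits
    t_value = dif_t(d_focal, d_ref, se_focal, se_ref)
    significant = two_sided_p_normal(t_value) < alpha / n_items
    return large and significant


if __name__ == "__main__":
    # Hypothetical difficulty estimates and standard errors.
    print(dif_t(1.0, 0.0, 0.3, 0.4))
    print(shows_dif(1.0, 0.0, 0.1, 0.1))
```

Requiring both conditions guards against flagging contrasts that are statistically detectable in a large sample but too small to affect the validity of the measure.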
The Mantel–Haenszel (MH) method was also used for DIF analysis. The procedure is based on an analysis of the contingency tables corresponding to the different levels in which the variable has been divided. For each level j, the odds ratio (α) is calculated as:
$$\alpha_j = \frac{p_{Rj}}{p_{Fj}}$$
where $p_{Rj}$ is the odds of a correct answer to the item in the reference group, and $p_{Fj}$ is the odds of a correct answer to the item in the focal group.
The null hypothesis of the absence of DIF can be tested using the chi-square statistic (MHχ2; Holland & Thayer, 1988), which is distributed as χ2 with one degree of freedom. Testing the absence of DIF on a test involves multiple comparisons (at least 1 for each item). Zwick and Ercikan (1989) found that differences lower than 1.5 Delta-MH (0.64 logits) had negligible consequences as regards the validity of the measures. Thus, the DIF is usually considered substantial if the Delta MH value is classified as C (“large DIF,” according to the criteria of the Educational Testing Service), that is, size higher than 0.64 and significant χ2 statistic (using Bonferroni’s correction).
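A minimal sketch of the Mantel–Haenszel effect size follows. It pools the 2 × 2 tables across ability levels into the common odds ratio and converts it to the Delta MH metric using the Educational Testing Service convention (Delta MH = −2.35 ln α). The significance test is passed in as a flag rather than computed, and the table layout, function names, and counts are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

# One table per ability level:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Table = Tuple[int, int, int, int]


def mh_odds_ratio(tables: Sequence[Table]) -> float:
    """Mantel-Haenszel common odds ratio pooled over ability levels."""
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in tables:
        n_j = ref_right + ref_wrong + foc_right + foc_wrong
        numerator += ref_right * foc_wrong / n_j
        denominator += ref_wrong * foc_right / n_j
    return numerator / denominator


def delta_mh(alpha: float) -> float:
    """Delta MH on the ETS scale: -2.35 times the log odds ratio."""
    return -2.35 * math.log(alpha)


def is_large_dif(alpha: float, significant: bool, min_logits: float = 0.64) -> bool:
    """Category C: size above 0.64 logits and a significant chi-square."""
    return abs(math.log(alpha)) > min_logits and significant


if __name__ == "__main__":
    # Hypothetical counts for two ability levels.
    alpha = mh_odds_ratio([(30, 10, 20, 20), (30, 10, 20, 20)])
    print(alpha, delta_mh(alpha), is_large_dif(alpha, significant=True))
```

The 0.64-logit cut-off corresponds to 1.5 Delta MH units because 1.5 / 2.35 is approximately 0.64.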
Results
Participants
The main demographic characteristics of the comparison and clinical groups are presented in Table 1. There is a higher percentage of women in both groups. The mean age is similar in the two groups (comparison group: M = 69.92; SD = 7.87; clinical group: M = 65.35; SD = 14.72). A considerably larger proportion of the sample has little formal education (4 years or less).
Response Categories
The original nine categories of the IAFAI do not adequately assess functional incapacity because the category thresholds are disordered. In addition, the average measure and the step calibrations by category do not change monotonically (Table 2). Therefore, the original categories were collapsed into three categories to obtain a better assessment of functional incapacity. However, the three modified categories also failed to produce an optimal assessment of functional incapacity because the central category appears to have less utility than the other two categories due to a compressed range of responses. Therefore, the analysis was repeated for two categories (0 = total independence category; 1 = modified independence/dependence category). According to these categories, functional incapacity in performing some daily living activity represents not only the dependence on others but also the difficulty in performing that daily living activity. By doing this, we try to detect the minor changes in functional capacity with IAFAI scores. These two categories result in a better item fit and fewer items with a moderate misfit (4 misfit items) compared with the model with three categories (8 misfit items). In addition, the person variability and the person separation reliability (PSR) scores are both higher in the model with two categories (PSR = 0.79) compared to the model with three categories (PSR = 0.75).
Table 2. IAFAI: Analysis of the Categories’ Properties.
Note. IAFAI = Inventário de Avaliação Funcional de Adultos e Idosos; Chosen F(%) = Observed count and percentage of occurrences in each category; average B = The average of the measures that are modeled to produce the responses observed in each category; infit/outfit = The average of the infit and outfit mean squares associated with the responses in each category; step = Rating scale threshold between two adjacent categories K and K − 1. (1) Original categories; (2) Modified categories (Category 0 = original 0 category; Category 1 = 1, 2, and 3 original categories; Category 2 = 4, 5, 6, 7, and 8 original categories).
Dimensionality
The Rasch PCA of the residuals was conducted on the dichotomized items and shows that the percentage of the raw variance that is explained by the Rasch measures is higher (34.0%) than the minimum acceptable value for unidimensionality (20%) proposed by Reckase (1979). The PCA of the residuals also shows no discernible pattern (the first factor explains only 5.7% of the variance in the residuals), which further supports unidimensionality (Tennant & Pallant, 2006). The flexible consideration of the unidimensionality of the IAFAI is also supported by the negligible number of moderate misfit items and no severe misfit items that emerged from the analysis.
Fit Indexes for Items and Persons
The next analyses were conducted on dichotomized items. Three IAFAI items were eliminated (Using a computer, Driving near your area of residence, and Driving far from your area of residence) because these items had a moderate misfit (1.90, 1.98, and 1.98, respectively) and did not apply to the majority of the sample population. The item using a computer was answered by only 168 of the total 803 participants (21% of the total sample). The items driving near your area of residence and driving far from your area of residence were only applicable to 322 (40%) and 314 (39%) subjects, respectively, in the sample population. Once these items were excluded and the optimal number of categories was established, new data analyses were conducted to quantify the model fit and the indicators of validity as well as to determine the item and person parameters (Tables 3 and 4). The final 50 items of the IAFAI are presented in the Appendix.
Table 3. IAFAI: Statistics of the Items.
Note. IAFAI = Inventário de Avaliação Funcional de Adultos e Idosos; p = proportion of persons that have functional incapacity (Score 1) in performing the daily living activity; RiX = Item-total correlations; D = Difficulty of the items; SE = standard error; infit/outfit = Rasch model adjustment parameters; P N = probability of not doing the daily living activity for B = −2.67 (mean of the normal control group); P C = probability of not doing the daily living activity for B = −1.56 (mean of the clinical group); P N − P C = probability difference between normal control and clinical groups (discriminant efficiency; highest presented in boldface).
Table 4. IAFAI: Summary of the Statistics for Items and Persons.
Note. p = proportion of persons who have functional incapacity (Score 1) in performing the daily living activity; RiX = item-total correlations; D = difficulty of the items; SE = standard error; infit/outfit = Rasch model adjustment parameters; X = Number of activities of daily living where the subjects have functional incapacity (Score 1); B = ability of the persons; ISR = item separation reliability; PSR = person separation reliability.
The item difficulty ranges from −2.05 (Item 10) to 1.95 logits (Item 15; Figure 1). The item difficulty parameters are estimated with good reliability (SE: M = 0.12; SD = 0.02; item separation reliability = .98), which means that the IAFAI items were measured with high precision. The classical difficulty index (p) was also computed to assess the difficulty of the items. In this study, p represents the proportion of individuals who have functional incapacity (Score 1) in performing the daily living activity for each item. The minimum p value was .05 (Item 10) and the maximum p value was .45 (Item 15). The mean p value was .19, which indicates that, on average, 19% of the individuals in the sample have functional incapacity (Score 1) in performing the daily living activity assessed by each IAFAI item (p: M = 0.19; SD = 0.10). The low values of the difficulty indexes are associated with the sample characteristics because the majority of the sample consists of normal controls. This is also the main reason for the floor effect detected.

Figure 1. Inventário de Avaliação Funcional de Adultos e Idosos (IAFAI): item–person map. Each “#” in the person column represents 4 persons and each “.” represents 1–3 persons.
The item-total correlations are moderately high (RiX: M = 0.50; SD = 0.09) with values between 0.24 (Item 47) and 0.65 (Item 27). The item fit to the model is acceptable, with none of the items showing a severe misfit (i.e., outfit and/or infit values higher than 2.0). A moderate misfit (i.e., outfit and/or infit values between 1.5 and 2.0) occurred in only 4 (8%) of the 50 items. Although these items revealed a moderate misfit, they applied to a large proportion of the sample and were not excluded from the inventory.
Table 4 presents the main person statistics. In the analyzed sample (n = 567 normal controls, 70.6% of the total sample, and n = 236 clinical patients, 29.4% of the sample), the mean number of daily living activities where the participants have functional incapacity (Score 1) is low (X: M = 8.60; SD = 9.20). However, the variability in the levels of functional incapacity is high, with a range between 0 and 46. These results are similar to the results that were observed on the logit scale, in which the sample mean indicated low levels of functional incapacity but high variability in the scores (B: M = −2.34; SD = 1.93; values between −5.52 and 3.88). These results may be attributed to the composition of the sample, which includes mostly normal control participants. The PSR score (PSR = 0.79) is acceptable and is associated with a high Cronbach’s α (α = .93). There is a negligible number of individuals with a severe misfit (only 2% of the total sample).
Ability to Discriminate Normal Controls From Clinical Conditions
The IAFAI scores are able to discriminate between the comparison group (M = −2.67) and the clinical group (M = −1.56). The difference in the means between the two groups (−1.12) is statistically significant (t = −7.61; p < .01) and is associated with a medium effect size (Cohen’s d = 0.60). The results using the Rasch logistic function equation (Table 3) reveal that some of the IAFAI items are significantly different between the normal controls and the clinical patients. Some of the highest discriminative items (presented in boldface) include items 15, 32, and 43.
Effects of Age, Gender, and Education
The IAFAI scores are higher in older (≥70 years old; n = 392; M = −1.97) than in younger (<70 years old; n = 411; M = −2.70) persons, meaning that higher age is associated with higher levels of functional incapacity. This difference is statistically significant (t = −5.43; p < .01) but is associated with a small effect size (d = 0.39). Concerning gender, males (n = 260; M = −2.58) have better functional status than females (n = 543; M = −2.23). Despite the significant difference (t = −2.34; p = .019), the effect size is small (d = 0.18). The less educated persons (≤4 years; n = 533; M = −1.94) have higher scores (poorer functional status) than the persons with higher education levels (>4 years; n = 252; M = −3.33). This difference is statistically significant (t = 10.50; p < .01) and is associated with a medium effect size (d = 0.79).
DIF
DIF analyses were conducted to explore the likelihood that individual items of the IAFAI might work differently as a function of group (normal controls vs. clinical), age (<70 years vs. ≥70 years), gender (male vs. female), and educational level (≤4 vs. >4 years). The absence of DIF involves a difference lower than 0.50 logits (without statistical significance) between the estimators of the item parameter of difficulty for each group and a Delta MH value that is not classified as C (i.e., not both higher than 0.64 in size and significant). The results reveal that there are no items of the IAFAI with DIF associated with age, gender, or education. Two items (Items 8 and 15) revealed DIF associated with group, being more difficult for the clinical group.
Discussion
An empirical study of the original response categories on the IAFAI was performed with the RSM, which is an extension of the Rasch model that accounts for polytomous items (Bond & Fox, 2007; Wright, 1999). The results did not support the original nine categories on the IAFAI. Therefore, the categories were consolidated by collapsing adjacent categories. The best model included only two categories, with a score of 0 representing the absence of functional incapacity (absence of difficulty or dependence in the execution of the activity of daily living) and a score of 1 representing the presence of functional incapacity (presence of difficulty or dependence in the execution of the activity of daily living). The Rasch analysis of the dichotomized IAFAI items reveals better reliability and item fit parameters. Despite the existence of distinct ways to measure function (difficulty level, dependence/independence level, execution level), which were integrated in the initial IAFAI categories, the results demonstrate that simpler methods are preferable. The dichotomous categories not only made the instrument easier to administer but also improved its psychometric characteristics.
Finlayson, Mallinson, and Barbosa (2005) also found that a dichotomous rating scale provided a better fit (with no misfit items and a higher person variability) on the AIM Longitudinal Study compared to a rating scale with five response categories in a sample of 607 older adults (238 living at home without services, 187 living at home with some care services, and 182 living in a nursing home). In addition, a study of the Motor Subscale of the FIM showed that seven categories provided an adequate fit for only 5 of the 13 items; for the remaining items, dichotomous categories provided a better fit (Tennant et al., 2004). Dichotomous categories have been used in several functional assessment instruments, including those measuring BADL (e.g., Katz Index) and IADL (e.g., Disability Assessment for Dementia Scale; DAD). However, several functional assessment instruments use more than two categories: three categories (e.g., the ADL subscale of the Older Americans Resources and Services Program; OARS); four categories (e.g., WHODAS-II); five categories (e.g., Health Assessment Questionnaire); and seven categories (FIM). Nevertheless, the majority of these instruments have not been analyzed with IRT procedures or the Rasch model. Studies in which these analyses have been conducted have concluded that category reduction is necessary and is associated with an improvement in the psychometric characteristics of the instruments (e.g., the Tennant et al., 2004, study of the FIM).
The analysis in this study revealed the essential unidimensionality of the IAFAI, which agrees with other studies (Spector & Fleishman, 1998; Finlayson, Mallinson, & Barbosa, 2005; LaPlante, 2010). However, the dimensionality of the items that evaluate the BADLs and IADLs is inconsistent across studies. For example, Breithaupt and McDowell (2001) found a two-factor structure in which ADL and IADL items represented different dimensions that were strongly correlated (r = .79). Thomas, Rockwood, and McDowell (1998) found a factor structure with three main factors that were related to “basic self-care,” “medium self-care,” and “complex management.”
In this study, both a low item difficulty value and a low mean value of functional incapacity were found. These values are associated with the distribution of the sample population and indicate a higher number of normal controls than clinical patients in the sample. Other studies have also found that normal controls report greater independence compared with dementia patients (Breithaupt & McDowell, 2001). Similar results have been found in samples that are composed of individuals with several clinical conditions (Morton et al., 2008). Some studies have attempted to determine the point at which the older population begins to experience functional limitations. Community-dwelling older adults appear to start to lose the ability to perform the more complex activities of daily living around age 80 (Royall et al., 2007). This result may explain the lower values of functional incapacity that are observed in community-dwelling older adults when lower age-groups are included in the sample.
IAFAI scores were able to discriminate between the comparison group and clinical patients. The items that were associated with the greatest ability to discriminate between these two groups include Item 15 (BADL item), Item 32 (IADL item), and Item 43 (IADL item). Breithaupt and McDowell (2001) found that BADL items (Getting out of bed, Toilet transfer, and Dressing) and IADL items (Shopping, Getting places, and Preparing meals) were the best discriminators between dementia patients and normal controls in a sample of 1,364 elderly Canadians from the Canadian Study of Health and Aging. These results are not directly comparable to the present study because the Canadian Study of Health and Aging considered a specific clinical group (dementia) instead of a more general clinical population. However, the ability of IAFAI scores to discriminate between a comparison group and a clinical group agrees with several studies that have found a decline in functional capacity in distinct clinical conditions, including depression (Wada et al., 2005), schizophrenia (Green, Kern, & Heaton, 2004), mild cognitive impairment (Yeh et al., 2011), dementia (Sauvaget, Yamada, Fujiwara, Sasaki, & Mimori, 2002), and stroke (Landi et al., 2006). Although these clinical conditions have been aggregated in this study, each condition has been associated with functional decline in previous studies. Additional analyses revealed that IAFAI scores are associated with age, gender, and education: poorer functional status was observed in older persons, females, and persons with lower educational levels. This pattern was also observed in previous studies (Østerås et al., 2007; Palacios-Ceña et al., 2012).
DIF analyses showed that the items have invariance properties for younger and older adults, males and females, and lower and higher educated individuals, as no items showed age-, gender-, or education-related DIF. Only two items revealed DIF associated with group. Accordingly, the IAFAI scores are able to measure the same level of functional incapacity in younger and older adults, males and females, higher and lower educated individuals, and normal controls and persons with clinical conditions (both neurological and psychiatric). In contrast, other studies revealed that men are more likely to need help in some activities (preparing meals, doing laundry, and taking medications; Niti, Ng, Chiam, & Kua, 2007), although there was some evidence against a gender-related DIF effect in items related to shoulder functional status (Crane, Hart, Gibbons, & Cook, 2006). An age-related DIF effect was also detected in some studies (LaPlante, 2010; Niti et al., 2007). For example, the oldest adults are more likely to need help in preparing meals (Niti et al., 2007). Additionally, LaPlante (2010) concludes that DIF effects by age are balanced and do not bias the measure.
Conclusions
In Portugal, the absence of systematic research on adapting and validating instruments for functional capacity assessment led to the development of the IAFAI. Specifically, only a few functional assessment instruments have some type of validation study for the Portuguese population, for example, the Barthel Index (Araújo, Pais Ribeiro, Oliveira, & Pinto, 2007) and the Lawton & Brody Instrumental Activities of Daily Living Scale (Araújo, Pais Ribeiro, Oliveira, Pinto, & Martins, 2008). The main advantages of the IAFAI were (i) the examination of both BADLs and IADLs, (ii) item content that is appropriate to the Portuguese population, and (iii) support from several validation and normalization studies that demonstrate its psychometric characteristics. To accomplish this, we have already performed an initial exploratory study (Sousa et al., in press). In this article, we intended to study the psychometric characteristics of the IAFAI and to develop the final version of the inventory with respect to its items and response categories. The results of this study suggest that the IAFAI is a comprehensive and useful instrument to assess functional incapacity because it reveals good values of internal consistency, results in adequate person and item separation reliability indexes, and is able to differentiate between normal controls and clinical patients. The consolidation of the original IAFAI categories into dichotomous categories provides a better model fit and increases the reliability indexes. The DIF analysis also demonstrated the generalized validity of the IAFAI across important variables (group, age, gender, and education).
Future studies should validate the IAFAI in specific clinical conditions, such as traumatic brain injury, mild cognitive impairment, and Alzheimer’s disease, and should establish normative parameters for the Portuguese population that consider important variables such as gender, age, and medical conditions. Additional studies should address other psychometric properties, such as rater reliability and test–retest stability, and should include follow-up studies in some clinical conditions (mild cognitive impairment and dementia) as well as the development of a short form.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Portuguese Science and Technology Foundation through a PhD grant (SFRH/BD/47677/2008) that was awarded to the first author and by the Calouste Gulbenkian Foundation through the project “Validation of memory tests, functional assessment and quality of life inventories” (Process 74569; SDH 22 Neurosciences).
