Abstract
To address the dearth of literature on outcomes for autistic individuals with significant intellectual disability, researchers require validated measures to use in research. This study examined the psychometric properties of PROMIS quality-of-life caregiver-proxy scales included in the PROMIS Autism Battery–Lifespan among autistic children who are minimally verbal with intellectual disability (MVID). We examined basic psychometric properties of the PROMIS caregiver-proxy scales and tested the scales for measurement invariance between groups of autistic children who are minimally verbal with intellectual disability and those without significant intellectual disability (N = 448). We also descriptively examined feedback from caregivers regarding the appropriateness of the questions to capture meaningful outcomes for their autistic children who are minimally verbal with intellectual disability. Results indicated that some PROMIS caregiver-proxy scales (Anger, Positive Affect, and Life Satisfaction) exhibited strong psychometric evidence and content validity, but many other scales either did not demonstrate measurement invariance between groups or included a high proportion of items endorsed by caregivers as not applicable for their minimally verbal autistic child. Our findings emphasize the need for continued work developing appropriate measures for capturing meaningful outcomes among minimally verbal autistic people with significant intellectual disability.
Plain Language Summary
Researchers need reliable tools to study outcomes for autistic individuals with significant intellectual disability. This study looked at the PROMIS caregiver-proxy scales from the PROMIS Autism Battery–Lifespan for minimally verbal autistic children with intellectual disability. These scales were made to capture aspects of quality of life important for people on the autism spectrum. We compared responses from parents of autistic children with and without significant cognitive and language issues and checked if the questions were suitable for children with high support needs. Results showed that some scales (Anger, Positive Affect, and Life Satisfaction) worked well, but others did not work as well for this group. Our study highlights the need to develop better tools to measure meaningful outcomes for autistic people with the highest support needs.
Introduction
Measuring quality-of-life (QoL) outcomes is an identified priority of the autistic community (Pellicano et al., 2014). QoL is defined by the World Health Organization as “an individual’s perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns” (World Health Organization, 1995). For individuals with intellectual and developmental disabilities, QoL is conceptualized as a multi-dimensional construct that includes the same factors and relationships as for people without disabilities and additionally includes an emphasis on self-determination, social inclusion, and individualized supports (Schalock et al., 2002, 2008). Numerous studies report lower QoL among autistic people than non-autistic peers (Ikeda et al., 2014), including individuals with other mental and physical health diagnoses (Barneveld et al., 2014; Cottenceau et al., 2012). Autistic people and their caregivers report experiencing significantly poorer QoL than non-autistic individuals across the lifespan (Graham Holmes et al., 2020), including childhood (Kuhlthau et al., 2018; Lee et al., 2008), adolescence (Cottenceau et al., 2012; Ikeda et al., 2014; Shipman et al., 2011), and adulthood (Ayres et al., 2018; Kamio et al., 2013).
The existing QoL literature in autism research has largely focused on verbal autistic individuals with average to above-average cognitive abilities. In the limited research that exists, autistic children with intellectual disability (ID) were rated by professionals as having lower QoL than children with only ID (Arias et al., 2018), demonstrating a potential compounding effect of membership in multiple disability groups. However, many studies examining autistic QoL either do not report the percentage of participants with co-occurring ID or apply intelligence quotient (IQ)-based criteria that exclude participants below a given IQ (e.g. Barneveld et al., 2014; Lin & Huang, 2019). Approximately 27% of autistic children have minimal verbal communication and significant ID (Hughes et al., 2023), though estimates vary widely (Clarke et al., 2024). Thus, a significant portion of the autistic population is excluded from research on this important outcome (Clarke et al., 2024).
To develop services and supports that improve QoL for autistic people, researchers must be able to confidently use validated QoL measurement tools for all autistic people. To address this, our study focused on a population of autistic people with ID who are minimally verbal or non-verbal (Koegel et al., 2020). Many minimally or non-verbal autistic children communicate using body language, gestures (La Valle et al., 2021), and single words (La Valle et al., 2020) to agree, disagree, respond, and request, but do not communicate using complex, structured language. Here, we intend “minimally verbal” to describe those experiencing significant barriers to expressing thoughts, emotions, ideas, and preferences to others using structured language. Although the term “profound autism” has been proposed to describe autistic individuals with an IQ < 50 or significant language barriers (Lord et al., 2022; Wachtel et al., 2024), some have raised conceptual and clinical issues with this term (Kapp, 2023; Woods et al., 2023). Thus, we use the acronym “MVID” to describe autistic children with minimal verbal language and ID in this study.
There is limited QoL research focused on MVID autistic people. One potentially limiting factor for conducting meaningful research on the QoL of MVID autistic people is how to measure this construct. To do such work, researchers require validated tools to measure QoL among MVID autistic people. Although numerous self-report scales for measuring the subjective experience of QoL exist, even those that were adapted for people with ID, such as the WHOQOL-DIS (Power et al., 2010), require an individual to be able to respond to written or verbal questions. Given the barriers in verbal expression for minimally verbal autistic people with significant ID, proxy reports, which are completed on behalf of the individual by someone who knows them well, such as a parent or caregiver, are the current best practice for understanding QoL in this population (Schiltz et al., 2024). To our knowledge, only one proxy QoL scale, the KidsLife Scale (Gómez et al., 2016), has been specifically tested within a sample of autistic people with co-occurring ID (Gómez et al., 2020; Stone et al., 2020). This scale was developed in Spain specifically for ID, leverages proxy ratings, and was recently translated for English speakers (Stone et al., 2020).
An alternative to developing new measures to capture QoL among MVID autistic people is to evaluate whether existing scales with strong psychometric properties in the general population could work for MVID autistic people. This could increase the number of tools available and provide more opportunities to compare scores across different groups. As such, this study tested selected subscales of a well-validated proxy measure of QoL, the Patient-Reported Outcomes Measurement Information System (PROMIS®). PROMIS pediatric scales, available through the National Institutes of Health (Reeve et al., 2021), each assess a unidimensional QoL construct and were validated based on U.S. general and clinical population samples (Cella et al., 2010). In this study, we evaluated the subscales included in the PROMIS Autism Battery–Lifespan (PAB-L; Graham Holmes et al., 2020). To develop the PAB-L, the authors consulted with panels of stakeholders and autism providers, who provided feedback on the selected scales and added others. Thus, the PAB-L is a selection of PROMIS scales determined to be most relevant to assessing QoL for autistic individuals. The PAB-L demonstrated good reliability, feasibility, and acceptability among a large (N = 912) sample of autistic individuals and their families (Graham Holmes et al., 2020). We examined the PAB-L Parent Proxy’s utility for MVID autistic children by assessing its psychometric properties across two different groups of autistic children (MVID and non-MVID) and its content validity.
Method
Participants
Parents of autistic children ages 5–17 years participated via online surveys. We recruited participants in two waves: the first (n = 372) between November 2017 and June 2018 through The Children’s Hospital of Philadelphia and the Interactive Autism Network (IAN), and the second (n = 76) between January and October 2023 through the Autism Science Foundation, The Children’s Hospital of Philadelphia, and the Phelan-McDermid Syndrome Foundation. The second wave specifically enrolled families with MVID autistic children. Duplicate participants (n = 2) were removed. We made numerous efforts to ensure response validity and to guard against bots and scammers in the survey data (Bottini et al., 2025). All participants provided written informed consent, and the study was approved by the institutional review board at The Children’s Hospital of Philadelphia.
We used conservative eligibility criteria to identify our group of MVID autistic children. Inclusion criteria for the MVID Group were (1) a diagnosis of ID or Developmental Delay, (2) below-age-level language skills, and (3) speech skills at or below phrase speech. Participants who did not meet all criteria were assigned to the Non-MVID Group. This resulted in some participants with either an ID diagnosis (n = 18) or limited speech (n = 37) being included in the Non-MVID Group, but the significant majority of Non-MVID participants had no cognitive disability and spoke in full sentences (Table 1).
Table 1. Demographics for autistic children and caregivers. Table notes: Pearson’s χ2 test; two-sample t-test; not mutually exclusive categories.
Demographic characteristics for participants and their caregivers are reported in Table 1. MVID and Non-MVID participants did not differ in child age (p = 0.16), gender (p = 0.28), or ethnicity (p = 0.50). Compared with the Non-MVID Group, the MVID Group had a higher proportion of participants who identified as non-White, had an ID diagnosis, had below-age-level language, and had less-advanced speech skills (all ps < 0.05). The MVID Group had a higher proportion of participants in Special Education and performing below grade level than the Non-MVID Group (ps < 0.01). The MVID Group also had a higher proportion of non-White caregivers than the Non-MVID Group (p < 0.01). Caregivers in the two groups did not differ in gender, ethnicity, relationship to child, or education levels (all ps > 0.05).
Measure
All caregivers completed the Parent-Proxy versions of the PAB-L (Graham Holmes et al., 2020). Caregivers responded on a Likert-type scale from 1 to 5, with higher scores indicating higher endorsement of the construct, such that higher scores for a “positive” scale (e.g. Positive Affect) indicate more desired outcomes and higher scores for a “negative” scale (e.g. Sleep Impairment) indicate less desired outcomes. The PAB-L includes 13 parent-proxy scales, theoretically (not empirically) organized into domains related to Emotional Distress, Subjective Well-Being, Health, and Social Functioning (Irwin et al., 2012). Due to initial survey feedback from parents of children with high support needs, a Not Applicable (N/A) checkbox was added in the second wave of data collection for parents of minimally verbal children. This allowed parents to indicate when a question was not applicable to their child’s functioning, while still requiring item completion. All participants could also provide additional feedback about the surveys in a free-response text box.
Data analysis
PROMIS scale psychometric analyses
We first examined internal reliability (Cronbach’s alpha) for each scale. Next, we conducted confirmatory factor analyses (CFAs) on baseline (i.e. unidimensional) models using the lavaan (Rosseel, 2012) and semTools (Jorgensen et al., 2021) packages in RStudio. We identified the models by setting factor variance to 1 and used a robust diagonally weighted least squares estimator with pairwise deletion to handle missing values, given the low levels of missingness and the ordinal nature of PROMIS data (Xia & Yang, 2019). We evaluated model fit using the robust categorical maximum-likelihood Comparative Fit Index (CFIcML), which is the most commonly reported alternative fit index (Putnick & Bornstein, 2016). When a baseline model did not demonstrate acceptable fit (CFIcML ⩽ 0.90), we examined model modification indices, reviewed the theoretical justification for suggested modifications, and implemented modifications to establish acceptable fit. Modifications were made until model fit either met the acceptability cut-off or decreased with additional modifications, indicating overfitting of the model.
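As an illustration of the two summary statistics named above, the following minimal Python sketch computes Cronbach’s alpha from a respondents-by-items matrix and the standard (non-robust) CFI from model and baseline chi-square values. It is not the authors’ analysis code, which used lavaan and semTools in R; the function names, the example responses, and the chi-square values are hypothetical, and the robust categorical CFIcML reported in this study involves additional corrections that are not reproduced here.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items array of complete responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                                # number of items
    item_variances = items.var(axis=0, ddof=1)        # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


def comparative_fit_index(chisq_model, df_model, chisq_baseline, df_baseline):
    """Standard CFI from the fitted model and the baseline (null) model."""
    d_model = max(chisq_model - df_model, 0.0)
    d_baseline = max(chisq_baseline - df_baseline, 0.0)
    denominator = max(d_model, d_baseline)
    if denominator == 0.0:
        return 1.0
    return 1.0 - d_model / denominator


# Hypothetical 1-5 Likert responses: 6 caregivers x 4 items (not study data).
responses = np.array([
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
    [2, 1, 2, 2],
])
alpha = cronbach_alpha(responses)

# Hypothetical chi-square values, chosen only to exercise the formula.
cfi = comparative_fit_index(chisq_model=30.0, df_model=20,
                            chisq_baseline=520.0, df_baseline=28)
acceptable_fit = cfi > 0.90  # mirrors the cut-off described in the text
```

In this sketch a model would be flagged for modification whenever `acceptable_fit` is false, which corresponds to the CFIcML ⩽ 0.90 rule described above.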
After evaluating baseline model fit, we assessed for measurement invariance between MVID and Non-MVID Groups by testing a series of increasingly restrictive multigroup CFA models (Kline, 2015; van de Schoot et al., 2012) for each scale separately. Measurement invariance analyses evaluate whether a scale performs similarly between two groups of respondents, and in this study we used these analyses to understand whether the PROMIS parent-proxy scales function in a similar way for MVID and non-MVID autistic children. We first tested configural invariance, allowing all factor loadings and item intercepts to vary freely, which indicates whether the basic structure of the scale is the same across groups. Scales that met configural invariance were then tested for metric invariance by constraining factor loadings to be equal across groups but allowing item intercepts to vary. Metric invariance indicates that the relationships between individual items and the underlying construct are the same for both groups. Finally, for scales that attained metric invariance, we tested scalar invariance by constraining both factor loadings and item intercepts across groups. Scalar invariance indicates that individuals with the same level of the underlying trait will score the same, regardless of their group membership.
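The stepwise logic of this sequence can be sketched in a few lines of Python. The sketch assumes that each level is judged by a set of p-values for the change in model fit and that a level passes only when every p-value is at or above a threshold (0.01 is used here as an assumed default). The function name and the example p-values are hypothetical and are not taken from the study’s tables.

```python
from typing import Dict, Optional, Sequence

# Levels ordered from least to most restrictive.
LEVELS = ("configural", "metric", "scalar")


def highest_invariance_level(p_values: Dict[str, Sequence[float]],
                             alpha: float = 0.01) -> Optional[str]:
    """Return the most restrictive level reached before the first failure.

    p_values maps each level to the p-values of its fit-change tests.
    A level passes only if every one of its p-values is >= alpha; testing
    stops at the first level that fails or has no results.
    """
    reached = None
    for level in LEVELS:
        tests = p_values.get(level)
        if not tests or any(p < alpha for p in tests):
            break
        reached = level
    return reached


# Hypothetical p-values (two fit-change tests per level) for one scale.
example_scale = {
    "configural": (0.40, 0.35),
    "metric": (0.20, 0.25),
    "scalar": (0.003, 0.003),
}
level = highest_invariance_level(example_scale)
```

Because the loop stops at the first failure, a scale that fails at the configural step returns `None` and is never evaluated at the metric or scalar steps, matching the sequential design described above.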
Changes in model fit were evaluated using permutation testing with 1000 permuted data sets (Jorgensen et al., 2018), requiring both the Δχ2 and the ΔCFI to be nonsignificant (p ⩾ 0.01) to consider the model invariant. Permutation testing is preferred over benchmark values to evaluate changes in model fit because the method provides better Type-I error control and accounts for potentially meaningful features of the dataset such as sample size, difference in sample size between groups, and level of invariance (Jorgensen et al., 2018; Kite et al., 2018). If configural invariance was not established, we reviewed and implemented model modification indices to evaluate whether additional modifications were necessary to establish configural invariance (Table S1). For the Positive Affect and Sleep Impairment scales, adjacent response categories were collapsed (Table S2) due to low endorsement in the MVID sample (Colvin & Gorgun, 2020; Tsai et al., 2024).
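The general idea of a permutation test over group labels can be sketched as follows. This is a simplified illustration, not the procedure implemented in semTools: in the study, each permuted data set would be refit with the multigroup CFA models to obtain Δχ2 and ΔCFI, whereas here a simple stand-in statistic (the absolute difference in group means) takes their place. The function names, the example data, and the "+1" p-value convention are assumptions made for this sketch.

```python
import numpy as np


def permutation_p_value(scores, groups, statistic, n_permutations=1000, seed=0):
    """Share of label permutations whose statistic is at least the observed one.

    Uses the (count + 1) / (n_permutations + 1) convention so that the
    p-value is never exactly zero.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    observed = statistic(scores, groups)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(groups)  # reassign group labels at random
        if statistic(scores, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


def abs_mean_difference(scores, groups):
    """Stand-in statistic; the study instead used changes in model fit."""
    return abs(scores[groups == 1].mean() - scores[groups == 0].mean())


# Hypothetical data: two groups of 20 with clearly different scores.
groups = np.array([0] * 20 + [1] * 20)
separated_scores = np.concatenate([np.zeros(20), np.full(20, 10.0)])
p_separated = permutation_p_value(separated_scores, groups, abs_mean_difference)

# Hypothetical data with no group difference at all.
constant_scores = np.ones(40)
p_null = permutation_p_value(constant_scores, groups, abs_mean_difference)
```

Because the reference distribution is built from the data at hand, it reflects the actual sample sizes and group imbalance, which is the property the text cites as the reason for preferring permutation over fixed benchmark values.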
“N/A” analysis
We used descriptive statistics to summarize the proportion of N/A responses for each scale in the second wave of MVID participants (n = 76). Due to the lack of a specific prompt and the small number of optional written responses, we did not conduct formal coding or thematic analysis but reviewed the free-response text to identify preliminary common themes across caregiver responses. The first author and a trained research assistant first read all available free-response text and noted common ideas expressed by caregivers. The authorship team then reviewed these common ideas and provided feedback to consolidate them into three distinct preliminary themes.
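One way such a descriptive summary could be computed is sketched below in Python. It assumes the N/A checkbox data are stored as a respondents-by-items table of true/false flags and that percentages are pooled over all respondents and items within a scale; the text does not specify the exact aggregation, so this is only one plausible reading. The function name and the small example table are hypothetical.

```python
import numpy as np


def na_percentage_by_scale(na_flags, item_scales):
    """Percent of item responses flagged N/A, per scale and overall.

    na_flags:    respondents x items array of booleans (True = N/A checked)
    item_scales: scale name for each item column
    """
    na_flags = np.asarray(na_flags, dtype=bool)
    item_scales = np.asarray(item_scales)
    result = {}
    for scale in dict.fromkeys(item_scales.tolist()):  # keeps first-seen order
        columns = item_scales == scale
        result[scale] = 100.0 * na_flags[:, columns].mean()
    result["overall"] = 100.0 * na_flags.mean()
    return result


# Hypothetical flags: 4 caregivers x 4 items (not study data).
item_scales = ["Anger", "Anger", "Peer Relationships", "Peer Relationships"]
na_flags = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
summary = na_percentage_by_scale(na_flags, item_scales)
```

Pooling over respondents and items weights every item response equally; averaging per-respondent percentages instead would give the same result only when every caregiver answered the same number of items.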
Data availability
De-identified data from this study are available upon request from the corresponding author and subject to an institutional data use agreement. Analysis code for the analyses presented in this manuscript can be found on Open Science Framework at https://osf.io/yu4k6/?view_only=0c3c9fde47364c95aa91dd3e56d08c14.
Community involvement
Our team of autistic and non-autistic researchers collaborated on the research question, study design, methods, implementation, and interpretation of results, which is reflected in our authorship.
Results
Internal reliability
The PROMIS scales demonstrated good to excellent internal consistency (alpha range: 0.84–0.95) across both groups (Table 2).
Table 2. Domain summaries by group.
CFA
Nine of the 13 PROMIS scales required minor model modifications (correlating error terms) to meet acceptable fit cut-offs (Table S1). Following modifications, baseline models for the full combined sample demonstrated adequate model fit, except for the Life Satisfaction Scale (Table 3). Additional modifications did not improve model fit for this scale.
Table 3. Confirmatory factor analysis model results for the combined sample. Table notes: df = degrees of freedom; CFI = comparative fit index; cML = categorical maximum likelihood estimator.
Measurement invariance
Results of the measurement invariance analyses are presented in Table 4. A qualitative summary of the measurement invariance results is presented in Table 5.
Table 4. Model comparison statistics (with permutation). Table notes: ΔCFI = change in Comparative Fit Index; significant p-value < 0.01.
Table 5. Results summary. Table notes: CFI values are based on the combined sample of participants; percentage values are based on responses from the second wave of MVID participants (n = 76).
Emotional distress domain
The Anger and Psychological Stress scales demonstrated configural, metric, and scalar invariance (i.e. nonsignificant (ps > 0.01) permutation testing with increasing equality constraints). The Anxiety scale demonstrated metric but not scalar invariance. Specifically, the Δχ2 and ΔCFI values for the configural and metric models were nonsignificant, but the values were significant (Δχ2 p = 0.003; ΔCFI p = 0.003) when testing for scalar invariance. The Depression scale did not meet criteria for configural invariance, even after subsequent model modifications; thus, we returned to the model with the best fit (one modification) to report results. The Depression scale exhibited significant Δχ2 (p = 0.006) and ΔCFI (p = 0.006) when testing for configural invariance.
Subjective well-being domain
The Life Satisfaction and Positive Affect scales both demonstrated configural, metric, and scalar invariance (i.e. nonsignificant (ps > 0.01) permutation testing with increasing equality constraints). However, we note that the Life Satisfaction scale did not demonstrate adequate baseline fit (see Table 3) so measurement invariance results are interpreted with extreme caution. The Meaning & Purpose scale demonstrated metric but not scalar invariance. Specifically, the Δχ2 and ΔCFI values for the configural and metric models for the Meaning & Purpose scale were nonsignificant, but the values were significant (Δχ2 p < 0.001; ΔCFI p < 0.001) for scalar invariance.
Health domain
The Sleep Disturbance, Physical Activity, and Cognitive Function scales demonstrated metric invariance but not scalar invariance. Specifically, the Δχ2 and ΔCFI values for the configural and metric models for the Sleep Disturbance, Physical Activity, and Cognitive Function scales were nonsignificant; however, the scales exhibited significant Δχ2 (Sleep Disturbance p < 0.001; Physical Activity p = 0.009; Cognitive Function p < 0.001) and ΔCFI values (Sleep Disturbance p < 0.001; Physical Activity p = 0.009; Cognitive Function p < 0.001) when testing for scalar invariance. The Sleep Impairment scale did not meet criteria for configural invariance (ΔCFI p = 0.008), even after subsequent model modifications; thus, we returned to the original baseline model (with one modification) to report results.
Relationship domain
The Peer Relationships scale demonstrated configural but not metric invariance. Specifically, the Δχ2 and ΔCFI values for the configural model for the scale were nonsignificant, but the values were significant (Δχ2 p = 0.006; ΔCFI p = 0.002) for metric invariance. The Family Relationships scale demonstrated metric but not scalar invariance. Specifically, the Δχ2 and ΔCFI values for the configural and metric models for the Family Relationships scale were nonsignificant, but the values were significant (Δχ2 p < 0.001; ΔCFI p < 0.001) for scalar invariance.
Descriptive and qualitative feedback from MVID caregivers
Across all PAB-L scale items, caregivers in the second wave of MVID participants (n = 76) endorsed 16.25% of items as Not Applicable for their child. The percentages of items on each scale caregivers endorsed as Not Applicable for their child are presented in Table 5. Scales that included more behaviorally-based items demonstrated a lower proportion of items endorsed as N/A, including Sleep Impairment (4.79%; for example, “My child had trouble staying awake during the day”), Sleep Disturbance (5.07%; for example, “It took my child a long time to fall asleep”), and Anger (8.11%; for example, “My child was so angry he or she felt like yelling at somebody”). In general, scales that probed for internal experiences demonstrated higher proportions of items endorsed as N/A, including Cognitive Function (35.80%; for example, “My child has to work really hard to pay attention or he or she makes mistakes”), Meaning and Purpose (34.80%; for example, “My child thinks his or her life is filled with meaning”), Peer Relationships (24.13%; for example, “My child felt accepted by other kids his or her age”), and Psychological Stress (23.65%; for example, “My child felt he or she had too much going on”). Although the Positive Affect scale included more subjective emotional language (e.g. “My child felt joyful”), it demonstrated a low proportion (7.09%) of items endorsed as N/A by caregivers.
In the free-response question at the end of the survey, caregivers gave optional feedback about their experience completing the questionnaires. Fifty-two percent of caregivers of MVID children (across both data collection waves, n = 121) provided a response to the open-ended question for feedback on their study experience. Across caregiver responses, we noted several common themes. First, caregivers noted frustration with the design and wording of the survey items. Many parents reported difficulties answering the items in their current format:
“I feel the survey had some unfair questions, especially for nonverbal kiddos, and I had to select ‘never’ and also checked ‘not applicable’ for a lot of the questions.”
“It was hard to answer a lot of these as my child has very limited communication and I don’t even know that she’s capable of understanding a lot of these feelings and emotions. It didn’t feel like an appropriate questionnaire for children with limited communication skills.”
Second, caregivers expressed uncertainties and guesswork when answering the questions for their child due to their child not being able to verbally express many of the concepts that were probed for in the items:
“My child has practically no functional language so a lot of the questions are hard to answer accurately from his point of view. When you have a non verbal, severely autistic kid, there is a lot of guessing and trial & error involved.”
“For a child with almost no ability to communicate, the questionnaires were difficult. I don’t know how my son feels about most things, so many of my answers were absolute guesses.”
Relatedly, caregivers expressed the need for items to include concrete, behaviorally-based wording to evaluate QoL for their children:
“Much of the survey is difficult to answer for a child that is both autistic and intellectually disabled. It is one thing to ask a parents opinion about a child’s behavior, habits etc. But the survey also asks the parent to speculate on what their child is thinking and characterize their child’s feelings about many issues that the child often cannot themselves articulate due to limited verbal abilities and cognitive limitations that make it difficult for a child to conceptualize their future, their goals or even their feelings etc.”
“It’s extremely difficult for me to answer questions about my son’s state of mind. He is nonverbal and his communication on his AAC is very limited. I can only read his emotions through nonverbal cues. Most of these questions require a level of communication he just doesn’t have. It’s easier for me to answers questions about what I think about his emotional state, then questions like “my child thinks.” I don’t know what he thinks.”
Discussion
This study provides a critical examination of how well PAB-L scales function for caregivers who are providing a proxy report for an MVID autistic child. Although there is a substantial percentage of autistic people who are minimally verbal with ID (Clarke et al., 2024; Hughes et al., 2023), this population has been largely underrepresented in autism research (Koegel et al., 2020). The limited number of validated measures capturing meaningful outcomes for this population contributes to a lack of inclusion of MVID autistic people in research. As valid measurement is a necessary foundation for psychological, or indeed any, science (Flake & Fried, 2020), it is crucial to determine whether existing QoL measures are appropriate and develop new ones if needed. Toward this goal, we tested the degree to which 13 PROMIS parent-proxy scales demonstrated measurement invariance between MVID and non-MVID autistic children. A qualitative summary of results is presented in Table 5.
Regarding the scales’ psychometric properties, our results highlight varying degrees of measurement invariance across the PAB-L scales. The Life Satisfaction scale did not demonstrate adequate baseline model fit in the combined sample or in either the MVID or Non-MVID Groups separately. Although the scale demonstrated scalar invariance in subsequent analyses, a poor baseline model fit indicates that researchers and clinicians should avoid using the scale to measure or interpret parent-proxy reported life satisfaction among autistic children. We could not establish configural invariance for two of the 13 scales (Depression and Sleep Impairment) despite multiple iterations of model modifications. These results indicate that the underlying measurement structure of the scales differs between autistic children with and without significant language and cognitive impairments. Although it was beyond the scope of the current paper to explore whether different measurement structures (e.g. unidimensional versus two-factor models) would provide more appropriate fits for these scales, the data clearly suggest that researchers and clinicians should strongly consider not interpreting scores on the Depression and Sleep Impairment Parent Proxy scales for minimally verbal autistic children. The Peer Relationships scale demonstrated configural, but not metric or scalar, invariance between the groups, indicating that this scale also functions differently between autistic children with and without significant language and cognitive impairments and generally should be avoided for research and clinical purposes.
Six scales—Anxiety, Meaning & Purpose, Sleep Disturbance, Physical Activity, Cognitive Function, and Family Relationships—demonstrated configural and metric, but not scalar, invariance between the parent-proxy report for our two groups. These results suggest that the scales’ items capture equivalent constructs across autistic children with and without significant language and cognitive impairments, but that direct score comparisons between the two groups would not be advised because equivalent scores on the scales cannot be interpreted as equivalent levels of the trait between the groups. Finally, three scales—Anger, Psychological Stress, and Positive Affect—demonstrated configural, metric, and scalar invariance. These results indicate that the constructs are measured in a similar way across autistic children with and without significant language and cognitive impairments and the scores from these scales can be compared across groups.
We additionally collected qualitative responses from caregivers of MVID autistic children to evaluate the content validity of the items on the PROMIS scales. Descriptively, caregivers indicated varying degrees of item acceptability across the 13 scales (see Table 5). We found a wide range (4.79%–35.62%) in the proportion of items on each scale that were endorsed as not applicable for MVID autistic children. Across all scales, 16.25% of items were endorsed by caregivers as not applicable for their child. In free-response text, caregivers expressed frustration about the wording of items and uncertainty about how to answer items that probed for concepts such as what their child thinks or feels. Participants reported preferring to answer questions about their child’s observable behaviors. This feedback generally aligned with the proportion of items endorsed as not applicable, such that scales with more behaviorally-based items had lower proportions of items endorsed as not applicable, and scales that probed for children’s thoughts and internal experiences had higher proportions of items endorsed as not applicable.
We summarize the strength of the evidence for using each of these scales by integrating our measurement invariance analysis results with the qualitative caregiver feedback in Table 5. Only two out of 13 scales (Anger and Positive Affect) demonstrated strong evidence for their use in clinical and research capacities to capture relevant QoL domains among MVID autistic children. The Anger and Positive Affect scales both demonstrated acceptable model fit, scalar invariance between our two groups, and a low proportion of items reported by caregivers as “not applicable” for their child, which was associated with more observable, behaviorally-based items. Three scales (Depressive Symptoms, Life Satisfaction, and Peer Relationships) demonstrated weak evidence across both psychometric analyses and caregiver feedback. Given the poor evidence for these measures, we do not recommend using the PROMIS Depressive Symptoms, Life Satisfaction, or Peer Relationships scales for MVID autistic children in either research or clinical settings. Three scales (Sleep Disturbance, Physical Activity, and Family Relationships) demonstrated fair evidence across the measurement invariance analyses and the feedback on item content provided by caregivers (see Table 5). We defined fair evidence as scales that exhibit both metric invariance and a low percentage of items endorsed as not applicable by caregivers of MVID autistic children. These data support the use of these scales for capturing the respective constructs among MVID autistic children, though not for making between-group comparisons with other autistic children without significant ID.
Notably, there were numerous times when the conclusions drawn from the measurement invariance analyses and the feedback on item content provided by caregivers differed. For example, although the Psychological Stress scale demonstrated scalar invariance between groups of autistic children, almost one quarter of the items were endorsed as not applicable by caregivers of MVID children. Demonstrating the opposite pattern of mismatch, although we could not establish configural invariance between groups of autistic children on the Sleep Impairment scale, caregivers of MVID autistic children rarely endorsed its items as not applicable for their child (4.79%).
These examples highlight the numerous dimensions of measurement that scale users must balance when attempting to capture a meaningful outcome in any population. In general, we caution researchers and clinicians against interpreting scores from scales that caregivers indicated had a large proportion of items that were not applicable for their MVID autistic child. These include both scales with poor psychometric properties (i.e. Depressive Symptoms and Peer Relationships) and those that demonstrated some degree of measurement invariance between groups of autistic children (i.e. Psychological Stress, Meaning & Purpose, and Cognitive Function). When caregivers of MVID autistic children provide feedback that items are not applicable for understanding their child’s experiences, it is a strong indication that a parent-proxy scale of a child’s QoL should be revised. Although we could not conduct a formal qualitative analysis, multiple caregivers in our study suggested that more behaviorally-based language would make it easier for them to answer items confidently for their child. There is a strong need for additional measurement research to capture the full range of important QoL-related lived experiences of MVID autistic children.
Measurement considerations for minimally verbal autistic people
To heed the important calls for research to be more inclusive of autistic people with the highest support needs, researchers must be thoughtful, creative, and rigorous in their measurement of whatever constructs they seek to study. Participants are commonly excluded from research based on language level or IQ (Russell et al., 2019). Although there are undoubtedly many reasons why a researcher may choose to focus on a subset of the autistic population, it is undeniable that research is easier to conduct when validated measures, tasks, and experimental methodology already exist. Using a proxy report to capture outcomes is one way that researchers can begin to be more inclusive of MVID autistic people, provided those measures have been validated for this population. Where validated measures do not exist, adjustments to existing patient-reported outcome measures can be made and tested. As suggested by parents in this study and other experts (Nicolaidis et al., 2020), behaviorally-based items that offer concrete language are preferred.
The primary question of measurement invariance analyses is whether a particular scale functions in the same way across two groups. A meaningful question that such analyses cannot answer is whether the underlying construct fundamentally differs between groups of people. For example, what considerations matter for evaluating peer relationships for a 17-year-old without autism and ID versus an autistic peer with ID who uses an augmentative/alternative communication (AAC) device? When the relevant factors of the underlying constructs differ between groups, different scales are likely needed. Importantly, the results of our study do not indicate that the PROMIS domains themselves are not important for understanding QoL among autistic children. Rather, caregivers in the MVID Group indicated that the items used to capture those domains are not relevant for their child with high support needs.
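For readers less familiar with the invariance levels referred to in this discussion, the nested constraints can be written out in a generic multi-group factor model. This is a textbook sketch rather than the exact specification fitted in this study: it assumes a single latent factor per scale and continuous indicators, and the symbols are introduced here for illustration. With ordinal items, item thresholds take the role of the intercepts.

```latex
% Item responses x of child i in group g (g = 1: MVID, g = 2: Non-MVID).
% tau: item intercepts, Lambda: factor loadings, eta: latent score,
% kappa: latent mean, epsilon: residuals.
\begin{align*}
\mathbf{x}_{ig} &= \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \,\eta_{ig} + \boldsymbol{\varepsilon}_{ig} \\
\text{Configural:}\quad & \text{same pattern of free and fixed loadings in both groups} \\
\text{Metric:}\quad & \boldsymbol{\Lambda}_1 = \boldsymbol{\Lambda}_2 \\
\text{Scalar:}\quad & \boldsymbol{\Lambda}_1 = \boldsymbol{\Lambda}_2, \quad \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 \\
\operatorname{E}[\mathbf{x}_{ig}] &= \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \,\kappa_g
\end{align*}
```

The last line shows why metric invariance alone does not license between-group comparisons of scores: if the intercepts differ across groups, a difference in observed item means cannot be separated from a difference in the latent means. None of these constraints tests whether the construct itself is the same in both groups.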
Limitations
We interpret our data in the context of several important limitations. First, our study included a wide age range of autistic children (5–17 years), and it is possible that results may differ based on developmental stage. Although there was not a significant difference between the ages of our MVID and Non-MVID Groups, future research might explore how parents of younger versus older minimally verbal autistic children perceive proxy-report measures for prioritized outcomes such as QoL. Second, our sample of caregiver reporters was predominantly White, non-Hispanic, and female. We recognize the need for continued representation of non-majority experiences in autism research to more fully support all autistic people and their families. Third, quantitative data were not available regarding participants’ intellectual ability, verbal ability, or autistic traits. Although this information (e.g. IQ scores, a measure of autistic traits) may have been helpful for more nuanced analyses, we were limited by the caregiver-report nature of the study and only used categorical reports of these constructs (see Table 1). Finally, we only collected data regarding caregivers’ perceptions of the applicability of the PROMIS items for the second wave of participants. While the first wave of participants was a mix of caregivers in both the MVID and Non-MVID Groups, the second wave was exclusively caregivers in the MVID Group. Thus, we are not able to draw group comparisons for this metric.
Conclusion
The inclusion of autistic people with profound ID is an imperative for inclusive, comprehensive, and clinically meaningful autism research; yet this imperative presents methodological and measurement challenges for researchers to address. Our results indicate that researchers and clinicians can confidently use some PROMIS caregiver-proxy reports of child outcomes, including the Anger, Positive Affect, and Life Satisfaction scales, for minimally verbal children with ID. In contrast, many other PROMIS caregiver-proxy scales either did not demonstrate measurement invariance between groups or included a high proportion of items that were endorsed by caregivers as not applicable to understanding their child. While our two types of analyses (measurement invariance and applicability) address different types of research questions, they each point to a need for additional work to capture these meaningful outcomes among autistic people with significant cognitive and language impairment. Future measures would benefit from close collaboration with caregivers and minimally verbal autistic people to ensure that the topics and wording are consistent with their priorities and lived experiences of QoL.
Supplemental Material
Supplemental material, sj-docx-1-aut-10.1177_13623613251394995 for Understanding and measuring caregiver-reported quality of life among minimally verbal autistic children with intellectual disability by Elizabeth A. Kaplan-Kahn, Rachel M. Benecke, Laura Graham Holmes and Judith S. Miller in Autism
Footnotes
Acknowledgements
We would like to thank Alyssa Clayton for her contribution to the descriptive and qualitative analyses for this project. We would also like to thank the Phelan-McDermid Syndrome Foundation for their partnership in recruitment, as well as all the caregivers who participated in this research.
Ethical considerations
All recruitment and study procedures were approved by the Institutional Review Board at the Children’s Hospital of Philadelphia.
Consent to participate
All participants provided written informed consent prior to their participation in the study.
Consent for publication
Not applicable.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Autism Science Foundation and the McMorris Family Foundation.
Data availability statement
The de-identified data sets generated and analyzed for this study are available from the corresponding author on reasonable request. The data supporting the findings of this study are stored in a secure repository, and access can be granted upon valid request for further research purposes.
Supplemental material
Supplemental material for this article is available online.
References
