Abstract
Purpose:
We aimed to establish content validity and assess the psychometric properties of the FACE-Q Craniofacial Module, a patient-reported outcome measure, for use in pediatric and adolescent patients with head and neck cancer (HNC).
Methods:
To establish content validity (Part 1), between June 2017 and August 2019, cognitive interviews were conducted with survivors of pediatric HNC (n = 15), and input was obtained from clinical experts (n = 21). To examine item and scale performance (Part 2), Rasch Measurement Theory (RMT) analysis was performed using data from two international studies (n = 121).
Results:
Part 1: Qualitative data from 15 survivors and input from 21 experts provided evidence to support the use of the FACE-Q Craniofacial Module in pediatric HNC. Part 2: The field-test study sample included 121 survivors of pediatric HNC. RMT analysis provided evidence of reliability and validity for 10 FACE-Q scales. Data for each scale fit the RMT model. Scale reliability was high, with Person Separation Index and Cronbach's alpha values ≥0.82 for 9 scales. Mean scores on the Appearance, Psychological, and Social scales were higher for those who liked aspects of their face more. For participants with (vs. without) a facial difference, mean scores were lower for the Face, Jaws, Psychological, and Social scales.
Conclusion:
The FACE-Q Craniofacial Module evidenced reliability and validity for HNC survivors aged 8–29 years and can be used in research and clinical care to measure the quality of life of survivors of pediatric HNC.
Introduction
Head and neck cancers (HNC), including Hodgkin lymphoma, rhabdomyosarcoma, thyroid carcinoma, and nasopharyngeal carcinoma, are estimated to account for between 0.25% and 15% of cancers diagnosed within the pediatric and adolescent population.1 HNC and its treatment may significantly impair patients' health-related quality of life (HRQL) and leave patients with a visible facial difference and problems with facial function (e.g., the ability to show facial expressions or to eat and drink).2,3 Survivors of HNC often require rehabilitative treatment and reconstructive surgery to recover speech, swallowing, and maxillofacial function and to restore facial appearance.4 Tools are needed to measure outcomes from the patient perspective, capturing patients' experiences and how they feel the condition affects their HRQL and function.5 For a patient-reported outcome measure (PROM) to be useful, its validity and reliability must be established within the target population.6,7
The FACE-Q Craniofacial Module is a PROM developed for children and young adults aged 8–29 years with facial differences.8 This module was developed because existing PROMs used in this population were reported to lack content validity, missing content related to appearance and facial function.9,10 The FACE-Q Craniofacial Module was field-tested internationally in a sample of 2233 children and young adults, including patients with HNC, from 12 countries.11–13 Although the FACE-Q Craniofacial Module may be valuable for assessing outcomes after diagnosis and treatment of HNC, the qualitative phase of its development included only a few patients with HNC. The COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) criteria, which assess the quality of PROM development, require that at least seven individuals from the target population review the PROM to achieve the highest quality rating for a content validation study.14 The development study did not meet this threshold for HNC participants.12,13
Furthermore, the analysis did not report the validity and reliability of the FACE-Q Craniofacial Module separately for the HNC subgroup. Reporting psychometric properties for this subgroup will provide more direct evidence for the use of the FACE-Q in this target population.15 Therefore, the aim of this article was to establish content validity and assess the psychometric properties of the FACE-Q Craniofacial Module, a PROM measuring outcomes in conditions associated with facial differences, for use in pediatric and adolescent patients with HNC. We hypothesized that scales from the FACE-Q Craniofacial Module would evidence sufficient content validity in pediatric and adolescent survivors of HNC.
Materials and Methods
This study had two parts. In Part 1, we examined the content validity of the FACE-Q Craniofacial Module scales with survivors of pediatric HNC and clinical experts. In Part 2, we performed an exploratory psychometric analysis using data from an international field-test sample. All aspects of the study were conducted according to the COSMIN user's manual criteria.14,15
This study was approved by the Hamilton Integrated Research Ethics Board (No. 14-763) and by the ethics board at each participating site.
Part 1: Establishing content validity
Participants
Inclusion criteria
English-speaking, diagnosed with HNC before 18 years of age, a visible and/or functional facial difference caused by HNC, and at least 8 years of age at the time of the study.
Exclusion criteria
Cognitive impairment limiting independent participation in an interview.
Recruitment
Recruitment took place between June 2017 and August 2019 at the following sites: McMaster Children's Hospital (Ontario), The Hospital for Sick Children (Ontario), and British Columbia Children's Hospital (British Columbia). A research team member approached potential participants to explain the study and obtained informed consent from those interested. Interviews were conducted either in person or by telephone (an option available only to those at least 12 years of age), depending on participant preference. A $50 gift card was provided to participants after the interview to thank them for their time.
Interviews
Semistructured interviews were conducted by trained interviewers using a cognitive interview guide. The cognitive interview approach used was adapted from Willis.16 Participants were asked about the relevance, comprehensiveness, and comprehensibility of the instructions, response options, and items. The “think aloud” approach was used whereby participants verbalized their thoughts as they worked through each scale.17–19 Probing questions were used to better understand problems with item interpretation.19–23
Twenty-six FACE-Q Craniofacial Module scales/checklists were included for review, with only the Birthmark scale excluded from the original module (see Table 1). All participants (n = 15) were asked to review six broadly applicable scales measuring appearance (Face scale), adverse effects (Face Adverse Effects scale), and HRQL (Appearance Distress, Psychological, School, and Social scales). Additional scales (Table 1) were reviewed if deemed relevant to participants based on their facial difference(s). Interviews were audio recorded, transcribed verbatim, and coded line-by-line by two researchers (Y.W. and E.T.). The first six interviews were double coded. Codes were transferred into a Microsoft Excel (2016) spreadsheet for analysis and reviewed by the research team to determine content validity.
Table 1. FACE-Q Craniofacial Module Scales
Core scales.
HRQL, health-related quality of life.
Expert input
In September 2017, experts provided feedback on the FACE-Q Craniofacial Module via a secure web-based Research Electronic Data Capture (REDCap) survey.24,25 Invitations were sent by email, with one reminder sent after 7 days. Experts (n = 21) reviewed each scale, provided feedback on all aspects of each scale (instructions, response options, items), and indicated items that should be added or removed. A comment section was provided after each scale and at the end of the survey for additional feedback. Data were exported from REDCap into Excel and analyzed to identify suggestions for scale improvements.
Part 2: Psychometric properties
Participants
Participant data (n = 121) came from the following two sources:
The FACE-Q Craniofacial Module field-test study12,13: Participants were aged 8–29 years with a visible and/or functional facial difference and were recruited in plastic surgery clinics in 12 countries between 2016 and 2019. Participants completed the relevant FACE-Q scales and questions. Recruiters used a clinical form to collect information about the type and severity of each participant's facial condition. The form included a matrix that asked about the severity (i.e., no, yes-minor, yes-major) of the condition's impact on facial appearance (e.g., eyes, forehead, lips) and facial functions (e.g., eating, drinking, speech). Data were entered into a secure REDCap database hosted at McMaster University (Canada).24,25
A study of HNC patients from France, the Netherlands, the United Kingdom, and the United States who were diagnosed before 18 years of age and were 8–29 years of age at the time of the study. Recruitment took place in oncology outpatient clinics.26 Participants completed the following 10 FACE-Q Craniofacial scales: Face, Nose, Lips, Teeth, Jaws, Speech, Speech Distress, Psychological, Social, and School. Clinical data were obtained from hospital records.
Statistical analysis
An exploratory Rasch Measurement Theory (RMT) analysis was performed in RUMM2030 software (RUMM Laboratory Pty Ltd., Duncraig, Western Australia, 1998–2020) using the unrestricted Rasch model for polytomous data.27 Fit of the data to the Rasch model was examined statistically and graphically. The following tests were performed to assess how well the scales and items performed psychometrically:
Threshold maps were examined to determine if the response categories (e.g., not at all, a little, quite a bit, very much) for each scale worked as intended.
Item fit was examined graphically (item characteristic curves) and statistically (log residuals [item–person interaction] and chi-square values [item–trait interaction]). Ideally, fit residuals fall within ±2.5 and chi-square values are nonsignificant after Bonferroni adjustment.28 We examined both individual item fit and overall fit of the data to the Rasch model.
We assessed local independence by examining the residual correlation matrix for pairs of items whose residuals correlated ≥0.30. Locally dependent pairs of items were combined into a subtest to determine their impact on Person Separation Index (PSI) values.
Finally, we examined scale reliability with PSI and Cronbach's alpha values. Reliability values ≥0.70 were considered adequate.
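To make the threshold-map and item-characteristic-curve checks above more concrete, the following minimal Python sketch assumes the partial credit parameterization of the unrestricted polytomous Rasch model: the probability of responding in category x of an item is proportional to exp(Σ_{k≤x}(θ − τ_k)), where θ is the person location and τ_1…τ_m are the item's thresholds in logits. The function names and threshold values are hypothetical illustrations, not estimates from this study, and RUMM2030's own estimation procedure is not reproduced here.

```python
import math
from typing import List, Sequence


def category_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item (partial credit parameterization).

    theta: person location in logits.
    thresholds: absolute item thresholds tau_1..tau_m in logits (m + 1 categories).
    """
    # Log-numerator for category x is the cumulative sum of (theta - tau_k) for k <= x;
    # category 0 has an empty sum, i.e. a log-numerator of 0.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (theta - tau))
    # Subtract the largest value before exponentiating to avoid overflow.
    peak = max(log_num)
    num = [math.exp(v - peak) for v in log_num]
    total = sum(num)
    return [n / total for n in num]


def expected_score(theta: float, thresholds: Sequence[float]) -> float:
    """Model-expected item score at theta, i.e. a point on the item characteristic curve."""
    probs = category_probabilities(theta, thresholds)
    return sum(x * p for x, p in enumerate(probs))


def thresholds_ordered(thresholds: Sequence[float]) -> bool:
    """True when each threshold lies strictly above the previous one."""
    return all(lo < hi for lo, hi in zip(thresholds, thresholds[1:]))
```

An item with four response categories (e.g., not at all, a little, quite a bit, very much) has three thresholds. At θ equal to a threshold, the two adjacent categories are equally probable, which is where their curves cross on a category probability plot; a hypothetical set such as `[-0.5, 0.3, -0.2]` makes `thresholds_ordered` return False, the kind of disordering a threshold map would reveal.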
SPSS (IBM SPSS Statistics, Version 26; IBM Corp.) was used for further analysis. Each scale was scored from 0 to 100 using the FACE-Q Craniofacial Module transformation tables.11 Construct validity was assessed using predefined hypotheses about expected differences. First, we hypothesized that appearance scale scores would be incrementally higher for those who reported liking the appearance of their face, and of specific facial areas (i.e., jaws, lips, nose, teeth), more (not at all/a little, quite a bit, very much). Second, we predicted that appearance scale scores would be lower for those with a “major” or “minor” difference than for those with “no” difference. Differences were tested using independent t-tests or analysis of variance.
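The independent t-test comparisons described above can be approximated from group summary statistics. The Python sketch below implements Student's pooled-variance independent-samples t-test and obtains the two-sided p-value by numerically integrating the t density with Simpson's rule, so no statistics library is needed. It assumes equal variances; the text does not state whether the pooled or the Welch (unequal-variance) result was reported from SPSS, and the Appendix values are rounded to whole numbers, so the output should not be expected to match the published p-values exactly. The function names are hypothetical.

```python
import math
from typing import Tuple


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * (area under the t density from 0 to |t|).

    The area is approximated with composite Simpson's rule (steps must be even).
    """
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(m1: float, sd1: float, n1: int,
                  m2: float, sd2: float, n2: int) -> Tuple[float, int, float]:
    """Student's independent-samples t-test (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df, two_sided_p(t, df)


# Face scale, no vs. yes facial difference (rounded values from the Appendix table)
t, df, p = pooled_t_test(61, 19, 39, 51, 15, 51)
print(f"t({df}) = {t:.2f}, two-sided p = {p:.4f}")
```

Integrating the density directly keeps the sketch dependency-free; in practice a statistics package's t-test routine would be the more robust choice.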
Results
Part 1: Establishing content validity
Sample characteristics
Fifteen survivors of HNC aged 8–30 years participated in a cognitive interview (Table 2). Ten participants had completed treatment in the past 5 years. The sample included nine females and six males. Most participants had a history of rhabdomyosarcoma (n = 8). Treatments included chemotherapy (n = 11), surgery (n = 8), and radiation (n = 8). Participants provided feedback on 23 of the 26 FACE-Q Craniofacial Module scales. The number of survivors who provided feedback on the core scales was as follows: Face (n = 15), Appearance Distress (n = 13), Psychological (n = 13), School (n = 10), Social (n = 12), and Face Adverse Treatment Effects (n = 13). Table 2 shows additional information on each scale.
Table 2. Participant Demographics and Clinical Characteristics of the Content Validation Sample
A single participant may have had one or more forms of treatment.
Fifty experts in the field of oncology were invited to provide feedback, and 21 (42%) responded. Experts were from Canada (n = 17), the United States (n = 3), and the Netherlands (n = 2). Most had a clinical focus in pediatric oncology (n = 17). Respondents included oncologists (n = 9), nurse practitioners (n = 3), psychologists (n = 3), otolaryngologists (n = 2), an HNC surgical trainee (n = 1), and researchers (n = 2); one respondent did not specify a profession. The 21 experts provided feedback on all 26 scales.
Comprehensibility
Most survivors and experts thought the instructions and response options for the core scales were clear and “self-explanatory” (Age 30, Male). All instructions were interpreted as intended. Most participants thought the response options were good, with “clear distinct categories” (Age 22, Female). Most items in the core scales were interpreted as intended. In total, 14 items were identified as unclear or difficult to understand: six items in the core scales, six items from three appearance scales (Chin, Lips, Smile), and two function items from the Facial and Speech scales (see Table 3).
Table 3. Items in Core Scales and Example Quotes Showing Comprehensibility
Comprehensiveness
Eight concepts were suggested for new items, seven by patient participants and one by an expert. Patient participants suggested adding tongue movement and taste items to the Eating and Drinking scale, ear wax to the Ears Adverse Effects scale, and lip color to the Lips scale. Four experts identified swallowing/dysphagia as an important concept that was missing from the Eating and Drinking scale.
Relevance
The recall periods (now, past week) and response options were deemed appropriate by all patients and experts. Most participants found that the Face (11 of 15 participants), Appearance Distress (10 of 13), Psychological (10 of 13), School (8 of 10), and Social (9 of 12) scales measured concepts relevant to HNC. Participant and expert impressions of the FACE-Q Craniofacial Module were positive. In their general comments, one participant recounted that the questionnaire “made [them] think a lot more about all the different struggles that people experience from the same thing” (Age 22, Female), and another reported “[liking] the questionnaire a lot because [she] got to express… how [she feels] about [herself] and someone was actually listening” (Age 14, Female).
Experts noted that the “drawings/illustrations are very helpful” (United States, Social worker) and that the scales are “…detailed enough to really assess for fine defects in eyelid, lacrimal, and visual function” (United States, Pediatric otolaryngologist) (see Table 4).
Table 4. Items in Core Scales and Example Quotes Showing Relevance
Part 2: Quantitative
Psychometric analyses included 121 survivors of pediatric HNC (Table 5). Most participants (66.9%) were older than 14 years at the time of recruitment, and the sample included 63 male and 58 female participants. More than half of the sample came from the Netherlands (33.1%) or the United Kingdom (24.8%).
Table 5. Participant Demographics and Clinical Characteristics of the 121 Survivors of Head and Neck Cancer from the Field-Test Sample
Table 6 shows the scale-level RMT results, including the sample size for each scale. Of the 105 items tested, 102 had ordered thresholds, 104 had fit residuals within the ±2.5 criterion, and all had nonsignificant chi-square p-values after Bonferroni adjustment. Pairs of items in five scales had residual correlations >0.30; when these pairs were combined into subtests, PSI values dropped by at most 0.03 (Lips scale). The proportion of the sample whose scores fell within the range of each scale ranged from 73.1% for the Lips scale to 96.7% for the Face scale. Data from the sample fit the Rasch model for seven of the scales tested, with marginal misfit for the remaining three. Reliability was high, with PSI and Cronbach's alpha values ≥0.82, with and without extremes, for nine scales (Table 6). The remaining scale, Speech Distress, had PSI and Cronbach's alpha values of >0.71 and >0.79, respectively.
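The reliability and local-dependence criteria summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: PSI is computed with the commonly described formula (observed variance of person locations minus the mean squared standard error, divided by the observed variance), the Bonferroni level assumes one chi-square fit test per item in a scale, and the residuals are assumed to be standardized person-item residuals exported from Rasch software. All function names and the toy data are hypothetical and are not taken from this study.

```python
import math
from itertools import combinations
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items holds one score list per item, persons in the same order."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # raw total score per person
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def person_separation_index(locations: Sequence[float], std_errors: Sequence[float]) -> float:
    """PSI: share of observed person-location variance not attributable to measurement error."""
    observed = variance(locations)
    error = mean(se ** 2 for se in std_errors)
    return (observed - error) / observed


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def locally_dependent_pairs(
    residuals: Dict[str, Sequence[float]], cutoff: float = 0.30
) -> List[Tuple[str, str, float]]:
    """Item pairs whose residual correlation is at or above the cutoff (subtest candidates)."""
    flagged = []
    for a, b in combinations(residuals, 2):
        r = pearson(residuals[a], residuals[b])
        if r >= cutoff:
            flagged.append((a, b, r))
    return flagged


def bonferroni_level(n_items: int, alpha: float = 0.05) -> float:
    """Per-item significance level, assuming one chi-square fit test per item."""
    return alpha / n_items
```

For example, with toy residuals `{"item1": [1, 2, 3, 4], "item2": [2, 4, 6, 8], "item3": [4, 1, 3, 2]}`, `locally_dependent_pairs` flags only the pair `("item1", "item2")`, because item2's residuals are an exact multiple of item1's, while item3 correlates negatively with both.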
Table 6. Rasch Measurement Theory Scale Level Statistics
DF, degrees of freedom; PSI, Person Separation Index.
Descriptive statistics for scale scores, compared with those of a cleft lip and palate population,29 are provided in Table 7. Most of the construct validation hypotheses were met. Appearance scale scores were incrementally higher for those who reported liking their face overall, and specific parts of their face, more (p ≤ 0.001; Appendix Table A1). Psychological and Social scale scores were also higher for those who reported liking their face overall and facial areas more (p ≤ 0.003). The exception to these findings was the Teeth scale (p ≥ 0.07; Appendix Table A1). As predicted, mean scores for participants with a facial difference were lower on the Face (p = 0.009), Jaws (p = 0.025), Psychological (p = 0.022), and Social (p = 0.023) scales than for participants without an observable facial difference. No differences were observed between groups for the remaining scales.
Table 7. Comparison of Descriptive Statistics for FACE-Q Craniofacial Module Scale Scores
Discussion
PROM validation is an ongoing process, and it is important to ensure that a PROM is both valid and reliable in each target population, especially when that population was not the primary focus of the PROM development process. Because the qualitative phase of the FACE-Q Craniofacial Module development was not focused on survivors of HNC, our team conducted a supplementary qualitative study to determine whether the scales had content validity for this patient population. Our study provides evidence that the FACE-Q Craniofacial Module measures concepts relevant to survivors of pediatric HNC. Furthermore, analysis of data from 121 participants supported the validity and reliability of the FACE-Q Craniofacial Module for use in survivors of pediatric HNC.
Content validity is important from a clinical perspective because it ensures that the measure captures both the intended construct and outcomes important to the patient. Overall, participants found the scales understandable and relevant, and considered them to cover all aspects they deemed important to the construct. Experts in HNC also found the scales relevant and comprehensive. Comprehensibility, however, is not a property that experts are expected to assess as part of content validation. Our study exceeded COSMIN guidelines for sample sizes in qualitative content validation studies, with seven or more responses per item for most of the scales.14
From a psychometric perspective, this study has shown that the 10 FACE-Q Craniofacial Module scales evidenced reliability and validity for children and young adult survivors of HNC. The response categories for the 10 scales worked as intended, and item fit statistics indicated a good fit of the data to the Rasch model. The scales had high reliability, as indicated by PSI values, and Cronbach's alpha values provided evidence of internal consistency that exceeded COSMIN requirements.15 These findings are important because they show that the items in each scale work together to measure the intended constructs. Confirmation of most of the predefined hypotheses showed that the scales were able to detect differences between groups, providing further support that the scales measure what was intended in this population.
The measurement of patient perception of HRQL is crucial to assessing clinical outcomes of rehabilitative treatments and reconstructive surgery, as the primary goal of these treatments is to restore appearance and facial function. A recent study found that adverse effects graded by physicians were weakly correlated with many patient-reported outcomes in survivors of HNC and advised the use of PROMs to help incorporate patient concerns into care plans.26 At least 40 different instruments have been used to measure HRQL of patients with HNC.30 While a large number of HRQL instruments have been used with pediatric HNC patients, many of these PROMs were not developed or validated in samples that include such patients.9,30,31 Reviews of existing PROMs for patients with HNC and for pediatric patients with facial differences highlight the heterogeneity of measures that have been used, which has impeded comparisons of outcomes across studies.9,31
Importantly, existing PROMs that are used in patients with facial conditions, including HNC, lack concepts related to appearance and facial function, which are considered important aspects of HRQL in this patient population.9,30 The FACE-Q Craniofacial Module addresses these identified gaps in available PROMs, and can be used to evaluate outcomes for pediatric HNC.
A limitation of Part 1 of this study was that only participants with five distinct HNC diagnoses were included for content validation. While rarer pediatric HNC diagnoses (e.g., oral cancer) were not included, our sample included the most common pediatric HNCs.1 Not all participants reviewed all core scales due to time constraints. However, each of the six core scales was reviewed by at least seven participants, meeting the highest COSMIN rating (≥7 participants per item). In addition, the current study lacks feedback from other health care professionals involved in the care of pediatric HNC patients, such as speech-language pathologists.
Part 2 of this study was performed with a limited sample size, so the psychometric findings should be considered exploratory.32 Furthermore, only 10 FACE-Q Craniofacial Module scales were examined. In addition, participants' specific HNC type in the field-test study is unknown. Future research with a larger sample of patients is warranted to confirm these findings.
Conclusion
The FACE-Q Craniofacial Module addresses an important gap in measurement of appearance and facial function from the patient perspective. This module evidenced content validity and acceptable psychometric properties for patients with HNC, providing support for its use within clinical practice or research for this population. Further information about the FACE-Q Craniofacial Module can be found at https://qportfolio.org/face-q/craniofacial/.
Footnotes
Authors' Contributions
Y.W.: Conceptualization, Methodology, Data Curation, Investigation, Formal Analysis, and Writing-Original draft preparation; C.R.: Data Curation, Formal Analysis, Investigation, and Writing-Original draft preparation; E.T.: Methodology, Investigation, Data Curation, and Writing-Reviewing and Editing; P.C.N.: Investigation and Writing-Reviewing and Editing; E.B.: Investigation and Writing-Reviewing and Editing; D.D.: Investigation and Writing-Reviewing and Editing; K.W.R.: Investigation and Writing-Reviewing and Editing; A.K.: Formal Analysis, Supervision, Funding Acquisition, Methodology, and Writing-Reviewing and Editing.
Disclaimer
The analyses, conclusions, opinions, and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred.
Author Disclosure Statement
A.K. and K.W.R. are codevelopers of the FACE-Q Craniofacial Module described in this publication and share in any license revenues as royalties, based on their institutions' inventor sharing policy, for its use in for-profit studies. The other authors have no conflict of interest to declare in relationship to this work.
Funding Information
The qualitative portion of this study was supported by the Pediatric Oncology Group of Ontario (POGO) Research Unit. The quantitative portion of this study was supported by the Canadian Institutes of Health Research (FRN #148779).
Appendix
Comparison of Scores Between Participants With and Without a Facial Difference
| Scale | Facial difference | N | Mean | Standard deviation | Standard error of mean | p |
|---|---|---|---|---|---|---|
| Face | No | 39 | 61 | 19 | 3 | 0.009 |
| | Yes | 51 | 51 | 15 | 2 | |
| Jaws | No | 33 | 70 | 26 | 4 | 0.025 |
| | Yes | 44 | 57 | 22 | 3 | |
| Lips | No | 38 | 79 | 22 | 4 | 0.075 |
| | Yes | 49 | 70 | 22 | 3 | |
| Nose | No | 39 | 73 | 23 | 4 | 0.16 |
| | Yes | 49 | 66 | 22 | 3 | |
| Teeth | No | 39 | 54 | 21 | 3 | 0.743 |
| | Yes | 48 | 52 | 17 | 2 | |
| Psychological | No | 38 | 72 | 16 | 2 | 0.022 |
| | Yes | 50 | 63 | 20 | 3 | |
| Social | No | 38 | 76 | 17 | 3 | 0.023 |
| | Yes | 49 | 67 | 15 | 2 | |
| Speech | No | 37 | 79 | 17 | 3 | 0.214 |
| | Yes | 51 | 75 | 17 | 2 | |
| Speech distress | No | 37 | 77 | 18 | 3 | 0.909 |
| | Yes | 51 | 77 | 21 | 3 | |
