Exploring face validity of patient-reported outcomes assessing subjective norms, self-regulation, and illness perceptions in lumbar spinal stenosis: A think aloud study

Abstract

Outcome measures are essential in healthcare. This study aimed to use Think Aloud methodology to explore the face validity of three patient-reported outcome measures (PROMs)—the Subjective Norms, Self-Regulation, and Expected Illness Perception questionnaires—among individuals undergoing surgery for lumbar spinal stenosis (LSS). Twelve participants aged 59–88 verbalised their thoughts while completing the questionnaires. Content analysis revealed 96 issues across the 3 PROMs. The Expected Illness Perception and Subjective Norms questionnaires yielded the highest number of problems, particularly around phrasing and inconsistent response scales. While the Self-Regulation questionnaire was generally well understood, some items still prompted confusion. Findings highlight the importance of clear item phrasing, scale consistency, and contextual framing to reduce cognitive burden and enhance usability. Recommendations for item revision are provided to improve face validity and interpretability. Further qualitative and quantitative evaluation in a more diverse participant population is warranted before clinical or research use.

Keywords

patient-reported outcome measures, lumbar spinal stenosis, cognitive interviewing, think aloud, subjective norms

Introduction

Patient-reported outcome measures (PROMs) are essential in healthcare, offering insights into patients’ perspectives on their health status, functional outcomes, and quality of life (Black, 2013). These subjective measures complement clinical and physiological assessments and are crucial for evaluating the effectiveness of interventions, guiding treatment decisions, and improving patient-centred care (Black, 2013). The validity and reliability of PROMs must be established within specific contexts and populations to ensure their appropriateness and interpretability, as linguistic, cultural, and condition-specific nuances can influence how patients interpret and respond to PROMs (Brod et al., 2009). Furthermore, physical and psychosocial factors can interplay significantly with health outcomes, necessitating careful consideration of the appropriateness of PROMs (McIlroy et al., 2021b).

Lumbar spinal stenosis (LSS), a degenerative condition characterised by the narrowing of the spinal canal and compression of neural and vascular structures, can result in bilateral leg and/or back pain, limited mobility, and reduced quality of life (Tomkins-Lane et al., 2016). Walking restriction, the hallmark impairment of LSS, and its improvement may be influenced by biological factors and also psychosocial constructs, including attitudes, beliefs, and expectations (McIlroy et al., 2021a). Understanding these factors with use of PROMs is essential for developing tailored interventions to optimise patient outcomes (McIlroy et al., 2025).

Three psychosocial constructs (self-regulation, subjective norms, and illness expectations) have been identified as important determinants of health behaviours, including walking and recovery, in surgical populations such as cardiac and orthopaedic surgery (Gehring et al., 2020; Jenkins and Gortner, 1998; McHugh et al., 2008). Self-regulation, rooted in goal-directed theories (Bandura, 1991), involves the capacity to manage thoughts, emotions, and actions for desired outcomes, such as increasing walking after surgery. Subjective norms and expectations both involve anticipatory beliefs that influence motivation. Subjective norms, from the Theory of Planned Behaviour (Ajzen, 1991), reflect perceived social expectations on behaviour and inform motivation to engage in physical activity. Illness expectations, shaped by appraisals of health and treatment (Weinman et al., 1996), influence perceived control over recovery and engagement in rehabilitation. Although these constructs have demonstrated relevance in other surgical populations, their role in walking behaviour and post-operative recovery has not been thoroughly explored within the context of LSS.

PROMs are widely used to capture constructs such as subjective norms, self-regulation, and expectations. Existing PROMs have been validated in other health conditions (Anderson et al., 2007; Mondloch et al., 2001), but lack evidence of their applicability to individuals with LSS undergoing decompressive surgery. Addressing this gap is critical to ensure these measures accurately reflect the lived experiences, beliefs and recovery challenges of this specific population, particularly in terms of content and face validity. In the present study, face validity was understood as the extent to which questionnaire items appeared clear, relevant, and appropriate to respondents in the target population (Ranganathan et al., 2024).

The Think Aloud method is a widely used and valuable approach for evaluating how individuals interpret and respond to PROM items (French et al., 2007; Hyland et al., 2015; Sekhon et al., 2022). This cognitive interviewing technique involves participants verbalising their thoughts in real time as they complete a questionnaire, revealing their interpretations, hesitations, and reasoning (Someren et al., 1994). By capturing participants’ real-time thought processes while completing PROMs, the think aloud method can highlight issues of ambiguity, cognitive burden, and misinterpretation, providing insights into the clarity, relevance, and comprehensibility of questionnaire items (Willis, 2005).

This study employs the Think Aloud method to assess the face validity of three PROMs: the Subjective Norms questionnaire (Francis et al., 2004), the Self-Regulation questionnaire (Sniehotta et al., 2005), and the Expected Illness Perception questionnaire (Laferton et al., 2013) for use in assessing psychosocial factors influencing walking behaviour in individuals undergoing surgery for LSS.

Methods

Design

A “Think Aloud” study was conducted between December 2021 and May 2022. Cognitive interviewing was used to capture participants’ thought processes and encourage participants to verbalise silent thoughts about the questionnaires. Ethical approval for the study was received from the East Midlands – Nottingham 1 Research Ethics Committee (Reference number: 20/EM/0307).

Participants

Participants were recruited from three NHS trusts. Purposive sampling was used to recruit participants from the parent study (McIlroy et al., 2025) in order to include a diverse range of individuals in terms of characteristics such as age, gender, and ethnic background. Eligible participants were individuals aged 50 years or older, diagnosed with LSS, and scheduled for elective decompressive spinal surgery. Additional inclusion criteria were ability to communicate in English and willingness to engage in the think aloud process. Exclusion criteria were LSS caused by tumour, fracture, or significant deformity (>15° lumbar scoliosis; ⩾ grade II spondylolisthesis); a requirement for emergency surgery; fusion surgery at more than one level; and other self-reported conditions that were the primary cause of walking restriction.

Procedure

Potential participants were invited to complete the think aloud study as a sub-study embedded within a longitudinal study investigating prognostic factors for change in walking after surgery for LSS (McIlroy et al., 2025). Those interested in taking part were posted an information sheet, a consent form, a demographic survey, and the questionnaires sealed in an opaque envelope. Participants returned the signed consent form and demographic survey in a stamped addressed envelope prior to their interview.

One researcher (SMc), a physiotherapist not involved in participants’ clinical care, conducted the interviews via Microsoft Teams, and participants could choose whether their cameras were on. Verbal instructions were adapted from a think aloud study by French et al. (2007). Participants were instructed to read each question aloud, verbalise their thought process while forming their response, and provide their answer. After going over the instructions, participants opened the envelope containing the questionnaires. Following a practice session on weather-related questions to clarify the think-aloud technique, participants proceeded to complete the three questionnaires. They were encouraged to speak continuously, with prompts provided after 10 seconds of silence (e.g. “Tell me what you’re thinking”). The facilitator noted any hesitations, questions, or difficulties. Afterwards, participants were asked for their overall impressions and any specific challenges they encountered. Interviews were recorded via the Microsoft Teams software, transcribed verbatim, and anonymised, after which the recordings were deleted.

Questionnaires

The questionnaires investigated were “Subjective Norms (Social Norms)” (Francis et al., 2004), “Self-Regulation” (Sniehotta et al., 2005) and “Expected Illness Perception” (Laferton et al., 2013; questionnaires in Supplemental materials 1).

The Subjective Norms questionnaire comprised three items asking people to rate how strongly they agreed with the items regarding perceived social pressure on walking. Item 1 (“Most people who are important to me think that I should go for a walk every day”) used a seven-point Likert scale from 1 (“I should”) to 7 (“I should not”), while Items 2 (“It is expected of me that I go for a walk every day”) and 3 (“I feel under social pressure to go for a walk every day”) ranged from 1 (“strongly disagree”) to 7 (“strongly agree”).

The Self-Regulation questionnaire included 10 items evaluating participants’ planning and monitoring of physical activity, aligned with guidelines of daily walking and weekly moderate exercise. Four items assessed planning (e.g. “I have made a detailed plan regarding when to do my physical activity”), and six focussed on self-monitoring over 6 weeks (e.g. “I have constantly monitored myself to ensure I completed physical activity frequently enough”). Responses ranged from 1 (“not at all true”) to 4 (“exactly true”) on a four-point scale.

The Expected Illness Perception questionnaire (IPQ-E) consisted of 11 items measuring expectations about their low back condition (LSS) 3 months post-surgery. Participants rated statements like “. . .my low back condition will have major consequences on my life” and “. . .surgery will control my low back condition” on a five-point Likert scale from 1 (“strongly disagree”) to 5 (“strongly agree”), covering consequences, control, and surgical outcomes.

Analysis

Content analysis was used to systematically code and categorise the data to identify recurring themes and specific issues with questionnaire items (Elo and Kyngäs, 2008).

A deductive coding framework was applied, using predefined categories adapted from French et al. (2007) and Willis (2005). Two researchers (AH, TC) independently reviewed the transcriptions and coded each response to the predefined categories, recording each response in Microsoft Excel. The categories were as follows:

(1) Did they sufficiently think aloud? - Whether participants verbalised their thought processes adequately, providing continuous insight into their interpretation and reasoning, as judged by the coders based on transcript content.

(2) No significant problems identified

(3) Re-reading the question

(4) Stumbled reading the question

(5) Seriously floundered in answering the question

(6) Misunderstanding of scale used

(7) Felt unable to answer accurately

(8) Survey context misunderstood

(9) Felt questions asked the same thing - Participant explicitly stated that items seemed redundant, based on their verbalised perception.

After initial coding of the Participant 1 transcripts, the researchers (AH, SMc, SN, TC) met to review the coding for consistency, following which the remaining questionnaires were coded. Any discrepancies were resolved through revisiting the transcripts, re-evaluating the problematic items, and agreeing on the most appropriate categorisation. The level of agreement between coders was calculated as the proportion of items on which both coders agreed. Percent agreement values between 75 and 90% have been described as acceptable (Stemler, 2004).
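The percent agreement calculation described above can be illustrated with a minimal sketch. The function name `percent_agreement` and the example codes are hypothetical and are not study data; the sketch assumes each coder assigns exactly one code per item, which simplifies the multi-code scheme used in this study.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Return the percentage of items to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same number of items")
    if len(coder_a) == 0:
        raise ValueError("At least one coded item is required")
    # Count the items on which the two coders chose an identical category
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes for four items, for illustration only (not study data)
coder_1 = ["no_problem", "re_read", "floundered", "no_problem"]
coder_2 = ["no_problem", "re_read", "scale_misunderstood", "no_problem"]

agreement = percent_agreement(coder_1, coder_2)
print(f"Percent agreement: {agreement:.0f}%")
```

Percent agreement does not correct for agreement expected by chance, so it is usually read alongside the threshold cited above rather than as a standalone reliability statistic.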

Results

Seventeen people were invited to take part; five declined as it was not convenient. Twelve interviews were conducted; the mean age of participants was 71 years (range 59–88 years) and 50% were female. The characteristics of the participants are summarised in Table 1. Three participants had undergone surgery at the time of the interview. Duration of the interviews ranged from 17 to 48 minutes.

Table 1.

Participant characteristics.

Characteristics		n = 12
Sex	Male	6 (50%)
Sex	Female	6 (50%)
Age	Mean (standard deviation)	71 (±8.5)
Self-reported ethnicity	White British	9 (75%)
	White Irish	1 (8%)
	Black African	1 (8%)
	Black Caribbean	1 (8%)
Education	Secondary school	6 (50%)
Education	High-professional/university	6 (50%)
Operative status	Pre-operative	9 (75%)
Operative status	Post-operative	3 (25%)

The percentage agreement between coders was 73%, marginally below the 75–90% range described as acceptable (Stemler, 2004).

Subjective norms questionnaire

Table 2 and Supplemental material 1 present an overview of the problems identified for each item per person in the Subjective Norms Questionnaire. For Item 1 of the Subjective Norms questionnaire, a total of 15 problems were identified. Eight participants (67%) re-read the question, indicating a need for clarity. Four participants (33%) seriously floundered in answering it, suggesting that the question was challenging to interpret. Additionally, three participants (25%) misunderstood the scale used, highlighting a potential issue with the response format.

Table 2.

Frequency and type of problems per item of the subjective norms questionnaire^a.

Labels
	No problems	Re-read question	Stumbled reading question	Seriously floundered in answering	Misunderstanding of scale	Felt unable to answer accurately	Survey context misunderstood	Felt questions too similar
Questionnaire items	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)
1. Most people think that I should . . . go for a walk everyday	4 (33%)	8 (67%)	0 (0%)	4 (33%)	3 (25%)	0 (0%)	0 (0%)	0 (0%)
2. It is expected of me that I go for a walk everyday	10 (83%)	2 (17%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (8%)
3. I feel under social pressure to go for a walk everyday	10 (83%)	2 (17%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)

Multiple codes possible; participants can score on more than one domain.

Ten (83%) participants had no significant problems answering Item 2, with two (17%) participants re-reading the question. One (8%) participant felt that Item 2 asked the same thing as Item 1, which suggests a need for differentiation between the items. For Item 3, ten (83%) participants did not encounter significant problems, while two (17%) participants re-read the question.

“That’s the same question worded slightly different, that’s why two and six are equal if you know what I mean.” (P2, male aged 70)

Recommendations from participants

Participants highlighted the need for improved clarity and consistency in the Subjective Norms questionnaire. Clearer instructions were recommended, particularly regarding the response scale and unfamiliar terms, with detailed guidelines at the questionnaire’s start to aid understanding. Ensuring consistent formatting of questions and response scales throughout the questionnaire was emphasised to reduce confusion and enhance flow.

One participant remarked: “I was totally stuck on this one because I’m not sure it says I should. Then we have numbers through 1 to 7 then it says I should not. So, is it high to low or is it low to high?” (P8, female aged 73).

Self-regulation questionnaire

Table 3 and Supplemental material 2 present an overview of the problems identified for each item in the Self-Regulation questionnaire. Eleven (92%) participants answered the questionnaire, while one (8%) participant did not receive it due to a printing error and therefore did not provide responses. Most items were generally well understood, with the primary issue being participants re-reading the questions. However, Items 2 and 4 presented additional challenges.

Table 3.

Frequency and type of problems per item of the self-regulation questionnaire^a.

Labels
	No problems	Re-read question	Stumbled reading question	Seriously floundered in answering	Misunderstanding of scale	Felt unable to answer accurately	Survey context misunderstood	Felt questions too similar
Questionnaire items	n/11 (%)	n/11 (%)	n/11 (%)	n/11 (%)	n/11 (%)	n/11 (%)	n/11 (%)	n/11 (%)
I have made a detailed plan regarding . . .
(a) When to do my physical activity.	9 (82%)	2 (18%)	2 (18%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
(b) Where to do my physical activity.	9 (82%)	2 (18%)	2 (18%)	2 (18%)	1 (9%)	1 (9%)	1 (9%)	1 (9%)
(c) How to do my physical activity.	11 (100%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
(d) How often to do physical activity.	10 (91%)	1 (9%)	0 (0%)	0 (0%)	1 (9%)	0 (0%)	0 (0%)	0 (0%)
During the last 6 weeks, I have . . .
(a) Constantly monitored myself to ensure I completed physical activity frequently enough.	10 (91%)	1 (9%)	1 (9%)	0 (0%)	0 (0%)	1 (9%)	0 (0%)	0 (0%)
(b) Tried to make sure that I performed physical activity for at least 150 minutes a week.	10 (91%)	0 (0%)	0 (0%)	0 (0%)	1 (9%)	0 (0%)	0 (0%)	0 (0%)
(c) Had the recommended physical activity guidelines often on my mind.	10 (91%)	1 (9%)	1 (9%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
(d) Always been aware of the recommended physical activity guidelines.	11 (100%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
(e) Really tried to do physical activity regularly.	11 (100%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
(f) Tried my best to follow through with recommended physical activity guidelines.	11 (100%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)

Multiple codes possible; participants can score on more than one domain.

For Item 2, two (18%) participants re-read the question and stumbled while reading it. One (9%) of these participants seriously floundered in answering, misunderstood the response scale, and felt unable to answer accurately due to confusion about the survey context. These issues suggest that Item 2 may require rephrasing and additional clarification to improve its comprehensibility.

“Yeah, I’m not reading this very well. I don’t really understand it. See I well, I would, I would say my bedroom, but obviously that’s not an option, so. Well, I’m going for three again.” (P2, male aged 70)

Item 4 presented some difficulties. Although ten (91%) participants had no significant problems, one (9%) re-read the question, suggesting that the wording may have been unclear. One (9%) participant struggled to understand the scale, leading to uncertainty in selecting a response, while another felt unable to answer accurately.

Recommendations from participants

Participants noted that some questions in the Self-Regulation questionnaire were too vague and suggested adding specific examples or scenarios to provide better context and improve relatability. They recommended simplifying complex or lengthy questions and breaking them into smaller, more straightforward items to enhance understanding and accuracy. For instance, the item “I have made a detailed plan regarding how often to do my physical activity” could be split into “I have planned how many days a week I will walk” and “I have decided how long each walk will be,” making it easier to respond precisely. Additionally, participants proposed including a brief introductory section with practice questions to help respondents become familiar with the questionnaire format and provide guidance on how to approach answering.

Expected illness perception questionnaire

Table 4 and Supplemental material 3 present an overview of the problems identified for each item in the Expected Illness Perception Questionnaire. Most items were generally well understood, with the primary issue being participants re-reading the questions.

Table 4.

Frequency and type of problems per item of the expected illness perception questionnaire^a.

Labels
	No problems	Re-read question	Stumbled reading question	Seriously floundered in answering	Misunderstanding of scale	Felt unable to answer accurately	Survey context misunderstood	Felt questions too similar
Questionnaire items	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)	n/12 (%)
3 months after my low back surgery. . .
1. My low back condition will have major consequences on my life.	6 (50%)	3 (25%)	0 (0%)	3 (25%)	0 (0%)	3 (25%)	3 (25%)	0 (0%)
2. My low back condition will strongly affect the way others will see me.	9 (75%)	0 (0%)	0 (0%)	1 (8%)	0 (0%)	2 (17%)	1 (8%)	0 (0%)
3. My low back condition will have serious financial consequences.	10 (83%)	0 (0%)	1 (8%)	0 (0%)	0 (0%)	0 (0%)	1 (8%)	0 (0%)
4. My low back condition will cause difficulties for those who are close to me.	10 (83%)	1 (8%)	0 (0%)	0 (0%)	0 (0%)	1 (8%)	1 (8%)	0 (0%)
5. There will be a lot which I can do to control my symptoms.	10 (83%)	1 (8%)	1 (8%)	0 (0%)	0 (0%)	1 (8%)	1 (8%)	0 (0%)
6. What I do will determine whether my low back condition will get better or worse.	11 (92%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (8%)	0 (0%)
7. The course of my low back condition will depend on me.	9 (75%)	3 (25%)	1 (8%)	2 (17%)	0 (0%)	0 (0%)	1 (8%)	0 (0%)
8. I will have the power to influence my low back condition.	9 (75%)	2 (17%)	0 (0%)	1 (8%)	0 (0%)	2 (17%)	1 (8%)	0 (0%)
9. My surgery will have been effective in curing my low back condition.	10 (83%)	2 (17%)	0 (0%)	2 (17%)	0 (0%)	2 (17%)	1 (8%)	0 (0%)
10. The negative effects of my low back condition will be prevented by my surgery.	9 (75%)	3 (25%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (8%)	0 (0%)
11. My surgery will control my low back condition.	10 (83%)	2 (17%)	0 (0%)	0 (0%)	0 (0%)	2 (17%)	1 (8%)	0 (0%)

Multiple codes possible; participants can score on more than one domain.

For Item 1, six (50%) participants encountered problems, with some misunderstanding the 3-month post-surgery timeframe they were meant to consider while answering. Three (25%) participants specifically noted that they thought about their current situation rather than projecting their expectations 3 months post-surgery, which the questionnaire aimed to measure. These issues suggest that Item 1 may require rephrasing and clarification to ensure the intended timeframe is explicitly communicated.

“To be truthful, I was thinking more about today . . . not about three months after surgery.” (P5, male aged 66)

For Item 7, three (25%) participants re-read the question, suggesting potential issues with wording. One (8%) participant misunderstood the response scale, leading to uncertainty in selecting an answer, and another found it difficult to imagine their situation 3 months post-surgery.

Recommendations from participants

Participants highlighted that some questions in the Expected Illness Perception Questionnaire were too vague and suggested adding specific examples or scenarios to provide better context and improve relatability. For example, Item 5 (“. . .there will be a lot which I can do to control my symptoms”) could be clarified with a scenario like “After surgery, I can control my symptoms by walking regularly or adjusting my posture,” helping participants relate it to concrete actions. They emphasised the need for questions to be detailed enough to elicit specific responses. Simplifying complex or lengthy questions and breaking them into simpler, more straightforward items was also recommended to enhance understanding and accuracy.

“Some of them are a bit long-winded, aren’t they? You sort of forget what the start of the question was by the time you get to the end. . . I had to read it again.” (P4, male aged 65)

Additionally, participants proposed including a brief introductory section with practice questions to familiarise respondents with the format and provide guidance on how to approach answering, which would make completing the main questionnaire easier.

Discussion

In this study, the Think Aloud method was employed to assess the face validity of three Patient-Reported Outcome Measures (PROMs)—the Subjective Norms Questionnaire, the Self-Regulation Questionnaire, and the Expected Illness Perception Questionnaire—in individuals undergoing surgery for LSS. Overall, the findings indicate that participants encountered several difficulties when completing these measures, raising concerns about their clarity and suitability in this patient population. Although the primary aim of the study was to assess face validity, some of the issues identified through the think-aloud interviews, such as ambiguity, redundancy, and difficulties interpreting key constructs, may also have implications for content and construct validity. However, these aspects were not formally evaluated in the present study and require further investigation in future research.

Across all three PROMs, a total of 96 issues were identified. The majority of these arose in the Expected Illness Perception Questionnaire (n = 55, 57%) and the Self-Regulation Questionnaire (n = 21, 22%), with fewer observed in the Subjective Norms Questionnaire (n = 20, 21%).

The most frequent problems were re-reading the question (n = 35, 36%), seriously floundering when answering (n = 15, 16%), and feeling unable to answer accurately (n = 15, 16%). Participants also commonly misunderstood the survey context (n = 14, 15%). Less frequent issues were misunderstanding the response scale (n = 6, 6%), stumbling when reading (n = 5, 5%), and perceived redundancy across items within a questionnaire (n = 2, 2%).
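The shares reported per questionnaire can be reproduced with a short sketch. The issue counts and item counts are taken from the text (55, 21, and 20 issues; 11, 10, and 3 items); the variable names are illustrative, and the issues-per-item figure is an additional descriptive ratio computed here for comparison, not a statistic reported by the study.

```python
# Issue counts per questionnaire, as reported in the text
issues_per_questionnaire = {
    "Expected Illness Perception": 55,
    "Self-Regulation": 21,
    "Subjective Norms": 20,
}

# Number of items in each questionnaire, as described in the Methods
items_per_questionnaire = {
    "Expected Illness Perception": 11,
    "Self-Regulation": 10,
    "Subjective Norms": 3,
}

total_issues = sum(issues_per_questionnaire.values())

# Share of all issues attributable to each questionnaire, rounded to whole percent
shares = {
    name: round(100 * count / total_issues)
    for name, count in issues_per_questionnaire.items()
}

# Issues per item, which adjusts for the differing questionnaire lengths
issues_per_item = {
    name: round(count / items_per_questionnaire[name], 1)
    for name, count in issues_per_questionnaire.items()
}

for name in issues_per_questionnaire:
    print(f"{name}: {shares[name]}% of issues, {issues_per_item[name]} issues per item")
```

Adjusting for questionnaire length changes the ranking: the Subjective Norms questionnaire has the fewest issues in absolute terms but the most per item.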

Subjective norms questionnaire

The Subjective Norms Questionnaire elicited the highest number of reported difficulties relative to its length (20 issues across three items). Participants frequently paused to re-read or stumbled while reading the items aloud, indicating that the phrasing was not immediately accessible. Two (17%) individuals commented on the ambiguity of the questions, particularly when attempting to distinguish between social expectations and internal motivation. Items referencing perceived social pressure or expectations from “important others” were not always intuitive, especially given the older adult population’s varied interpretations of who these “others” might be.

One key issue was the use of a bipolar response scale (e.g. “I should not” to “I should”), which was unfamiliar to many participants. Some were unsure of whether “1” indicated agreement or disagreement, and this confusion disrupted their flow of answering. Furthermore, the cognitive load introduced by abstract social constructs, combined with ambiguous language, created a sense of fatigue. Several participants noted that the questions were “too wordy” or difficult to pin down, echoing previous research on the link between complex phrasing and response bias (Kool et al., 2010).

The difficulty participants experienced in interpreting subtle semantic differences—for example, distinguishing between “expected of me” and “social pressure”— may indicate overlap between concepts or ambiguity in how these items were understood. Research by Knafl et al. (2007) highlights how abstract social constructs can be particularly challenging to assess in older adults, especially when combined with complex response scales.

To reduce ambiguity and support clearer interpretation, item phrasing should be revised to be more specific and concrete. For example, including a brief explanation of who “important others” might be (e.g. family, doctors, caregivers) could help ground participants’ responses. In addition, adopting more intuitive Likert-style scales with clear, labelled endpoints (e.g. “strongly disagree” to “strongly agree”) would likely improve consistency in how participants interpret and use the response options. Aligning all scales across the questionnaire would further reduce confusion and enhance usability (de Rada, 2005).

Self-regulation questionnaire

The Self-Regulation Questionnaire was generally the most acceptable of the three PROMs, with participants reporting fewer difficulties than in the other measures. Most participants demonstrated an understanding of the planning and self-monitoring items, particularly when the questions referenced concrete behaviours like walking. However, some (n = 4, 33%) still floundered or hesitated, particularly in relation to items about long-term intentions or structured goal setting. A few respondents expressed uncertainty about how their physical limitations, particularly around the time of surgery, influenced their ability to plan or monitor activity. Prior research (e.g. Bandura, 1997) highlights the importance of context and specificity in enhancing perceived self-efficacy, which aligns with these findings.

Another issue was the varying interpretation of physical activity guidelines included in some items. While some participants appreciated the inclusion of benchmark recommendations (e.g. 150 minutes per week), others felt overwhelmed or confused by how this applied to their individual context. There was also occasional uncertainty regarding how to use the response scale, especially when it differed slightly from other questionnaires in the study.

To improve clarity, language referring to planning and monitoring should be simplified using everyday terms, such as “keeping track” instead of “monitoring” or “deciding in advance” instead of “planning.” Linking questions more clearly to recent behaviour (e.g. within the past week) could help respondents draw on concrete experiences rather than abstract goals. Maintaining consistent response formats throughout the questionnaire is also important to minimise unnecessary confusion and reduce cognitive effort.

Expected illness perception questionnaire

Participants encountered several difficulties with the Expected Illness Perception Questionnaire, particularly due to the temporal ambiguity of many items. Respondents often questioned whether they were meant to reflect on their current symptoms or anticipate future outcomes post-surgery. This was especially evident in items referencing terms such as “control,” “cure,” and “what I do.” Many participants interpreted these phrases differently—some focussing on physical recovery, others on medication adherence or psychological wellbeing—suggesting a lack of shared understanding of key concepts.

Such difficulties underscore the cognitive demands placed on respondents when abstract health concepts are used without contextual support. As noted by Gibbons et al. (2013), PROMs often suffer from reduced validity when they rely on vague or future-oriented constructs without anchoring examples. Given that many participants expressed difficulty anticipating life post-surgery, it is unsurprising that these items prompted hesitations and verbal uncertainty.

The cognitive burden of these items was amplified by their abstract and hypothetical nature. Several participants expressed frustration when trying to imagine how they might feel or function 3 months after surgery, a timeframe that felt uncertain or unrealistic to project. As Kool et al. (2010) have shown, such abstraction increases the likelihood of inaccurate or inconsistent responses, particularly in older adults managing complex health conditions. Additionally, inconsistent response scales across questionnaire sections added to the confusion.

Improving clarity in this questionnaire will likely require multiple revisions. Items should be anchored more explicitly to a clear timeframe—for instance, beginning each question with “Three months after my surgery . . . ” can help orient respondents. Similarly, abstract terms such as “cure” and “control” should be replaced or accompanied by brief clarifications (e.g. “reduce pain,” “improve mobility”). The inclusion of illustrative examples for complex concepts could also support comprehension, particularly for individuals with limited experience completing health-related surveys.

Response scale issues

A recurrent issue across participants was inconsistency in understanding response scales, with confusion surrounding whether a scale ranged from 1–5 or 1–7. Misalignment in scale formatting across different sections of the questionnaire compounded this problem, making it more difficult for participants to respond consistently. Prior research has suggested that ensuring uniform formatting and providing explicit scale instructions can improve response accuracy and minimise errors (Radoux et al., 2020). Addressing these issues in future iterations of the PROMs could enhance clarity and reduce potential sources of response variability.

Based on the issues identified through the Think Aloud interviews, several items across the three PROMs were revised to improve clarity, reduce cognitive burden, and address specific problems such as vague wording or misunderstanding of scale anchors. Table 5 presents specific examples of problematic items, suggested revisions, and the rationale for each change. These revisions are preliminary and intended to guide future refinement; they have not yet been formally psychometrically evaluated.

Table 5. Suggested changes to the questionnaires.

Original content	Suggestion for change	Reason for change
Subjective norms questionnaire
Questionnaire item 1: Most people who are important to me think that I should / should not go for a walk every day.	Most people who are important to me think that I should go for a walk every day. Please answer using the scale from 1 (Strongly Disagree) to 7 (Strongly Agree).	Participants were often confused by the original scale format, unsure whether 1 or 7 indicated agreement. Clarifying the direction and labels of the scale can improve understanding and response accuracy.
Self-regulation questionnaire
Questionnaire item 1: I have made a detailed plan regarding when to do my physical activity.	I have decided what days and times I will go for a walk each week.	Participants found “detailed plan” abstract. Using everyday language and a specific behaviour (e.g., walking) improves clarity and reduces cognitive burden.
Questionnaire item 5: During the last 6 weeks I have constantly monitored myself to ensure I completed physical activity frequently enough.	I keep track of when I go for a walk or exercise.	The word “monitor” was difficult for some participants. Replacing it with “keep track” provides a simpler and more familiar term.
Expected IPQ questionnaire
Questionnaire item 1: My low back condition will have major consequences on my life.	3 months after surgery, my low back condition will still affect my daily life a lot.	Participants were unsure whether they should answer based on now or after surgery. Clarifying the timeframe and making “consequences” more specific helps accuracy.
Questionnaire item 5: After my low back surgery, there will be a lot I can do to control my symptoms.	3 months after surgery, there will be things I can do to reduce my symptoms (e.g., exercise, medication).	The term “control” was unclear and interpreted in different ways. Replacing it with “reduce” and adding examples improves comprehension.
Questionnaire item 9: My surgery will have been effective in curing my low back condition.	3 months after my surgery, it will have helped reduce my low back symptoms. Or: 3 months after my surgery, I expect my lower back symptoms to have improved.	Participants questioned the meaning of “cure,” which felt unrealistic. “Reduce symptoms” is a more concrete and relatable alternative.

Strengths and limitations

The Think Aloud method provided rich, real-time qualitative insights into how participants process PROM items. Unlike traditional validation methods, this approach allowed researchers to identify subtle issues, such as participants perceiving two items as redundant or interpreting questions differently than intended. This aligns with previous findings where Think Aloud protocols were shown to capture cognitive processing difficulties that conventional validation studies might overlook (Byrd et al., 2023). The ability to observe participants’ thought processes while responding to PROMs offers valuable information for refining these measures and ensuring that they align more closely with the intended constructs.

The study involved a relatively small sample of 12 participants from a specific clinical population, most of whom were older adults and from a predominantly White, English-speaking UK background. While the findings offer valuable insights, the limited cultural and linguistic diversity of the sample may affect generalisability. Although we did not identify a clear pattern of misunderstanding specifically related to cultural background, future research should include more culturally and linguistically diverse participants to explore the readability, appropriateness, and comprehensibility of these PROMs across different groups.

The inter-coder agreement was 73%, slightly below the 75% threshold specified in the methods; however, it remains above the 70% level often considered acceptable for exploratory qualitative research (Lombard et al., 2002) and is therefore unlikely to have a substantial impact on study findings.

A further limitation is that reading ability was not formally assessed. As a result, it is not possible to determine the extent to which literacy or health literacy may have influenced participants’ understanding of questionnaire items. Future studies would benefit from including a measure of reading ability or health literacy, particularly when evaluating PROMs in older clinical populations. The recommendations are likely relevant to a broader population, but future versions of the questionnaires would benefit from further validity testing. Future research should include more diverse participants, including non-native English speakers, to better understand how different groups interpret and respond to PROM items. A more varied sample would support a more comprehensive evaluation of the questionnaires across demographics.
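
For readers who wish to reproduce the kind of comparison made above for inter-coder agreement, the following is a minimal sketch. It assumes the reported figure is a simple percent agreement (the proportion of coded units to which both coders assigned the same code), which is not stated explicitly here. The function name, the code labels, and the 15 coded segments are hypothetical and purely illustrative; they are not the study data.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Proportion of units to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same units")
    if not coder_a:
        raise ValueError("At least one coded unit is required")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical codes for 15 think-aloud segments (illustrative only, not study data).
coder_a = ["wording", "scale", "timeframe", "none", "wording",
           "scale", "scale", "timeframe", "none", "wording",
           "timeframe", "scale", "none", "wording", "timeframe"]
coder_b = ["wording", "scale", "timeframe", "none", "scale",
           "scale", "scale", "none", "none", "wording",
           "timeframe", "wording", "none", "wording", "none"]

agreement = percent_agreement(coder_a, coder_b)

# Thresholds taken from the text: 75% (methods) and 70% (exploratory research).
meets_methods_threshold = agreement >= 0.75
meets_exploratory_threshold = agreement >= 0.70

print(f"Agreement: {agreement:.0%}")
print(f"Meets 75% threshold: {meets_methods_threshold}")
print(f"Meets 70% threshold: {meets_exploratory_threshold}")
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected index may give a lower value for the same codes.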

The online interview format may have influenced participant engagement or comfort levels. Some respondents encountered technical difficulties or expressed unfamiliarity with virtual platforms, potentially affecting the depth of their responses. While digital methodologies provide accessibility benefits, these challenges should be considered in future studies using remote data collection methods. Addressing these issues through alternative methods, such as hybrid approaches that combine online and in-person interviews, could provide more reliable and comprehensive data. Accordingly, the proposed revisions should be regarded as preliminary and interpreted in light of the small, predominantly White, English-speaking UK sample and the qualitative nature of the data.

Additionally, the suggested item revisions are based on qualitative findings and have not yet been psychometrically tested. Any revised versions should therefore be evaluated quantitatively before being recommended for broader clinical or research use.

Conclusion

The findings from this Think Aloud study highlight several ways in which item design, response scale formatting, and temporal framing can impact the interpretability of PROMs among older adults preparing for or recovering from surgery for LSS. While the Self-Regulation Questionnaire was mostly well-received, the Subjective Norms and Expected Illness Perception questionnaires revealed key areas for improvement. Enhancing the clarity, specificity, and consistency of questionnaire items—alongside efforts to reduce cognitive load—will likely improve the usability, face validity, and overall accuracy of PROM data in this clinical context. The suggested revisions are based on qualitative findings from a small, predominantly White, English-speaking UK sample. Further qualitative and quantitative evaluation, including psychometric testing, is recommended prior to clinical or research use.

Supplemental Material

Supplemental material, sj-docx-1-hpq-10.1177_13591053261458611 for Exploring face validity of patient-reported outcomes assessing subjective norms, self-regulation, and illness perceptions in lumbar spinal stenosis: A think aloud study by Arshia Honarvar, Sam Norton, Tomal Choudhury and Suzanne McIlroy in Journal of Health Psychology

Footnotes

ORCID iDs

Arshia Honarvar

Tomal Choudhury

Suzanne McIlroy

Ethical considerations

Ethical approval for the study was received from the East Midlands – Nottingham 1 Research Ethics Committee (Reference number: 20/EM/0307).

Consent to participate

Patients were given an information sheet and consent form which they had to complete and return prior to taking part. Written informed consent for inclusion in this research was obtained from the participants prior to participation.

Consent for publication

Consent for publication is not applicable to this article as it does not contain any identifiable data.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this study was received through a training fellowship from the Dunhill Medical Trust (grant number RFT2006/14).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Supplemental material

Supplemental material for this article is available online.

Trial registration

The study protocol was registered at Open Science Framework (DOI 10.17605/OSF.IO/BHQJZ).

References

Ajzen (1991) The theory of planned behavior. Organizational Behavior and Human Decision Processes 50: 179–211.

Anderson, Winett and Wojcik (2007) Self-regulation, self-efficacy, outcome expectations, and social support: Social cognitive theory and nutrition behavior. Annals of Behavioral Medicine 34(3): 304–312.

Bandura (1991) Social cognitive theory of self-regulation. Organizational Behavior and Human Decision Processes 50: 248–287.

Bandura (1997) Self-Efficacy: The Exercise of Control. W H Freeman/Times Books/Henry Holt & Co.

Black (2013) Patient reported outcome measures could help transform healthcare. BMJ 346: f167.

Brod, Tesler and Christensen (2009) Qualitative research and content validity: Developing best practices based on science and experience. Quality of Life Research 18(9): 1263–1278.

Byrd, Joseph, Gongora, et al. (2023) Tell us what you really think: A think aloud protocol analysis of the verbal cognitive reflection test. Journal of Intelligence 11(4): 76.

de Rada (2005) Influence of questionnaire design on response to mail surveys. International Journal of Social Research Methodology 8(1): 61–78.

Elo and Kyngäs (2008) The qualitative content analysis process. Journal of Advanced Nursing 62(1): 107–115.

Francis, Eccles, Johnston, et al. (2004) Constructing questionnaires based on the theory of planned behaviour. A Manual for Health Services Researchers 2004: 2–12.

French, Cooke, Mclean, et al. (2007) What do people think about when they answer theory of planned behaviour questionnaires? A ‘Think Aloud’ study. Journal of Health Psychology 12(4): 672–687.

Gehring, Lerret, Johnson, et al. (2020) Patient expectations for recovery after elective surgery: A common-sense model approach. Journal of Behavioral Medicine 43(2): 185–197.

Gibbons, Casañas, Comabella and Fitzpatrick (2013) A structured review of patient-reported outcome measures for patients with skin cancer, 2013. British Journal of Dermatology 168(6): 1176–1186.

Hyland, Whalley, Jones, et al. (2015) A qualitative study of the impact of severe asthma and its treatment showing that treatment burden is neglected in existing asthma assessment scales. Quality of Life Research 24(3): 631–639.

Jenkins and Gortner (1998) Correlates of self-efficacy expectation and prediction of walking behavior in cardiac surgery elders. Annals of Behavioral Medicine 20(2): 99–103.

Knafl, Deatrick, Gallo, et al. (2007) The analysis and interpretation of cognitive interviews for instrument development. Research in Nursing & Health 30(2): 224–234.

Kool, McGuire, Rosen, et al. (2010) Decision making and the avoidance of cognitive demand. Journal of Experimental Psychology: General 139(4): 665–682.

Laferton, Shedden Mora, Auer, et al. (2013) Enhancing the efficacy of heart surgery by optimizing patients’ preoperative expectations: Study protocol of a randomized controlled trial. American Heart Journal 165(1): 1–7.

Lombard, Snyder-Duch and Bracken (2002) Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research 28(4): 587–604.

McHugh, Luker, Campbell, et al. (2008) Pain, physical functioning and quality of life of individuals awaiting total joint replacement: A longitudinal study. Journal of Evaluation in Clinical Practice 14(1): 19–26.

McIlroy, Bearne, Weinman, et al. (2025) Identifying modifiable factors that influence walking in patients undergoing surgery for neurogenic claudication: A prospective longitudinal study. Scientific Reports 15(1): 4959.

McIlroy, Jadhakhan, Bell, et al. (2021a) Prediction of walking ability following posterior decompression for lumbar spinal stenosis. European Spine Journal 30(11): 3307–3318.

McIlroy, Walsh, Sothinathan, et al. (2021b) Pre-operative prognostic factors for walking capacity after surgery for lumbar spinal stenosis: A systematic review. Age and Ageing 50(5): 1529–1545.

Mondloch, Cole and Frank (2001) Does how you do depend on how you think you’ll do? A systematic review of the evidence for a relation between patients’ recovery expectations and health outcomes. CMAJ: Canadian Medical Association Journal 165(2): 174–179.

Radoux, Waldner and Bogaert (2020) How response designs and class proportions affect the accuracy of validation data. Remote Sensing 12: 1–22.

Ranganathan, Caduff and Frampton CMA (2024) Designing and validating a research questionnaire: Part 2. Perspectives in Clinical Research 15(1): 42–45.

Sekhon, Cartwright and Francis (2022) Development of a theory-informed questionnaire to assess the acceptability of healthcare interventions. BMC Health Services Research 22(1): 279.

Sniehotta, Schwarzer, Scholz, et al. (2005) Action planning and coping planning for long-term lifestyle change: Theory and assessment. European Journal of Social Psychology 35(4): 565–576.

Someren, Barnard and Sandberg (1994) The think aloud method: A practical guide to modelling cognitive processes. (Knowledge-based systems). Academic Press.

Stemler (2004) A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research, and Evaluation 9: 1–19.

Tomkins-Lane, Melloh, Lurie, et al. (2016) ISSLS prize winner: Consensus on the clinical diagnosis of lumbar spinal stenosis: Results of an international Delphi Study. Spine 41(15): 1239–1246.

Weinman, Petrie, Moss-Morris, et al. (1996) The illness perception questionnaire: A new method for assessing the cognitive representation of illness. Psychology and Health 11(3): 431–445.

Willis (2005) Cognitive Interviewing. Sage Publications, Inc.
