Abstract
Background
A single-item, global measure could be valuable for identifying patients with high treatment burden within clinical practice and research.
Aim
To validate a novel single-item global measure, named the ‘Global Treatment Burden Question’ (GTBQ).
Methods
GTBQ: “how hard have you found the work of looking after your health conditions?” (responses “not hard”, “slightly hard”, “moderately hard”, “very hard”, “extremely hard”). Included participants: 18-65 years, ≥2 long-term conditions. Baseline survey: GTBQ, socio-demographics, Multimorbidity Treatment Burden Questionnaire (MTBQ), quality of life, health literacy. Follow-up survey: GTBQ. Electronic health records data: long-term conditions, consultations. Spearman’s Rank correlation (Rs) and intraclass correlation coefficient (ICC) assessed construct validity and test-retest reliability. GTBQ performance was examined against global MTBQ scores to determine optimal thresholds.
Results
974 (mean age 51) and 97 participants returned baseline and follow-up surveys, respectively. Responses were positively skewed with 22% reporting no burden at baseline. GTBQ scores were positively associated with global MTBQ scores (Rs 0.70); weakly associated with health literacy (Rs 0.36) and healthcare use (Rs 0.27); and negatively associated with physical health (Rs –0.66) and mental health (Rs –0.65). ICC was 0.66. GTBQ demonstrated excellent ability to discriminate between high and non-high treatment burden (area under the curve 0.838). Applying a GTBQ threshold of ≥3 yielded specificity 90%, sensitivity 53%, positive-predictive-value 82%, and negative-predictive-value 69%, indicating utility in ‘ruling in’ high treatment burden.
Conclusion
This novel single-item measure has demonstrated good content and construct validity, moderate test-retest reliability, and strong ability to discriminate between high and non-high treatment burden.
Introduction
Treatment burden is the effort required of patients to look after their health and the impact this has on their everyday life. 1 This includes taking medication, monitoring health conditions (e.g. blood pressure or glucose monitoring), co-ordinating and attending appointments in different healthcare settings, arranging time off work to attend appointments and interacting with a range of healthcare professionals. High treatment burden is associated with poor health outcomes, including reduced quality of life and decreased concordance with medical treatment.2–4
Multimorbidity (multiple long-term conditions in the same individual) is common, with up to two-thirds of patients consulting in UK general practice having two or more long-term conditions. 5 High treatment burden is associated with increased number of long-term conditions.2,3,6,7 Healthcare services are often organised around specific disease areas rather than individual patients. People with multimorbidity are therefore required to attend separate appointments and often need multiple medical treatments to manage their cluster of conditions.
Patient reported outcome measures (PROMs) have been developed and validated to measure treatment burden in research settings.2,8–10 Short measures of treatment burden for use in everyday clinical practice are lacking, however. We are aware of several existing PROMs (the Patient Experience with Treatment and Self-management pilot tool (PETS-Now), the Multimorbidity Treatment Burden Questionnaire (MTBQ;10 and 13 item versions), the Instrument for Patient Capacity Assessment (ICAN) Discussion Aid, and the Morris Single-item Measure).2,11–14 These PROMS have the potential to be used in a clinical setting, but all have limitations.
The Patient Experience with Treatment and Self-management PETS-Now was developed in the United States (US) from the original PETS questionnaire (60-item) and Brief PETS (32-item) as a novel, computer-based tool for use in clinical practice.11,15,16 The tool was designed to be completed in a clinical setting using a computer tablet. The patient selects a single treatment burden domain they are finding most difficult from a list of eight domains (e.g. monitoring health, medicine, getting health info etc) and then answers questions related to that domain. PETS-Now was co-designed by patients, clinicians and academics, and was found to be acceptable. 11 Limitations of the tool are that patients can only report on one aspect of treatment burden, and it does not include a global treatment burden measure.
The Multimorbidity Treatment Burden Questionnaire (MTBQ) is a 10-item measure of treatment burden developed and validated in the UK.2,13 There is also an extended, 13-item version of the MTBQ which includes a further three optional questions. The questionnaire includes a range of treatment burden domains and produces a global score. This PROM demonstrated good content validity, construct validity, internal consistency reliability and responsiveness in the research setting. However, the questionnaire has not been validated for use in a clinical setting.
The ICAN tool was developed to aid discussion between patients and clinicians about the burden of looking after their health and their capacity to do this.12,17 Patients complete three questions: first, a checklist asking ‘are these areas of your life a source of satisfaction, burden, or both?’, second, an open ended question ‘what are the things that your doctors or clinic have asked you to do to care for your health... do you feel that they are a help, a burden, or both?’, and third, a space for free text comments. The clinician then uses one of three set opening questions to spark conversation based on the questionnaire responses, and asks in more detail ‘what stands out to you on this sheet you filled?’ The discussion aid was designed to support relationships between patients and their healthcare team, however, it has not undergone further testing or development for general clinical use.
In the UK, Morris et al. explored a novel single-item measure of treatment burden: ‘On a scale of 0–10, where 0 is no effort and 10 is the highest effort you can imagine, how would you rate the amount of effort you have to put in to manage your health conditions?’ 14 Study findings suggested the tool may have some utility in ruling out high treatment burden. While novel, the measure was not subject to any formal development processes and the authors described it as a ‘starting point’ for iterative work with a patient group.
We aimed to validate a novel single-item global measure of treatment burden, named the ‘Global Treatment Burden Question’ (GTBQ).
Methods
Study setting & design
The GTBQ was developed and validated as part of the ‘Supporting People to Live Well with Multiple Long-Term Conditions’ (SPELL) study, a multi-centre mixed methods study of treatment burden for adults aged 18 to 65 years with multimorbidity in the UK. 18 A key aim of the wider SPELL study was to explore treatment burden in younger adults (18-65 years) with multimorbidity – an under researched group. We performed a cross-sectional study to validate the GBTQ.
GTBQ development
The initial GTBQ was based on the validated MTBQ and the concept of a self-rated global score from the single-item measure.2,13,14 The GTBQ was designed with ease of patient and healthcare provider use in mind, intending to produce a validated global score and to generate discussion between patients and clinicians. Development included three rounds of cognitive interviews with adults with multimorbidity (15 interviews in total), with the GTBQ being refined after each round of interviews. The outcome of this iterative process was the GBTQ; a single question to screen for high treatment burden. A detailed account of the GBTQ development has been reported separately Ref to Development Paper.
Structure and content of GTBQ
The GTBQ comprises a short explanation of the concept of treatment burden and a single global rating question, “Thinking about the last three months, how hard have you found the work of looking after your health conditions?” Response options are “not hard”, “slightly hard”, “moderately hard”, “very hard” and “extremely hard”. The GTBQ, was developed as part of a PROM called the ‘Short Treatment Burden Questionnaire’ (STBQ). This includes the GTBQ (section 1) and two additional sections. Section 2: “Please tick any things you have found hard in the last three months” (list of 13 options). Section 3: free text question, “Is there anything you want to tell us about the things you find hard?” This paper focuses on validating section 1 (the GTBQ).
Study population, eligibility criteria and recruitment
Participants aged 18-65 with two or more long-term conditions (included in the Cambridge Multimorbidity Score) were recruited from 20 general practices serving a range of populations (e.g. socioeconomically deprived/affluent, rural/urban) in and around Bristol and Greater Manchester, UK. 19 Practices were sampled to ensure that areas of socio-economic disadvantage were represented, and more participants were invited from practices serving deprived populations (Index of Multiple Deprivation (IMD) deciles 1-5). 20 Staff from participating general practices identified potential participants through a pre-prepared search of the electronic health records (EHR). . Patients with dementia, those lacking capacity to consent, patients receiving palliative care, and nursing or care home residents were excluded.
Participants were offered a choice of online or postal survey for both the baseline and follow up surveys. Initially, patient participants were sent a paper invitation letter with study information. A paper copy of the questionnaire was enclosed, along with a QR code link to an online version for participants to complete according to their preference. One reminder was sent to non-responders by post, email, text or telephone. For the follow up questionnaire, an additional consent form was sent to participants alongside the baseline questionnaire, and participants were given a choice of online or paper versions.
A follow up survey was sent to a subsample of 180 participants from five of the original 20 practices, 1-4 weeks after their baseline surveys were returned. The responses generated from this subsample were used to measure test-retest reliability of the GTBQ. Participants were sent a £5 Love2Shop voucher for each completed questionnaire. Data collection for baseline and follow-up surveys was undertaken between March and July 2024.
Survey content
The baseline survey included socio-demographic information (age, gender, ethnicity, employment status and postcode); the STBQ (including the GTBQ), the 13-item MTBQ (comparator measure of treatment burden2,13; the Patient-Reported Outcomes Measurement Information System (PROMIS-10) measures of physical and mental health related quality of life 21 ; and the Single-item Literacy Screener (SILS) to detect limited health literacy). 22 The follow-up survey included only the STBQ.
Data from electronic GP records
Anonymised data were obtained from the electronic health records (EHR) of consenting participants, including age, sex, long-term conditions (from a list of 20 included in the Cambridge Multimorbidity Score; CMS) 19 and number of general practice consultations recorded in the preceding 12 months.
Multimorbidity was measured using the validated 20-item CMS. 19 For each participant, a score was assigned based on the presence of the condition: 1 for present and 0 for absent. Each condition was then assigned a weight using the ‘general outcome’ weighting. For instance, the weight for anxiety/depression (0.47) was multiplied by 1 if the condition was present. A CMS score was calculated for each participant by adding up the weights of their conditions.
Analysis
The analysis was performed in Stata (version 18). 23 Descriptive statistics were used to summarise participant characteristics. Psychometric properties of the GTBQ were tested against the International Society for Quality of Life Research (ISOQOL) standards. 24 The analysis plan and results are reported in reference to the six ISOQOL recommended standards:
Conceptual and measurement model
Conceptual framework
Please see the development of the questionnaire section.
Question properties
Question properties were assessed by examining the distribution of responses to the GTBQ, and the proportion of missing responses at baseline.
Dimensionality
Not applicable as the GTBQ is a single-item PROM.
Reliability
To assess test–retest reliability, we calculated the intraclass correlation coefficient (ICC) to assess the agreement (along with the 95% confidence interval) between the GTBQ scores at baseline and follow-up. 25 Values of <0.50, 0.50-0.74,0.75-0.89, and ≥0.90 indicate poor, moderate, good, and excellent reliability, respectively. 25 Follow-up surveys completed within six weeks of the baseline survey were included in the analysis.
Validity
Content validity
The content validity of the GTBQ was assessed via 15 cognitive interviews in the iterative development stage of constructing the questionnaire. A detailed account of the GTBQ development has been reported separately. 26
Construct validity
A GTBQ global score was generated by assigning a numerical value between 0 to 4 to the response selected by the study participant. Scoring was as follows: 0 “not hard”, 1 “slightly hard”, 2 “moderately hard”, 3 “very hard” and 4 “extremely hard”.
Construct validity was examined by testing five pre-specified hypotheses: first, a positive association between GTBQ and MTBQ scores2,13; second, a positive association between GTBQ score and number of general practice appointments in the preceding 12 months; third, a negative association between GTBQ score and health literacy (SILS) 22 ; fourth, a negative association between GTBQ score and physical health (PROMIS-10 Global Physical Health) 21 ; and fifth, a negative association between GTBQ score and mental health (PROMIS-10 Global Mental Health). 21 To test these hypotheses, we applied Spearman’s rank correlation (Rs).
Responsiveness
Assessing responsiveness was beyond the scope of this study.
Interpretability and test performance
To assess the interpretability of the GTBQ, the sensitivity, specificity, positive predictive value, and negative predictive value at each score were calculated. A global MTBQ score of ≥22 was used as a reference standard for high treatment burden. 2 A receiver operating characteristic (ROC) curve was generated, and the area under the curve (AUC) was calculated to assess the effectiveness of the GTBQ at discriminating between high and non-high treatment burden.
We dichotomized the global score into non-high treatment burden (GTBQ 0-2) and high treatment burden (GTBQ 3-4). This binary enables clear identification of those who have high treatment burden compared to those who do not. We then summarized the participant characteristics and key outcome variables, including number of long-term conditions, across these two categories.
Translation
The GBTQ was only administered in English in this study.
Demands on patient respondents and investigators
Demands on patient respondents were assessed during the cognitive interviews. Demands on investigators were not formally assessed given the minimal response processing required to generate a global score.
Sample size
Sample size calculations were conducted to ensure that the assessment of test-retest reliability would yield an interval estimate with adequate precision, rather than aiming for a specific hypothesis test power. 27 To achieve a 95% confidence interval (CI) with a width of 0.2 (i.e., ranging from 0.6 to 0.8) for an ICC of 0.7, 101 participants were needed to complete both the baseline and follow-up questionnaires.
Power calculations were performed to determine a baseline questionnaire sample size for the original SPELL study. This was an exploratory study without one specific hypothesis. The calculations were based on a total sample size of 1,000, a significance level of 0.05, and a binary exposure where the prevalence of that exposure was allowed to vary from 0.2 to 0.5. The baseline risk of high burden was 20% with a risk of high burden of 30% in the group with the exposure. This gave an observed risk of high burden in the population close to what was reported in the original MTBQ paper (26.6%).
Patient and public involvement (PPI)
A PPI group, consisting of eight people with lived experience of multimorbidity, was established at the outset of the SPELL study. The members of this group contributed to the development of the research questions, design of the study and creation of study documents including the initial, interim and final versions of the GTBQ.
Ethical approval and data sharing
Participants gave informed consent to participate in the study before taking part. The SPELL study has ethical approval from the London – Westminster Research Ethics Committee (REC reference 22/PR/1750). Finalised, tabulated data will be freely available as a Final Study Report from the University of Bristol Research Portal.
Results
8532 eligible patients were invited to participate in the baseline survey, resulting in 974 participants (11% response rate) included in the analysis, of whom 968 completed the global GTBQ. Of these, 180 individuals were invited to complete the follow-up survey. 107 completed follow-up surveys were returned, with 97 included in final analysis after removing duplicate responses.
Participant characteristics
Baseline participant characteristics and GTBQ scores.
*Most common 20 long-term conditions included in the Cambridge Multimorbidity Score 19 .
Conceptual and measurement model
Conceptual framework
The GTBQ was adapted from the original validated MTBQ.2,13 The focus of this paper is to validate the GTBQ global rating score. The STBQ was developed with three rounds of five cognitive interviews comprising a think-aloud task and prompts leading to modifications of the STBQ at each round. The GBTQ is a single item question designed to screen for high treatment burden, and a detailed description of the development of the GTBQ and wider STBQ has been reported separately. 26
Question properties
The GTBQ was completed by 968 SPELL study participants (99.4%). The GTBQ scores were positively skewed, with 22% of participants reporting no burden (GTBQ score 0) at baseline.
Dimensionality
Not applicable.
Reliability
The ICC for agreement between GTBQ scores at baseline and follow-up was 0.66 (95% CI 0.53 - 0.76). This indicates moderate test-retest reliability. 25
Validity
Content validity
A detailed account of the GTBQ development has been reported separately.
26
The STBQ underwent iterative development in response to discussion with the PPI group and through cognitive interviews. The 5-point Likert scale for the GBTQ was discussed with the wider research team and PPI group after issues were identified during the cognitive interviews. The response options were revised for clarity (Figures 1 and 2). As an example, the response option “a little” was changed to “slightly” while “quite” became “moderately”. Global Treatment Burden Question (GTBQ) final version. Participant flow diagram.

Construct validity
Correlations between GTBQ score and global MTBQ score, health literacy, healthcare use and health related quality of life.
Responsiveness
Not applicable.
Interpretability and test performance
The receiver operating characteristic (ROC) curve is shown in Figure 3. The area under the curve was 0.838 (95% CI =0.814-0.862), indicating excellent ability to discriminate between high and non-high treatment burden. Applying a GTBQ threshold of ≥3 for high treatment burden yielded a specificity of 90%, sensitivity of 58%, positive predictive value of 82% and negative predictive value of 69% (Table 3). Setting a GTBQ threshold of ≥2 yielded a specificity of 71%, sensitivity of 84%, positive predictive value of 71% and negative predictive value of 84%. Receiver operating characteristic curve for the GBTQ. Sensitivity, specificity, predictive values and likelihood ratios of GTBQ scores to predict high treatment burden (a global score of ≥22 on the 13-item MTBQ).
Comparison of characteristics between participants with non-high and high treatment burden.
Translation
Not applicable.
Demands on patient respondents and investigators
The GTBQ is a single-item global measure of treatment burden. Response options for the global question (GBTQ) were presented in both words and pictures (smiley faces). Missing data were minimal with 0.6% baseline survey respondents not returning a completed baseline GTBQ score. Demands on investigators were also intended to be minimal, with no calculations required to generate a GTBQ score. The Short Treatment Burden Questionnaire (STBQ) comprises the GTBQ and two additional questions and fits onto a single A4 page.
Discussion
Summary of findings
We have validated a novel, single-item global measure of treatment burden, named the ‘Global Treatment Burden Question’ (GTBQ), designed for use in clinical practice and research. The psychometric properties of the GTBQ meet the standards set out by ISOQOL for a novel PROM, demonstrating good content validity, construct validity, interpretability and moderate test-retest reliability. 24 Importantly, the GTBQ’s ability to discriminate between high and non-high treatment burden, based on the area under the curve, was excellent.
Applying a GTBQ threshold of ≥3 generated high specificity (90%) and positive predictive value (82%), indicating utility as a screening tool to identify patients with high treatment burden (low chance of false positives). Setting a GTBQ threshold of ≥3 yielded a moderate sensitivity (53%), indicating that some patients with high treatment burden would be missed (false negatives). The GTBQ is designed as a ‘rule in’ screening tool to detect high treatment burden. The implication of a sensitivity of 53% is that some patients with high treatment burden would go undetected and therefore not receive appropriate support. This could also lead to underestimation of the number of people with high treatment burden.
Applying a threshold of ≥2 improved sensitivity, but at the expense of poorer specificity, though the positive predictive value and negative predictive value remained high. Therefore, if using the GTBQ as a ‘rule in’ screening tool, we recommend applying a threshold of ≥3 to distinguish between high and non-high treatment burden. For simplicity, it is helpful to dichotomise the scale to identify those who have high treatment burden compared to those who do not. However, it may also be helpful to think of the five levels of the GTBQ as no burden, low burden, medium burden, high burden and very high burden.
Our descriptive analysis found that younger individuals living in more deprived areas with higher levels of multimorbidity (particularly painful or mental health conditions) were more likely to report high treatment burden. Further in-depth analysis of these data is planned (to be reported separately). These findings align with the existing literature. The association between younger age and high treatment burden is consistent across several studies using different PROMs.2,3,7,14 One explanation for this is that due to work and childcare commitments, younger adults may have less capacity to manage the work of looking after their health. The original MTBQ study did not find an association between high treatment burden and deprivation, 2 but this has been reported in the US. 16 Several studies using different PROMs report an association between high treatment burden and multimorbidity.2,3,7,14
A strength of this study is that we have validated the GTBQ in a large sample of individuals with multimorbidity. Our sample included a high proportion of individuals living in disadvantaged areas and a proportion of individuals from global majority ethnicities more closely reflective of the UK population than previous work in this area. 28 This improves representation of a diverse population and subsequent generalisability. The GTBQ was developed using a robust process, involving cognitive interviews with people with multimorbidity. A key aim was for the PROM to be simply worded and user friendly. The full STBQ (including the GTBQ and two optional questions) fits on a single A4 page.
A limitation is that the tool has been validated only in adults aged 18-65. This age range was chosen deliberately due to limited research in younger adults with multimorbidity, the focus of the wider SPELL study. 18 However, the GTBQ is designed to be used by adults of any age and it would be useful to validate it in older adults, as well as people with single long-term conditions. A further limitation is that we were unable to assess responsiveness to change. A final limitation is the low response rate. This may limit the generalisability of the study findings as those who took part may differ from those who did not take part. Other primary care and PROM validation studies have also reported low response rates of around 20-35%13,16,29 but not quite as low as this study (11%). One explanation is that response rates in the UK are known to be lower in individuals living in more disadvantaged areas 30 and, to increase representation, we purposefully invited more people from practices serving disadvantaged populations.
Conclusion
The GTBQ is a single-item, global measure of treatment burden that has demonstrated excellent ability to discriminate between high and non-high treatment burden. The brevity of the measure is a strength, particularly for use in clinical practice and trials with multiple outcome/*-s. The GTBQ was developed through a robust and iterative process and validated in a cross-sectional sample of adults with multimorbidity. It has demonstrated good content validity, construct validity, interpretability and moderate test-retest reliability. Further work is planned to assess the feasibility of using the GTBQ and STBQ (with two additional questions) in clinical practice.
Footnotes
Acknowledgements
The study team extends appreciation to the participating general practices and their patients for taking part in the study and to the patient and public involvement group members for their valuable perspectives into the study design. We would also like to thank Dr Michael Lawton from Bristol Centre for Academic Primary Care for his statistical advice.
Ethical considerations
Participants gave informed consent to participate in the study before taking part. The SPELL study has ethical approval from the London – Westminster Research Ethics Committee (REC reference 22/PR/1750). Finalised, tabulated data will be freely available as a Final Study Report from the University of Bristol Research Portal.
Consent to participate
Written informed consent was obtained from all participants in the study at baseline and follow up.
Consent for publication
Consent for publication was not applicable because the manuscript does not contain any individual person’s data in any form.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the NIHR School for Primary Care Research (Grant Reference Number 564) and the South West GP Trust Ref: 2584159.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: PD developed and validated the original Multimorbidity Treatment Burden Questionnaire (MTBQ).
