Validation of the Mandarin-Chinese Teacher Well-Being Scale and Development of a Chinese Context-Specific Extension

Abstract

This study validated the Mandarin-Chinese Teacher Well-Being Scale (TWBS) and developed a context-specific extension (CS-TWBS) to capture culturally grounded dimensions of teacher well-being in China. Two studies were conducted with samples of in-service and former Chinese teachers. Study 1 evaluated the psychometric properties of the translated TWBS using exploratory and confirmatory factor analyses, reliability testing, and measurement invariance analyses. A bifactor model provided the best fit, supporting a predominantly unidimensional structure reflecting general teacher well-being. Study 2 developed and validated the CS-TWBS through cognitive interviews and psychometric testing. The CS-TWBS showed strong internal consistency, clear factorial structure, and expected associations with flourishing, burnout, and job stress. Together, the Mandarin-Chinese TWBS and CS-TWBS provide psychometrically sound and culturally appropriate instruments for assessing teacher well-being in Chinese educational settings and for informing targeted interventions.

Keywords

teacher well-being, scale validation, Chinese context, psychometric properties, cultural adaptation, cross-cultural assessment

Introduction

Teacher well-being has attracted increasing scholarly attention due to its association with both teacher functioning and student development. At the teacher level, teacher well-being has been linked to greater job satisfaction, lower occupational stress, reduced burnout, and lower intentions to leave the profession (Collie, 2023; Gao et al., 2025; Klassen & Chiu, 2010). These outcomes are important given growing concerns about teacher workload and retention across many education systems. Beyond teacher outcomes, extensive research has shown that teacher well-being also has implications for classroom processes and student learning. Teachers who experience higher well-being tend to demonstrate greater instructional quality and teaching effectiveness, and report more supportive relationships with students (Aldrup et al., 2018; Harrison et al., 2023; Klusmann et al., 2008). These classroom practices are in turn associated with higher levels of student engagement, improved academic achievement, and stronger social-emotional development (Granziera et al., 2023; Jennings & Greenberg, 2009; Klusmann et al., 2016; Wang et al., 2021). Such patterns have been documented across school levels. In primary school settings, teacher well-being plays an important role in shaping early teacher-student relationships and students’ socio-emotional adjustment (Aldrup et al., 2018; Jennings & Greenberg, 2009). Within the secondary school context, as academic pressures intensify and the role of teachers becomes increasingly professionalised, the association between teacher well-being and both teaching quality and student academic outcomes becomes more pronounced (Granziera et al., 2023; Klusmann et al., 2008). Across both primary and secondary contexts, teacher well-being serves as an upstream condition that shapes the quality of the educational environment experienced by students. This perspective has been substantiated in the literature across cultures (Granziera et al., 2023; Wang et al., 2021), which highlights the importance of accurately measuring teacher well-being.

Contemporary frameworks conceptualise teacher well-being as multidimensional flourishing that extends beyond the absence of stress and burnout (Diener et al., 2010; Ryan & Deci, 2001; Ryff, 1989; Seligman, 2011). These frameworks emphasise positive functioning, including individuals’ sense of purpose, meaningful engagement in valued activities, and satisfaction of basic psychological needs. In educational contexts, such perspectives suggest that teacher well-being reflects not only the absence of negative states such as stress or exhaustion, but also teachers’ experiences of professional fulfilment, engagement in teaching, and feelings of competence, autonomy, and connectedness within school environments (Curren et al., 2024; Ryan & Deci, 2001).

Despite the established importance of teacher well-being, measuring it remains challenging. A systematic review by Fox et al. (2023) identified considerable heterogeneity in both the conceptualisation and operationalisation of teacher well-being, with many studies relying primarily on deficit-oriented constructs such as stress or burnout. As a result, existing measures often capture negative aspects of teachers’ work experience while overlooking broader dimensions of positive functioning. In response to this gap, several teacher-specific instruments have been developed to assess well-being within the professional context of teaching. One widely used instrument is the Teacher Well-Being Scale (TWBS; Collie et al., 2015), which assesses three dimensions of teacher well-being: workload well-being, organisational well-being, and student interaction well-being. By focusing on teachers’ experience within their professional environment, the TWBS provides a contextually grounded assessment of teacher well-being and has demonstrated strong reliability (α = .82–.85), good model fit (CFI = .94, RMSEA = .055), and expected associations with related constructs such as job stress, job satisfaction, and general well-being (r = −.49, .65, .48).

However, the experience and expression of well-being may vary across contexts (Diener et al., 2010). Cross-cultural evidence suggests that societal values such as power distance, emotional expression norms, and valuation of individual autonomy or collective harmony can influence both the experience of well-being and the ways individuals interpret and respond to self-report measures (Huppert & So, 2013; Ryan & Deci, 2001). Consequently, instruments developed in one cultural context may not fully capture culturally specific dimensions of well-being when applied in different settings.

These considerations are particularly relevant in Chinese educational contexts. Cultural traditions rooted in collectivist orientations prioritise social harmony and interdependence, while Confucian values emphasise teachers’ moral responsibility and devotion to students (Gao et al., 2025). Such cultural influences may shape how teachers perceive their professional roles and sources of fulfilment in teaching, including relationships with students and colleagues, as well as expectations regarding work commitment. Cultural norms may therefore influence not only the experience of teacher well-being but also how teachers interpret and respond to well-being survey items. Consequently, instruments developed in Western contexts may not fully capture culturally relevant dimensions of teacher well-being when applied in Chinese settings.

Research on teacher well-being in China has expanded considerably over the past two decades. Much of this literature, however, has been framed through deficit-oriented perspectives focusing primarily on occupational stress and burnout (Cheng et al., 2023). Studies consistently report high levels of emotional exhaustion among Chinese primary and secondary school teachers, often attributed to factors such as examination-driven accountability systems (Zhao et al., 2015), heavy workloads, and extensive administrative demands associated with ongoing educational reforms (Cheng et al., 2023; Gao et al., 2025). While this body of work has provided important insights into the challenges faced by Chinese teachers, emerging studies have begun to highlight positive aspects of teacher well-being. For example, professional identity rooted in Confucian ideals of the teacher as a moral educator, supportive collegial relationships, and strong commitment to students’ development have been identified as meaningful sources of flourishing (Gao et al., 2025). Nevertheless, measurement approaches in this context have largely relied on instruments adapted from Western burnout or stress frameworks, raising questions about whether existing tools adequately capture the broader and culturally embedded dimensions of well-being experienced by Chinese teachers.

Current Research

The present study comprised two complementary phases aimed at validating and extending teacher well-being measurement in Chinese educational contexts. Study 1 adapted and validated the TWBS for Chinese teachers through forward-backward translation and cognitive interviews (n = 12) to evaluate item appropriateness and identify culturally salient dimensions. Online surveys (N = 653) assessed internal consistency, factorial structure through exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), and measurement invariance across demographic groups. Study 2 (N = 653) developed the Context-Specific Teacher Well-Being Scale (CS-TWBS) based on cognitive interview responses, generating 11 items that collectively capture context-specific concerns identified by Chinese teachers (e.g., compensation, work-life balance, policy implementation, professional status, parental involvement, psychological support) as a unidimensional supplement to the TWBS. Study 2 examined the psychometric properties and factorial structure of the CS-TWBS, and its associations with job stress, burnout, and flourishing. Together, these studies provide comprehensive psychometric evidence for a culturally responsive teacher well-being assessment in the Chinese context and contribute to theoretical understanding and practical measurement of teacher well-being across diverse cultural contexts.

Study 1: Validation of the Mandarin-Chinese Teacher Well-Being Scale

Method

Study 1 translated, culturally adapted, and validated the Teacher Well-Being Scale (TWBS; Collie et al., 2015) for use with Chinese teachers through: (a) translation and cultural adaptation via forward-backward translation and cognitive interviews, and (b) psychometric evaluation through a large-scale survey.

Cross-Cultural Translation and Adaptation

Translation and cultural adaptation of the Teacher Well-Being Scale (TWBS) followed a structured two-stage process designed to ensure semantic, conceptual, and cultural equivalence between the English and Mandarin-Chinese versions.

Three bilingual translators were involved across the two translation phases, following the forward-backward translation approach outlined by Brislin (1970). Two independent forward translations were first produced by bilingual translators who were native Mandarin Chinese speakers with advanced English proficiency and academic background in educational psychology; one of the two forward translators additionally had expertise in psychometric scale development and validation. Each translator independently translated all 16 TWBS items from English into Mandarin Chinese. The two translated versions were then compared item by item. Differences in wording, nuance, and implied meaning were discussed until consensus was reached on a reconciled version. Of the 16 items, 14 were considered equivalent across the two forward translations without modification, while two items required minor revisions following discussion to ensure semantic accuracy and naturalness of expression in Chinese. All decisions regarding item wording were documented throughout this process. Formal interrater agreement statistics (e.g., kappa) were not computed, as translation decisions were reached through structured consensus discussion rather than independent coding.

A third independent bilingual translator, with expertise in Chinese-English translation and interpretation, who had not been involved in the forward translation and was blinded to the original English version, back-translated the preliminary Mandarin-Chinese TWBS into English. All three translators then reviewed the original and back-translated version item by item to assess equivalence. No substantial discrepancies were identified at this stage. Decisions at each stage were documented in writing.

Following translation, cognitive interviews were conducted to evaluate item comprehension, cultural appropriateness, and response processes at the individual respondent level. Twelve Chinese teachers participated in the cognitive interview stage, consistent with recommendations that cognitive pretesting typically involves small samples (e.g., 5 to 15 participants) sufficient to identify comprehension issues in survey instruments (Willis, 2005). Participants indicated a preference for written asynchronous responses, which also offered the practical advantage of reducing evaluation apprehension that may arise when discussing workplace well-being (Tourangeau & Yan, 2007). Participants received the translated TWBS, a structured response booklet with a demographics form, via a Microsoft Word document and provided written responses.

The structured questions addressed five domains: (a) overall representativeness of the scale for Chinese teachers’ experiences, (b) identification of confusing items, (c) evaluation of item phrasing alignment with Chinese linguistic conventions and cultural norms, (d) identification of missing dimensions important to teacher well-being in the Chinese context, and (e) general suggestions regarding scale content and wording. Participant responses were reviewed and thematically examined by the research team to identify potential comprehension or wording problems. Two items were identified as presenting comprehension difficulties due to ambiguous phrasing in the Chinese context. These items were revised to improve clarity while maintaining the original semantic content. The remaining 14 items required no modification. The full bilingual Mandarin-Chinese TWBS is presented in Appendix A.

Responses to question (d) additionally identified several context-specific concerns not captured by the original TWBS, including work-life balance (n = 5 participants), implementation of educational policies (n = 4), school-provided support (n = 4), and salary and benefits (n = 2). These themes informed the development of context-specific items in Study 2, where they were operationalised as a unidimensional measure (see Study 2 Method).

Participants and Procedure

Cognitive Interview

Twelve Chinese teachers (9 female, 3 male; ages 21–50 years), including eight in-service teachers (66.7%) and four former teachers (33.3%), participated. Participants taught (or used to teach) at primary (n = 5), middle (n = 3), or high schools (n = 4), mostly in public (n = 10) and urban settings (n = 10). Teaching experience averaged 6.75 years (SD = 4.40) for in-service teachers and 3.25 years (SD = 3.30) for former teachers. Participants were recruited using snowball sampling through existing contacts; additional recruitment occurred through intermediaries to protect privacy. Participants received materials (translated TWBS, structured questions, demographics) via a Microsoft Word document and provided written responses asynchronously.

Online Survey

Participants were 653 Chinese teachers (559 in-service teachers, 94 former teachers) recruited via Xiaohongshu (Rednote), a social media platform enabling access to diverse teacher populations. Twelve participants who initially indicated in-service status but also reported a non-zero number of years since leaving the profession were recoded as former teachers after data screening. The sample included 201 male teachers (30.8%) and 452 female teachers (69.2%), with ages ranging from 21 to over 60 years (93.1% aged 40 years or younger). Teaching experience ranged from less than 1 year to over 30 years: 404 teachers (61.9%) had 5 or fewer years of experience, 173 (26.5%) had 6 to 10 years, and 56 (8.6%) had 11 or more years. One participant did not report teaching experience, and nineteen cases had logically inconsistent values (e.g., reported teaching experience of 30 years for a participant aged 21–30); all twenty were coded as missing and excluded from teaching experience analyses. Participants taught (or had taught) at primary (n = 317, 48.5%), middle (n = 242, 37.1%), or high schools (n = 94, 14.4%), mostly in public (n = 582, 89.1%) and urban (n = 462, 70.8%) settings.

Former teachers were asked to report how long ago they had left teaching. Seven former teachers did not provide this information. Among those who did (n = 87), the average time since leaving teaching was 1.80 years (SD = 1.32, range = 0.08 to 7 years), indicating that the majority had left the profession relatively recently. Former teachers were instructed to complete the TWBS retrospectively, reflecting on their well-being during their final year of teaching. Including former teachers allowed examination of whether the scale functions similarly across teaching status. Table 1 shows the sociodemographic information.

Table 1.

Demographic Characteristics of Participants

Variable	Category	Sample 1 (n = 12)	Sample 2 (n = 653)	Retest subsample (n = 168)
Gender	Male	3 (25.0%)	201 (30.8%)	20 (11.9%)
Gender	Female	9 (75.0%)	452 (69.2%)	148 (88.1%)
Age range	21–30 years	3 (25.0%)	392 (60.0%)	136 (81.0%)
	31–40 years	8 (66.7%)	216 (33.1%)	26 (15.5%)
	41–50 years	1 (8.3%)	31 (4.7%)	5 (3.0%)
	51–60 years	–	10 (1.5%)	1 (0.6%)
	>60 years	–	4 (0.6%)	–
Teaching status	In-service	8 (66.7%)	559 (85.6%)	147 (87.5%)
	Resigned	4 (33.3%)	85 (13.0%)	20 (11.9%)
	Retired	–	9 (1.4%)	1 (0.6%)
Teaching experience (years)	In-service	6.75 (SD = 4.40)	5.40 (SD = 3.99)	4.55 (SD = 5.91)
Teaching experience (years)	Former	3.25 (SD = 3.30)	8.12 (SD = 9.00)	5.70 (SD = 7.18)
Years out	Former	4.75 (SD = 4.50), n = 4	1.80 (SD = 1.32), n = 87	0.14 (SD = 0.19), n = 9
School level	Primary	5 (41.7%)	317 (48.5%)	92 (54.8%)
	Middle	3 (25.0%)	242 (37.1%)	47 (28.0%)
	High	4 (33.3%)	94 (14.4%)	29 (17.3%)
School type	Public	10 (83.3%)	582 (89.1%)	146 (86.9%)
School type	Private	2 (16.7%)	71 (10.9%)	22 (13.1%)
School region	Urban	10 (83.3%)	462 (70.8%)	130 (77.4%)
School region	Rural	2 (16.7%)	191 (29.2%)	38 (22.6%)

Note. The retest subsample (n = 168) was drawn from Sample 2. Percentages are based on valid responses within each sample. For Sample 2, valid responses for teaching experience were provided by 633 participants (96.9%); 1 case was excluded due to missing teaching experience and 19 due to logically inconsistent values, and all 20 were coded as missing. Seven participants originally identified as in-service but reported a nonzero number of years since leaving and were reclassified as former teachers. Chi-square tests comparing the retest subsample to the full sample indicated no significant differences for teaching status (χ² = 1.11, p = .573) or school type (χ² = 0.86, p = .352). The difference for school region reached significance at the conventional .05 level (χ² = 4.38, p = .036), with a somewhat higher proportion of urban teachers in the retest subsample (77.4% vs. 70.8%). The retest subsample also included a significantly higher proportion of females (χ² = 36.64, p < .001) and younger teachers (χ² = 41.85, p < .001), with less teaching experience (t = 3.03, p = .003), likely reflecting voluntary self-selection into the follow-up survey. Dashes indicate categories with no participants in that sample.

Data were collected from July to September 2025 using Wenjuanxing (www.wjx.cn). The research received ethical approval from the University of Nottingham Research Ethics Committee. All participants provided informed consent.

Test-Retest Reliability

Participants who completed the initial survey were invited to voluntarily complete the Mandarin-Chinese TWBS a second time approximately 2 weeks (14 days) after their first response. The retest survey link was sent to each participant individually 14 days after their initial response, and most participants completed the retest within 1 day of receiving the link. This interval was selected to minimise memory effects while remaining short enough to capture stable well-being scores rather than genuine change (Koo & Li, 2016). To facilitate matching across the two administrations, participants who volunteered for the retest provided their email address at the end of both surveys; email addresses were used solely for matching purposes and were removed from the dataset upon completion of matching to protect participant privacy.

Data Screening

A total of 679 responses were initially received. Data were screened for quality and completeness before analysis. Twenty-six cases (3.8%) were excluded because of (a) a completion time of less than 2 minutes (n = 14), suggesting careless or insufficient-effort responding (Curran, 2016); (b) suspected duplicate submission (n = 6); or (c) straightlining (n = 6; Herzog & Bachman, 1981). This left 653 valid cases for analysis. The overall exclusion rate of 3.8% (26/679) is within acceptable ranges for online survey research (Meade & Craig, 2012).
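The three exclusion rules can be illustrated with a minimal Python sketch. It is not the authors' screening script: the field names (`id`, `seconds`, `respondent_key`, `items`) are hypothetical, and the rule used here for a "suspected duplicate" (a repeated respondent key) is an assumption, since the text does not state how duplicates were identified.

```python
MIN_SECONDS = 120  # 2-minute completion-time threshold described in the text


def screen_responses(responses):
    """Split responses into retained cases and excluded (id, reason) pairs.

    Each response is a dict with hypothetical keys:
      'id'             - submission identifier
      'seconds'        - completion time in seconds
      'respondent_key' - assumed marker used to spot repeat submissions
      'items'          - list of scale item ratings
    """
    seen_keys = set()
    retained, excluded = [], []
    for r in responses:
        if r["seconds"] < MIN_SECONDS:
            excluded.append((r["id"], "too_fast"))
        elif r["respondent_key"] in seen_keys:
            # Any later submission sharing a key with an earlier one is flagged.
            excluded.append((r["id"], "duplicate"))
        elif len(set(r["items"])) == 1:
            # Identical rating on every item is treated as straightlining.
            excluded.append((r["id"], "straightlining"))
        else:
            retained.append(r)
        seen_keys.add(r["respondent_key"])
    return retained, excluded


def exclusion_rate(n_excluded, n_received):
    """Exclusion rate as a percentage of all responses received."""
    return 100 * n_excluded / n_received
```

With the counts reported above, `exclusion_rate(26, 679)` rounds to 3.8.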

Additionally, teaching experience data were cross-referenced with reported age to identify implausible values. One participant did not report teaching experience, and nineteen cases were identified with logically inconsistent values. These cases were coded as missing for teaching experience analysis but retained for other analyses where data were valid.

For the test-retest reliability assessment, 172 responses were initially received. Four cases were excluded during the first-stage survey data screening due to quality concerns identified above. The corresponding cases were also removed from the retest dataset to ensure only matched pairs were analysed. This resulted in a final sample of 168 responses used in the test-retest reliability analysis.

Measures

Participants completed a self-report questionnaire that included sociodemographic questions, the Mandarin-Chinese TWBS, a job stress questionnaire, the Simplified Chinese Version of Flourishing Scale, the Primary and Secondary School Teachers’ Job Burnout Questionnaire, and the Chinese Short Form Marlowe-Crowne Social Desirability Scale. Descriptive statistics for all scales are presented in Table 2.

Table 2.

Descriptive Statistics for All Variables

Variables	M	SD	Skewness	Kurtosis	α	CR
Teacher well-being
Overall teacher well-being	4.83	1.10	−.549	−.314	.95	.95
Workload well-being	4.60	1.22	−.410	−.534	.87	.88
Organisational well-being	4.84	1.20	−.676	−.122	.90	.91
Student interaction well-being	5.13	1.20	−.832	.337	.84	.85
Flourishing	5.34	1.16	−1.21	1.12	.93	.93
Job stress	3.34	0.90	−.035	−.360	–	–
Burnout
Total burnout	1.99	1.08	.316	−.562	.94	.92
Emotional exhaustion (EE)	3.03	1.72	.121	−1.07	.97	.97
Personal accomplishment (PA)	1.45	1.09	1.078	.750	.92	.92
Depersonalisation (DP)	1.48	1.23	.932	.226	.84	.84
Social desirability	8.36	3.16	−.253	−.721	.76	–

Note. M = mean; SD = standard deviation; α = Cronbach’s alpha; CR = composite reliability; skewness and kurtosis values were within acceptable ranges for most variables; three variables (Flourishing, Emotional Exhaustion, Personal Accomplishment) showed values slightly exceeding ±1 but within ±2, which remains acceptable for MLR estimation (Kline, 2023). For the total burnout score, items from the personal accomplishment subscale were reverse-coded so that higher values reflected higher burnout. Higher scores on emotional exhaustion and depersonalisation indicate higher burnout, whereas higher scores on personal accomplishment indicate lower burnout when reported as a subscale. Job Stress was measured using a single-item scale; therefore, α and CR were not applicable.

The Teacher Well-Being Scale

The TWBS consists of 16 items assessing teachers’ workplace experiences across three dimensions: workload well-being (6 items, e.g., “marking work”), organisational well-being (6 items, e.g., “relations with administrators at my school”), and student interaction well-being (4 items, e.g., “student behaviour”). Items are rated on a 7-point Likert scale ranging from 1 (negatively) to 7 (positively), with higher scores indicating greater teacher well-being. The original TWBS demonstrated strong reliability: α = .82–.85 for subscales, and .89 total (Collie et al., 2015). The Mandarin-Chinese version was developed through the translation and adaptation process described above. Internal consistency in the current sample was excellent (see Table 2).

External Constructs

Three external constructs were examined to evaluate convergent and discriminant patterns: flourishing, job stress, and burnout. Scales used to assess these constructs have shown evidence of reliability and validity (e.g., Boyle et al., 1995; Diener et al., 2010; Tang et al., 2016; Wu et al., 2016). Flourishing was selected as a convergent indicator because teacher well-being, as a positive psychological construct, is theoretically expected to be positively associated with broader eudaimonic well-being (Collie et al., 2015; Diener et al., 2010). The Simplified Chinese Version of Flourishing Scale (FS; Diener et al., 2010; Tang et al., 2016) was used to measure flourishing. The FS comprises 8 items rated on a 7-point Likert scale (1 = strongly disagree to 7 = strongly agree), with higher scores indicating greater psychological well-being. Tang et al. (2016) reported α = .93; in the current sample, α = .93. Job stress was selected because it was used in the original TWBS validation and showed a moderate negative correlation with teacher well-being (r = −.49; Collie et al., 2015). Job stress was measured with a single item: “In general, how stressful do you find being a teacher?” (Boyle et al., 1995), rated from 1 (not stressful) to 5 (extremely stressful), with higher scores indicating greater stress.

Burnout, although related to well-being, represents a conceptually distinct construct, and prior research has consistently demonstrated negative associations between teacher well-being and burnout (Granziera et al., 2023); it was therefore selected as a discriminant indicator. The 22-item Chinese Primary and Secondary School Teachers’ Job Burnout Questionnaire (Wu et al., 2016), a Chinese adaptation of the Maslach Burnout Inventory-Educators Survey (MBI-ES; Maslach & Jackson, 1981), was used to measure three dimensions of burnout: emotional exhaustion, personal accomplishment, and depersonalisation. Items are rated from 0 (never) to 6 (daily), with higher scores indicating greater burnout. Wu et al. (2016) reported α = .75–.90 across subscales; in the current sample, α = .84–.97 across subscales.
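The note to Table 2 states that personal accomplishment items were reverse-coded before the total burnout score was formed. A minimal sketch of that step on the 0 to 6 response scale follows. Two points are assumptions rather than details given in the text: the total is taken as the mean of all items, and the split of items into subscales is passed in by the caller because the item assignments are not listed here.

```python
SCALE_MIN, SCALE_MAX = 0, 6  # response range: 0 (never) to 6 (daily)


def reverse_code(rating):
    """Mirror a rating around the scale midpoint (0 <-> 6, 1 <-> 5, ...)."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating {rating} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MAX + SCALE_MIN - rating


def total_burnout(ee_items, dp_items, pa_items):
    """Mean of all items after reverse-coding personal accomplishment.

    Averaging (rather than summing) is an assumption of this sketch.
    Higher values indicate higher burnout.
    """
    scored = list(ee_items) + list(dp_items) + [reverse_code(x) for x in pa_items]
    return sum(scored) / len(scored)
```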

Social Desirability

The 14-item Chinese Short Form Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1960; Tao et al., 2009) was included to assess whether socially desirable responding substantially influenced self-reported well-being scores. Items are answered dichotomously (Yes/No), with higher scores indicating a greater tendency toward socially desirable responding. Tao et al. (2009) reported α = .66; in the current sample, α = .76.

Demographics

Participants reported their gender, age range, teaching status (in-service, resigned, or retired), teaching experience, school level (primary, middle, or high school), school type (public or private), and school region (urban or rural). Former teachers additionally reported the time since leaving the profession.

Data Analysis

Study 1 examined the psychometric properties of the Mandarin-Chinese TWBS through multiple sources of validity evidence. Consistent with contemporary validity theory, which conceptualises validity as a unitary concept referring to the degree to which evidence supports the intended interpretation of test scores for a proposed use (American Educational Research Association, 2014), validity evidence was gathered from multiple sources: response processes, internal structure, and relations to other variables. Internal consistency and test-retest reliability were also examined. All statistical analyses were conducted using IBM SPSS Statistics 29.0 and R 4.5.0.

Data Screening

Data were screened for normality, outliers, and missing values. Distributional assumptions were examined through skewness and kurtosis values, with values within ±2 considered acceptable for normality (Kline, 2023). Cases with more than 20% missing data on key variables were excluded. All scale items were completed by every participant, as the survey platform required responses to all items before proceeding. Missing data occurred only on demographic variables: seven former teachers did not provide years since leaving the profession, and twenty cases had missing or inconsistent teaching experience data. Full information maximum likelihood (FIML) estimation was therefore applied to handle these missing demographic values in the relevant analyses (Enders & Bandalos, 2001).
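The ±2 rule can be made concrete with a short sketch. It assumes the bias-adjusted sample formulas for skewness and excess kurtosis that statistical packages commonly report; the text does not specify which formulas were used, so small numerical differences from the reported values are possible.

```python
from math import sqrt


def skew_kurtosis(values):
    """Return (skewness, excess kurtosis) using bias-adjusted sample formulas."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))  # sample SD
    z3 = sum(((v - m) / s) ** 3 for v in values)
    z4 = sum(((v - m) / s) ** 4 for v in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


def within_normality_bounds(values, limit=2.0):
    """True when both |skewness| and |excess kurtosis| are within the limit."""
    skew, kurt = skew_kurtosis(values)
    return abs(skew) <= limit and abs(kurt) <= limit
```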

Internal Consistency

Internal consistency was assessed with Cronbach’s alpha (α) and composite reliability (CR). Values of .70 or higher were considered acceptable for α (Nunnally & Bernstein, 1994) and adequate for CR (Hair et al., 2010).
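For readers who want to see how the two coefficients are obtained, the sketch below implements the standard formulas. It is illustrative only: the composite reliability function assumes a single factor with standardised loadings and uncorrelated errors, which may differ from the specification used to produce the values in Table 2.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: list of respondents, each a list of k item ratings.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def composite_reliability(std_loadings):
    """Composite reliability from standardised loadings on one factor.

    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    with each error variance taken as 1 - loading^2 (uncorrelated errors assumed).
    """
    sum_sq = sum(std_loadings) ** 2
    error = sum(1 - l ** 2 for l in std_loadings)
    return sum_sq / (sum_sq + error)
```

Both functions return a coefficient that can be compared directly against the .70 criterion.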

Confirmatory Factor Analysis

Four competing models were compared using confirmatory factor analysis (CFA): (a) three-factor (workload, organisational, and student interaction well-being), (b) two-factor (workload/organisational combined, student interaction separate), (c) single-factor (all items on one general factor), and (d) bifactor (general well-being factor plus three specific factors) (Reise et al., 2010). Model fit was evaluated using the robust CFI, TLI, RMSEA, and SRMR. CFI and TLI values of .95 or above were considered evidence of good fit, and values of .90 or above indicated acceptable fit (Hu & Bentler, 1999; McDonald & Marsh, 1990). RMSEA values of .06 or below were interpreted as evidence of good fit, and values of .08 or below indicated acceptable fit. SRMR values of .08 or below were considered indicative of acceptable fit (Hu & Bentler, 1999).
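The cut-offs above amount to a simple decision rule, sketched here in Python. The function name and the "poor" label for values outside the acceptable range are illustrative choices, not terminology from the cited sources.

```python
def classify_fit(cfi, tli, rmsea, srmr):
    """Label each fit index using the cut-offs stated in the text."""

    def incremental(value):
        # CFI / TLI: >= .95 good, >= .90 acceptable
        if value >= 0.95:
            return "good"
        if value >= 0.90:
            return "acceptable"
        return "poor"

    def rmsea_label(value):
        # RMSEA: <= .06 good, <= .08 acceptable
        if value <= 0.06:
            return "good"
        if value <= 0.08:
            return "acceptable"
        return "poor"

    return {
        "CFI": incremental(cfi),
        "TLI": incremental(tli),
        "RMSEA": rmsea_label(rmsea),
        # SRMR: <= .08 acceptable (only one threshold is given in the text)
        "SRMR": "acceptable" if srmr <= 0.08 else "poor",
    }
```

Because the indices can disagree, the function returns one label per index rather than a single overall verdict.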

Relations to Other Variables

Convergent and discriminant patterns were evaluated through correlations with flourishing, job stress, and burnout. Composite scores were computed for each scale and subscale by averaging the relevant item responses, rather than using factor scores derived from the CFA model, to ensure transparency and practical replicability. Positive correlations with flourishing were expected, indicating alignment with related well-being constructs. Negative or weak correlations with job stress and burnout were expected, reflecting the conceptual distinction between well-being and these related but distinct constructs. Correlations of .30 or greater indicated meaningful convergence (Cohen, 1988); correlations below .30 or negative indicated discrimination (Campbell & Fiske, 1959). Magnitudes were interpreted as small (r = .10–.29), medium (r = .30–.49), or large (r = .50 or above).
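A minimal sketch of the scoring and interpretation steps described above: composite scores are item means, and correlation magnitudes are labelled using the stated bands. The "negligible" label for |r| below .10 is an addition of this sketch, since the text does not name that range.

```python
from statistics import mean


def composite_score(item_ratings):
    """Composite score for one respondent: the mean of the relevant items."""
    return mean(item_ratings)


def magnitude(r):
    """Label the size of a correlation, ignoring its sign."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # label assumed here; not defined in the text
```

For example, the correlation of −.49 between job stress and teacher well-being reported by Collie et al. (2015) falls in the medium band.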

Measurement Invariance

Measurement invariance was tested across gender (male vs. female), age range (40 years or younger vs. older than 40 years), teaching status (in-service vs. former teachers), teaching experience (5 years or less, 6 to 10 years, 11 years or more), school level (primary, middle, or high), school type (public vs. private), and school region (urban vs. rural), using configural, metric, and scalar models (Putnick & Bornstein, 2016). Specifically, (a) configural invariance assessed whether the same factor structure held across groups; (b) metric invariance tested whether factor loadings were equivalent across groups; and (c) scalar invariance examined whether both factor loadings and intercepts were equal across groups. Model fit at each stage was evaluated using CFI, TLI, and RMSEA. Changes of .010 or less in CFI and .015 or less in RMSEA were considered evidence of invariance (Chen, 2007; Cheung & Rensvold, 2002).
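The change criteria can be expressed as a comparison between two nested models, for example configural versus metric. The sketch below treats "change" as worsening of fit (a drop in CFI, a rise in RMSEA), which is an interpretive assumption; the dictionary keys and function name are hypothetical.

```python
def invariance_supported(less_constrained, more_constrained,
                         max_cfi_drop=0.010, max_rmsea_rise=0.015):
    """Compare two nested invariance models by change in CFI and RMSEA.

    Each model is a dict with hypothetical keys 'cfi' and 'rmsea'.
    Differences are rounded to three decimals, matching how fit indices
    are usually reported and avoiding floating-point edge cases.
    """
    cfi_drop = round(less_constrained["cfi"] - more_constrained["cfi"], 3)
    rmsea_rise = round(more_constrained["rmsea"] - less_constrained["rmsea"], 3)
    return cfi_drop <= max_cfi_drop and rmsea_rise <= max_rmsea_rise


# Illustrative values only; these are not results from the study.
configural = {"cfi": 0.960, "rmsea": 0.070}
metric = {"cfi": 0.958, "rmsea": 0.071}
metric_holds = invariance_supported(configural, metric)
```

The same function would be applied a second time to the metric and scalar models.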

Test-Retest Reliability

For the 168 participants who completed the TWBS a second time after approximately 2 weeks, Pearson correlation coefficients were calculated between Time 1 and Time 2 scores to assess temporal stability. Correlations of .70 or above were taken to indicate good test-retest reliability (Koo & Li, 2016).
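As a worked illustration, the Pearson coefficient between matched Time 1 and Time 2 scores can be computed as follows. This is a plain-Python sketch with made-up example scores, not the analysis code used in the study.

```python
from math import sqrt


def pearson_r(time1, time2):
    """Pearson correlation between two equally long lists of matched scores."""
    n = len(time1)
    if n != len(time2) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    m1, m2 = sum(time1) / n, sum(time2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(time1, time2))
    sxx = sum((a - m1) ** 2 for a in time1)
    syy = sum((b - m2) ** 2 for b in time2)
    return sxy / sqrt(sxx * syy)


def good_retest_reliability(r, threshold=0.70):
    """Apply the .70 criterion stated in the text."""
    return r >= threshold


# Made-up scores for four respondents, matched across administrations.
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
```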

Social Desirability

Social desirability scores were used to assess the potential influence of socially desirable responding on self-reported well-being scores. Descriptive statistics, reliability coefficients, and bivariate correlations with the TWBS scores were computed to evaluate whether teachers’ self-reported well-being was substantially influenced by socially desirable responding. Correlations below .30 would indicate that social desirability did not substantially confound well-being assessments (Podsakoff et al., 2003).

Results

Descriptive Statistics and Internal Consistency

All variables demonstrated acceptable univariate normality, with skewness and kurtosis values within ±2, indicating approximately normal distributions (Kline, 2023). Descriptive statistics and internal consistency coefficients are presented in Table 2. The Mandarin-Chinese TWBS demonstrated excellent overall reliability (α = .95, CR = .95). Subscale reliability was also excellent: workload well-being (α = .87, CR = .88), organisational well-being (α = .90, CR = .91), and student interaction well-being (α = .84, CR = .85). All coefficients exceeded the .70 threshold (Nunnally & Bernstein, 1994).
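For readers less familiar with the α coefficients reported here, the following sketch shows how Cronbach's α is obtained from a respondents-by-items matrix. It is a generic textbook computation with hypothetical data, not the SPSS or R routine used in the study, and the function name is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                    # number of items
    item_vars = [variance(col) for col in zip(*rows)]   # variance of each item
    total_var = variance([sum(row) for row in rows])    # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from four teachers to three 7-point items
responses = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [4, 4, 5],
]
print(round(cronbach_alpha(responses), 2))
```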

Factor Structure

Confirmatory factor analysis (CFA) examined the latent structure of the Mandarin-Chinese TWBS. Four competing models were compared using weighted least squares mean and variance adjusted (WLSMV) estimation, appropriate for ordinal data with Likert-type response scales (Finney & DiStefano, 2013). Model fit indices are presented in Table 3.

Table 3.

Model Fit Indices for Four Competing Models for the Mandarin-Chinese TWBS

Model	χ²	df	CFI	TLI	RMSEA (90% CI)	SRMR
Three-factor	445.42	101	.93	.92	.086 [.080, .093]	.044
Two-factor	481.33	103	.93	.91	.089 [.083, .096]	.047
One-factor	661.46	104	.89	.87	.107 [.100, .113]	.052
Bifactor	284.04	88	.96	.95	.070 [.063, .077]	.033

Note. χ² = chi-square; df = degrees of freedom; CFI = comparative fit index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation; CI = confidence interval; SRMR = standardised root mean square residual.

**p < .001.

The original three-factor model (workload, organisational, and student interaction well-being) demonstrated marginal fit: χ²(101) = 445.42, CFI = .93, TLI = .92, RMSEA = .086 (90% CI [.080, .093]), SRMR = .044. While SRMR indicated good fit (below .08), CFI and TLI fell slightly below the .95 criterion, and RMSEA exceeded the .08 criterion for acceptable fit (Hu & Bentler, 1999). Critically, interfactor correlations were extremely high: workload–organisational r = .96, workload–student interaction r = .86, organisational–student interaction r = .78, suggesting considerable overlap and limited discrimination between factors (Brown, 2015).

Given the high interfactor correlations, alternative models were tested. The two-factor model (workload and organisational combined, student interaction separate) showed marginally poorer fit: χ² (103) = 481.33, CFI = .93, TLI = .91, RMSEA = .089 (90% CI [.083, .096]), SRMR = .047. The single-factor model demonstrated noticeably poorer fit: χ² (104) = 661.46, CFI = .89, TLI = .87, RMSEA = .107 (90% CI [.100, .113]), SRMR = .052.

The bifactor model (see Figure 1) demonstrated substantially improved fit: χ² (88) = 284.04, CFI = .96, TLI = .95, RMSEA = .070 (90% CI [.063, .077]), SRMR = .033. This model specified a general teacher well-being factor (on which all items loaded), alongside three specific factors (workload, organisational, and student interaction well-being) that captured residual variance. All items showed significant loadings on the general factor (λ = .65–.81), indicating strong common variance. However, several specific-factor loadings were weak (λ below .30) or non-significant, suggesting that the specific factors contributed little unique variance beyond the general factor. Chi-square difference testing confirmed that the bifactor model fit significantly better than the three-factor model, Δχ² (13) = 147.72, p < .001. Given the superior fit and theoretical meaningfulness of a general teacher well-being construct, the bifactor model was retained as the best representation of the Mandarin-Chinese TWBS structure. Standardised factor loadings for the bifactor model are reported in Appendix C.

Figure 1.

Path diagram of the bifactor model of the Mandarin-Chinese TWBS. Note. The model includes one general well-being factor (G) and three orthogonal specific factors: workload well-being (S₁), organisational well-being (S₂), and student-interaction well-being (S₃). Standardised factor loadings are displayed beside each path. All items loaded significantly (p < .001) on the general factor (λ = .65–.81). Several loadings on the specific factors were weak or negative, indicating that the subfactors contributed relatively little unique variance beyond the general factor. Circles represent latent variables, and rectangles represent observed items.

Bifactor Model Evaluation: Unidimensionality

Bifactor-specific indices assessed whether the general factor dominated the common variance and justified a unidimensional interpretation (Rodriguez et al., 2016). Results are presented in Table 4. The general well-being factor explained a large proportion of common variance (ECV = .85) and demonstrated high reliability (ωH = .93). Although the PUC (.70) fell below the .80 threshold, the combination of strong ECV (above .80) and ωH (above .70) supports the dominance of the general factor and justifies a unidimensional interpretation (Reise et al., 2013). The general factor showed excellent construct replicability (H = .95) and factor determinacy (FD = .97), indicating it was well-defined and reliably measured. In contrast, the specific factors (workload, organisational, and student interaction well-being) showed substantially lower reliability (ωH = .01–.19) and construct replicability (H = .27–.44), indicating that they contributed limited unique variance beyond the general factor.

Table 4.

Unidimensionality and Reliability Statistics for the Mandarin-Chinese TWBS

Level	ECV	PUC	ω	ωH	H	FD
Model-level (general)	.85	.70	.96	.93	.95	.97
Workload well-being (S1)	.14	—	.89	.01	.40	.80
Organisational well-being (S2)	.09	—	.89	.07	.27	.63
Student interaction well-being (S3)	.25	—	.84	.19	.44	.77
Item level
IECV range	.65–1.00	—	—	—	—	—
Items with IECV>.70	15 of 16	—	—	—	—	—

Note. ECV = explained common variance; PUC = percent of uncontaminated correlations; ω = omega total reliability; ωH = omega hierarchical; H = construct replicability (H-index); FD = factor determinacy index; IECV = item explained common variance. Em dashes (—) indicate indices not applicable at that level. Recommended thresholds for essential unidimensionality: ECV above .80, ωH above .70, and PUC above .80; H above .70 indicates a well-defined factor (Reise et al., 2013; Rodriguez et al., 2016). Item Q3 (“Student behaviour”) had an IECV of .65, slightly below the .70 threshold.

At the item level, IECV values ranged from .65 to 1.00, with 15 of 16 items exceeding .70; only Q3 (“Student behaviour”) fell slightly below this threshold (IECV = .65). Item responses were therefore predominantly explained by the general factor. These findings support an essentially unidimensional interpretation: the total TWBS score reliably indicates overall teacher well-being, while the three subscale scores provide supplementary domain-specific information.
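The bifactor indices discussed in this section follow directly from the standardised loadings and the item-to-factor assignment. The sketch below shows the standard formulas for PUC, ECV, and ωH; it is not the authors' code. The 6/6/4 split of the 16 items across the three specific factors is an assumption (it is not stated in this section, although it reproduces the reported PUC of .70), and the loadings used in the example are hypothetical.

```python
def puc(group_sizes):
    """Percent of uncontaminated correlations.

    An item pair is 'uncontaminated' when the two items belong to different
    specific factors, so their correlation reflects the general factor only.
    """
    n_items = sum(group_sizes)
    total_pairs = n_items * (n_items - 1) // 2
    within_pairs = sum(k * (k - 1) // 2 for k in group_sizes)
    return (total_pairs - within_pairs) / total_pairs


def ecv(general, specific):
    """Explained common variance of the general factor."""
    gen = sum(g * g for g in general)
    spec = sum(s * s for s in specific)
    return gen / (gen + spec)


def omega_hierarchical(general, specific, groups):
    """Omega hierarchical for standardised loadings in an orthogonal bifactor model.

    groups[i] is the index of the specific factor that item i loads on.
    """
    gen_part = sum(general) ** 2
    spec_part = sum(
        sum(s for s, g in zip(specific, groups) if g == label) ** 2
        for label in set(groups)
    )
    uniqueness = sum(1 - g * g - s * s for g, s in zip(general, specific))
    return gen_part / (gen_part + spec_part + uniqueness)


# Assumed item split: 6 workload, 6 organisational, 4 student interaction
print(round(puc([6, 6, 4]), 2))  # 0.7

# Hypothetical loadings for a four-item, two-group illustration
general = [0.75, 0.70, 0.80, 0.65]
specific = [0.20, 0.25, 0.30, 0.35]
groups = [0, 0, 1, 1]
print(round(ecv(general, specific), 2))
print(round(omega_hierarchical(general, specific, groups), 2))
```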

Relations to Other Variables

Correlations between the Mandarin-Chinese TWBS and three external constructs (flourishing, job stress, and burnout) were used to examine convergent and discriminant patterns (Table 5).

Table 5.

Correlations Between Teacher Well-Being Scores and External Measures

Variable	Flourishing		Job stress		Burnout		Social desirability
Variable	r	95% CI	r	95% CI	r	95% CI	r	95% CI
Total TWBS	.762***	[.73, .79]	−.498***	[−.55, −.44]	−.723***	[−.76, −.68]	.510***	[.45, .56]
Workload WB	.689***	[.65, .73]	−.481***	[−.54, −.42]	−.649***	[−.69, −.60]	.480***	[.42, .54]
Organisational WB	.730***	[.69, .76]	−.480***	[−.54, −.42]	−.688***	[−.73, −.65]	.500***	[.44, .56]
Student interaction WB	.695***	[.65, .73]	−.399***	[−.46, −.33]	−.678***	[−.72, −.63]	.420***	[.35, .48]

Note. N = 653. WB = well-being. Flourishing, job stress, and burnout served as validity evidence based on relations to other variables (convergent and discriminant patterns). Social desirability was assessed separately to evaluate potential response bias. All correlations significant at p < .001.
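The confidence intervals in Table 5 are consistent with the usual Fisher z transformation for a Pearson correlation. The source does not state how the intervals were computed, so the sketch below is an assumption about the method, shown only to make the intervals reproducible from r and N; the function name is hypothetical.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Total TWBS with flourishing, as reported in Table 5 (r = .762, N = 653)
low, high = pearson_ci(0.762, 653)
print(round(low, 2), round(high, 2))  # 0.73 0.79
```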

Convergent Evidence

The TWBS correlated strongly with flourishing (r = .76, 95% CI [.73, .79], p < .001). All three subscales also showed strong positive correlations with flourishing: workload well-being (r = .69), organisational well-being (r = .73), and student interaction well-being (r = .70), all p < .001.

Discriminant Evidence

The TWBS correlated negatively with job stress (r = −.50, 95% CI [−.55, −.44], p < .001) and burnout (r = −.72, 95% CI [−.76, −.68], p < .001). All subscales showed significant negative correlations with both job stress (workload: r = −.48, organisational: r = −.48, student interaction: r = −.40) and burnout (workload: r = −.65, organisational: r = −.69, student interaction: r = −.68), all p < .001. Together, these results provide validity evidence based on relations to other variables, demonstrating appropriate convergence with positive well-being indicators and divergence from negative psychological constructs.

Measurement Invariance

Measurement invariance was examined across seven demographic variables using multigroup CFA with MLR estimation and FIML for missing data (Brown, 2015). Following Chen (2007), invariance was supported when changes in CFI were .010 or less and changes in RMSEA were .015 or less. Results are summarised in Table 6.

Table 6.

Measurement Invariance Across Groups

Grouping variable	Model	χ² (df)	CFI	TLI	RMSEA [90% CI]	SRMR	ΔCFI	ΔRMSEA	p
Gender (Male/female)
	Configural	554.04 (202)	.93	.91	.088 [.081, .095]	.049	—	—	—
	Metric	585.97 (215)	.92	.92	.088 [.081, .095]	.061	−.004	.000	.002**
	Scalar	636.39 (228)	.92	.91	.088 [.082, .095]	.065	−.008	.001	<.001***
	Strict	666.53 (244)	.91	.91	.087 [.081, .093]	.065	−.003	−.001	.017*
Age (≤40/>40)
	Configural	650.31 (202)	.92	.91	.091 [.084, .098]	.048	—	—	—
	Metric	665.76 (215)	.92	.91	.089 [.082, .096]	.050	.000	−.002	.258
	Scalar	677.99 (228)	.92	.92	.086 [.079, .092]	.051	.000	−.002	.811
	Strict	714.47 (244)	.92	.92	.084 [.077, .090]	.051	−.004	−.001	.008**
Teaching status (in-service/former)
	Configural	611.52 (202)	.93	.91	.090 [.083, .097]	.048	—	—	—
	Metric	638.80 (215)	.93	.92	.088 [.082, .095]	.053	−.003	−.002	.016*
	Scalar	664.37 (228)	.92	.92	.087 [.081, .094]	.054	−.004	−.001	.001**
	Strict	698.02 (244)	.92	.92	.086 [.080, .092]	.055	−.003	−.001	.010*
Teaching experience (≤5/6–10/11+)
	Configural	696.96 (303)	.93	.91	.091 [.083, .098]	.054	—	—	—
	Metric	730.99 (329)	.93	.92	.088 [.081, .095]	.063	−.002	−.002	.136
	Scalar	763.83 (355)	.92	.92	.085 [.078, .092]	.064	−.001	−.002	.349
	Strict	799.15 (387)	.92	.93	.082 [.076, .089]	.064	−.001	−.003	.300
School level (primary/middle/high)
	Configural	735.96 (303)	.92	.91	.095 [.088, .102]	.052	—	—	—
	Metric	781.79 (329)	.92	.91	.093 [.086, .100]	.066	−.004	−.002	.015*
	Scalar	832.70 (355)	.91	.91	.091 [.084, .098]	.067	−.005	−.001	.005**
	Strict	893.83 (387)	.91	.91	.090 [.084, .097]	.067	−.005	−.001	.001**
School type (public/private)
	Configural	588.63 (202)	.93	.92	.089 [.082, .096]	.047	—	—	—
	Metric	600.80 (215)	.93	.92	.086 [.079, .093]	.050	.000	−.002	.595
	Scalar	618.85 (228)	.93	.93	.084 [.077, .090]	.050	−.001	−.002	.366
	Strict	642.73 (244)	.93	.93	.082 [.075, .088]	.051	−.001	−.002	.135
School region (urban/rural)
	Configural	582.42 (202)	.93	.91	.091 [.084, .098]	.048	—	—	—
	Metric	597.19 (215)	.93	.92	.088 [.081, .095]	.053	.000	−.002	.606
	Scalar	614.24 (228)	.93	.92	.085 [.079, .092]	.053	−.001	−.002	.525
	Strict	627.34 (244)	.93	.93	.082 [.076, .089]	.054	.001	−.003	.716

Note. ΔCFI ≤.010 and ΔRMSEA ≤.015 indicate acceptable invariance (Cheung & Rensvold, 2002). CI = confidence interval. All models estimated using robust maximum likelihood (MLR) with full information maximum likelihood (FIML) for missing data.

*p < .05, **p < .01, ***p < .001.

The Mandarin-Chinese TWBS demonstrated full measurement invariance (configural, metric, scalar, and strict) across all seven variables: gender (male, n = 201; female, n = 452), age (40 years or younger, n = 608; older than 40 years, n = 45), teaching status (in-service, n = 559; former, n = 94), teaching experience (5 years or fewer, n = 404; 6–10 years, n = 173; 11 years or more, n = 56), school level (primary, n = 317; middle, n = 242; high, n = 94), school type (public, n = 582; private, n = 71), and school region (urban, n = 462; rural, n = 191). Across all comparisons, absolute changes in CFI ranged from .000 to .008, and absolute changes in RMSEA ranged from .000 to .003, well within established thresholds (Chen, 2007; Cheung & Rensvold, 2002). The scaled chi-square difference test indicated significant model deterioration at the metric, scalar, and strict levels for gender, teaching status, and school level, and at the strict level for age (see Table 6). However, chi-square difference tests are known to be overly sensitive to sample size and tend to reject invariance even when practical fit does not meaningfully worsen (Chen, 2007; Cheung & Rensvold, 2002). Accordingly, invariance decisions were based on changes in CFI and RMSEA, which are less susceptible to sample-size inflation and are recommended as the primary criteria in applied measurement invariance research (Chen, 2007). Twenty participants with missing or inconsistent teaching experience data were excluded from the teaching experience analysis, resulting in n = 633. These findings indicate that the scale measures teacher well-being equivalently across diverse subgroups and support meaningful group comparisons (Putnick & Bornstein, 2016).

Test-Retest Reliability

Test-retest reliability was examined among 168 participants who completed the TWBS a second time approximately 2 weeks (14 days) after their initial response. Descriptive statistics indicated stable mean scores: Time 1 (M = 5.14, SD = 0.83) and Time 2 (M = 5.07, SD = 0.87). The total scale demonstrated excellent reliability: ICC(2,1) = .88, 95% CI [.84, .91]; ICC(2,k) = .93, 95% CI [.91, .95]; r = .88, 95% CI [.84, .91], p < .001, all exceeding the .75 threshold (Koo & Li, 2016). Subscales showed good reliability: workload (ICC = .80, r = .81), organisational (ICC = .86, r = .86), and student interaction (ICC = .75, r = .74). Paired t-tests revealed no significant mean differences except for workload (t [166] = 2.17, p = .031, d = 0.10), a minimal effect that does not undermine temporal stability. These results support the temporal consistency of the TWBS over 2 weeks (Table 7).

Table 7.

Test-Retest Reliability of the Mandarin-Chinese TWBS (n = 168)

Scale	Time 1 M (SD)	Time 2 M (SD)	r [95% CI]	ICC(2,1) [95% CI]	ICC(2,k) [95% CI]
Total TWBS	5.14 (0.83)	5.07 (0.87)	.88 [.84, .91]	.88 [.84, .91]	.93 [.91, .95]
Workload well-being	4.90 (1.01)	4.79 (1.05)	.81 [.75, .86]	.80 [.74, .85]	.89 [.86, .92]
Organisational well-being	5.18 (0.92)	5.14 (0.95)	.86 [.80, .89]	.86 [.81, .89]	.92 [.89, .94]
Student interaction well-being	5.44 (0.79)	5.40 (0.83)	.74 [.67, .81]	.75 [.68, .81]	.86 [.81, .90]

Note. N = 168. r = Pearson correlation; ICC(2,1) = single-measure intraclass correlation coefficient; ICC(2,k) = average-measure intraclass correlation coefficient. Two-week interval between administrations.

All correlations significant at p < .001.
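The two ICC columns in Table 7 are linked by the Spearman-Brown relation: the average-measure coefficient for k administrations follows from the single-measure coefficient. The sketch below illustrates that relation with k = 2 (two administrations); it is a check on the reported values rather than the estimation routine used in the study, and because it starts from rounded coefficients it can differ from the table in the second decimal.

```python
def average_measure_icc(single_icc, k=2):
    """Step a single-measure ICC up to the average of k measurements."""
    return k * single_icc / (1 + (k - 1) * single_icc)


# Workload well-being: ICC(2,1) = .80 in Table 7
print(round(average_measure_icc(0.80), 2))  # 0.89
```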

Social Desirability

Correlations between the Chinese-language Short Form of the Marlowe-Crowne Social Desirability Scale (MC-SDS; Tao et al., 2009; M = 8.36, SD = 3.16) and the TWBS assessed response bias (see Table 5). The MC-SDS correlated positively with the total TWBS (r = .51, 95% CI [.45, .56]) and with the subscales: workload (r = .48, 95% CI [.42, .54]), organisational (r = .50, 95% CI [.44, .56]), and student interaction (r = .42, 95% CI [.35, .48]), all p < .001. While statistically significant, correlations fell below the .70 threshold for problematic overlap (Paulhus, 1991). This pattern is consistent with previous research in collectivistic cultures, where expressing contentment reflects both genuine well-being and culturally valued self-presentation (Lu & Gilmour, 2004; Ones et al., 1996), supporting the scale’s utility for research purposes.

Study 2: Development and Validation of the Context-Specific Teacher Well-Being Scale

Building on Study 1, Study 2 developed and validated a context-specific extension to the TWBS, capturing aspects of teacher well-being unique to the Chinese educational context. During the cognitive interviews in Study 1, participants were asked: “Are there any aspects of your well-being as a teacher that are important but were not captured by the questions in this scale?” Cognitive interview responses revealed consistent concerns related to the Chinese educational context, workplace dynamics, and societal expectations. Based on these concerns, an 11-item Context-Specific Teacher Well-Being Scale (CS-TWBS) was developed as a unidimensional supplement to the original 16-item TWBS.

Study 2 examined CS-TWBS psychometric properties, its relationship with the TWBS, and whether it provides incremental evidence beyond the original scale. Based on findings, recommendations are provided regarding when researchers should use the TWBS alone, the CS-TWBS alone, or the combined 27-item scale.

Method

Item Development

Eight of twelve interviewed participants identified missing content, generating 20 content units that revealed six primary themes: (a) Work-Life Balance (25%), (b) Compensation and Facilities (15%), (c) Career Development and Professional Status (20%), (d) Policy Implementation Burden (20%), (e) External Support Systems (20%), and (f) Professional Identity and Self-Efficacy (10%). These themes were used to guide item generation, ensuring comprehensive coverage of Chinese-specific contextual factors affecting teacher well-being; however, the CS-TWBS was designed from the outset as a unidimensional measure of the overall influence of context-specific factors on teacher well-being, rather than as a multidimensional scale with separate subfactors for each theme.

Based on these themes, 11 items were developed following the TWBS noun-phrase format (e.g., “Work-life balance,” “salary and benefits”), using the same instructional prompt (“Currently, how do the following aspects of being a teacher affect your well-being as a teacher?”) and 7-point Likert scale (1 = Negatively to 7 = Positively). Eleven items are more than sufficient for a unidimensional scale (Worthington & Whittaker, 2006), and this item count allowed representation of the breadth of context-specific concerns identified by teachers while maintaining parsimony for practical use alongside the 16-item TWBS. The CS-TWBS was intended to supplement rather than replace universal TWBS dimensions.

Participants and Procedure

The same 653 Chinese teachers from Study 1 completed the CS-TWBS (demographics in Table 1). Using the same sample allowed direct examination of the relationship between universal (TWBS) and context-specific (CS-TWBS) dimensions and rigorous incremental prediction testing. Data collection was conducted in a single online survey via Wenjuanxing: participants completed the TWBS, the CS-TWBS, and measures of related constructs (approximately 10 minutes in total). This order minimised potential fatigue effects on the longer TWBS while allowing comprehensive consideration of well-being.

Measures

The Mandarin-Chinese TWBS

The 16-item Mandarin-Chinese TWBS validated in Study 1 assessed workload, organisational, and student interaction well-being (7-point scale: 1 = Negatively to 7 = Positively; α = .95, ω = .96) and was used as the reference measure for CS-TWBS validation and incremental validity assessment in Study 2.

Context-Specific Teacher Well-Being Scale (CS-TWBS)

The 11-item scale measured context-specific dimensions using the same format and 7-point scale (1 = Negatively to 7 = Positively) as the TWBS. Items were drawn from six content areas identified during cognitive interviews: work-life balance (1 item), compensation and facilities (2 items), career development and professional status (2 items), policy implementation burden (2 items), external support systems (2 items), and professional identity and self-efficacy (2 items). These content areas guided item development to ensure comprehensive coverage, but do not represent separate subscales. The CS-TWBS is scored as a unidimensional scale. Higher scores indicate that the aspect affects well-being more positively. The full bilingual CS-TWBS is presented in Appendix B.

Measures of Related Constructs

The Simplified Chinese version of the Flourishing Scale (Tang et al., 2016), the Chinese Primary and Secondary School Teachers' Job Burnout Questionnaire (Wu et al., 2016), and the Job Stress Scale (Boyle et al., 1995) were used to examine convergent and discriminant patterns in relation to other variables (details of the scales are presented in Study 1).

Data Analysis

Data analyses examined the psychometric properties of the CS-TWBS and its relationship to the TWBS using IBM SPSS Statistics version 29.0 and R 4.5.0. Exploratory factor analysis (EFA) examined the underlying factor structure of the CS-TWBS using principal axis factoring with oblique rotation; the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test assessed data suitability. Confirmatory factor analyses (CFA) with MLR estimation validated the factor structure of (a) the CS-TWBS alone, (b) the TWBS, and (c) the combined 27-item scale. Model fit was judged against the following criteria: CFI and TLI of .90 or above, and RMSEA and SRMR of .08 or below (Hu & Bentler, 1999). For the combined scale, a bifactor model specified a general well-being factor plus specific subfactors (workload, organisational, student interaction, context-specific).

Unidimensionality was assessed via omega hierarchical (ωH), explained common variance (ECV), and percentage of uncontaminated correlations (PUC); ωH and ECV of .70 or above and PUC above .80 indicate essential unidimensionality (Rodriguez et al., 2016). Internal consistency was assessed using Cronbach’s α and McDonald’s ω (.70 or above acceptable; .80 or above good). Convergent and discriminant patterns were evaluated through correlations with flourishing, job stress, and burnout, and the relationship with the TWBS was examined to assess the degree of overlap between universal and context-specific dimensions. Composite scores were computed by averaging item responses for each scale, consistent with the approach used in Study 1. Incremental validity was tested using hierarchical multiple regression: the TWBS was entered in Step 1 and the CS-TWBS was added in Step 2. A statistically significant ΔR² indicates a unique CS-TWBS contribution. Competing structural models were compared using fit indices (AIC, BIC) and theoretical interpretability.
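The logic of the two-step incremental validity test can be illustrated without fitting a regression. With one predictor in Step 1 and a second added in Step 2, R² at each step has a closed form in terms of the three bivariate correlations. The sketch below uses that closed form in place of the SPSS or R regression routine named above; the function names are assumptions, the input correlations are hypothetical rather than values from the study, and the significance test of ΔR² is not covered.

```python
def r2_step1(r_y1):
    """R-squared with a single predictor (e.g., TWBS only)."""
    return r_y1 ** 2


def r2_step2(r_y1, r_y2, r_12):
    """R-squared with two predictors, from their bivariate correlations.

    r_y1, r_y2: correlations of each predictor with the outcome.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def delta_r2(r_y1, r_y2, r_12):
    """Gain in explained variance when the second predictor is added."""
    return r2_step2(r_y1, r_y2, r_12) - r2_step1(r_y1)


# Hypothetical correlations, not values from the study
print(round(delta_r2(r_y1=0.60, r_y2=0.65, r_12=0.70), 3))
```

When the second predictor relates to the outcome only through its overlap with the first (r_y2 = r_y1 × r_12), the gain is zero, which is the situation the incremental test is designed to rule out.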

Results

Descriptive Statistics and Internal Consistency

The CS-TWBS showed excellent internal consistency (α = .96, ω = .96). Mean total score was M = 4.75 (SD = 1.38), indicating moderate context-specific well-being. Item means ranged from 4.34 (Facilitating non-teaching activities) to 5.08 (teachers’ professional identity), with adequate variability (SD = 1.53–1.85). Teachers reported the highest well-being for professional identity (M = 5.08) and self-efficacy (M = 4.93), and lowest for facilitating non-teaching activities (M = 4.34) and psychological support (M = 4.58). All items showed approximately normal distributions (skewness: −0.49 to −0.96; kurtosis: −0.79 to 0.31), supporting parametric analyses. Descriptive statistics are in Table 8.

Table 8.

Descriptive Statistics for the Context-Specific Teacher Well-Being Scale (CS-TWBS)

Item	Item wording (English)	M	SD	Skewness	Kurtosis
Q1	Facilitating non-teaching activities	4.34	1.75	−0.49	−0.79
Q2	Work-life balance	4.72	1.59	−0.72	−0.27
Q3	Implementation of educational policies	4.75	1.57	−0.51	−0.56
Q4	Work and leisure facilities provided by school	4.82	1.66	−0.71	−0.29
Q5	Salary and benefits	4.87	1.64	−0.69	−0.25
Q6	Professional development opportunities	4.78	1.63	−0.57	−0.43
Q7	Teachers’ social status	4.66	1.78	−0.66	−0.57
Q8	Parental cooperation with and recognition of educational work	4.74	1.60	−0.54	−0.51
Q9	Psychological support for teacher provided by school	4.58	1.85	−0.63	−0.65
Q10	Teachers’ teaching self-efficacy	4.93	1.53	−0.88	0.31
Q11	Clear sense of professional identity	5.08	1.63	−0.96	0.21
CS-TWBS total		4.75	1.38	−0.75	−0.38

Note. N = 653. Items rated on a 7-point scale (1 = Negatively, 7 = Positively). CS-TWBS = Context-Specific Teacher Well-Being Scale. Internal consistency: Cronbach’s α = .96; McDonald’s ω = .96.

Exploratory Factor Analysis

EFA was conducted in three stages using principal axis factoring with oblique rotation.

Stage 1: Original TWBS (16-Item)

Preliminary checks indicated excellent sampling adequacy (KMO = .96; Bartlett’s test of sphericity, χ² (120) = 6887.35, p < .001). A two-factor solution explained 63.14% of the total variance: Factor 1 (8 items: Q2, Q4, Q5, Q7, Q10, Q11, Q15, Q16) and Factor 2 (6 items: Q1, Q3, Q6, Q9, Q12, Q14); Q8 and Q13 showed weak loadings or cross-loadings. The high interfactor correlation (r = .78) indicated strongly related dimensions. The pattern matrix for the original TWBS is presented in Appendix D.

Stage 2: CS-TWBS (11-Item)

Sampling adequacy was excellent (KMO = .956; Bartlett’s χ² (55) = 6145.75, p < .001). A clear single-factor solution explained 69.21% variance, with all items loading strongly (λ = .74–.88), confirming the unidimensional structure of the CS-TWBS. The factor matrix for the CS-TWBS is presented in Appendix E.

Stage 3: Combined Scale (27-Item)

The combined 27-item scale also demonstrated excellent sampling adequacy (KMO = .976; Bartlett’s χ² (351) = 14,485.50, p < .001). A two-factor solution explained 62.90% variance: Factor 1 (all 11 CS-TWBS items and seven TWBS items: Q2, Q5, Q7, Q10, Q11, Q15, Q16), Factor 2 (6 TWBS items: Q1, Q3, Q6, Q9, Q12, Q14); Q4, Q8, and Q13 showed cross-loadings across both factors. The correlation between the two factors was also high (r = .80). The pattern matrix for the combined scale is presented in Appendix F.

Confirmatory Factor Analysis

A single-factor CFA model for the CS-TWBS showed acceptable fit on CFI, TLI, and SRMR, although RMSEA exceeded the .08 criterion (χ² [44] = 375.91, CFI = .95, TLI = .93, RMSEA = .108, SRMR = .035). To further evaluate dimensionality, alternative two-factor and three-factor CFA models were compared with the single-factor solution. The two-factor model showed negligible improvement in fit (χ² [43] = 369.31, CFI = .95, TLI = .93, RMSEA = .108) and produced a near-unity factor correlation (r = .99). The three-factor model similarly showed minimal fit improvement (χ² [41] = 352.75, CFI = .95, TLI = .93, RMSEA = .108), with interfactor correlations ranging from .97 to .99. These near-unity correlations indicate that the factors were empirically indistinguishable, providing further support for the unidimensional interpretation of the CS-TWBS.

For the combined 27-item scale, four competing models were compared (Table 9). The three-factor model, based on the combined EFA factor structure, showed acceptable fit (χ² [321] = 1123.73, CFI = .93, TLI = .92, RMSEA = .062, SRMR = .038). The two-factor model showed slightly weaker fit (χ² [323] = 1305.85, CFI = .91, TLI = .90, RMSEA = .068, SRMR = .041). The second-order model yielded nearly identical fit to the two-factor model (χ² [322] = 1301.81, CFI = .91, TLI = .90, RMSEA = .068, SRMR = .041). The bifactor model demonstrated the best fit (χ² [297] = 1022.8, CFI = .93, TLI = .92, RMSEA = .061, SRMR = .035), representing a general well-being factor plus four specific domains (workload, organisational, student interaction, context-specific). Although the three-factor model had the lower BIC (52028 vs. 52183), the bifactor model had the lower AIC (51699 vs. 51772) and was preferred for its better overall fit and theoretical advantages.

Table 9.

Fit Indices for Competing Models of the Combined 27-Item Scale

Model	χ²	df	CFI	TLI	RMSEA [90% CI]	SRMR	AIC	BIC
3-Factor	1123.73	321	.93	.92	.062 [.058, .065]	.038	51772	52028
2-Factor	1305.85	323	.91	.90	.068 [.065, .072]	.041	52005	52251
2nd-order	1301.81	322	.91	.90	.068 [.065, .072]	.041	52007	52258
Bifactor	1022.8	297	.93	.92	.061 [.057, .065]	.035	51699	52183

Note. N = 653. All models estimated using robust maximum likelihood (MLR). All fit indices reported are robust scaled values. 3-Factor = based on combined EFA factor structure (see Appendix F); 2-Factor = based on EFA results; 2nd-Order = two first-order factors with a higher-order general factor; Bifactor = general well-being factor plus four specific factors (workload, organisational, student interaction, context-specific). CFI = comparative fit index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation; SRMR = standardised root mean square residual; AIC = Akaike information criterion; BIC = Bayesian information criterion. Lower AIC/BIC values indicate better model parsimony.

Bifactor Modelling and Unidimensionality

In the bifactor model, all items loaded strongly on the general factor (λ = .63–.83), with weaker domain-specific loadings (λ = .01–.50). Unidimensionality indices (Table 10) strongly supported essential unidimensionality: ω_h = .88 (88% reliable variance from general well-being factor, exceeding .80 threshold), ECV = .80 (80% shared variance from general factor, exceeding .70 criterion), PUC = .79 (approached .80 threshold). Overall reliability was excellent (α = .97 and ω_t = .98). These findings support using a total score to represent overall teacher well-being.

Table 10.

Indices of Unidimensionality and Reliability for the Combined TWBS (27 Items)

Index	Value
Omega total (ω_t)	.98
Omega hierarchical (ω_h)	.88
Cronbach’s α	.97
Explained common variance (ECV)	.80
Mean percent general variance	.79
RMSEA	.055
Correlation of factor scores with general factor	.96
R²	.91

Note. N = 653. ω_t = omega total, representing the proportion of total variance in scores attributable to all sources of common variance (general and specific factors combined); ω_h = omega hierarchical, representing the proportion of total variance attributable to the general factor alone. ECV represents the proportion of common variance due to the general factor. PUC represents the proportion of correlations unaffected by specific factors. All criteria for essential unidimensionality were met or approached, supporting the use of the total score.

Relations to Other Variables

Both the CS-TWBS and the combined scale demonstrated theoretically expected patterns (Table 11). The CS-TWBS correlated positively with flourishing (r = .78, p < .001) and negatively with job stress (r = −.52, p < .001) and burnout (r = −.73, p < .001). The combined scale showed similar patterns: flourishing (r = .79), job stress (r = −.53), and burnout (r = −.75), supporting convergent and discriminant patterns. In addition, the CS-TWBS showed an association with social desirability (r = .50, p < .001) similar to that of the TWBS (r = .51), suggesting comparable levels of socially desirable responding across both scales (see Study 1 for interpretation).

Table 11.

Correlations Between Teacher Well-Being Scales and Measures of Related Constructs

External measure	CS-TWBS (11-item)	Combined scale (27-item)
Flourishing	.78***	.79***
Job stress	−.52***	−.53***
Burnout	−.73***	−.75***

Note. N = 653. CS-TWBS = Context-Specific Teacher Well-Being Scale. All correlations are significant at p < .001. Higher well-being scores indicate greater well-being.

Relationship Between Mandarin-Chinese TWBS and CS-TWBS

The relationship between the TWBS and CS-TWBS was examined to determine whether the two instruments assess overlapping or distinct aspects of teacher well-being, and whether their combination is empirically justified. Conceptually, both scales assess teacher well-being but from complementary perspectives: the TWBS captures universal workplace dimensions (workload, organisational, and student interaction), while the CS-TWBS reflects contextual factors specific to the Chinese educational environment (e.g., compensation, policy implementation, parental involvement). If the two scales share substantial common variance while the CS-TWBS also contributes meaningful unique variance, this would support their combined use.

The TWBS and CS-TWBS were strongly correlated (r = .89, p < .001, 95% CI [.87, .90]), indicating substantial shared variance (78.8%), consistent with both scales assessing the broader construct of teacher well-being. Critically, the remaining 21.2% of variance in the CS-TWBS was not shared with the TWBS, indicating that context-specific factors capture meaningful aspects of well-being not assessed by the TWBS alone (Table 12). At the subscale level, the CS-TWBS correlated most strongly with organisational well-being (r = .87), followed by workload (r = .83) and student interaction (r = .74; all p < .001), consistent with the CS-TWBS emphasis on organisational and systemic factors.

Table 12.

Correlations Between TWBS-16 and CS-TWBS

Variable	r	95% CI	r²
TWBS total	.888***	[.87, .90]	.788
Workload well-being	.829***	[.80, .85]	.687
Organisational well-being	.869***	[.85, .89]	.756
Student interaction well-being	.735***	[.70, .77]	.540

Note. N = 653. CS-TWBS = Context-Specific Teacher Well-Being Scale.

All correlations are significant at ***p < .001.
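The shared-variance and confidence-interval columns of Table 12 can be approximated from r and N alone. The sketch below assumes the interval was obtained with the Fisher z transformation, which the article does not state; the function names are illustrative.

```python
import math


def shared_variance(r):
    """Proportion of variance two measures share, given their correlation."""
    return r ** 2


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation using the Fisher z transformation."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r, n = 0.888, 653                   # TWBS total row of Table 12
lower, upper = fisher_ci(r, n)
print(f"shared variance = {shared_variance(r):.3f}")
print(f"unshared variance = {1 - shared_variance(r):.3f}")
print(f"95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because the input is the rounded correlation, the squared value can differ from the tabled r² in the third decimal place.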

Structural Model Comparison and Usage Recommendations

To determine the optimal structural representation of the combined instruments, three competing models were compared (Table 13). The correlated four-factor model (TWBS three factors plus CS-TWBS as the fourth factor) tested whether the CS-TWBS functions as an additional distinct dimension alongside the existing TWBS factors. This model showed acceptable fit (CFI = .93, TLI = .93, RMSEA = .071, SRMR = .039). The combined 27-item bifactor model tested whether all items reflect a single general well-being factor with domain-specific residual variance. This model demonstrated the best overall fit (CFI = .95, TLI = .94, RMSEA = .061, SRMR = .028). The bifactor TWBS plus correlated CS-TWBS model tested whether the TWBS is best represented as a bifactor structure with the CS-TWBS as a correlated but structurally distinct supplementary scale; this model also showed excellent fit (CFI = .95, TLI = .94, RMSEA = .063, SRMR = .032).

Table 13.

Structural Model Comparison for TWBS and CS-TWBS Integration

Model	χ²	df	CFI	TLI	RMSEA	SRMR	AIC	BIC
4-Factor correlated	1144.49	318	.93	.92	.071	.039	51,805	52,073
Combined bifactor	844.67	287	.95	.94	.061	.028	51,452	51,860
TWBS bifactor + CS-TWBS*	909.89	298	.95	.94	.063	.032	51,529	51,887

Note. N = 653. All models estimated using robust maximum likelihood (MLR). 4-Factor = TWBS-16 three factors (workload, organisational, student interaction) + CS-TWBS as fourth factor; Combined Bifactor = all 27 items on general factor + four specific factors; TWBS-16 Bifactor + CS-TWBS = TWBS-16 as bifactor (general + three specific) with CS-TWBS as correlated supplementary scale (recommended model). CFI = comparative fit index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation; SRMR = standardised root mean square residual; AIC = Akaike information criterion; BIC = Bayesian information criterion. Best-fitting model in bold; recommended model indicated with an asterisk.

Although the combined bifactor model showed slightly superior fit (ΔCFI = .006; ΔRMSEA = .002), the bifactor TWBS plus supplementary CS-TWBS model was selected based on both psychometric evidence and theoretical interpretability. Psychometrically, the minimal fit difference between the two models indicates that treating the CS-TWBS as a correlated supplement does not meaningfully reduce model adequacy. Theoretically, this structure preserves the distinction between universal workplace dimensions (captured by the TWBS) and culturally specific contextual factors (captured by the CS-TWBS), which is important for two reasons. First, maintaining the TWBS as a standalone instrument allows cross-cultural comparisons with studies using the original English version. Second, treating the CS-TWBS as a modular supplement allows researchers to add it when studying Chinese teachers specifically without altering the core TWBS structure. This structure allows researchers to: (a) use the TWBS alone for cross-cultural comparisons or when cultural factors are not of primary interest, (b) add the CS-TWBS when studying Chinese teachers specifically to capture context-specific variance, or (c) use the combined 27-item scale for comprehensive assessment of both universal and contextual dimensions.
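Table 13 also reports AIC and BIC, for which lower values indicate a better balance of fit and parsimony. As a minimal sketch, the gaps between models can be tabulated directly from the reported values; the function name is illustrative and is not part of the original analysis.

```python
# Information criteria as reported in Table 13.
MODELS = {
    "4-factor correlated": {"AIC": 51805, "BIC": 52073},
    "Combined bifactor": {"AIC": 51452, "BIC": 51860},
    "TWBS bifactor + CS-TWBS": {"AIC": 51529, "BIC": 51887},
}


def criterion_gaps(models, criterion):
    """Difference between each model's criterion and the lowest value."""
    best = min(values[criterion] for values in models.values())
    return {name: values[criterion] - best for name, values in models.items()}


for criterion in ("AIC", "BIC"):
    print(criterion, criterion_gaps(MODELS, criterion))
```

On the reported values, the combined bifactor model has the lowest AIC and BIC, in line with its slightly better fit indices, and the recommended model sits 77 AIC points and 27 BIC points above it.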

Incremental Validity

Hierarchical regression tested whether the CS-TWBS provides unique predictive information beyond the TWBS (Table 14). The TWBS was entered in Step 1, and the CS-TWBS was added in Step 2.

Table 14.

Incremental Validity of CS-TWBS Beyond TWBS

Outcomes	Step	Predictors	R²	ΔR²	ΔF	β TWBS	β CS-TWBS
Flourishing	1	TWBS	.580***	—	—	.808***	—
	2	+CS-TWBS	.632***	.052***	91.86***	.342***	.417***
Job stress	1	TWBS	.248***	—	—	−.407***	—
	2	+CS-TWBS	.280***	.032***	28.91***	−.125*	−.253***
Burnout	1	TWBS	.223***	—	—	−.337***	—
	2	+CS-TWBS	.282***	.058***	52.86***	−.005	−.297***

Note. N = 653. β = standardised regression coefficient. All ΔF tests: df = 1, 650.

*p < .05. ***p < .001.

Flourishing: The TWBS predicted flourishing (R² = .580, p < .001). Adding the CS-TWBS significantly improved prediction (ΔR² = .052, ΔF[1, 650] = 91.86, p < .001), with the final model explaining 63.2% of the variance. Both the TWBS (β = .342) and the CS-TWBS (β = .417) remained significant, with the CS-TWBS the stronger predictor. Job stress: The TWBS predicted job stress (R² = .248, p < .001). Adding the CS-TWBS improved prediction (ΔR² = .032, ΔF[1, 650] = 28.91, p < .001), with the CS-TWBS (β = −.253) emerging as a stronger predictor than the TWBS (β = −.125, p = .035). Burnout: The TWBS predicted burnout (R² = .223, p < .001). Adding the CS-TWBS produced the largest incremental effect (ΔR² = .058, ΔF[1, 650] = 52.86, p < .001). In the full model, the CS-TWBS (β = −.297, p < .001) was the only significant predictor, while the TWBS became non-significant (β = −.005, p = .924).

The CS-TWBS demonstrated significant incremental validity across all outcomes (ΔR² = .032–.058), with the largest increment for burnout. Once the CS-TWBS was entered, the TWBS made virtually no unique contribution to the prediction of burnout, suggesting that culture-specific contextual factors (policy implementation, compensation, parental involvement) may be especially important for understanding teacher burnout in Chinese settings.
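The ΔF statistics in Table 14 follow from the R² values at each step. The sketch below applies the standard F-change formula for nested regression models, assuming two predictors at Step 2 and no covariates, which is consistent with the reported df of 1 and 650 for N = 653. The function name is illustrative.

```python
def f_change(r2_step1, r2_step2, n, k_step2, k_added=1):
    """F test for the R-squared increment between two nested regression models.

    n        -- sample size
    k_step2  -- number of predictors in the larger (Step 2) model
    k_added  -- number of predictors added at Step 2
    """
    df1 = k_added
    df2 = n - k_step2 - 1
    f = ((r2_step2 - r2_step1) / df1) / ((1 - r2_step2) / df2)
    return f, df1, df2


# R-squared values at Step 1 and Step 2 as reported in Table 14.
TABLE_14 = {
    "Flourishing": (0.580, 0.632),
    "Job stress": (0.248, 0.280),
    "Burnout": (0.223, 0.282),
}

for outcome, (r2_1, r2_2) in TABLE_14.items():
    f, df1, df2 = f_change(r2_1, r2_2, n=653, k_step2=2)
    print(f"{outcome}: ΔR² = {r2_2 - r2_1:.3f}, ΔF({df1}, {df2}) = {f:.2f}")
```

Because the inputs are R² values rounded to three decimals, the recomputed statistics match the tabled ΔF values only approximately.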

Discussion

This research adapted and extended the Teacher Well-Being Scale (TWBS; Collie et al., 2015) for use in the Chinese educational context. Study 1 validated the Mandarin-Chinese TWBS in a sample of 653 teachers, while Study 2 developed and validated a Context-Specific Teacher Well-Being Scale (CS-TWBS) capturing culturally relevant aspects of teacher well-being. Together, these findings provide robust evidence for a comprehensive, culturally grounded framework for assessing teacher well-being in China.

Mandarin-Chinese TWBS Validation

The Mandarin-Chinese TWBS demonstrated excellent psychometric properties: strong internal consistency (α = .95), measurement invariance across demographic groups, test-retest reliability (r = .88), and theoretically expected associations with flourishing (r = .76), burnout (r = −.72), and job stress (r = −.50). While the three-factor model fit acceptably, high interfactor correlations (r = .82–.95) and superior bifactor model fit (ωH = .93, ECV = .85) indicated that most variance is explained by a general well-being factor, justifying use of total scores.

Development and Validation of the CS-TWBS

Cognitive interviews revealed culturally specific concerns missing from the TWBS (e.g., salary and benefits, educational policy implementation demands, and professional identity). The 11-item CS-TWBS demonstrated excellent reliability (α = .96), a clear unidimensional factor structure explaining 69.21% of variance, and expected patterns of association with related constructs. The combined 27-item scale exhibited excellent bifactor model fit (CFI = .93–.95, RMSEA = .055–.061), and its unidimensionality indices (ωH = .88, ECV = .80) supported a general well-being factor encompassing both universal and culture-specific dimensions.

Incremental Validity of Culture-Specific Assessment

The CS-TWBS correlated strongly with the TWBS (r = .89) yet retained unshared variance (21.2%); it correlated most strongly with organisational well-being (r = .87) and least strongly with student interaction well-being (r = .74), consistent with its organisational and systemic focus. Hierarchical regressions demonstrated significant incremental validity: the CS-TWBS explained additional variance in flourishing (ΔR² = .052), job stress (ΔR² = .032), and burnout (ΔR² = .058), all p < .001. Critically, for burnout, the CS-TWBS was the only significant predictor (β = −.297), while the TWBS became non-significant (β = −.005), suggesting that culture-specific factors, such as compensation, policy implementation, and parental involvement, may be more strongly linked to teacher burnout than general workplace factors.

Usage Recommendations

Based on these findings, practical usage recommendations are provided. Researchers should use the Mandarin-Chinese TWBS alone for cross-cultural comparisons or brief assessments, add the CS-TWBS when studying Chinese teacher populations specifically or investigating predictors of burnout, and use the combined 27-item scale when a comprehensive assessment of both universal and contextual factors is needed (see Results for detailed guidance).

Methodological Considerations

Studies 1 and 2 used the same sample, which offered distinct methodological advantages. The two studies tested different constructs (universal vs. culture-specific well-being) using distinct items, so the participant overlap does not constitute a replication issue but rather enables direct comparison of complementary measurement approaches. We employed a sequential strategy, treating Study 1 as model fitting and Study 2 as model testing, consistent with best practices in instrument development (Anderson & Gerbing, 1988). The strong correlation between scales (r = .89) alongside demonstrated incremental validity (ΔR² = .032–.058) represents a particularly stringent within-subjects test of unique predictive value, controlling for all individual differences.

Social Desirability in Cultural Context

Teachers reported moderate levels of socially desirable responding (M = 8.36), which correlated positively with both the TWBS (r = .51) and the CS-TWBS (r = .50). While these correlations exceed the .30 threshold typically indicating meaningful construct overlap, they remain well below the .70 threshold associated with problematic redundancy. In Western contexts, such correlations might raise concerns about defensive self-presentation inflating well-being scores. However, in a collectivistic context like China, this likely reflects cultural values of harmony, modesty, and face-saving (Lalwani et al., 2006) rather than measurement bias. For teachers, who have a morally authoritative role in Chinese culture, positive well-being reports may reflect internalisation of the “good teacher” image. Researchers should recognise social desirability as an integral component of how well-being is experienced and expressed within Chinese cultural frameworks.

Limitations and Implications

While the same-sample design offered methodological advantages, independent validation in separate samples would strengthen confidence in generalisability. The CS-TWBS was developed based on cognitive interviews with 12 in-service and former teachers who provided feedback on the TWBS and identified culturally relevant concerns not captured by the original scale. While this sample size is consistent with recommendations for cognitive interviewing (Willis, 2005) and yielded coherent, theoretically meaningful themes, a larger and more diverse qualitative sample might have identified additional culturally specific dimensions of teacher well-being in China. Future research using broader qualitative inquiry across diverse teaching contexts could refine and extend the CS-TWBS item pool.

Additionally, the test-retest subsample (n = 168) was drawn from volunteers who provided their email address for follow-up, and demographic comparisons revealed that this subsample included a significantly higher proportion of female and younger teachers with less teaching experience than the full sample. Although the subsample was comparable on teaching status, school type, and school region, the demographic differences suggest that the estimated temporal stability may not fully generalise to all subgroups, particularly older and more experienced male teachers. Future research could employ strategies to improve retest participation across demographic groups.

Our social media recruitment strategy yielded a geographically and demographically diverse sample from across China, representing teachers from various provinces, career stages, and school contexts. Although this diversity enhances the generalisability of our findings to Chinese teachers broadly, it precludes an in-depth examination of how context-specific well-being dimensions may function differently across distinct teaching contexts. Theoretical perspectives on teacher development (Day & Gu, 2014) and occupational well-being (Bakker & Oerlemans, 2019) suggest that the salience and impact of specific well-being dimensions may vary systematically by career stage, school type, and educational level. For example, early-career teachers may prioritise organisational support and professional development opportunities, while experienced teachers may be more concerned with compensation adequacy and professional status recognition (Klassen & Chiu, 2010). Similarly, rural teachers may have fewer opportunities for professional development (Liu & Onwuegbuzie, 2012), and high school teachers may face distinct workload challenges compared to primary teachers due to the Gaokao (Zhao et al., 2015). Future research employing purposive sampling within specific contexts, such as studies focused exclusively on rural teachers, secondary school teachers, early-career teachers, or private school contexts, would enable a more nuanced examination of how universal versus context-specific well-being dimensions operate within homogeneous teaching contexts. Such targeted investigations could reveal whether certain CS-TWBS items are more or less relevant in specific contexts and could inform more precisely tailored interventions.

Although these studies provide strong psychometric evidence, all data were self-reported and cross-sectional, introducing potential shared-method variance and limiting causal inference. While social desirability was explicitly examined, future research could incorporate multi-method and longitudinal designs to triangulate self-reported well-being with behavioural indicators and to examine temporal stability and sensitivity to intervention.

Despite these limitations, our findings offer clear practical guidance. For school administrators and policymakers, effective interventions should target the context-specific factors identified by the CS-TWBS, with particular attention to compensation, policy implementation burden, professional status, psychological support, and parental involvement, which were most prominently identified by Chinese teachers. By addressing these context-specific factors alongside universal workplace dimensions, educational leaders can more effectively support teacher well-being and reduce burnout in Chinese educational contexts.

Conclusion

This research demonstrates that while universal well-being dimensions matter, culture-specific contextual factors capture meaningful variance in Chinese teacher well-being beyond universal measures, with particularly strong implications for understanding burnout. By providing validity evidence for score interpretations from both universal (Mandarin-Chinese TWBS) and context-specific (CS-TWBS) instruments with clear usage guidance, we advance culturally informed teacher well-being assessment. For Chinese teachers, compensation, additional burdens arising from policy implementation, professional status, and organisational support are not secondary but central to well-being and burnout. Recognising this complexity is essential for developing effective, culturally appropriate interventions supporting teacher well-being and retention in Chinese educational systems.

Supplemental Material

Supplemental material - Validation of the Mandarin-Chinese Teacher Well-Being Scale and Development of a Chinese Context-specific Extension

Supplemental material for Validation of the Mandarin-Chinese Teacher Well-Being Scale and Development of a Chinese Context-Specific Extension by Hongchen Li, David Murphy, Urszula Plust, Shun Chen, Sui He in Journal of Psychoeducational Assessment

Footnotes

Acknowledgements

The authors thank the participating teachers, including current teachers, former teachers, and retired teachers, for their invaluable contributions to this study.

ORCID iDs

Hongchen Li

David Murphy

Shun Chen

Sui He

Ethical Considerations

This study has received ethical approval from the School of Education Ethics Committee at the University of Nottingham (Reference Number: LiH_366).

Consent to Participate

All participants provided informed consent prior to data collection.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to the terms of General Data Protection Regulation (GDPR).

Supplemental Material

Supplemental material for this article is available online.

References

Aldrup, K., Klusmann, U., Lüdtke, O., Göllner, R., & Trautwein, U. (2018). Student misbehavior and teacher well-being: Testing the mediating role of the teacher-student relationship. Learning and Instruction, 58(1), 126–136. https://doi.org/10.1016/j.learninstruc.2018.05.006

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.

Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103(3), 411–423. https://doi.org/10.1037/0033-2909.103.3.411

Bakker, A. B., & Oerlemans, W. G. M. (2019). Daily job crafting and momentary work engagement: A self-determination and self-regulation perspective. Journal of Vocational Behavior, 112(2), 417–430. https://doi.org/10.1016/j.jvb.2018.12.005

Boyle, G. J., Borg, M. G., Falzon, J. M., & Baglioni, A. J. (1995). A structural model of the dimensions of teacher stress. British Journal of Educational Psychology, 65(1), 49–67. https://doi.org/10.1111/j.2044-8279.1995.tb01130.x

Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3), 185–216. https://doi.org/10.1177/135910457000100301

Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). Guilford Press.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://doi.org/10.1037/h0046016

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834

Cheng, Fan, & Lau (2023). An integrative review on job burnout among teachers in China: Implications for human resource management. The International Journal of Human Resource Management, 34(3), 529–561. https://doi.org/10.1080/09585192.2022.2078991

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.

Collie, R. J. (2023). Teacher well-being and turnover intentions: Investigating the roles of job resources and job demands. British Journal of Educational Psychology, 93(3), 712–726. https://doi.org/10.1111/bjep.12587

Collie, R. J., Shapka, J. D., Perry, N. E., & Martin, A. J. (2015). Teacher well-being: Exploring its components and a practice-oriented scale. Journal of Psychoeducational Assessment, 33(8), 744–756. https://doi.org/10.1177/0734282915587990

Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24(4), 349–354. https://doi.org/10.1037/h0047358

Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66(3), 4–19. https://doi.org/10.1016/j.jesp.2015.07.006

Curren, R., Boniwell, I., Ryan, R. M., Oades, L., Brighouse, H., Unterhalter, E., Kristjánsson, K., de Ruyter, D., Macleod, C., Morris, I., & White, M. (2024). Finding consensus on well-being in education. Theory and Research in Education, 22(2), 117–157. https://doi.org/10.1177/14778785241259852

Day, C., & Gu, Q. (2014). Resilient teachers, resilient schools: Building and sustaining quality in testing times. Routledge. https://doi.org/10.4324/9780203578490

Diener, E., Wirtz, D., Tov, W., Kim-Prieto, C., Choi, D., Oishi, S., & Biswas-Diener, R. (2010). New well-being measures: Short scales to assess flourishing and positive and negative feelings. Social Indicators Research, 97(2), 143–156. https://doi.org/10.1007/s11205-009-9493-y

Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 8(3), 430–457. https://doi.org/10.1207/S15328007SEM0803_5

Finney, S. J., & DiStefano, C. (2013). Nonnormal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 439–492). Information Age Publishing.

Fox, H. B., Walter, H. L., & Ball, K. B. (2023). Methods used to evaluate teacher well-being: A systematic review. Psychology in the Schools, 60(10), 4177–4198. https://doi.org/10.1002/pits.22996

Gao, Liu, & Wang (2025). Cultural foundations of teacher occupational health: The impact of Confucian work dynamism on burnout through grit among Chinese teachers. BMC Psychology, 13(1), Article 390. https://doi.org/10.1186/s40359-025-02724-x

Granziera, H., Martin, A. J., & Collie, R. J. (2023). Teacher well-being and student achievement: A multilevel analysis. Social Psychology of Education, 26(2), 279–291. https://doi.org/10.1007/s11218-022-09751-1

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th ed.). Pearson Education.

Harrison, M. G., King, R. B., & Wang, H. (2023). Satisfied teachers are good teachers: The association between teacher job satisfaction and instructional quality. British Educational Research Journal, 49(3), 476–498. https://doi.org/10.1002/berj.3851

Herzog, A. R., & Bachman, J. G. (1981). Effects of questionnaire length on response quality. Public Opinion Quarterly, 45(4), 549–559. https://doi.org/10.1086/268687

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Huppert, F. A., & So, T. T. C. (2013). Flourishing across Europe: Application of a new conceptual framework for defining well-being. Social Indicators Research, 110(3), 837–861. https://doi.org/10.1007/s11205-011-9966-7

Jennings, P. A., & Greenberg, M. T. (2009). The prosocial classroom: Teacher social and emotional competence in relation to student and classroom outcomes. Review of Educational Research, 79(1), 491–525. https://doi.org/10.3102/0034654308325693

Klassen, R. M., & Chiu, M. M. (2010). Effects on teachers’ self-efficacy and job satisfaction: Teacher gender, years of experience, and job stress. Journal of Educational Psychology, 102(3), 741–756. https://doi.org/10.1037/a0019237

Kline, R. B. (2023). Principles and practice of structural equation modeling (5th ed.). Guilford Press.

Klusmann, U., Kunter, M., Trautwein, U., Lüdtke, O., & Baumert, J. (2008). Teachers’ occupational well-being and quality of instruction: The important role of self-regulatory patterns. Journal of Educational Psychology, 100(3), 702–715. https://doi.org/10.1037/0022-0663.100.3.702

Klusmann, U., Richter, D., & Lüdtke, O. (2016). Teachers’ emotional exhaustion is negatively related to students’ achievement: Evidence from a large-scale assessment study. Journal of Educational Psychology, 108(8), 1193–1203. https://doi.org/10.1037/edu0000125

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

Lalwani, A. K., Shavitt, S., & Johnson, T. (2006). What is the relation between cultural orientation and socially desirable responding? Journal of Personality and Social Psychology, 90(1), 165–178. https://doi.org/10.1037/0022-3514.90.1.165

Liu, S., & Onwuegbuzie, A. J. (2012). Chinese teachers’ work stress and their turnover intention. International Journal of Educational Research, 53(2), 160–170. https://doi.org/10.1016/j.ijer.2012.03.006

Lu, L., & Gilmour, R. (2004). Culture and conceptions of happiness: Individual oriented and social oriented SWB. Journal of Happiness Studies, 5(3), 269–291. https://doi.org/10.1007/s10902-004-8789-5

Maslach, C., & Jackson, S. E. (1981). The measurement of experienced burnout. Journal of Organizational Behavior, 2(2), 99–113. https://doi.org/10.1002/job.4030020205

McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness-of-fit. Psychological Bulletin, 107(2), 247–255. https://doi.org/10.1037/0033-2909.107.2.247

Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81(6), 660–679. https://doi.org/10.1037/0021-9010.81.6.660

Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). Academic Press.

Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879–903. https://doi.org/10.1037/0021-9010.88.5.879

Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41(3), 71–90. https://doi.org/10.1016/j.dr.2016.06.004

Reise, S. P., Moore, T. M., & Haviland, M. G. (2010). Bifactor models and rotations: Exploring the extent to which multidimensional data yield univocal scale scores. Journal of Personality Assessment, 92(6), 544–559. https://doi.org/10.1080/00223891.2010.496477

Reise, S. P., Scheines, R., Widaman, K. F., & Haviland, M. G. (2013). Multidimensionality and structural coefficient bias in structural equation modeling: A bifactor perspective. Educational and Psychological Measurement, 73(1), 5–26. https://doi.org/10.1177/0013164412449831

Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods, 21(2), 137–150. https://doi.org/10.1037/met0000045

Ryan, R. M., & Deci, E. L. (2001). On happiness and human potentials: A review of research on hedonic and eudaimonic well-being. Annual Review of Psychology, 52(1), 141–166. https://doi.org/10.1146/annurev.psych.52.1.141

Ryff, C. D. (1989). Happiness is everything, or is it? Explorations on the meaning of psychological well-being. Journal of Personality and Social Psychology, 57(6), 1069–1081. https://doi.org/10.1037/0022-3514.57.6.1069

Seligman, M. E. P. (2011). Flourish: A visionary new understanding of happiness and well-being. Free Press.

Tang, X., Duan, W., Wang, Z., & Liu, T. (2016). Psychometric evaluation of the simplified Chinese version of flourishing scale. Research on Social Work Practice, 26(5), 591–599. https://doi.org/10.1177/1049731514557832

Tao, P., Guoying, D., & Brody, S. (2009). Preliminary study of a Chinese language short form of the Marlowe–Crowne Social Desirability Scale. Psychological Reports, 105(3_suppl), 1039–1046. https://doi.org/10.2466/PR0.105.F.1039-1046

Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133(5), 859–883. https://doi.org/10.1037/0033-2909.133.5.859

Wang, H., Hall, N. C., & King, R. B. (2021). A longitudinal investigation of teachers’ emotional labor, well-being, and perceived student engagement. Educational Psychology, 41(10), 1319–1336. https://doi.org/10.1080/01443410.2021.1988060

Willis, G. B. (2005). Cognitive interviewing: A tool for improving questionnaire design. Sage Publications. https://doi.org/10.4135/9781412983655

Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34(6), 806–838. https://doi.org/10.1177/0011000006288127

Wu, X.-C., Qi, Y.-J., Yu, R.-R., & Zang, W.-W. (2016). Revision of Chinese primary and secondary school teachers’ job burnout questionnaire. Chinese Journal of Clinical Psychology, 24(5), 856–860. https://doi.org/10.16128/j.cnki.1005-3611.2016.05.020

Zhao, X., Selman, R. L., & Haste, H. (2015). Academic stress in Chinese schools and a proposed preventive intervention program. Cogent Education, 2(1), Article 1000477. https://doi.org/10.1080/2331186X.2014.1000477
