The Association Between Physician Demographics on Patient Satisfaction Survey Results in Pediatrics: A Role for Implicit Bias?

Abstract

We assessed associations of pediatrician demographic characteristics with patient satisfaction (PS) scores. We performed a retrospective analysis of PS scores among pediatricians at a single academic institution and their associations with individual demographic features including gender, race, and geographic location of medical school education. We analyzed PS survey results for 153 pediatricians, 48.4% of whom were female. Males received higher scores in 4 out of 10 questions including “Likelihood of your recommending this care provider to others” (P = .007). We observed similar findings for white pediatricians compared to Black, Indigenous, and People of Color (BIPOC) (P = .033) and US medical school graduates compared to international school graduates (P = .044). Overall, we observed that pediatricians who are female, BIPOC, or international medical school graduates receive lower PS scores than their counterparts. The impact of potential survey responder bias should be acknowledged when interpreting PS scores for pediatricians.

Keywords

patient experience, bias, burnout

Introduction

Surveying patients regarding their satisfaction with the medical care they have received has become common practice in the United States over the last 10 years since the Affordable Care Act directed the US Secretary of Health and Human Services to compare physicians using patient satisfaction (PS) survey data.¹ There has been no consistent association between quality of care and PS scores, with some studies suggesting a direct correlation,^2-7 while others suggest an inverse correlation.^8-13 Despite the lack of clear evidence that PS scores reflect better medical care, some institutions alter physician compensation based on PS scores.¹⁴ The impact of this system on physicians has been studied in adult medicine;^15,16 however, there have only been limited studies assessing the impact on pediatricians. Recently, our group demonstrated that PS scoring systems negatively impact pediatricians’ practice behavior and job satisfaction,¹⁷ and there is some evidence that pediatricians may receive systematically lower scores than adult physicians.¹⁸

One of the potential concerns with PS scoring systems is the possibility that survey scores may be influenced by rater implicit bias.¹⁹ In this study, we sought to investigate the impact of physician demographics on PS scores, highlighting the potential role of rater implicit bias.

Methods

Patient satisfaction surveys are administered by Press Ganey at the study institution (Mayo Clinic), a single academic quaternary health care organization in Rochester, Minnesota. Following a clinical visit, PS surveys are sent to a random subset of patients and/or caregivers. Around 30 000 surveys are sent to pediatric patients’ families per year with a response rate typically around 15%. The target is for each physician to have at least 100 survey responses per year. Physicians receive a quarterly score report of the survey responses from their patients. A key survey question asks families to rate their “Likelihood to Recommend” (LTR) a specific physician who provided care by selecting one of the 5 choices: Very Poor, Poor, Fair, Good, and Very Good. The physician’s score report highlights the percentage of patients/caregivers who responded that the likelihood for them to recommend the specific physician was “Very Good.” Maximizing “Very Good” scores, deemed the “Top Box Score” (TBS) as it reflects the top rating a physician can receive for each survey question, is the highest priority metric for physicians to achieve at the study institution.

For this analysis, physicians with primary or secondary appointments in the Department of Pediatric and Adolescent Medicine who had PS surveys returned between January 2017 and December 2020 were included. Physician demographic data regarding gender, race, ethnicity (Hispanic or non-Hispanic), presence of a detectable accent that was not typical of the United States or United Kingdom, domestic (Doctor of Medicine [MD] or Doctor of Osteopathic Medicine [DO]) or foreign medical school, years since medical school graduation, and years since joining the faculty at Mayo Clinic were collected from institutional databases and personal interaction by the study team. Subjective categories (gender, race, ethnicity, and accent) were determined by at least 2 study team members, and any discrepancies were reconciled through discussion with the entire team. To facilitate analysis for this article, race was dichotomized into 2 categories: “white” and “BIPOC” (Black, Indigenous, and People of Color, which includes those identified as Black or African American, Asian, Middle Eastern, or Mestizo). The authors acknowledge that race is a social construct and included this analysis specifically to identify biases that hinder marginalized people.

All Press Ganey PS surveys for included physicians were obtained from Press Ganey for the study time period. Based on recommendations from Press Ganey, we excluded from the analysis any physician with fewer than 8 survey responses over a period of 3 months (n = 512) to avoid the limitations created by small sample sizes. Surveys for patients older than 18 years (n = 1293) or for those who declined Minnesota Research Authorization (n = 960) were excluded. All surveys were from outpatient visits, as inpatient surveys do not reflect individual physicians at our institution. Telemedicine visits were not included, as surveying procedures changed significantly over the study period and were not deemed consistent enough for analysis. A total of 9888 surveys from 153 physicians were included in the analysis.

Data from survey responses to 10 questions (over 98 000 responses) related to physician performance were analyzed. Responses were aggregated into a single score for each physician in 2 different formats: (1) “TBS” and (2) “Average Linear Score” (ALS). To calculate both scores, individual question responses were assigned a value ranging from 1 (Very Poor) to 5 (Very Good). The TBS of a single question for an individual provider represents the percentage of responses of 5, where 5 is the best score achievable; the possible range of the TBS is 0 to 100. The ALS of a single question for an individual provider represents the average of responses for that provider; the possible range of the ALS is 1 to 5. If a provider had at least 1 non-missing response for a given survey question, the TBS and ALS were calculated for that question for the given provider.
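The two aggregation formats described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the response labels follow the survey wording, while the function names and the sample responses are hypothetical.

```python
# Illustrative sketch of the two aggregate scores for one question and one
# provider. Function names and sample data are hypothetical.
LIKERT = {"Very Poor": 1, "Poor": 2, "Fair": 3, "Good": 4, "Very Good": 5}


def _numeric(responses):
    """Map response labels to 1-5, dropping missing (None) responses."""
    return [LIKERT[r] for r in responses if r is not None]


def top_box_score(responses):
    """Percentage of non-missing responses equal to 5 (range 0 to 100)."""
    values = _numeric(responses)
    if not values:
        return None  # no non-missing response, so no score is calculated
    return 100.0 * sum(v == 5 for v in values) / len(values)


def average_linear_score(responses):
    """Mean of non-missing responses on the 1-5 scale (range 1 to 5)."""
    values = _numeric(responses)
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical responses: 8 "Very Good", 1 "Good", 1 "Fair", 1 missing.
sample = ["Very Good"] * 8 + ["Good", "Fair", None]
tbs = top_box_score(sample)
als = average_linear_score(sample)
print(f"TBS = {tbs:.1f}, ALS = {als:.2f}")
```

In this sample, the two responses below "Very Good" lower the TBS by the same amount whether they are "Good" or "Very Poor", whereas the ALS depends on how far below the top rating each response falls.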

Provider demographics were summarized as frequency (n) with percentage (%). The TBS and ALS were summarized as median, interquartile range (IQR = Q1-Q3), and range. To investigate associations between physician demographic data and PS score results, we used Kruskal-Wallis tests to compare TBS and ALS across physician demographic groups. P < .05 was considered significant. All analyses were performed using SAS version 9.4 (SAS Institute Inc, Cary, North Carolina). This protocol was approved by the Mayo Clinic IRB.
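For readers unfamiliar with the test, the following Python sketch shows how a Kruskal-Wallis comparison of two groups of per-physician scores can be computed. It is not the SAS procedure used in the study. It assumes the standard tie-corrected H statistic and a chi-square approximation with 1 degree of freedom for the P value, and the function name and sample scores are hypothetical.

```python
import math


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two independent samples.

    H is tie-corrected; p uses the chi-square approximation with 1 degree
    of freedom, for which the upper tail equals erfc(sqrt(H / 2)).
    """
    pooled = sorted(list(a) + list(b))
    n = len(pooled)

    # Assign midranks: tied values share the average of their positions.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    # H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1)
    h = 0.0
    for group in (a, b):
        rank_sum = sum(rank_of[x] for x in group)
        h += rank_sum ** 2 / len(group)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding

    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical per-physician Top Box Scores for two demographic groups.
group_a = [83.3, 78.8, 88.2, 81.6, 85.4, 79.2]
group_b = [84.6, 89.8, 90.5, 88.3, 82.1, 87.5]
h_stat, p_value = kruskal_wallis_two_groups(group_a, group_b)
print(f"H = {h_stat:.3f}, P = {p_value:.3f}")
```

Because the test operates on ranks, it compares the ordering of physicians' scores across groups rather than the size of the score differences.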

Results

Survey data for a total of 153 pediatrician subjects were analyzed, and physician demographic distribution is displayed in Table 1. Nearly half of the physicians (48.4%) were female, 22.9% were BIPOC, and 22.9% attended an international medical school. Figures 1A and 1B illustrate the distribution of survey responses to the LTR query as analyzed with TBS and ALS methods by gender. Table 2 displays the TBS data by gender. Males received higher scores in 4 out of 10 questions: care providers’ discussion of any proposed treatment (options, risks, benefits, etc) (median [IQR: Q1-Q3] = 90.5 [82.1-94.4] vs 85.4 [76.2-90.2], P = .005); degree to which the care provider talked with you using words you could understand (87.5 [81.8-92.8] vs 84.8 [78.6-89.8], P = .025); your confidence in this care provider (88.5 [82.4-92.8] vs 83.3 [77.8-89.5], P = .004); likelihood of your recommending this care provider to others (LTR; 87.5 [82.8-92.1] vs 83.8 [77.4-89.0], P = .007). We observed similar gender differences on the same questions when analyzed by ALS (Table 3 and Figure 2). Median TBS or ALS was not higher for females on any question.

Table 1.

Summary of Physician Characteristics.

	Total (n = 153)
Provider gender, n (%)
Female	74 (48.4%)
Male	79 (51.6%)
Provider race, n (%)
Asian	7 (4.6%)
Black	1 (0.7%)
Middle Eastern	10 (6.5%)
South Asian	17 (11.1%)
White	118 (77.1%)
Provider race (white vs BIPOC), n (%)^a
White	118 (77.1%)
South Asian/Middle Eastern/Asian/Black	35 (22.9%)
Provider ethnicity, n (%)
Hispanic	7 (4.6%)
Non-Hispanic	146 (95.4%)
Provider accent, n (%)
No	124 (81.0%)
Yes	29 (19.0%)
Provider medical school (international vs American), n (%)
American	118 (77.1%)
International	35 (22.9%)
Provider medical school degree (international vs American MD vs American DO), n (%)
DO	10 (6.5%)
I-MBBCh	1 (0.7%)
I-MBBS	3 (2.0%)
I-MBChB	1 (0.7%)
I-MD	30 (19.6%)
MD	108 (70.6%)
Provider medical school degree, n (%)
Domestic MD	108 (70.6%)
Domestic DO	10 (6.5%)
International MD	30 (19.6%)
International other	5 (3.3%)
Provider years since medical school graduation
Median (Q1, Q3)	22 (15, 31)
Range	5, 53
Provider years on staff at Mayo
Median (Q1, Q3)	9 (5, 20)
Range	1, 47
Number of surveys per provider
Median (Q1, Q3)	58 (30, 90)
Range	8, 209
Top box score^b: Concern the care provider showed for your questions or worries
Median (Q1, Q3)	84.0 (79.2, 88.9)
Range	57.8, 100.0
Top box score: Explanations the care provider gave you about your problem or condition
Median (Q1, Q3)	83.1 (77.8, 89.7)
Range	56.7, 100.0
Top box score: Care provider’s efforts to include you in decisions about your care
Median (Q1, Q3)	83.5 (78.3, 88.3)
Range	58.9, 100.0
Top box score: Care provider’s discussion of any proposed treatment (options, risks, benefits, etc)^c
Median (Q1, Q3)	87.5 (79.2, 93.5)
Range	50.0, 100.0
Top box score: Friendliness/courtesy of the care provider
Median (Q1, Q3)	87.5 (82.4, 91.9)
Range	54.5, 100.0
Top box score: Instructions the care provider gave you about follow-up care (if any)
Median (Q1, Q3)	80.2 (73.7, 86.9)
Range	53.6, 100.0
Top box score: Degree to which the care provider talked with you using words you could understand
Median (Q1, Q3)	85.9 (80.6, 90.9)
Range	52.4, 100.0
Top box score: Amount of time the care provider spent with you
Median (Q1, Q3)	84.3 (76.9, 88.5)
Range	48.3, 100.0
Top box score: Your confidence in this care provider
Median (Q1, Q3)	86.5 (79.5, 90.9)
Range	59.1, 100.0
Top box score: Likelihood of your recommending this care provider to others
Median (Q1, Q3)	85.2 (80.0, 90.0)
Range	54.2, 100.0

^a BIPOC: Black, Indigenous, and People of Color, including Black or African American, Asian, Middle Eastern, or Mestizo.

^b Top Box Score is the percentage of respondents who answered with the top answer choice (“Very Good”) for each survey question.

^c n = 123 providers included in the analysis of Top Box Score for care provider’s discussion of any proposed treatment.

Figure 1.

(A) Distribution of “Top Box Score” (TBS) rating for “likelihood to recommend” (LTR) by gender. (B) Distribution of physicians’ average linear score (range = 1-5) for “likelihood to recommend” (LTR) by gender. Top Box Score is the percentage of respondents who answered with the top answer choice (“Very Good”) for each survey question.

Table 2.

Top Box Score by Provider Gender.

	Provider gender
Top Box Score	Female (n = 74)	Male (n = 79)	Total (n = 153)	P ^a
Concern the care provider showed for your questions or worries				.282
Median (Q1, Q3)	83.3 (78.8, 88.2)	84.6 (79.4, 89.8)	84.0 (79.2, 88.9)
Range	57.8, 96.5	63.6, 100.0	57.8, 100.0
Explanations the care provider gave you about your problem or condition				.072
Median (Q1, Q3)	82.6 (77.1, 87.7)	84.5 (77.8, 91.1)	83.1 (77.8, 89.7)
Range	56.7, 98.2	65.7, 100.0	56.7, 100.0
Care provider’s efforts to include you in decisions about your care				.051
Median (Q1, Q3)	81.6 (77.3, 87.5)	85.2 (79.5, 90.5)	83.5 (78.3, 88.3)
Range	58.9, 95.5	63.6, 100.0	58.9, 100.0
Care provider’s discussion of any proposed treatment (options, risks, benefits, etc)^b				.005
Median (Q1, Q3)	85.4 (76.2, 90.2)	90.5 (82.1, 94.4)	87.5 (79.2, 93.5)
Range	52.9, 100.0	50.0, 100.0	50.0, 100.0
Friendliness/courtesy of the care provider				.067
Median (Q1, Q3)	85.8 (80.7, 90.6)	88.3 (83.3, 92.8)	87.5 (82.4, 91.9)
Range	58.9, 100.0	54.5, 100.0	54.5, 100.0
Instructions the care provider gave you about follow-up care (if any)				.059
Median (Q1, Q3)	79.2 (72.9, 84.7)	82.1 (74.2, 88.9)	80.2 (73.7, 86.9)
Range	53.6, 98.0	55.6, 100.0	53.6, 100.0
Degree to which the care provider talked with you using words you could understand				.025
Median (Q1, Q3)	84.8 (78.6, 89.8)	87.5 (81.8, 92.8)	85.9 (80.6, 90.9)
Range	60.0, 100.0	52.4, 100.0	52.4, 100.0
Amount of time the care provider spent with you				.091
Median (Q1, Q3)	83.3 (75.9, 87.5)	84.9 (78.4, 89.6)	84.3 (76.9, 88.5)
Range	48.3, 96.3	61.9, 100.0	48.3, 100.0
Your confidence in this care provider				.004
Median (Q1, Q3)	83.3 (77.8, 89.5)	88.5 (82.4, 92.8)	86.5 (79.5, 90.9)
Range	59.1, 96.3	66.7, 100.0	59.1, 100.0
Likelihood of your recommending this care provider to others				.007
Median (Q1, Q3)	83.8 (77.4, 89.0)	87.5 (82.8, 92.1)	85.2 (80.0, 90.0)
Range	54.2, 96.4	66.7, 100.0	54.2, 100.0

Top Box Score is the percentage of respondents who answered with the top answer choice (“Very Good”) for each survey question.

^a All comparisons use the Kruskal-Wallis P-value.

^b n = 123 providers (62 female, 61 male) included in analysis of Top Box Score for care provider’s discussion of any proposed treatment.

Table 3.

Average Linear Score by Provider Gender.

	Provider gender
Average linear score (range = 1-5)	Female (n = 74)	Male (n = 79)	Total (n = 153)	P ^a
Concern the care provider showed for your questions or worries				.419
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.1, 5.0	4.4, 5.0	4.1, 5.0
Explanations the care provider gave you about your problem or condition				.103
Median (Q1, Q3)	4.8 (4.7, 4.8)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.4, 5.0	4.2, 5.0
Care provider’s efforts to include you in decisions about your care				.107
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.2, 4.9	4.5, 5.0	4.2, 5.0
Care provider’s discussion of any proposed treatment (options, risks, benefits, etc)^b				.008
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.9 (4.8, 4.9)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.3, 5.0	4.2, 5.0
Friendliness/courtesy of the care provider				.065
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.9 (4.8, 4.9)	4.8 (4.8, 4.9)
Range	4.3, 5.0	4.3, 5.0	4.3, 5.0
Instructions the care provider gave you about follow-up care (if any)				.187
Median (Q1, Q3)	4.7 (4.6, 4.8)	4.7 (4.6, 4.9)	4.7 (4.6, 4.8)
Range	4.2, 5.0	4.3, 5.0	4.2, 5.0
Degree to which the care provider talked with you using words you could understand				.032
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.9 (4.8, 4.9)	4.8 (4.8, 4.9)
Range	4.5, 5.0	4.4, 5.0	4.4, 5.0
Amount of time the care provider spent with you				.102
Median (Q1, Q3)	4.8 (4.7, 4.8)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.0, 5.0	4.4, 5.0	4.0, 5.0
Your confidence in this care provider				.014
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.8, 4.9)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.4, 5.0	4.2, 5.0
Likelihood of your recommending this care provider to others				.024
Median (Q1, Q3)	4.8 (4.7, 4.8)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.1, 4.9	4.1, 5.0	4.1, 5.0

^a All comparisons use the Kruskal-Wallis P-value.

^b n = 123 providers (62 female, 61 male) included in analysis of Average Linear Score for care provider’s discussion of any proposed treatment.

Figure 2.

Average linear score by provider gender. Lower, middle, and upper values of each box represent 25th percentile, median, and 75th percentile, respectively, and whiskers and individual points display the range. P-values are from Kruskal-Wallis tests.

We did not observe significant differences in TBS by race. However, analysis of ALS by dichotomized race demonstrated higher scores for white physicians for 4 questions (Table 4 and Figure 3): explanations the care provider gave you about your problem or condition (median [IQR: Q1-Q3] = 4.8 [4.7-4.9] vs 4.7 [4.7-4.8], P = .013); care provider’s efforts to include you in decisions about your care (4.8 [4.7-4.9] vs 4.7 [4.7-4.8], P = .025); friendliness/courtesy of the care provider (4.9 [4.8-4.9] vs 4.8 [4.7-4.9], P = .046); likelihood of your recommending this care provider to others (4.8 [4.7-4.9] vs 4.7 [4.7-4.8], P = .033). We did not observe any differences when grouped by ethnicity (data not shown), and this is likely due to our small number of Hispanic physicians (n = 7) in the cohort.

Table 4.

Average Linear Score by Provider Race (white vs BIPOC).

	Provider race
Average linear score (range = 1-5)	White (n = 118)	BIPOC (n = 35)	Total (n = 153)	P ^a
Concern the care provider showed for your questions or worries				.058
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.7 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.1, 5.0	4.4, 4.9	4.1, 5.0
Explanations the care provider gave you about your problem or condition				.013
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.7 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.4, 4.9	4.2, 5.0
Care provider’s efforts to include you in decisions about your care				.025
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.7 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.4, 4.9	4.2, 5.0
Care provider’s discussion of any proposed treatment (options, risks, benefits, etc)^b				.175
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.3, 5.0	4.2, 5.0
Friendliness/courtesy of the care provider				.046
Median (Q1, Q3)	4.9 (4.8, 4.9)	4.8 (4.7, 4.9)	4.8 (4.8, 4.9)
Range	4.3, 5.0	4.5, 5.0	4.3, 5.0
Instructions the care provider gave you about follow-up care (if any)				.126
Median (Q1, Q3)	4.8 (4.6, 4.8)	4.7 (4.6, 4.8)	4.7 (4.6, 4.8)
Range	4.2, 5.0	4.3, 5.0	4.2, 5.0
Degree to which the care provider talked with you using words you could understand				.050
Median (Q1, Q3)	4.8 (4.8, 4.9)	4.8 (4.7, 4.9)	4.8 (4.8, 4.9)
Range	4.4, 5.0	4.6, 5.0	4.4, 5.0
Amount of time the care provider spent with you				.133
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.0, 5.0	4.4, 5.0	4.0, 5.0
Your confidence in this care provider				.061
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.2, 5.0	4.2, 5.0
Likelihood of your recommending this care provider to others				.033
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.7 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.1, 5.0	4.1, 4.9	4.1, 5.0

BIPOC: Black, Indigenous, and People of Color, including Black or African American, Asian, Middle Eastern, or Mestizo.

^a All comparisons use the Kruskal-Wallis P-value.

^b n = 123 providers (95 white, 28 BIPOC) included in analysis of Average Linear Score for care provider’s discussion of any proposed treatment.

Figure 3.

Average linear score by provider race (white vs BIPOC). Lower, middle, and upper values of each box represent 25th percentile, median, and 75th percentile, respectively, and whiskers and individual points display the range. BIPOC includes Black, Indigenous, and People of Color, including Black or African American, Asian, Middle Eastern, or Mestizo. P-values are from Kruskal-Wallis tests.

When we analyzed TBS for physicians based on where they attended medical school, we observed higher scores for US medical school graduates compared to international school graduates on 1 question (care provider’s efforts to include you in decisions about your care, median [IQR: Q1-Q3] = 85.0 [79.5-90.0] vs 79.8 [76.9-86.4], P = .017; data not shown), but 4 questions when analyzed by ALS (Table 5 and Figure 4): explanations the care provider gave you about your problem or condition (median [IQR: Q1-Q3] = 4.8 [4.7-4.9] vs 4.7 [4.7-4.8], P = .023); care provider’s efforts to include you in decisions about your care (4.8 [4.7-4.9] vs 4.7 [4.7-4.8], P = .017); degree to which the care provider talked with you using words you could understand (4.8 [4.8-4.9] vs 4.8 [4.7-4.9], P = .023); and likelihood of your recommending this care provider to others (4.8 [4.7-4.9] vs 4.8 [4.7-4.8], P = .044). Similar trends were observed when we compared accented to non-accented physicians, although these did not reach statistical significance (data not shown). No differences were observed between MD and DO physicians, and this is likely due to our small number of DO physicians (n = 10) in the cohort.

Table 5.

Average Linear Score by Physician Medical School Location.

	Medical school location
Average linear score (range = 1-5)	American (n = 118)	International (n = 35)	Total (n = 153)	P ^a
Concern the care provider showed for your questions or worries				.140
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.1, 5.0	4.5, 4.9	4.1, 5.0
Explanations the care provider gave you about your problem or condition				.023
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.7 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.5, 4.9	4.2, 5.0
Care provider’s efforts to include you in decisions about your care				.017
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.7 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.5, 4.9	4.2, 5.0
Care provider’s discussion of any proposed treatment (options, risks, benefits, etc)^b				.201
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.3, 5.0	4.2, 5.0
Friendliness/courtesy of the care provider				.099
Median (Q1, Q3)	4.9 (4.8, 4.9)	4.8 (4.8, 4.9)	4.8 (4.8, 4.9)
Range	4.3, 5.0	4.5, 5.0	4.3, 5.0
Instructions the care provider gave you about follow-up care (if any)				.273
Median (Q1, Q3)	4.8 (4.6, 4.8)	4.7 (4.6, 4.8)	4.7 (4.6, 4.8)
Range	4.2, 5.0	4.4, 5.0	4.2, 5.0
Degree to which the care provider talked with you using words you could understand				.023
Median (Q1, Q3)	4.8 (4.8, 4.9)	4.8 (4.7, 4.9)	4.8 (4.8, 4.9)
Range	4.4, 5.0	4.6, 5.0	4.4, 5.0
Amount of time the care provider spent with you				.550
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.0, 5.0	4.5, 5.0	4.0, 5.0
Your confidence in this care provider				.198
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)	4.8 (4.7, 4.9)
Range	4.2, 5.0	4.5, 5.0	4.2, 5.0
Likelihood of your recommending this care provider to others				.044
Median (Q1, Q3)	4.8 (4.7, 4.9)	4.8 (4.7, 4.8)	4.8 (4.7, 4.9)
Range	4.1, 5.0	4.4, 5.0	4.1, 5.0

^a All comparisons use the Kruskal-Wallis P-value.

^b n = 123 providers (94 American, 29 International) included in analysis of Average Linear Score for care provider’s discussion of any proposed treatment.

Figure 4.

Average linear score by physician medical school location. Lower, middle, and upper values of each box represent 25th percentile, median, and 75th percentile, respectively, and whiskers and individual points display the range. P-values are from Kruskal-Wallis tests.

Discussion

This study is the first to evaluate the association of physician gender, race, and training background with PS scores for pediatricians in an outpatient practice. We observed differences in PS scores related to physician demographic characteristics. Specifically, the data demonstrate that pediatricians who are female, BIPOC, or international medical school graduates receive lower PS scores than pediatricians who are male, white, or graduates of US medical schools.

The use of different modalities to survey patients’ satisfaction with the health care they receive has become standard practice. Every day thousands of patients receive phone calls, electronic communication, or traditional mail with questions asking about their care after being evaluated or discharged from the US medical health system. Patient satisfaction surveys were designed to evaluate patient experiences with physicians and health care staff, as well as the entire institutional experience (eg, parking, ease of making an appointment, and wait time), as a tool for improving the care provided by individual physicians, medical health system, or networks.²⁰ Many employers and health insurance companies are using patient satisfaction scores as a metric to determine a portion of physician compensation and/or promotion (although not the institution in which this study was performed), which can result in significant stress and pressure to change behavior in order to receive higher scores.¹⁴

While most of the focus has been on the US health care system, other regions of the world are also measuring PS and exploring the impact. A recent paper compared US and European methods for evaluating PS and found a wide variety of approaches including incorporating Internet-based narrative feedback in the United Kingdom and posting all results publicly in Finland.²¹ Reports out of China, Japan, and Korea support growing interest in studying PS in Asian health care systems as well.^22-24 As the appetite to incorporate metrics for measuring PS grows worldwide, the priority remains that the patient/physician relationship should start with mutual respect and collaboration to reach the best outcome for the patients.

Most health organizations have implemented programs to mitigate implicit bias against patients and change any policies or practices that perpetuate it. These policies and education efforts are predominantly directed at the health care providers. Often overlooked are the biases of patients and how they can affect the care relationship, harm health care provider well-being, subject health care providers to discrimination, or impact PS survey results. One major concern about reporting individual PS scores for each provider is that they are often reported as percentiles, so very small differences can lead to a large variation in force-ranked percentile and lower scores, which results in significant stress, anxiety, and burnout among physicians.¹⁶ Our group has demonstrated a negative impact of PS scores on pediatricians, especially those who are female, BIPOC, subspecialists, younger, and attended non-US medical schools.¹⁷
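The sensitivity of force-ranked percentiles to small score differences can be illustrated with a toy calculation. The peer scores below are invented for illustration and are not study data. They are chosen to be tightly clustered, in the same way that the ALS interquartile ranges reported in Tables 3 to 5 span roughly 4.7 to 4.9. The function name and the strictly-below definition of percentile rank are assumptions.

```python
def percentile_rank(score, peer_scores):
    """Percentage of peers scoring strictly below `score` (assumed definition)."""
    below = sum(s < score for s in peer_scores)
    return 100.0 * below / len(peer_scores)


# Hypothetical peer group of 20 physicians with tightly clustered scores.
peers = [4.60] + [4.70] * 4 + [4.75] * 5 + [4.80] * 5 + [4.85] * 3 + [4.90] * 2

low = percentile_rank(4.75, peers)
high = percentile_rank(4.85, peers)
print(f"ALS 4.75 -> percentile {low:.0f}; ALS 4.85 -> percentile {high:.0f}")
```

In this invented group, a difference of 0.10 on a 5-point scale separates the 25th percentile from the 75th percentile.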

Our findings are similar to some previous reports. Soto-Santiago et al²⁵ observed lower PS scores for female and non-white physicians. Martinez et al²⁶ noted lower scores for South Asian and East Asian physicians among physicians providing care to adults via telemedicine. Asian ethnicity was also associated with a lower likelihood of receiving a 5-star likelihood of recommending orthopedic surgeons.²⁷ The small sample size of BIPOC physicians in our cohort did not allow us to examine differences between Asian and Black American physicians. Higher PS scores have been associated with racial/ethnic concordance between patients and their physicians.²⁸ Given this, the higher PS scores we observed in white physicians may be partly related to the demographics of the patients served by our medical center, with the vast majority being white. While we are not aware of any PS measurement methods that include bi-directional surveying in which physicians rate patients in parallel, this could be an intriguing next step to provide insight into the impact of concordance and other deeper associations affecting PS and physician well-being.

We found that women physicians had lower PS scores than their male colleagues. The data on the association between PS scores and physician gender are mixed. A study among gynecologists demonstrated that female providers are less likely to receive top scores when compared to their male colleagues,²⁹ while no association was found among otolaryngologists.³⁰ A meta-analysis investigating associations regarding gender in adult physicians showed that female physicians generally scored better, especially with younger patients.³¹ In a randomized clinical trial in which a clinical vignette was presented to participants with a Black or white, male or female physician, no significant differences were found in simulated patients’ evaluations of female or Black physicians.³² Furthermore, many studies support that female providers face common stereotypes and biases at work, such as having their physician role misidentified by patients³³ or being referred to by their first name rather than their professional title by patients and male colleagues.^34,35

We found higher PS scores for physicians from American medical schools in comparison to those from foreign medical schools and a similar trend in physicians with an accent. We are not aware of any other studies that have examined the association of medical school location and accent with patient experience scores, although it is well-documented that international medical graduates do experience bias.³⁶

Strengths of our study include investigating the associations between PS scores and physician demographics in pediatrics, a specialty in which data are lacking. We wish to emphasize the importance of studying PS within pediatrics because the inherently different logistics in surveying make extrapolation from adult studies problematic. Specifically, the major difference is that in pediatrics (in most cases), the patients themselves do not complete the survey; rather, it is completed by a parent or other caregiver. This added layer of complexity could, perhaps, lead one to speculate that PS scores are even more difficult to link to quality of care than in adult medicine.

In addition, our cohort of pediatricians is well-balanced in terms of gender and had reasonably good representation among pediatricians who are BIPOC and international medical school graduates. Another strength is the availability of a large number of survey data points. It is also notable that we chose to have the investigators assign physician gender, race, and ethnicity rather than have them self-reported. This was an intentional choice based on the notion that this method would more accurately reflect that the survey responders are also making these same judgments of the physicians detached from the physicians’ own self-reporting.

Limitations of our study include that the cohort is from a single institution with limited sample size and patient ethnic diversity. In addition, the study design is cross-sectional and applied only univariate analysis, thus limiting interpretation of the results. Another limitation is the absence of data on patient race and ethnicity, given that racial and ethnic concordance between patients and their physicians has been associated with higher patient experience scores.²⁸ Patient race by itself has also been shown to influence patient experience scores, with Asian patients less likely to give the highest patient satisfaction scores regardless of provider race and ethnicity.³⁷ We did not have information on the gender of the adult caregiver completing the survey, which previous studies suggest could impact the results.^31,38,39 Other potential causes of differences in PS scores must be acknowledged, such as physician experience and ability, patient load, duration of time spent with patients, ease of navigating the clinic visit, underlying medical condition(s) of the patient, and patient demographics/socioeconomic status, among others. Our study design did not allow for accounting for these and other potential confounding factors. We also acknowledge the challenge of determining the clinical relevance of the statistically significant differences we identified.

Acknowledging the above limitations, our findings highlight the potential bias in PS surveying within pediatrics, significantly impacting pediatricians who are female, BIPOC, and graduates of international medical schools. Although the study was not designed to determine a causal relationship between survey responder implicit bias and PS scoring, the results suggest that implicit bias could play a role and warrant future study into the mechanisms undergirding the differences found here.

These observations, combined with the concern for the negative impact of PS surveying on pediatricians, should compel administrators and physicians to carefully weigh the risks and benefits of PS surveying and develop strategies to minimize harm while still making available the potentially useful information from this method of feedback. At the very least, our findings suggest that using PS scores to determine compensation or other tangible factors contributing to quality of life is inappropriate. Rather, physician compensation should be based on more objective metrics like outcomes, years of training and experience, and productivity. In an effort to promote an inclusive and diverse workforce in health care, we propose that PS scores should be used only as a tool to identify areas of improvement for patient experience and not as metrics for physician performance or competence. Larger controlled studies are needed to better understand the true role of implicit bias in patient experience and PS scoring.

Conclusions

Pediatricians who are female, BIPOC, or international medical school graduates are more likely to receive lower PS scores. The impact of these findings must be acknowledged when addressing institutional systems for utilizing PS scores as a performance metric, as well as addressing workforce burnout and physician well-being.

Author Contributions

Dr DJS: Conceptualized the study; designed the data collection instruments and methods; analyzed the data; drafted the initial manuscript; reviewed and revised the manuscript.

Dr IA: Conceptualized the study; designed the data collection instruments and methods; analyzed the data; reviewed and revised the manuscript.

Dr AYJ: Conceptualized the study; designed the data collection instruments and methods; analyzed the data; reviewed and revised the manuscript.

Dr ALC: Conceptualized the study; designed the data collection instruments and methods; analyzed the data; reviewed and revised the manuscript.

Dr SK: Conceptualized the study; designed the data collection instruments and methods; analyzed the data; reviewed and revised the manuscript.

Dr SMP: Designed the data collection instruments and methods; analyzed the data; reviewed and revised the manuscript.

Ms KTH: Designed the data collection instruments and methods; analyzed the data; reviewed and revised the manuscript.

Ms SB: Conceptualized the study and designed the data collection instruments and methods.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this study was from benefactor gifts through the Mayo Clinic Children’s Research Center, which had no role in the design or execution of this study or in the decision to submit the article for publication. None of the authors received any honorarium, grant, or other form of payment for contributing to this project. The funder/sponsor did not participate in the work.

Ethical Approval

This study was approved by the Mayo Clinic Institutional Review Board.

ORCID iDs

David J. Sas

Sean M. Phelan

Ana L. Creo

References

1. Glickman, Schulman KA. The mis-measure of physician performance. Am J Manag Care. 2013;19(10):782-785.

2. Bekelis, Missios, MacKenzie, O’Shaughnessy PM. Does objective quality of physicians correlate with patient satisfaction measured by hospital compare metrics in New York State? World Neurosurg. 2017;103:852-858.e1.

3. Brousseau, Hoffmann, Nattinger, Flores, Zhang, Gorelick. Quality of primary care and subsequent pediatric emergency department utilization. Pediatrics. 2007;119(6):1131-1138.

4. Isaac, Zaslavsky, Cleary, Landon BE. The relationship between patients’ perception of care and measures of hospital quality and safety. Health Serv Res. 2010;45(4):1024-1040.

5. Kuo, Bird, Tilford JM. Associations of family-centered care with health care outcomes for children with special health care needs. Matern Child Health J. 2011;15(6):794-805.

6. Saman, Kavanagh, Johnson, Lutfiyya MN. Can inpatient hospital experiences predict central line-associated bloodstream infections? PLoS ONE. 2013;8(4):e61097.

7. Stein, Day, Karia, Hutzler, Bosco III. Patients’ perceptions of care are associated with quality of hospital care: a survey of 4605 hospitals. Am J Med Qual. 2015;30(4):382-388.

8. Fenton, Jerant, Bertakis, Franks. The cost of satisfaction: a national study of patient satisfaction, health care utilization, expenditures, and mortality. Arch Intern Med. 2012;172(5):405-411.

9. Godil, Parker, Zuckerman, et al. Determining the quality and effectiveness of surgical spine care: patient satisfaction is not a valid proxy. Spine J. 2013;13(9):1006-1012.

10. Jerant, Fiscella, Fenton, Magnan, Agnoli, Franks. Patient satisfaction with clinicians and short-term mortality in a US national sample: the roles of morbidity and gender. J Gen Intern Med. 2019;34(8):1459-1466.

11. Kennedy, Tevis, Kent KC. Is there a relationship between patient satisfaction and favorable outcomes? Ann Surg. 2014;260(4):592-598; discussion 8-600.

12. Lyu, Wick, Housman, Freischlag, Makary MA. Patient satisfaction as a possible indicator of quality surgical care. JAMA Surg. 2013;148(4):362-367.

13. Yadlapati, Gawron, Keswani RN. Patient satisfaction does not correlate with established colonoscopy quality metrics. Am J Gastroenterol. 2014;109(7):1089-1091.

14. Zgierska, Rabago, Miller MM. Impact of patient satisfaction ratings on physicians and clinical care. Patient Prefer Adherence. 2014;8:437-446.

15. Howell, Mylod, Lee, Shanafelt, Prissel. Physician burnout, resilience, and patient experience in a community practice: correlations and the central role of activation. J Patient Exp. 2020;7(6):1491-1500.

16. Schneider, Ehsanian, Schmidt, et al. The effect of patient satisfaction scores on physician job satisfaction and burnout. Future Sci OA. 2020;7(1):FSO657.

17. Sas, Absah, Phelan, et al. Patient satisfaction scores impact pediatrician practice patterns, job satisfaction, and burnout. Clin Pediatr (Phila). 2023;62:769-780.

18. DeLoughery EP. Physician race and specialty influence Press Ganey survey results. Neth J Med. 2019;77(10):366-369.

19. Poole Jr. Patient-experience data and bias—what ratings don’t tell us. N Engl J Med. 2019;380(9):801-803.

20. Anhang Price, Elliott, Cleary, Zaslavsky, Hays RD. Should health care providers be accountable for patients’ care experiences. J Gen Intern Med. 2015;30(2):253-256.

21. Friedel, Siegel, Kirstein, et al. Measuring patient experience and patient satisfaction-how are we doing it and why does it matter? a comparison of European and U.S. American approaches. Healthcare (Basel). 2023;11(6):797.

22. Han, et al. Patient-centered care and patient satisfaction: validating the patient-professional interaction questionnaire in China. Front Public Health. 2022;10:990620.

23. Kamo, Fujimori, Asai, et al. Validity and reliability of the Japanese version of the Patient Satisfaction Questionnaire (PSQ-J) for evaluating oncologist consultations. PEC Innov. 2023;2:100166.

24. Sohn, Nam, Joo, Kwon, Yim JJ. Patient-centeredness during in-depth consultation in the outpatient clinic of a tertiary hospital in Korea: paradigm shift from disease to patient. J Korean Med Sci. 2019;34(15):e119.

25. Sotto-Santiago, Slaven, Rohr-Kirchgraber. (Dis)incentivizing patient satisfaction metrics: the unintended consequences of institutional bias. Health Equity. 2019;3(1):13-18.

26. Martinez, Keenan, Rastogi, et al. The association between physician race/ethnicity and patient satisfaction: an exploration in direct to consumer telemedicine. J Gen Intern Med. 2020;35(9):2600-2606.

27. Sharabianlou Korth, Cheng, et al. Provider personal and demographic characteristics and patient satisfaction in orthopaedic surgery. J Am Acad Orthop Surg Glob Res Rev. 2021;5(4):10.5435/JAAOSGlobal-D-20-00198.

28. Takeshita, Wang, Loren, et al. Association of racial/ethnic and gender concordance between patients and physicians with patient experience ratings. JAMA Netw Open. 2020;3(11):e2024583.

29. Rogo-Gupta, Haunschild, Altamirano, Maldonado, Fassiotto. Physician gender is associated with Press Ganey patient satisfaction scores in outpatient gynecology. Womens Health Issues. 2018;28(3):281-285.

30. Tracy, Jabbour, Rubin, et al. Satisfaction in academic otolaryngology: do physician demographics impact Press Ganey survey scores. Laryngoscope. 2020;130(8):1902-1906.

31. Hall, Blanch-Hartigan, Roter DL. Patients’ satisfaction with male versus female physicians: a meta-analysis. Med Care. 2011;49(7):611-617.

32. Solnick, Peyton, Kraft-Todd, Safdar. Effect of physician gender and race on simulated patients’ ratings and confidence in their physicians: a randomized trial. JAMA Netw Open. 2020;3(2):e1920511.

33. Olson, Dines, Ryan, et al. Physician identification badges: a multispecialty quality improvement study to address professional misidentification and bias. Mayo Clin Proc. 2022;97(4):658-667.

34. Files, Mayer, et al. Speaker introductions at internal medicine grand rounds: forms of address reveal gender bias. J Womens Health (Larchmt). 2017;26(5):413-419.

35. Harvey, Butterfield, Ochoa, Yang YW. Patient use of physicians’ first (given) name in direct patient electronic messaging. JAMA Netw Open. 2022;5(10):e2234880.

36. Smith, Parkash. Normalized “medical inferiority bias” and cultural racism against international medical graduate physicians in academic medicine. Acad Pathol. 2023;10(4):100095.

37. Garcia, Chung, Liao, et al. Comparison of outpatient satisfaction survey scores for Asian physicians and non-Hispanic White physicians. JAMA Netw Open. 2019;2(2):e190027.

38. Chekijian, Kinsman, Taylor, et al. Association between patient-physician gender concordance and patient experience scores. Am J Emerg Med. 2021;45:476-482.

39. Hall, Irish, Roter, Ehrlich, Miller LH. Gender in medical encounters: an analysis of physician and patient communication in a primary care setting. Health Psychol. 1994;13(5):384-392.