The role of AI in prostate cancer care: Assessing the role of chatbots versus urologists in patient communication and empathy

Abstract

Objective:

To date, the integration of artificial intelligence (AI) in healthcare has expanded rapidly, offering new tools for patient education and communication. In prostate cancer (PCa), where information needs are high and emotionally sensitive, AI-driven chatbots (CB) may enhance patient engagement. This study aims to compare the performance and perceived quality of responses from CB versus urologists (URO) to common PCa-related inquiries.

Methods:

We conducted a cross-sectional analysis of 20 frequently asked PCa general questions. Responses were generated by two AI-based CB and four certified URO in a simulated clinical messaging setting, without direct patient interaction. Expert reviewers first assessed each response for medical accuracy and completeness. Then, five blinded non-medical evaluators rated the responses using Likert scales to evaluate completeness (1–5), empathy (using a five-item adaptation of the Jefferson Scale), and overall preference.

Results:

A total of 120 responses were generated, yielding 600 evaluations by non-medical raters. Accuracy and completeness scores were comparable between CB and URO responses, according to experts’ evaluations (p = 0.45 and p = 0.12). However, CB responses scored significantly higher in completeness and empathy (both p < 0.001) among non-medical evaluators. Moreover, non-medical evaluators showed a statistically significant overall preference for CB-generated responses over those from urologists (p < 0.001).

Conclusions:

While CB responses were as accurate as those from URO, they outperformed URO responses in completeness and empathy as rated by non-medical evaluators. These results suggest that AI-based CB could serve as effective tools in enhancing patient communication and satisfaction and may be a valuable complement to urologist-led care in clinical practice.

Keywords

chatbots, prostate cancer, artificial intelligence, empathy, patients

Introduction

In today’s digital era, the widespread availability of medical information on the internet has significantly transformed the healthcare landscape.¹ Health organizations have embraced innovative technologies across the entire spectrum of care, from diagnosis to treatment and follow-up, while patients are increasingly turning to online platforms to better understand their medical and surgical options.² However, the reliability of such information varies greatly. To address this issue, professional societies like the American Urological Association (AUA) and the European Association of Urology (EAU) have developed patient-centered educational content grounded in their official clinical guidelines.^3,4 These include accessible resources and high-quality videos created by various working urological groups, designed to make complex surgical information more understandable to the general public.⁵

Despite the value of these resources, direct interaction with healthcare providers remains essential throughout a patient’s care journey. In this context, the rise of artificial intelligence (AI)-powered chatbots marks a new frontier in digital health.⁶ These tools, which are both accessible and easy to use, offer support and health information in real time. Therefore, the accuracy, consistency, and quality of chatbot-generated responses must be rigorously assessed to ensure they are trustworthy.^7,8

In this landscape, prostate cancer (PCa) exemplifies the intersection of technological innovation and patient communication.⁹ Novel technologies have revolutionized PCa diagnosis, staging, and treatment, notably with the rise of robotic surgery.¹⁰ At the same time, concerns about post-operative complications and post-surgical quality of life, including sexual and urinary function,¹¹ often outweigh patients’ fears about cancer recurrence, above all during active surveillance.¹² Despite the growing role of AI in healthcare,¹³ there is a lack of focused research evaluating how accurately and empathetically AI chatbots respond to PCa-related questions.

While several studies have highlighted the limitations of chatbots in specialized medical fields^14,15 and in PCa care,¹⁶ none have specifically compared their performance to that of human experts in delivering PCa information. Therefore, this study aims to assess and compare the accuracy, consistency, quality of information, and patient preference regarding responses from chatbots (CB) and urologists (URO) to common PCa-related inquiries.

Methods

We conducted a cross-sectional comparative analysis to evaluate the quality of general, non-specific responses to PCa-related questions provided by AI CB and board-certified URO. A total of 20 clinically relevant questions were developed, covering four key domains (Table 1):

Diagnosis

Treatment options

Perioperative complications

Postoperative follow-up

Table 1.

Twenty prostate cancer-related questions, divided into categories, administered to chatbots and urologists.

No.	Category	Question
1	Diagnosis	What are the early signs or symptoms of prostate cancer?
2	Diagnosis	At what age should I start getting screened for prostate cancer?
3	Diagnosis	What does a high PSA level mean, and does it always indicate cancer?
4	Diagnosis	How is prostate cancer diagnosed—what tests are usually done?
5	Diagnosis	Is prostate cancer hereditary? Should my sons get tested earlier?
6	Treatment options	What are the main treatment options for localized prostate cancer?
7	Treatment options	How do I decide between surgery and radiation therapy?
8	Treatment options	What is active surveillance, and is it safe?
9	Treatment options	Can prostate cancer be treated without surgery?
10	Treatment options	Are there any new or experimental treatments for prostate cancer?
11	Perioperative complications	What are the risks and side effects of prostate cancer surgery?
12	Perioperative complications	Will I have problems with urinary control after surgery?
13	Perioperative complications	How does treatment affect sexual function and erections?
14	Perioperative complications	What complications can happen after radiation therapy?
15	Perioperative complications	Is robotic surgery better or safer than open surgery?
16	Follow-up	How often do I need follow-up after prostate cancer treatment?
17	Follow-up	What does it mean if my PSA level starts rising again after treatment?
18	Follow-up	Is it possible for prostate cancer to come back after surgery or radiation?
19	Follow-up	Should I change my diet or lifestyle after prostate cancer treatment?
20	Follow-up	Are there support groups or resources for men living with prostate cancer?

These questions were selected to reflect common patient concerns and were framed in layperson-friendly language to simulate real-world patient inquiries. Responses to each question were generated independently by two widely used AI-powered CB (Chatbot 4 and GeminiPro), as well as by four experienced URO (⩾5 years of expertise in prostate cancer management). All responses were collected in a simulated clinical messaging environment that mirrored asynchronous communication, without any direct interaction with patients, and neither AI nor URO respondents were provided with additional patient context beyond the phrased question.

Computer engineers, data scientists, and AI specialists were likely involved in training these AI systems on large amounts of medical data so that the chatbots would produce appropriate answers. The chatbots were presumably also designed to include a level of “humanization” in their responses, aiming to reply in a supportive and understanding manner.

To assess the quality of the responses, a panel of three independent expert urologists (⩾10 years of expertise in PCa management) evaluated each answer based on two main criteria, accuracy and completeness, each rated on a Likert scale from 1 to 5. Accuracy was defined as the degree to which the response aligned with current clinical guidelines and evidence-based practices, while completeness referred to how thoroughly each response addressed the components of the question, including clarity, structure, and thematic relevance.

Following expert evaluation, a second assessment was carried out by a group of five non-medical participants (without any personal or family history of PCa), selected to represent typical patients or lay users. These evaluators were blinded to the source (CB vs URO) of each response and were instructed to assess each answer based on three main dimensions:

- Completeness was evaluated using a single-item five-point Likert scale, where 1 indicated “not complete at all” and 5 indicated “extremely complete.”

- Empathy was assessed using a modified version of the Jefferson Scale of Patient Perceptions of Physician Empathy (JSPPPE).¹⁷ While originally developed to measure physician empathy, this validated scale was adapted for written responses, including those from AI-based chatbots. The adapted version used in this study included five items, each rated on a seven-point Likert scale (1 = Strongly Disagree to 7 = Strongly Agree).

- Additionally, they indicated their overall preference for each answer.
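The text does not state how the five adapted JSPPPE item ratings were combined into a single empathy score for each response. As a minimal sketch, assuming a simple mean across the five items (an assumption, not the authors' documented method), the scoring of one evaluator's ratings for one response could look like this. The function name `empathy_score` and the example ratings are hypothetical.

```python
N_ITEMS = 5                   # five adapted JSPPPE items, as described above
SCALE_MIN, SCALE_MAX = 1, 7   # 1 = Strongly Disagree, 7 = Strongly Agree


def empathy_score(item_ratings):
    """Return the mean of the five item ratings for one response.

    Averaging the items is an assumption made for illustration; the
    source does not specify the aggregation rule.
    """
    if len(item_ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(item_ratings)}")
    for rating in item_ratings:
        if not isinstance(rating, int) or not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"rating {rating!r} is outside {SCALE_MIN}-{SCALE_MAX}")
    return sum(item_ratings) / N_ITEMS


# Hypothetical ratings given by one lay evaluator to one response
example = [5, 6, 4, 5, 5]
print(empathy_score(example))
```

Validating the range before averaging keeps data-entry errors (for example, a 0 or an 8) from silently shifting the group medians.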

Statistical analysis

Descriptive statistics were presented as median (IQR) because the data were non-parametric. Comparisons between the two groups were conducted using Pearson’s Chi-squared test for categorical variables or the Wilcoxon test for variables that did not meet the normality assumption. All analyses were conducted using SPSS version 27 (IBM Corp., Armonk, NY, USA). A p-value <0.05 was considered indicative of statistical significance.
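To make the analysis concrete, the following is a minimal Python sketch of the two steps described above: summarizing Likert ratings as median (IQR) and comparing two independent groups with a rank-based test. It assumes the Wilcoxon test was applied in its rank-sum (Mann-Whitney U) form with a normal approximation and tie correction. The study itself used SPSS, whose quartile definition and exact p-value computation may differ, and the ratings below are hypothetical, not study data.

```python
import math
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3) using the inclusive quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with tie correction and no
    continuity correction. Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks = {}      # value -> average rank among tied observations
    tie_term = 0    # sum of t^3 - t over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        ranks[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:           # every observation is identical
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return u1, p


# Hypothetical 1-5 Likert ratings for illustration only
chatbot_ratings = [4, 5, 5, 4, 5, 4, 5, 5]
urologist_ratings = [2, 3, 3, 2, 3, 2, 3, 3]
print(median_iqr(chatbot_ratings))
print(rank_sum_test(chatbot_ratings, urologist_ratings))
```

A rank-based test is a natural fit here because Likert ratings are ordinal and heavily tied, which is also why the tie correction in the variance matters.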

Results

Two CB and four URO each answered the 20 patient-centered questions, generating a total of 120 responses. These responses were then assessed by a group of five non-medical volunteers, who were blinded to the authorship of each answer to minimize bias, for a total of 600 evaluations.
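The counts above follow directly from the study design, and the short sketch below reproduces the arithmetic. The per-group breakdown is derived from the stated design (two CB, four URO, 20 questions, five lay evaluators) and is not reported explicitly in the text; the variable names are illustrative.

```python
N_CHATBOTS = 2
N_UROLOGISTS = 4
N_QUESTIONS = 20
N_LAY_EVALUATORS = 5

# Each responder answered every question once
cb_responses = N_CHATBOTS * N_QUESTIONS
uro_responses = N_UROLOGISTS * N_QUESTIONS
responses = cb_responses + uro_responses

# Each lay evaluator rated every response
lay_evaluations = responses * N_LAY_EVALUATORS

print(cb_responses, uro_responses, responses, lay_evaluations)
```

Because the design is unbalanced (twice as many URO responses as CB responses), the two groups contribute different numbers of ratings to each comparison.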

As shown in Table 2, expert urologists rated both CB and URO responses as sufficiently accurate and complete to be presented to non-medical volunteers. No significant differences were found between the two groups in terms of these two parameters (p = 0.45 and p = 0.12, respectively).

Table 2.

Expert urologists’ evaluations of chatbot and urologist responses on a Likert scale for answer accuracy and completeness.

Characteristics	Chatbots	Urologists	p-value^a
Accuracy			0.45
Median (IQR)	4.5 (4–5)	4.5 (4–5)
Completeness			0.12
Median (IQR)	4 (3–4)	4 (3–4)

^a Pearson’s Chi-squared test; Wilcoxon test.

Conversely, as summarized in Table 3, in terms of trust and satisfaction, non-medical volunteers gave CB responses significantly higher scores, rating them as more complete and more empathetic than those from URO (all p < 0.001).

Table 3.

Non-medical volunteers’ evaluations of chatbot and urologist responses on a Likert scale for answer completeness, empathy, and preference ranking.

Characteristics	Chatbots	Urologists	p-value^a
Completeness			<0.001
Median (IQR)	4 (3–4)	3 (2–3)
Empathy			<0.001
Median (IQR)	3 (3–4)	2 (2–3)
Preference ranking			<0.001
Median (IQR)	4 (3–5)	3 (3–4)

^a Pearson’s Chi-squared test; Wilcoxon test.

Finally, the preference ranking for CB responses was significantly higher than for URO responses (p < 0.001), suggesting that while URO and AI CB are comparable in delivering accurate PCa information, CB may have an edge in communication style and comprehensiveness, factors that can strongly influence patient perception and engagement.

Discussion

This study represents the first targeted evaluation of chatbot-generated responses in the context of PCa, a complex and emotionally charged condition that significantly impacts men’s physical and psychological well-being.¹⁸ Seeing a physician and discussing treatment options following a diagnosis of localized or advanced PCa is a critical step in a patient’s care. For many, this may be their first interaction with the healthcare system, marking the beginning of a relationship with their urologist.¹⁹ This phase of the cancer journey is particularly vulnerable to misinformation or disinformation, as patients often seek information urgently and from multiple sources.²⁰ Indeed, recent studies have highlighted the inconsistent performance of chatbots and AI platforms in urology and oncology, reporting limited actionability, incomplete sourcing, and a lack of personalization, thus raising concerns about their clinical utility.^21,22

Based on these considerations, and in line with Musheyev et al.,²³ our aim was to assess whether CB can provide accurate, reliable, and patient-oriented information on PCa, potentially reducing anxiety and improving awareness of the disease and its broader context. We also sought to compare these AI-generated responses with those of expert urologists. Our findings contribute to the growing body of literature on the role of artificial intelligence in patient education and digital health communication, offering a more nuanced perspective. While accuracy was generally upheld, our results suggest that newer AI models may be evolving to more effectively replicate human-like empathy and comprehensive patient support, two critical components of cancer care, where emotional factors heavily influence decision-making, as reported in previous works.²⁴ This occurs in a context in which expert ratings of the accuracy and completeness of URO and CB responses were similar, suggesting that both AI-powered CB and experienced urologists provided medically sound answers in terms of factual correctness.

This aligns with emerging research suggesting that even artificial expressions of empathy can enhance user experience and foster trust, particularly when real-time access to physicians is limited.^25,26 Our study highlights that AI has the potential not only to deliver clinically valid content but also to simulate the compassionate, patient-centered communication that strengthens adherence to medical advice. Moreover, our results are consistent with findings in other fields,²⁷ where instruments like the Modified DISCERN, EQIP, and PEMAT have been employed to evaluate AI-generated content.^28,29 Although we did not use EQIP or Flesch-Kincaid metrics in this study, the high completeness scores assigned to CB responses suggest that the structure and depth of information may, in some cases, surpass traditional expert responses, potentially due to time limitations or assumptions about patients’ prior knowledge in real consultations.

It is important to acknowledge that our simulated clinical environment cannot fully capture the dynamic, bidirectional nature of actual physician–patient interactions. Nevertheless, in line with the findings reported by Rodler et al.,¹⁴ evaluators noted that CB responses tended to include more background information, a broader range of treatment options, and clearer explanations, making them appear more comprehensive and informative. One possible explanation for the higher completeness and perceived empathy of CB responses could relate to the inherent limitations of human communication in clinical settings. Urologists, who often repeat the same information about diagnosis, treatment options, and prognosis to multiple patients and their families each day, may experience fatigue or cognitive overload. This repetitive strain, both mental and emotional, can unintentionally affect the depth or tone of their explanations, as reported in a previous study.³⁰ In contrast, CB do not experience fatigue and can deliver consistent, detailed, and empathetic responses regardless of the number of interactions. This fundamental difference may contribute to the perception that chatbot responses are more comprehensive and emotionally attuned.

Taken together, these findings suggest that AI CB could serve as powerful adjuncts in urological care, enhancing patient understanding, supporting clinical communication, and alleviating some of the informational burden on physicians. However, these tools are not intended to replace clinical professionals. Rather, they should be viewed as complementary resources that can provide clear, compassionate, and accessible information, particularly at emotionally vulnerable stages of the patient journey.

Limitations

This study has several limitations that should be acknowledged. First, the evaluation was conducted in a simulated messaging environment without direct patient interaction, which may not fully replicate the dynamic and nuanced nature of real clinical consultations. Second, the static responses lacked the opportunity for follow-up questions or personalized dialog, potentially limiting the assessment of CB and URO adaptability. Third, we did not evaluate other important factors such as readability, source citation accuracy, or factual consistency using standardized tools such as the modified DISCERN, EQIP, or Flesch-Kincaid Grade Level (FKGL) scores, which have been reported as critical in previous AI health information studies. While our study prioritized empathy and completeness, future work should integrate readability metrics to provide a holistic view of CB utility in patient education.

Finally, the study focused exclusively on general PCa-related questions, which limits the generalizability of the findings to more specific PCa questions. In addition, real clinical consultations allow for clarifications, follow-up questions, and individualized recommendations, which this design could not capture.

Notwithstanding these limitations, to the best of our knowledge, this is the first study in urology to compare chatbots’ and physicians’ answers on PCa care. Future research is needed to incorporate specific questions, real patient feedback, dynamic conversational assessments, and comprehensive content quality metrics.

Conclusions

While AI-generated responses to PCa inquiries showed accuracy and completeness comparable to those of urologists when assessed by expert urologists, non-medical volunteers rated them significantly higher in completeness and perceived empathy. Non-medical evaluators consistently preferred chatbot responses, highlighting the potential of AI tools to enhance patient communication, engagement, and satisfaction in the field of PCa. These findings suggest that AI-driven platforms may serve as valuable adjuncts in clinical settings, particularly for providing accessible, emotionally attuned, and informative responses to common patient concerns. As digital health continues to evolve, the integration of AI-generated messaging into urological practice could support more efficient patient education and complement traditional physician–patient interactions.

Footnotes

ORCID iD

Loris Cacciatore

Ethical considerations

Not required. The study conformed to the ethical guidelines of the 1975 Declaration of Helsinki and involved no change to standard practice.

Consent to participate

All participants were asked to take part in the study on a voluntary basis and signed a consent form for inclusion.

Author contributions

Conceptualization, L.C. and A.M.; methodology, L.C. and A.M.; software, P.C.; validation, F.E. and R.P.; formal analysis, G.R.; investigation, L.C. and A.M.; resources, P.C.; data curation, F.E.; writing—original draft preparation, L.C.; writing—review and editing, L.C., A.M. and F.E.; visualization, F.E. and A.R.I.; supervision, R.P. and A.R.I. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Bhatt, Davis, Dalton, et al. Quantitative analysis of technological innovation in urology. Urology 2018; 111: 230–237. https://doi.org/10.1016/j.urology.2017.07.068

2. Pereira Azevedo, Gravas, de la Rosette. Mobile health in urology: the good, the bad and the ugly. J Clin Med 2020; 9(4): 1016. https://doi.org/10.3390/jcm9041016

3. European association of urology (EAU) guidelines. https://uroweb.org/guidelines (2025).

4. American urological association (AUA) guidelines. https://www.auanet.org/guidelines-and-quality/guidelines (2025).

5. Esperto, Cacciatore, Tedesco, et al. Video consensus and radical prostatectomy: the way to chase the future? J Pers Med 2023; 13(6): 1013. https://doi.org/10.3390/jpm13061013

6. King MR. The future of AI in medicine: a perspective from a chatbot. Ann Biomed Eng 2023; 51(2): 291–295. https://doi.org/10.1007/s10439-022-03121-w

7. Goodman, Patrinely, Stone Jr, et al. Accuracy and reliability of chatbot responses to physician questions. JAMA Netw Open 2023; 6(10): e2336483. https://doi.org/10.1001/jamanetworkopen.2023.36483

8. Teoh, Cacciamani, Gomez Rivas. Social media and misinformation in urology: what can be done? BJU Int 2021; 128(4): 397. https://doi.org/10.1111/bju.15517

9. Taylor, Gao, et al. TikTok and prostate cancer: misinformation and quality of information using validated questionnaires. BJU Int 2021; 128(4): 435–437. https://doi.org/10.1111/bju.15403

10. Esperto, Cacciatore, Tedesco, et al. Impact of robotic technologies on prostate cancer patients’ choice for radical treatment. J Pers Med 2023; 13(5): 794. https://doi.org/10.3390/jpm13050794

11. Ragusa, Brassetti, Prata, et al. Predictors of urinary continence recovery after laparoscopic-assisted radical prostatectomy: is surgical urethral length the only key factor? Life 2023; 13(7): 1550. https://doi.org/10.3390/life13071550

12. Brassetti, Cacciatore, Bove, et al. The impact of physical activity on the outcomes of active surveillance in prostate cancer patients: a scoping review. Cancers 2024; 16(3): 630. https://doi.org/10.3390/cancers16030630

13. Johnson, Goodman, Patrinely, et al. Assessing the accuracy and reliability of AI-generated medical responses: an evaluation of the ChatGPT model. Res Sq 2023; rs.3.rs-2566942. https://doi.org/10.21203/rs.3.rs-2566942/v1

14. Rodler, Cei, Ganjavi, et al. GPT-4 generates accurate and readable patient education materials aligned with current oncological guidelines: a randomized assessment. PLoS One 2025; 20(6): e0324175. https://doi.org/10.1371/journal.pone.0324175

15. Kayra, Anil, Ozdogan, et al. Evaluating AI chatbots in penis enhancement information: a comparative analysis of readability, reliability and quality. Int J Impot Res 2025; 37: 558–563. https://doi.org/10.1038/s41443-025-01098-3

16. Haas, Saberi, Gossler, et al. Making prostate cancer research accessible: ChatGPT-4 as a tool to enhance lay communication. Urol 2025; 64(6): 574–583. https://doi.org/10.1007/s00120-025-02558-w

17. Glaser, Markham, Adler, et al. Relationships between scores on the Jefferson scale of physician empathy, patient perceptions of physician empathy, and humanistic approaches to patient care: a validity study. Med Sci Monit 2007; 13(7): CR291–CR294.

18. Chen, McGee, Nethery, et al. Guideline-based physical activity and health-related quality of life among prostate cancer survivors: a target trial emulation in the health professionals follow-up study. Am J Epidemiol 2026; 195: 966–974. https://doi.org/10.1093/aje/kwaf117

19. Jiang, Stillson, Pollack, et al. How men with prostate cancer choose specialists: a qualitative study. J Am Board Fam Med 2017; 30(2): 220–229. https://doi.org/10.3122/jabfm.2017.02.160163

20. Cacciamani, Sebben, Tafuri, et al. Consulting “Dr. Google” for minimally invasive urological oncological surgeries: a contemporary web-based trend analysis. Int J Med Robot 2021; 17(4): e2250. https://doi.org/10.1002/rcs.2250

21. Eysenbach. Improving the quality of web surveys: the checklist for reporting results of internet E-surveys (CHERRIES). J Med Internet Res 2004; 6(3): e34. https://doi.org/10.2196/jmir.6.3.e34

22. Stevenson, Leydon-Hudson, Murray, et al. Patients’ use of the internet to negotiate about treatment. Soc Sci Med 2021; 290: 114262. https://doi.org/10.1016/j.socscimed.2021.114262

23. Musheyev, Pan, Loeb, et al. How well do artificial intelligence chatbots respond to the top search queries about urological malignancies? Eur Urol 2024; 85(1): 13–16. https://doi.org/10.1016/j.eururo.2023.07.004

24. Bracey, Bhuiyan, Pietropaolo, et al. Exploring the impact of artificial intelligence–enabled decision aids in improving patient inclusivity, empowerment, and education in urology: a systematic review by EAU endourology. Curr Opin Urol 2026; 36: 13–25. https://doi.org/10.1097/MOU.0000000000001301

25. Shah, Ghosh, Hochberg, et al. Artificial intelligence improves urologic oncology patient education and counseling. Can J Urol 2024; 31(5): 12013–12018.

26. Eppler, Ganjavi, Knudsen, et al. Bridging the gap between urological research and patient understanding: the role of large language models in automated generation of Layperson’s summaries. Urol Pract 2023; 10(5): 436–443. https://doi.org/10.1097/UPJ.0000000000000428

27. Daraz, Morrow, Ponce, et al. Readability of online health information: a meta-narrative systematic review. Am J Med Qual 2018; 33(5): 487–492. https://doi.org/10.1177/1062860617751639

28. Simon, Gelikman, Turkbey. Evaluating the efficacy of artificial intelligence chatbots in urological health: insights for urologists on patient interactions with large language models. Transl Androl Urol 2024; 13(5): 879–883. https://doi.org/10.21037/tau-23-635

29. Walker, Ghani, Kuemmerli, et al. Reliability of medical information provided by ChatGPT: assessment against clinical guidelines and patient information quality instrument. J Med Internet Res 2023; 25: e47479. https://doi.org/10.2196/47479

30. Petrut, Berindan-Neagoe, Feflea, et al. Mental fatigue evaluation of surgical teams during a regular workday in a high-volume tertiary healthcare center. Urol Int 2020; 104(3-4): 301–308. https://doi.org/10.1159/000504988