Sage Journals Home
When Wrong Answers Matter: Consequence-Weighted Evaluation of Large Language Models for ERCP Triage