Abstract
Introduction:
Artificial intelligence (AI) is becoming increasingly integrated into clinical care in hand surgery. Its applications extend across diagnosis, planning, intraoperative assistance, postoperative monitoring, rehabilitation, prosthetics and education.
Applications:
In diagnostic imaging, AI improves the detection of distal radius and scaphoid fractures, estimates osteoporosis from hand radiographs, identifies triangular fibrocartilage complex injuries on magnetic resonance imaging, segments bones and cartilage, and supports dynamic wrist analysis; ultrasound- and neurophysiological-based models aid carpal tunnel syndrome diagnosis. Prognostic models predict outcomes after carpal tunnel release and thumb carpometacarpal osteoarthritis with mixed performance. Pre- and intraoperative applications include large language model-based triage and coding, navigation and phase/gesture recognition from surgical video, autonomous microsurgical prototypes and telemanipulator platforms for supermicrosurgery. Artificial intelligence-enabled telemonitoring (e.g. remote photoplethysmography) and video-based mobility tracking support postoperative care and rehabilitation. Vision-guided and multimodal sensing enhance myoelectric prosthesis control.
Risks:
Risks include data privacy and security, algorithmic bias (data, transposition, normative, annotation) and opacity, overreliance with automation bias and skill erosion, and unresolved legal and ethical questions (liability, conflicts of interest, compassion in care).
Conclusion:
Balanced adoption requires diversified datasets, privacy-preserving strategies (pseudonymization, differential privacy, federated learning), transparent reporting, AI literacy and ethics in medical education, and interfaces that expose uncertainty and employ cognitive forcing functions. Post-deployment surveillance should track data drift, out-of-distribution inputs and performance using automated alerts and multidisciplinary review. Artificial intelligence should augment, never replace, clinical judgment, with explicit role delineation and continuous monitoring to safeguard equity and patient-centred outcomes.
Introduction
Artificial intelligence (AI) is defined as the ability of machines to perform human-like tasks. Drawing particularly on machine learning and deep learning, it allows autonomous or semi-autonomous analysis and interpretation of data, which is especially relevant to the field of medicine (Table 1). The integration of AI into medical practice began as early as 1986, with technological advances in information management (Kalisman and Kalisman, 1986) and speech recognition (Akers, 1986). Since then, publications in clinical AI have grown exponentially (Figure 1), with the potential to profoundly transform the patient care pathway.
Table 1. The hierarchical structure and key components of approaches in artificial intelligence, machine learning and deep learning.

Figure 1. Evolution of the number of scientific publications linking artificial intelligence and surgery, indexed in PubMed between 1986 and 2024. The x-axis represents the years and the y-axis the number of publications. After an initial increase peaking in 2013, a decline is observed until 2016, followed by exponential growth.
While this is a rapidly evolving field, this paper builds on previous work (Miller et al., 2023, 2025a) to analyse the benefits and risks of AI in hand surgery. The potential benefits are reviewed, including applications in diagnosis, surgical training and perioperative support, while addressing the challenges related to confidentiality, bias, technological dependence and ethical issues.
Benefits of AI
Artificial intelligence can contribute throughout the patient pathway. While some benefits are already in practice, others remain theoretical. Examples include detecting pathology, preoperative planning, robotic-assisted surgery, real-time analysis during surgery, providing personalized follow-up, early identification of postoperative complications and advancing research through the analysis of large datasets. Unlike humans, AI operates tirelessly and consistently, ensuring uninterrupted support across all stages of care.
Diagnostic accuracy
Artificial intelligence has been applied across various imaging modalities, most extensively to conventional radiography (Table 2). Its use for scaphoid fractures is particularly relevant given the potential consequences of a missed diagnosis and the challenges of detecting occult injuries. However, its effectiveness in identifying these fractures remains variable (Kraus et al., 2024; Langerhuizen et al., 2020). These studies should be interpreted cautiously to fully understand the scope and limitations of AI, as performance metrics alone can be misleading. It is important not to overstate the current capabilities of these models and to recognize that a binary outcome, such as fracture or no fracture, reflects probabilities rather than certainties. Therefore, an AI-generated report should not be regarded with the same consideration as a human-generated one (Miller et al., 2025b).
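To illustrate why a single performance metric can mislead, the following minimal sketch computes accuracy, sensitivity and specificity from confusion-matrix counts. The cohort sizes and counts are hypothetical and chosen only for illustration; they are not taken from any cited study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical cohort: 1000 radiographs, of which 50 show a fracture.
# A model that always answers "no fracture" misses every fracture.
always_negative = confusion_metrics(tp=0, fp=0, tn=950, fn=50)

# A model that finds 40 of the 50 fractures but raises 95 false alarms.
useful_model = confusion_metrics(tp=40, fp=95, tn=855, fn=10)

print(always_negative)
print(useful_model)
```

In this invented cohort the model that never reports a fracture reaches 95% accuracy with 0% sensitivity, whereas the clinically more useful model has a lower accuracy (89.5%) but 80% sensitivity. This is one reason reported accuracies need to be read together with prevalence and the other metrics.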
Table 2. Applications of artificial intelligence in hand surgery and performance comparison with human experts.
The Dice score is a number between 0 and 1 that shows how similar two shapes or areas are, often used to check how well an AI matches a human in tasks like medical image segmentation.
CT, computed tomography; MRI, magnetic resonance imaging; TFCC, triangular fibrocartilage complex.
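The Dice score described in the table footnote can be written as 2|A∩B| / (|A| + |B|) for two segmented regions A and B. The sketch below computes it for two small binary masks. The function name and the toy 3 × 3 masks are illustrative assumptions, not data from the cited studies.

```python
def dice_score(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two equal-sized binary masks."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(a and b for a, b in zip(flat_a, flat_b))
    size_sum = sum(flat_a) + sum(flat_b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size_sum == 0 else 2 * overlap / size_sum


# Toy example: an "expert" mask and a "model" mask of 4 pixels each,
# sharing 3 pixels.
expert = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
model = [[0, 1, 1],
         [0, 1, 0],
         [0, 1, 0]]

print(dice_score(expert, model))  # 2 * 3 / (4 + 4) = 0.75
```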
Artificial intelligence has been used to identify perilunate dislocations on standard radiographs without error (Majzoubi et al., 2024). Current evidence indicates that AI can assist in identifying early osseous pathology with accuracy comparable to that of experienced radiologists (Wernér et al., 2024). Nonetheless, few studies have demonstrated clear superiority over expert clinicians. As large language and multimodal models continue to evolve, such advancements may become achievable, warranting discussion on how these technologies will ultimately integrate into clinical workflows and decision-making.
Applications of AI in computed tomography (CT) imaging show promise for wrist analysis, although studies remain too limited for reliable clinical use (Suojärvi et al., 2021; Teule et al., 2024).
While AI performs better with magnetic resonance imaging (MRI), it still lacks routine clinical application (Chen et al., 2024a). It can identify ligament injuries such as those of the triangular fibrocartilage complex (TFCC), automatically segment bones and cartilage with performance approaching that of experts, and analyse joint motion in real time to detect instabilities not visible on static images (Brui et al., 2020; Foster et al., 2018; Lin et al., 2022; Radke et al., 2021).
Ultrasound applications remain limited. Artificial intelligence has been able to detect Palmer type 1B TFCC lesions with accuracy similar to MRI, but it may not reliably distinguish complex foveal injuries (Shinohara et al., 2022). Other models can identify and segment tendons and sheaths in trigger finger with accuracy comparable to humans, but with no clear clinical advantage (Kuok et al., 2020).
When applying deep learning to the ultrasound diagnosis of carpal tunnel syndrome (CTS), some studies are limited by the absence of comparison with neurophysiology (Peng et al., 2024). When used alongside neurophysiology, it provides more accurate diagnostic results for CTS than ultrasound alone. Two studies have applied machine learning to neurophysiology: one analysing motor and sensory signals, with excellent performance (Bakalis et al., 2024) and the other focusing on electrodiagnostic criteria, with high accuracy (Tsamis et al., 2021).
Artificial intelligence has the potential to assist diagnosis and prognosis by integrating medical history, symptoms, clinical examination, imaging and laboratory results. Diagnostically, AI is already used in other specialties such as dermatology and neurology (Brancaccio et al., 2024). It can differentiate skin lesions with 72% accuracy, compared with 66% for dermatologists (Edge Health, 2024; Esteva et al., 2017), and can detect nerve palsies from photographs of the hand with up to 99% accuracy (Gu et al., 2022).
The use of AI in diagnosis and outcome prediction has been studied for CTS. One model using 11 clinical variables achieved 76.6% accuracy but remained inferior to neurophysiological examination (Park et al., 2021). Another study predicted postoperative outcomes using the Minimal Clinically Important Difference, achieving 71.8% accuracy for function and 75.9% for pain (Harrison et al., 2023). A third study compared AI with surgeon prediction of outcome at 6 months, with AI outperforming surgeons: 78% accuracy and 85% sensitivity vs. 65% and 72% (Loos et al., 2024). Conversely, in carpometacarpal osteoarthritis, AI performed worse than surgeons, with better performance when predicting function rather than pain after trapeziectomy (Loos et al., 2022).
Despite promising results, current AI applications in musculoskeletal imaging and diagnosis often rely on limited datasets, lack external validation and oversimplify clinical complexity. Reported accuracies may not generalize across populations or settings. Caution is warranted before integrating these tools into practice, as overreliance risks misdiagnosis and clinical overconfidence.
Surgical planning and assistance
Pre-operative
Artificial intelligence may support decision-making in surgical patients and assist in surgical planning where appropriate.
Large language models (LLMs), including ChatGPT and Google Gemini, are designed to process and generate natural language. One study compared their performance on 68 hypothetical hand injury cases, evaluating classification and treatment recommendations. When rated by a surgeon certified by the American Society for Surgery of the Hand, Gemini outperformed ChatGPT, with 70.6% correct classifications (vs. 26.5%) and greater treatment accuracy (97.8% vs. 88.9%). Although the absence of human expert comparison was a limitation (Pressman et al., 2024), the study demonstrated a baseline level of clinical ability.
Several studies have explored the use of AI for preoperative planning in hand surgery, but none has yet resulted in a clinically validated application (Ryhänen et al., 2025). Current AI analyses of the distal radioulnar joint remain largely experimental, with limited integration into surgical planning software (Roner et al., 2020). Liu et al. (2019a) proposed a system to assist with Kirschner wire placement for scaphoid fracture fixation, although final trajectory decisions still rely on the surgeon’s judgment. The future probably lies in hybrid approaches, where AI assists with quantitative modelling, trajectory optimization and anatomical risk prediction, while the surgeon integrates this information with intraoperative findings and clinical reasoning. In such settings, the AI-assisted surgeon may ultimately achieve greater precision and consistency than either AI or human expertise alone.
Perioperative
Artificial intelligence could improve visualization, provide real-time video-based guidance and automate selected operative tasks.
In knee arthroscopy, an approach potentially applicable to the wrist, AI allows simultaneous correction of image noise, blurring and colour imbalance, surpassing conventional techniques (Ali et al., 2023). Artificial intelligence techniques have also been shown to efficiently remove endoscopic smoke (Wang et al., 2023), improving clarity, a potential benefit for robotic minimally invasive peripheral nerve surgery. In the operating room, analysis of personnel flow can identify potential error sources and contribute to improving safety and efficiency. However, the use of cameras in this context raises privacy concerns, limiting acceptance and widespread adoption. To address this, tracking operating room movements while also automatically obscuring faces has been proposed (Bastian et al., 2023).
Analysis of surgical videos represents another application in the operating room. Using cameras integrated into surgical lights or on head mounts, recordings can train AI models to identify anatomical structures, operative phases and technical gestures. Examples of this include: segmentation of carpal bones in arthroscopy (Orgiu et al., 2024), automatic phase detection in distal radius fracture fixation (Graëff et al., 2025), interpretation of surgical gestures (Goodman et al., 2024) and assessment of microsurgical metrics (operative time, motion smoothness and distance travelled), with strong correlation to both expert-rated scores and surgeon experience (Sugiyama et al., 2024).
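As a rough illustration of how metrics such as distance travelled and motion smoothness might be derived from tracked instrument positions, the sketch below computes a path length and a jerk-based smoothness score. The input format (a list of 2D tip positions sampled at a fixed interval) and the choice of mean squared jerk as the smoothness measure are assumptions made for this example; the cited studies may define these metrics differently.

```python
import math


def path_length(points):
    """Total distance travelled along consecutive (x, y) tip positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def mean_squared_jerk(points, dt):
    """Average squared third finite difference of position (lower = smoother)."""
    if len(points) < 4:
        raise ValueError("at least four samples are needed to estimate jerk")
    total = 0.0
    windows = zip(points, points[1:], points[2:], points[3:])
    for p0, p1, p2, p3 in windows:
        # Third-order finite difference per coordinate, scaled by dt^3.
        jerk = [(d - 3 * c + 3 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        total += sum(j * j for j in jerk)
    return total / (len(points) - 3)


# Hypothetical trajectories sampled every 0.1 s (arbitrary units).
steady = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]

print(path_length(steady), mean_squared_jerk(steady, dt=0.1))
print(path_length(zigzag), mean_squared_jerk(zigzag, dt=0.1))
```

The steady trajectory covers less distance and has zero jerk, while the zigzag trajectory scores worse on both measures. This is the kind of contrast such metrics are intended to capture.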
In microsurgery, the µSTAR autonomous robot performs microvascular anastomoses using a motorized suturing device, micro-camera and optical coherence tomography sensor. Tested on an ex vivo model, it completed 90% of sutures without human intervention, with precision matching that of surgeons, though with a longer average time per stitch (353 vs. 141 s) (Haworth et al., 2024).
Real-time feedback in postoperative care
Postoperatively, AI is taking an increasing role, from monitoring microvascular flaps and assessing joint mobility to optimizing rehabilitation protocols and coding surgical procedures.
Video-based techniques, such as remote photoplethysmography, allow monitoring of physiological parameters (perfusion, heart rate, oxygen saturation) in replantations and free flap surgeries, offering an alternative to traditional human observation (Chen et al., 2024b). Miniaturization of sensors now allows AI-powered portable flap telemonitoring. In a controlled ischemia simulation (tourniquet in healthy volunteers), remote photoplethysmography-driven AI detected vascular alterations with accuracy similar to pulse oximetry and manual evaluation (Lu et al., 2025).
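The principle behind remote photoplethysmography can be sketched in a few lines: small periodic changes in skin colour across video frames are reduced to a one-dimensional trace, and the dominant frequency in the physiological range gives the pulse rate. The example below is a simplified illustration on a synthetic trace and is not the method of the cited studies. The function name, the 0.7–4 Hz band and the signal itself are assumptions.

```python
import math


def dominant_pulse_bpm(trace, fps, low_hz=0.7, high_hz=4.0):
    """Estimate pulse rate as the strongest frequency within a plausible band."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [s - mean for s in trace]
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not (low_hz <= freq <= high_hz):
            continue
        # Naive discrete Fourier transform at bin k.
        re = sum(c * math.cos(2 * math.pi * k * i / n)
                 for i, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * i / n)
                 for i, c in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    if best_freq is None:
        raise ValueError("trace too short for the requested frequency band")
    return best_freq * 60


# Synthetic 10 s trace at 30 frames/s: a 1.2 Hz pulse plus slow drift.
fps = 30
trace = [100 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) + 0.002 * i
         for i in range(300)]

print(round(dominant_pulse_bpm(trace, fps)))
```

Real recordings are affected by motion, lighting and skin tone. A clinical system would need considerably more signal processing and validation than this sketch suggests.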
Artificial intelligence is also being used to track changes in hand and wrist movements via video analysis (Exer AI, 2024) and evaluate joint movements (Dutrey et al., 2025). One AI system, combining wearable sensors with real-time acoustic feedback, corrects gait after hip arthroplasty and shortens recovery time (Alcaraz et al., 2018). A similar application has also been developed for hand rehabilitation (Bauknecht et al., 2025).
Large language models can also be used for coding hand surgery procedures. One study reported 100% accuracy for Perplexity.AI and 93.3% for Bard and Bing, while ChatGPT-4o and ChatGPT-3.5 reached 53.3% and 46.7%, respectively. For complex procedures, only Perplexity.AI and Bard achieved 60% (Isch et al., 2025).
Robotic surgery
In surgical robotics, telemanipulators replicate the surgeon’s movements in real time, enhancing precision and filtering tremors while keeping the surgeon in full control of each movement. Autonomous or semi-autonomous surgical robots, particularly in orthopaedics, can perform specific tasks such as implant positioning or trajectory planning based on preoperative data (Lim et al., 2025).
Although AI has yet to be integrated, telemanipulator-assisted surgery is now being applied to peripheral nerve injuries (neurolysis, repair, graft) through incisions smaller than 1 cm (Jiang et al., 2025). The Da Vinci® system, offering magnified 3D vision and high precision, is already used for such operations. New platforms are emerging, including Symani® (MMI™) and MUSA-3® (Microsure™). Symani is tailored for microsurgery on delicate structures (0.2–0.3 mm), with instruments offering seven degrees of freedom, active tremor suppression and motion scaling (Innocenti et al., 2023). MUSA-3 allows standard surgical instruments to be used with exceptional accuracy via an intuitive console and a stabilized robotic arm (van Mulken et al., 2020). These systems improve precision, safety and surgeon ergonomics in complex operations.
Early semi-autonomous robots were initially developed without artificial intelligence, including for wrist arthroscopy (Liverneaux et al., 2016) and percutaneous scaphoid screw fixation under navigation (Liverneaux, 2005). More recently, new semi-autonomous systems integrating AI have emerged, particularly for navigation-guided bone fixation. These robots enable precise fracture reduction and optimization of fixation implants, with indications including scaphoid fractures and non-unions (Liu et al., 2019b), perilunate fracture-dislocations (Yi et al., 2023), hamate fractures (Jie et al., 2022) and partial carpal arthrodeses (Gao et al., 2024).
To date, the only truly autonomous robots in surgery remain at the experimental stage, mainly used for performing microvascular anastomoses, such as the µSTAR robot.
Electronic hand prostheses
Currently, hand prostheses are controlled through voluntary contractions of residual limb muscles, producing electromyographic signals that are processed and translated into motor commands to drive the device. However, variability in electromyographic signals can limit accuracy, sometimes causing errors in determining the intended grasp type (Castellini et al., 2014).
To address these errors, alternative strategies are under development. For example, radio frequency identification technology offers reliable detection of objects placed near a sensor, but only if they are fitted with an embedded electronic chip (Trachtenberg et al., 2011). Since this is impractical, alternative solutions use AI-driven image recognition to identify objects in real time and select the correct grasp pattern. One approach uses three head-mounted cameras, although this raises ergonomic issues (Markovic et al., 2015). Another integrates a camera into the prosthetic palm, achieving 93.2% accuracy in object recognition (DeGol et al., 2016). A third combines image recognition with a multimodal sensing system (distance sensor, accelerometers, gyroscopes), achieving 91.8% success in object manipulation, 88.6% in functional tasks (YCB Gripper Benchmark) and an average grasp time of 0.73 s (Weiner et al., 2022).
Education and training
Artificial intelligence holds potential for training hand surgeons, reviewing scientific literature and educating patients.
Large language models can help create clinical scenarios, formal lectures and multiple-choice questions (Siu et al., 2023), and have been used to compare the difficulty of hand surgery board exams (Hasan et al., 2025). ChatGPT can deliver step-by-step guidance for procedures such as microvascular arterial anastomosis and thumb pollicization, but its inaccuracies risk misleading trainees (Mohapatra et al., 2023).
Training in Da Vinci® robotic surgery uses simulators such as Mimic® to objectively assess technical skills through a standardized global score calculated from parameters like applied force, collisions and instrument visibility (Egi et al., 2013). Artificial intelligence can identify surgeons’ expertise from simulated videos with 83–100% accuracy (Juarez-Villalobos et al., 2021). Robotic microsurgery training programs have also been implemented (Selber and Alrasheed, 2014).
Surgical workflow recognition applies AI to segment operations into discrete steps, enabling performance assessment, skill standardization and prompt error feedback (Garrow et al., 2021). Training such AI models demands substantial annotated datasets, a limitation partly mitigated using different AI learning strategies (Demir et al., 2023). The uncertainty- and cluster-aware temporal diffusion method enhances surgical workflow recognition by incorporating clustering and self-supervised spatial features, shortening training time while improving accuracy (Graëff et al., 2025).
Although most publishers forbid LLM-based peer-reviewing, some studies have explored it: with generic prompts, performance was poor, whereas targeted instructions enabled ChatGPT to produce evaluations comparable with those of human experts (Marrella et al., 2025).
Many patients turn to LLMs to better understand their condition, but quality hinges on accuracy, completeness and readability. ChatGPT 3.5 scored 4.83/6 for accuracy and 2/3 for completeness (Jagiella-Lodise et al., 2025) when asked about CTS, yet occasionally provided advice lacking scientific support, such as recommending non-steroidal anti-inflammatory drugs not endorsed by guidelines (Amen et al., 2024).
The readability of ChatGPT was lower than that of a Google search (Croen et al., 2025) and judged inferior to that of Mayo Clinic or WebMD for several conditions (CTS, trigger finger, Dupuytren’s disease, ganglion cyst) by surgeons (Pohl et al., 2024), but considered equivalent for CTS by patients (Pohl et al., 2025).
Risks
Data privacy and security
Training AI in healthcare relies on large datasets (Akyüz et al., 2024), raising issues of privacy and personal data use, where breaches can cause ethical concerns and direct harm to patients (Cohen et al., 2014). Whether access to patient data for training is legitimate depends on the purpose: public health vs. commercial gain (Faden et al., 2013). Even if patient data is fully anonymized, the issue of whether patients can opt out of their data being used for training or other purposes remains.
Public mistrust of health data use is warranted, given previous unauthorized sharing (Royal Free London NHS, Cambridge Analytica) (Dawson et al., 2019). Risks include discrimination by employers or insurers (Calo, 2011) and personal harms, including anxiety from exposure of sensitive details (Price and Cohen, 2019). Artificial intelligence may also infer information never disclosed, generating intrusive or inaccurate profiles outside current laws (Crawford and Schultz, 2014).
Patient consent for using health data is essential. Dynamic consent strengthens security by requiring authorization for each use but limits large-scale processing (Kaye et al., 2015). Broad consent enables wide data sharing, sometimes without direct oversight, as seen in some biobanks (Grady et al., 2015).
Bias and inequity
The integration of AI may introduce biases that undermine the reliability of results and compromise equity in healthcare, since large language models are trained on vast internet-based datasets that inherently mirror cultural and societal biases, particularly those rooted in Western perspectives and values (Table 3). Consequently, data biases can perpetuate these prejudices and social inequalities, leading to the exclusion of vulnerable groups. These forms of discrimination are often unintentional and systemic, making them difficult to detect and to challenge in court (Barocas and Selbst, 2016). Such biases have been documented in the prediction of recidivism, health status, insurability and disease risk (Kostick-Quenet and Gerke, 2022).
Table 3. Main biases related to artificial intelligence in healthcare, from design to clinical use.
Transparent reporting of how a model was developed and trained is important for safe clinical use. In one analysis of 1.7 million responses from nine LLMs to 1000 emergency medical cases, clinical recommendations were influenced by sociodemographic profile: with identical patient data, LLMs were more likely to propose unwarranted interventions for LGBTQIA+, Black, or homeless patients and to recommend advanced imaging for those with higher incomes (Omar et al., 2025).
Transposition bias arises when AI systems that perform well in laboratory settings fail to achieve comparable results in clinical practice. This gap is illustrated by the low proportion of randomized trials in hand surgery (26%) within a body of research dominated by retrospective studies (Keller et al., 2023) and by the limited clinical adoption of AI despite the rapid growth of the field (Nair et al., 2024).
Some algorithms embed normative bias by prioritizing certain values (longevity over quality of life), potentially disregarding patient preferences and reinforcing algorithmic paternalism (Quinn et al., 2021). Supervised learning, relying on error-prone manual annotations, may introduce annotation bias that undermines AI reliability (Hashimoto et al., 2018), as demonstrated by a wrist surgery workflow recognition study, showing substantial inter- and intra-annotator variability (Graëff et al., 2023).
Many AI systems function as ‘black boxes’, where recommendations are given without a clear explanation of how they are reached. This lack of transparency may be due to the complexity of underlying algorithms or proprietary reasons (Hassan et al., 2024).
Over-reliance on technology
The growth of AI has prompted safeguards against algorithmic bias, yet cognitive biases remain underexplored. Lacking critical expertise, users may develop trust bias: underuse (omission errors) or overreliance (commission errors) (Hasanzadeh et al., 2025).
Omission is less concerning at present, as AI’s role remains modest. The traditional physician’s independent judgement continues to serve as the standard. Commission errors can be serious: studies on human–machine interaction show that users tend to accept the outputs of automated systems, even when unreliable (Gerke, 2021). Factors influencing automation bias include non-experts being more likely to trust algorithms than humans, and participants following ‘black box’ AI recommendations despite their lack of transparency. In contrast, expertise tends to mitigate this bias, and forming one’s own estimate beforehand reduces blind trust (Logg et al., 2019). Quantitatively, automation bias increases human error risk by 26% when AI is wrong (Goddard et al., 2012) and unjustly alters 7% of correct assessments under the influence of erroneous AI advice (Rosbach et al., 2025). Overreliance on AI can also lead to progressive clinical skill loss, or skill erosion (Samuel et al., 2024).
Regulatory and ethical concerns
Artificial intelligence poses challenges that extend beyond technical matters to legal uncertainties and ethical issues. Legally, regulatory frameworks lag behind technological progress, leaving liability for harm unclear (Cestonaro et al., 2023). Responsibility may be shared between the physician, the software supplier, the algorithm developer, and the data provider (Price et al., 2024). The lack of clear safety and liability frameworks slows the uptake of these technologies (Ahmed et al., 2023).
Ethically, several concerns emerge: (1) the potential replacement of healthcare professionals by automated systems (Chustecki, 2024); (2) the risk that some algorithms are deliberately biased to promote lucrative procedures or products contrary to clinical guidelines (Knudsen et al., 2024); and (3) the erosion of human qualities such as compassion in care relationships (Klugman and Gerke, 2022), although some hybrid systems aim to integrate human psychology with ‘artificial empathy’ (Morrow et al., 2023).
Prerequisites to restore balance between risks and benefits
To balance risks and benefits, AI requires rigorous oversight from design through to clinical use (pre- and post-development). An international consensus (FUTURE-AI), involving 117 experts from 50 countries, produced recommendations to mitigate cognitive, ethical, regulatory and AI-specific biases (Supplementary table 1) (Lekadir et al., 2025).
(1) Cognitive – many experts stress that medical education should include proficiency in AI, critical appraisal of its outputs, and competence in data science and decision-making (Grunhut et al., 2022). Training may need to focus more on information management than on rote memorization (Wartman and Combs, 2019). Although explainable AI is intended to mitigate cognitive biases such as overreliance, its explanations need to be clear and accessible, since overly complex explanations may be disregarded (Vasconcelos et al., 2023). Cognitive forcing strategies, such as forming an initial judgement or displaying AI uncertainty, are more effective at mitigating this bias than explainable AI alone (Bucinca et al., 2021).
(2) Ethical – one study advocates teaching AI ethics in medicine using real-world cases (Katznelson and Gerke, 2021).
(3) Regulatory – evidence-based guidelines for the publication of clinical trials on medical AI have been proposed, including a description of AI type, clinical role, integration into the care pathway, data quality, algorithm version, human–AI interaction, error analysis and conditions of access to the tool or its code (Liu et al., 2020).
(4) Data-related – diversifying AI training datasets is essential (Cross et al., 2024). Privacy can be protected through pseudonymization (replacing direct identifiers), differential privacy (adding random noise to data) and federated learning (training a shared AI from data held across multiple sites). Regular audits and strengthened security standards help prevent misuse (Rieke et al., 2020). Integrating the necessary infrastructure into clinical systems is expensive and requires expertise. Reliance on multiple applications on personal devices that do not integrate with hospital systems limits usability.
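The three privacy-preserving strategies named in item (4) can each be illustrated with a very small sketch. The code below is a teaching example under simplifying assumptions, not a production design. The function names, the keyed-hash approach to pseudonymization, the Laplace mechanism for a simple count and plain size-weighted averaging for federated learning are illustrative choices, and the key and numbers are invented.

```python
import hashlib
import hmac
import random


def pseudonymize(patient_id, secret_key):
    """Pseudonymization: replace a direct identifier with a keyed hash."""
    return hmac.new(secret_key, patient_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def laplace_noisy_count(true_count, epsilon, rng):
    """Differential privacy: add Laplace noise of scale 1/epsilon to a count."""
    scale = 1.0 / epsilon
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


def federated_average(site_weights, site_sizes):
    """Federated learning: size-weighted mean of per-site model weights."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(n_params)]


# Hypothetical usage with made-up values.
key = b"example-key-kept-by-the-data-controller"
print(pseudonymize("patient-0001", key)[:12])

rng = random.Random(0)
print(laplace_noisy_count(true_count=120, epsilon=1.0, rng=rng))

# Two sites share only model weights, never patient records.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], site_sizes=[1, 3]))
```

In each case the raw patient record stays protected: the identifier is replaced, the published statistic is perturbed, or only model parameters leave the hospital. Choosing the privacy budget (epsilon), managing keys and securing the exchange of weights are separate problems that this sketch does not address.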
Monitoring the balance of AI
After development, AI performance can decline if real-world conditions differ from those used in training (US Food and Drug Administration, 2023). Two scenarios are recognized: data drift, where data change gradually (e.g. more images of smokers for AI trained on adult chest radiographs) and out-of-distribution data, where they differ markedly (e.g. radiographs of knees or of children). These situations require post-development surveillance, and the FUTURE-AI consensus advocates continuous monitoring (Lekadir et al., 2025) (Table 4).
Table 4. Recommendations from the international FUTURE-AI consensus for monitoring the performance of artificial intelligence in healthcare after market release.
For example, data drift can be monitored by continuous tracking of input data and AI performance with specialized tools that trigger alerts, initiate retraining when drift occurs and monitor performance (Sahiner et al., 2023). In one study, human CT scan reports were reviewed by an LLM to verify performance of a pulmonary embolism detection AI model. Disagreements were tracked and alert thresholds set to prompt a human review (Sorin et al., 2025). Practically, implementing post-deployment monitoring may be difficult and resource-intensive. It is also unclear who would be responsible for the task.
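The idea of tracking disagreements and prompting human review once a threshold is crossed can be sketched as a rolling-window monitor. This is a minimal illustration, not the system used in the cited study. The class name, the window size, the 10% alert rate and the minimum case count are hypothetical parameters.

```python
from collections import deque


class DisagreementMonitor:
    """Track AI-versus-reference disagreements over a rolling window."""

    def __init__(self, window=100, alert_rate=0.10, min_cases=20):
        self.recent = deque(maxlen=window)  # True where AI and reference differ
        self.alert_rate = alert_rate
        self.min_cases = min_cases

    def record(self, ai_positive, reference_positive):
        """Store one case and report whether human review is now warranted."""
        self.recent.append(ai_positive != reference_positive)
        return self.needs_review()

    def rate(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def needs_review(self):
        # Avoid alerts on very small samples.
        return (len(self.recent) >= self.min_cases
                and self.rate() > self.alert_rate)


# Hypothetical stream: 30 concordant cases, then 10 discordant ones.
monitor = DisagreementMonitor(window=50, alert_rate=0.10, min_cases=20)
for _ in range(30):
    monitor.record(ai_positive=False, reference_positive=False)
alerts = [monitor.record(ai_positive=True, reference_positive=False)
          for _ in range(10)]

print(monitor.rate(), alerts)
```

A rolling window means the alert reflects recent behaviour rather than lifetime performance, which suits gradual drift. The threshold and window length would need to be set locally and reviewed by the responsible clinical team.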
To monitor out-of-distribution data, one radiology study suggested encoding each image as numerical vectors compared with a reference, flagging deviations. In testing across various scenarios, the system detected over 95% of anomalies and identified drift in under three days (Zamzmi et al., 2025).
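A simplified version of comparing image vectors with a reference is sketched below. Each image is assumed to have already been encoded as a numerical vector, and a new vector is flagged when its distance from the reference centroid is unusually large. The z-score rule, the threshold of 3 and the two-dimensional toy vectors are assumptions for illustration and do not reproduce the cited system.

```python
import math
import statistics


def fit_reference(embeddings):
    """Summarise reference vectors by centroid and spread of distances to it."""
    dims = len(embeddings[0])
    centroid = [statistics.fmean(e[i] for e in embeddings) for i in range(dims)]
    distances = [math.dist(e, centroid) for e in embeddings]
    return centroid, statistics.fmean(distances), statistics.pstdev(distances)


def is_out_of_distribution(embedding, reference, z_threshold=3.0):
    """Flag a vector whose distance to the centroid is unusually large."""
    centroid, mean_distance, std_distance = reference
    spread = max(std_distance, 1e-9)  # guard against a zero spread
    z = (math.dist(embedding, centroid) - mean_distance) / spread
    return z > z_threshold


# Hypothetical 2D "embeddings" of images similar to the training data.
reference_vectors = [(0, 0), (2, 0), (0, 2), (2, 2),
                     (1, 1), (1, 0.5), (0.5, 1), (1.5, 1)]
reference = fit_reference(reference_vectors)

print(is_out_of_distribution((1, 1), reference))    # close to the reference set
print(is_out_of_distribution((10, 10), reference))  # far from the reference set
```

Real image encoders produce vectors with hundreds of dimensions, where simple Euclidean distance can behave poorly. The sketch is meant only to convey the flag-on-deviation logic.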
Algorithm to restore balance
Clinical monitoring of AI should identify early performance declines via threshold alerts and generate regular reports on errors and performance. These should be reviewed by a multidisciplinary committee to steer evidence-based continuous improvement (van Leersum and Maathuis, 2025).
Equally important is defining the respective roles of AI and the clinician. Some tasks may be delegated to AI while preserving human control over the final decision (Tanaka et al., 2023). Artificial intelligence should never supplant human judgement, particularly given persistent biases affecting vulnerable or underrepresented groups (Mennella et al., 2024).
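One way to make this role delineation concrete in software is to require the clinician's own judgement before any AI output is shown, to display the model's uncertainty alongside its suggestion, and to record the final decision as the clinician's. The sketch below illustrates such a workflow. The class, method names and labels are hypothetical and do not describe an existing product or a method from the cited studies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssistedReading:
    """One case read by a clinician with AI support (hypothetical workflow)."""
    clinician_initial: Optional[str] = None
    ai_label: Optional[str] = None
    ai_probability: Optional[float] = None
    final_decision: Optional[str] = None

    def record_initial(self, label: str) -> None:
        """Cognitive forcing step: the clinician commits to a view first."""
        self.clinician_initial = label

    def reveal_ai(self, label: str, probability: float) -> str:
        """Show the AI suggestion with its uncertainty, only after the above."""
        if self.clinician_initial is None:
            raise RuntimeError("record the clinician's initial judgement first")
        self.ai_label, self.ai_probability = label, probability
        return f"AI suggests '{label}' (model probability {probability:.0%})"

    def needs_second_look(self) -> bool:
        """True when the AI and the clinician's first impression differ."""
        return (self.ai_label is not None
                and self.ai_label != self.clinician_initial)

    def finalize(self, label: str) -> None:
        """The final decision is always entered by the clinician."""
        self.final_decision = label


reading = AssistedReading()
reading.record_initial("no fracture")
message = reading.reveal_ai("fracture", 0.62)
print(message, reading.needs_second_look())
reading.finalize("fracture")
```

The design choice here is that the AI can prompt a second look but cannot set `final_decision` itself, so human control over the outcome is preserved.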
In conclusion, the integration of AI is inevitable in all areas of hand surgery. However, realizing benefits demands vigilance: reliable data, clear regulations, continuous oversight and education. As a support tool, it should remain under human control, in the service of patients.
Supplemental Material
Supplemental material, sj-docx-1-jhs-10.1177_17531934251401382 for The balance between artificial and human intelligence in clinical practice by Domenico Marrella, Turkka Anttila, Jorma Ryhänen, Robert Miller, Bo Liu and Philippe Liverneaux in Journal of Hand Surgery (European Volume)
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Supplemental material for this article is available online.
References
