Abstract
Background:
Rheumatoid arthritis (RA) is a chronic inflammatory disease that often requires multi-drug regimens, which can increase the risk of drug–drug interactions (DDIs).
Objective:
To determine whether polypharmacy contributes to potential DDIs in patients with RA.
Design:
A retrospective observational study that integrates data from drug interaction databases and medical records, supplemented by artificial intelligence analysis.
Methods:
Twenty-one RA drugs and 54 commonly prescribed medicines were screened for potential DDIs using the Micromedex and WebMD databases and compared with real-world co-prescription at our hospital. The ChatGPT-5 and Gemini 2.5 large language models (LLMs) were also evaluated as “AI judges” versus an expert rheumatologist to assess real-world co-prescription. This study was conducted from January to September 2025.
Results:
Among 75 medicines frequently prescribed to RA patients, 753 database-predicted potential DDIs were identified, with Micromedex reporting 464 DDI pairs and WebMD reporting 600 pairs. Both databases classified 47 serious potential DDIs as contraindicated or major, with high concordance. The main predicted DDI mechanisms were immunosuppression (27%), methotrexate toxicity (18%), and nephrotoxicity (12%), with tacrolimus (12%), methotrexate (10%), and infliximab (6%) as the major culprit drugs. In real-world co-prescriptions among 259 RA patients, methotrexate/naproxen (46.4%) and methotrexate/sulfasalazine (29.5%) were the most common co-prescribing patterns with serious potential DDIs. Brennan–Prediger kappa values indicated fair agreement (0.3161) between the electronic database predictions and conditional agreement (0.9807) among preselected serious DDIs. The LLM evaluation revealed that advanced models (e.g., Gemini 2.5 Pro) provided nuanced recommendations highly concordant with expert opinion, whereas simpler models were more conservatively aligned with database alerts.
Conclusion:
In both real-world co-prescription and database evaluation, serious potential DDIs were associated mainly with methotrexate toxicity. Based on these exploratory findings, advanced LLMs show potential to assist with contextualizing rigid database alerts and supporting pragmatic clinical decision-making. However, expert clinical judgment and direct patient consultation remain essential to improve RA patient safety and medication adherence.
Plain language summary
Rheumatoid arthritis (RA) is a complex disease that often requires multiple medications, which might cause drug–drug interactions (DDIs). There is inadequate information comparing potential DDI predictions from electronic databases with real-world co-prescription, especially with AI support. Our study revealed a high number of potential DDIs among medicines commonly prescribed for RA patients. Furthermore, significant discordance in DDI predictions was detected between the electronic databases and real-world co-prescription, as assessed with the support of large language models (LLMs). The real-world co-prescription indicated that methotrexate combined with NSAIDs or sulfasalazine poses a high risk for serious DDIs. Advanced LLMs provided nuanced advice consistent with expert rheumatologist opinion, namely continuing treatment with careful monitoring for these serious DDIs. Management of harmful DDIs requires integrating modern decision support tools, deep clinical expertise, and direct patient communication.
Introduction
Rheumatoid arthritis (RA), a chronic inflammatory disease of the joints, is characterized by persistent synovial inflammation and joint destruction. Its pathophysiology involves the activation of autoimmune processes that lead to the production of pro-inflammatory cytokines, mainly tumor necrosis factor-alpha (TNF-α), interleukin-1, and interleukin-6. These cytokines promote synovial hyperplasia, angiogenesis, and perpetual activation of inflammatory cells, ultimately leading to pannus formation and cartilage and bone erosion. Autoantibodies, such as rheumatoid factor and anticitrullinated protein antibodies, play important roles in the inflammatory process and tissue damage, and this immune-mediated attack eventually causes progressive joint deformity, pain, and loss of function. 1 Among the known inflammatory joint diseases, RA is associated with the poorest health-related quality of life, including physical functioning, bodily pain, role limitations, and mental health, as well as substantial socioeconomic deprivation worldwide.2,3
RA, being a complex disease, usually requires the use of multiple medications to manage symptoms and prevent disease progression. Conventional synthetic disease-modifying antirheumatic drugs (DMARDs), such as methotrexate (MTX), inhibit the proliferation of immune cells and reduce inflammation, while biologic agents, including TNF inhibitors (e.g., adalimumab and etanercept), inhibit TNF-α—one of the major cytokines involved in RA pathophysiology. Interleukin-6 receptor inhibitors (e.g., tocilizumab) and T-cell co-stimulation blockers (e.g., abatacept) modulate immune responses, while B-cell depletion therapies (e.g., rituximab) target specific immune mechanisms to alleviate autoantibody production. Novel targeted synthetic DMARDs (e.g., tofacitinib) interfere with intracellular Janus kinase (JAK) pathways that lead to inflammation. 1
In general, a substantial number of RA patients need as many as five different drugs to control disease activity,4,5 including corticosteroids, NSAIDs, and immunosuppressive therapies. This polypharmacy in RA patients is further complicated by the need to treat comorbid conditions (e.g., hypertension, diabetes, dyslipidemia) and introduces risks of additional drug-related problems. Thus, the number of drugs prescribed to RA patients can be as high as 25 in one patient with several health conditions5,6 and can contribute to increased rates of recurrent infections, unplanned hospitalizations, adverse reactions, and undesirable drug–drug interactions (DDIs). Indeed, recent studies have revealed associations between the number of medications and serious adverse events or poor treatment outcomes in RA, yet only a limited number of studies have addressed these problems.5–8 Furthermore, DDIs involving RA medications have been widely reported in the international literature. MTX toxicity is commonly reported when MTX is used concomitantly with NSAIDs or proton-pump inhibitors (PPIs).9–11 Corticosteroids and NSAIDs, which are used as anti-inflammatory drugs in some patients, can increase the risk of gastric ulcer and bleeding.11,12 Additive hepatotoxicity has also been reported in cases of DDIs between TNF inhibitors and MTX or leflunomide.11,13,14 Interleukin-6 receptor inhibitors are considered cytochrome P450 (CYP450) inducers and can reduce the bioavailability of CYP substrates. 15 The combined use of immunosuppressants in RA treatment can increase the risk of serious infection.16,17
DDIs are an unwanted outcome, particularly in older patients receiving polypharmacy. 18 Potential DDIs can be assessed using online databases, with healthcare providers typically using subscribed databases and patients often utilizing free-access ones. We selected Micromedex as our primary subscription-based database, as it is widely utilized by healthcare professionals. In addition, WebMD was chosen as a free online resource to compare DDIs with Micromedex. This choice was based on the similar severity classification systems used by both platforms, which allowed for a comparable statistical analysis. However, a pilot study we conducted revealed the retrieval of occasionally discordant information from different types of databases in terms of the DDI number and severity of interactions. This can potentially cause misunderstandings between doctors and patients, but this issue remains largely unaddressed, especially concerning RA patients. The primary aim of this study was therefore to explore the concordance between DDI analyses using electronic database predictions and real-world co-prescription in RA patients at the Chakri Naruebodindra Medical Institute (CNMI), Samut Prakan, Thailand.
While established electronic databases are critical for identifying DDIs, their rigid, alert-based systems often fail to consider the specific clinical context required in complex fields like rheumatology. To address this gap, this study examines the use of artificial intelligence (AI), specifically large language models (LLMs), such as Google Gemini and ChatGPT, in clinical decision support.19,20 We pioneer a novel, three-way comparative analysis, directly benchmarking LLM performance against both traditional databases and the gold standard of expert clinical opinion. A secondary aim was to compare the performance of leading state-of-the-art LLMs as “judges,” evaluating and suggesting clinically significant DDIs in RA against expert rheumatologist practices. Information obtained from this comparative analysis might aid in improving treatment safety and outcomes for RA patients.
Methods
Drug selections
The drugs selected for study were medications commonly prescribed to RA patients (i.e., RA medicines and other common drugs), especially for noncommunicable diseases (NCDs), such as diabetes mellitus, hypertension, hyperlipidemia, atherosclerotic vascular diseases, and osteoarthritis, as determined using Thailand’s National List of Essential Medicines 2022 and prescriptions at CNMI in 2024. A total of 75 drugs were gathered for analysis on January 10, 2025 (Table 1). The systematic search of potential DDIs in the electronic databases was conducted from January 11 to February 27, 2025. The evaluation of LLM performance and expert benchmarking was completed between September 1 and September 30, 2025.
Table 1. Commonly prescribed medicines in patients with rheumatoid arthritis.
DMARDs, disease-modifying antirheumatic drugs; JAK, Janus kinase; SYSADOA, symptomatic slow-acting drug of osteoarthritis.
Database comparisons
Potential DDIs were identified using the Micromedex and WebMD databases. Micromedex, a part of the IBM Watson Health company, is a subscription-based database used by healthcare professionals. Access to the database was provided by a Mahidol University license (2025). WebMD is an online open-access database commonly used by patients. Our comparison of the two databases was made on February 27, 2025, to avoid search bias. Micromedex identified 464 potential DDIs, while WebMD showed 600, along with their interaction mechanisms.
Classification of severity and documentation
Both databases reported potential DDIs, mechanisms of interactions, and severity classifications. Micromedex classified severity into contraindicated (concomitant use of two interacting drugs should be avoided), major (interaction may be life-threatening and/or may require additional medical intervention to minimize/prevent adverse effects), moderate (interaction may exacerbate the patient’s condition and/or require therapy changes), and minor (interaction has only limited clinical effects and requires no major therapy changes). 21 WebMD classified DDIs into four categories: do not use together (high risk of dangerous interaction), serious (interaction may require doctor monitoring or a change in therapy), monitor closely (interaction is possible, so doctor monitoring is needed), and minor (interaction is unlikely). 22 The documentation levels reported by Micromedex were excellent (interactions established in well-controlled studies), good (interactions documented but not by well-controlled studies), fair (interactions suspected based on pharmacological considerations), and unknown. 21
Study design and patient selection for the determination of potential DDIs
This retrospective observational study was conducted at CNMI with approval from the Human Research Ethics Committee, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Thailand (Approval number MURA2024/210, approval date March 17, 2024, renewal date March 18, 2026). The study included patients diagnosed with RA and at least one NCD (e.g., diabetes mellitus, hypertension, hyperlipidemia, atherosclerotic vascular diseases, and/or osteoarthritis) from January 1, 2024, to December 31, 2024. Patients were identified using the ICD-10 codes for these conditions. Patient data were collected from March 1, 2025, to May 10, 2025, and comprised demographic data, all drugs used by each patient, and possible adverse events related to DDIs. The reporting of this study conforms to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (information available in the Supplemental File). 23
Inclusion and exclusion criteria
Patients with RA, diagnosed or confirmed by experienced rheumatologists and followed by a rheumatologist in the Internal Medicine Department at CNMI, were recruited for the study. The included patients also had at least one NCD (diabetes mellitus, hypertension, hyperlipidemia, atherosclerotic vascular diseases, and/or osteoarthritis). Patients diagnosed with any cancer were excluded because cancer therapy could interfere with the study results. Medications regularly used by the patients during the study period were collected through the hospital’s electronic medical records.
Evaluation of LLMs as a clinical decision support tool
We evaluated the performance of LLMs in DDI analysis using a systematic approach involving the selection of relevant DDIs, the design of standardized prompts, and a comparison between LLM outputs and those of a clinical expert.
AI tools, versions, and access
The study, conducted in September 2025, used the following state-of-the-art LLMs: OpenAI’s ChatGPT (GPT-5 series, using both “Instant” and “Thinking” versions) and Google’s Gemini (2.5 series, using both “2.5 Flash” and “2.5 Pro” versions). A practical clinical workflow was simulated by conducting all interactions with the LLMs through their standard, publicly available web-based chat interfaces rather than via an application programming interface. This approach was chosen specifically because it mirrors how people in a real-world setting would typically use these tools to ask whether there is an interaction between drugs. The specific portals used were chatgpt.com for ChatGPT and gemini.google.com/app for Gemini. These models were tasked with systematically comparing, synthesizing, and reconciling the DDI outputs from the Micromedex and WebMD databases using a structured prompt. In this framework, the LLMs were evaluated specifically as aggregators and arbiters of existing DDI resources rather than as autonomous clinical reasoning systems interpreting primary patient data. The “2.5 Flash” and “Instant” versions were chosen to assess rapid-response capabilities, while the “2.5 Pro” and “Thinking” versions were selected to evaluate in-depth analytical and reasoning performance. The conceptual workflow is illustrated in Figure 1.

Figure 1. A conceptual workflow diagram. The workflow illustrates the integration of rigid database alerts with AI arbitration, ultimately validated against real-world specialist expertise to assess clinical relevance.
Selection of clinically relevant DDIs
Based on findings from our real-world co-prescription analysis, we selected common serious DDI pairs for our LLM evaluation. This list, which includes critical interactions, represents common and clinically significant scenarios encountered in rheumatology practice in Thailand. This ensured that the evaluation remained focused on interactions with direct relevance to patient safety.
Prompt engineering and LLM queries
A standardized prompt, designed to query LLMs for each DDI pair, was structured to be specific and to provide necessary clinical context by requesting an assessment of the interaction’s severity, the underlying mechanism, and recommendations for clinical management. The exact prompt is provided in the Supplemental File.
Comparison of LLM outputs and expert opinions
The outputs generated by each of the four LLM instances were collected and anonymized. A board-certified rheumatologist from Ramathibodi Hospital, Mahidol University, with over 10 years of clinical experience, was recruited to serve as the expert gold standard. To ensure objectivity, the expert was blinded to both the database classifications and the LLM-generated outputs during the assessment process. This expert was provided with the same common DDI pairs and asked to provide assessment and management recommendations independently of the LLM outputs. The LLM-generated information was then compared against expert opinion to assess its accuracy, completeness, and clinical relevance. Discrepancies were noted and categorized to identify potential weaknesses in the LLM-based analysis.
Statistical analysis
Since the two DDI databases used in our study have four levels of severity with different names, we converted the severity of potential DDIs into a standardized five-point ordinal scale. This process was essential for harmonizing the data and enabling direct comparison. We scored “contraindicated” or “do not use together” as 1, “major” or “serious” as 2, “moderate” or “monitor closely” as 3, “minor” or “minor interaction” as 4, and “none” as 5. This numerical conversion provided a distinct benefit by allowing an LLM to efficiently determine a consensus when the ordinal ratings from the two databases were discordant. The level of agreement between the two databases was evaluated using the Brennan–Prediger kappa, with values ranging from −1 (perfect disagreement) to 1 (perfect agreement) and 0 representing agreement expected by chance. We followed the guidelines of Landis and Koch to interpret the kappa values. 24 For the Brennan–Prediger kappa, a value of 0.81–1.00 was considered almost perfect agreement, 0.61–0.80 substantial agreement, 0.41–0.60 moderate agreement, 0.21–0.40 fair agreement, and 0.01–0.20 slight agreement.
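The harmonization and agreement steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names (`harmonize`, `brennan_prediger_kappa`) and the toy ratings are hypothetical, and the unweighted form of the Brennan–Prediger coefficient is assumed because the text does not state whether ordinal weights were applied.

```python
# Label-to-score mapping following the five-point scale described in the text.
SEVERITY_SCORE = {
    "contraindicated": 1, "do not use together": 1,
    "major": 2, "serious": 2,
    "moderate": 3, "monitor closely": 3,
    "minor": 4, "minor interaction": 4,
    "none": 5,
}


def harmonize(label):
    """Convert a database-specific severity label to the shared 1-5 score."""
    return SEVERITY_SCORE[label.strip().lower()]


def brennan_prediger_kappa(ratings_a, ratings_b, n_categories=5):
    """Unweighted Brennan-Prediger kappa (assumption: no ordinal weights).

    Chance agreement is fixed at 1/q for q categories, so
    kappa = (p_o - 1/q) / (1 - 1/q), where p_o is the observed agreement.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)
    chance = 1.0 / n_categories
    return (observed - chance) / (1.0 - chance)


# Toy example with made-up ratings for five drug pairs (not study data).
micromedex = ["major", "major", "contraindicated", "moderate", "none"]
webmd = ["serious", "monitor closely", "do not use together", "monitor closely", "minor"]

scores_a = [harmonize(x) for x in micromedex]
scores_b = [harmonize(x) for x in webmd]
kappa = brennan_prediger_kappa(scores_a, scores_b)
print(f"Brennan-Prediger kappa = {kappa:.4f}")
```

In this toy example the two sources agree on three of five pairs, so the observed agreement is 0.6 and the coefficient is (0.6 − 0.2)/(1 − 0.2) = 0.5. A weighted variant would give partial credit to near misses (e.g., scores 2 versus 3) and would generally yield a different value.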
Results
Potential DDIs of frequently prescribed medications in RA patients determined by Micromedex and WebMD
The 75 medications frequently prescribed in patients with RA had 753 database-predicted potential DDIs. Micromedex reported 464 DDI pairs, while WebMD reported 600 (Figure 2). Of the 464 Micromedex DDI pairs, 4 (0.9%), 345 (74.4%), 109 (23.4%), and 6 (1.3%) were classified as contraindicated, major, moderate, and minor, respectively, whereas 3 (0.5%), 114 (19.0%), 355 (59.2%), and 128 (21.3%) of the 600 pairs reported by WebMD were classified as contraindicated, serious, monitor closely, and minor, respectively. Overall, 311 DDIs were reported in both databases, 289 DDIs were reported by WebMD only, and 153 DDIs were reported by Micromedex only. The analysis yielded a Brennan–Prediger kappa value of 0.3161 (standard error (SE) = 0.0310, p < 0.001), suggesting fair agreement regarding the potential DDIs between the two electronic databases.
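The overlap counts reported above can be cross-checked with simple arithmetic. The short sketch below only re-derives the totals from the three counts given in the text; the variable names are illustrative.

```python
# Counts reported in the text.
both = 311             # DDI pairs reported by both databases
webmd_only = 289       # DDI pairs reported by WebMD only
micromedex_only = 153  # DDI pairs reported by Micromedex only

# Totals implied by the overlap counts.
micromedex_total = both + micromedex_only
webmd_total = both + webmd_only
union_total = both + webmd_only + micromedex_only

print(f"Micromedex total: {micromedex_total}")
print(f"WebMD total: {webmd_total}")
print(f"Distinct database-predicted DDIs: {union_total}")
print(f"Share flagged by both databases: {both / union_total:.1%}")
```

The derived totals match the 464 Micromedex pairs, 600 WebMD pairs, and 753 distinct potential DDIs stated in the text.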

Figure 2. Severity classification of database-predicted potential DDIs among frequently prescribed medications in RA patients with NCDs, as determined by Micromedex and WebMD.
Among the 753 database-predicted potential DDIs, 47 were serious DDIs classified as contraindicated or major in both databases. The main mechanisms of these DDIs were immunosuppression (27%), methotrexate toxicity (18%), and nephrotoxicity (12%; Figure 3). The three major culprit drugs were tacrolimus (12%), methotrexate (10%), and infliximab (6%; Figure 4). A list of these serious DDIs with more details is provided in Supplemental Table S1. However, a large number of database-predicted potential DDIs were classified into different severity classifications that showed poor concordance between databases, with some interactions classified as contraindicated or major by one database but as minor or no interaction by the other. For example, coadministration of danazol and ezetimibe was classified as contraindicated by Micromedex, but WebMD indicated no interaction (Supplemental Table S2).

Figure 3. The mechanisms of serious database-predicted potential DDIs, classified as contraindicated or major by Micromedex and WebMD.

Figure 4. The culprit drugs involved in serious database-predicted potential DDIs classified as contraindicated or major by Micromedex and WebMD.
Real-world co-prescribing patterns of potential DDIs in RA patients
This study included 259 Thai patients diagnosed with RA and NCDs. The majority were female (85.38%), with a mean age of 62.7 ± 12.1 years (range 23–89 years). The three most prevalent underlying NCDs among these patients were hypertension (38.6%), dyslipidemia (36.3%), and osteoarthritis (22.4%; Table 2). The median number of medications prescribed to these patients was eight items and included supplements (97.7%), immunosuppressive drugs (92.7%), analgesic drugs (75.3%), and corticosteroids (61.4%). The most frequently prescribed drugs were folic acid (242/259), methotrexate (232/259), prednisolone (159/259), hydroxychloroquine (136/259), and vitamin D (127/259).
Table 2. Demographic data and clinical characteristics of RA patients with NCDs.
ASCVD, atherosclerotic cardiovascular disease; BMI, body mass index; DMARDs, disease-modifying antirheumatic drugs; JAK, Janus kinase; SYSADOA, symptomatic slow-acting drug of osteoarthritis.
In these patients, 9468 drug pairs were identified, of which 207 pairs showed serious real-world co-prescribing patterns with potential DDIs according to Micromedex and WebMD. The most common of these serious potential DDIs were methotrexate/naproxen (46.4%) and methotrexate/sulfasalazine (29.5%; Table 3). Analysis of these 207 pairs yielded a Brennan–Prediger kappa value of 0.9807 (SE = 0.0136, p < 0.001), suggesting conditional agreement among preselected serious DDIs. Further analysis of the pairs with serious mechanisms, such as methotrexate toxicity, identified 148 patients who received drug pairs with potential methotrexate toxicity DDIs, of whom 19 (12.8%) had reported methotrexate toxicity symptoms (e.g., myelosuppression, hepatotoxicity, serious infection, and mucositis).
Table 3. Potential serious DDIs and related medications in RA patients with NCDs as determined by Micromedex and WebMD.
CNS, central nervous system; DDIs, drug–drug interactions; MTX, methotrexate; NCDs, noncommunicable diseases; RA, rheumatoid arthritis.
The 148 patients who received MTX with potential DDI toxicity were divided into two subgroups for analysis. The 19 patients with suspected MTX toxicity had a mean age of 69.6 ± 8.0 years, an average creatinine of 0.76 ± 0.16 mg/dL, an average estimated glomerular filtration rate (eGFR) of 81.91 ± 16.87 mL/min/1.73 m², and an average alanine transaminase (ALT) of 28.6 ± 34.5 U/L. The remaining 129 patients, who did not show signs of MTX toxicity, had a mean age of 60.8 ± 10.6 years, an average creatinine of 0.76 ± 0.20 mg/dL, an average eGFR of 88.68 ± 15.58 mL/min/1.73 m², and an average ALT of 24.1 ± 19.0 U/L (Table 4). The difference in renal clearance between the two subgroups was assessed with a t-test on mean eGFR. This yielded a t-value of 1.65 (df = 23) and a p-value of 0.11, suggesting that the difference in renal clearance between the two groups was not statistically significant.
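The eGFR comparison can be reproduced from the reported summary statistics alone. The sketch below assumes Welch's unequal-variance t-test; the text does not name the variant, but the reported df of 23 with group sizes of 19 and 129 is consistent with the Welch–Satterthwaite approximation rather than the pooled-variance test (which would give df = 146). The function name is illustrative.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of the mean, group 1
    v2 = sd2 ** 2 / n2  # squared standard error of the mean, group 2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# eGFR (mL/min/1.73 m^2) summary statistics as reported in the text.
t_stat, df = welch_t_from_summary(
    mean1=88.68, sd1=15.58, n1=129,  # patients without MTX toxicity signs
    mean2=81.91, sd2=16.87, n2=19,   # patients with suspected MTX toxicity
)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Under this assumption the computation gives t ≈ 1.65 with df ≈ 22.8, in line with the reported values. A two-sided p-value would additionally require the t-distribution survival function, which is available in statistical libraries such as SciPy.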
Table 4. Demographic data and clinical characteristics of RA patients who received MTX with potential DDI toxicity.
ALT, alanine transaminase; DDIs, drug–drug interactions; eGFR, estimated glomerular filtration rate; MTX, methotrexate; RA, rheumatoid arthritis.
Performance of LLMs in DDI evaluation
An analysis of the three most common and serious DDIs in real-world practice was conducted as an exploratory proof of concept. The results revealed a notable variation in recommendations between LLMs and the established “gold standard” expert rheumatologist opinion (Table 5). Quantitatively, the Gemini 2.5 Pro and both ChatGPT models achieved 100% (3/3) binary agreement with the expert regarding the core decision to “continue with monitoring.” By contrast, the Gemini 2.5 Flash model showed 0% (0/3) agreement by recommending the total avoidance of all three combinations. The expert practice for combinations like methotrexate + naproxen, methotrexate + sulfasalazine, and methotrexate + celecoxib was to continue the medications and ensure regular monitoring, reflecting a common pragmatic clinical approach.
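The binary agreement figures above amount to counting, per model, how many of the three drug pairs received the same core decision as the expert. The sketch below encodes the recommendations as described in this paragraph in a simplified two-level form ("continue" for continue with monitoring, "avoid" for avoid the combination); the dictionary layout and function name are illustrative and do not reproduce the full model outputs.

```python
PAIRS = [
    "methotrexate + naproxen",
    "methotrexate + sulfasalazine",
    "methotrexate + celecoxib",
]

# Expert decision for every pair: continue the medications with monitoring.
expert = {pair: "continue" for pair in PAIRS}

# Simplified encoding of each model's core decision, as summarized in the text.
models = {
    "Gemini 2.5 Pro": {pair: "continue" for pair in PAIRS},
    "ChatGPT (GPT-5) Instant": {pair: "continue" for pair in PAIRS},
    "ChatGPT (GPT-5) Thinking": {pair: "continue" for pair in PAIRS},
    "Gemini 2.5 Flash": {pair: "avoid" for pair in PAIRS},
}


def binary_agreement(model_recs, expert_recs):
    """Return (number of matching decisions, number of pairs compared)."""
    matches = sum(model_recs[pair] == expert_recs[pair] for pair in expert_recs)
    return matches, len(expert_recs)


for name, recs in models.items():
    matches, total = binary_agreement(recs, expert)
    print(f"{name}: {matches}/{total} ({matches / total:.0%})")
```

With only three pairs and a single expert, these proportions are descriptive rather than inferential, which fits the exploratory framing of the analysis.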
Table 5. Drug interaction recommendations: databases versus real-world clinical practice.
CBC, complete blood count; LFTs, liver function tests; MTX, methotrexate; NSAID, non-steroidal anti-inflammatory drug.
The LLM performances varied significantly depending on the model’s sophistication (Tables 6 and 7). Both the “Instant” and “Thinking” ChatGPT models classified the interactions as Level 2 (“Use with caution”), although the “Thinking” model provided a more cautious opinion, such as preferring to “avoid chronic co-use” of methotrexate and naproxen, aligning with a risk-mitigation strategy. Both models suggested appropriate safer alternatives, including acetaminophen, and escalating to biological DMARDs. The Gemini models showed a distinct difference, as the Gemini 2.5 Flash model adopted a highly conservative stance, classifying the methotrexate + naproxen and methotrexate + celecoxib interactions as Level 2 but recommending to “avoid this combination” and switch to an alternative. By contrast, the Gemini 2.5 Pro model’s recommendations were remarkably aligned with the expert rheumatologist’s opinion, as it recommended “Use with Caution and Enhanced/Standard Monitoring” for all three interactions. Its clinical rationale was also highly detailed, advising patient education on toxicity signs and cautioning against concurrent over-the-counter NSAID use, mirroring real-world clinical advice.
Table 6. Drug interaction recommendations: ChatGPT (GPT-5) models.
LLM, large language model; NSAID, non-steroidal anti-inflammatory drug; TNF, tumor necrosis factor.
Table 7. Drug interaction recommendations: Google Gemini models.
CBC, complete blood count; LFTs, liver function tests; LLM, large language model; NSAID, non-steroidal anti-inflammatory drug; TNF, tumor necrosis factor; RA, rheumatoid arthritis.
Discussion
Polypharmacy and comorbidity might lead to potential DDIs in RA patients
This study aimed to characterize polypharmacy and potential DDIs among RA and NCD drugs by comparing electronic database predictions with real-world co-prescription and LLM assessments. Cardiovascular disease, respiratory disease, and infections, the major causes of death in RA patients, 25 are often linked to systemic inflammation and to some medications. RA can occur at any age, but its onset seems more frequent among middle-aged and elderly patients, especially women. 26 In agreement with these global trends, more than 80% of our RA patients were female, with a mean age of 62.7 years. Hypertension (38.6%) and dyslipidemia (36.3%) were each present in more than one-third of our RA patients, again reflecting global trends of hypertension or dyslipidemia in approximately 30%–60% of older adults.27,28 Therefore, polypharmacy in older RA patients with NCDs is quite common in clinical practice. Polypharmacy is defined as five or more medications in one patient, with 10 or more medications defined as excessive polypharmacy. 29 Our RA cohort was intermediate between these two, with a median of eight medications per patient. This finding prompted our exploration of potential DDIs in our cohort, with the expectation that our findings might be useful in Thailand and worldwide, given the global prevalence of polypharmacy in older RA patients with NCDs. Management of potential DDIs before they occur is preferable to treatment failure or drug toxicity in these patients.
Divergence of potential DDIs predicted by electronic databases and in real-world co-prescription
Potential DDIs can be determined using several different methods. The most convenient method is currently to use electronic databases—either subscription-type or open-access databases. For medical personnel, copyrighted databases, such as Micromedex, are popular choices for searching for potential DDIs and subsequently informing or educating patients. Micromedex can classify the severity of potential DDIs with documentation, 21 thereby enabling healthcare professionals to make decisions based on information obtained from the database and select appropriate drug regimens for their patients’ consultations. However, the annual subscription fee is prohibitive, so only university or tertiary hospitals have access to these copyrighted databases in developing countries. Conversely, even patients can freely access open-access electronic databases such as WebMD to check for potential DDIs. 21 However, this could lead to conflicts between patients and healthcare professionals, as some studies have reported discordance in the identification of potential DDIs by different electronic databases for various diseases.30,31 Our in silico prediction using 75 medications frequently prescribed for RA patients with NCDs yielded a Brennan–Prediger kappa value of 0.3161, suggesting fair agreement for the potential DDIs identified by the Micromedex and WebMD databases.
In real-world co-prescription at Ramathibodi Chakri Naruebodindra Hospital, more than half of our RA cohort received supplements (97.7%), immunosuppressive drugs (92.7%), analgesic drugs (75.3%), corticosteroids (61.4%), antimalarials (56.0%), and DMARDs (55.6%). Among the almost 10,000 drug pairs, only 2.2% (207/9468) showed serious real-world co-prescribing patterns with potential DDIs in both electronic databases, and these were mainly related to methotrexate. The most common potential DDI pairs were methotrexate/naproxen (46.4%, 96/207) and methotrexate/sulfasalazine (29.5%, 61/207), with good concordance between both databases and a Brennan–Prediger kappa value of 0.9807, suggesting conditional agreement among the preselected serious DDIs. The potential DDIs with methotrexate were further evaluated based on patient records. Approximately 12.8% (19/148) of the patients receiving these combinations showed possible methotrexate toxicity in the form of myelosuppression, hepatotoxicity, infection, and mucositis. Therefore, mechanistic insights into methotrexate toxicity are another topic for consideration. Methotrexate, which is widely used as a chemotherapy agent for the treatment of leukemia, lymphomas, and solid tumors, is considered an antimetabolite that inhibits folate metabolism. 32 At low doses, methotrexate works as a DMARD in RA patients by dampening the overactive immune response, primarily through its influence on adenosine signaling, thereby reducing inflammation, pain, and swelling, and ultimately slowing down joint damage. 32 Concomitant use of methotrexate with other DMARDs, such as sulfasalazine, could potentiate pharmacodynamic effects, especially by modulating or suppressing immune activity, thereby reducing inflammation and joint damage. However, adverse events due to methotrexate can be predicted from its pharmacokinetics. Methotrexate, as a water-soluble molecule, primarily distributes into the total body water and is mainly excreted by the renal route. 33 Consequently, coadministration with other medicines that can potentially interfere with kidney function could trigger methotrexate toxicity. One example is sulfasalazine, a common DMARD indicated for long-term use in RA patients. Coadministration of methotrexate with this highly protein-bound drug can increase methotrexate plasma concentrations, thereby elevating the risk of methotrexate toxicity. Naproxen, a common NSAID given to RA patients, can inhibit prostaglandin synthesis and reduce renal blood flow. Celecoxib is a COX-2 inhibitor that is also clinically used as an NSAID in RA. This medicine can reduce renal blood flow and inhibit tubular secretion of MTX via renal organic anion transporters (OATs). 34 These phenomena could potentially impair methotrexate clearance and lead to methotrexate toxicity. 35 Therefore, concomitant administration of methotrexate and renal-impairing medicines could trigger methotrexate toxicity. Our results align with the findings of Boeing et al. (2025), which reported that 81.1% of 370 RA patients experienced polypharmacy. This prevalence was notably higher among elderly patients with comorbidities, potentially increasing the risk of serious DDIs. 16 By contrast, Colebatch et al. (2011) found that the concurrent use of NSAIDs and methotrexate appears to be safe, provided that appropriate monitoring is performed. 36 Adequate communication between healthcare professionals and their RA patients with NCDs is essential, as polypharmacy in these patients is common.
Implications of LLM performance for clinical decision support
This study addressed the gap in knowledge regarding the usefulness of LLMs by introducing a novel paradigm for their use in clinical decision support. Our findings show that the more advanced models, particularly Gemini 2.5 Pro, demonstrated a capacity for complex clinical reasoning that surpassed the rigid alerts offered by traditional databases. While the databases flagged the evaluated drug pairs as “major” interactions, the advanced LLM provided a concrete recommendation, “Use with Caution and Standard Monitoring,” that was almost identical to the real-world expert consensus. This suggests that sophisticated LLMs may be better at bridging the gap between theoretical risk and clinical pragmatism.
The heterogeneity observed among the LLM outputs has significant implications for their future roles in healthcare. The performance of simpler models like Gemini 2.5 Flash, which defaulted to a highly risk-averse “avoid combination” recommendation, highlights a potential pitfall. This advice, if followed without clinical oversight, could result in unnecessary discontinuation of effective therapy, thereby mirroring the challenges posed by overly sensitive database alerts. By contrast, the sophisticated outputs from Gemini 2.5 Pro and, to a lesser extent, ChatGPT-5 Thinking represent a step toward AI-assisted clinical judgment. Gemini 2.5 Pro did not simply classify risk; it provided a management plan that included patient education and specific monitoring parameters, closely emulating an expert consultation. The ability of these advanced models to consistently identify appropriate safer alternatives, such as acetaminophen or topical NSAIDs for pain and escalation to biologics for disease control, suggests a robust underlying knowledge base. These findings suggest that the future role of LLMs is to serve not as an infallible authority but as a “clinical co-pilot.” Advanced models can serve as powerful tools for junior clinicians or those in general practice by providing a nuanced second opinion and summarizing key management considerations.
However, the integration of LLMs into clinical practice also introduces significant medicolegal and safety considerations. Currently, no formal regulatory framework exists to assign accountability when an AI-generated recommendation leads to an adverse clinical outcome. Because these models can produce hallucinations or overly conservative advice, they must function strictly as decision-support tools. The clinician remains the sole responsible party for the final therapeutic decision, integrating AI suggestions with direct patient consultation.
Limitations
A limitation of this study is its single-center design at a tertiary care hospital with only 200–300 RA patients, which may not reflect the full complexity of RA patients in different clinical settings. In addition, because medication selection occurred between 2024 and 2025, the findings may not fully account for potential DDIs arising from new medications released in 2026 or later. In most cases, NSAIDs are administered temporarily for a limited period. Therefore, it is difficult for healthcare professionals to monitor the potential DDIs, especially when patients are at home or otherwise outside the hospital. Kidney and liver markers in the patients with suspected MTX toxicity showed neither statistically nor clinically significant changes compared with the normal group. Real-time and remote monitoring of MTX/NSAID toxicity in RA patients might be required to obtain robust evidence on these potential DDIs. Regarding our AI analysis, the evaluation was limited to a small sample of DDI pairs as an exploratory proof of concept. Finally, the study design involves circularity, as LLMs were prompted with database outputs rather than primary clinical data, so they were evaluated as aggregators rather than as independent reasoning systems.
Conclusion
A large number of potential DDIs were detected among medicines frequently prescribed for patients with RA. A significant discordance was found between the predictions of electronic databases and the pragmatic realities of clinical practice, as well as among different tiers of LLMs. Our evaluation revealed that while simpler models like Gemini 2.5 Flash provided highly conservative recommendations to avoid common drug pairings, more advanced models, particularly Gemini 2.5 Pro, offered advice that closely mirrored the expert rheumatologist consensus of continuing treatment with careful monitoring. Advanced LLMs may assist in contextualizing database alerts, but current evidence supports only exploratory use under expert supervision. These findings underscore how careful data interpretation, whether from a traditional database or sophisticated AI, remains a critical responsibility of the clinician. Ultimately, a combination of insights from modern decision support tools, deep clinical expertise, and direct patient communication is necessary to safely manage polypharmacy and prevent harmful DDIs in this complex patient population. Since polypharmacy is common among RA patients, educating patients to monitor for important signs of serious DDIs from frequently prescribed medicines is essential for patient safety. In addition, patient self-recording and emergency consultation with a healthcare professional for suspected serious DDIs might be a useful strategy for RA patient management.
Supplemental Material
sj-docx-1-taw-10.1177_20420986261460173 – Supplemental material for Drug interactions in rheumatoid arthritis: a disparity between electronic database prediction and real-world co-prescription with different large language models
Supplemental material, sj-docx-1-taw-10.1177_20420986261460173 for Drug interactions in rheumatoid arthritis: a disparity between electronic database prediction and real-world co-prescription with different large language models by Tonson Lalitkanjanakul, Dhanesh Pitidhammabhorn, Natnicha Jakramonpreeya, Werapat Suechanyapong, Supawit Tangpanithandee and Phisit Khemawoot in Therapeutic Advances in Drug Safety
Supplemental material
Supplemental material for this article is available online.
