Prediction of Antibiotic Resistance Phenotypes and Minimum Inhibitory Concentrations in Salmonella Using Machine Learning Analysis of Its Pan-Genome and Pan-Resistome Features

Abstract

Traditional experimental methods for determining antibiotic resistance phenotypes (ARPs) and minimum inhibitory concentrations (MICs) in bacteria are laborious and time consuming. This study aims to explore the potential of whole-genome sequencing data combined with machine learning models for robustly predicting ARPs and MICs in Salmonella. Using a training set of 6394 Salmonella genomes alongside antimicrobial susceptibility testing results, we built two machine learning (ML) predictive models based on the pan-genome and pan-resistome. Each model was implemented using three algorithms: random forest, extreme gradient boosting (XGB), and convolutional neural network. Among them, XGB achieved the highest overall accuracy, with the pan-genome and pan-resistome models accurately predicting ARPs (98.51% and 97.77%) and MICs (81.42% and 78.99%) for 15 commonly used antibiotics. Feature extraction from pan-genome and pan-resistome data effectively reduced computational complexity and significantly decreased computation time. Notably, fewer than 10 key genomic features, often linked to known resistance or mobile genes, were sufficient for robust predictions for each antibiotic. This study also identified challenges, including imbalanced resistance classes and imprecise MIC measurements, which impacted prediction accuracy. These findings highlight the importance of using multiple evaluation metrics to assess model performance comprehensively. Overall, our findings demonstrated that ML, utilizing pan-genome or pan-resistome features, was highly effective in predicting antibiotic resistance and identifying correlated genetic features in Salmonella. This approach holds great potential to supplement conventional culture-based methods for routine surveillance of antibiotic-resistant bacteria.

Salmonella is the leading cause of foodborne illnesses, hospitalizations, and deaths among major pathogens (Mkangara, 2023). The emergence and prevalence of antibiotic resistance genes (ARGs) in Salmonella and other pathogens has rendered many antibiotics ineffective (Zagaliotis et al., 2022). Conventional culture-based methods for antibiotic susceptibility testing are labor intensive and time consuming, making them unsuitable for large-scale or timely surveillance. The growing availability of genomic data and resistance databases enables antibiotic resistance prediction directly from whole-genome sequencing (WGS) data, complementing traditional methods.

Strategies for predicting antibiotic resistance from WGS have been reported in several pathogens including Mycobacterium tuberculosis (Bradley et al., 2015; Yang et al., 2019), Staphylococcus aureus (Bradley et al., 2015; Gordon et al., 2014), Escherichia coli (Her and Wu, 2018; Moradigaravand et al., 2018; Stoesser et al., 2013), Klebsiella pneumoniae (Nguyen et al., 2018; Stoesser et al., 2013), Neisseria gonorrhoeae (Eyre et al., 2017), and Salmonella (Maguire et al., 2019; Nguyen et al., 2019). The most frequently used approach for prediction has been the rule-based method (Camacho et al., 2009), which relies on the known genetic determinants of antibiotic resistance. However, this approach is incapable of identifying unknown resistance mechanisms. In the past decade, machine learning (ML) algorithms have increasingly been used to predict antibiotic resistance directly from WGS data. Some methods focus on distinguishing antibiotic resistance phenotypes (ARPs) (Bradley et al., 2015; Gordon et al., 2014; Her., 2018; Maguire et al., 2019; Moradigaravand et al., 2018; Stoesser et al., 2013; Yang et al., 2019), while others aim to predict minimum inhibitory concentrations (MICs) (Eyre et al., 2017; Nguyen et al., 2018, 2019).

A major challenge in using ML for antibiotic resistance prediction is creating large-scale training sets from WGS data. Some studies have successfully used short nucleotide k-mers as features and laboratory-derived ARPs or MICs as labels to build ML models (Maguire et al., 2019; Nguyen et al., 2018, 2019). K-mer counting reduces preprocessing and includes both coding and noncoding region data, but it is computationally expensive and memory intensive. To address this, a novel pan-genome approach has been introduced to predict ARPs (Her and Wu, 2018; Moradigaravand et al., 2018), which uses single nucleotide polymorphisms (SNPs) from core genomes and accessory gene presence or absence to represent pan-genomic diversity. This method offers a computationally more efficient alternative to k-mer counting. In addition, we recently proposed the concept of pan-resistome (He et al., 2020), which built upon the ideas of the pan-genome (Tettelin et al., 2005) and resistome (Wright, 2007). The pan-resistome includes all ARGs in a given environment or bacterial group, primarily consisting of latent ARGs (Inda-Díaz et al., 2023). It has been used to study host adaptability and antibiotic resistance (Marques et al., 2022) and to identify transferable ARGs and elements (Andrade-Oliveira et al., 2020; Decano et al., 2021). However, the predictive power of the pan-genome for MICs and the pan-resistome for both ARPs and MICs remains unconfirmed.

The aim of this study is to identify the optimal model and determine the minimum set of key genomic features necessary for accurate and robust predictions of ARPs and MICs. In addition, various factors involved in model development were assessed to test their influence on prediction success. Our approach builds on pan-resistome and ML techniques, offering new potential for improving antibiotic resistance surveillance and management.

Materials and Methods

Metadata collection and data processing

Metadata of 14,376 Salmonella genome sequences and 1,32,058 antimicrobial susceptibility testing results collected from 1952 to 2019 in the PATRIC (https://www.patricbrc.org/) (Wattam et al., 2017) were downloaded for this study. ARPs were defined as susceptible (S), intermediate (I), or resistant (R) and MICs were collected for the top 15 antibiotics (ranked by counts) used for Salmonella in the PATRIC. To standardize MIC values and eliminate biases, we retained only those tested with “broth microdilution” or “MIC” methods and excluded strains without susceptibility data or genome sequences. After removing these unqualified data, a total of 6394 Salmonella genomes were remained and each of them possessed at least one ARP or MIC result. All MIC measurement signs were deleted, and the float values were kept unchanged except for the signs without “=” as previously described (Nguyen et al., 2018); for example, an MIC would be recorded as 2a if the MIC was >a mg/L, a/2 for <a mg/L and a for ≤a, ≥a or =a mg/L. ARPs were re-annotated based on the resized MICs and Clinical and Laboratory Standard Institute’s guidelines.

Feature extraction

To evaluate the performance of various genomic predictors, both genomes and resistomes (annotated via the CARD) [Jia et al., 2017]) were used for feature extraction. SNPs of core genomes and the presence or absence of accessory genes were selected as concise genomic features. Orthologous gene clusters were generated using Roary (Page et al., 2015; Tettelin et al., 2005) based on pan-genome classification. Clusters present in >99% of genomes were categorized as soft-core genomes. Core genome alignments were performed using MAFFT (Katoh and Standley, 2014). SNP extraction and matrix construction were performed using FastSNP (https://github.com/ZHY-soure/SalAmrPred). Each of the A, C, G, T and missing nucleotide in SNPs was one-hot encoded as follows: [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], and [0, 0, 0, 0], respectively. For the accessory genomes, the presence or absence of a gene was encoded as [1, 1, 1, 1] and [0, 0, 0, 0]. Feature matrices in the shape of [n_genomes , (n_SNPs + n_{accessory genes} ), 4] were then constructed for both genomic and resistomic data.

A total of 80,667 orthologous clusters were identified from gene coding sequences, with 3,749 classified as soft-core genomes and the rest as accessory. In addition, 132 orthologous ARG clusters, including 17 core clusters, were identified from 1,95,736 ARGs. In total, 1,72,882 SNPs were detected in core genomes and 676 in core resistomes. These SNPs, along with the presence or absence of accessory genes, were one-hot encoded into matrices of [6394, 249800, 4] for genomes and [6394, 808, 4] for resistomes. Metadata tables for genomes and antimicrobial susceptibility results are available on GitHub (https://github.com/ZHY-soure/SalAmrPred).

Model construction

We employed three ML algorithms for model construction: random forest (RF) (Hu and Szymczak, 2023), extreme gradient boosting (XGB) (Chen and Guestrin, 2016), and a convolutional neural network (CNN) (Quang and Xie, 2016)). RF and XGB were implemented using Python’s scikit-learn and XGBoost packages, with decision trees as the base learners and default hyperparameters. For ARP prediction, we performed binary classification to distinguish between susceptible and resistant phenotypes, excluding intermediate ones. The CNN was implemented using Keras with TensorFlow on an RTX2080Ti GPU, with fine-tuned layer parameters to address memory limitations. The model structure included three Conv1d layers, Maxpool1d layers, and Dense layers. MaxPool1d was not applied to models using resistomes due to smaller matrices. All scripts are available at https://github.com/ZHY-soure/SalAmrPred.

Performance evaluation

A fivefold cross validation was used for model evaluation. The dataset was randomly split into training (80%) and test (20%) sets, with the cross-validation process repeated multiple times to ensure robust results. The metrics for evaluation included $A c curacy = (N_{T P} + N_{T N}) / N_{Total}$ , $F 1_{score} = \frac{N_{T P}}{N_{T P} + \frac{N_{F N} + N_{F P}}{2}},$ and the area under the receiver operating characteristic curve (AUROC), where N_TP , N_TN , N_FN , and N_FP denote the number of true positive, true negative, false negative, and false positive, respectively. For multilabel classification tasks, the F1-score and AUROC were computed using a “macro” strategy, which calculates metrics for each class and reports the unweighted means. An additional metric, imbalance ratio (IR), was used to assess data imbalance. The IR was computed as $I R = Max \times N_{labels} / N_{Total}$ , where Max is the largest class size, N_labels is the number of label categories, and N_Total is the total number of genomes. A greater IR value indicates a more imbalanced dataset.

Feature importance analysis

For each task, the best-performing model was used to extract key genomic features. Features with weights >0 in the respective models were deemed important. For each antibiotic, the minimum number of genomic features required for accurate prediction was determined. Features were ranked by their importance, and models were constructed using the top n features (from 0 to 200, incrementing by 1). Key features identified through this process were reannotated using the CARD via BLASTn, with results visualized in comparison matrices. The overall workflow is depicted in Supplementary Figure S1.

Results and Discussion

XGB classifier performance

The XGB classifier was selected due to its superior prediction accuracy. As shown in Figure 1, the XGB classifier outperformed others in most antimicrobial ARP predictions, completing 27 of 30 tasks with an accuracy of 98.51% (95% confidence interval [CI]: 97.84–99.18%). The average metric scores with their respective 95% CI for each algorithm and antibiotic were presented in Supplementary Tables S1 and S2. Previous studies show that ensemble methods, especially decision trees, outperform linear models due to their scalability and built-in feature selection (Moradigaravand et al., 2018; Nguyen et al., 2018, 2019). Consistently, the gradient boosting algorithm excelled in our study (Fig. 1). However, the CNN model did not achieve better accuracy, F1-scores, and AUROCs than that of XGB model (Fig. 1). Likewise, neural networks have not shown substantial improvements in other studies (Moradigaravand et al., 2018), suggesting that complex models do not necessarily outperform simpler ones in ARG prediction.

FIG. 1.

The overall metrics for individual prediction models or antibiotics. (a) Individual prediction models and predictors. (b) Individual antibiotics. AUROC, the area under the receiver operating characteristic curves; CNN, convolutional neural networks; RF, random forest; XGB, extreme gradient boosting.

The XGB classifier using genomes as predictors for MICs delivered the best performance in 21 out of 30 tasks, achieving an overall accuracy of 81.42% (95% CI: 75.34–81.50%) (Fig. 1). It was reported in prior studies that lower overall accuracies for predicting MICs, including 69% for K. pneumoniae (Moradigaravand et al., 2018), 53% for N. gonorrhoeae (Eyre et al., 2017), and 60% for Salmonella (Nguyen et al., 2019), were, respectively, achieved. The improved overall accuracy for MIC predictions in our study may stem from the larger Salmonella genome dataset used for training, making it one of the largest MIC prediction models to date. Previous studies struggled with small datasets (Maguire et al., 2019). Another factor could be our use of classifiers for MIC predictions, categorizing MIC levels into discrete classes, rather than treating them as continuous variables as in earlier regression models (Eyre et al., 2017; Moradigaravand et al., 2018; Nguyen et al., 2019). Regression models face challenges with imprecise, imbalanced MIC values, reducing accuracy. The ±1 doubling dilution factor used in past studies improves accuracy but does not fully address MIC measurement issues, such as the increased error for higher values and the use of inequalities for extreme MICs. Thus, classifiers may be better suited for MIC predictions than regressors.

Pan-genomic feature extraction was computationally light weight

The pan-genome approach was utilized as the feature extraction method for E. coli in previous studies (Her and Wu, 2018; Moradigaravand et al., 2018) and for Salmonella in this study. The pan-genome feature extraction method condensed genomic information into a significantly smaller matrix, requiring only 600 GB of raw access memory (RAM) to process 6394 genomes for training the XGB model. This is a considerable improvement over the k-mer method, which necessitated 1.5 TB of RAM for 4500 genomes (Nguyen et al., 2019). Notably, it took <1 h to train an XGB model in this study. Thus, the pan-genome method is particularly advantageous for managing large-scale data with limited computational resources and for producing quicker prediction results.

It was revealed by evaluation metrics for the XGB models that there were no significant differences between using genome or resistome predictors (Fig. 1). The models based on resistomes achieved an overall accuracy of 97.77% (95% CI: 96.16–99.38%) and F1-score of 0.88 (95% CI: 0.80–0.96) for ARPs, while for MICs, the accuracy was 78.99% (95% CI: 72.15%−85.83%) with an F1-score of 0.35 (95% CI: 0.30–0.41). Compared with the genome-based models, which required 600 GB of RAM and close to 1 h for training, models using resistomes took only 10 min and under 100 GB of RAM, yet delivered similar performance levels. Our findings indicate that known ARGs suffice for the rapid and accurate prediction of most antibiotic phenotypes. As bacteria evolve, new ARGs and genomic features may emerge, requiring a dynamic, updated list of predictors based on experimental results and ML-mined genomic features.

Impact of imbalanced resistance classes and imprecise MIC values on the accuracy of predictions

In this study, we employed unweighted F1-score and AUROC as additional evaluation metrics due to the potential for highly imbalanced ARP classes to skew accuracy when weighted by data. For instance, the models predicting ARPs for azithromycin (S: R = 3123:24) achieved a high average accuracy of 99.29% (95% CI: 99.20–99.38%), yet the F1-score and AUROC were only 0.58 (95% CI 0.49–0.68) and 0.68 (95% CI: 0.52–0.84), respectively (Fig. 2). Similar patterns were observed in the ARP predictions for ciprofloxacin, nalidixic acid, and trimethoprim/sulfamethoxazole (Fig. 2). In contrast, the models for the cephalosporin group (cefoxitin, ceftiofur, and ceftriaxone) demonstrated the highest F1-scores (≥0.96) and AUROC (≥0.98), attributed to their relatively balanced S: R ratios (Fig. 2) and distinct resistant mechanisms (see below), which could be easily mined by the models.

FIG. 2.

Antibiotic resistance phenotypes (ARPs) and minimum inhibitory concentrations (MICs) distributions and their corresponding metrics. The integers below S, I, R, and MICs refer to counts of genomes. Deeper red indicates higher counts. The float values below others represents metric values. Lighter blue represents the less desirability. Evaluation metrics on the left and right side belong to ARPs and MICs, respectively. Acc, accuracy; F1, F1-score; I, intermediate; IR, imbalance ratio; S, susceptible; R, resistant.

Imbalanced resistance classes and imprecisely labeled MIC values, as seen in this study (Fig. 2 and Fig. 3), likely hindered predictions. The low F1-score resulted from the model’s difficulty distinguishing less frequent MICs. For example, ceftriaxone achieved high accuracy (99.80%) for 0.25 mg/L, found in 85% of strains, but poor performance for other MICs, with some accuracies dropping to 0.00% (Figs. 2 and 3). As a result, the ceftriaxone model had an overall accuracy of 90.27% (95% CI: 89.07%−93.45%) but a low unweighted F1-score of 0.24 (95% CI: 0.16–0.33).

FIG. 3.

Confusion matrices of predicting minimum inhibitory concentrations (MICs) via extreme gradient boosting (XGB) classifiers for 15 antibiotics. Each matrix represents the ability to distinguish individual MICs based on the test dataset of respective antibiotic labeled below. Numbers aligned on the diagonal for models that exhibited excellent performance. Otherwise, misclassifications were written in the other cells of MICs in the same row. The darker the cell, the higher the genome counts. MICs within the green and pink squares belong to the susceptible and resistant ranges, respectively.

The impact of data imbalance on model performance was assessed using IR values based on ARP and MIC distributions (Fig. 2 and Fig. 3). Models performed better for ARPs (higher F1 scores and AUROCs) when IR <1.85. However, IR >5 reduced MIC predictive accuracy, although IR <5 did not always ensure high scores due to confounding factors. Higher IRs consistently impaired predictions despite high overall accuracy. Balanced datasets are crucial for improving model performance, emphasizing the importance of balanced data in future studies.

Identification of key genomic features on robust prediction for each antibiotic

To identify key genomic features for each antibiotic, we extracted features with weights >0 from the four XGB classifiers and ranked them by weight. For ARPs, all antibiotics performed well with fewer than 10 features (Fig. 4), especially β-lactams, which needed only one or two features. The main features enhancing metrics were ARGs conferring resistance, as annotated in PATRIC and CARD (Table 1). For β-lactams, key features included bla _CMY and bla _TEM alleles (Cole et al., 2015; Folster et al., 2014); for aminoglycosides, inactivation enzymes such as aadA, AAC(3′), and APH(3′) (Ahmed et al., 2004; Stern et al., 2018; Stoesser et al., 2013); and for other antibiotics, genes such as floR, sul, and tet (Antunes et al., 2005; Cloeckaert et al., 2000; Zhang and LeJeune, 2008) (Table 1). While key features were primarily ARGs, other high-ranking features, such as mobile genetic elements recognized as auxiliary elements in ARG transfer (Li et al., 2018), also contributed to model performance. However, not all the related ARGs could be extracted from genomes, particularly for antibiotics azithromycin, ciprofloxacin, and nalidixic acid correlated with mrx (Poole et al., 2006) and qnrB (Jacoby et al., 2006). It was reported that azithromycin resistance was also determined by mutations in 23S rRNA (Chisholm et al., 2010; Zhang and van der Veen, 2019), which was not included in this study because only coding sequences were used. Most features were from the accessory pan-genome, with a few SNPs in parC and gyrA contributing to quinolone resistance (Bae et al., 2013; Nathania et al., 2022), which were not detected in this study.

FIG. 4.

The change of accuracy, F1-score and the area under the receiver operating characteristic curves (AUROC) as the feature number increases. Each line represents one antibiotic. (a)–(c) Prediction of antibiotic resistance phenotypes (ARPs) using raw genomes as predictors. (d)–(f) Prediction of ARPs using resistomes as predictors. (g)–(i) Prediction of minimum inhibitory concentrations (MICs) using raw genomes as predictors. (j)–(l) Prediction of MICs using resistomes as predictors.

Table 1.

Important Genomic Features for ARPs Prediction

Antibiotics	Rank	PATRIC annotations	CARD annotations
Amoxicillin/clavulanic acid	1, 1*	Class C β-lactamase	bla _CMY-2
Ampicillin	1, 2*	Class A β-lactamase	bla _TEM-40
	2, 1*	Class C β-lactamase	bla _CMY-2
Azithromycin	1*	Hypothetical protein	mrx
	6*	Transcriptional regulatory protein PhoP	phoP
Cefoxitin	1, 1*	Class C β-lactamase	bla _CMY-2
Ceftiofur	1, 1*	Class C β-lactamase	bla _CMY-2
Ceftriaxone	1, 1*	Class C β-lactamase	bla _CMY-2
Chloramphenicol	1, 1*	Chloramphenicol resistance, MFS efflux pump	floR
	3, 2*	Chloramphenicol resistance, MFS efflux pump	cmlA6
Ciprofloxacin	2*	Pentapeptide repeat protein QnrB family	qnrB19
Gentamicin	1, 2*	Aminoglycoside 3″-nucleotidyltransferase	aadA
	14, 1*	Aminoglycoside N(3)-acetyltransferase	AAC(3′)-VIa
Kanamycin	1, 1*	Aminoglycoside 3′-phosphotransferase	APH(3′)-Ia
	2*	Aminoglycoside 3′-phosphotransferase	APH(3′)-IIa
Nalidixic acid	9*	Pentapeptide repeat protein QnrB family	qnrB19
Streptomycin	2, 2*	Aminoglycoside 3′-phosphotransferase	APH(3′)-Ia
	3, 1*	Aminoglycoside 3″-nucleotidyltransferase	aadA2
Sulfisoxazole	1, 2*	Dihydropteroate synthase type-2	sul1
	2, 1*	Dihydropteroate synthase type-2	sul2
Tetracycline	1, 1*	Tetracycline resistance MFS efflux pump	tet(B)
	3	Tetracycline resistance MFS efflux pump	tet(A)
	6	Tetracycline resistance MFS efflux pump	tet(G)
	8, 9*	Tetracycline resistance MFS efflux pump	tet(C)
Trimethoprim/sulfamethoxazole	2, 2*	Dihydrofolate reductase	dfrA12
Trimethoprim/sulfamethoxazole	3, 4*	Dihydrofolate reductase	dfrA15

N and N^*refer to the rank of features in models using raw genomes and resistomes, respectively.

ARPs, antibiotic resistance phenotypes.

For MIC models, no new genomic features were identified, as key features overlapped with those in ARP predictions. Comparison matrices of the top 10 weighted features in the four XGB models per antibiotic revealed shared and distinct features (Fig. 5). Only ARG-related features and some correlated mobile genetic elements were consistent across models, with varying weights and ranks, while unique features did not improve overall performance (Fig. 4).

FIG. 5.

Comparison matrices of the top 10 most important features in 4 models for 15 antibiotics. Each dot represents a feature. Within each unit (individual triangle or square in a subgraph) separated by a gap, the further to the upper right the dot is, the higher the rank of the feature. Each unit represents comparison of features from two models. The dots in different colors represent the occurrences of the features in four models, where pink, red, purple, and black refer to 1, 2, 3, 4, or more times, respectively. Four triangles and six squares form a subgraph. In each subgraph, the order of models from top to bottom and from left to right are (1) antibiotic resistance phenotypes (ARPs) prediction using raw genomes as predictors, (2) ARPs prediction using resistomes as predictors, (3) minimum inhibitory concentrations (MICs) prediction using raw genomes as predictors, (4) MICs prediction using resistomes as predictors. Triangle units along the diagonal represent self-comparison of features, thus dots on the diagonal exactly show the features within each model, and square units represent comparison of two models.

This study has several limitations. The ML models, although robust, rely on the quality and representativeness of the training data, and biases or gaps may limit their generalizability. In addition, the models may overfit without proper validation, especially with complex datasets. The exclusive use of genomic data also overlooks environmental and phenotypic factors influencing antibiotic resistance. Future research should focus on improving dataset diversity and incorporating phenotypic resistance profiles and environmental factors to enhance model robustness and applicability.

Conclusion

This study demonstrated the effectiveness of integrating ML with WGS to predict antibiotic resistance in Salmonella and identify genes associated with resistance mechanisms. Using the pan-genome or pan-resistome in the XGB classifier, we predicted ARPs and MICs for 15 antibiotics, with fewer than 10 key genomic markers providing robust predictions. The models require fewer resources and shorter testing times, making them efficient for routine surveillance. However, imprecise labeling and data imbalance hindered accurate MIC predictions, suggesting the need for multiple evaluation metrics, especially IR data. This approach can also facilitate further investigation into key genomic features relevant to antibiotic resistance prediction models. These models could improve surveillance efficiency by providing rapid, accurate predictions to identify emerging resistance patterns early. They also have the potential to guide targeted interventions, such as selecting effective antibiotics and developing strategies to combat resistance spread. Their application in real-world settings could enhance public health responses to antibiotic resistance, extending their relevance beyond Salmonella.

Footnotes

Authors’ Contributions

Y.H., X.Z., and X.S. conceived the study. Y.H. wrote the Python scripts and with X.Z. prepared the article. X.Z., L.Z., and Y.C. contributed to the software development. X.D., Y.H., and A.G. edited for technical content, and X.S. refined the article. All authors approved the final version.

Disclosure Statement

No competing financial interests exist.

Funding Information

This work was supported by the National Key R&D Program of China (grant number 2019YFE0119700).

Supplementary Material

Supplementary Figure S1

Supplementary Table S1

Supplementary Table S2

References

Ahmed

, Nakagawa

, Arakawa

, et al. New aminoglycoside acetyltransferase gene, aac(3)-Id, in a class 1 integron from a multiresistant strain of Vibrio fluvialis isolated from an infant aged 6 months. J Antimicrob Chemother, 2004; 53(6):947–951; doi: 10.1093/jac/dkh221

Andrade-Oliveira

, Rossi

, Souza-Silva

, et al. Staphylococcus nepalensis, a commensal of the oral microbiota of domestic cats, is a reservoir of transferrable antimicrobial resistance. Microbiology (Reading), 2020; 166(8):727–734; doi: 10.1099/mic.0.000940

Antunes

, Machado

, Sousa

, et al. Dissemination of sulfonamide resistance genes (sul1, sul2, and sul3) in Portuguese Salmonella enterica strains and relation with integrons. Antimicrob Agents Chemother, 2005; 49(2):836–839; doi: 10.1128/AAC.49.2.836-839.2005

Bae

, Baek

, Jeong

, et al. Amino acid substitutions in gyrA and parC associated with quinolone resistance in nalidixic acid-resistant Salmonella isolates. Ir Vet J, 2013; 66(1):23; doi: 10.1186/2046-0481-66-23

Bradley

, Gordon

, Walker

, et al. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis . Nat Commun, 2015; 6:10063; doi: 10.1038/ncomms10063

Camacho

, Coulouris

, Avagyan

, et al. BLAST+: Architecture and applications. BMC Bioinformatics, 2009; 10:421; doi: 10.1186/1471-2105-10-421

Chen

, Guestrin

. XGBoost: A scalable tree boosting system. arXiv, 2016:785–794; doi: 1603.02754

Chisholm

, Dave

, Ison

. High-level azithromycin resistance occurs in Neisseria gonorrhoeae as a result of a single point mutation in the 23S rRNA genes. Antimicrob Agents Chemother, 2010; 54(9):3812–3816; doi: 10.1128/AAC.00309-10

Cloeckaert

, Sidi Boumedine

, Flaujac

, et al. Occurrence of a Salmonella enterica serovar Typhimurium DT104-like antibiotic resistance gene cluster including the floR gene in S. enterica serovar Agona. Antimicrob Agents Chemother, 2000; 44(5):1359–1361; doi: 10.1128/AAC.44.5.1359-1361.2000

10.

Cole

, Unemo

, Grigorjev

, et al. Genetic diversity of bla _TEM alleles, antimicrobial susceptibility and molecular epidemiological characteristics of penicillinase-producing Neisseria gonorrhoeae from England and Wales. J Antimicrob Chemother, 2015; 70(12):3238–3243; doi: 10.1093/jac/dkv260

11.

Decano

, Pettigrew

, Sabiiti

, et al. Pan-resistome characterization of uropathogenic Escherichia coli and Klebsiella pneumoniae strains circulating in Uganda and Kenya, isolated from 2017-2018. Antibiotics (Basel), 2021; 10(12):1547; doi: 10.3390/antibiotics10121547

12.

Eyre

, De Silva

, Cole

, et al. WGS to predict antibiotic MICs for Neisseria gonorrhoeae . J Antimicrob Chemother, 2017; 72(7):1937–1947; doi: 10.1093/jac/dkx067

13.

Folster

, Tolar

, Pecic

, et al. Characterization of bla _CMY plasmids and their possible role in source attribution of Salmonella enterica serotype Typhimurium infections. Foodborne Pathog Dis, 2014; 11(4):301–306; doi: 10.1089/fpd.2013.1670

14.

Gordon

, Price

, Cole

, et al. Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome sequencing. J Clin Microbiol, 2014; 52(4):1182–1191; doi: 10.1128/JCM.03117-13

15.

, Zhou

, Chen

, et al. PRAP: Pan resistome analysis pipeline. BMC Bioinformatics, 2020; 21(1):20; doi: 10.1186/s12859-019-3335-y

16.

Her

, Wu

. A pan-genome-based ML approach for predicting antimicrobial resistance activities of the Escherichia coli strains. Bioinformatics, 2018; 34(13):i89–i95; doi: 10.1093/bioinformatics/bty276

17.

, Szymczak

. A review on longitudinal data analysis with random forest. Brief Bioinform, 2023; 24(2):bbad002; doi: 10.1093/bib/bbad002

18.

Inda-Díaz

, Lund

, Parras-Moltó

, et al. Latent antibiotic resistance genes are abundant, diverse, and mobile in human, animal, and environmental microbiomes. Microbiome, 2023; 11(1):44; doi: 10.1186/s40168-023-01479-0

19.

Jacoby

, Walsh

, Mills

, et al. qnrB, another plasmid-mediated gene for quinolone resistance. Antimicrob Agents Chemother, 2006; 50(4):1178–1182; doi: 10.1128/AAC.50.4.1178-1182.2006

20.

Jia

, Raphenya

, Alcock

, et al. CARD 2017: Expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res, 2017; 45(D1):D566–D573; doi: 10.1093/nar/gkw1004

21.

Katoh

, Standley

. MAFFT: Iterative refinement and additional methods. Methods Mol Biol, 2014; 1079:131–146; doi: 10.1007/978-1-62703-646-7_8

22.

, Tai

, Deng

, et al. VRprofile: Gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria. Brief Bioinform, 2018; 19(4):566–574; doi: 10.1093/bib/bbw141

23.

Maguire

, Rehman

, Carrillo

, et al. Identification of primary antimicrobial resistance drivers in agricultural nontyphoidal Salmonella enterica serovars by using ML. mSystems, 2019; 4(4):e00211-219; doi: 10.1128/mSystems.00211-19

24.

Marques

, Prado

LCDS

, Felice

, et al. Insights into the Vibrio genus: A one health perspective from host adaptability and antibiotic resistance to in silico identification of drug targets. Antibiotics (Basel), 2022; 11(10):1399; doi: 10.3390/antibiotics11101399

25.

Mkangara

. Prevention and control of human Salmonella enterica infections: An implication in food safety. Int J Food Sci, 2023; 2023:8899596; doi: 10.1155/2023/8899596

26.

Moradigaravand

, Palm

, Farewell

, et al. Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data. PLoS Comput Biol, 2018; 14(12):e1006258; doi: 10.1371/journal.pcbi.1006258

27.

Nathania

, Nainggolan

, Yasmon

, et al. Hotspots sequences of gyrA, gyrB, parC, and parE genes encoded for fluoroquinolones resistance from local Salmonella Typhi strains in Jakarta. BMC Microbiol, 2022; 22(1):250; doi: 10.1371/journal.pcbi.1006258

28.

Nguyen

, Brettin

, Long

, et al. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae . Sci Rep, 2018; 8(1):421; doi: 10.1038/s41598-017-18972-w

29.

Nguyen

, Long

, McDermott

, et al. Using ML to predict antimicrobial MICs and associated genomic features for nontyphoidal Salmonella . J Clin Microbiol, 2019; 57(2):e01260-18; doi: 10.1128/JCM.01260-18

30.

Page

, Cummins

, Hunt

, et al. Roary: Rapid large-scale prokaryote pan-genome analysis. Bioinformatics, 2015; 31(22):3691–3693; doi: 10.1093/bioinformatics/btv421

31.

Poole

, Callaway

, Bischoff

, et al. Macrolide inactivation gene cluster mphA-mrx-mphR adjacent to a class 1 integron in Aeromonas hydrophila isolated from a diarrhoeic pig in Oklahoma. J Antimicrob Chemother, 2006; 57(1):31–38; doi: 10.1093/jac/dki421

32.

Quang

, Xie

. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res, 2016; 44(11):e107; doi: 10.1093/nar/gkw226

33.

Stern

, Van der Verren

, Kanchugal

, et al. Structural mechanism of AadA, a dual-specificity aminoglycoside adenylyltransferase from Salmonella enterica . J Biol Chem, 2018; 293(29):11481–11490; doi: 10.1074/jbc.RA118.003989

34.

Stoesser

, Batty

, Eyre

, et al. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genomic sequence data. J Antimicrob Chemother, 2013; 68(10):2234–2244; doi: 10.1093/jac/dkt180

35.

Tettelin

, Masignani

, Cieslewicz

, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial ‘pan-genome’. Proc Natl Acad Sci U S A, 2005; 102(39):13950–13955; doi: 10.1073/pnas.0506758102

36.

Wattam

, Davis

, Assaf

, et al. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res, 2017; 45(D1):D535–D542; doi: 10.1093/nar/gkw1017

37.

Wright

. The antibiotic resistome: The nexus of chemical and genetic diversity. Nat Rev Microbiol, 2007; 5(3):175–186; doi: 10.1038/nrmicro1614

38.

Yang

, Walker

, et al.; CRyPTIC Consortium. DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis . Bioinformatics, 2019; 35(18):3240–3249; doi: 10.1093/bioinformatics/btz067

39.

Zagaliotis

, Michalik-Provasek

, Gill

, et al. Therapeutic bacteriophages for gram-negative bacterial infections in animals and humans. Pathog Immun, 2022; 7(2):1–45; doi: 10.20411/pai.v7i2.516

40.

Zhang

, LeJeune

. Transduction of bla _(CMY-2), tet(A), and tet(B) from Salmonella enterica subspecies enterica serovar Heidelberg to S. Typhimurium. Vet Microbiol, 2008; 129(3–4):418–425; doi: 10.1016/j.vetmic.2007.11.032

41.

Zhang

, van der Veen

. Neisseria gonorrhoeae 23S rRNA A2059G mutation is the only determinant necessary for high-level azithromycin resistance and improves in vivo biological fitness. J Antimicrob Chemother, 2019; 74(2):407–415; doi: 10.1093/jac/dky438

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.39 MB

0.04 MB

0.02 MB

0.00 MB