Abstract
Data mining and artificial intelligence algorithms can estimate the probability of future occurrences with defined precision. Yet, the prediction of infectious disease outbreaks remains a complex and difficult task. This is demonstrated by the limited accuracy and sensitivity of current models in predicting the emergence of previously unknown pathogens such as Zika, Chikungunya, and SARS-CoV-2, and the resurgence of Mpox, along with their impacts on global health, trade, and security. Comprehensive analysis of infectious disease risk profiles, vulnerabilities, and mitigation capacities, along with their spatiotemporal dynamics at the international level, is essential for preventing their transnational propagation. However, annual indexes of the impact of infectious diseases are too coarse-grained to allow stakeholders to craft better mitigation strategies. A quantitative risk assessment by analytical platforms requires billions of near real-time data points from heterogeneous sources, integrating and analyzing univariable or multivariable data with different levels of complexity and latency that, in most cases, overwhelm human cognitive capabilities. Autonomous biosurveillance can open the possibility for near real-time, risk- and evidence-based policymaking and operational decision support.
Introduction
In 1956, Philip K. Dick published the short story “The Minority Report” in Fantastic Universe Magazine. The setting is 2054 New York City, where PreCrime, a specialized police department based on foreknowledge, anticipates crime thanks to three psychics called “Precogs.” Each of the three Precogs generates its own report or prediction, and a computer analyzes the three reports. If they differ, the computer identifies the two reports with the most significant overlap and produces a “majority report,” taking this as an accurate prediction of the future. However, the existence of a majority report implies the presence of a “minority report,” which can provide crucial alternative perspectives or contradicting evidence. Such minority reports may highlight overlooked factors, biases in the majority’s analysis, or emerging trends not yet recognized by the consensus, and may avert flawed outcomes based on a possibly erroneous majority report. In psychology, precognition is the ability to obtain information about a future event, unknowable through inference alone, before the event occurs (Franklin et al. 2014). The notion of precognition poses a profound challenge to the idea of free will and unsettles the conventional understanding of cause and effect. Throughout human history, claims of precognitive abilities have been pervasive, yet they have encountered profound skepticism. Despite this, the fascination with foreseeing the future endures undiminished.
In data science, forecasting and precognition both involve predicting future events. Forecasting is a methodological process that estimates future outcomes by analyzing historical data and employing qualitative computational techniques, time series analysis, and causal models for predictive purposes (Jenner et al. 2020). Mathematical models forecasting the dynamics of epidemics and the influence of their transmission have been published in 1800 articles across more than 400 different journals (Yang et al. 2020). These models can sometimes perform unevenly due to imbalances or a lack of diversity in the training datasets, resulting in misclassification or underrepresentation. Precognition is an emerging domain within artificial intelligence (AI) that refers to the ability to integrate and analyze large univariable or multivariable datasets to identify patterns that incorporate both historical and predictive information beyond the scope of their initial training environments (Meadows and Frangou 2020; Buttazzo 2023; Su et al. 2023). Precognition encompasses various models such as convolutional neural networks (Basu et al. 2022), deep learning models (Lee et al. 2021), genetic algorithms (Lu et al. 2023), deep precognitive diagnosis (Chharia et al. 2022), dynamic attention-based state encoders (Chen et al. 2022), and liquid neural networks (Chahine et al. 2023). These methods have identified in advance crime incidents, weather patterns, cyberattacks, market trends, customer behavior, novel diseases, and supply chain demand (Seyedan and Mafakheri 2020; Liu et al. 2022; Rotaru et al. 2022).
Predicting infectious disease emergence, reemergence, and the outcomes of different mitigation strategies remains central to biosurveillance and early warning efforts to mitigate the impact of infectious diseases. However, the usefulness of prediction in risk-based decision-making in the operational environment remains limited. Although biosurveillance and early warning systems can produce extensive amounts of raw data on infectious disease events, primarily using open-source information, only a few methods can effectively disambiguate and contextualize these data streams into quantitative indicators that provide actionable intelligence within the operational environment (MacIntyre et al. 2023). As a result, the adoption of these systems has been limited.
Known and unknown infectious diseases affecting humans, animals, and plants continue to emerge, reemerge, and persist in different locations worldwide (Semenza et al. 2016; Gupta 2020; Hulme et al. 2023). These transboundary biothreats can disrupt human and animal health, food production, international agricultural supply chains, travel, and trade, causing price spikes, market distortions, and political and economic coercion while affecting military readiness and deployment (Friedberg 2018; Kemp et al. 2020; Lentzos 2020; Hulme et al. 2023). The risk of these pathogenic agents to global security is exacerbated by the genomic diversity and lethality of manmade biothreats that represent unknown unknowns, that is, threats we are not even aware we are unaware of (Valdivia-Granda 2021). Gain-of-function research (GOFR), computer-aided bioweapon design, and portable DNA synthesis can lead to novel pathogens and toxins with altered transmissibility, virulence, pathogenicity, lethality, and environmental stability (Valdivia-Granda 2010; Avin et al. 2018; Bloom and Cadarette 2019; Fiott and Parkes 2019). Not surprisingly, several nations deny or falsely disclose the development of offensive biological agents capable of causing severe or catastrophic impacts (Kemp et al. 2020; Lentzos 2020).
Mitigating the threat posed by known and unknown biothreats requires answering three fundamental questions: (i) Is it possible to predict the emergence, reemergence, and outbreak of infectious diseases in a specific location and period? (ii) If an infectious disease is present, is it possible to predict how fast it will spread; how many people, animals, or plants will be affected; and how long it will persist? (iii) What scenarios will arise if a mitigation strategy is implemented? Answering these questions while determining if pathogens have been naturally, accidentally, or intentionally introduced in a country is of interest not only to scholars and practitioners within the public health, biodefense, and computer science communities but also to personnel in the operational environment, including customs and border protection, law enforcement, and intelligence agencies. This article summarizes the challenges and opportunities for developing a data-driven approach that dynamically ranks the risk of known and unknown pathogens while identifying drivers affecting these events in the operational context of travel and trade.
Predicting Pathogen Emergence, Reemergence, and Mitigation
Predicting the emergence of novel infectious diseases is a multifaceted endeavor that hinges on various factors influencing host-pathogen interactions. Climatic changes, human activities such as travel and trade, land use patterns, food insecurity, and geopolitical conflicts are pivotal factors that significantly influence infectious disease incidence within specific regions. Accurately assessing the impact of these factors on the emergence of pathogens, including viruses, bacteria, fungi, or parasites, presents challenges due to nonlinearity, nonstationarity, noise, and limited data availability, making each pathogen emergence event unique and difficult to predict (Brett et al. 2017). The unpredictability is exemplified by the natural adaptation and global spread of viruses such as Zika, Chikungunya, and SARS-CoV-2, which highlight the challenges of predicting pandemics. Moreover, the uncertainty in forecasting is exacerbated by the possibility of accidental or intentional introduction of pathogens into new environments. Preparing for such potential outbreaks demands a comprehensive approach that accounts for a multitude of variables and acknowledges the inherent uncertainties involved, necessitating robust and adaptable predictive models (Dallas et al. 2019).
Predicting the reemergence of infectious diseases in locations where they previously disappeared or declined has been an active area of research (Rosenkrantz et al. 2022). The lack of high-quality data on essential disease spread features constrained these implementations’ success. However, in the last two decades, advances in the mathematical and computational frameworks capturing heterogeneous data sources related to pathogens and host biology have advanced this field significantly (Glennon et al. 2021). Features such as weather, human and animal behavior in both rural and urban settings, global mobility patterns, animal movements, plant production systems, and mobile phone records greatly improved the accuracy of these predictions. Some models have successfully forecasted the epidemiological trajectories of reemerging infectious diseases, including exponential increases followed by declines or sustained transmission, by leveraging the spatial and temporal nonrandom distribution of environmental drivers (Davis et al. 2017; Dallas et al. 2019; Leandro et al. 2022). Despite the significant advancements, the essential spatial modularity required for accurate prediction of re-emerging infectious disease transmission is undermined by the increasing volume of global travel and trade, which effectively reduces geographic boundaries. This heightened connectivity extends the range of potential risk areas and lowers genetic barriers of pathogens, consequently affecting predictive modeling frameworks (Dallas et al. 2019). The reemergence of measles, mumps, and Mpox highlights the difficulty in these estimations and their impact on preparedness and response (Gokhale et al. 2023).
Forecasting can be instrumental in evaluating the effectiveness of strategies for mitigating the emergence, reemergence, and spread of infectious diseases (Bhatia et al. 2021; Rosenkrantz et al. 2022; Savinkina et al. 2023). Situational awareness, horizon scanning, predictive analytics, and forecasting models can transform transmissible disease information into quantitative projections that analyze the contribution of variables such as population, demography, contact patterns, disease severity, pathogen mutation, and health care capacity (Bershteyn et al. 2022; Runge et al. 2023). Multiple scenarios modeled simultaneously can contrast alternative interventions, weigh the benefits, determine the value of information, and estimate the cost and unintended consequences of different mitigation strategies (Tam et al. 2020; Walker et al. 2020). Modeling approaches can lead to understanding unknown emergent behaviors from the creation of in silico experiments with synthetic populations (Venkatramanan et al. 2018; Zhu et al. 2024). However, disease transmission and pathogenicity are unknown when a new pathogen emerges. This structural uncertainty complicates the calibration of forecasting models and leads to significant discrepancies between expected and actual disease outbreak scenarios (Pei et al. 2023). Forecasting the epidemic trajectory of reemerging pathogens can be affected by parametric uncertainty arising from incomplete information, biased data, high sampling variance, and divergent or imprecise expert opinions (Bershteyn et al. 2022). In addition, stochastic uncertainty originating from inherent randomness during different simulation runs rather than inaccuracies in model architecture or data scarcity can affect the prediction outcomes (Bershteyn et al. 2022; Swallow et al. 2022).
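The difference between parametric and stochastic uncertainty can be illustrated with a small simulation. The following is a minimal sketch and not a method taken from the studies cited above: it assumes a discrete-time chain-binomial SIR model, and the function name `simulate_sir`, the population size, and the transmission and recovery values are hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate_sir(beta, gamma, population=500, initial_infected=5, days=60, seed=None):
    """Run one chain-binomial SIR epidemic and return the total ever infected.

    beta and gamma are illustrative per-day transmission and recovery
    parameters (assumptions, not values taken from the article).
    """
    rng = random.Random(seed)
    s, i, r = population - initial_infected, initial_infected, 0
    for _ in range(days):
        # Daily infection risk for each susceptible, given i infectious people.
        p_infect = 1.0 - (1.0 - beta / population) ** i
        new_inf = sum(rng.random() < p_infect for _ in range(s))
        new_rec = sum(rng.random() < gamma for _ in range(i))
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return population - s


# Stochastic uncertainty: identical parameters, different random seeds.
stochastic_runs = [simulate_sir(0.4, 0.2, seed=k) for k in range(30)]

# Parametric uncertainty: beta itself is only known to lie within a range.
param_rng = random.Random(0)
parametric_runs = [
    simulate_sir(param_rng.uniform(0.2, 0.6), 0.2, seed=k) for k in range(30)
]

print("same parameters:", min(stochastic_runs), "to", max(stochastic_runs))
print("uncertain beta: ", min(parametric_runs), "to", max(parametric_runs))
print("spread (stdev):", statistics.pstdev(stochastic_runs),
      statistics.pstdev(parametric_runs))
```

In this sketch, the first set of runs differs only because of random chance within a fixed model, whereas the second set also reflects incomplete knowledge of the transmission parameter. Structural uncertainty, which concerns whether an SIR-type model is appropriate at all, is not captured here.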
Early detection, surveillance, and characterization of known and unknown transboundary infectious agents are crucial for timely notification to the international community. This information is critical for activating various mitigation strategies. However, differences in national surveillance systems can significantly affect the accuracy and timeliness of outbreak reporting, which may result in underreporting or misreporting. Such inconsistencies compromise national response efforts and have profound implications for global health, as delays and inaccuracies can impede international measures to control disease spread. The use of various indices is proposed to mitigate the impact of varied national reporting capabilities on risk assessments. These indexes generate quantitative scores that evaluate the risk infectious pathogens pose to individual nations, considering factors such as demographics, healthcare infrastructure, public health capabilities, disease dynamics, and socio-economic and political conditions. This quantitative assessment helps coordinate a more effective global response to infectious disease threats (De Groeve et al. 2014; GBD Healthcare Access and Quality Collaborators 2018; Ravi et al. 2020). However, quantitative indexes are derived from semi-structured interviews or surveys that might be too subjective by the nature of each country’s self-evaluation or by the conclusion bias of a small number of researchers involved in the scoring process (Al-Janabi et al. 2013; Lindbom et al. 2013). While global health indexes focus on estimation methods’ technical soundness, country users are more concerned about the extent of their involvement in the estimation process (Abouzahr et al. 2017; Boerma et al. 2018). These indices assign numerical values to various indicator categories, applying equal weight to each category (Chang and McAleer 2020). 
The aggregation resolution of these scores is, at best, the country level over annual timeframes, which overlooks regional variations in mitigation capabilities within countries. For example, analytical tools used for near real-time infectious disease awareness vary across and within countries. As a result, available data may not be comparable, and estimates driven by covariates make scoring and interpretation difficult (Liao et al. 2017).
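A small numerical example may help show how equal-weight, country-level aggregation can hide subnational differences. This is a sketch under stated assumptions: the category names, the 0 to 100 scale, the two regions, and every score are hypothetical and do not come from any of the cited indexes.

```python
from statistics import mean


def composite_index(category_scores):
    """Equal-weight composite: the unweighted mean of the category scores."""
    return mean(category_scores.values())


# Hypothetical 0-100 capability scores for two regions of one country.
regions = {
    "capital_region": {"detection": 90, "reporting": 80, "response": 85},
    "remote_region": {"detection": 30, "reporting": 20, "response": 25},
}

regional_scores = {name: composite_index(scores) for name, scores in regions.items()}

# A country-level figure, as an annual index would report it, averages the regions.
national_score = mean(regional_scores.values())
hidden_gap = max(regional_scores.values()) - min(regional_scores.values())

print(regional_scores)
print("national:", national_score)
print("regional gap not visible in the national score:", hidden_gap)
```

With these made-up numbers, a single mid-range national score stands in for one well-equipped region and one poorly equipped region, which is the loss of resolution described above.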
Addressing Pathogen Introduction and Technological Convergence
The International Health Regulations (IHR), established by the World Health Organization, constitute a legally binding agreement among 196 nations to enhance their capacity to identify and report potential global public health emergencies while minimizing disruptions to international travel and trade. For the IHR to be effective, all signatory countries must proficiently detect, assess, and report any public health risk within 24 h of receiving evidence and respond appropriately to the threat. However, the effectiveness of the IHR is compromised because only about one-third of the world’s nations are adequately equipped to evaluate, detect, and manage public health crises. This gap is exacerbated by the absence of a standardized framework to quantitatively measure and share the risk of known and unknown pathogens in a specific territory beyond morbidity and mortality. The World Organization for Animal Health has established the Terrestrial and Aquatic Animal Health Codes (AHC), which set global standards for animal and veterinary public health reporting. Within this framework, countries are required to report disease events in their territories. However, the time taken to report these events, from observation to confirmation, varies significantly among nations, ranging from days to months, depending on whether the affected species are domestic or wild and the type of disease involved (W. Valdivia-Granda, personal communication). Addressing the challenges and gaps of the IHR and AHC necessitates adapting to technological advancements that impact regulatory frameworks within each nation. Additionally, it requires tackling growing concerns related to data sharing, including issues of privacy, national security, and the economic impact of travel and trade restrictions. This is particularly crucial as both known and unknown biothreats are reported within a territory (Radosavljevic et al. 2015; Kockerling et al. 2017).
The increased volume of transcontinental passenger travel and cargo movement significantly heightens the risk of rapidly introducing known and unknown transboundary pathogens (Semenza et al. 2016; Rush et al. 2021). Considering the implications of infectious diseases on global health, trade, and security, significant attention has been directed toward prevention (e.g., diagnostics and detection) and mitigation strategies (e.g., prophylactics and therapeutics). There is a regulatory and mathematical framework for isolation and quarantine for humans, animals, and plants (Feng 2007; Moore 2007; Kim 2016; Aronna et al. 2021; Schumacher et al. 2024). However, the most effective strategy for curbing the spread of infectious diseases encompasses the implementation of an international exclusion protocol at the strategic, tactical, and operational levels based on dynamic quantitative risk assessment of the travel and trade environments. An exclusion protocol requires the collection, integration, analysis, dissemination, and visualization of extensive datasets, where interoperability, standardization, and analytical interpretation are crucial. This entails compiling data from both formal and informal sources, such as epidemiological alerts, environmental monitoring, public health surveillance systems, and nontraditional data not originally intended for risk assessment. The latter include commodity rejections at international borders, daily passenger and cargo arrivals, economic growth rates, consumer behavior patterns, and food and agricultural commodity trade composition and dynamics. An international exclusion protocol requires resolving issues related to data integration, including the capability to track and maintain the provenance of the data sources. At the same time, there is a need for new policies to incentivize timely reporting.
The effectiveness of an exclusion protocol hinges on the capacity of nations to swiftly adjust their regulatory framework to new threats, leading to adaptive procedures.
The world is experiencing unparalleled technological development in life and computational sciences, including synthetic biology and generative and general AI (Buttazzo 2023). However, several academic groups and organizations consistently raise concerns about the intensifying competition among nations to lead dynamic biotechnology and AI markets. It has been estimated that by 2027, individuals with and without formal scientific training will be prototyping biological designs and products derived from synthetic biology (Mehlman et al. 2023; National Academies of Sciences, Engineering, and Medicine 2017). Both institutional and DIY organisms produced by the convergence of AI and on-demand benchtop DNA synthesizers can cause accidental infections with unintended and unknown catastrophic or existential consequences. At the same time, as efforts to sample microbial diversity in the wild increase, these benign efforts may inevitably lead to laboratory accidents and infections with novel pandemic-class pathogens. Of national security concern are samples that covertly move around the world through travel and transnational shipping, as well as genetic sequences that are transmitted cryptographically or made available in public databases where they can be used for nefarious purposes. It behooves the biodefense community to ensure the proper development and promotion of safety guidelines, at the national and international levels and in academic and corporate environments, for both GOFR and loss-of-function research. At this stage, we should conservatively assume that continued research will eventually confer access to pandemic-class systems now markedly favoring offense (K. Esvelt, personal communication).
While several ideas have been proposed to reduce the synthesis of pandemic-class pathogens on benchtop DNA synthesizers, synthesis-stop encoded controls on this hardware could be the most immediate block to their illegitimate use. Simultaneously, improving biosurveillance programs by contextualizing the risks posed by convergent technologies and their impact on trade and travel environments would allow timely regulatory adjustments and deployment of more strategically effective risk-based exclusion measures at border crossings.
There is an urgent need to develop a new analytical system for biothreat precognition to address the challenges of the emergence, reemergence, and spread of known and unknown pathogens. An autonomous, data-driven biosurveillance system should be engineered to aggregate, integrate, and quantitatively evaluate biothreats impacting global health, trade, and security, operating within predefined human constraints and supporting human decision-making while overcoming the limitation of requiring very large training datasets (Valdivia-Granda and Richt 2020; Valdivia-Granda 2021). This analytical and predictive framework could deliver pathogen risk assessments and support diverse operational stakeholders across national and international boundaries. Transitioning from a traditional analyst-dependent biosurveillance system to an autonomous architecture that supports human operational decisions involves processing extensive data streams with sophisticated computational methods. A critical aspect of this transition is the implementation of source moderation, multilingual translation, and contextual analysis to filter out noise, disinformation, and misinformation. This approach necessitates deep learning techniques for single and multi-document processing, natural language processing algorithms, and large language models to generate extractive and structured summaries from nonstructured data streams. Creating labeled data for training AI algorithms requires augmenting the original text with additional corpora and metadata, categorizing it into different training sets, and conducting named entity recognition, all according to guidelines set by field experts. The information collected from open sources, including news outlets, must be screened for misinformation and disinformation by mapping and scoring the reliability and quality of each source (Qi et al. 2020; Alsmadi et al. 2022; Jeng et al. 2022; Hulme et al. 2023).
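One simple way to picture the scoring of source reliability is to combine the reliability of every source reporting the same event into a single corroboration score. The sketch below is only an illustration and is not the approach of the cited systems: it assumes a noisy-OR combination, treats reports as independent (which rarely holds when outlets repeat one another), and uses a hypothetical `Report` type with made-up sources, reliability values, and event labels.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Report:
    source: str
    reliability: float  # hypothetical prior in [0, 1] that the source is correct
    event: str          # normalized event label, e.g. from entity recognition


def corroboration_score(reports, event):
    """Noisy-OR score that `event` is real, treating reports as independent."""
    matching = [r.reliability for r in reports if r.event == event]
    # With no matching reports the product is 1.0, so the score is 0.0.
    return 1.0 - prod(1.0 - rel for rel in matching)


reports = [
    Report("ministry_bulletin", 0.9, "avian_influenza_region_a"),
    Report("local_news", 0.5, "avian_influenza_region_a"),
    Report("anonymous_post", 0.1, "unknown_fever_region_b"),
]

for event in sorted({r.event for r in reports}):
    print(event, round(corroboration_score(reports, event), 3))
```

A low score in such a scheme does not mean that a signal should be discarded. A weakly corroborated report may be exactly the kind of minority information that merits analyst review rather than automatic filtering.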
Autonomous, data-driven biosurveillance can support sentiment analysis, text classification, semantic understanding, reasoning, question answering, and factuality (Chang et al. 2024). However, it must address the limitations of human cognitive capabilities and the vulnerability of retrospective biosurveillance to the sensitivity, accuracy, and timeliness of its sources, which can lag for days or weeks (Valdivia-Granda 2021). A lightweight, human-readable, hierarchical format can reuse and enhance information contextualized at the strategic, tactical, and operational levels across intelligence, public health, and policymaking. This structured format can not only facilitate data analytics but also serve as a data exchange format with national and international partners, while addressing concerns for data security, privacy, sovereignty, and economic impact. Such a system would significantly improve decision-making in specific operational environments by providing tailored information and quantitative risk assessments in near real time. Autonomous biosurveillance should provide high-resolution, quantitative, and standardized assessments of a country’s risk while considering the differences between intra- and interregional prevention and mitigation capabilities. This would give analysts, field operators, and decision-makers better awareness of risks beyond traditional static biosurveillance or early warning reports and allow different stakeholders to craft better mitigation strategies, including exclusion protocols (Butler 2013; Bahk et al. 2015; Chowell et al. 2016; Cleaton et al. 2016; Pollett et al. 2017; Tran and Sakuma 2019).
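As an illustration of what a lightweight, human-readable, hierarchical record might look like, the sketch below serializes a single risk assessment to JSON. The choice of JSON is an assumption of this example, since the text does not name a specific format, and every field name, identifier, timestamp, and risk score is a hypothetical placeholder.

```python
import json

# All field names and values below are hypothetical placeholders.
assessment = {
    "pathogen": "example_pathogen",
    "assessed_at": "2024-01-01T00:00:00Z",
    "levels": {
        "strategic": {"scope": "global", "risk_score": 0.42},
        "tactical": {"scope": "country:XX", "risk_score": 0.61},
        "operational": {"scope": "port_of_entry:XX-001", "risk_score": 0.78},
    },
    "provenance": [
        {"source": "official_notification", "retrieved_at": "2023-12-31T18:00:00Z"},
    ],
}


def highest_risk_level(record):
    """Return the name of the level with the largest risk score."""
    levels = record["levels"]
    return max(levels, key=lambda name: levels[name]["risk_score"])


# Round trip through text, as would happen when exchanging with a partner.
encoded = json.dumps(assessment, indent=2, sort_keys=True)
decoded = json.loads(encoded)

print(encoded)
print("highest risk at level:", highest_risk_level(decoded))
```

Keeping provenance inside the same record is one possible way to let a receiving partner audit where each assessment came from. Access control, privacy, and data sovereignty would still need to be handled outside the format itself.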
Concluding Remarks
Predicting the emergence and spread of both known and unknown infectious diseases is one of the most crucial and challenging tasks for the next century, given their significant impacts on global health, trade, and security. Trade and travel pathways are particularly vulnerable to the natural, accidental, or intentional introduction of biological threats. Early detection of these potential dangers is essential for implementing effective exclusion protocols and timely mitigation strategies. However, it is critical to gather both relevant and seemingly irrelevant (minority) information to fully understand the temporal and spatial dynamics of disease systems caused by these biothreats in various environments. Contextualizing the risks posed by biological threats to global health, trade, and security requires an autonomous biosurveillance system that integrates many sources of information taken in near real time and quantitative capability assessments beyond annual health indexes. This information could provide new insights into evidence-based decision-making and optimal implementation of exclusion protocols in the operational environment. A dynamic quantitative risk assessment of known and unknown pathogens in travel and trade environments can open the possibility for near real-time, data-driven evaluation and adjustment of regulatory policy. Understanding the risk, vulnerability, mitigation capacity, and temporal fluctuations across diseases and nations is critical. This process should attempt to overcome the cognitive biases that inevitably cloud human judgment and focus on quantitative risk assessments across two national security time frames (the immediate and the emerging) and three levels: strategic (global), tactical (country), and operational (ports of entry/exit). Such an approach will require new legislation that improves the efficiency of information exchange among regulatory authorities, the academic community, private industry, and other nations.
Footnotes
Acknowledgment
The author is grateful to the reviewers for their valuable critique of this article.
Author Disclosure Statement
The views expressed here are solely those of the author in his private capacity and do not in any way represent the views, positions, or policies of the Agriculture Programs and Trade Liaison, Office of Field Operations, U.S. Customs and Border Protection, Department of Homeland Security, and other constituent agencies and departments of the U.S. government. The author declares that this submission is an original work and there is no conflict of interest regarding the publication of this article.
Funding Information
The author received no funding support for the preparation of this article from any agencies in the public or private sector.
