ViraFit: Tunable Fitness Model for Viral Evolution Within a Contact Network

Abstract

The spread of a virus can be modeled as the diffusion of virion populations along the edges of a host contact network. Most prior research assumes that the virus maintains the same genome throughout the diffusion process, and, consequently, the genome itself is of no modeling interest. Eletreby et al. do consider a model of multiple variants in which a virus can mutate within a host into another variant before transmission across an edge. Still, they fail to model the genome or any established fitness models. We incorporate three biological notions, fitness landscapes, viral quasispecies, and genome structure, to more accurately model the diffusion of a virus with mutations. We investigate established fitness landscape models and various simulated contact networks. We simulate the diffusion process across the contact network, incorporating co-occurring evolutionary and epidemiological processes. In our proof-of-concept simulations, while determining which variant should infect an exposed individual, we consider the fitness values of the variants. ViraFit tunes the ruggedness of the fitness landscape by varying several attributes: genome length, network size, mutation rate, infection probability, and infection time. Our simulation results demonstrate the importance of including the fitness value of each variant and how the fitter ones persist over time.

Keywords

evolution, mutation, virus, viral fitness, contact network, ruggedness, quasispecies

1. INTRODUCTION

The spread of a viral infection in a host population is commonly modeled as a diffusion process on a contact network, where nodes represent hosts and edges represent opportunities for transmission (Leventhal et al., 2015; Eletreby et al., 2020). In most such models, the virus is treated as a fixed entity: once introduced, the pathogen propagates across the network without meaningful genetic change, or mutation is assumed to be negligible relative to the timescale of transmission. This assumption is often violated in practice. Many RNA viruses, including SARS-CoV-2, undergo rapid mutation within hosts, generating genetically diverse populations during the course of an outbreak (Domingo, 1998, 2006). Consequently, viral diffusion and viral evolution are not independent processes but unfold concurrently.

Recent work has begun to relax the single-strain assumption. Notably, Eletreby et al. (2020) introduced a multi-strain epidemic model in which a virus may mutate within a host before being transmitted to others. Their framework allows competition among strains during transmission and captures the co-occurrence of epidemiological and evolutionary dynamics. However, this and related models (Alexander and Day, 2010; Leventhal et al., 2015) represent strains abstractly and do not explicitly model genome structure, mutational neighborhoods, or established notions of evolutionary fitness. As a result, they cannot capture how specific mutations, genome length, or the organization of sequence space shape the emergence and persistence of viral variants.

In contrast, biological evolution is fundamentally governed by the relationship between genotype and fitness. A fitness landscape associates each genomic variant with a measure of reproductive success, thereby defining which mutations are beneficial, neutral, or deleterious (Fragata et al., 2019). When mutations occur one site at a time, variants form the nodes of a high-dimensional sequence space (a hypercube), and evolution proceeds as an adaptive walk along edges connecting neighboring genotypes (Kauffman and Weinberger, 1989). Classical landscape models, such as House-of-Cards (HoC) (Kauffman and Levin, 1987), NK (Kauffman and Weinberger, 1989), and Rough Mount Fuji (Neidhart et al., 2014), describe different degrees of ruggedness and epistasis, shaping the number of local optima and the accessibility of fitter variants. These models provide a principled framework for studying evolutionary dynamics; yet, they have rarely been integrated into network-based epidemic models.

Moreover, viral evolution is best understood not at the level of a single genotype, but as a population of closely related variants. RNA viruses typically exist as quasispecies: dynamic mutant swarms generated by high replication error rates, in which individual genomes differ by one or a few nucleotides from one another (Domingo, 2006; Domingo and Perales, 2019). Despite continual mutation, these populations often display remarkable phenotypic stability, reflecting strong selection operating over structured sequence space rather than random drift (Leitmeyer and Rico-Hesse, 1997). Representing viral populations as isolated strains, therefore, obscures the evolutionary mechanisms that govern adaptation, persistence, and competition.

Finally, viral genomes possess diverse lengths and architectures that directly affect mutational neighborhoods and evolutionary potential. Viral genomes range from compact genomes of a few kilobases to complex genomes exceeding $10^{6}$ bases (Claverie et al., 2006; Marintcheva, 2018). For example, SARS-CoV-2 contains a $\sim 30$ kb single-stranded RNA genome with multiple open reading frames (Naqvi et al., 2020; Wu et al., 2020), while the Zika virus has a $\sim 10.8$ kb genome organized into a single open reading frame (Ye et al., 2016). Genome length and structure determine the size and topology of sequence space and, in turn, influence both mutational accessibility and the ruggedness of the fitness landscape.

Taken together, these observations reveal a fundamental gap in existing network epidemic models: while they capture the spread of infection, they largely ignore genome-level evolution, the structure of sequence space, and fitness-driven selection. Although prior work allows mutation among abstract strains (Eletreby et al., 2020), it does not incorporate explicit genotype representations, fitness landscapes, or quasispecies dynamics. As a result, such models cannot explain how specific mutations arise, how fitter variants outcompete others, or how evolutionary trajectories depend jointly on genome properties and network structure.

In this work, we introduce ViraFit, a network-based modeling framework that unifies epidemiological diffusion with genome-level evolutionary dynamics. ViraFit integrates three core biological concepts: (1) explicit genome representations embedded in a structured sequence space, (2) fitness landscapes that govern mutation and selection, and (3) quasispecies dynamics that model viral populations as evolving mutant distributions. Viral spread on a contact network is simulated as a coupled evolutionary–epidemiological process, in which competing variants mutate, transmit, and are selected based on fitness.

The primary novelty of ViraFit lies in modeling viral evolution at the level of explicit genomes rather than as a fixed set of abstract strains. Existing network-based epidemic models either assume a single, immutable pathogen or represent multiple variants as discrete types with predefined mutation or transmissibility matrices (Alexander and Day, 2010; Eletreby et al., 2020). Such approaches do not capture the structure of sequence space, the mutational neighborhood of a genome, or the emergence of fitness from genotype. Conversely, classical fitness landscape models describe adaptive walks in genotype space but ignore population-level transmission on contact networks. ViraFit unifies these two perspectives by embedding viral genomes as nodes in a hypercube sequence space, associating each genotype with a fitness value, and coupling within-host mutation to between-host transmission on a network. This enables the study of how genome length, mutational accessibility, landscape ruggedness, and contact topology jointly shape strain emergence, competition, and persistence—phenomena that cannot be represented by constant-strain or matrix-based multi-strain models.

Using established fitness landscape models and a range of synthetic contact networks, we demonstrate that both evolutionary outcomes and epidemic dynamics depend critically on genome length, mutation rate, infection probability, infection duration, and network structure. Our results show that explicitly incorporating fitness alters diffusion patterns and that fitter variants consistently emerge and persist over time. By bridging genotype-level evolution with network-based transmission, ViraFit provides a biologically grounded framework for studying viral adaptation during outbreaks. ViraFit is publicly available at https://github.com/Badhan023/Viral_Fitness_Landscape.

2. BACKGROUND

Classical fitness landscape models provide standard mechanisms for controlling landscape ruggedness and epistasis, which, in turn, shape evolutionary accessibility and adaptive walk behavior. We summarize three widely used models: HoC, NK, and Rough Mount Fuji, to (1) situate ViraFit within established landscape theory and (2) motivate the design of our explicit, tunable genome–fitness function. In our implementation, these models can additionally be used as optional baseline landscape generators for comparison against the proposed fitness formulation (Section 3.1).

Several models have been introduced to formalize and visualize the concept of the fitness landscape. Wright (1932) presented the fitness landscape as a map over a high-dimensional genotype space, typically visualized in three dimensions, with genotypes plotted in the $x$-$y$ plane and fitness on the $z$-axis. Evolution can be viewed as ‘walks’ and adaptation as ‘climbs’ to higher positions on this fitness landscape. The HoC model (Kauffman and Levin, 1987) represents a genotype–fitness landscape in which each genotype is assigned a fitness value drawn independently at random. The model is not restricted to any particular alphabet size and can be formulated for arbitrary sets of amino acids or nucleotides. Its defining characteristic is the absence of correlation between neighboring genotypes, resulting in maximal fitness variance and a highly rugged landscape. Consequently, adaptive walks under the HoC model are strongly constrained by local optima.

In the NK model (Kauffman and Weinberger, 1989), fitness landscapes are defined by three parameters: N, A, and K. For a protein or genome sequence, N denotes the number of loci (sequence length), A denotes the number of possible states per locus (alphabet size), and K specifies the number of other loci whose states epistatically influence the contribution of a given locus to overall fitness. Thus, K controls the degree of epistasis in the system and is conceptually equivalent to the ruggedness of the fitness landscape: when $K = 0$ , the landscape is purely additive and smooth, whereas larger values of K introduce increasing interaction among loci, producing a more rugged landscape with multiple local optima. In our implementation, we restrict the alphabet to two possible states (i.e., $A = 2$ ), yielding a binary genotype space for computational tractability, while preserving the tunable ruggedness properties of the NK framework.
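The role of K can be illustrated with a minimal sketch of a binary NK landscape ($A = 2$). The function names, the choice of cyclically adjacent interaction partners, and the uniform random contributions are assumptions made for illustration; they are not taken from the ViraFit implementation.

```python
import random
from itertools import product


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function for a binary NK landscape.

    Assumption: locus i interacts with the K loci that follow it cyclically.
    Each locus has a lookup table of uniform random contributions, one entry
    per joint state of the locus and its K partners.
    """
    rng = random.Random(seed)
    tables = [
        {state: rng.random() for state in product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            # Joint state of locus i and its K cyclic successors.
            key = tuple(genotype[(i + j) % n] for j in range(k + 1))
            total += tables[i][key]
        return total / n

    return fitness


def count_local_optima(n, fitness):
    """Count genotypes that have no strictly fitter 1-mutant neighbor."""
    optima = 0
    for g in product((0, 1), repeat=n):
        f = fitness(g)
        neighbors = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(n))
        if all(fitness(nb) <= f for nb in neighbors):
            optima += 1
    return optima
```

With $K = 0$ every locus contributes independently, so `count_local_optima` finds a single peak; raising K toward $N - 1$ typically produces additional local optima.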

The Rough Mount Fuji model (Neidhart et al., 2014) introduces a genotype-fitness model that combines a random HoC landscape with an additive landscape. In the simplest version of the model, the additive selective advantage, s, is the same for all loci. By varying s relative to the standard deviation of the HoC fitness values, the ruggedness of the landscape can be tuned (de Visser and Krug, 2014).
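As an illustration of how s trades off against the random component, the following sketch builds a binary Rough Mount Fuji landscape. The all-ones reference genotype, the Gaussian HoC component, and the names `make_rmf_landscape` and `sigma` are assumptions for this example only.

```python
import random
from itertools import product


def make_rmf_landscape(n, s, sigma, seed=0):
    """Rough Mount Fuji sketch: additive slope plus a random HoC component.

    Assumptions: binary genotypes, the reference (peak) genotype is all ones,
    and the HoC component is Gaussian with standard deviation `sigma`.
    """
    rng = random.Random(seed)
    noise = {g: rng.gauss(0.0, sigma) for g in product((0, 1), repeat=n)}

    def fitness(genotype):
        # Hamming distance to the all-ones reference genotype.
        distance = sum(1 for x in genotype if x == 0)
        return -s * distance + noise[tuple(genotype)]

    return fitness
```

Setting `sigma` to zero recovers a purely additive landscape, while a large `sigma` relative to `s` approaches the HoC limit.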

3. METHODS

ViraFit has two main parts, a genome-fitness model and a network model, which together describe viral evolution on a fitness landscape. The genome-fitness model represents the quasispecies environment comprising all strains of a particular viral genome, whereas the network model represents the contact network among hosts. The overall model combines the two: the simulation of mutation and infection operates on the network model, while the genome-fitness model determines which strains arise and are transmitted on the network.

3.1. Genome-Fitness model

Consider a viral genome S of length N. Each position holds one of four nucleotides: A (adenine), C (cytosine), G (guanine), and T (thymine), with U (uracil) replacing T in RNA. As a result, the genome is one of $4^{N}$ theoretically possible strains. These $4^{N}$ strains can be represented as the vertices of a hypercube graph in which strains that differ at exactly one residue are neighbors (Kauffman and Weinberger, 1989). Let $H = (V_{H}, E_{H})$ be the N-dimensional hypercube of these $4^{N}$ strains, where $V_{H}$ is the set of $4^{N}$ vertices and $E_{H}$ is the set of edges connecting them. Two vertices share an edge if they are 1-mutation neighbors.

Since it is difficult to picture such high-dimensional spaces, let us consider a genome of length N constrained to two nucleotides, A and G, to illustrate the structure of the hypercube. Let 1 and 0 represent A and G, respectively, so that each strain is a binary string of length N. Such binary strings are the vertices of an N-dimensional Boolean hypercube (Kauffman and Levin, 1987) with $2^{N}$ vertices. Since the neighbors of a vertex in H differ at exactly one residue, each vertex has exactly N neighbors for strains of length N. For example, in Figure 1, the neighbors of vertex “000” are “001,” “010,” and “100,” each obtained by altering the nucleotide at one of the three positions. To generalize, let the number of nucleotides be B. Then the number of 1-mutant neighbors, D, is $(B - 1) N$. Since viral genomes consist of four nucleotides, each node has $3N$ neighbors.
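The neighbor count $(B - 1) N$ can be checked with a short enumeration. This is a minimal sketch; the function name `one_mutant_neighbors` and the string encoding of strains are assumptions for illustration.

```python
def one_mutant_neighbors(strain, alphabet="AG"):
    """Return all strains that differ from `strain` at exactly one position."""
    neighbors = []
    for i, current in enumerate(strain):
        for letter in alphabet:
            if letter != current:
                # Substitute a different letter at position i only.
                neighbors.append(strain[:i] + letter + strain[i + 1:])
    return neighbors


print(one_mutant_neighbors("000", alphabet="01"))          # ['100', '010', '001']
print(len(one_mutant_neighbors("ACGT", alphabet="ACGT")))  # 12, i.e. (4 - 1) * 4
```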

FIG. 1.

A three-dimensional Boolean hypercube, in which each vertex represents one of the possible binary sequences. Each strain has three neighbors and a fitness value in parentheses. The fittest strain is 111, marked by a dotted red circle; the least fit strain is 000, from where our simulation starts.

Each vertex of the hypercube H has a fitness value, F, associated with it. The fitness function is designed as Eqn. 1 to keep it simple. Each hypercube node has N neighbors, since each nucleotide has two options: 0 or 1.

F = \frac{1}{N} \sum_{i = 1}^{N} f_{i}, \quad \text{where } f_{i} = \begin{cases} 1, & \text{if } S_{i} = 1 \\ 0, & \text{if } S_{i} = 0 \end{cases}

(1)

For computational tractability, we represent each genomic position using a reduced alphabet that preserves the structure of the mutational neighborhood while enabling large-scale simulation. This simplification does not affect the qualitative behavior of the adaptive process, which is governed by the fitness landscape topology.
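For a binary strain, the fitness function of Eqn. 1 reduces to the fraction of positions equal to 1. A minimal sketch, assuming strains are stored as strings of "0" and "1" (an encoding chosen here for illustration):

```python
def fitness(strain):
    """Eqn. 1: mean of f_i, where f_i = 1 if position i holds a 1, else 0."""
    return sum(1 for symbol in strain if symbol == "1") / len(strain)


print(fitness("000"))  # 0.0, the least fit strain in Figure 1
print(fitness("111"))  # 1.0, the fittest strain in Figure 1
```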

3.2. Multiple-strain network model

Let $G = (V_{N}, E_{N})$ be the underlying contact network where $V_{N}$ is the set of vertices representing hosts, and $E_{N}$ is the set of edges representing the contact among those hosts. We have implemented the network model using the Waxman graph (Waxman, 1988; Naldi, 2005). The M vertices of the Waxman graph are uniformly distributed over a rectangular Cartesian coordinate plane, each with a $(x, y)$ coordinate. The probability that an edge exists between nodes u and v is determined from their Euclidean distance $d (u, v)$ by the expression in Eqn. 2, where L is the maximum distance between two vertices, and $α$ and $β$ are two parameters in the interval $(0, 1]$ . Larger values of $β$ result in denser graphs, but smaller values of $α$ enhance the density of short links relative to the longer ones. For our network model, we set $β = 0.4$ and $α = 0.2$ , which lie in the standard regime of the Waxman model (Waxman, 1988), to obtain a spatially embedded graph in which connection probability decays with Euclidean distance, producing predominantly local contacts with occasional longer-range links. This structure is commonly used to approximate human contact patterns in network-based epidemic modeling.

P (\{u, v\}) = β e^{- \frac{d (u, v)}{L α}}

(2)
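Eqn. 2 can be turned into a small generator using only the standard library. This is a sketch under stated assumptions: vertices are placed on a unit square, L is taken as the largest pairwise distance among the placed vertices, and the function names are hypothetical rather than those of the ViraFit code.

```python
import math
import random


def waxman_edge_probability(d, max_dist, alpha=0.2, beta=0.4):
    """Eqn. 2: probability of an edge between two vertices at distance d."""
    return beta * math.exp(-d / (max_dist * alpha))


def waxman_graph(m, alpha=0.2, beta=0.4, seed=0):
    """Place m >= 2 vertices uniformly on the unit square and sample edges."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(m)]
    pairs = [(u, v) for u in range(m) for v in range(u + 1, m)]
    # Assumption: L is the largest Euclidean distance between placed vertices.
    max_dist = max(math.dist(pos[u], pos[v]) for u, v in pairs)
    edges = set()
    for u, v in pairs:
        p = waxman_edge_probability(math.dist(pos[u], pos[v]), max_dist, alpha, beta)
        if rng.random() < p:
            edges.add((u, v))
    return pos, edges
```

With $β = 0.4$ and $α = 0.2$, two vertices at distance zero connect with probability 0.4, and the probability falls to $0.4 e^{-5}$ at the maximum distance L.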

In the Waxman model, each node has only one attribute: its position in the coordinate plane. We have modified the Waxman model by adding two attributes to each node to prepare the graph for the epidemiology network: infection status and infection duration. The status of a node is either uninfected (inactive), if the node carries no infection, or infected (active) with strain i, where the strain is an element of the vertex set $V_{H}$ of the hypercube H. The infection duration is an integer timer ranging from 0 to $δ$ that tracks the infection period of a node. If a node is uninfected, the timer is 0; when the node becomes infected, the timer starts at 1 and increases by 1 per iteration. When the timer reaches the duration $δ$, it is reset to 0, marking the recovery of the host from the infection, and the status is changed back to inactive.

Unless otherwise stated, parameters not explicitly varied in the experiments (e.g., the simulated annealing schedule T, $T_{\min}$ , and $τ$ , as well as the Waxman network parameters $α$ and $β$ ) were held fixed to isolate the effects of genome-level and epidemiological factors. These parameters primarily influence the rate of convergence of the adaptive walk rather than the qualitative evolutionary trends observed. Accordingly, the reported experiments focus on parameters governing mutation, genome size, and infection dynamics, while other parameters are fixed to reduce confounding effects.

3.3. Simulation timeline and outputs

Each simulation proceeds in discrete iterations. In each iteration, ViraFit performs: (1) within-host mutation in genotype space (a walk on the hypercube H governed by fitness and simulated annealing), (2) infection decision for exposed susceptible hosts based on both neighborhood prevalence and strain fitness, and (3) infection/recovery updates based on the infection duration parameter $δ$ . We distinguish these genotype-space mutation walks from diffusion on the contact network G; the former governs how strains change, while the latter governs how infections propagate between hosts.

3.4. Adaptive-walk statistics

We define adaptive walk length as the number of simulation iterations required for the globally fittest strain in the predefined fitness landscape (hypercube H) to first appear in the host population. Because the fitness landscape is fixed and identical across all simulation replicates, this definition ensures direct comparability of adaptive walk length across different epidemiological and network parameter settings.
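Given a per-iteration record of the strains present in the host population, the adaptive walk length can be read off directly. The sketch below assumes such a record is available as a list of sets and counts iterations from 1; both the data layout and the function name are illustrative assumptions.

```python
def adaptive_walk_length(strain_history, fittest):
    """Return the first iteration (counted from 1) at which `fittest` appears.

    `strain_history` holds one set of circulating strains per iteration.
    Returns None if the fittest strain never appears.
    """
    for iteration, strains in enumerate(strain_history, start=1):
        if fittest in strains:
            return iteration
    return None
```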

3.5. Algorithm

At the beginning of the simulation, we started with the least fit strain, $S_{0}$, among all possible strains present in the hypercube H. We chose 10% of the hosts to be infected by $S_{0}$ and assigned the remaining hosts to be uninfected. The infection timer was set to 0 for the inactive hosts and to a random value from the range $[0, δ]$ for each infected node.

The full algorithm, Infection-Propagation, can be divided into three phases: mutation, infection decision, and infection and recovery. It repeats these phases I times. The phases are discussed in subsections 3.5.1, 3.5.2, and 3.5.3. The main algorithm, Infection-Propagation, is explained in subsection 3.6. All the symbols used in the algorithms are listed and described in Table 1.

Table 1.
Parameters Used for the Model and Their Description

Parameter	Description
G	The contact graph
H	The hypercube of strains
$μ$	Probability of the strain at a node mutating
$T, T_{\min}, τ$	Parameters of simulated annealing; $0 < τ < 1$
$γ$	Probability of a host being infected
$δ$	The duration of an infection
I	The number of iterations of the simulation

3.5.1. Mutation

To decide whether a mutant strain produced during within-host evolution replaces the current strain, we employ a simulated annealing procedure that governs how mutations move across the fitness landscape. The simulated annealing algorithm (Bertsimas and Tsitsiklis, 1993) is a probabilistic technique for approximating the global optimum of a given function, inspired by physical annealing. It is an effective and general method for finding global optima in the presence of many local optima. The algorithm starts from the annealing temperature, T, and iterates while cooling down to the minimum temperature $T_{\min}$, using $τ$ in the geometric reduction rule ( $T \leftarrow τ T$ ).

In every iteration, every infected host has a probability of mutating; see Figure 2. The mutation rate $μ$ controls the probability that a strain mutates at a given iteration. In the general formulation of ViraFit, $μ$ may be strain-dependent, allowing different variants to exhibit different mutational propensities. In the experiments reported here, unless otherwise stated, we use a fixed value of $μ$ across all strains to isolate the effects of the fitness landscape and network structure.

We use simulated annealing as a stochastic acceptance mechanism to select whether a candidate mutant replaces the current strain; see Figures 3 and 4. At each mutation event, a neighboring genotype in the hypercube is sampled, and the fitness difference $Δ F$ between the candidate and the current strain is computed. If $Δ F > 0$, the mutation is accepted. If $Δ F \leq 0$, the mutation is accepted with probability $e^{Δ F / T}$. The temperature is initialized to a user-defined value T and is decreased geometrically at each iteration according to $T \leftarrow τ T$ until a minimum temperature $T_{\min}$ is reached. Higher values of T allow greater exploration of the fitness landscape by permitting occasional downhill moves, whereas lower values of T increasingly favor only fitness-improving mutations. Thus, a strain usually mutates into one with a better fitness value, but it can mutate to a less fit one with some probability. On the Boolean hypercube used here, each strain has N neighbors.

The Mutation algorithm takes the network model G and the hypercube H as inputs and calls the Simulated-Annealing subroutine for the strain in each infected host in G. The Simulated-Annealing algorithm decides whether a strain should mutate to a fitter strain, remain as it is, or mutate to a less fit one, based on $Δ F$. Because the acceptance probability for a downhill move is $e^{Δ F / T}$, the temperature T moderates such moves: the higher the temperature, the higher the likelihood of significant downhill moves, and this probability is negligible at low temperatures. Algorithm 2 decides whether a strain mutates to the new strain chosen by Algorithm 3.
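The mutation phase can be sketched as follows for binary strains with the fitness of Eqn. 1. The function names and the string encoding are assumptions for illustration, and the uniform choice of the flipped position is one plausible way to sample a neighbor; the published pseudocode in Figures 2 to 4 may differ in detail.

```python
import math
import random


def fitness(strain):
    """Eqn. 1 for a binary string: fraction of positions equal to '1'."""
    return strain.count("1") / len(strain)


def anneal_step(strain, temperature, rng):
    """Propose one random 1-mutant neighbor and apply the acceptance rule."""
    i = rng.randrange(len(strain))
    flipped = "0" if strain[i] == "1" else "1"
    candidate = strain[:i] + flipped + strain[i + 1:]
    delta_f = fitness(candidate) - fitness(strain)
    if delta_f > 0:
        return candidate  # uphill moves are always accepted
    if rng.random() < math.exp(delta_f / temperature):
        return candidate  # downhill move accepted with probability e^(dF/T)
    return strain


def mutate(strain, mu, temperature, rng):
    """With probability mu, attempt one annealing step; otherwise keep the strain."""
    if rng.random() < mu:
        return anneal_step(strain, temperature, rng)
    return strain


def cool(temperature, tau, t_min):
    """Geometric cooling T <- tau * T, never dropping below T_min."""
    return max(temperature * tau, t_min)
```

At a very low temperature the rule behaves like a pure hill climb: a strain at the peak stays put, while any other strain only moves uphill.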

FIG. 2.

Pseudocode for updating infected nodes.

FIG. 3.

Pseudocode for simulated annealing to select the next strain, if any.

FIG. 4.

Pseudocode for returning the fitness value of a strain.

3.5.2. Infection decision

See Figures 5 and 6. In Infection-Propagation, infection events are strictly contact-driven: a susceptible host can become infected only if it has at least one infected neighbor in the contact network; see Figures 7 and 8. Thus, infection cannot occur spontaneously, and transmission is restricted to network edges.

FIG. 5.

Decision making at the moment of a new infection. The host that is going to be infected is the one in the middle, and there are five neighbors, among which two are inactive, two are infected by strain 1, and the other one by strain 2.

FIG. 6.

Contact-driven infection decision. Infection probability increases with the number of infected neighbors, $p_{v} = 1 - {(1 - γ)}^{m}$ . When infection occurs, the transmitted strain is sampled probabilistically according to prevalence-weighted fitness.

FIG. 7.

Pseudocode for infection and recovery to update the timer and status of the infected nodes, and change the timer and status of the new infected nodes.

FIG. 8.

Pseudocode for infection propagation.

Algorithm 6 describes the procedure. Let v be an uninfected host, and let $N_{I} (v)$ denote the set of its infected neighbors. If $| N_{I} (v) | = 0$ , then v remains uninfected during the current iteration. Otherwise, v is considered exposed. Let $m = | N_{I} (v) |$ be the number of infected neighbors. The probability that v becomes infected during that iteration is defined as

p_{v} = 1 - {(1 - γ)}^{m},

(3)

where $γ \in (0, 1]$ is the per-contact transmission parameter. This formulation models independent transmission attempts from each infected neighbor and ensures that infection probability increases monotonically with the number of infected contacts. In particular, when $m = 1$, $p_{v} = γ$, while for $m > 1$, exposure risk increases with local infection prevalence.
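A short worked example of Eqn. 3, assuming only the symbols defined above (the function name is hypothetical):

```python
def infection_probability(gamma, m):
    """Eqn. 3: probability that a host with m infected neighbors is infected."""
    if m == 0:
        return 0.0  # no infected neighbor, so no exposure
    return 1.0 - (1.0 - gamma) ** m


for m in (1, 2, 3):
    print(m, infection_probability(0.5, m))  # 0.5, 0.75, 0.875
```

With $γ = 0.5$, each additional infected neighbor halves the remaining chance of escaping infection.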

If infection occurs, the transmitted strain is selected probabilistically from among the strains present in $N_{I} (v)$ . Suppose s distinct strains are present among these neighbors. For each strain i, we compute an unnormalized transmission weight,

w_{i} = \frac{\mathrm{infection}_{\mathrm{strain}_{i}}}{m} \times \mathrm{fitness}_{\mathrm{strain}_{i}},

(4)

where $\mathrm{infection}_{\mathrm{strain}_{i}}$ denotes the number of neighbors infected by strain i, and $\mathrm{fitness}_{\mathrm{strain}_{i}}$ is the fitness value assigned in the landscape model. These weights are then normalized to obtain transmission probabilities

\tilde{P}_{i} = \frac{w_{i}}{\sum_{j} w_{j}},

(5)

and the transmitted strain is sampled from the categorical distribution defined by $\{\tilde{P}_{i}\}$. This probabilistic formulation allows both local strain prevalence and evolutionary fitness to influence transmission while preserving stochastic competition between strains. Consequently, less fit strains may occasionally establish infection if realized first, reflecting independent transmission events from neighboring hosts.
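Eqns. 4 and 5 can be sketched as below for binary strains scored by Eqn. 1. One case needs an explicit choice that the text does not specify: if every strain among the infected neighbors has fitness zero (as with the initial strain $S_{0}$), all weights vanish and Eqn. 5 is undefined. The fallback to prevalence-only sampling used here is an assumption of this sketch, as are the function names.

```python
import random
from collections import Counter


def fitness(strain):
    """Eqn. 1 for a binary string: fraction of positions equal to '1'."""
    return strain.count("1") / len(strain)


def transmission_probabilities(neighbor_strains):
    """Eqns. 4 and 5: normalized prevalence-weighted fitness per strain.

    `neighbor_strains` lists the strain carried by each infected neighbor.
    """
    m = len(neighbor_strains)
    counts = Counter(neighbor_strains)
    weights = {s: (c / m) * fitness(s) for s, c in counts.items()}
    total = sum(weights.values())
    if total == 0:
        # Assumption: with all weights zero, fall back to prevalence alone.
        return {s: c / m for s, c in counts.items()}
    return {s: w / total for s, w in weights.items()}


def sample_strain(neighbor_strains, rng):
    """Draw the transmitted strain from the categorical distribution."""
    probs = transmission_probabilities(neighbor_strains)
    strains = list(probs)
    return rng.choices(strains, weights=[probs[s] for s in strains], k=1)[0]
```

For example, with two neighbors carrying "011" and one carrying "111", the weights are $4/9$ and $1/3$, so the transmission probabilities are $4/7$ and $3/7$: the rarer but fitter strain is transmitted almost as often as the more prevalent one.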

Importantly, while $γ$ governs per-contact transmissibility, the infection probability $p_{v}$ depends explicitly on the number of infected neighbors. Therefore, infection risk is not uniform across hosts but increases with neighborhood infection intensity, preventing underestimation of transmission at high prevalence.

Finally, the infection decision is formulated at the host level. The quantity $\mathrm{infection}_{\mathrm{strain}_{i}}$ represents the number of adjacent hosts infected by strain i, thereby capturing population-level strain abundance in the contact network. The model does not explicitly represent within-host viral load or quasispecies distributions; instead, each host is assumed to carry a single dominant strain at any time. This abstraction preserves epidemiological tractability while allowing neighborhood-level strain competition to shape transmission dynamics.

3.5.3. Infection and recovery

This is the third phase, in which the timer and the status of each node in G are updated. Each host node has its own timer, which is 0 while the node is uninfected. Once a node is infected, the timer starts and increments by 1 in each iteration. When the timer reaches the infection duration $δ$, the node is considered recovered: the algorithm resets the timer to 0 and the status to “uninfected”. For all nodes newly infected in phase 2, Algorithm 7 records the infection and starts the timer of each host node. To keep our model simple, we allow reinfection after a host recovers from an infection and do not account for immunity. Our model does not explicitly represent host mortality or demographic turnover; hosts are assumed to recover after a fixed infection duration and remain in the contact network. This abstraction allows us to isolate the coupled effects of mutation, fitness landscapes, and network structure on strain evolution and transmission without introducing additional population-level dynamics. While host death and removal can substantially impact epidemic outcomes, incorporating mortality is orthogonal to the present study and remains an important direction for future extensions of the framework.
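The timer logic of this phase can be sketched with two dictionaries keyed by host. The data layout and function name are assumptions for illustration, as is the choice to treat a host as recovered in the same iteration in which its timer reaches $δ$.

```python
def infection_and_recovery(timers, strains, new_infections, delta):
    """Advance infection timers, recover hosts, and register new infections.

    timers: host -> int (0 means uninfected)
    strains: host -> strain string, or None when uninfected
    new_infections: host -> strain chosen in the infection-decision phase
    """
    for host, timer in timers.items():
        if timer == 0:
            continue  # uninfected hosts keep a timer of 0
        timer += 1
        if timer >= delta:
            # Recovery: reset the timer and clear the strain (no immunity).
            timers[host] = 0
            strains[host] = None
        else:
            timers[host] = timer
    for host, strain in new_infections.items():
        timers[host] = 1
        strains[host] = strain
```

Because recovered hosts return to a timer of 0 with no strain, they are immediately eligible for reinfection in later iterations.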

3.6. Infection propagation

The Infection-Propagation algorithm, shown in Algorithm 8, comprises three phases: mutation, infection decision, and infection and recovery, as discussed in the previous subsections. The while loop iterates for the total number of iterations specified as input, or stops when the count of active strains in G falls to zero.
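Putting the three phases together, the loop can be sketched as below. This is a compact illustration under several assumptions, not the published Algorithm 8: strains are binary strings scored by Eqn. 1, the contact network is passed as an adjacency dictionary, initially infected hosts start with a timer of 1, and neighbors are sampled uniformly when all candidate weights are zero. Sampling one infected neighbor with weight equal to its strain's fitness is equivalent to the prevalence-weighted fitness of Eqns. 4 and 5.

```python
import math
import random


def fitness(strain):
    """Eqn. 1 for a binary string: fraction of positions equal to '1'."""
    return strain.count("1") / len(strain)


def infection_propagation(adj, n, mu, gamma, delta, temp, t_min, tau,
                          iterations, seed=0):
    """Run the three-phase loop; return the set of active strains per iteration."""
    rng = random.Random(seed)
    hosts = list(adj)
    strain = {h: None for h in hosts}
    timer = {h: 0 for h in hosts}
    # Seed 10% of hosts with the least fit strain S_0 (all zeros).
    for h in rng.sample(hosts, max(1, len(hosts) // 10)):
        strain[h], timer[h] = "0" * n, 1
    history = []
    for _ in range(iterations):
        infected = [h for h in hosts if strain[h] is not None]
        if not infected:
            break  # no active strains remain
        # Phase 1: mutation with simulated-annealing acceptance.
        for h in infected:
            if rng.random() < mu:
                cur, i = strain[h], rng.randrange(n)
                cand = cur[:i] + ("0" if cur[i] == "1" else "1") + cur[i + 1:]
                d = fitness(cand) - fitness(cur)
                if d > 0 or rng.random() < math.exp(d / temp):
                    strain[h] = cand
        temp = max(temp * tau, t_min)
        # Phase 2: infection decision for exposed, uninfected hosts.
        new = {}
        for h in hosts:
            if strain[h] is not None:
                continue
            nbrs = [strain[u] for u in adj[h] if strain[u] is not None]
            if not nbrs or rng.random() >= 1 - (1 - gamma) ** len(nbrs):
                continue
            weights = [fitness(s) for s in nbrs]
            if sum(weights) == 0:
                weights = [1.0] * len(nbrs)  # assumption: uniform fallback
            new[h] = rng.choices(nbrs, weights=weights, k=1)[0]
        # Phase 3: infection and recovery.
        for h in infected:
            timer[h] += 1
            if timer[h] >= delta:
                strain[h], timer[h] = None, 0
        for h, s in new.items():
            strain[h], timer[h] = s, 1
        history.append({s for s in strain.values() if s is not None})
    return history
```

The returned history records which strains are circulating after each iteration, which is the information needed to compute the adaptive walk length defined in Section 3.4.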

4. RESULTS

In this section, we examine how key evolutionary and epidemiological parameters in ViraFit influence the adaptive walk toward the globally fittest genotype, starting from the least-fit initial strain. The adaptive walk length is defined as the number of iterations required for the globally fittest genotype in the fixed hypercube fitness landscape to first appear in the host population.

ViraFit integrates parameters from both the fitness landscape and the host contact network. Evolutionary parameters include genome length and mutation rate ( $μ$ ), while epidemiological parameters include infection probability ( $γ$ ) and infection duration ( $δ$ ). Network structure is generated using a Waxman random geometric model, whose size determines the number of hosts and whose structural properties are governed by parameters $α$ and $β$ .

Unless otherwise specified, default values are $μ = 0.25$ , genome length $= 5$ , $γ = 0.5$ , and $δ = 5$ . In the analyses below, we systematically vary mutation rate, genome length, infection probability, infection duration, and network size to evaluate their effects on adaptive walk length and convergence dynamics.

4.1. Effects of mutation rate, genome length, infection probability, and infection duration

Figure 9a illustrates the effect of mutation rate on the time required to reach the globally fittest genotype. Across all network sizes, lower mutation rates are associated with longer adaptive walk lengths, indicating slower traversal of the fitness landscape. Increasing $μ$ consistently reduces the time to reach the global optimum, although the magnitude of this reduction diminishes at larger network sizes.

FIG. 9.

(a) Mean adaptive walk length (averaged over 100 stochastic realizations) as a function of network size for mutation rates $μ = 0.1, 0.15, 0.2,$ and 0.25. (b) Mean adaptive walk length across network sizes for genome lengths $N = 3, 4, 5, 6, 7,$ and 8. (c) Mean adaptive walk length across network sizes for infection probabilities $γ = 0.1$ –0.7. (d) Mean adaptive walk length across network sizes for infection durations $δ = 3$ –7. Error bars represent standard deviation across stochastic runs.

The interaction between genome length and network size is shown in Figure 9b. Shorter genomes exhibit consistently shorter adaptive walks across all network sizes, reflecting the smaller size and lower dimensionality of the corresponding fitness landscape. As genome length increases, walk length increases monotonically, consistent with increased landscape dimensionality and mutational search space. Adaptive walk length varies smoothly with network size, without abrupt transition points. Larger networks generally lead to modest increases in walk length for longer genomes, suggesting that increased host population size enables broader exploration of genotype space before fixation of the global optimum.

The effect of infection probability $γ$ is presented in Figure 9c. Lower infection probabilities produce longer adaptive walks, while higher values of $γ$ shorten the time to reach the global optimum. This pattern reflects the role of transmission intensity in amplifying selective sweeps: higher infection probability accelerates the spread of advantageous variants once they arise, thereby reducing convergence time. Differences across network sizes are relatively modest under this infection formulation.

Figure 9d shows the effect of infection duration. Shorter infection durations are associated with slightly longer adaptive walks, particularly in larger networks. Longer durations provide an extended within-host evolutionary opportunity before recovery, thereby increasing the probability that fitter variants arise and propagate. However, the magnitude of this effect is smaller than that observed for the mutation rate.

Overall, the mutation rate remains the dominant determinant of convergence speed to the global fitness peak. Genome length controls the dimensional complexity of the fitness landscape and therefore the overall magnitude of adaptive walk length. Epidemiological parameters primarily modulate the rate at which advantageous variants diffuse through the contact network rather than fundamentally altering evolutionary trajectories.

4.2. Parameter sensitivity and model robustness

We systematically evaluated the effects of key parameters governing both the fitness landscape and the contact network, including genome length, mutation rate ($μ$), infection probability ($γ$), infection duration ($δ$), and network size. Across the tested settings, the qualitative relationships between epidemiological exposure and landscape exploration were consistent: mutation rate regulates the speed of traversal across the fitness landscape; genome length influences landscape dimensionality and ruggedness; and network size shapes the extent of strain competition and exploration. Infection probability and duration primarily modulate the rate at which strains propagate through the host population, thereby affecting the time required to encounter higher-fitness genotypes.

Additional parameters, such as the simulated annealing temperature schedule ($T$, $T_{\min}$) and the Waxman network parameters ($α$, $β$), were held fixed in this study. These parameters influence exploration intensity and connectivity structure, and therefore affect quantitative convergence times. However, they do not alter the fundamental mechanism of fitness-guided strain competition operating at the host-contact level. Consequently, while absolute adaptive walk lengths may vary under alternative parameterizations, the core qualitative behaviors (fitness-weighted transmission, competitive strain replacement, and the interaction between network structure and evolutionary dynamics) are expected to remain robust.
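To make the role of the Waxman parameters concrete, the following is a minimal sketch of a Waxman-style random graph in plain Python. It assumes the convention in which two hosts at Euclidean distance $d$ are connected with probability $β \exp(-d / (α L))$, where $L$ is the largest pairwise distance; some references swap the roles of $α$ and $β$, so this may not match the parameterization used in our simulations. The function name `waxman_graph` and the uniform placement of hosts in the unit square are illustrative assumptions.

```python
import math
import random


def waxman_graph(n, alpha, beta, seed=None):
    """Return (positions, edges) for a Waxman-style random graph on n hosts.

    Assumed convention: P(edge u-v) = beta * exp(-d(u, v) / (alpha * L)),
    with L the maximum pairwise distance. Hosts are placed uniformly at
    random in the unit square (an illustrative choice).
    """
    if n < 2:
        raise ValueError("need at least two hosts")
    rng = random.Random(seed)
    positions = [(rng.random(), rng.random()) for _ in range(n)]
    # Pairwise Euclidean distances for every unordered pair u < v.
    dist = {
        (u, v): math.dist(positions[u], positions[v])
        for u in range(n)
        for v in range(u + 1, n)
    }
    max_dist = max(dist.values())
    edges = [
        pair
        for pair, d in dist.items()
        if rng.random() < beta * math.exp(-d / (alpha * max_dist))
    ]
    return positions, edges
```

Under this convention, increasing $β$ raises overall edge density, while increasing $α$ makes long-range contacts relatively more likely.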

4.3. Count of infections per strain

The number of infections differs across strains, and not all strains persist over time (iterations). For genotype length = 5, the binary genome admits $2^5 = 32$ possible strains; Figure 10 shows their infection counts for network size = 500. The figure shows the trend in infection counts over iterations, with strains sorted by increasing fitness value.

FIG. 10.

Trends in the count of infections per strain for genotype length = 5 and network size = 500. Here, the strains are organized according to their fitness value, mentioned on the left of each strain. The fitter strains, i.e., 01111, 10111, 11011, 11101, 11110, and 11111, are quite stable and persist over time.
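As an illustration of how per-strain infection counts of this kind could be tallied, the sketch below counts infected hosts per binary strain at one iteration and then identifies strains that remain present across all recorded iterations. The helper names `strain_counts` and `persistent_strains`, the use of `None` for uninfected hosts, and the persistence criterion (count at least `min_count` in every iteration) are assumptions made for this example, not the criteria used to produce Figure 10.

```python
from collections import Counter
from itertools import product


def strain_counts(host_states, length):
    """Count infected hosts per strain for one iteration.

    host_states: one entry per host, either a binary genome string
    or None for a host that is not currently infected.
    Every one of the 2**length strains appears as a key, possibly with count 0.
    """
    counts = Counter(s for s in host_states if s is not None)
    all_strains = ("".join(bits) for bits in product("01", repeat=length))
    return {strain: counts.get(strain, 0) for strain in all_strains}


def persistent_strains(history, min_count=1):
    """Strains whose count is at least min_count in every recorded iteration."""
    if not history:
        return set()
    return {
        strain
        for strain in history[0]
        if all(step[strain] >= min_count for step in history)
    }
```

For genome length 5, `strain_counts` always returns 32 keys, so the per-iteration dictionaries can be stacked directly into the kind of trend plot shown in Figure 10.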

4.4. ViraFit on nucleotide sequence

We have also run our model on nucleotide sequences. We simulated only 3 to 6 positions, focusing on those in the sequence where mutation occurs frequently. The only change to the reported metric is that, instead of the average walk length to the fittest state, we report the average fitness value reached after 200 runs. As shown in Table 2, the average fitness reached generally increases with genome length and network size, although the trend is not strictly monotonic (for example, genome length 4 peaks at network size 400).

Table 2.
Table for Genome Length Versus Network Size, Where the Value Shows the Average Fitness Reached for 100 Walks

Genome length \ Network size	100	200	300	400	500
3	0.5552	0.68263	0.74265	0.68359	0.73152
4	0.65046	0.72617	0.93971	0.99362	0.88597
5	0.84422	0.89929	0.9361	0.92908	0.98775
6	0.96558	0.93043	0.97525	0.97323	0.98752

5. DISCUSSION

ViraFit provides a unified framework for exploring how viral evolution and epidemiological spread interact through both genome-level fitness landscapes and host contact networks. The model exposes several tunable parameters in both the fitness and network components, allowing systematic investigation of how evolutionary trajectories depend on genetic architecture and population structure. In the fitness model, the form of the fitness function directly shapes the adaptive walk of viral strains. While prior landscape models typically assign fitness values using randomized functions, we adopt a simple formulation to maintain computational tractability. Nonetheless, our framework readily accommodates empirically derived fitness values (de Visser and Krug, 2014), thereby enabling more biologically realistic representations of ruggedness when modeling large-scale nucleotide or protein sequences.

During mutation, a modified simulated annealing procedure (Bertsimas and Tsitsiklis, 1993) governs movement through the fitness landscape. In high-dimensional sequence spaces, where each strain has many mutational neighbors, approximate optimization is essential. Simulated annealing allows the model to favor fitness-increasing mutations while retaining a nonzero probability of accepting fitness-decreasing steps, thereby reflecting the stochasticity inherent in real evolutionary processes. The parameters governing this procedure (the mutation rate $μ$, initial temperature $T$, minimum temperature $T_{\min}$, and cooling rate $τ$) collectively regulate the balance between exploration and exploitation and thus shape the resulting adaptive walks.
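The following sketch shows how these four parameters could interact in a textbook simulated annealing step on a binary genome. It is not the modified procedure used in ViraFit: the Metropolis-style acceptance probability $\exp(Δf / T)$, the geometric cooling schedule $T \leftarrow \max(T_{\min}, τ T)$, the single-site flip proposal, and all function names are assumptions chosen for illustration.

```python
import math
import random


def accept_mutation(f_current, f_candidate, temperature, rng=random):
    """Always accept a non-decreasing fitness step; accept a decreasing
    step with probability exp(delta / temperature) (assumed rule)."""
    delta = f_candidate - f_current
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)


def anneal_step(genome, fitness, mu, temperature, rng=random):
    """One within-host mutation attempt on a binary genome string.

    With probability mu, propose flipping one uniformly chosen site and
    keep the proposal if accept_mutation allows it; otherwise return the
    genome unchanged. `fitness` is any callable mapping genome -> float.
    """
    if rng.random() >= mu:
        return genome
    i = rng.randrange(len(genome))
    flipped = "1" if genome[i] == "0" else "0"
    candidate = genome[:i] + flipped + genome[i + 1:]
    if accept_mutation(fitness(genome), fitness(candidate), temperature, rng):
        return candidate
    return genome


def cool(temperature, tau, t_min):
    """Geometric cooling with a floor at t_min (assumed schedule)."""
    return max(t_min, temperature * tau)
```

At high temperature, fitness-decreasing flips are accepted often, which corresponds to exploration; as the temperature approaches $T_{\min}$, the walk becomes close to a greedy uphill search.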

Importantly, our results demonstrate that network structure itself modulates evolutionary dynamics. Although it is expected that the fitness landscape influences which variants spread through the network, we find that network properties, such as size, also affect how strains traverse the landscape. For example, adaptive walks are not identical across networks of different sizes, nor are they invariant when network size is held constant and sequence length is varied. This indicates that the fitness and network models independently and jointly shape both evolution and infection propagation. As shown in Eq. (4), infection outcomes depend on both local strain prevalence and fitness. Consequently, a less fit strain can dominate transmission if it is sufficiently abundant in the vicinity of a host, even in the presence of fitter competitors. Larger networks amplify this effect by increasing opportunities for such competitive interactions, effectively increasing landscape ruggedness and enabling reversals in adaptive trajectories. Thus, reaching a high-fitness strain does not guarantee its persistence: a population may transiently access a fitness peak yet fail to maintain it under continued competitive pressure.

In ViraFit, strain fitness represents competitive (intrahost) fitness rather than direct per-contact transmissibility between hosts. The probability that an exposed host becomes infected is governed by the epidemiological parameter $γ$, which determines the per-contact transmission probability and yields an overall infection probability $p_{v} = 1 - (1 - γ)^{m}$ that increases with the number of infected neighbors $m$. Fitness does not directly modify $p_{v}$; instead, it influences which strain establishes infection when multiple variants are present in the local neighborhood.
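As a small worked illustration of this formula, the sketch below evaluates $p_{v}$ for a given $γ$ and number of infected neighbors $m$. The function name `infection_probability` and the input checks are illustrative additions.

```python
def infection_probability(gamma, m):
    """Probability that an exposed host becomes infected when it has
    m infected neighbors, each transmitting independently with
    per-contact probability gamma: p_v = 1 - (1 - gamma) ** m."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 - (1.0 - gamma) ** m
```

With the default $γ = 0.5$, one infected neighbor gives $p_{v} = 0.5$, two give $0.75$, and three give $0.875$; with no infected neighbors, $p_{v} = 0$. Strain fitness does not enter this calculation at all.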

When infection occurs, strain selection is probabilistic and based on prevalence-weighted fitness. Specifically, strains present among infected neighbors are assigned weights proportional to their local prevalence and fitness, and the transmitted strain is sampled accordingly. This formulation allows both evolutionary advantage and local abundance to shape transmission outcomes without assuming deterministic competitive exclusion. Consequently, in the absence of competition, strains with different fitness values spread at similar rates determined by $γ$ and contact structure, whereas under competition, fitter strains have a higher probability of being transmitted but do not exclude less-fit strains deterministically.
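The selection step described above can be sketched as follows. The sketch assumes that each strain's weight is the product of its local prevalence (the number of infected neighbors carrying it) and its fitness; the exact weighting is given by Eq. (4), so this product form and the function names are assumptions made for illustration.

```python
import random
from collections import Counter


def selection_probabilities(neighbor_strains, fitness):
    """Normalized prevalence-weighted fitness for each strain carried
    by the infected neighbors (assumed weight = count * fitness)."""
    counts = Counter(neighbor_strains)
    weights = {s: counts[s] * fitness[s] for s in counts}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one strain must have positive weight")
    return {s: w / total for s, w in weights.items()}


def select_strain(neighbor_strains, fitness, rng=random):
    """Sample the strain that establishes infection in the exposed host."""
    probs = selection_probabilities(neighbor_strains, fitness)
    strains = sorted(probs)  # fixed order keeps seeded runs reproducible
    return rng.choices(strains, weights=[probs[s] for s in strains], k=1)[0]
```

For example, if three infected neighbors carry a strain with fitness 0.4 and one neighbor carries a strain with fitness 0.9, the weights are 1.2 and 0.9, so the less fit but more abundant strain is selected with probability of about 0.57. This is the mechanism by which local abundance can outweigh a fitness advantage.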

This probabilistic formulation better reflects the stochastic nature of real transmission processes, where infection events arise from independent exposure attempts and less-fit strains may occasionally establish infection due to chance or local abundance effects. At the same time, the model preserves a clear separation between epidemiological transmissibility (governed by $γ$) and evolutionary fitness (governing competitive success during strain selection), thereby maintaining the interpretability of both parameters.

Compared with prior epidemic network models that assume a fixed pathogen or represent multiple strains as abstract types without explicit genomic structures or fitness landscapes (Alexander and Day, 2010; Eletreby et al., 2020), ViraFit introduces a fundamentally different paradigm. Each variant is represented as a point in a structured sequence space (hypercube), and fitness is assigned at the genotype level. Mutation is therefore modeled explicitly as movement through neighboring genomes, rather than as transitions among a predefined set of strain types. This distinction enables ViraFit to capture how genome length, mutational neighborhoods, and landscape ruggedness mechanistically influence evolutionary outcomes—capabilities absent from existing multi-strain network models based solely on mutation or transmissibility matrices.
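To illustrate what movement through neighboring genomes means in a structured sequence space, the sketch below enumerates all genotypes of a given length, lists the single-mutation neighbors of a genome, and computes the Hamming distance between two genomes. The function names and the `alphabet` parameter are illustrative assumptions; the binary alphabet gives the hypercube, and passing a four-letter alphabet gives the nucleotide analogue.

```python
from itertools import product


def all_genotypes(length, alphabet="01"):
    """All len(alphabet) ** length genotypes of the given length."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


def neighbors(genome, alphabet="01"):
    """Genomes that differ from `genome` at exactly one position."""
    result = []
    for i, current in enumerate(genome):
        for symbol in alphabet:
            if symbol != current:
                result.append(genome[:i] + symbol + genome[i + 1:])
    return result


def hamming(a, b):
    """Number of positions at which two equal-length genomes differ."""
    if len(a) != len(b):
        raise ValueError("genomes must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

For a binary genome of length 5 there are 32 genotypes, each with 5 neighbors; the neighbors of 11111 are 01111, 10111, 11011, 11101, and 11110, which are the other persistent strains listed in the caption of Figure 10. For a nucleotide alphabet and length 3 there are 64 genotypes, each with 9 neighbors.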

In this study, adaptive walk length is defined relative to the globally fittest genotype in the fixed fitness landscape. This ensures that walk lengths are directly comparable across parameter settings and reflect the time required to reach the global evolutionary optimum under different epidemiological and network conditions. It quantifies the extent of adaptive exploration permitted by a given epidemiological and network configuration. Longer adaptive walks indicate that the pathogen population can access more distant, potentially higher-fitness peaks in the underlying landscape, rather than implying slower evolutionary dynamics.
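One way to operationalize this definition is sketched below: the walk length of a run is taken as the first iteration at which the globally fittest genotype is present among circulating strains, and lengths are then summarized across stochastic runs. The criterion of first appearance, the handling of runs that never reach the optimum, the use of the sample standard deviation, and the names `walk_length` and `summarize` are assumptions for illustration rather than the exact procedure used in our experiments.

```python
from statistics import mean, stdev


def walk_length(trajectory, fitness):
    """First iteration (1-based) at which the globally fittest genotype
    is circulating, or None if it never appears.

    trajectory: one collection of circulating genomes per iteration.
    fitness: dict mapping every genotype to its fitness value.
    """
    best = max(fitness, key=fitness.get)
    for step, circulating in enumerate(trajectory, start=1):
        if best in circulating:
            return step
    return None


def summarize(lengths):
    """Mean and sample standard deviation over runs that reached the
    optimum; runs recorded as None are excluded (assumed convention)."""
    reached = [x for x in lengths if x is not None]
    if not reached:
        raise ValueError("no run reached the global optimum")
    spread = stdev(reached) if len(reached) > 1 else 0.0
    return mean(reached), spread
```

Because the landscape is fixed, `best` is the same genotype in every run, which is what makes walk lengths comparable across parameter settings.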

6. CONCLUSION

ViraFit provides a mechanistic framework that integrates genotype-level fitness landscapes with explicit host contact networks to study the joint dynamics of viral evolution and epidemiological spread. By representing strains as vertices in a structured sequence space and coupling mutation, fitness-guided selection, and contact-driven transmission, the model captures how population structure and evolutionary pressure interact to shape adaptive trajectories. The infection formulation ensures that transmission risk increases with local exposure, while fitness-weighted strain selection governs competitive establishment, enabling consistent comparison of adaptive walk lengths across parameter settings.

In practice, multiple strains may share identical or near-identical fitness values, giving rise to numerous local optima in the fitness landscape. Consequently, adaptive walks initiated from different starting genotypes may converge to distinct local or global peaks. Incorporating empirically derived fitness landscapes (de Visser and Krug, 2014) would further enhance biological realism and allow application to specific viral systems.

At present, the model abstracts several epidemiological processes to maintain tractability. Interactions among co-circulating viruses, host immunity, isolation behavior, and disease-induced mortality are not explicitly represented, despite their potential influence on transmission and evolutionary trajectories. Extending ViraFit to incorporate these mechanisms represents an important direction for future work and would enable a more comprehensive investigation of coupled evolutionary–epidemiological systems.

Footnotes

AUTHORS’ CONTRIBUTIONS

LSH conceived of the project, while BD wrote the software, performed the experiments, and reported the results. Both authors wrote the manuscript and read it thoroughly.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. CCF-1918656.

AUTHOR DISCLOSURE STATEMENT

No competing financial interests exist.

FUNDING INFORMATION

Funding was provided by the National Science Foundation under Grant No. CCF-1918656.

References

Alexander, Day. Risk factors for the evolutionary emergence of pathogens. J R Soc Interface 2010;7(51):1455–1474.

Bertsimas, Tsitsiklis. Simulated Annealing. Statist Sci 1993;8(1):10–15.

Claverie J-M, Ogata, Audic, et al. Mimivirus and the emerging concept of “giant” virus. Virus Res 2006;117(1):133–144.

de Visser JAG, Krug. Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 2014;15(7):480–490.

Domingo. Quasispecies: Concept and Implications for Virology. Current Topics in Microbiology and Immunology. Springer: Berlin Heidelberg, Germany; 2006.

Domingo. Quasispecies and the implications for virus persistence and escape. Clin Diagn Virol 1998;10(2–3):97–101.

Domingo, Perales. Viral quasispecies. PLoS Genet 2019;15(10):e1008271.

Eletreby, Zhuang, Carley, et al. The effects of evolutionary adaptations on spreading processes in complex networks. Proc Natl Acad Sci U S A 2020;117(11):5664–5670.

Fragata, Blanckaert, Dias Louro, et al. Evolution in the light of fitness landscape theory. Trends Ecol Evol 2019;34(1):69–82.

Kauffman, Levin. Towards a general theory of adaptive walks on rugged landscapes. J Theor Biol 1987;128(1):11–45.

Kauffman, Weinberger. The NK model of rugged fitness landscapes and its application to maturation of the immune response. J Theor Biol 1989;141(2):211–245.

Leitmeyer, Rico-Hesse. Viral evolution and epidemiology. Curr Opin Infect Dis 1997;10(5):367–371.

Leventhal, Hill, Nowak, et al. Evolution and emergence of infectious diseases in theoretical and real-world networks. Nat Commun 2015;6(1):6101–6111.

Marintcheva. Introduction to Viral Structure, Diversity and Biology. Academic Press: USA; 2018, pp. 1–26.

Naldi. Connectivity of Waxman topology models. Comput Commun 2005;29(1):24–31.

Naqvi AAT, Fatima, Mohammad, et al. Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural genomics approach. Biochim Biophys Acta Mol Basis Dis 2020;1866(10):165878.

Neidhart, Szendro, Krug. Adaptation in tunably rugged fitness landscapes: The Rough Mount Fuji model. Genetics 2014;198(2):699–721.

Waxman. Routing of multipoint connections. IEEE J Select Areas Commun 1988;6(9):1617–1622.

Wright. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. In: Proceedings of the VI International Congress of Genetics, Vol. 1. Blackwell; 1932, pp. 356–366.

Wu, Zhao, Yu, et al. A new coronavirus associated with human respiratory disease in China. Nature 2020;579(7798):265–269.

Ye, Liu Z-Y, Han J-F, et al. Genomic characterization and phylogenetic analysis of Zika virus circulating in the Americas. Infect Genet Evol 2016;43:43–49.