Integrating Data-Driven Strategies into Model Predictive Control for Enhanced Production of Human Interferon α2b in Glycoengineered Pichia pastoris

Abstract

The biopharmaceutical industry is experiencing rapid growth, necessitating scalable optimization and control strategies to meet strict process objectives. Model predictive control (MPC) offers a robust framework for regulating complex bioprocesses; however, its performance critically depends on the availability of reliable process models. While mechanistic models are often preferred, practical limitations have accelerated the adoption of data-driven approaches. In this study, we evaluate the applicability of artificial neural networks (ANNs) and Gaussian process (GP) models in MPC for fed-batch cultivation of glycoengineered Pichia pastoris to produce human interferon α2b (huIFNα2b). Experiments were performed in a fermentation calorimeter with real-time monitoring of P. pastoris metabolism through metabolic heat rate, capacitance, and exhaust gas analysis. Comparative results demonstrate that GP-based MPC achieved superior process control, efficient substrate utilization, and a 1.1-fold increase in huIFNα2b productivity relative to ANN-based MPC. Furthermore, GP-based adaptation of feeding strategies reduced methanol consumption by 14% compared with ANN-based control. These findings highlight the potential of GP-driven MPC as a promising tool for enhancing productivity and sustainability in industrial bioprocesses.

Keywords

model predictive control; data-driven; ANN; Gaussian process

Introduction

Advances in gene manipulation techniques have established various microbial systems as efficient chassis for producing a multitude of biopharmaceutical products through microbial fermentation. Initiatives such as process analytical technology (PAT) and quality by design (QbD), promoted by the US FDA and the European Medicines Agency, not only ensure safer production but also promote process knowledge through real-time process monitoring and control.¹ The biopharmaceutical industry generally operates in fed-batch mode because of its many advantages. However, in industrial practice, the control objectives for fermentation processes and bioreactors are limited to simple regulation loops, including pH, temperature, and dissolved oxygen (DO). There is strong evidence that advanced control strategies for managing the biochemical state variables can substantially enhance process performance.² The main challenge in regulating biochemical state variables stems from the limited availability of suitable process analyzers and robust process models. The intricacies of biological systems, characterized by dynamic and nonlinear behavior, make accurate models difficult to develop.³

Numerous research endeavors have actively pursued the development of automatic regulation of biochemical state variables in fed-batch fermentation. Presently, two fundamental approaches for biological process control are prevalent: open-loop and closed-loop control. In the open-loop control, the feeding paradigm adheres to a predefined feeding rate that is independent of the output parameters from the process.^4,5 Even when deviations from the targeted state variable trajectory are identified, no corrective control actions, such as adjustments to the feeding profile, are implemented. Conversely, closed-loop control systems integrate real-time feedback, enabling dynamic adjustments to the feeding rate based on observed deviations from the desired process outcomes. The implementation of closed-loop control in bioprocess is facilitated by widely applied techniques, including proportional integral derivative (PID) control, fuzzy control, artificial neural networks (ANNs)-based control, and statistical process control. Currently, in fed-batch process, the PID controller stands out as the most common closed-loop control method and is often implemented through indirect feedback control schemes. These schemes link the nutrient feeding rate with the measurements of process parameters such as pH and DO and operate as pH-stat and DO-stat fed-batch strategies.⁵ PID control demonstrates favorable results with enhanced process performance as compared with open-loop control. Despite the advantages posed by PID controller, the usage of standalone PID controller to regulate biochemical state variables is limited due to the nonavailability of sensors to accurately measure the parameters such as biomass or substrate concentration. Optical or capacitive biomass sensors, while providing insights, are susceptible to influences from factors such as mixing, aeration, foam, and culture’s state. 
Sensors commonly used for substrate concentration measurements may lack the required accuracy and reliability.^6,7 Control strategies based on empirical models, such as ANN, fuzzy logic, and statistical process control, demonstrate favorable results if the system parameters do not deviate significantly from the kinetic parameter values that were used during the model creation. The principal limitation inherent in these methodologies resides in the imperative to construct a model bespoke to each novel process, mandating the acquisition of substantial quantities of statistical data.⁴

Model predictive control (MPC) stands as a widely accepted and popular control technique in industrial applications.^8,9 It is an advanced model-based process control strategy characterized by the solution of an optimization problem, either online or offline, within a set of predefined constraints. The core capability of MPC lies in its ability to predict, utilizing an assumed dynamic process model, over a finite horizon. This predictive capability allows MPC to proactively optimize control actions based on future predictions, enabling the system to anticipate and respond to changes in the process dynamics. The versatility of MPC has contributed to its broad acceptance, making it a valuable tool for enhancing the efficiency and performance of various industrial processes.⁸ The reference trajectory is obtained from an existing process or from a mathematical model. Classical mechanistic models or data-driven models are employed for developing the prediction model. Applications of MPC to bioprocess for various microorganisms have been reported in literature.^10–12 Most of the experimental investigations were based on simulation studies, and only a smaller number of studies were reported toward real-time implementation.¹³

In the present investigation, MPC was developed with the aid of data-driven modeling. Two data-driven models (ANNs and Gaussian process [GP]) were employed to predict a difficult-to-measure biochemical state variable, the specific growth rate ($μ$). Here, we demonstrate the effectiveness of the developed MPC in controlling fed-batch cultivation of glycoengineered Pichia pastoris for the production of human interferon α2b (huIFNα2b). IFNα2b, a prominent cytokine, comprises 165 amino acid residues and has a molecular weight of 19 kDa. It interacts specifically with IFN α/β receptors, thereby initiating the JAK-STAT-mediated signaling pathway.¹⁴ This initiation induces various biological responses, encompassing immunomodulatory, antiproliferative, and antiviral properties. Marketed as Intron® A, the molecule has FDA approval for the treatment of a diverse range of conditions, including chronic hepatitis C, hepatitis B, hairy cell leukemia, malignant melanoma, follicular lymphoma, condyloma acuminata, and AIDS-related Kaposi sarcoma. The recent surge in demand for IFNα2b can be attributed to its role in combination therapies alongside antiretroviral and antimalarial drugs for the treatment of COVID-19, the disease caused by SARS-CoV-2.^15,16 This surge underscores the molecule's multifaceted therapeutic applications and its potential to contribute significantly to public health. Therefore, the protein needs to be produced in larger quantities with enhanced productivity, and the production method can benefit from the optimization implemented by the developed MPC.

Material and Methods

STRAIN AND MEDIA

A glycoengineered strain of P. pastoris with a Mut+ phenotype, containing the plasmid SuperMan5 pep4Δ prb 1Δ procured from Biogrammatics Inc., USA, was employed for the expression of huIFNα2b.¹⁴ Glycerol stocks were maintained at −80°C in yeast extract, peptone, and dextrose (YPD) media containing 20% v/v glycerol.

The composition of various media used for the cultivation of glycoengineered P. pastoris was as follows:

Starter culture medium (YPD) in g/L: yeast extract 10, peptone 20, and dextrose 20.

Preculture medium (yeast extract, peptone, glycerol: YPG) in g/L: yeast extract 10, peptone 20, and glycerol 20.

Optimized fermentation medium (Basal Salt Medium, BSM)¹⁷: glycerol 48.84 g/L, K₂SO₄ 18.2 g/L, MgSO₄ 7.28 g/L, KOH 4.13 g/L, CaSO₄·2H₂O 0.93 g/L, (NH₄)₂SO₄ 8.42 g/L, 85% H₃PO₄ 26.7 mL/L, PTM4 salts 4.4 mL/L.

INOCULUM PREPARATION

Glycerol stocks of the strain, stored at −80°C, were utilized to streak a YPD agar plate, initiating the growth of colonies. From the YPD plate, a single colony was carefully selected and introduced into a 5 mL test tube containing YPD media, establishing a starter culture. This starter culture was subsequently employed to inoculate a baffled flask containing 300 mL of YPG media, thereby creating a preculture.

The preculture underwent cultivation at 30°C with agitation at 220 RPM for a period of 24 hours. Over this duration, the preculture achieved a final optical density at 600 nm (OD₆₀₀) of 4 absorbance units, indicative of the growth and biomass accumulation during the cultivation process.

FERMENTATION CALORIMETER WORKING PRINCIPLE AND OPERATION

Fed-batch fermentations were conducted in a 5 L calorimetric bioreactor designed to monitor metabolic heat release via heat compensation calorimetry. The system dynamically adjusts the compensation heater power to match microbial heat generation. The bioreactor features a dual-layer jacket with four sensitive temperature probes (Isothermal Technology Ltd., UK): two inside the vessel and two at the jacket inlet and outlet. Silicone oil circulates in the inner jacket to maintain isothermal conditions, while the outer jacket is vacuum-sealed to minimize heat loss. Precise temperature control uses two PID controllers: one modulates the heater power based on microbial heat, and the other regulates the cryostat (Julabo GmbH, Germany) in response to the jacket inlet temperature. The metabolic heat rate is obtained by separating the total heat into biological and nonbiological components, with the latter estimated from mechanical operation contributions as detailed elsewhere.¹⁸ A dynamic energy balance over the fermentation calorimeter was applied to calculate the biological heat rate. Eq. (1) describes the unsteady-state heat rate balance contributed by the different operating parameters.

$$q_{C} - q_{J} - q_{E} + q_{ST} - q_{A} + q_{B} = m_{w} C_{P} \frac{dT_{r}}{dt} \tag{1}$$

where $q_{B}$ is the biological heat rate (W/L), $q_{baseline}$ is the baseline heat rate (W/L), $q_{C}$ is the power generated by the compensation heater (W/L), $T_{r}$ is the reactor temperature (°C), and $m_{w}$ is the mass of water (kg). The heat rate contributions due to nonbiological activities were found to be environmental heat loss ($q_{E}$, W/L), the heat flow from reaction broth to jacket ($q_{J}$, W/L), agitation heat rate ($q_{ST}$, W/L), and aeration heat loss ($q_{A}$, W/L), and these were lumped as the baseline heat rate. Finally, the biological heat rate is calculated from the output of the compensation heater and the baseline heat rate as given in Eq. (2).

$$q_{B} = q_{baseline} - q_{C} \tag{2}$$
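The relationship between Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vessel is taken to be isothermal (zero temperature derivative), so that with no biological activity the heater power equals the lumped nonbiological terms, and the function names and numerical values are hypothetical rather than taken from the study.

```python
def baseline_heat_rate(q_j, q_e, q_a, q_st):
    """Lumped nonbiological heat rate (W/L).

    Assumption: isothermal operation and q_B = 0, so Eq. (1) reduces to
    q_C = q_J + q_E + q_A - q_ST, which is taken here as the baseline.
    """
    return q_j + q_e + q_a - q_st


def biological_heat_rate(q_baseline, q_c):
    """Eq. (2): biological heat rate from baseline and heater power (W/L)."""
    return q_baseline - q_c


# Hypothetical numbers, for illustration only.
q_base = baseline_heat_rate(q_j=2.0, q_e=1.0, q_a=0.5, q_st=1.5)
q_bio = biological_heat_rate(q_baseline=10.0, q_c=4.0)
```

As the culture releases more heat, the compensation heater needs less power to hold the temperature, so a falling heater output maps to a rising biological heat rate.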

DATA ACQUISITION FOR MONITORING AND CONTROL

For the seamless integration of all process inputs and outputs, a cRIO 9075 data acquisition (DAQ) hardware, sourced from National Instruments in Austin, Texas, USA, was employed. DAQ hardware featured dedicated slots accommodating both analog and digital input/output modules. A supervisory control and data acquisition system, developed using LabVIEW 13.0 from National Instruments, was utilized to acquire and process all signals.

The real-time signals obtained were subjected to meticulous data pretreatment through LabVIEW programming. To enhance signal quality, a moving simple average filter with a window size of 100 data points was implemented on the raw data. This filtering mechanism aimed to reduce noise originating from input signals, ensuring a smoother and more accurate representation of the acquired data.¹⁸ Furthermore, for all acquired analog input and output signals, including temperature, pH, DO, off-gas analyzer data, and metabolic heat rate, scaling factors were selectively applied. This step was taken to align the acquired data with previously calibrated values, enhancing the accuracy and reliability of the measurements. The pretreated real-time signals were then visualized through graphical plots, facilitating a comprehensive understanding of the data.
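The filtering step can be illustrated with a short Python sketch of a simple moving average. The original filter was implemented in LabVIEW; the version below is an analogue, not that implementation, and it assumes a trailing (causal) window that averages over fewer points until 100 samples are available. The function name is hypothetical.

```python
from collections import deque


def moving_average(samples, window=100):
    """Trailing simple moving average; early outputs use the samples seen so far."""
    buffer = deque(maxlen=window)
    running_total = 0.0
    smoothed = []
    for value in samples:
        if len(buffer) == window:
            # The oldest sample is about to be dropped from the window.
            running_total -= buffer[0]
        buffer.append(value)
        running_total += value
        smoothed.append(running_total / len(buffer))
    return smoothed


# Small window so the behavior is easy to follow by hand.
demo = moving_average([1, 2, 3, 4], window=2)
```

A longer window suppresses more noise but delays the response of the filtered signal, which matters when the signal feeds a controller.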

CARBON DIOXIDE EMISSION RATE MEASUREMENT

Real-time monitoring of off-gas activity during P. pastoris cultivation: The concentration of evolved CO₂ was measured using an exhaust gas analyzer (Siemens AG, Ultramat 23, Berlin, Germany). CO₂ detection by the analyzer is facilitated by infrared (IR) absorption, and the resulting signal is transmitted as a mole fraction denoted as $(y_{{CO}_{2}, out})$ . The calculation of carbon dioxide emission rate (CER) values is derived from the measured mole fractions data as represented in Eq. (3).

$$CER \left( \frac{mmol}{L \cdot h} \right) = \frac{\dot{m}_{g}}{V_{R}} \left[ y_{CO_{2}, out} \left( \frac{y_{inert, in}}{y_{inert, out}} \right) - y_{CO_{2}, in} \right] \tag{3}$$

where $\dot{m}_{g}$ is the mass flow rate of gas (L/h), $V_{R}$ is the volume of the reaction broth (L), $y_{CO_{2}, in}$ is the mole fraction of CO₂ in the air inlet stream, $y_{CO_{2}, out}$ is the mole fraction of CO₂ in the air outlet stream, $y_{inert, in}$ is the mole fraction of N₂ in the air inlet stream, and $y_{inert, out}$ is the mole fraction of N₂ in the air outlet stream.
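Eq. (3) translates directly into code. The sketch below assumes the gas flow is supplied as a molar flow in mmol/h so that the result carries units of mmol/(L·h); a volumetric flow in L/h would first need converting with the molar volume of the gas. The function name and example values are hypothetical.

```python
def carbon_dioxide_emission_rate(gas_flow, broth_volume,
                                 y_co2_in, y_co2_out,
                                 y_inert_in, y_inert_out):
    """Eq. (3): CER from off-gas mole fractions with an inert (N2) balance.

    gas_flow is assumed to be in mmol/h and broth_volume in L.
    """
    inert_correction = y_inert_in / y_inert_out
    return (gas_flow / broth_volume) * (y_co2_out * inert_correction - y_co2_in)


# Hypothetical values: no CO2 in the inlet, 2% CO2 in the outlet.
cer_example = carbon_dioxide_emission_rate(
    gas_flow=1000.0, broth_volume=2.0,
    y_co2_in=0.0, y_co2_out=0.02,
    y_inert_in=0.79, y_inert_out=0.79,
)
```

The ratio of inert mole fractions corrects for the change in total gas flow between inlet and outlet, which is why only one flow measurement is needed.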

DIELECTRIC SPECTROSCOPY

Dielectric spectroscopy (capacitance probe/impedance spectroscopy) serves as a method to measure the viable cell concentration of the cell culture suspended in a conductive medium. The capacitance probe (Aber Instruments, Aberystwyth, UK) is configured with a ceramic covering at the probe tip, which is encircled by two annular rings. An electric field is generated at the applied frequency, and the interaction of the electric field with the cells induces ion movements across the cell membrane, causing polarization as intracellular and extracellular ions migrate toward the nonconductive plasma membrane. This results in the establishment of a charge gradient across the membrane, giving rise to capacitance. Living cells, with intact membranes, exhibit this capacitance, distinguishing them from the dead cells that may have lysed, damaged, or fragmented membranes. Unlike biomass, other suspended particles in the cultivation broth, lacking a plasma membrane, do not contribute to capacitance.¹⁹ In this study, capacitance was recorded at dual frequencies (580 kHz and 15.560 MHz) through Futura software (Aber Instruments, Aberystwyth, UK). The difference in the capacitance values between these frequencies ( $Δ C$ ) represents the overall capacitance value. The integration of capacitance probe as PAT tool enhances the understanding of cellular processes and offers valuable insights into the dynamics of cell growth and metabolism during P. pastoris cultivation.

DATA CURATION FOR ANN AND GP MODEL TRAINING

Machine learning algorithms require large datasets to accurately predict process parameters, capturing key changes during product formation. This study employs two algorithms to model nonlinear dynamics: ANNs and GP. A two-layer feedforward ANN with tanh activation was developed. GP, a robust nonlinear regression technique, is defined by a mean function representing average values and a covariance (kernel) function capturing dependencies between inputs. Here, the GP model uses a radial basis function kernel, critically shaping the GP’s behavior.²⁰ Five fed-batch fermentation experiments, each with 28,000 data points, were used for data-driven model development. The dataset included online measurements of DO concentration, metabolic heat rate, capacitance, and CER. Historical data enabled the model to identify patterns across runs. Each attribute was normalized to zero mean and unit variance to address scale differences, enhancing pattern recognition and generalization. Data from all experiments were combined and randomly split into training, validation, and test sets (75%, 15%, 10%) using MATLAB’s “dividerand” function. Root mean square error (RMSE) was used to select the best network for MPC implementation (Fig. 1).
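The normalization and random partitioning described above can be sketched as follows. The study used MATLAB's "dividerand"; the Python below is an analogous illustration and not that function. It assumes the population standard deviation for scaling and a fixed seed for reproducibility, and the function names are hypothetical.

```python
import random
import statistics


def zscore(column):
    """Scale one attribute to zero mean and unit variance (population std assumed)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # a constant column (std = 0) is not handled here
    return [(value - mean) / std for value in column]


def random_split(n_samples, fractions=(0.75, 0.15, 0.10), seed=0):
    """Shuffle sample indices and cut them into training, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = random_split(100)
scaled = zscore([1.0, 2.0, 3.0])
```

Because the runs are pooled before splitting, samples from the same fermentation can land in both training and test sets; a split by run would be a stricter test of generalization.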

Fig. 1.

Schematic representation of data-driven MPC workflow. MPC, model predictive control.

CONTROL METHODOLOGY

The objective of this work is to maximize the huIFNα2b productivity from the high cell density cultivation of glycoengineered P. pastoris. This is accomplished by maintaining the organism at an optimal growth rate and ensuring that the majority of the carbon substrate is channelled toward biomass and product formation. To achieve the stated objective, MPC was formulated using a cost function as stated in Eq. (4).

$$ϕ = \sum_{i = 1}^{N_{P}} (\hat{μ}_{i} - μ_{sp})^{2} + λ \sum_{i = 1}^{N_{C}} (F_{in, i} - F_{ref, i})^{2} \tag{4}$$

where $\hat{μ}$ is the data-driven estimate of the specific growth rate (h⁻¹), $μ_{sp}$ is the setpoint of the specific growth rate (h⁻¹), $F_{ref}$ is the precalculated reference feeding profile (mL/h), $F_{in}$ is the control input (mL/h), $λ$ is the control penalty gain, and $N_{P}$ and $N_{C}$ are the prediction and control horizons.

The reference feeding trajectory is precalculated analytically based on the substrate balance equation, and it is mathematically represented as shown in Eq. (5).

$$F_{ref} = \frac{X_{0} V_{0} μ_{sp}}{Y_{X/S} (S_{in} - S)} e^{μ_{sp} t} \tag{5}$$

where $X_{0}$ represents the initial biomass concentration (g/L), $V_{0}$ represents the volume of the culture in the fermenter (L), $μ_{sp}$ is the setpoint of the specific growth rate (h⁻¹), $Y_{X/S}$ is the biomass yield per substrate consumed, $S_{in}$ is the concentration of the substrate fed into the fermenter (g/L), and $S$ is the concentration of substrate present in the fermenter.
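A short sketch of Eq. (5) shows how the reference profile grows over time. The parameter values are hypothetical and chosen only for illustration. With the biomass and substrate concentrations in g/L, the volume in L, and the yield in g/g, the function returns L/h, so a factor of 1000 would convert the result to mL/h.

```python
import math


def reference_feed_rate(t, x0, v0, mu_sp, y_xs, s_in, s=0.0):
    """Eq. (5): exponential reference feed rate at time t (h)."""
    pre_exponential = (x0 * v0 * mu_sp) / (y_xs * (s_in - s))
    return pre_exponential * math.exp(mu_sp * t)


# Hypothetical parameters, not values from the study.
params = dict(x0=50.0, v0=2.5, mu_sp=0.035, y_xs=0.4, s_in=790.0)
feed_start = reference_feed_rate(0.0, **params)
feed_doubled = reference_feed_rate(math.log(2) / params["mu_sp"], **params)
```

The feed rate doubles every ln(2)/μ_sp hours, which is about 19.8 h at a setpoint of 0.035 h⁻¹.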

A prediction horizon of 1 hour and a control horizon of 0.5 hours were chosen. The quadratic cost function, based on the tracking error, is minimized over a finite moving horizon of length $N_{P}$ so that the predefined setpoint trajectory is tracked. If the process variables move outside their specifications, the process deviates from its norm, thereby hampering productivity. Moreover, negative values of the process variables considered in this study are not physically meaningful. Consequently, the MPC was subjected to constraints on the control input over the control horizon and on the predicted specific growth rate over the prediction horizon, as shown in Eqs. (6) and (7), respectively.

$$0 \leq F_{in, k}, \quad k = 1, \dots, N_{C} \tag{6}$$

$$0 \leq μ_{k}, \quad k = 1, \dots, N_{P} \tag{7}$$

The MPC follows the receding horizon principle, in which the constrained optimization problem is solved online at each instant $k$.
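One decision step of such a controller can be sketched as below. This is a deliberately simplified illustration and not the controller used in the study: a grid search over candidate feed rates stands in for the numerical optimizer, the feed is held constant over the control horizon, and a hypothetical linear predictor replaces the ANN or GP model. The horizon lengths in steps, the penalty gain, and all numbers are assumptions.

```python
def mpc_cost(mu_pred, mu_sp, f_in, f_ref, lam):
    """Eq. (4): squared tracking error plus penalized deviation from the reference feed."""
    tracking = sum((mu - mu_sp) ** 2 for mu in mu_pred)
    effort = sum((f - r) ** 2 for f, r in zip(f_in, f_ref))
    return tracking + lam * effort


def mpc_step(predict_mu, mu_sp, f_ref, lam, candidates, n_p, n_c):
    """Return the feasible candidate feed rate with the lowest cost, or None."""
    best_cost, best_feed = None, None
    for feed in candidates:
        if feed < 0:  # Eq. (6): feed rate must be nonnegative
            continue
        f_in = [feed] * n_c  # constant move over the control horizon (assumption)
        mu_pred = predict_mu(f_in, n_p)
        if any(mu < 0 for mu in mu_pred):  # Eq. (7): growth rate must be nonnegative
            continue
        cost = mpc_cost(mu_pred, mu_sp, f_in, f_ref[:n_c], lam)
        if best_cost is None or cost < best_cost:
            best_cost, best_feed = cost, feed
    return best_feed


def toy_predictor(f_in, n_p):
    """Hypothetical stand-in for the data-driven model: mu proportional to feed rate."""
    return [0.001 * f_in[0]] * n_p


chosen_feed = mpc_step(toy_predictor, mu_sp=0.035, f_ref=[35.0] * 6, lam=1e-6,
                       candidates=[0.0, 10.0, 20.0, 30.0, 35.0, 40.0, 50.0],
                       n_p=12, n_c=6)
```

In a receding horizon scheme this step would be repeated at every sampling instant with fresh measurements, applying only the most recent decision each time.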

HUIFN α2B PRODUCTION

The fermentation calorimeter was sterilized in situ with prefilled distilled water at 121°C and 1 bar for 15 minutes. Following sterilization, BSM was aseptically transferred into the calorimeter. Subsequently, a 10% v/v preculture inoculum was aseptically introduced, resulting in a final working volume of 2.5 L. The high cell density cultivation was performed at 30°C, with pH maintained at 5.4 through the controlled addition of 25% v/v ammonia and 85% v/v orthophosphoric acid. DO levels were sustained above 10% by supplying air at 1 vvm, coupled with agitation rate regulation between 400 and 800 rpm. To mitigate excessive foaming, a 10% v/v antifoam solution (silicone oil) was added manually when necessary.

The high cell density cultivation of P. pastoris is characterized by three distinct phases: the glycerol growth phase (∼20 to 24 hours), the methanol adaptation phase (∼6 to 8 hours), and the recombinant protein production phase (∼40 to 50 hours). The glycerol growth phase resulted in significant biomass generation, with glycerol as the primary carbon source. Real-time signals from the process analyzers indicated the depletion of glycerol through a decline in the heat rate signal and off-gas activity and an increase in DO concentration. During the methanol adaptation phase, gradual methanol dosing (concentration not exceeding 20 g/L) facilitated effective adaptation to the new carbon source. Activation of the alcohol oxidase (AOX) transcription elements, indicative of the adaptation phase, was reflected in changes in the real-time signals of metabolic heat rate and off-gas activity. The production phase involved a continuous supply of methanol (concentration not exceeding 20 g/L), regulated by the process control strategies detailed in the "Control methodology" section. Throughout the cultivation, regular sample collection facilitated offline analysis, enabling the estimation and monitoring of key process parameters such as biomass growth, substrate consumption, and product formation.

OFFLINE ANALYSIS

Samples collected at regular intervals were preserved at 4°C for offline analysis. Cells were subjected to centrifugation at 10,000 RPM for 10 minutes at 4°C for extracting cellular components. The resulting supernatant was carefully collected and stored at –20°C for subsequent analysis of carbon substrate and huIFNα2b. The separated cells were subjected to a double wash with deionized water to eliminate any residual salts and subsequently dried at 80°C to determine the dry cell weight (DCW). Cell growth was quantified through OD₆₀₀ using a UV–visible spectrophotometer (GE Healthcare, UK). Obtained absorbance readings were converted into DCW values (g/L) based on the following correlation developed ( $1 OD = 0.21 \times DCW$ ).

PERFORMANCE INDEX

The prediction performance of the data-driven models was assessed through the RMSE metric, which is defined as shown in Eq. (8).

$$RMSE = \sqrt{\frac{\sum_{i = 1}^{N} (y_{predicted, i} - y_{actual, i})^{2}}{N}} \tag{8}$$

where $y_{predicted}$ is the model-predicted value, $y_{actual}$ is the experimental value, and $N$ is the number of data points.
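Eq. (8) can be written as a small Python function. The function name and the example values are illustrative only.

```python
import math


def rmse(predicted, actual):
    """Eq. (8): root mean square error between predictions and measurements."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(predicted))


# Hypothetical growth-rate values (1/h), for illustration only.
example_rmse = rmse([0.035, 0.040, 0.030], [0.034, 0.041, 0.031])
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones.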

Results and Discussion

REAL-TIME MONITORING OF HIGH CELL DENSITY CULTIVATION OF GLYCOENGINEERED P. PASTORIS

The deployment of real-time process analyzers such as the fermentation calorimeter, off-gas analyzer, and dielectric spectroscopy enhanced the monitoring and control action by accurately depicting the dynamics of high cell density cultivation of P. pastoris. The consistent rise in metabolic heat rate, CER, and capacitance signal corresponded to a concurrent increase in biomass concentration during high cell density cultivation. A clear demarcation of the real-time signals was observed (Fig. 2) as the cultivation progressed through its different stages. In particular, a decline in the real-time signals was noticeable at the end of the glycerol batch phase (20–23 hours). Thereafter, gradual adaptation to methanol utilization was evidenced by a slower increase in these signals, followed by the dynamics of the induction phase.

Fig. 2.

Real-time dynamic profiles from three process analyzers, that is, fermentation calorimeter, capacitance probe, and exhaust gas analyzer. Metabolic heat rate (red continuous), capacitance (blue continuous), and CER (green continuous): (a) NN-based MPC, (b) GP-based MPC. CER, carbon dioxide emission rate; GP, Gaussian process; NN, neural network.

As Gibbs free energy dissipation is inherent to all living systems, the enthalpic changes associated with diverse metabolic processes can be effectively elucidated with a fermentation calorimeter. The customized fermentation calorimeter used in our study has a sensitivity of 6.73 mW/L. The maximum microbial heat generated during the cultivation of glycoengineered P. pastoris for the production of recombinant huIFNα2b ranged from 40 to 52 W/L, in line with previously reported literature.²¹ Integrating the calorimeter signal with the CO₂ signal facilitates interpretation of the relationship between heat fluxes in bioprocesses and the corresponding changes in metabolic flux. In P. pastoris cultivation, the methanol consumed during the induction phase is oxidized within the peroxisome to formaldehyde, which is subsequently oxidized further to generate essential energy intermediates for cellular functions. This oxidation results in the evolution of CO₂ gas; as a result, the induction phase is characterized by higher CER values (80–170 mmol/(L·h)) than the glycerol phase, where the maximum CER ranges from 50 to 80 mmol/(L·h). An increase in cumulative CER during the induction phase signifies ongoing AOX expression. The heightened CER in the induction phase suggests that a substantial portion of the assimilated methanol is directed toward energy metabolism, with less allocated to biomass production than in the glycerol batch phase. An excessive or abrupt increase in CER values indicates extensive methanol oxidation, which leads to the accumulation of formaldehyde in the culture broth and results in cytotoxic conditions.²² The application of process analyzers in conjunction with offline measurements facilitated the calculation of thermochemical energetic yields. 
The metabolic heat generation arising from the balance between the substrate uptake rate and its allocation to diverse metabolic reactions is an inherent characteristic of the organism, signifying the coupling between catabolic and anabolic processes. In the context of black-box microbial stoichiometries, real-time heat rate measurements coupled with auxiliary process analyzers offer the distinct advantage of predicting various yield coefficients. The heat yield coefficient attributed to biomass generation ($Y_{Q/X}$) is obtained from the slope of the plot of cumulative heat ($Q$) against biomass generated. The values of $Y_{Q/X}$ under conditions of balanced growth exhibit a high degree of consistency and depend on the specific metabolic pathways in operation. During the induction phase, $Y_{Q/X}$ values for all experimental runs were in the range of 18–19.5 kJ/g.²¹ Thus, the real-time measurements obtained from the process analyzers were successful in deciphering the intricate relationship between biomass production, metabolic heat generation, and CO₂ production. The integration of real-time data from the process analyzers enabled training of the data-driven models for predicting the biochemical state variable $μ$, which facilitated optimized methanol feeding for enhanced production of huIFNα2b.
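The heat yield coefficient described above is the slope of cumulative heat against biomass, which can be estimated with an ordinary least-squares fit. The sketch below assumes that fitting method (the text states only that the slope of the plot was used) and runs on synthetic data generated with a known slope; the function name and the numbers are hypothetical.

```python
def heat_yield_coefficient(biomass, cumulative_heat):
    """Least-squares slope of cumulative heat (kJ/L) against biomass (g/L), in kJ/g."""
    n = len(biomass)
    mean_x = sum(biomass) / n
    mean_q = sum(cumulative_heat) / n
    s_xx = sum((x - mean_x) ** 2 for x in biomass)
    s_xq = sum((x - mean_x) * (q - mean_q)
               for x, q in zip(biomass, cumulative_heat))
    return s_xq / s_xx


# Synthetic data with a known slope of 18.5 kJ/g, for illustration only.
biomass_demo = [10.0, 20.0, 30.0, 40.0]
heat_demo = [18.5 * x + 3.0 for x in biomass_demo]
y_qx_demo = heat_yield_coefficient(biomass_demo, heat_demo)
```

An intercept is fitted implicitly, so heat accumulated before the first biomass sample does not bias the slope.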

EXPERIMENTAL VERIFICATION OF ANN MPC

The architecture of the ANN investigated in this study was denoted as 4–N–1, where “4” signifies the number of input nodes in the input layer, N represents the variable number of hidden nodes in the hidden layer, and “1” designates the single output node in the final layer. Each node corresponds to a neuron that computes a weighted sum of the outputs from the preceding layer. The identification of the optimal-performing network entailed the careful selection of an appropriate training algorithm and determining the number of neurons in the hidden layer. In accordance with established practices, the number of neurons was systematically chosen through an iterative process of trial and error, with the overarching objective of minimizing the prediction error between the actual target values and the corresponding values predicted by the network.²³ The optimal number of neurons was 12, with a lower RMSE being registered while training the network. The network’s efficacy experienced a decline when the variable N was constrained to values exceeding 12 (Table 1). A higher N value tends to induce overfitting during the training phase, amplifying network complexity and computational burden. This, in turn, leads to a suboptimal generalization when applied to test data. Conversely, a lower count of neurons in the hidden layer contributes to underfitting, where the network inadequately captures the underlying patterns in the data, further compromising its predictive performance. To assess and compare the performance of the developed MPC, real-time datasets that were not utilized during the training phase were employed. The ANN MPC consistently demonstrated the capability to maintain the specific growth rate within a range of 0.02–0.05 h⁻¹ during both testing and validation datasets. Noteworthy deviations, ranging within ±0.015 h⁻¹ from the optimal specific growth rate of 0.035 h⁻¹, were observed at various intervals throughout the induction phase (Fig. 3). 
The oscillatory behavior depicted in Figure 4 is attributed to the larger number of hyperparameters that must be tuned when developing an ANN. The optimal methanol feeding rate was determined as per Eq. (5) to maintain the desired setpoint values, and $F_{0}$, the pre-exponential factor in Eq. (5), was determined from the batch-phase data. Exponential feeding proved a highly effective strategy for meeting the cellular metabolic demands. This approach facilitated concurrent substrate utilization, thereby maintaining the residual methanol concentration close to zero for more than 15 hours. The sustained and balanced utilization of substrate across diverse cellular activities is illustrated in Figure 5. On closer examination of the results, the cells showed a maximum specific methanol consumption rate ($q_{s} = 0.074$ g/(g·h)) at the onset of huIFNα2b production. As the induction phase progressed, the methanol consumption rate declined discernibly. This was accompanied by a significant reduction in the growth rate, from an initial value of $μ_{ip} = 0.042$ h⁻¹ to $μ_{ip} = 0.028$ h⁻¹ at the end of the induction phase, and the trend was mirrored in huIFNα2b productivity. Owing to the weighted optimization of multiple control tasks within the MPC framework, a marginal disparity is observed at the start of the control process. 
Subsequently, a consistent decrement in the actual value ensues, attributed to the implementation of active soft constraints.²⁴ A similar phenomenon was observed for other recombinant proteins produced in Mut+ phenotype strains.²⁵ These findings align with established research suggesting that the induction of heterologous gene expression triggers an increase in intracellular product levels, accompanied by an elevation in binding protein levels and a concurrent reduction in $μ_{ip}$ values. The plausible involvement of the cellular stress response in heterologous protein production, particularly relevant to complex proteins such as huIFNα2b with two disulfide bonds and an N-glycosylation site, further underscores the intricacies of the observed phenomena.^14,26
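The 4–N–1 architecture discussed in this section can be illustrated with a minimal forward pass. The sketch assumes tanh hidden units and a linear output node, with the four inputs presumably being the online measurements (DO, metabolic heat rate, capacitance, and CER) and the output being the specific growth rate. The weights are random placeholders rather than trained values, and the function names are hypothetical.

```python
import math
import random


def init_network(n_hidden, n_inputs=4, seed=0):
    """Create a 4-N-1 network with placeholder (untrained) weights."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(inputs, net):
    """Each hidden node applies tanh to a weighted sum; the output node is linear (assumed)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(net["w1"], net["b1"])
    ]
    return sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]


network = init_network(n_hidden=12)
prediction = forward([0.1, -0.3, 0.5, 0.2], network)
```

With N = 12 this network has 4 × 12 + 12 hidden-layer parameters and 12 + 1 output parameters, 73 in total, which gives a sense of how quickly the parameter count grows with N.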

Fig. 3.

Comparison of data-driven predicted specific growth rate (black continuous) with the experimentally obtained data (red circles): (a) NN-based MPC, (b) GP-based MPC.

Fig. 4.

Influence of controller performance on methanol feeding rates. Precalculated reference feeding rate (gray continuous) compared with experimental feeding rate (red continuous): (a) NN-based MPC, (b) GP-based MPC.

Fig. 5.

Impact of optimal methanol feeding strategy on residual methanol concentration. NN-based MPC (black circles), GP-based MPC (red circles).

Table 1.

Root Mean Square Error Over Training Data of Artificial Neural Network for the Prediction of Specific Growth Rate

NUMBER OF NEURONS	RMSE
5	0.0088
7	0.0056
10	0.0024
12	0.00065
14	0.0034

RMSE, root mean square error.
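The selection implied by Table 1 can be reproduced in a few lines. This is a minimal sketch that assumes the hidden-layer size is chosen by the lowest training RMSE; the variable names are illustrative, and the values are copied from the table.

```python
# Training RMSE per number of neurons, copied from Table 1
rmse_by_neurons = {5: 0.0088, 7: 0.0056, 10: 0.0024, 12: 0.00065, 14: 0.0034}

# Pick the network size with the smallest RMSE
best_neurons = min(rmse_by_neurons, key=rmse_by_neurons.get)
print(best_neurons, rmse_by_neurons[best_neurons])  # → 12 0.00065
```

The RMSE rises again at 14 neurons, so in this table the error is not monotonic in network size.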

EXPERIMENTAL VERIFICATION OF GP MPC

The GP prediction model takes the form of a black-box system: it provides predictions of the output variable from the input variables without expressing an explicit functional relationship between them. The probabilistic regression was based on a prior distribution over the output variable ($μ$), characterized by a kernel function. The kernel hyperparameter (length scale) was optimized to 8.57 during training by maximizing the likelihood of the observed data, yielding a very low RMSE of 0.00045. A length scale below 8 resulted in a wiggly function, as the model complexity increased to capture the finer details in the data, which is characteristic of overfitting. Conversely, a higher length scale (>8.57) produced an overly smooth model that underfitted the data.
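The role of the length scale can be seen directly from the kernel. The sketch below assumes a squared-exponential (RBF) kernel, which is a common default but is not confirmed as the kernel used in this study; the function name, the 5-unit input separation, and the length scales 2.0 and 30.0 are illustrative choices, while 8.57 is the optimized value quoted above.

```python
import math


def rbf_kernel(x1: float, x2: float, length_scale: float, variance: float = 1.0) -> float:
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


# Covariance between two inputs 5 units apart for several length scales.
# A small value means the two outputs are nearly independent (wiggly fit);
# a value near 1 means they are forced to be similar (smooth fit).
for ell in (2.0, 8.57, 30.0):
    print(f"length scale {ell:5.2f}: k = {rbf_kernel(0.0, 5.0, ell):.3f}")
```

A short length scale lets the covariance between neighboring points decay quickly, so the posterior can follow noise in the data; a long one ties distant points together and flattens real variation.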

GP MPC consistently sustained the specific growth rate within a range of 0.025–0.045 h⁻¹ for both the testing and validation datasets. Deviations within ±0.01 h⁻¹ of the optimal specific growth rate of 0.035 h⁻¹ were observed at various intervals throughout the induction phase (Fig. 3). The objective of a stable output, reflected in optimal methanol feeding, was achieved with GP MPC, which gave a 1.03- and 1.9-fold enhancement of huIFNα2b productivity compared with our previously reported studies, that is, µ controlled by gain-scheduling PID and residual methanol concentration controlled by model-based adaptive PI (Table 2). As illustrated in Figure 4, the feed rate oscillated less under GP MPC than under ANN MPC. The straightforward GP approach, which requires less hyperparameter tuning than ANN, also contributed to the efficiency of GP MPC. The residual methanol concentration was maintained under methanol-limiting conditions for longer durations, and this tightly regulated methanol feeding reinstated the housekeeping activities more effectively, resulting in enhanced production of huIFNα2b. Compared with ANN MPC, GP MPC gave a 1.1-fold increase in huIFNα2b productivity and, owing to the optimized feeding rate, a 14% decrease in methanol utilization.

Table 2.

Comparative Assessment of Process Strategies on Human Interferon α2b Production

CONTROL METHOD	FINAL DCW (g/L)	huIFNα2b (mg/L)	$q_{P}$ (mg/(g·h))	REFERENCE
Gain-scheduling PID (model free)	101.2	243.65	0.56 ± 0.08	²⁷
Model-based adaptive PI	84.5	244.34	0.31 ± 0.01	²⁸
NN-based MPC	108.2	252.32	0.55 ± 0.03	This study
GP-based MPC	116.65	283.56	0.61 ± 0.03	This study

huIFNα2b, human interferon α2b; DCW, dry cell weight; PID, proportional integral derivative.
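The ratios between strategies can be recomputed from Table 2. This is a minimal sketch using the mean values in the table (the ± ranges are ignored); the names `table2` and `fold_change` are illustrative, and since the metric and rounding behind each fold change quoted in the text are not stated here, the sketch simply reports ratios for both titer and $q_{P}$.

```python
# (final DCW g/L, huIFNα2b mg/L, q_P mg/(g·h)) copied from Table 2
table2 = {
    "Gain-scheduling PID": (101.2, 243.65, 0.56),
    "Model-based adaptive PI": (84.5, 244.34, 0.31),
    "NN-based MPC": (108.2, 252.32, 0.55),
    "GP-based MPC": (116.65, 283.56, 0.61),
}


def fold_change(value: float, baseline: float) -> float:
    """Ratio of a value to its baseline."""
    return value / baseline


_, gp_titer, gp_qp = table2["GP-based MPC"]
for method, (_, titer, qp) in table2.items():
    if method == "GP-based MPC":
        continue
    print(
        f"GP vs {method}: titer x{fold_change(gp_titer, titer):.2f}, "
        f"q_P x{fold_change(gp_qp, qp):.2f}"
    )
```

For the GP-based versus NN-based comparison, the $q_{P}$ ratio of 0.61/0.55 rounds to the 1.1-fold figure reported in the abstract.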

Conclusions

In this study, we explored the potential of controlling critical process parameters in huIFNα2b production. Data-driven models proved advantageous by providing reliable estimates of the specific growth rate and thereby accurate input to the MPC. To manage the inherent nonlinearities and discontinuities in the process, MPC was used to fulfill multiple objectives and predict process events. This proactive approach facilitated continuous product formation with enhanced productivity. Furthermore, the optimal use of raw material (the carbon substrate, methanol) achieved with the proposed MPC is highly relevant for industry, as it offers the possibility of significantly enhancing the product concentration without any additional substrate consumption. The results underscore the potential of nonlinear approaches for achieving more robust and efficient control strategies in bioprocess applications.

Ethical Statements

The authors declare that they have no known financial interests or personal relationships that could have appeared to influence the work reported in the paper.

Neither ethical approval nor informed consent was required for this study.

Data Availability

Data will be made available on request.

Authors’ Contributions

S.S.P.A.: Investigation, formal analysis, data curation, visualization, writing—original draft, and writing—review and editing. Shikha S.: Investigation. Senthilkumar S.: Project administration, resources, and writing—review and editing.

Footnotes

Acknowledgement

The authors would like to acknowledge Ms. Sandhya Sekhar, a research scholar, for her help during fermentation runs. The authors kindly acknowledge the Department of Biosciences and Bioengineering, IIT Guwahati, for providing the state-of-the-art infrastructure facility.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

The authors gratefully acknowledge the financial support from the Department of Science and Technology—Science and Engineering Research Board, Government of India, for the successful accomplishment of this work (CRG/2019/002882).

References

1. Rathore, Mishra, Nikita, et al. Bioprocess control: Current progress and future perspectives. Life (Basel), 2021; 11(6):557; doi: 10.3390/life11060557

2. Schuler, Marison. Real-time monitoring and control of microbial bioprocesses with focus on the specific growth rate: Current state and perspectives. Appl Microbiol Biotechnol, 2012; 94(6):1469–1482; doi: 10.1007/s00253-012-4095-z

3. Narayanan, Luna, Von Stosch, et al. Bioprocessing in the digital age: The role of process models. Biotechnol J, 2020; 15(1):e1900172; doi: 10.1002/biot.201900172

4. Mears, Stocks, Sin, et al. A review of control strategies for manipulating the feed rate in fed-batch fermentation processes. J Biotechnol, 2017; 245:34–46; doi: 10.1016/j.jbiotec.2017.01.008

5. Mahmoodi, Nassireslami. Control algorithms and strategies of feeding for fed-batch fermentation of Escherichia coli: A review of 40 years of experience. Prep Biochem Biotechnol, 2022; 52(7):823–834; doi: 10.1080/10826068.2021.1998112

6. Luttmann, Bracewell, Cornelissen, et al. Soft sensors in bioprocessing: A status report and recommendations. Biotechnol J, 2012; 7(8):1040–1048; doi: 10.1002/biot.201100506

7. Sagmeister, Wechselberger, Jazini, et al. Soft sensor assisted dynamic bioprocess control: Efficient tools for bioprocess development. Chem Eng Sci, 2013; 96:190–198; doi: 10.1016/j.ces.2013.02.069

8. Forbes, Patwardhan, Hamadah, et al. Model predictive control in industry: Challenges and opportunities. IFAC Pap OnLine, 2015; 48(8):531–538; doi: 10.1016/j.ifacol.2015.09.022

9. Bolmanis, Dubencovs, Suleiko, et al. Model predictive control—a stand out among competitors for fed-batch fermentation improvement. Fermentation, 2023; 9(3):206; doi: 10.3390/fermentation9030206

10. Aehle, Bork, Schaepe, et al. Increasing batch-to-batch reproducibility of CHO-cell cultures using a model predictive control approach. Cytotechnology, 2012; 64(6):623–634; doi: 10.1007/s10616-012-9438-1

11. Craven, Whelan, Glennon. Glucose concentration control of a fed-batch mammalian cell bioprocess using a nonlinear model predictive controller. J Process Control, 2014; 24(4):344–357; doi: 10.1016/j.jprocont.2014.02.007

12. Ławryńczuk. Modelling and nonlinear predictive control of a yeast fermentation biochemical reactor using neural networks. Chem Eng J, 2008; 145(2):290–307; doi: 10.1016/j.cej.2008.08.005

13. Dubencovs, Suleiko, Sile, et al. The application of adaptive model predictive control for fed-batch Escherichia coli BL21 (DE3) cultivation and biosynthesis of recombinant proteins. Fermentation, 2023; 9(12):1015; doi: 10.3390/fermentation9121015

14. Katla, Yoganand KNR, Hingane, et al. Novel glycosylated human interferon alpha 2b expressed in glycoengineered Pichia pastoris and its biological activity: N-linked glycoengineering approach. Enzyme Microb Technol, 2019; 128:49–58; doi: 10.1016/j.enzmictec.2019.05.007

15. Wang, Li, Liu, et al. Subcutaneous injection of IFN alpha-2b for COVID-19: An observational study. BMC Infect Dis, 2020; 20(1):723; doi: 10.1186/s12879-020-05425-5

16. Zhou, Chen, Shannon, et al. Interferon-α2b treatment for COVID-19. Front Immunol, 2020; 11:1061; doi: 10.3389/fimmu.2020.01061

17. Katla, Karmakar, Tadi SRR, et al. High level extracellular production of recombinant human interferon alpha 2b in glycoengineered Pichia pastoris: Culture medium optimization, high cell density cultivation and biological characterization. J Appl Microbiol, 2019; 126(5):1438–1453; doi: 10.1111/jam.14227

18. Mohan, Sivaprakasam. Heat compensation calorimeter as a process analytical tool to monitor and control bioprocess systems. Ind Eng Chem Res, 2017; 56(30):8416–8427; doi: 10.1021/acs.iecr.7b01367

19. Flores-Cosío, Herrera-López, Arellano-Plaza, et al. Application of dielectric spectroscopy to unravel the physiological state of microorganisms: Current state, prospects and limits. Appl Microbiol Biotechnol, 2020; 104(14):6101–6113; doi: 10.1007/s00253-020-10677-x

20. Rashedi, Rafiei, Demers, et al. Machine learning-based model predictive controller design for cell culture processes. Biotechnol Bioeng, 2023; 120(8):2144–2159; doi: 10.1002/bit.28486

21. Katla, Pavan, Mohan, et al. Biocalorimetric monitoring of glycoengineered P. pastoris cultivation for the production of recombinant huIFNα2b: A quantitative study based on mixed feeding strategies. Biotechnol Prog, 2020; 36(3):e2971; doi: 10.1002/btpr.2971

22. Gao M-J, Zheng Z-Y, Wu J-R, et al. Improvement of specific growth rate of Pichia pastoris for effective porcine interferon-α production with an on-line model-based glycerol feeding strategy. Appl Microbiol Biotechnol, 2012; 93(4):1437–1445; doi: 10.1007/s00253-011-3605-8

23. Wong, Chee, Li, et al. Recurrent neural network-based model predictive control for continuous pharmaceutical manufacturing. Mathematics, 2018; 6(11):242; doi: 10.3390/math6110242

24. Tebbani, Dumur, Hafidi, et al. Nonlinear predictive control of fed-batch cultures of Escherichia coli. Chem Eng & Technol, 2010; 33(7):1112–1124; doi: 10.1002/ceat.201000029

25. Cregg, Cereghino, Shi, et al. Recombinant protein expression in Pichia pastoris. Mol Biotechnol, 2000; 16(1):23–52; doi: 10.1385/MB:16:1:23

26. Cereghino, Cregg. Heterologous protein expression in the methylotrophic yeast Pichia pastoris. FEMS Microbiol Rev, 2000; 24(1):45–66; doi: 10.1111/j.1574-6976.2000.tb00532.x

27. Allampalli, Rathinavelu, Mohan, et al. Deployment of metabolic heat rate based soft sensor for estimation and control of specific growth rate in glycoengineered Pichia pastoris for human interferon alpha 2b production. J Biotechnol, 2022; 359:194–206; doi: 10.1016/j.jbiotec.2022.10.006

28. Allampalli SSP, Sekhar, Sivaprakasam. Enhanced production of human interferon α2b in glycoengineered Pichia pastoris by robust control of methanol feeding and implications of various control strategies. Biochem Eng J, 2024; 201:109152; doi: 10.1016/j.bej.2023.109152