Abstract
Implementing remote, real-time spectroscopic monitoring of radiochemical processing streams in hot cell environments requires efficiency and simplicity. The success of optical spectroscopy for the quantification of species in chemical systems depends heavily on representative training sets and suitable validation sets. Selecting a training set (i.e., calibration standards) to build multivariate regression models is both time- and resource-consuming using standard one-factor-at-a-time approaches. This study describes the use of experimental design to generate spectral training sets and a validation set for the quantification of sodium nitrate (0–1 M) and nitric acid (0.1–10 M) using the near-infrared water band centered at 1440 nm. Partial least squares regression models were built from training sets generated by both D- and I-optimal experimental designs and a one-factor-at-a-time approach. The prediction performance of each model was evaluated by comparing the bias and standard error of prediction for statistical significance. D- and I-optimal designs reduced the number of samples required to build regression models compared with one-factor-at-a-time while also improving performance. Models must be confirmed against a validation sample set when minimizing the number of samples in the training set. The D-optimal design performed the best when considering both performance and efficiency by improving predictive capability and reducing the number of samples in the training set by 64% compared with the one-factor-at-a-time approach. The experimental design approach objectively selects calibration and validation spectral data sets based on statistical criteria to optimize performance and minimize resources.
Introduction
Optical spectroscopy can be used for nondestructive analysis and the rapid characterization of chemical processing systems in laboratory- and industrial-scale applications. In situ spectroscopic techniques improve processing speed and efficiency, minimize waste, enhance worker safety, and offer the ability to track material inventory in real-time.1–4 Spectroscopic approaches are useful for monitoring nitric acid (HNO3) and nitrate salt concentrations in a variety of applications when system performance highly depends on HNO3 concentration (e.g., plutonium and uranium solvent extraction—the PUREX process).5,6
Near-infrared spectroscopy (NIR) is used to scan the 750–2500 nm region of the electromagnetic spectrum. Most NIR absorbance bands result from vibrational overtones and combinations of C–H, N–H, or O–H bond stretches. One advantage of the NIR region compared with the IR region is the greater separation of overtone and combination bands. Many researchers have used NIR absorption bands to study fundamental structural properties of water.7–13 Water structure is highly sensitive to changes in temperature and solute interactions, which perturb the water H-bonding network and give rise to spectral variations.14–16 Thus, aqueous species that do not absorb NIR radiation themselves but interact with water molecules can be quantified.
For example, inorganic acids do not directly absorb NIR radiation, but strong acids (e.g., HNO3) dissociate into protons (H+) and corresponding anions (i.e., NO3–), which perturb the NIR water bands.15,17 The water band is also sensitive to aqueous ionic species even when they have the same charge.8,12 Because of the multicomponent and nonselective nature of these spectral features, univariate approaches (e.g., Beer’s law) using selective absorption bands for quantification are not applicable. Multivariate analysis, or chemometrics, may instead be used to correlate covarying NIR spectral signatures to the concentration(s) of species.18 Partial least squares regression (PLSR) is a statistical, multivariate approach relating spectral features to concentration in multicomponent systems.19,20 PLSR relates the independent (X matrix) and dependent (Y matrix) variables with linear combinations of latent variables. X represents an n × m matrix of n samples across m wavelengths (i.e., spectra), and Y represents the response matrix (i.e., concentration) for n samples. Spectrometers acquire spectra composed of hundreds to thousands of data points per sample, so the number of predictor variables far exceeds the number of samples and a direct regression is underdetermined. PLS reduces the dimensionality of the predictive variables and finds correlations with established success.6,14,16,19,20
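The latent-variable regression described above can be illustrated with a minimal sketch of single-response PLS fitted by the NIPALS algorithm. This is not the implementation used in the study (which relied on commercial software); the function names, the Gaussian band shapes, and the noise level are hypothetical and serve only to show how X (spectra) and y (concentration) are related through a small number of factors.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # mean-centered working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # unit-length weight vector
        t = Xr @ w                             # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = yr @ t / tt                        # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - c * t                        # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression vector in wavelength space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

# Synthetic two-component "spectra" (hypothetical bands, not measured data)
rng = np.random.default_rng(0)
wavelengths = np.arange(1280, 1652, 2)         # nm, 2 nm spacing
pure = np.vstack([np.exp(-((wavelengths - 1420) / 40.0) ** 2),
                  np.exp(-((wavelengths - 1540) / 60.0) ** 2)])
conc_train = rng.uniform(0, 1, size=(16, 2))
conc_test = rng.uniform(0, 1, size=(10, 2))
X_train = conc_train @ pure + rng.normal(0, 1e-4, (16, len(wavelengths)))
X_test = conc_test @ pure + rng.normal(0, 1e-4, (10, len(wavelengths)))

model = pls1_fit(X_train, conc_train[:, 0], n_components=2)
pred = pls1_predict(model, X_test)
rmsep = float(np.sqrt(np.mean((pred - conc_test[:, 0]) ** 2)))
```

Because the synthetic data are linear in two components, two factors suffice here; real water-band spectra with ion-water interactions would be expected to need more.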
To predict the concentration of an analyte from an unknown spectrum (i.e., a spectrum that is not in the training set), PLS models must be built from a representative set of calibration samples that include all spectrally active components over the range of expected concentrations (i.e., factor space).21–25 PLSR models are subject to uncertainty, which causes variance in the predictions of the model. The balance between systematic and random errors in model parameters is often referred to as the bias-variance trade-off.21,22 If a model is too complex (i.e., comprised of too many standards), variance is the main contributor to its uncertainty. If the model comprises too few sampling points and is not complex enough, bias dominates the uncertainty. Consequently, the number of calibration standards in a PLS model affects the model performance. The type of sample also plays a critical role in the prediction performance of PLSR models.
The prediction performance of PLSR models, built from training sets selected by various methods, is not well defined. There is no generally accepted approach for selecting a representative training set. Common criteria include (i) samples must contain all the expected X and Y variance in future predictions; (ii) sample concentrations must be uniformly distributed over the total range; and (iii) the set must contain enough samples to statistically define relationships between component concentrations and spectral variables. Selecting the optimal set of calibration standards is usually a subjective decision made by the analyst. One-factor-at-a-time (OFAT) approaches vary one variable at a time while the others are kept fixed.25 This conventional approach usually requires a large number of samples, and the resulting regression models have been shown to overemphasize the fitting aspect, which is less important than a model’s predictive ability.26
Experimental design is a mathematical approach that efficiently collects information from available historical data or predetermined experiments.27,28 Various experimental designs have been used to describe the structured variation of a factor space including mixture, multilevel full factorial, central composite, and optimal.28,29 Optimal designs are often used when the number of samples is fewer than required by more traditional designs (i.e., central composite and full factorial).30 Optimal designs are computer-generated and increase the likelihood of determining an optimal subset of experiments (i.e., training set), based on specific criteria, with the fewest number of samples. I-optimal designs are inherently optimized to minimize prediction variance as opposed to D-optimal designs, which are optimized for parameter estimation.30,31 Experimental designs have been used to extract training sets from a given factor space for chemometric analysis.32,33 Thus, they are applicable to spectroscopic measurements taking place in harsh, restrictive, and expensive environments (e.g., a hot cell environment).
Developing methodology to determine the minimum number of samples, without sacrificing predictive capability, could be useful for quantitative measurements in hot cell environments.33 Other methods, such as calibration transfer functions, have been tested to move calibration models from one location/instrument to another without requiring the expensive, time-consuming, and laborious task of running the entire training and validation sets under the new conditions.34,35 This is especially challenging when the instrument used for the data acquisition is not the same as the one used for the calibration model and analytical conditions (e.g., temperature and humidity) are not comparable. Thus, we hypothesize that successfully implementing calibration transfer functions in hot cell environments may be unlikely. Achieving optimal performance from the fewest number of calibration samples, which would be analyzed directly in the hot cell, may be a more suitable approach for quantitative measurements under these extreme conditions.
We assessed a spectroscopic and multivariate approach for the quantification of NaNO3 (0–1 M) and HNO3 (0.1–10 M) using the NIR absorption water band centered near 1440 nm. NaNO3 was used to represent metal ion nitrate salts and assess how well the HNO3 NIR water band can be distinguished from other nitrate species. This work evaluates the performance of PLS models built using a variety of training sets derived by OFAT, D-optimal, and I-optimal designs. These training sets were selected to minimize resource (i.e., time and material) consumption while providing a model suitable for the intended use.
Materials and Methods
Materials
All chemicals were commercially obtained (American Chemical Society–grade) and used as received unless otherwise stated. Concentrated HNO3 (70%) and NaNO3 were purchased from VWR Life Science. All solutions were prepared using deionized (DI) water with a resistivity of 18.2 MΩ·cm.
Calibration Standards
Calibration standards containing HNO3 (0.1–10 M) and sodium nitrate (0–1 M) were used to build PLSR models for the quantification of NaNO3 and HNO3. The training sets were chosen to cover the range of potential solution conditions. The OFAT calibration set was made based on 0, 5, 25, 50, and 100% of the largest HNO3 and NaNO3 concentration in the factor space (i.e., 10 M HNO3 and 1 M NaNO3). This resulted in a set of 25 calibration samples. A description of how the D- and I-optimal calibration standards were selected is provided in the following section. Each calibration sample was prepared by weighing out appropriate amounts of NaNO3 crystalline powder and pipetting appropriate volumes of concentrated HNO3 and DI water into volumetric glassware to achieve the final volume (5.00 ± 0.01 mL).
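As a small illustration of how the 25-sample set arises, the five levels for each component can be crossed to enumerate the concentration pairs. The sketch below applies the stated percentages literally to the maximum concentrations; how the 0% HNO3 level relates to the stated 0.1 M lower bound is not specified in the text, so the lowest HNO3 value here is an assumption.

```python
from itertools import product

levels = [0.0, 0.05, 0.25, 0.50, 1.00]     # fractions of the maximum concentration
HNO3_MAX, NANO3_MAX = 10.0, 1.0            # mol/L, upper limits of the factor space

# Cross every HNO3 level with every NaNO3 level: 5 x 5 = 25 calibration samples
ofat_grid = [(round(f_h * HNO3_MAX, 3), round(f_n * NANO3_MAX, 3))
             for f_h, f_n in product(levels, levels)]

for hno3, nano3 in ofat_grid[:5]:
    print(f"HNO3 = {hno3:5.2f} M, NaNO3 = {nano3:4.2f} M")
```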
Design of Experiments
Different experimental design methodologies were used to select calibration standards based on statistics to reduce the number required by OFAT without sacrificing predictive capability. Optimal designs were used to statistically derive sets of calibration standards within the factor space (i.e., concentration range) by estimating the parameters without personal bias and with minimum variance. Two-component (i.e., NaNO3 and HNO3) D- and I-optimal designs were built with the Design of Experiments toolkit in the Unscrambler software package by Camo Analytics (version 11.0.5.0). Optimal design points were selected using “best exchange”, which searches the entire design space using both point and coordinate exchange and a quadratic process order. These designs comprised six required model points and 10 lack-of-fit points. Optimal designs can be augmented with lack-of-fit points, which are chosen based on their ability to maximize the minimum distance between runs while conserving optimality and improving precision. The 10 lack-of-fit points from an additional D-optimal design were used as the validation set to test the predictive capability of each PLSR model. I- and D-optimal designs were evaluated using the fraction of design space.36
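A quadratic model in two factors has six parameters, which is consistent with six required model points. The sketch below shows a plain point-exchange search that maximizes the log determinant of the information matrix over a candidate grid. It is a generic illustration, not the "best exchange" algorithm of the commercial software; the coded units (-1 to 1), the 5 x 5 candidate grid, and the small ridge term are assumptions made for the example.

```python
import numpy as np
from itertools import product

def quadratic_model_matrix(points):
    """Columns: 1, x1, x2, x1*x2, x1^2, x2^2 (six model parameters)."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones(len(points)), x1, x2, x1 * x2, x1**2, x2**2])

def log_det_information(points, ridge=1e-8):
    F = quadratic_model_matrix(points)
    # The small ridge keeps the criterion finite for rank-deficient trial designs
    return np.linalg.slogdet(F.T @ F + ridge * np.eye(F.shape[1]))[1]

def d_optimal_exchange(candidates, n_runs, seed=0, max_passes=50):
    """Swap one design point at a time for any candidate that raises log det(F'F)."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), size=n_runs, replace=False))
    best = log_det_information(candidates[idx])
    for _ in range(max_passes):
        improved = False
        for pos in range(n_runs):
            for cand in range(len(candidates)):
                trial = idx.copy()
                trial[pos] = cand
                value = log_det_information(candidates[trial])
                if value > best + 1e-9:
                    idx, best, improved = trial, value, True
        if not improved:                       # no single swap helps: local optimum
            break
    return candidates[idx], best

# Candidate set: 5 x 5 grid in coded units (hypothetical resolution)
candidates = np.array(list(product(np.linspace(-1, 1, 5), repeat=2)))
design, logdet = d_optimal_exchange(candidates, n_runs=6)
```

Coded values can be mapped back to concentrations linearly (e.g., -1 and 1 to the lower and upper limits of each component). An exchange search finds a local optimum, so in practice several random starts are usually compared.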
Absorbance Spectroscopy
A StellarNet Dwarf Star spectrometer with an optical resolution of 2.5 nm and the SL5 Tungsten halogen + deuterium lamp (UV and visible range from 190 to 2500 nm) were used to collect the NIR spectra. NIR spectra included absorbance measurements recorded every 2 nm from 860 to 1760 nm. Each spectrum was an average of five scans and spectra were recorded in triplicate for each sample. The spectrometer was referenced to pure water before collecting each spectrum unless otherwise stated. Obtaining reference spectra in air is challenging because of air bubbles and is prone to contamination of the glass windows by precipitates that form upon drying. Thus, obtaining reference spectra of water (blank in water) is better for the remote processing applications.
A cuvette, purchased from Starna Cells, Inc. (584.4-Q-1) with a 1 mm path length, was used for each measurement to ensure consistent optical quality. The Starna cuvette was thoroughly rinsed with DI water, stored containing DI water on lint-free Kimwipes, and always handled using dust-free latex gloves. The cuvette’s Z height of 8.5 mm was necessary to accommodate Quantum Northwest’s qpod 2e temperature-controlled sample compartment holder purchased from Avantes (CUV UV–Vis TC). Two quantum cascade laser–UV collimating lenses were placed on opposite sides of the sample compartment. NIR measurements were performed at a constant temperature (21.5 ± 0.05 ℃). Reference and sample solutions were thermally equilibrated in the cuvette’s 21.5 ℃ temperature-controlled environment for 1 min before recording the spectrum. No spectral variations due to temperature fluctuation were observed after this time frame. A syringe was used to inject the rinse, reference, and sample solutions. To reduce the effect of lamp and detector fluctuations on spectral signatures, reference spectra were collected between each sample measurement.
Multivariate Data Analysis
Principal component analysis (PCA)37 and PLSR analysis were performed using the Unscrambler X (v.10.4) software package from CAMO Software AS. PLSR and PCA models were built from spectra collected on stationary samples. Spectra were mean-normalized (each column was divided by its mean value) before PCA and PLS analyses to assign a comparable absorption coefficient to each species and equalize the influence of each wavelength. PLSR models were optimized independently by minimizing the root mean square error of the calibration (RMSEC) and RMSE of the cross-validation (RMSECV). A full cross-validation (CV) was performed by randomly taking one sample at a time from the calibration set until every sample was left out once and recalibrating sub-models on the remaining data points. The residuals from each sub-model were combined to compute the CV residual variance, which is an estimate of the residuals (i.e., uncertainty) in the predictions (RMSE of the prediction, or RMSEP). The performance of each regression model will be discussed in the Results and Discussion section.
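The full (leave-one-out) cross-validation described above can be sketched as follows. To keep the example short, an ordinary least-squares fit stands in for the PLSR model; the stand-in regressor, the function names, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

def fit_linear(X, y):
    """Stand-in regressor: least squares with an intercept (not PLSR)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def loo_rmsecv(X, y, fit=fit_linear, predict=predict_linear):
    """Leave each sample out once, refit on the rest, and pool the residuals."""
    residuals = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit(X[keep], y[keep])
        residuals.append(predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(residuals))))

# Synthetic calibration data (hypothetical)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (16, 2))
y = 0.5 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.05, 16)

rmsec = rmse(y, predict_linear(fit_linear(X, y), X))   # error on the fitted samples
rmsecv = loo_rmsecv(X, y)                              # error on left-out samples
```

The order in which samples are left out does not affect the pooled result, since every sample is excluded exactly once.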
The PLSR models were built from different training sets selected by D- and I-optimal designs and the OFAT approach. The selected samples are listed in Tables S1 to S3 (Supplemental Material). The PLSR models were also built from D- and I-optimal lack-of-fit points included by random order. Models were validated against a validation set comprised of lack-of-fit points from a D-optimal design (see Table S4).
NIR spectra are commonly preprocessed using a variety of transformations, including smoothing, mean centering, standard normal variate analysis, autoscaling, and derivatives, to make the signatures more suitable for chemometric analysis and easier to discern and interpret.38 For this data set, a first-order Savitzky–Golay smoothing algorithm with an 11-point window size was applied. After smoothing the spectra, a second-order Savitzky–Golay algorithm with an 11-point window size was also used to compute the first derivative of each spectrum to correct for baseline offsets. This algorithm takes the derivative of a polynomial fitted by least squares regression to the center point and the five adjacent variables on each side. All transformations were calculated using the Unscrambler. The wavelength range of 1280 to 1650 nm was used for each calibration model.
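A minimal sketch of the two Savitzky–Golay steps is given below, with the filter weights derived directly from a least-squares polynomial fit over the window. The helper names and the synthetic band are hypothetical, and edge points are simply left undefined here, which may differ from how the commercial software treats the ends of a spectrum.

```python
import math
import numpy as np

def savgol_coeffs(window, polyorder, deriv=0, delta=1.0):
    """Filter weights from a least-squares polynomial fit over the window."""
    half = window // 2
    k = np.arange(-half, half + 1, dtype=float)
    A = np.vander(k, polyorder + 1, increasing=True)    # columns k^0, k^1, ...
    # Row `deriv` of the pseudoinverse yields the fitted coefficient a_deriv;
    # the derivative at the window center is deriv! * a_deriv / delta^deriv
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv) / delta ** deriv

def savgol_apply(y, weights):
    """Apply the weights at interior points; edge points are left as NaN."""
    half = len(weights) // 2
    out = np.full(len(y), np.nan)
    # np.convolve flips its kernel, so reverse the weights to get a correlation
    out[half:len(y) - half] = np.convolve(y, weights[::-1], mode="valid")
    return out

wavelengths = np.arange(1280, 1652, 2.0)                 # nm, 2 nm spacing
band = np.exp(-((wavelengths - 1440) / 45.0) ** 2)       # hypothetical band shape
offset_band = band + 0.3                                 # same band with a baseline offset

smooth_w = savgol_coeffs(11, 1)                          # first-order smoothing, 11 points
deriv_w = savgol_coeffs(11, 2, deriv=1, delta=2.0)       # second-order first derivative
deriv_a = savgol_apply(savgol_apply(band, smooth_w), deriv_w)
deriv_b = savgol_apply(savgol_apply(offset_band, smooth_w), deriv_w)
```

With a symmetric window, first-order smoothing reduces to an 11-point moving average, and the first derivative removes a constant baseline offset, so the two derivative spectra coincide wherever both are defined.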
Model Comparison
Proper validation is important to test the dependence of the model on “unknown” samples and evaluate the predictive power of the regression models. RMSEs for the calibration, validation, and prediction were calculated using Eq. 1.
The Tukey–Kramer method was used for the pairwise comparison of RMSEPs of each PLSR model assuming the null hypothesis H0: µi = µj. The RMSEP was separated into bias and standard error of prediction (SEP) using Eqs. 3 and 4, respectively, and compared using a 95% t confidence interval following a method outlined previously.18,41,42
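The equations cited in this section are not reproduced in this copy of the text. The expressions below are the standard definitions of RMSE, bias, and SEP, given here on the assumption (not confirmed by the source) that they correspond to Eqs. 1, 3, and 4. The symbols are introduced for this sketch: y_i is the reference concentration, \hat{y}_i the predicted concentration, e_i the prediction error, and n the number of samples.

```latex
\begin{align}
  e_i &= \hat{y}_i - y_i \\
  \mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}} \\
  \mathrm{bias} &= \frac{1}{n}\sum_{i=1}^{n} e_i \\
  \mathrm{SEP} &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(e_i - \mathrm{bias}\right)^{2}}
\end{align}
```

Under these definitions the three quantities are linked exactly by

```latex
\begin{equation}
  \mathrm{RMSEP}^{2} = \mathrm{bias}^{2} + \frac{n-1}{n}\,\mathrm{SEP}^{2},
\end{equation}
```

which is why a model can have a small SEP and still predict poorly if its bias is large.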
At the level α = 0.05, a type I error is made 5% of the time (i.e., H0 is falsely rejected). However, the critical t value must be adjusted when making multiple comparisons to account for α inflation and avoid significant results that may not be accurate statistical findings. Thus, the critical t value was adjusted using the Tukey–Kramer method to avoid making a type I error.43
Prediction biases between two models were compared at the 95% confidence interval, which was calculated using Eq. 5. The value se represents the standard error of the estimated difference where di is the difference in error between models being compared and
Equations 7 and 8 were used to evaluate the 95% t confidence interval for the model SEPs. The value r represents the correlation coefficient between each ei for any two models being compared. The confidence interval was calculated using Eq. 9. If the confidence interval of the SEP ratios corresponding to two models contained the integer one, they were not considered statistically different. The overall prediction performance of two models was considered statistically similar if the bias confidence interval contained 0 (Eq. 5) and the SEP ratio contained one (Eq. 9).
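Since Eqs. 5 to 9 are not reproduced in this copy, the sketch below uses a commonly cited paired-comparison formulation for bias and for the ratio of two SEPs. It is assumed, not confirmed, to match the method in the cited references. The error values are hypothetical, and the critical t values are the unadjusted two-sided 95% values for 9 and 8 degrees of freedom rather than Tukey–Kramer adjusted values.

```python
import math
from statistics import mean, stdev

def bias_interval(e1, e2, t_crit):
    """Interval for the mean paired difference in prediction error (n - 1 df)."""
    d = [a - b for a, b in zip(e1, e2)]
    se = stdev(d) / math.sqrt(len(d))          # standard error of the mean difference
    return mean(d) - t_crit * se, mean(d) + t_crit * se

def sep_ratio_interval(e1, e2, t_crit):
    """Interval for SEP1/SEP2 from correlated errors on the same samples (n - 2 df)."""
    n = len(e1)
    s1, s2 = stdev(e1), stdev(e2)              # SEP: SD of errors about their mean
    m1, m2 = mean(e1), mean(e2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(e1, e2)) / (n - 1)
    r = cov / (s1 * s2)                        # correlation between the two error sets
    K = 1 + 2 * (1 - r ** 2) * t_crit ** 2 / (n - 2)
    L = math.sqrt(K + math.sqrt(max(0.0, K * K - 1)))   # guard against rounding
    return (s1 / s2) / L, (s1 / s2) * L

# Hypothetical prediction errors of two models on the same 10 validation samples
errors_a = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01, 0.04, -0.03, 0.02, 0.01]
errors_b = [0.05, 0.02, 0.06, -0.04, 0.03, 0.07, -0.02, 0.04, 0.08, 0.01]

bias_lo, bias_hi = bias_interval(errors_a, errors_b, t_crit=2.262)      # df = 9
sep_lo, sep_hi = sep_ratio_interval(errors_a, errors_b, t_crit=2.306)   # df = 8
similar = (bias_lo <= 0 <= bias_hi) and (sep_lo <= 1 <= sep_hi)
```

The final line applies the decision rule stated in the text: two models are treated as statistically similar only if the bias interval contains zero and the SEP ratio interval contains one.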
The order by which the models were compared was carried out as follows: (i) the mean of all predicted
Results and Discussion
Near-Infrared Water Bands
Prominent water absorption bands occur at 760, 970, 1190, 1450, and 1940 nm and a weaker band occurs at 845 nm.9,10 Each band has a unique extinction coefficient and spectral features that depend on temperature and ionic strength. The first overtone of water is in the 1300–1600 nm region. It is derived from the main O–H stretching band in the middle IR (2700–3200 nm) and can be used to investigate H-bonding networks and polymerization in aqueous systems.11–14 Additional NIR regions (e.g., 1100–1300 nm and 1800–2100 nm) have also been used to study water structure.9,15–16 The intense NIR water bands centered near 1440 and 1920 nm are unusable with a standard path length (i.e., 1 cm) because of complete absorption of NIR radiation. The resulting spectra are noisy because two spectra with large absorbances are subtracted. However, these absorption bands are measurable using a smaller path length cuvette (e.g., ∼1 mm).
NIR spectra (900–1650 nm) corresponding to pure water, 1 M HNO3, and 1 M NaNO3 are shown in Fig. 1. The absorption bands at 970 and 1190 nm are evident but have low signal intensity with a 1 mm path length. The band centered near 1440 nm, assigned to the combination of symmetric and antisymmetric O–H stretching modes (first overtone), dominates the spectrum.14,15 Changes in this water band are evident with the addition of 1 M NaNO3 and 1 M HNO3 to the solution (see Fig. 1a) and the unique spectral characteristics indicate dissimilar interaction(s) between the dissociated species and water. When the spectrometer is referenced to water, changes in spectral intensity due to changes in water structure are visualized more clearly (see Fig. 1b). 1 M HNO3 decreases the net absorbance near the water band center (results in negative absorbance values) and increases the absorbance above 1538 nm. Relative to water, the 1 M NaNO3 spectrum increases the absorbance at 1417 nm and decreases the absorbance in the rest of the spectrum.
Figure 1. NIR absorbance spectra of DI water, 1 M HNO3, and 1 M NaNO3 in a cuvette (a) blanked in air and (b) blanked in water using a 1 mm path length. No absorption bands were observed in the 400–900 nm visible–NIR region using a 1 mm path length.
When the concentrations of NaNO3 and HNO3 were varied while maintaining constant total nitrate concentration, an isosbestic point appeared near 1470 nm. This is evident in a binary mixture diagram of these species in water (see Fig. S1, Supplemental Material). This point may shift in wavelength position depending on interactions with the solute and the type of inorganic salt.12 At least two distinct species are present in the system, which distinctly affect water structure and give rise to unique spectral characteristics (i.e., Na+ and H3O+). The isosbestic point is consistent with isosbestic points noted by others,8,9,12 who explain that it exists due to a shift in the concentration-dependent equilibrium between bonded and nonbonded OH valences. In this NaNO3/HNO3 system, hydrogen ions are “order-producing” since they can incorporate into “ice-like” water clusters.9 The structure-maker behavior of hydrogen ions increases the absorption in the range between 1450 and 1650 nm for HNO3, which indicates a shift to a higher amount of hydrogen-bonded water molecules (i.e., OH valences).12,13 Both NO3– and Na+ ions are order-destroying ions. Thus, a different spectral response is expected in the 1450–1650 nm region for NaNO3 and may be primarily attributed to Na+ ions, which decrease the amount of hydrogen-bonded OH valences.
This behavior is comparable to perturbations of the water band resulting from temperature fluctuations. With changing temperature, an isosbestic point of pure water occurs in the region between 1440 and 1446 nm.7,13 This spectral response is understood as an increase in the fraction of weakly bonded water molecules (left shoulder) and a decrease in the fraction of strongly bonded water molecules (right shoulder) resulting in an overall blueshift to shorter wavelengths (i.e., higher energy). This interpretation is supported by a two-state mixture model in which one component converts to another as a function of temperature.13,16
Principal Component Analysis
Principal component analysis is useful for compressing data by recognizing patterns including outliers, trends, and groups.35 It takes information from large data tables containing the original variables and projects them onto a smaller number of latent variables called principal components (PCs). Information is attributed to variables with systematic variation (e.g., spectral peaks) and variables with little or no variation are considered “noise” (e.g., baselines). PCs are computed iteratively and contain only a certain portion of the total information needed to describe the entire spectrum. Each subsequent PC contains less information than the previous one and can be used to determine the different sources of variance in the spectral data set.13
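The projection onto principal components can be sketched with a singular value decomposition of the mean-centered spectra. The synthetic data below are hypothetical: one band grows linearly with concentration and a second, weaker band grows quadratically, loosely mimicking a dominant linear trend plus a nonlinear contribution.

```python
import numpy as np

def pca(X, n_components):
    """PCA by singular value decomposition of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]      # sample coordinates on each PC
    loadings = Vt[:n_components]                          # wavelength weights of each PC
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]  # fraction of total variance
    return scores, loadings, explained

wavelengths = np.arange(1300, 1652, 2)
conc = np.linspace(0.1, 10.0, 12)                         # hypothetical concentrations (M)
band_a = np.exp(-((wavelengths - 1420) / 40.0) ** 2)
band_b = np.exp(-((wavelengths - 1560) / 50.0) ** 2)
X = np.outer(conc, band_a) + np.outer(0.02 * conc ** 2, band_b)

scores, loadings, explained = pca(X, n_components=2)
```

Because these noise-free data contain only two sources of variation, two PCs capture essentially all of the variance; with measured spectra, later PCs would mainly describe noise.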
Scores and loadings are interrelated and must be interpreted together. Line X-loading plots are useful for detecting important variables in spectra, and scores represent trends in the data. Figure 2 shows the loadings and scores for the first two PCs of a PCA of spectra corresponding to solutions containing varying HNO3 (0.1–10 M) concentrations in the spectral region 1300 to 1650 nm. The spectrometer was referenced to water for these measurements, but the scores and loadings are identical to those obtained from spectra that were referenced to air (data not shown here). The first and second PCs (PC-1 and PC-2) account for 99.97% of the total variance. Since these PCs capture such a large portion of the total variance, additional PCs comprise primarily noise. PC-1 accounts for 98.25% of the variance, which implies that PC-1 accounts for most of the spectral changes, with peaks at 1403 and 1457 nm and a broad negative component from 1550 to 1650 nm. The score vector of PC-1 is essentially a straight line (Fig. 2a), which indicates that the spectral changes described by this component occur in a nearly constant fashion. PC-2 accounts for another 1.74% of the total variance and corresponds to a unique loading that has a peak at 1428 nm and a broad shoulder from 1500 to 1650 nm. The score vector has a parabolic shape (see Fig. 2b), which indicates a spectral turning point near 4–5 M HNO3. The dissociation degree of HNO3 changes significantly above ∼4 M HNO3 when it no longer fully dissociates to H+ and NO3–, but associated HNO3 forms two strong hydrogen bonds with water molecules.44
This nonlinear dissociation behavior of HNO3 may be accounted for in part by PC-2 in the PCA model, which does not progress linearly with concentration and accounts for the nonlinear NIR spectral response. These scores and loadings are like those derived by a PLSR analysis of the same data set (data not shown here).
Figure 2. (a) Scores and (b) loadings of PCs 1 and 2 with varying HNO3 concentration (M).
The partial dissociation of HNO3 increases in a relatively linear fashion from ∼1 to ∼8.5 M HNO3.17 Thus, we would expect a scores plot that corresponds to associated HNO3 molecules to increase nearly linearly with concentration (not parabolically). Additionally, NIR absorption arises from changes in the permanent dipole of water molecules, and these changes in dipole moments arise primarily because of the presence of ions (i.e., H+ and NO3–). The effect of associated HNO3 molecules on the NIR spectral response is expected to be minimal and may not be the species described by PC-2. Additional studies on this subject are needed to clarify the exact behavior described by PC-2.
Model Optimization
Each two-component training set spanned the same experimental range, with HNO3 and NaNO3 concentration ranging from 0.1 to 10 M and 0 to 1 M, respectively. The wavelength range that comprised the predictor matrix X in each PLS model was 1280–1650 nm. The OFAT design contained the most samples (25), whereas the D- and I-optimal designs comprised either six (required model points; abbreviated D-R and I-R) or 16 samples (required +10 lack-of-fit points; abbreviated D and I). Each PLSR model was optimized independently by minimizing the RMSECV and RMSEP. Preprocessing the data with a first-order Savitzky–Golay smoothing over an 11-point window and applying a second-order Savitzky–Golay first derivative with 11 smoothing points improved each model by minimizing both cross-validation (CV) errors and the RMSEP.
Selecting the correct number of factors is critical for PLS model performance. To determine the number of factors to include, the residual variances of each response variable (Y) were plotted as a function of factors in Fig. 3. The RMSE values plotted in this figure correspond to models built using training sets selected by OFAT (25 samples), D-R (six samples), and I-R designs (six samples). The optimal number of factors was selected by determining the points at which RMSECV reached a minimum. This point consistently occurred at five factors for each model representing the two-component systems (including the D- and I-optimal designs comprising the required +10 lack-of-fit points; data not shown here).
Figure 3. RMSEC and RMSECV versus the number of factors for selected PLSR models with respect to HNO3 and NaNO3 from (a) OFAT, (b) D-optimal required, and (c) I-optimal required designs.
Requiring more factors than the number of species (Na+, H+, and NO3–) in this system was anticipated since intermolecular interaction(s) exist between the dissociated ionic species and water molecules. Even at one factor, the RMSEC and RMSECV of the OFAT model did not deviate from one another despite the removal of different calibration standards in the CV. This implies that the calibration set contained redundant information (i.e., more samples than necessary). The RMSEC and RMSECV for the PLS models built from calibration sets selected by D- and I-optimal experimental designs deviated, which implies that these data sets did not contain redundant information.
Calibration and Prediction Performance
Table. Summary of PLSR model calibration and validation metrics for data sets derived from OFAT, D-optimal, and I-optimal approaches. Table footnotes: Savitzky–Golay smooth; Savitzky–Golay first derivative; R2 of the calibration.
The calibration and CV statistics of HNO3 and NaNO3 with respect to the D-R and I-R designs had much lower values than the models built from designs comprised of more samples. The calibration and CV values were roughly an order of magnitude less than the average of the same values reported for D, I, and OFAT designs. If only the calibration statistics were considered, one may conclude that these models performed better than the models with additional samples. However, PLSR models are considered acceptable when the RMSEC and RMSECV values are similar to the RMSEP values. The D-R and I-R RMSEPs for HNO3 and NaNO3 were much larger than the RMSEC and RMSECV values, which indicates that these models were not performing optimally. These designs did not have enough samples in the training sets to contribute sufficient variance to the PLSR model. Therefore, the biases were also larger than those of the D- and I-optimal models comprising 16 total samples. The bias did not seem to account for the appropriate amount of uncertainty in the model.21,22 The large difference between error statistics and RMSEPs highlights the risk of relying solely on error statistics for model optimization, particularly when developing calibration sets with few samples, and the need for properly validating the regression model(s).45
The OFAT regression model had the largest RMSEP for sodium nitrate, in comparison with the D- and I-optimal designs with fewer samples (16), despite having the smallest RMSECV. This indicates that even with many samples, RMSECV should not always be considered an accurate estimate of the prediction variance. Including more data points in a training set does not always result in a better regression, and redundant information in a calibration model may impede prediction performance. It could also suggest that the evenly spaced concentration matrix, which does not include samples at odd concentration values, may not account for the spectral variation or structure of the validation set that better represents the spectral variability of the factor space.
Statistical Comparison
The prediction performance of the optimized PLS models was compared pairwise to find statistical differences in RMSEP separately as bias and SEP (see Figs. 4 and 5). If the confidence interval for bias contained zero, the models produced similar results. When the confidence interval for SEP contained one, the estimates were considered statistically similar. If either the bias or the SEP intervals did not meet these criteria, the models were considered significantly different. The order in which the five designs were compared is listed in Tables S5 and S6 for HNO3 and NaNO3, respectively.
Figure 4. Confidence intervals for (a) bias and (b) standard deviation (SD; SEP) for the 10 possible comparisons (HNO3). If the confidence interval crosses the vertical solid line for both bias and SEP, the designs are statistically similar.
Figure 5. Confidence intervals for (a) bias and (b) standard deviation (SD; SEP) for the 10 possible comparisons (NaNO3). If the confidence interval crosses the vertical solid line for both bias and SEP, the designs are statistically similar.

The confidence intervals for bias and SEP differences for HNO3 comparisons are shown in Fig. 4. OFAT (25 samples) and the D-optimal design comprising 16 samples were statistically similar in terms of predictive capability for HNO3 concentration. The predictive capability of the D-optimal design was significantly different from that of the other three designs. The D-R model (six samples) performed statistically similarly to the OFAT model but was more biased than the D-optimal model. With respect to HNO3, validation statistics revealed that the prediction performance of the D-R and I-R PLS models was inferior to the optimal designs containing additional lack-of-fit points. The I-R design had the largest bias and SEP. However, these designs contained fewer samples and more efficiently used resources. This criterion has significant importance for the intended application. The resulting rank order of HNO3 prediction performance is: D > OFAT = D-R > I > I-R.
The confidence intervals for bias and SEP differences for NaNO3 are shown in Fig. 5. D-R (six samples) and the D-optimal design (16 samples) were statistically similar for NaNO3 concentration. The PLS models built from these training sets performed significantly better statistically than the other three designs. I-R design predictions had a higher RMSEP prediction error for both HNO3 and NaNO3 than every other model. The I-R design also had a larger prediction NaNO3 bias than any other design. The resulting rank order of NaNO3 prediction performance is: D = D-R > I > OFAT > I-R.
Optimal designs are generated iteratively by algorithms that optimize a numerical criterion (e.g., the D- or I-criterion). The D- and I-criteria are common criteria relating to the variance of factor effects and the precision of predictions, respectively.30 I-optimal designs (also called integrated variance designs) select points that minimize the integral of the prediction variance across the design space. Even though the I-optimal design minimizes the average scaled prediction variance over the design space, it did not perform as well as the D-optimal design, which estimates the effects of the factors by maximizing the determinant of the information matrix X′X of the design. The prediction performance differed significantly because of the variability of the design structure. This result is contrary to previous findings, in which PLS models built from an I-optimal design performed best.32 The D-optimal spectral design approach was as robust as the OFAT approach with respect to HNO3 and more robust than OFAT with respect to NaNO3 predictions despite using 76% fewer calibration samples. This result highlights the significant reduction of raw materials and resources required to build a calibration set.
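For reference, the two criteria can be written in a standard textbook form; this is a sketch and is not taken from this study. The notation is assumed for illustration only: X is the N × p model matrix of the design, f(x) is the model expansion of a point x, and R is the design region.

```latex
\begin{align}
\text{D-optimal:}\quad & \max_{X}\; \det\!\left(X^{\mathsf{T}} X\right) \\
\text{I-optimal:}\quad & \min_{X}\; \frac{1}{\int_{R} \mathrm{d}x}
  \int_{R} f(x)^{\mathsf{T}} \left(X^{\mathsf{T}} X\right)^{-1} f(x)\, \mathrm{d}x
\end{align}
```

The integrand is the prediction variance at x relative to the error variance, so the I-criterion averages prediction uncertainty over the region. Maximizing the determinant instead shrinks the joint confidence region of the coefficient estimates.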
The D-R and D-optimal models produced statistically similar validation results with respect to sodium nitrate. The D-optimal design performed better overall because it was statistically less biased than the D-R design with respect to HNO3. In balancing efficiency with performance, however, one could argue that because the D-R set contained 62.5% fewer samples, that model would ensure the efficient use of resources while providing suitable prediction performance (RMSEP). The drawback is that the D-R model would underestimate the error in its predictions because its RMSECV is much lower than its RMSEP. Additional design points could be added to achieve the appropriate performance.
Several considerations for selecting an experimental design include the intended use of the model, the variability encountered during prediction, and the efficient use of time and raw materials. For example, if the intended use is simply to guide operations at a minimum cost, then fully vetted models are not necessary, and models with the fewest samples and a comparable RMSEP would be desirable. Under these criteria, the D-R calibration model would likely be the superior candidate. When robust model performance and error statistics are essential, the RMSECV needs to be comparable to the RMSEP, and additional samples (i.e., lack-of-fit points) should be included in the D-R model. To determine whether including 10 lack-of-fit points was redundant, the number of design points was varied systematically.
RMSE Lack-of-Fit Point Comparison
The PLSR models were generated from 30 D-optimal design-based calibration sets, each containing a varying number of lack-of-fit points, to determine the optimal number to include in the training set. A design balancing efficiency and performance would minimize resource consumption and achieve the appropriate level of deviation in predicted concentrations. Lack-of-fit points are included in experimental designs to increase the fraction of design space and improve the precision of a model.36 Herein, they were included to improve the variance in PLSR models. As a rule of thumb, PLS models perform well when the RMSEP is ≤10% of the midpoint between the highest and lowest concentrations in the concentration matrix. By this metric, an RMSEP of 0.5 for HNO3 and 0.05 for NaNO3 would be considered reasonable (RMSEP% ∼10).
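The rule of thumb above amounts to simple arithmetic, sketched below under the assumption that RMSEP% is the RMSEP expressed as a percentage of the midpoint of the concentration range. The function names are hypothetical; the concentration ranges are those given for this study (0.1–10 M HNO3 and 0–1 M NaNO3).

```python
def rmsep_threshold(low, high, percent=10.0):
    """Largest acceptable RMSEP: `percent` of the range midpoint."""
    return percent / 100 * (low + high) / 2


def rmsep_percent(rmsep, low, high):
    """RMSEP expressed as a percentage of the range midpoint."""
    return 100 * rmsep / ((low + high) / 2)


# Concentration ranges (mol/L) of the two analytes.
ranges = {"HNO3": (0.1, 10.0), "NaNO3": (0.0, 1.0)}

for analyte, (low, high) in ranges.items():
    limit = rmsep_threshold(low, high)
    print(f"{analyte}: RMSEP should be <= {limit:.3f} M")
```

For HNO3 the midpoint is 5.05 M, so the threshold is 0.505 M, which rounds to the 0.5 quoted in the text.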
To determine how many lack-of-fit points needed to be incorporated in the PLS model to account for the discrepancy between RMSECV and RMSEP of the D-R model, lack-of-fit points were randomly ordered three times and included in the regression model one at a time (see Table S7). RMSEC, RMSECV, and RMSEP were calculated for each model, and one standard deviation was reported across the three models calculated with the same number of lack-of-fit points (Fig. 6). Including lack-of-fit points in the training set used to derive the PLS models increased the RMSEC and RMSECV of the PLS models such that these values were comparable to the RMSEP. The RMSEP of HNO3 is nearly double the RMSEC and RMSECV. However, this value is comparable to the model error statistics and can be considered robust since the RMSEP is far less than an RMSEP% of 10. The RMSEP for NaNO3 is nearly identical to the RMSEC and RMSECV values with three or more lack-of-fit points included in the model. With the addition of more than three lack-of-fit points, the model’s RMSECV did not improve significantly.
Figure 6. RMSEC, RMSECV, and RMSEP for (a) HNO3 and (b) NaNO3 versus the number of lack-of-fit points included in PLS regression models. Error bars represent one standard deviation of three data points.
The RMSECV did not improve significantly with additional lack-of-fit points, which provides another indication that including more samples in the training set will not necessarily improve the PLS models. Inclusion of irrelevant variables often weakens model performance; even though the model may be less biased and adapt better to training data, it often contains more variance. The three lack-of-fit points that appear best to include in the D-R model are shown in Table S8. This training set, comprising nine samples, would achieve the desired level of prediction performance and correctly approximate the error in the predicted concentrations (RMSEP of 0.122 for HNO3 and 0.038 for NaNO3). The model prediction performance of this design is statistically equivalent for HNO3 and NaNO3 to the D-optimal training set comprising 10 lack-of-fit points (data not shown here) but comprises 44% fewer samples. Comparison plots of the predicted and reference values for HNO3 and NaNO3 are shown in Figs. S2 and S3, respectively, to show how the model performed at high, low, and mid-point ranges of the factor space.
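One reading of this procedure is sketched below: the 10 lack-of-fit points are shuffled three times, and for each ordering the first k points are appended to the six-sample D-R set, giving 3 × 10 = 30 nested calibration sets. The sample labels, function names, and random seed are hypothetical. The placeholder error function only stands in for fitting a PLS model and computing RMSEC, RMSECV, or RMSEP, which is not reproduced here.

```python
import random
import statistics


def nested_training_sets(base_points, lof_points, n_orderings=3, seed=0):
    """Map k -> list of training sets, one per random ordering, where each
    set is the base design plus the first k lack-of-fit points."""
    rng = random.Random(seed)
    sets_by_k = {k: [] for k in range(1, len(lof_points) + 1)}
    for _ in range(n_orderings):
        order = lof_points[:]          # copy so the input list is not shuffled
        rng.shuffle(order)
        for k in range(1, len(order) + 1):
            sets_by_k[k].append(base_points + order[:k])
    return sets_by_k


def summarize(sets_by_k, error_fn):
    """Mean and one sample SD of error_fn across the orderings at each k."""
    summary = {}
    for k, training_sets in sets_by_k.items():
        errors = [error_fn(ts) for ts in training_sets]
        summary[k] = (statistics.mean(errors), statistics.stdev(errors))
    return summary


# Hypothetical labels: six D-R design points and ten lack-of-fit points.
base = [f"DR{i}" for i in range(1, 7)]
lof = [f"LOF{i}" for i in range(1, 11)]
sets_by_k = nested_training_sets(base, lof)


def placeholder_error(training_set):
    """Stand-in for 'fit a PLS model on this set and return an RMSE'."""
    return 1.0 / len(training_set)


summary = summarize(sets_by_k, placeholder_error)
total_sets = sum(len(sets) for sets in sets_by_k.values())
print(f"calibration sets generated: {total_sets}")
```

Replacing `placeholder_error` with a function that fits a model and returns an RMSE would give the mean and one-standard-deviation values at each number of lack-of-fit points.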
Future Studies and Applications
The nuclear industry has used analytical approaches including inductively coupled plasma mass spectrometry, thermal ionization mass spectrometry, and acid/base titrations for decades. Although these techniques are highly accurate and selective, they are resource-intensive and relatively time-consuming (i.e., analysis requires hours to days), and they often carry error associated with large dilutions of 1000 to 10 000 times. They also require withdrawing individual samples and transferring them to different locations for analysis. To overcome the time delay between grab sample submission and the reporting of the results, remote optical spectroscopic measurements can be made in-line using high-transmission optical fibers. Applying optical spectroscopy for process control in nuclear technology has received much attention in recent decades, but much work is still required.1,4 Research focused specifically on addressing the difficulty in transferring data collected in a glove box to the hot cell would be valuable. Future studies will compare the performance of models built from an optimized training set that is analyzed directly in a hot cell with that of models derived by means of a calibration transfer function.34,35 This work could also include expanding the training set to include other metal nitrate species (e.g., uranyl nitrate) and fluctuations in temperature. A higher order model (e.g., cubic) may be necessary to approximate the true response surface of training sets with a larger number of factors.
Conclusion
Results of this study indicate that optical NIR absorptivity spectra of the water band centered at 1440 nm and PLSR analysis may be used for quantitative measurements of HNO3 and nitrate salts over a range of concentrations relevant to critical processing stages within the nuclear fuel cycle. PCA of NIR spectra identified a subtle spectral feature that accounted for the nonlinear spectral response due to the incomplete dissociation of nitric acid to H+ and NO3– at concentrations >1 M HNO3. This work also demonstrated the utility of experimental design for generating training sets that minimize time and materials while simultaneously maintaining prediction performance. For this system, designed experiments more efficiently select correlation(s) between two factors (i.e., analytes) on a response (i.e., spectrum) than OFAT experiments. When minimizing the number of samples in a PLSR model, it is important not to rely solely on RMSECV statistics to determine model performance. Considering calibration error statistics alone may result in models with excellent calibration statistics but inadequate prediction performance. Proper validation is also essential when optimizing models using a reduced number of calibration samples.
Future work will test this approach on a system that includes additional factors (e.g., other metal cation nitrates and variable temperature) relevant to radiochemical processing applications. Calibration models requiring less time and fewer resources than OFAT approaches, while adequately capturing the structured variation of a data set, could have a tremendous effect on laboratory- and industrial-scale applications. These results are promising for applications in harsh and restrictive work environments such as radiochemical hot cells, where calibration transfer functions may be difficult to implement and minimizing resource consumption is crucial.
Supplemental Material
Supplemental material, sj-pdf-1-asp-10.1177_0003702820987281 for Chemometrics and Experimental Design for the Quantification of Nitrate Salts in Nitric Acid: Near-Infrared Spectroscopy Absorption Analysis by Luke R. Sadergaski, Gretchen K. Toney, Laetitia H. Delmau and Kristian G. Myhre in Applied Spectroscopy
Footnotes
Disclaimer
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.
Acknowledgments
The work performed was supported by the 238Pu Supply Program at the US Department of Energy’s Oak Ridge National Laboratory.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this program was provided by the Science Mission Directorate of the National Aeronautics and Space Administration and administered by the US Department of Energy, Office of Nuclear Energy, under contract DE-AC05-00OR22725. This work used resources at the High Flux Isotope Reactor, a Department of Energy Office of Science User Facility operated by Oak Ridge National Laboratory.
Supplemental Material
All supplemental material mentioned in the text is available in the online version of the journal.