Abstract
Lithium-ion battery-based energy storage systems have been widely utilized in many applications such as transportation electrification and smart grids. As a key health status indicator, battery capacity strongly affects battery performance and is easily influenced by various electrode formulation parameters within a battery. Owing to the strongly coupled electrical, chemical, and thermal dynamics, predicting battery capacity and analysing the local effects of the parameters of interest within a battery is important but challenging. This article proposes a data-driven method to achieve effective battery capacity prediction as well as local effects analysis. The solution is derived by using generalized additive models (GAMs) with different interaction terms. A comparison study illustrates that the proposed GAM-based solution is capable of not only performing satisfactory battery capacity predictions but also quantifying the local effects of five important battery electrode formulation parameters as well as their interaction terms. Owing to its data-driven nature and explainability, the proposed method could benefit battery capacity prediction in an efficient manner and facilitate battery control for many other energy storage system applications.
Introduction
Lithium-ion (Li-ion) batteries have been widely utilized as a popular energy storage system for many clean applications such as transportation electrification (Su et al., 2011) and smart grids (e.g. virtual power plants, microgrids) due to merits such as high energy density and low self-discharge rates (Li et al., 2020; Wang et al., 2020). As a key health state indicator that influences battery performance, battery capacity is affected by numerous components within a battery. Therefore, to monitor the Li-ion battery health state for lifecycle economic analysis and wider energy storage applications (Chen et al., 2018; Su and Wang, 2012), it is vital to perform effective battery capacity predictions and analyse the local effects of the battery components of interest.
However, as a highly complex energy storage technology, a battery exhibits many electrical, chemical, and mechanical behaviours during operation, which could involve over 100 parameters in total (Su et al., 2017). The strongly coupled interdependency and highly nonlinear dynamics make the local effects of different cell components on battery capacity challenging to capture. Currently, the main solutions for understanding the effects of battery component parameters, particularly those for electrode formulation, on battery performance are still based on trial-and-error methods (Kwade et al., 2018), which are laborious and time-consuming. In light of this, deriving a suitable data analysis solution that could directly quantify the local effects of the battery components of interest and accurately capture battery capacity information at the early-prediction stage becomes crucial in battery-based energy storage applications.
With the rapid developments of control theory, system simulation, and artificial intelligence, data analysis solutions have been utilized as a powerful tool in the battery management domain (Hu et al., 2019; Li et al., 2019; Tang et al., 2021). Many data-driven strategies have been developed to estimate battery dynamics (Feng et al., 2020; Meng et al., 2018, 2021), forecast battery degradation in both cycling (Li et al., 2019; Tang et al., 2019, 2020) and calendar modes (Hu et al., 2020), diagnose different faults of batteries (Ouyang et al., 2019; Yang et al., 2018), balance battery cells within a pack (Feng et al., 2020; Liu et al., 2020), and achieve reliable charging control (Ban et al., 2021) and energy management (Li et al., 2019; Liu et al., 2018). In short, more efficient and smarter battery management can be achieved with suitably designed data-driven strategies, from which system-level applications that adopt machine learning methods could also benefit (Chen and Su, 2018; Chen et al., 2021; Ye et al., 2021). However, these strategies mainly focus on improving the final macroscopic behaviours of batteries, with little attention to the component formulation parameters within a battery. In fact, few works have so far designed data analysis solutions to explore the underlying mapping among different battery component formulation parameters. For instance, based on the cross-industry standard process (CRISP), a neural network-based solution and a decision tree-based solution are proposed in Schnell et al. (2019) and Turetskyy et al. (2020) to explore dependencies of battery components and forecast battery maximal capacity, respectively. In Kornas et al. (2019), by developing multivariate process capability indices, a data-driven approach is proposed for evaluating battery production data and achieving quality assurance of the relevant process.
After selecting four key parameters from the electrode mixing and coating stages, Gaussian process regression-based and random forest-based data analysis solutions are designed in Liu et al. (2021a, 2021b) to classify electrode mass loading and perform sensitivity analyses of these parameters. In real battery applications, these component parameters also play a pivotal role in affecting battery performance, including capacity. It is thus meaningful to derive a suitable data analysis solution to understand the local effects of this type of battery parameter.
Based upon the aforementioned considerations, this paper proposes a data analysis solution to predict battery capacity, where the local effects of the electrode formulation parameters of interest are also quantified and analysed. The main contributions are as follows: (1) After identifying five key electrode formulation parameters for Li-ion batteries, a generalized additive model (GAM)-based solution is proposed to predict battery capacity. (2) Based upon the explainability of the GAM, the local effects of these five parameters and their interaction terms are captured and quantified. (3) The performance of the GAM-based data analysis solution is verified based on three GAM cases with different numbers of interaction terms. The derived data analysis solution could not only perform satisfactory battery capacity predictions but also quantify the local effects of five important battery electrode formulation parameters as well as their interaction terms, paving a promising way to understand battery components and obtain battery capacity information at an early stage.
The rest of this paper is organized as follows: Section 2 describes the battery electrode component formulation and related capacity. Section 3 details the fundamentals of the GAM, the workflow for designing an explainable GAM to analyse local effects, and the performance metrics used to evaluate capacity prediction. Section 4 gives the results and discussions of capacity prediction and local effects analysis. Finally, Section 5 concludes the present work.
Battery electrode component formulation and related capacity
As a popular energy storage system in many applications, a Li-ion battery generally consists of an electrolyte and two electrodes, namely the anode and the cathode. The component formulation of the battery electrode plays a vital role in determining the properties of the battery product such as its capacity and service life, further affecting the relevant behaviour of battery-based energy storage applications. In this context, the battery electrode must be well designed and analysed to ensure an affordable battery with high quality and good performance.
To ensure the effectiveness of a battery, several components are usually required within a battery electrode, as illustrated in Figure 1. Specifically, these include the active material component, the electrode additive component, and the polymeric binder component. For real energy storage applications, LiFePO4 (LFP) is one of the most popular active materials for battery electrodes because it presents many advantages, such as non-toxicity and strong adaptability to extreme conditions such as high temperature and large current. Apart from the LFP active material, the electrode additive is another key component within a Li-ion battery. In general, conductive fillers including carbon black and carbon nanofiber (CNF) are required to increase the intrinsic electronic conductivity of the battery electrode. Besides, a polymeric binder is also needed to increase the mechanical cohesion of a battery. In the main applications of battery-based energy storage, three different kinds of polymeric binders, namely polyvinylidene-fluoride (PVDF), polyethylene co-ethyl-acrylate co-maleic-anhydride (TPE), and hydrogenated nitrile-butadiene-rubber (HNBR), are generally utilized due to their exceptional chemical stability and effective binding property. On the other hand, as a key health state indicator, battery capacity could be affected by the thermodynamics of the battery electrode and must be well monitored or predicted for real battery-based applications.

Figure 1. A schematic of the key electrode component parameters used to formulate a Li-ion battery cell for battery-based applications.
In this context, for real battery-based energy storage applications, all these battery component formulation parameters could significantly influence battery electrode behaviours such as electronic conductivity, further affecting battery capacity. A reliable data analysis solution that could predict battery capacity and analyse the local effects of these component formulation parameters on the related battery capacities is urgently required. To achieve this, an explainable data analysis framework based on the GAM is proposed in this study to not only achieve early-stage prediction of battery capacity but also analyse the local effects of the battery electrode component formulation parameters of interest on capacity. Without loss of generality, a well-designed battery dataset (Rynne et al., 2019) from Hawaii Natural Energy Institute Franco is adopted. Interested readers can find more details of the experiments used to generate this dataset, and an explanation of the dataset itself, in Rynne et al. (2019). To be specific, the battery electrode components of interest include the LFP-based active material, C65 grade carbon black, CNF with a diameter of 100 nm, and the binder, which comes in three different types: PVDF, TPE and HNBR. The detailed formulation weights of these battery electrode components are: 75% to 95% for LFP, 0% to 20% for C65, 0% to 10% for CNF, and 3% to 20% for Binder. The coulomb counting method with a C/25 discharging current rate is utilized to obtain battery capacity with a unit of mAh/g.
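The coulomb counting step mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used to build the dataset: the function name, the trapezoidal integration, the normalisation by active-material mass, and all numeric values are assumptions made for the example.

```python
import numpy as np

def specific_capacity_mah_per_g(time_h, current_ma, active_mass_g):
    """Coulomb counting: integrate discharge current over time, then
    normalise by mass (assumed here to be the active-material mass)."""
    time_h = np.asarray(time_h, dtype=float)
    current_ma = np.asarray(current_ma, dtype=float)
    # Trapezoidal integration of current (mA) over time (h) gives charge in mAh.
    charge_mah = np.sum(0.5 * (current_ma[1:] + current_ma[:-1]) * np.diff(time_h))
    return charge_mah / active_mass_g

# Hypothetical cell, for illustration only (not taken from the dataset).
nominal_capacity_mah = 1.5
active_mass_g = 0.01

# A C/25 rate means the full capacity is discharged in 25 hours.
time_h = np.linspace(0.0, 25.0, 251)
current_ma = np.full_like(time_h, nominal_capacity_mah / 25.0)

capacity = specific_capacity_mah_per_g(time_h, current_ma, active_mass_g)
print(f"specific capacity: {capacity:.1f} mAh/g")
```

With a constant current the integral reduces to current times duration, so the sketch simply recovers the assumed nominal capacity divided by the assumed mass.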
Methodology
In this section, the fundamentals of the generalized additive model are first described, followed by the derived generalized additive model-based framework for predicting battery capacity and analysing the local effects of the battery electrode component parameters of interest. Moreover, some performance metrics are introduced to investigate the battery capacity prediction performance of the derived model.
Generalized additive model
As a non-parametric extension of the generalized linear model, the generalized additive model (GAM) is a flexible regression model with the ability to achieve reasonable nonlinear fitting (Hastie and Tibshirani, 2017). By using univariate as well as bivariate shape functions of each predictor, the nonlinear behaviours of interpretable terms can be captured and then added to a GAM to reveal the complicated nonlinear underlying mapping. Supposing the response outputs
where
Apart from each solo parameter term, the interaction terms derived from the parameters of interest could also affect the prediction. To consider the effects of these interaction terms, bivariate shape functions that represent interaction effects could be added into the GAM as
where
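For reference, the two model forms described in this subsection are conventionally written as below. The notation is an assumption that follows the standard GAM literature and may differ from the symbols used by the authors: $y$ is the response (battery capacity), $x_i$ the $i$-th of $p$ predictors, $c$ an intercept, $f_i$ a univariate shape function, $f_{ij}$ a bivariate shape function, $\mathcal{I}$ the set of selected interaction pairs, and $\varepsilon$ a noise term. An identity link is assumed because capacity is a continuous response.

```latex
% GAM with univariate shape functions only (identity link assumed)
\begin{equation*}
  y = c + \sum_{i=1}^{p} f_i(x_i) + \varepsilon
\end{equation*}

% GAM extended with bivariate shape functions for selected interaction pairs
\begin{equation*}
  y = c + \sum_{i=1}^{p} f_i(x_i)
        + \sum_{(i,j) \in \mathcal{I}} f_{ij}(x_i, x_j) + \varepsilon
\end{equation*}
```

Because the prediction is a plain sum of terms, the contribution of each term at a given input can be read off separately, which is what makes the local effects analysis possible.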
GAM-based workflow
Owing to the straightforward structure and explainability of the GAM technique, the GAM structure shown in Figure 2 is adopted to predict battery capacity and analyse the local effects of the battery electrode formulation parameters of interest, while the detailed workflow for deriving the GAM-based model for battery capacity prediction and parameter local effect analysis is shown in Table 1.
Table 1. Detailed workflow to design GAMs for battery capacity prediction and parameter local effects analysis.

Figure 2. GAM-based regression structure for battery capacity prediction and parameter local effects analysis.
Following this workflow, the local effects of the solo battery electrode formulation parameters including LFP, C65, CNF, Binder, BinderType and their interaction terms could be obtained to reflect their influence on the predicted battery capacity. Note that the quantified local effects take different values for different query points. A parameter with a higher ranking value has a larger local effect on the battery capacity prediction.
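The idea of reading local effects off an additive model can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the shape functions, the intercept, and the query point are invented for the example and are not fitted to the dataset; in practice they would come from a trained GAM.

```python
# Toy shape functions, invented for illustration only (NOT fitted to the dataset).
# Each term maps a query point (a dict of parameter values) to its contribution.
toy_terms = {
    "LFP": lambda q: 0.5 * (q["LFP"] - 85.0),
    "C65": lambda q: -0.2 * (q["C65"] - 10.0),
    "CNF": lambda q: 0.3 * (q["CNF"] - 5.0),
    # A bivariate (interaction) term depends on two parameters at once.
    "LFP x CNF": lambda q: 0.02 * (q["LFP"] - 85.0) * (q["CNF"] - 5.0),
}
toy_intercept = 100.0  # hypothetical intercept

def local_effects(terms, query_point):
    """Evaluate every additive term at the query point."""
    return {name: term(query_point) for name, term in terms.items()}

def rank_by_magnitude(effects):
    """Order term names from largest to smallest absolute local effect."""
    return sorted(effects, key=lambda name: abs(effects[name]), reverse=True)

query_point = {"LFP": 90.0, "C65": 5.0, "CNF": 2.0}  # hypothetical query point
effects = local_effects(toy_terms, query_point)
ranking = rank_by_magnitude(effects)

# The additive prediction is the intercept plus the sum of all local effects.
prediction = toy_intercept + sum(effects.values())

for name in ranking:
    print(f"{name:10s} {effects[name]:+.2f}")
```

Ranking by absolute value keeps the sign of each effect available while still ordering terms by how strongly they move the prediction at that query point.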
Performance metrics
To systematically investigate the performance of the various designed GAMs for battery capacity prediction, three typical performance metrics are adopted in this study.
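1. Mean absolute error (MAE): Let $y_i$ and $\hat{y}_i$ denote the real test value and the predicted battery capacity of the $i$-th observation, respectively, and let $N$ be the number of observations. Then $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$.
2. Root mean square error (RMSE): as another classical performance metric, RMSE is generally utilized to reflect the deviations between predicted battery capacities and real test values as $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$.
3. R-Squared: let $\bar{y}$ be the mean of the real test values. Then $R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$.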
For real predictions, when prediction results get close to the real test values, both MAE and RMSE could get close to 0, while R-Squared would be close to 1.
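The three metrics can be computed with a few lines of NumPy. The sketch below uses the standard textbook definitions; the function names and the small example arrays are assumptions made for illustration and are not values from the study.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error: penalises large deviations more than MAE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

A perfect prediction gives MAE and RMSE of 0 and an R-Squared of 1, matching the limiting behaviour described above.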
Results and discussions
In this section, to investigate the performance of the derived GAMs with different interaction terms, the results of battery capacity prediction are first presented and discussed. Then the case studies of using these GAMs to perform local effects analysis of the battery electrode component parameters of interest are presented and discussed.
Battery capacity predictions
In this subsection, the case study of using the derived GAMs to predict battery capacity is carried out and analysed. As illustrated in Figure 2, for model development, five battery electrode parameters including LFP, C65, CNF, Binder and BinderType are utilized as the inputs to three GAMs with different interaction terms (no interaction terms, 3 interaction terms, and 5 interaction terms), while the related battery capacity is selected as the GAMs' output. Without loss of generality, six-fold cross-validation is carried out to evaluate the battery capacity prediction performance of all GAMs. The predicted versus actual plots obtained with these three GAMs are also presented.
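The six-fold cross-validation procedure can be sketched as follows. To keep the example self-contained, an ordinary least-squares model is used as a stand-in for the GAM, and the data are synthetic and noise-free; the helper names, the random seeds, and the data are all assumptions made for illustration.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, fit, predict, k=6, seed=0):
    """Return out-of-fold predictions: each sample is predicted by a model
    that was trained without it."""
    folds = k_fold_indices(len(y), k, seed)
    y_pred = np.empty(len(y), dtype=float)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = predict(model, X[test_idx])
    return y_pred

# Stand-in model (ordinary least squares), NOT a GAM.
def fit_linear(X, y):
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Synthetic, noise-free data for illustration only.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(60, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.25])

y_cv = cross_validate(X, y, fit_linear, predict_linear, k=6)
cv_rmse = float(np.sqrt(np.mean((y - y_cv) ** 2)))
print(f"cross-validated RMSE: {cv_rmse:.2e}")
```

Because the metrics are computed only on out-of-fold predictions, they estimate how the model behaves on formulations it has not seen during training.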
Figure 3 shows the battery capacity predictions of all three GAMs with various interaction terms, while Table 2 presents the related prediction performance metrics. From Figure 3, it can be observed that most predicted capacity points match the real test observations well for all three GAMs. Quantitatively, the GAM with no interaction terms achieves the worst prediction results, with an MAE of 1.61 mAh/g and an RMSE of 1.96 mAh/g. This implies that there exist interaction effects of electrode formulation parameters on battery capacity predictions. However, the R-squared value of the no-interaction case is still acceptable at 0.96. In comparison, the GAM with 3 interaction terms achieves the best prediction results, with an MAE of 1.10 mAh/g and an RMSE of 1.39 mAh/g, which are 31.7% and 29.1% lower than those of the GAM with no interaction terms. Interestingly, the battery capacity prediction results become worse when the number of interaction terms increases to 5: the MAE (1.18 mAh/g, a 7.2% increase) and RMSE (1.55 mAh/g, an 11.5% increase) are slightly larger than those of the 3-interaction-term case. This is mainly due to the overfitting caused by the additional interaction terms. However, the R-squared values of both GAMs with interaction terms are larger than 0.98, implying that a GAM with interaction terms is capable of providing satisfactory battery capacity prediction performance. In summary, after inputting these five battery electrode formulation parameters into the GAMs, reliable accuracy could be achieved for battery capacity predictions.
Table 2. Performance metrics for battery cell capacity prediction by using GAM models with various interaction terms.
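The relative changes quoted above can be reproduced from the rounded metric values as a quick check. The helper name is an assumption; the inputs are the MAE and RMSE values reported in the text. Note that the MAE increase from 3 to 5 interaction terms evaluates to roughly 7.3% from the rounded values, marginally above the quoted 7.2%, presumably because the quoted figure was computed from unrounded metrics.

```python
def relative_change_percent(reference, value):
    """Signed change of `value` relative to `reference`, in percent."""
    return 100.0 * (value - reference) / reference

# MAE and RMSE in mAh/g, as reported in the text (rounded values).
mae_none, rmse_none = 1.61, 1.96   # no interaction terms
mae_3, rmse_3 = 1.10, 1.39         # 3 interaction terms
mae_5, rmse_5 = 1.18, 1.55         # 5 interaction terms

# Improvement of the 3-term GAM over the GAM without interaction terms.
mae_gain = -relative_change_percent(mae_none, mae_3)
rmse_gain = -relative_change_percent(rmse_none, rmse_3)

# Degradation of the 5-term GAM relative to the 3-term GAM.
mae_loss = relative_change_percent(mae_3, mae_5)
rmse_loss = relative_change_percent(rmse_3, rmse_5)

print(f"MAE gain {mae_gain:.1f}%, RMSE gain {rmse_gain:.1f}%")
print(f"MAE loss {mae_loss:.1f}%, RMSE loss {rmse_loss:.1f}%")
```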

Figure 3. Battery cell capacity prediction results based on GAM models with various interaction terms: (a) no interaction terms, (b) 3 interaction terms, and (c) 5 interaction terms.
To further evaluate the deviations of the battery capacity predictions, the predicted versus actual plots (PVAPs) for all three GAM cases with different interaction terms are shown in Figure 4. Among the observations on the left or right of a PVAP, those farthest from the average value provide the most leverage and pull the prediction line towards themselves. A more accurate model should bring the observations closer to the perfect prediction line. According to Figure 4, the observations lie closer to the perfect prediction line for the GAM with 3 interaction terms, while a few observations lie away from the perfect prediction line for the GAM with no interaction terms, leading to the reduced accuracy of the related battery capacity predictions.

Figure 4. Predicted versus actual plots by using GAM models with various interaction terms: (a) no interaction terms, (b) 3 interaction terms, and (c) 5 interaction terms.
Local effects analysis
After deriving three GAMs with different interaction terms for battery capacity prediction, this subsection focuses on analysing the local effects of the five battery electrode formulation parameters of interest on battery capacity. Specifically, two randomly selected observations are utilized as the query points in this study to analyse these local effects, as detailed in Table 3.
Table 3. Case studies of randomly selecting query points for local effects analysis of parameters of interest.
Case study 1 with no interaction terms
We first analyse the local effects of the battery electrode formulation parameters based on the GAM without interaction terms. Using LFP, CNF, C65, Binder and BinderType as inputs to this GAM, their local effects for the two query points are quantified and illustrated in Figure 5(a) and (b), respectively. Although the local effect of the LFP term is positive for query point 1 and negative for query point 2, this term gives the largest local effect on capacity prediction for both query points, over three times larger than that of Binder (the second largest one). It is also interesting to note that although the quantified local effects of these five parameters differ between the query points, their amplitude trends are similar. Specifically, C65 and CNF give the third and fourth largest local effects on battery capacity prediction, while BinderType provides the smallest local effect. These results indicate that the five battery electrode formulation parameters of interest follow the same ranking of local effects on battery capacity prediction at both selected query points.

Figure 5. Local effects analysis of parameters for the case with no interaction terms: (a) query point 1 and (b) query point 2.
Case study 2 with 3 interaction terms
Next, to investigate the local effects of interaction terms derived from the five parameters, three interaction pairs, namely LFP-CNF, Binder-BinderType, and C65-CNF, are added as new terms to the GAM. After quantifying the local effect values of LFP, C65, CNF, Binder, BinderType and the 3 additional interaction pairs, their local effect rankings for the two query points are illustrated in Figure 6(a) and (b), respectively. Not surprisingly, the quantified local effect values of all five solo battery electrode formulation parameters match well with those of the GAM without interaction terms for both query points. Apart from the solo parameter terms, the interaction terms also contribute some local effects that improve the battery capacity prediction. Quantitatively, the LFP-CNF pair contributes a quantified local effect of 0.2, which is 90% and 400% larger than those of the Binder-BinderType and C65-CNF interaction pairs, respectively. It can be concluded that the five solo battery electrode formulation parameters have the dominant effects on battery capacity prediction, while LFP-CNF gives the largest interaction effect among all interaction pairs.

Figure 6. Local effects analysis of parameters for the case with 3 interaction terms: (a) query point 1 and (b) query point 2.
Case study 3 with 5 interaction terms
To further evaluate how other interaction terms affect battery capacity prediction, the five interaction pairs with the largest local effects are added to the GAM. According to the histogram shown in Figure 7, the quantified local effects of the five solo battery electrode formulation parameters are similar to those from case study 1 with no interaction terms. For the interaction pairs of C65-Binder and LFP-CNF, the local effects are relatively small and become negligible. This implies that adding these two interaction pairs would not cause large local effects on the prediction performance of the GAM. These quantified local effect results are useful because they match the conclusions from real battery experiments, and this study illustrates how a GAM-based framework is able to quantify the local effects of battery electrode formulation parameters and their interaction pairs of interest.

Figure 7. Local effects analysis of parameters for the case with 5 interaction terms: (a) query point 1 and (b) query point 2.
Conclusion
Li-ion battery-based energy storage systems have been widely used in many real industrial applications. As battery behaviour relies significantly on capacity dynamics, this article explores effective battery capacity prediction and local effects analysis of the battery electrode formulation parameters of interest. A data analysis solution based on GAMs with different interaction terms is proposed to not only predict battery capacity but also quantify the local effects of five battery electrode formulation parameters and their interaction terms. Illustrative results show that battery capacity could be well predicted, with an R-Squared value of at least 0.96, after using LFP, C65, CNF, Binder and BinderType as inputs to the GAM. LFP provides the largest local effect while BinderType gives the smallest effect for the two selected query points. The GAM with all five solo battery electrode formulation parameters and three interaction pairs (LFP-CNF, Binder-BinderType, and C65-CNF) gives the best prediction results. Because it is explainable and driven only by data, the proposed GAM-based solution is able to assist engineers in obtaining battery capacity information as well as understanding the local effects of the parameters of interest within a battery. When more relevant data from other types of battery become available in the future, the proposed solution could be easily extended to analyse and predict other battery properties, further benefitting the development of smarter battery-based energy storage systems.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by State Grid Corporation of China and the project ‘The key technologies’ research and application for aggregation control of virtual power plants in urban public buildings (52090R200005)’ and also supported by National Natural Science Foundation of China under Grant No.52107079 and No.51807023.
