Features selection in a predictive model for cardiac surgery-associated acute kidney injury

Abstract

Background

Cardiac surgery-associated acute kidney injury (CSA-AKI) is associated with increased morbidity and mortality. However, few studies have explored how different feature selection (FS) methods influence the performance of predictive models for CSA-AKI. Therefore, we aimed to compare the impact of different FS methods on the prediction of CSA-AKI.

Methods

CSA-AKI was defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. Both traditional logistic regression and machine learning methods were used to select the potential risk factors for CSA-AKI. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the models. In addition, a random forest importance matrix plot was used to rank the importance of the features.

Results

A total of 1977 patients undergoing cardiac surgery at Fuwai Hospital from December 2018 to April 2021 were enrolled. The incidence of CSA-AKI during the first postoperative week was 27.8%. The number of features entered affected the number finally selected: with every FS method, the more features were entered, the more were selected. In terms of performance, the features selected by each FS method yielded excellent AUCs. Measured against the LR method, the embedded method showed the highest selection accuracy and the filter method the lowest. Furthermore, NT-proBNP was found to be strongly associated with AKI. Our results confirmed some features that previous studies have reported and identified some novel clinical parameters.

Conclusions

In our study, FS was as suitable as LR for predicting CSA-AKI. For FS, the embedded method demonstrated better efficacy than the other methods. Furthermore, NT-proBNP was confirmed to be strongly associated with AKI.

Keywords

feature selection, machine learning, logistic regression, cardiac surgical procedure, acute kidney injury

Introduction

The incidence of cardiac surgery-associated acute kidney injury (CSA-AKI) varies between 7% and 40%, depending on the definition, patient characteristics, and the type of cardiac surgery.¹ This common and serious complication poses a great threat to patient outcomes^1–7 as well as the health care system,⁷ and early prediction and detection of AKI can bring major benefits.⁸ Currently, the Kidney Disease: Improving Global Outcomes (KDIGO) criteria are recommended for the diagnosis of AKI. However, these criteria are not suitable for timely prediction.^5,9 Several risk scores have been evaluated for risk stratification.^3,10–13 However, most of the potential risk factors in these models were analyzed with traditional logistic regression, which can only capture linear relationships between variables and the outcome.^2,4

With the rapid growth of data in electronic medical records (EMR), it is difficult to build precise models from high-dimensional data. Irrelevant and redundant features negatively affect accuracy.¹⁴ Machine learning, with its capacity to remove redundant attributes, can improve learning efficiency and optimize the performance of predictive models.¹⁵ Although an increasing number of studies have explored predictive models of CSA-AKI based on machine learning,¹⁶ there is no consensus on the appropriate feature selection (FS) method for CSA-AKI.⁵

This study aimed to compare the different FS methods based on machine learning with logistic regression (LR) from three aspects: predictive performance, outcome accuracy, and feature importance.

Materials & methods

Study population

This retrospective observational study was approved by the Ethics Committee of Fuwai Hospital in Beijing, China [2022-1086]. Patients who underwent cardiac surgery with cardiopulmonary bypass at our institution from December 2018 to April 2021 were enrolled. The requirement for informed consent was waived.

Data collection

Preoperative variables, including demographic characteristics, laboratory values, and medical and medication histories, were extracted. In addition, surgery time, CPB time, aortic clamp time, and surgery type were also extracted.

Study design

Two groups with different input feature sets were used to explore the impact of various FS methods on CSA-AKI. Group I, the more inclusive group, contained all extracted features. Group II retained only the features previously reported to be related to CSA-AKI.^3,6,17–20 LR and ML-based feature selection were performed in both groups to determine suitable variables. The whole process is shown in Figure 1.

Figure 1.

The flow chart shows the whole process of our study.

AKI definition

The primary outcome was postoperative AKI, defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria and determined from the maximal change in serum creatinine level during the first seven postoperative days.²¹ The baseline serum creatinine (SCr) level was defined as the most recent measurement before surgery, and an increase in serum creatinine by 1.5–1.9 times baseline or by ≥ 26.5 μmol/l (0.3 mg/dl) from baseline was identified as KDIGO-AKI. We did not use decreased urine output as an indicator because it is not specific and can be affected by various clinical variables; moreover, the definition of oliguria is not consistent, with cutoff values that vary between situations.²²
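The creatinine-based definition above can be expressed as a small labelling function. The following is a minimal sketch, not the study's code: the function name `is_kdigo_aki`, its arguments, and the example values are hypothetical. It assumes that any rise of at least 1.5 times baseline counts as AKI and that all supplied measurements fall within the first seven postoperative days; the 48-hour window that KDIGO attaches to the absolute-increase criterion is not modelled.

```python
from typing import Sequence

ABSOLUTE_RISE_UMOL_L = 26.5  # 0.3 mg/dl, as stated in the text
RELATIVE_RISE = 1.5          # lower bound of the 1.5-1.9x band in the text


def is_kdigo_aki(baseline_scr: float, postop_scr: Sequence[float]) -> bool:
    """Return True if the peak postoperative SCr meets either criterion.

    baseline_scr: most recent preoperative serum creatinine (umol/L).
    postop_scr:   measurements from postoperative days 1-7 (umol/L).
    """
    if baseline_scr <= 0:
        raise ValueError("baseline_scr must be positive")
    if not postop_scr:
        return False  # no postoperative measurement, nothing to compare
    peak = max(postop_scr)
    relative = peak / baseline_scr >= RELATIVE_RISE
    absolute = peak - baseline_scr >= ABSOLUTE_RISE_UMOL_L
    return relative or absolute


# Hypothetical patients, not taken from the study data.
print(is_kdigo_aki(80.0, [85.0, 110.0, 95.0]))  # peak rise of 30 umol/L
print(is_kdigo_aki(80.0, [82.0, 90.0]))         # peak rise of 10 umol/L
```

Using the peak value mirrors the "maximal change" wording of the definition; a stricter implementation would also track measurement times.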

Statistical analysis

Continuous variables were expressed as the mean ± standard deviation or as the median with interquartile range. Categorical variables were reported as frequency and percentage. Student’s t-test or the Mann–Whitney U test was used for continuous variables, and the χ² test or Fisher’s exact test was applied for categorical variables (p < .05 indicates statistical significance).
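As an illustration of these group comparisons, the sketch below applies the four tests with SciPy. It is a hedged example rather than the study's analysis script: the categorical counts are taken from Table 1 (gender and COPD), whereas the continuous variable is simulated because patient-level data are not available here.

```python
import numpy as np
from scipy import stats

# Gender counts from Table 1: rows = AKI / no AKI, columns = female / male.
gender_table = np.array([
    [228, 550 - 228],
    [518, 1427 - 518],
])
chi2, p_gender, dof, expected = stats.chi2_contingency(gender_table)
print(f"chi-square test, gender: p = {p_gender:.3f}")

# Fisher's exact test suits sparse tables such as COPD (1/550 vs 2/1427).
odds_ratio, p_copd = stats.fisher_exact([[1, 549], [2, 1425]])
print(f"Fisher's exact test, COPD: p = {p_copd:.3f}")

# Simulated continuous variable (NOT study data) for the two-sample tests.
rng = np.random.default_rng(0)
aki = rng.normal(loc=262.6, scale=86.5, size=550)
no_aki = rng.normal(loc=244.5, scale=76.4, size=1427)

t_stat, p_t = stats.ttest_ind(aki, no_aki)
u_stat, p_u = stats.mannwhitneyu(aki, no_aki, alternative="two-sided")
print(f"t-test: p = {p_t:.3g}; Mann-Whitney U: p = {p_u:.3g}")
```

Note that `chi2_contingency` applies a continuity correction to 2 × 2 tables by default, so its p value may differ slightly from one computed without the correction.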

Feature selection and model evaluation

Logistic regression was the traditional method for detecting the association between features and AKI (p < .1 indicates statistical significance). In contrast, ML was used as a novel method to select relevant features. Generally, ML-based feature selection can be divided into three main categories: filter, wrapper, and embedded methods¹⁴ (Figure 2). In this study, we compared the maximal information coefficient (MIC; filter), recursive feature elimination (RFE; wrapper), and random forest (RF; embedded) with LR.

(1) Filter: Features are selected according to their univariate relationship with the endpoint (i.e., AKI). We set p < .05 as the threshold for deciding whether a feature should be included.²³ (Figure 2(a))

(2) Wrapper: The wrapper method trains the estimator on an initial set of features and obtains the importance of each feature through the coef_ attribute or the feature_importances_ attribute. The least important features are then pruned, and the procedure is repeated on the remaining set. (Figure 2(b))

(3) Embedded: This includes two main steps. First, a machine learning model is trained to obtain the weight coefficient of each feature; then the features are selected according to these weights, from largest to smallest. (Figure 2(c))
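The three categories can be sketched with scikit-learn as follows. This is an illustrative example under stated assumptions, not the pipeline used in the study: the data are synthetic, the univariate F-test (`f_classif`) stands in for MIC in the filter step, and the estimators and their settings are hypothetical choices. The 0.01 importance threshold for the embedded step is the one reported in the Results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SelectFromModel, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the cohort (NOT study data).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# (1) Filter: score each feature on its own against the endpoint and
# keep those with p < .05. f_classif is an assumed stand-in for MIC.
_, p_values = f_classif(X, y)
filter_mask = p_values < 0.05

# (2) Wrapper: recursive feature elimination with cross-validation.
# The estimator is refit repeatedly, the weakest feature (by coef_) is
# dropped each round, and the subset with the best CV score is kept.
wrapper = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5,
                scoring="roc_auc")
wrapper.fit(X, y)
wrapper_mask = wrapper.support_

# (3) Embedded: fit a random forest once and keep the features whose
# importance reaches the 0.01 threshold.
embedded = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold=0.01,
)
embedded.fit(X, y)
embedded_mask = embedded.get_support()

for name, mask in [("filter", filter_mask), ("wrapper", wrapper_mask),
                   ("embedded", embedded_mask)]:
    print(f"{name}: {int(mask.sum())} of {X.shape[1]} features kept")
```

Each method returns a boolean mask over the input columns, which makes the selected subsets directly comparable across methods.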

Figure 2.

Basic process of the different feature selection methods. (a) the process of Filter, (b) the process of Wrapper, and (c) the process of Embedded.

The importance matrix plot for the RF method was used to rank the selected features. Furthermore, we used the area under the receiver operating characteristic curve (AUC) to evaluate each model’s performance, which was compared with that of LR by the DeLong test. All analyses were performed with open-source software libraries in Python (version 3.9).
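A minimal sketch of the ranking and evaluation steps is given below, assuming scikit-learn. The data, feature names, split ratio, and forest settings are hypothetical, and the DeLong comparison is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical feature names (NOT study data).
X, y = make_classification(n_samples=600, n_features=12, n_informative=4,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(X_train, y_train)

# Rank features from most to least important (impurity-based importance).
order = np.argsort(forest.feature_importances_)[::-1]
ranking = [(feature_names[i], float(forest.feature_importances_[i]))
           for i in order]
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")

# AUC on held-out data, using the predicted probability of the positive class.
auc = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])
print(f"AUC = {auc:.2f}")
```

Evaluating on a held-out split, as here, avoids scoring the model on the data used to fit it; the source does not state which validation scheme was used.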

Results

Characteristics of the analyzed cohort

A total of 1977 patients were enrolled in the analysis, of whom 550 (27.8%) were identified with postoperative AKI (Figure 1). Compared with patients without AKI, those who developed AKI were older (median age 58 vs 55 years, p < .001), had a lower mean BMI (23.59 vs 24.60, p < .001; Table 1), and were more often female (41.5% vs 36.3%, p = .03). Patients in poorer preoperative condition were more vulnerable to AKI. In terms of intraoperative parameters, longer surgical time (262.59 vs 244.52, p < .001) and longer CPB and aortic clamp times (124 vs 108, p < .001; 89.5 vs 75, p < .001) were associated with AKI. In addition, the surgical procedure (including the CPB procedure) was also associated with the development of AKI. Table 1 presents a detailed summary of this information.

Table 1.

Patient characteristics and postoperative outcomes.

Variables	All patients (n = 1977)	Post-operative AKI (n = 550)	No post-operative AKI (n = 1427)	p value
Demographic data
Age, years, median, (IQR)	56(47,64)	58(51,65)	55(44,63)	<0.001
Gender, female, n (%)	746(37.7)	228(41.5)	518(36.3)	0.03
Body-mass index (kg/m2) mean, (SD)	24.31(3.54)	23.59(3.47)	24.60(3.54)	<0.001
Race (n, %)
(1)	1859(94)	514(93.5)	1345(94.3)
(2)	26(1.3)	11(2.0)	15(1.1)
(3)	35(1.8)	13(2.4)	22(1.5)
(4)	33(1.7)	3(0.5)	30(2.1)
(5)	24(1.2)	9(1.6)	15(1.1)
Preoperative variables
Heart rate, mean, (SD)	78.69(14.16)	82.06(17.01)	77.39(12.66)	<0.001
SBP, mean, (SD)	127.92(18.52)	123.09(18.61)	129.78(18.15)	<0.001
DBP, mean, (SD)	73.52(12.47)	71.28(13.44)	74.38(11.97)	<0.001
Body temperature mean, (SD)	36.38(0.26)	36.37(0.27)	36.39(0.27)	0.11
Smoke, (n, %)	655(33.1)	191(34.7)	464(32.5)	0.34
Smoke within a latest month, (n, %)	270(13.7)	64(11.6)	206(14.4)	0.10
Allergy, (n, %)	192(9.7)	46(8.4)	146(10.2)	0.21
NYHA class (n, %)
Class I	236(11.9)	25(4.5)	211(14.8)
Class II	1266(64)	317(57.6)	949(66.5)
Class III	441(22.3)	191(34.7)	250(17.5)
Class IV	34(1.7)	17(3.1)	17(1.2)
Diabetes mellitus (n, %)	219(11.1)	55(10)	164(11.5)	0.34
Hypertension (n, %)	669(33.8)	172(31.3)	497(34.8)	0.13
Hyperlipidemia (n, %)	698(35.3)	176(32)	522(36.6)	0.13
Chronic kidney disease (n, %)	3(0.2)	3(0.5)	0(0)	0.03
COPD (n, %)	3(0.2)	1(0.2)	2(0.1)	1
Peripheral vascular disease (n, %)	12(0.6)	4(0.7)	8(0.6)	0.75
Cerebrovascular accident (n, %)	31(1.6)	14(2.5)	17(1.2)	0.03
Infective endocarditis (n, %)	16(0.8)	6(1.1)	10(0.7)
Fully recovered ischemic central nervous system injury (n, %)	42(2.1)	6(1.1)	10(0.7)	0.38
Lacunar cerebral infarction (n, %)	16(0.8)	19(3.5)	23(1.6)	0.01
Non-invasive tests suggesting carotid artery stenosis >79%, (n, %)	14(0.7)	6(1.1)	8(0.6)	0.21
Previous cardiac valvular disease, (n, %)	1466(74.2)	489(88.9)	977(68.5)	<0.001
Previous coronary heart disease, (n, %)	472(23.9)	106(19.3)	366(25.6)	0.003
Previous congenital heart disease, (n, %)	351(17.8)	48(8.7)	303(21.2)	<0.001
Previous aortic disease, (n, %)	236(11.9)	47(8.5)	189(13.2)	0.004
Previous peripheral vascular disease (n, %)	340(17.2)	159(28.9)	181(12.7)	<0.001
Previous carotid surgery, (n, %)	123(6.2)	47(8.5)	76(5.3)	0.008
LVEDD, mean, (SD)	52.01(10.21)	54.31(11.18)	51.12(9.67)	<0.001
LVEF (%), mean, (SD)	61.33(6.24)	58.96(7.71)	62.24(5.29)	<0.001
WBC Count, mean, (SD)	6.24(1.61)	6.23(1.74)	6.25(1.56)	0.85
Neutrophil count, mean, (SD)	69.35(8.79)	71.10(8.85)	68.67(8.68)	<0.001
Hemoglobin count mean, (SD)	138.05(17.61)	135.86(19.42)	138.90(16.80)	0.001
Platelet count, mean, (SD)	205.58(57.32)	190.50(57.24)	211.40(56.30)	<0.001
ALT, median, (IQR)	19(13,29)	20(13,31)	18(12,28)	0.03
AST, median, (IQR)	25(21,31)	27(22,34)	25(21,30)	<0.001
ALP, mean, (SD)	67.97(21.71)	69.94(26.99)	67.21(19.25)	0.03
GGT, median, (IQR)	25(17,41)	46.74(61.81)	24(16,37)	<0.001
Direct bilirubin, median, (IQR)	3.36(2.44,5.12)	4.15(2.83,6.88)	3.12(2.36,4.55)	<0.001
Total bilirubin, median, (IQR)	12.12(9.10,16.58)	13.95(9.97,20.02)	11.53(8.83,15.51)	<0.001
Serum creatinine, mean, (SD)	84.55(18.07)	90.18(21.40)	82.39(16.10)	<0.001
BUN, median, (IQR)	5.98(4.88,7.41)	6.89(5.68,8.47)	5.71(4.68,6.94)	<0.001
TP, mean, (SD)	68.30(5.62)	67.66(6.00)	68.54(5.44)	0.003
ALB, mean, (SD)	39.99(3.35)	39.27(3.60)	40.27(3.21)	<0.001
PT, mean, (SD)	13.47(1.70)	14.15(2.44)	13.21(1.21)	<0.001
D-dimer, median, (IQR)	0.23(0.17,0.35)	0.27(0.19,0.47)	0.22(0.17,0.32)	<0.001
NT-proBNP (LN), median, (IQR)	252.40(79.25,875.00)	7.31(6.95,7.81)	4.96(4.00,5.75)	<0.001
Hs-CRP, median, (IQR)	0.83(0.35,2.17)	1.11(0.49,3.15)	0.73(0.32,1.83)	<0.001
Statins (n, %)	109(5.5)	25(4.5)	84(5.9)	0.24
β-blocker (n, %)	735(37.2)	267(48.5)	469(32.8)	<0.001
ACEI (n, %)	145(7.3)	39(7.1)	106(7.4)	0.79
Intraoperative variable
High dosage tranexamic acid, n (%)	980(49.6)	284(51.6)	696(48.8)	0.25
Emergency surgery, n (%)	18(0.9)	8(1.5)	10(0.7)	0.11
Volume of intraoperative blood salvage, median, (IQR)	200(200,210)	200(200,220)	200(200,210)	0.04
Lowest temperature during CPB, median, (IQR)	32.0(31.0,31.9)	32.0(31.0,32.7)	32.0(31.1,33.0)	0.002
Hemoglobin at end of CPB, mean, (SD)	103.8(14.91)	101.58(14.79)	103.66(14.93)	0.005
CPB Again, (n, %)	47(2.4)	13(2.4)	34(2.4)	0.98
IABP, (n, %)	12(0.6)	5(0.9)	7(0.5)	0.28
ECMO, (n, %)	3(0.2)	1(0.2)	2(0.1)	1
Surgery duration, mean, (SD)	249.55(79.70)	262.59(86.52)	244.52(76.35)	<0.001
CPB Duration, median, (IQR)	113(85,148)	124(95,162)	108(81,142)	<0.001
Aortic clamping duration, median, (IQR)	80(57,109)	89.50(66.75,120.00)	75(54,105)	<0.001
CABG (n, %)	531(26.9)	128(23.3)	403(28.2)	0.02
Valve surgery (n, %)	1292(65.4)	444(80.7)	848(59.4)	<0.001
Congenital heart disease surgery (n, %)	290(14.7)	42(7.6)	248(17.4)	<0.001
Aortic surgery (n, %)	166(8.4)	34(6.2)	132(9.3)	0.02

Outcomes of different selection methods

In Group I, 66 available features were included for analysis. Of these, 46 were identified by the LR method and 47 by the MIC (filter) method. For the wrapper (RFE), features that lowered the cross-validation score were removed, leaving 36 features. For the embedded method (RF), 18 features were retained when the importance threshold was set to 0.01 (Table 2).

Table 2.

Features selected by ML and LR methods.

	Group I	Group II
All features	group, gender, age, race, BMI,NYHA, smoke, smoke1m	Gender, age, race, BMI
	β-blocker, ACEI, statins, Temp, HR, SBP, DBP, LVEF, LVEDD	NYHA, smoke, sd, LVEF, LVEDD
	WBC,NEUT,HGB,PLT,ALT,AST,GGT,ALP,DBIL, TBIL,SCR,BUN,TP,ALB,PT,D-dimer,NT-pro BNP,Hs-CRP	WBC,NEUT,HGB,PLT,ALT,AST,GGT,ALP,DBIL, TBIL,SCR,BUN,TP,ALB,PT,D-dimer,NT-pro BNP,Hs-CRP
	Allergy, Dm, Htn, Hla, CKD,COPD,PVD,CA,IE, MED10-18	Dm,Htn,Hla,CKD,COPD,PVD,CA,MED13-14,MED16, MED18
	IABP, ECMO, aortic clamping duration	Intra5- 7
	CABG, valve surgery, congenital heart disease surgery, aortic surgery	CABG, valve surgery, aortic surgery
Filter	Gender, race, NYHA, smoke, smoke1m, ACEI, HR, SBP, DBP, LVEF, LVEDD	Gender, age, race, NYHA, sd
	WBC,NEUT,PLT,ALT,AST,GGT,ALP,DBIL, TBIL,SCR,BUN,ALB,PT,D-dimer,NT-pro BNP	LVEF, LVEDD, NEUT, PLT, AST,GGT,ALP,DBIL, TBIL,SCR,BUN,TP,ALB,NT-pro BNP,Hs-CRP
	allergy,Htn,CKD,PVD,IE,MED10-13, MED15,MED17-18	Dm, Hla,PVD,MED14, MED18
	Intra1-2, Intra5,Intra7, ECMO	Intra5-7,CABG, valve surgery, aortic surgery
	Valve surgery, congenital heart disease surgery, aortic surgery
Wrapper	age, BMI, NYHA,β-blocker, Temp, HR, SBP,DBP,LVEF	Gender, age, BMI, NYHA, smoke, sd, LVEF, LVEDD
	LVEDD, WBC,NEUT,HGB,PLT,ALT,AST,GGT,ALP,DBIL, TBIL,SCR,BUN,TP,ALB,PT,D-dimer,NT-pro BNP,PVD	NEUT,HGB,PLT,ALT,AST,GGT,ALP,DBIL, TBIL,SCR,BUN,TP,ALB,NT-pro BNP,Hs-CRP
	MED13, MED15,MED18	Hla,PVD,MED13,MED18
	Intra2, Intra3,Intra5-7	Intra5-7,CABG, valve surgery
	Valve surgery
Embedded	age, BMI, HR, SBP,LVEF, LVEDD	BMI, NYHA, LVEF,DBIL, TBIL,SCR,BUN
	NEUT,PLT,GGT,DBIL, TBIL,SCR,BUN,ALB,PT,NT-pro BNP	NT-pro BNP
	Intra5-6
Logistic regression	Gender, age, race, BMI, NYHA, β-blocker, HR, DBP, SBP, LVEF	Gender, age, race, BMI, NYHA, SD, LVEF, LVEDD,Lab2-4
	LVEDD, NEUT,HGB,PLT,AST,GGT,ALP,DBIL, TBIL,SCR,BUN,TP,ALB,PT,D-dimer,NT-pro BNP,Hs-CRP	NEUT,HGB,PLT,AST,GGT,ALP,DBIL, TBIL,SCR,BUN,TP,ALB,NT-pro BNP,Hs-CRP
	Lab6-18, Hla, CKD, PVD	Hla, CKD,COPD,PVD, MED13-14,16,18
	MED10-11,MED13-18	Intra5-7
	Intra2-3,Intra5-7	CABG, valve surgery, aortic surgery
	CABG, valve surgery ,congenital heart disease surgery, aortic surgery

Abbreviations: BMI = body mass index; NYHA = Classification of NYHA heart function; smoke1m = smoke within a latest month; group = high or low dosage tranexamic acid; ACEI = angiotensin converting enzyme inhibitors; Temp = body temperature before surgery; HR = heart rate; SBP = systolic blood pressure; DBP = diastolic blood pressure; sd = SBP-DBP; LVEF = left ventricular ejection fraction; LVEDD = Left ventricular end diastolic dimension; MED10 = fully recovered ischemic central nervous system injury; MED11 = lacunar cerebral infarction; MED12 = non-invasive tests suggesting carotid artery stenosis >79%; MED13 = previous cardiac valvular disease; MED14 = previous coronary heart disease; MED15 = previous congenital heart disease; MED16 = previous aortic disease; MED17 = previous carotid surgery; MED18 = previous peripheral vascular disease; Intra1 = volume of intraoperative blood salvage; Intra2 = lowest temperature during CPB; Intra3 = hemoglobin at end of CPB; Intra4 = CPB again; Intra5 = surgery duration; Intra6 = CPB duration; Intra7 = aortic clamping duration; CPB = cardiopulmonary bypass; IABP = Intra-aortic balloon pump; ECMO = Extracorporeal Membrane Oxygenation.

In Group II, 41 features reported to be related to CSA-AKI were entered. Of these, 35 were selected by the LR method, 30 by MIC, 31 by the wrapper (RFE), and nine by the embedded (RF) method (Table 2).

Comparison of different FS methods

In terms of AUCs, both the ML techniques and the traditional statistical approach demonstrated excellent predictive performance in both groups. In Group I, the features selected by the LR and MIC methods both achieved an AUC of 0.97. RFE and RF each achieved an AUC of 0.98, which differed significantly from that of LR (p < .01). In Group II, the LR method achieved an AUC of 0.98, and the MIC and RFE methods achieved an AUC of 0.97. The RF method achieved an AUC of 0.99, which differed significantly from that of the LR method (p < .01) (Figure 3).

Figure 3.

The AUC of different models in two groups. (a) group I, (b) group II.

Regarding the selection accuracy of the ML methods relative to the traditional LR method, in Group I the accuracy was 72.34% for the MIC method, 91.66% for RFE, and 100% for the RF method. In Group II, the accuracy was 96.67% for the MIC method, 95.54% for RFE, and 100% for the embedded (RF) method.
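The text does not spell out how selection accuracy was computed. One reading that reproduces several of the reported figures is the share of features chosen by an ML method that LR also chose, and the sketch below assumes that reading. The function name `selection_agreement` is hypothetical, the LR set is abridged from Table 2 (coded entries omitted), and the overlap count of 34 for MIC in Group I is inferred from the reported 72.34% and 47 selected features rather than stated in the paper.

```python
def selection_agreement(selected: set[str], reference: set[str]) -> float:
    """Share of `selected` features that also appear in `reference`."""
    if not selected:
        raise ValueError("selected must not be empty")
    return len(selected & reference) / len(selected)


# Group II, embedded (RF) column of Table 2 (names as listed there).
embedded_group2 = {"BMI", "NYHA", "LVEF", "DBIL", "TBIL", "SCR", "BUN",
                   "NT-proBNP"}

# Group II, LR column of Table 2, abridged to the uncoded entries.
lr_group2 = {"Gender", "age", "race", "BMI", "NYHA", "SD", "LVEF", "LVEDD",
             "NEUT", "HGB", "PLT", "AST", "GGT", "ALP", "DBIL", "TBIL",
             "SCR", "BUN", "TP", "ALB", "NT-proBNP", "Hs-CRP", "Hla", "CKD",
             "COPD", "PVD", "CABG", "valve surgery", "aortic surgery"}

# Every embedded pick also appears in the LR set under this reading.
agreement = selection_agreement(embedded_group2, lr_group2)
print(f"embedded vs LR, Group II: {100 * agreement:.2f}%")

# Inferred arithmetic check for MIC in Group I: 34 shared of 47 selected.
mic_group1 = round(100 * 34 / 47, 2)
print(f"MIC vs LR, Group I: {mic_group1}%")
```

Under this reading the measure is not symmetric: it penalizes an ML method for selecting features that LR rejected, but not for omitting features that LR kept.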

Identification of significant features

An importance matrix plot was generated with a random forest model to identify potential risk factors for CSA-AKI. The plot showed that NT-proBNP was the most influential factor. Furthermore, in both groups, the four most important features were almost the same for all FS methods: prothrombin time (PT), blood urea nitrogen (BUN), left ventricular ejection fraction (LVEF), and NT-proBNP (Table 3 and Figure 4).

Table 3.

Top 10 features of different FS methods.

Filter		Wrapper		Embedded
Group 1	Group 2	Group 1	Group 2	Group 1	Group 2
NT-proBNP	NT-proBNP	NT-proBNP	NT-proBNP	NT-proBNP	NT-proBNP
PT	PT	PT	PT	PT	PT
BUN	LVEF	LVEF	BUN	BUN	LVEF
LVEF	BUN	BUN	LVEF	LVEF	BUN
PLT	TBil	Dbil	Dbil	PLT	TBil
Dbil	Dbil	PLT	PLT	Dbil	SCr
HR	PLT	SCr	TBil	HR	PLT
NEUT	SCr	HR	HR	TBil	Dbil
TBil	SBP	TBil	SCr	GGT	HR
SCr	NEUT	SBP	SBP	ALB	SBP

Figure 4.

The importance matrix plot of all FS methods in group I: (a) Filter, (b) Wrapper, and (c) Embedded. The importance matrix plot of all FS methods in group II: (d) Filter, (e) Wrapper, and (f) Embedded.

Discussion

In this retrospective cohort study, we compared LR and ML techniques for selecting potential risk factors for AKI. First, the number of features entered affected the number finally selected: with every FS method, more inputs yielded more selected features. Second, in terms of performance, the features selected by each FS method yielded excellent AUCs. Measured against the LR method, the embedded method showed the highest selection accuracy and the filter method the lowest. Third, regarding the importance ranking of features, our results confirmed some features that previous studies have reported and identified some novel clinical parameters.

Numerous studies have shown that ML outperforms LR for the prediction of AKI.^2,4,24,25 However, a recent systematic review reported that ML algorithms performed comparably to regression models.²⁶ Researchers have used traditional logistic regression or Cox regression to explore the potential risk factors for AKI.^1,6,17,27–29 Recently, some studies found that other features might influence the progression of AKI,^17–19,28,30,31 yet sufficient multicenter evidence to support these findings is lacking. The performance of different FS methods in predicting CSA-AKI remains unknown. Hence, we included as many features as possible in Group I, which contained 25 features beyond the commonly recognized ones in Group II. We used LR and ML approaches to select the relevant features in the two groups. With every FS method, the more features were entered, the more were selected. In addition, ML methods generally selected fewer features than LR (the exception being MIC in Group I, with 47 versus 46), which might help to establish a simpler and more effective predictive model. Meanwhile, the embedded method selected fewer features than the filter and wrapper methods, which may be because the wrapper and embedded methods build on the filter method and are better at handling and processing data.³⁰

Our study showed that the AUCs of some FS methods differed significantly from that of LR. Still, all models performed excellently, which implies that FS and LR methods perform comparably well in selecting predictors of CSA-AKI. Furthermore, whether the dataset was of lower or higher dimension, the embedded method performed better than the other methods, which might be because its feature subset search is incorporated into the classifier training process. Notably, the AUCs of all FS methods in our study were above 0.9. This may be because we included serum creatinine (SCr) in our analysis, which is a core component of the KDIGO definition of AKI. However, Koyner et al. found that the performance of an algorithm for predicting severe AKI did not change significantly after the SCr variable was excluded.²⁴ Future studies should investigate the complex connection between baseline SCr and postoperative AKI. Additionally, we included many intraoperative variables that might have improved the performance of our models.

In terms of the selection accuracy of the ML methods compared with the LR method, our analysis indicated that the embedded method achieved the highest accuracy regardless of the number of features entered, whereas the filter method achieved the lowest. This may be because the wrapper and embedded methods have a built-in algorithm in which feature selection and model training are performed simultaneously, which makes their results more accurate and reliable.³⁰

According to the importance matrix plot, NT-proBNP was the most influential factor in all FS methods. The relation between NT-proBNP and CSA-AKI may be due to the “cardio-renal” syndrome, which is beyond the scope of our study.³¹ In addition, laboratory values such as PT, BUN, and total and direct bilirubin, as well as features that evaluate left ventricular function, were found to have a higher potential impact on the development of CSA-AKI. Preoperative hemodynamic variables such as heart rate and systolic blood pressure also ranked in the top 10 in our cohort, suggesting that better management of these features before surgery might benefit the patient. Abundant evidence indicates that intraoperative variables, including surgery and the CPB procedure, are closely associated with postoperative AKI.^3,4 In one single-center cohort, patients at low risk of AKI were reclassified as high risk after intraoperative variables were included.³² We also confirmed that CSA-AKI was associated with aortic clamp time, CPB time, and surgery time. In addition, we found that the preoperative use of β-blockers might help mitigate AKI. However, whether pharmacological interventions benefit high-risk patients remains controversial.^33,34 Additionally, the top 10 features were almost the same in the two groups across the different methods. However, their importance rankings differed, which may be attributable to interactions between features: when more features are entered, their importance percentages change. Furthermore, some statistically important features, such as ALT and the volume of intraoperative blood salvage, were detected only by ML methods. Additionally, Lee et al. found that a hybrid of FS and LR could perform comparably with a deep neural network. This implies that FS and LR could be combined in future research: FS could reduce the number of parameters, decrease the learning time, and avoid the problems of dimensionality, while LR could output explainable variables at low computational cost.³⁵ This combination needs more investigation in the face of rapidly growing, high-dimensional data.

Our study has several limitations. First, this is a retrospective analysis of single-center data with a relatively small number of cases. The performance of machine learning algorithms might differ in a larger dataset with a different distribution of patient characteristics across institutions. Second, the most important variables are not clinically modifiable, and whether our results could benefit high-risk patients is unknown. Nevertheless, we confirmed the importance of the intraoperative variables, and further prospective trials are needed to evaluate whether adjusting modifiable predictors yields benefit. Third, our analysis did not include some biomarkers, such as cystatin C, tissue inhibitor of metalloproteinases 2 (TIMP-2), and insulin-like growth factor-binding protein 7 (IGFBP7),³⁶ which have been reported to have high specificity and sensitivity irrespective of potentially interfering conditions. Further investigation could explore whether these biomarkers help to better detect postoperative AKI.

Conclusions

In conclusion, ML was as suitable as LR for selecting potential risk factors for CSA-AKI. For ML, the embedded method demonstrated better efficacy than the other methods. Furthermore, NT-proBNP was confirmed to be strongly associated with AKI.

Footnotes

Author contributions

QL and JJ S contributed to manuscript preparation. HL and YY C participated in the analysis and interpretation of the data. CH Z and JS contributed to reviewing and editing, CH Z contributed to visualization, and JS contributed to supervision and administration of the project. All authors read and approved the final manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (no. 81970290).

ORCID iD

Qian Li

References

1. Thongprayoon, Hansrivijit, Kovvuru, et al. Diagnostics, risk factors, treatment and outcomes of acute kidney injury in a new paradigm. J Clin Med 2020; 9: 2–17. DOI: 10.3390/jcm9041104.
2. Lee, Yoon, Nam, et al. Derivation and validation of machine learning approaches to predict acute kidney injury after cardiac surgery. J Clin Med 2018; 7: 1–05. DOI: 10.3390/jcm7100322.
3. Thongprayoon, Hansrivijit, Bathini, et al. Predicting acute kidney injury after cardiac surgery by machine learning approaches. J Clin Med 2020; 9: 6–11. DOI: 10.3390/jcm9061767.
4. Tseng, Chen, Wang, et al. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Crit Care 2020; 24: 478. 2020/08/02. DOI: 10.1186/s13054-020-03179-9.
5. Liu, et al. Feature ranking in predictive models for hospital-acquired acute kidney injury. Sci Rep 2018; 8: 17298. 2018/11/25. DOI: 10.1038/s41598-018-35487-0.
6. Amini, Najafi, Karrari, et al. Risk factors and outcome of acute kidney injury after isolated CABG surgery: a prospective cohort study. Braz J Cardiovasc Surg 2019; 34: 70–75. 2019/02/28. DOI: 10.21470/1678-9741-2017-0209.
7. Zaouter, Potvin, Bats, et al. A combined approach for the early recognition of acute kidney injury after adult cardiac surgery. Anaesth Crit Care Pain Med 2018; 37: 335–341. 2018/05/20. DOI: 10.1016/j.accpm.2018.05.001.
8. Malhotra, Siew. Biomarkers for the early detection and prognosis of acute kidney injury. Clin J Am Soc Nephrol 2017; 12: 149–173. 2016/11/09. DOI: 10.2215/CJN.01300216.
9. Ostermann, Joannidis. Acute kidney injury 2016: diagnosis and diagnostic workup. Crit Care 2016; 20: 299–313.
10. Chertow, Lazarus, Christiansen, et al. Preoperative renal risk stratification. Circulation 1997; 95: 878–884.
11. Thakar, Arrigain, Worley, et al. A clinical score to predict acute renal failure after cardiac surgery. J Am Soc Nephrol 2005; 16: 162–168.
12. Mehta, Grab, O’Brien, et al. Bedside tool for predicting the risk of postoperative dialysis in patients undergoing cardiac surgery. Circulation 2006; 114: 2208–2216.
13. Wijeysundera, Karkouti, Dupuis J-Y, et al. Derivation and validation of a simplified predictive index for renal replacement therapy after cardiac surgery. JAMA 2007; 297: 1801–1809.
14. Saeys, Inza, Larranaga. A review of feature selection techniques in bioinformatics. Bioinformatics 2007; 23: 2507–2517. 2007/08/28. DOI: 10.1093/bioinformatics/btm344.
15. Guyon, Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research 2003; 3: 1157–1182.
16. Lee, Hofer, Gabel, et al. Development and validation of a deep neural network model for prediction of postoperative in-hospital mortality. Anesthesiology 2018; 129: 649–662. 2018/04/18. DOI: 10.1097/aln.0000000000002186.
17. Kristovic, Horvatic, Husedzinovic, et al. Cardiac surgery-associated acute kidney injury: risk factors analysis and comparison of prediction models. Interact Cardiovasc Thorac Surg 2015; 21: 366–373. 2015/06/21. DOI: 10.1093/icvts/ivv162.
18. Neugarten, Sandilya, Singh, et al. Sex and the risk of AKI following cardio-thoracic surgery: a meta-analysis. Clin J Am Soc Nephrol 2016; 11: 2113–2122. 2016/11/01. DOI: 10.2215/CJN.03340316.
19. Jian, et al. Risk factors for acute kidney injury after cardiovascular surgery: evidence from 2,157 cases and 49,777 controls - a meta-analysis. Cardiorenal Med 2016; 6: 237–250. 2016/06/09. DOI: 10.1159/000444094.
20. Wang, Zhou, et al. Independent risk factors contributing to acute kidney injury according to updated valve academic research consortium-2 criteria after transcatheter aortic valve implantation: a meta-analysis and meta-regression of 13 studies. J Cardiothorac Vasc Anesth 2017; 31: 816–826. 2017/04/08. DOI: 10.1053/j.jvca.2016.12.021.
21. Thomas, Blaine, Dawnay, et al. The definition of acute kidney injury and its use in practice. Kidney Int 2015; 87: 62–73. 2014/10/16. DOI: 10.1038/ki.2014.328.
22. Hori, Katz, Fine, et al. Defining oliguria during cardiopulmonary bypass and its relationship with cardiac surgery-associated acute kidney injury. Br J Anaesth 2016; 117: 733–740. 2016/12/14. DOI: 10.1093/bja/aew340.
23. Nagele, Liggett. Genetic variation, β-blockers, and perioperative myocardial infarction. Anesthesiology 2011; 115: 1316–1327. 2011/09/16. DOI: 10.1097/ALN.0b013e3182315eb2.
24. Koyner, Carey, Edelson, et al. The development of a machine learning inpatient acute kidney injury prediction model. Crit Care Med 2018; 46: 1070–1077. 2018/03/30. DOI: 10.1097/ccm.0000000000003123.
25. Penny-Dimri, Bergmeir, Reid, et al. Machine learning algorithms for predicting and risk profiling of cardiac surgery-associated acute kidney injury. Semin Thorac Cardiovasc Surg 2021; 33: 735–745. 2020/09/27. DOI: 10.1053/j.semtcvs.2020.09.028.
26. Christodoulou, Collins, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019; 110: 12–22. 2019/02/15. DOI: 10.1016/j.jclinepi.2019.02.004.
27. Chen, et al. Derivation and validation of a model to predict acute kidney injury following cardiac surgery in patients with normal renal function. Ren Fail 2021; 43: 1205–1213. 2021/08/11. DOI: 10.1080/0886022x.2021.1960563.
28. Kim, Lee J-H, Kim, et al. Can we really predict postoperative acute kidney injury after aortic surgery? Diagnostic accuracy of risk scores using gray zone approach. Thorac Cardiovasc Surg 2016; 64: 281–289.
29. Brown, Cochran, Leavitt, et al. Multivariable prediction of renal insufficiency developing after cardiac surgery. Circulation 2007; 116: I139–I143.
30. Hameed, Petinrin, Hashi, et al. Filter-wrapper combination and embedded feature selection for gene expression data. Int J Advance Soft Compu Appl 2018; 10: 90–105.
31. Maries, Manitiu. Diagnostic and prognostic values of B-type natriuretic peptides (BNP) and N-terminal fragment brain natriuretic peptides (NT-pro-BNP). Cardiovasc J Afr 2013; 24: 286–289. 2013/11/13. DOI: 10.5830/cvja-2013-055.
32. Adhikari, Ozrazgat-Baslanti, Ruppert, et al. Improved predictive models for acute kidney injury with IDEA: intraoperative data embedded analytics. PLoS One 2019; 14: e0214904.
33. Ostermann, Kunst, Baker, et al. Cardiac surgery associated AKI prevention strategies and medical treatment for CSA-AKI. J Clin Med 2021; 10: 5285–6128. DOI: 10.3390/jcm10225285.
34. Meersch, Zarbock. Prevention of cardiac surgery-associated acute kidney injury. Curr Opin Anaesthesiol 2017; 30: 76–83. 2016/09/23. DOI: 10.1097/ACO.0000000000000392.
35. Aatila, Lachgar, Hamid, et al. Keratoconus severity classification using features selection and machine learning algorithms. Comput Math Methods Med 2021; 2021: 9979560. DOI: 10.1155/2021/9979560.
36. Massoth, Zarbock, Meersch. Acute kidney injury in cardiac surgery. Crit Care Clin 2021; 37: 267–278. 2021/03/24. DOI: 10.1016/j.ccc.2020.11.009.