Multiclass Discriminant Analysis using Ensemble Technique: Case Illustration from the Banking Industry

Abstract

Linear discriminant analysis (LDA) has found extensive application in predicting bankruptcy. In this article, we present a novel modelling approach for LDA that also provides insight into the relative importance and ranking of factors in the banking industry. The model avoids the traditional computation of the variance/covariance matrices and employs an ensemble technique to assign records to classes. The efficacy of our model is tested using two datasets. Specifically, a large dataset from the banking industry was partitioned into training and testing datasets, and an accuracy of 87.9% was achieved on the testing dataset.

JEL Codes: C38, G33

Keywords

LDA, separation, cutoff score, confusion matrix, ensemble technique, banking industry, bankruptcy prediction

1. Introduction

Linear discriminant analysis (LDA) was developed by Fisher (1936), who is considered by many to be the father of modern statistics (Read, 2016). The purpose of LDA was to delineate two or more populations based on their characteristics x₁, x₂, x₃, …, x_k using a linear function Z = a₁x₁ + a₂x₂ + a₃x₃ + … + a_kx_k. This function, known as the linear discriminant function, is a linear combination of the characteristics used to discriminate between groups. Consequently, discriminant analysis can be described as a statistical tool that discriminates (separates) two or more categories of a dependent variable using linear discriminant functions (or hyperplanes) that are linear combinations of the independent variables.

Other popular competing statistical tools used in classification and multivariate statistical testing include Hotelling's T-square distribution (Hotelling, 1931), the Mahalanobis distance (Mahalanobis, 1936), logistic regression (Gaja & Liou, 2018; Johnson et al., 2018), neural networks (Gaja & Liou, 2018; Li et al., 2018), and support vector machines (SVM) (Venkatesan et al., 2018; Wang et al., 2018). The T-square distribution, a generalization of Student's t-test, is used in multivariate statistical testing, whereas the Mahalanobis distance finds applications in cluster analysis and in classification techniques for face and pattern recognition (Li et al., 2018; Mohammed et al., 2018).

Logistic regression (Cramer, 2004) is a powerful and versatile tool because the independent variables need not be normally distributed (Pohar et al., 2004). The neural network was first explored as a classification tool by Rosenblatt (1962). It consists of interconnected neurons, each of which represents a separating hyperplane; the network therefore represents piecewise linear separating hyperplanes. Other early literature on neural networks includes the work of Rumelhart et al. (1986, 1987). SVM (Cortes & Vapnik, 1995; Hastie et al., 2008) focusses on constructing a hyperplane, or a set of hyperplanes, that maximizes the distance (margin) between the hyperplane and the nearest data points of each class.

Despite the availability of several alternative techniques for classification problems, LDA remains a very useful tool in statistical analysis and statistical modelling. Some of the applications of LDA include (a) solving spatial problems (King, 1970; Unsal & Nazman, 2020), (b) biology and biological studies (David et al., 2010; Maroco et al., 2011; Morais & Lima, 2018; Preisner et al., 2010; Rao, 1948), (c) pattern recognition (Fard et al., 2018; Jayadevan et al., 2011; Liu et al., 2004; Peets et al., 2017; Siddiqi et al., 2015; Zeina & Al-Anzi, 2018), (d) the textile industry (Vila & Kuster, 2007) and (e) bankruptcy prediction (Altman, 1968, 1984; Cardwell et al., 2003; Samarakoon & Hasan, 2003; Viswanathan et al., 2020).

Given the established significance of LDA in statistical analysis, it is important to build a strong understanding of this modelling technique by simple, practical means. The traditional method of teaching LDA in a classroom assumes that participants have an advanced knowledge of linear algebra (Ragsdale & Stam, 1992). Our modelling approach maintains the spirit of Fisher by maximizing the distance between the group centroids subject to the constraint that the pooled variance equals one. The approach draws some basic ideas from the spreadsheet models developed by Albright and Winston (2009).

We also present a generalized approach that can model more than two groups. In this scenario, we consider two groups at a time (pairwise) with a common variance, combine (ensemble) the predictions of all pairwise comparisons, and then take a maximum vote to predict the class. Although such an approach is practised in other machine learning models such as SVM, to the best of our knowledge we have not come across it in practice for LDA.

Additionally, we study the relative importance of the factors in each pairwise comparison and then combine these results to rank the factors overall. We consider this a key contribution of our article, as this information can be used to reduce the number of variables needed in the study.

The rest of the article is organized as follows. Section 2 reviews the applications of LDA in the banking and financial industries. Section 3 presents the detailed algorithm for our modelling approach, which extends discriminant analysis to more than two groups. Section 4 presents the case analysis using two datasets from the finance/banking industry, and Section 5 presents the conclusions.

2. Application of LDA in Financial and Banking Sectors

LDA finds applications in bankruptcy prediction based on accounting ratios and other financial variables. The seminal article by Altman (1968) to predict corporate bankruptcy considered 66 manufacturing corporations, of which 50% were bankrupt and the rest were solvent. Altman’s model (Altman, 1968) is still useful in many practical situations, although LDA’s fundamental assumption of normal distribution for the independent variables may not always hold true for financial ratios.

Several modelling efforts have focussed on improving the Z-score model. Altman et al. (1977) developed the ZETA model, which had seven variables and considered accounting adjustments. Springate (1978) initially considered 19 financial ratios but, after refinement, settled on only four ratios to determine the health of a company. Altman's Z-score and its modifications have found applications in bankruptcy prediction in the steel industry (Altman, 1993), savings and loan associations (Pantalone & Platt, 1987), railroads (Altman, 1983), retail sales (Nunthapad, 2000), the textile industry (Cardwell et al., 2003) and the cement industry (Mohammed, 2016; VenkataRamana et al., 2012), and in studying emerging stock markets (Samarakoon & Hasan, 2003). Altman (1984) presented a review of empirical failure classification models across multiple countries, and Gissel et al. (2007) presented a detailed review of bankruptcy prediction studies since the 1930s.

Several studies compare different bankruptcy prediction models. Kiyak and Labanauskaite (2012) study the practical application of bankruptcy prediction models that employ discriminant analysis and logistic regression, and conclude that LDA-based methods such as Altman's and Springate's models performed better than their logistic regression counterparts. Bunyaminu and Issah (2012) made a similar comparison and concluded that the LDA-based methods had higher accuracy in the first year prior to failure. Husein and Pambekti (2014) compared four models: two LDA-based models (Altman's and Springate's), the probit-based Zmijewski model, and Grover's model. They concluded that all the models perform well in bankruptcy prediction, with the probit-based Zmijewski model performing best. Viswanathan et al. (2020) used an unsupervised machine learning technique (k-means clustering) to identify logical groups in terms of financial health, and then used supervised learning techniques, LDA and random forests, for prediction. The study achieved 95% accuracy with both LDA and random forests and concluded that LDA is the better alternative because it also explains the relative importance of the variables.

Given the applications of LDA and its variants in the finance/banking domain, in this article we develop a novel modelling approach for LDA and show in detail how the relative importance of variables can be obtained. This is a major contribution of our study.

3. Novel Modelling Approach for Linear Discriminant Analysis

The traditional approach requires computing two p × p variance-covariance (scatter) matrices, one between-class and one within-class, and then maximizing the ratio of the between-group sum of squares to the within-group sum of squares to obtain the discriminant weights $\hat{a}$. Although these concepts are accessible to participants with advanced knowledge of linear algebra and statistics, most of our participants find this step cumbersome. The need to teach LDA to MBA students in a simplified manner was first addressed by Ragsdale and Stam (1992) and later by Albright and Winston (2009).
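For comparison with the spreadsheet approach, the traditional route can be sketched in NumPy as follows. This is a minimal illustration, not part of the original study: it builds the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$ and solves the eigenproblem $S_W^{-1} S_B a = \lambda a$ that arises from maximizing the ratio of between to within sums of squares. The function name `fisher_directions` is hypothetical, and $S_B$ is centred here on the overall mean of all records.

```python
import numpy as np


def fisher_directions(X, y):
    """Traditional LDA: eigen-decomposition of inv(S_W) @ S_B, largest eigenvalue first."""
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((p, p))  # within-class scatter matrix
    S_B = np.zeros((p, p))  # between-class scatter matrix
    for g in np.unique(y):
        X_g = X[y == g]
        mean_g = X_g.mean(axis=0)
        centred = X_g - mean_g
        S_W += centred.T @ centred
        diff = (mean_g - overall_mean).reshape(-1, 1)
        S_B += len(X_g) * (diff @ diff.T)
    # Maximizing a'S_B a / a'S_W a leads to the eigenproblem S_W^{-1} S_B a = lambda a
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]
```

For two groups, $S_B$ has rank one, so only the leading eigenvalue is nonzero and its eigenvector is proportional to $S_W^{-1}(\bar{x}_1 - \bar{x}_2)$. This is the same direction that maximizing the separation of the group mean scores under a unit pooled-variance constraint produces, without forming these matrices explicitly.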

Ragsdale and Stam (1992) developed a regression-based model that works very well for two-group problems. In fact, the results obtained by ordinary least squares (OLS) are equivalent to Fisher's LDA (Fisher, 1936; Ragsdale & Stam, 1992). Their approach considers a problem with two groups, say groups 1 and 2. The dependent variable Z, representing the group, is expressed in terms of k characteristics, say x₁, x₂, x₃, …, x_k, so the regression equation is Z = a + b₁x₁ + b₂x₂ + b₃x₃ + … + b_kx_k, where Z takes the values 1 and 2. The coefficients are estimated by OLS, and the resulting equation is used to compute the discriminant score, Z, of each record. The group mean scores, ${\overset{─}{Z}}_{1}$ and ${\overset{─}{Z}}_{2}$, are calculated, and their average is chosen as the cut-off point. Under the classification rule, records with discriminant scores below the cut-off point are classified as belonging to group 1, and all other records are assigned to group 2. However, Ragsdale and Stam (1992) noted that the regression model

cannot, in general, be used for DA problems with more than two groups since the relationship between the dependent and independent variables may not be linear. Even if this relationship can be made linear by some appropriate coding for the dependent variable, it can be impossible to discern what this coding should be if there are several independent variables.
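The two-group regression route of Ragsdale and Stam (1992) can be sketched as follows, assuming the group code is 1 or 2 and using NumPy's least-squares solver in place of a spreadsheet regression. The function name `ols_discriminant` is hypothetical. Because group 2 carries the larger code, its mean fitted score is the larger of the two, so scores below the cut-off are assigned to group 1, as in the rule described above.

```python
import numpy as np


def ols_discriminant(X, y):
    """Two-group LDA by OLS: regress the group code (1 or 2) on the predictors."""
    design = np.column_stack([np.ones(len(X)), X])       # intercept column plus predictors
    coef, *_ = np.linalg.lstsq(design, y.astype(float), rcond=None)
    slopes = coef[1:]
    z = design @ coef                                    # discriminant score of each record
    cutoff = (z[y == 1].mean() + z[y == 2].mean()) / 2   # midpoint of the two group means
    predicted = np.where(z < cutoff, 1, 2)               # below the cut-off -> group 1
    return slopes, cutoff, predicted
```

In exact arithmetic the slope vector is a positive multiple of $S_W^{-1}(\bar{x}_2 - \bar{x}_1)$, the Fisher direction, which is the sense in which the OLS solution is equivalent to Fisher's LDA for two groups. The argument relies on the dependent variable taking only two values, which is why it does not carry over directly to three or more groups.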

Subsequently, Albright and Winston (2009) devised an algorithm that keeps Fisher's idea intact while avoiding advanced linear algebra altogether. In their methodology, the discriminant score of each record i is a linear combination of the discriminant weights and the record's predictor values, and a record is assigned to group k if its discriminant score exceeds the cut-off score. Albright and Winston (2009) solved this problem as an optimization model using the Evolutionary Solver in Excel. The objective was to maximize the percentage of correct predictions subject to constraints on the discriminant weights (between −1 and +1) and the cut-off point (chosen arbitrarily). The authors note that a drawback of this approach is that each run of the model can return a different set of optimal discriminant weights.

To address this drawback, we present the following model to solve LDA. The basic idea is explained in Section 3.1, and the detailed procedure is presented in Algorithm 1.

3.1 Some Simple Basics to Understand the Goal

Let an (n_i × p) matrix X_i represent a sample drawn from population i, and let its jth row be denoted x_ij; i = 1, 2, …, m; j = 1, 2, …, n_i, where m is the number of groups (populations), n_i is the sample size drawn from population i, and p is the number of features. Let N be the total number of records, so that N = n₁ + … + n_m. The first step in the procedure is to compute the (p × 1) sample mean vector of each group, ${\overset{─}{x}}_{i}$, and the (p × 1) grand mean vector, $\overset{=}{x}$, as shown in Equations (1) and (2).

{\overset{─}{x}}_{i} = \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} x_{i j}; \quad i = 1, 2, \dots, m

(1)

\overset{=}{x} = \frac{1}{m} \sum_{i = 1}^{m} {\overset{─}{x}}_{i}

(2)
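Equations (1) and (2) translate directly into array operations. In this minimal sketch (the function name is hypothetical; records are stored row-wise in `X` with group labels in `y`), note that Equation (2) averages the group means without weighting them by group size, so the grand mean differs from the overall mean of all records when the groups are unbalanced.

```python
import numpy as np


def group_and_grand_means(X, y):
    """Eq (1): mean vector of each group; Eq (2): unweighted average of those means."""
    groups = np.unique(y)
    group_means = np.array([X[y == g].mean(axis=0) for g in groups])  # one row per group
    grand_mean = group_means.mean(axis=0)
    return groups, group_means, grand_mean


# Unbalanced toy example: two records in group 1, one record in group 2
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])
y = np.array([1, 1, 2])
groups, group_means, grand_mean = group_and_grand_means(X, y)
# group means: (2, 3) and (10, 20); grand mean: (6, 11.5)
# overall mean of all three records: about (4.667, 8.667)
```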

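The equation of the hyperplane obtained from LDA can be represented as shown in Equation (3), where the record index i now runs over all N records, X_ij denotes the value of predictor j for record i, and M (= p) is the number of predictor variables.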

Z_{i} = a_{1} X_{i 1} + a_{2} X_{i 2} + \dots + a_{M} X_{i M} = \sum_{j = 1}^{M} a_{j} X_{i j}

(3)

where Z_i is the discriminant score of record i. Let ${\overset{─}{Z}}_{k}$ denote the mean discriminant score of class/group k, and ${\overset{─}{X}}_{k j}$ the mean of predictor j in group k; k = 1, 2, …, c. We then have ${\overset{─}{Z}}_{1} = a_{1} {\overset{─}{X}}_{11} + a_{2} {\overset{─}{X}}_{12} + \dots + a_{M} {\overset{─}{X}}_{1 M}$, ${\overset{─}{Z}}_{2} = a_{1} {\overset{─}{X}}_{21} + a_{2} {\overset{─}{X}}_{22} + \dots + a_{M} {\overset{─}{X}}_{2 M}$, and so on. This can be generalized as shown in Equation (4).

\overset{─}{Z_{k}} = \sum_{j = 1}^{M} a_{j} {\overset{─}{X}}_{k j}

(4)

Let the variance of Z be denoted Var(Z). Since a key goal of LDA is to maximize the distance between the group means (centroids), we solve an optimization model to find the values of a₁, a₂, …, a_M that maximize ${[{\overset{─}{Z}}_{1} - {\overset{─}{Z}}_{2}]}^{2} / \mathrm{Var}(Z)$. Because this ratio is unchanged when all the weights are multiplied by the same nonzero constant, the problem reduces to identifying a₁, a₂, …, a_M that maximize the absolute difference $| {\overset{─}{Z}}_{1} - {\overset{─}{Z}}_{2} |$ subject to the constraint Var(Z) = 1 (Morrison, 1976).
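The constrained problem above also has a closed-form solution, which is useful for checking the Solver output. The derivation below is a standard Lagrange-multiplier argument, stated under the assumption that the pooled covariance matrix $S$ of the predictors is nonsingular. The symbols $\mathbf{a}$, $\mathbf{d}$, $S$ and $S_k$ (the sample covariance matrix of the predictors in group k) are our notation, with $S$ weighted in the same way as Equation (5) so that $\mathrm{Var}(Z) = \mathbf{a}^\top S\,\mathbf{a}$.

```latex
\begin{aligned}
&\mathbf{a} = (a_1,\dots,a_M)^\top,\quad
d_j = {\overset{─}{X}}_{1j} - {\overset{─}{X}}_{2j},\quad
S = \frac{1}{N}\sum_{k=1}^{c}(n_k - 1)\,S_k,\quad V_k = \mathbf{a}^\top S_k\,\mathbf{a},\\
&{\overset{─}{Z}}_{1} - {\overset{─}{Z}}_{2} = \mathbf{a}^\top\mathbf{d}, \qquad
\mathrm{Var}(Z) = \mathbf{a}^\top S\,\mathbf{a},\\
&\mathcal{L}(\mathbf{a},\lambda) = \mathbf{a}^\top\mathbf{d} - \tfrac{\lambda}{2}\left(\mathbf{a}^\top S\,\mathbf{a} - 1\right)
\;\Rightarrow\; \mathbf{d} - \lambda S\,\mathbf{a} = \mathbf{0}
\;\Rightarrow\; \mathbf{a} = \tfrac{1}{\lambda}\, S^{-1}\mathbf{d},\\
&\mathbf{a}^\top S\,\mathbf{a} = 1 \;\Rightarrow\; \lambda = \sqrt{\mathbf{d}^\top S^{-1}\mathbf{d}},
\qquad
\mathbf{a}^{*} = \frac{S^{-1}\mathbf{d}}{\sqrt{\mathbf{d}^\top S^{-1}\mathbf{d}}},
\qquad
\max\,\bigl|{\overset{─}{Z}}_{1} - {\overset{─}{Z}}_{2}\bigr| = \sqrt{\mathbf{d}^\top S^{-1}\mathbf{d}}.
\end{aligned}
```

The maximizer is unique up to sign ($-\mathbf{a}^{*}$ attains the same absolute difference), and the maximal separation is the Mahalanobis distance between the two group mean vectors under $S$. A GRG run should therefore return these weights up to sign and solver tolerance.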

The decision variables in our model are the discriminant weights. We begin by calculating the group mean of each predictor variable using Equation (1). The discriminant score of each record is then obtained as the weighted (linear) combination of its predictor values, with the discriminant weights as coefficients, as shown in Equation (3); Albright and Winston (2009) use the same calculation. The mean discriminant score of each group is calculated using Equation (4). The group variances of Z are calculated next, followed by the pooled variance, Var(Z), using Equation (5). Equation (5) is written in a general form that also works for more than two groups. We then use the generalized reduced gradient (GRG) nonlinear method in the Excel Solver to obtain the discriminant weights by maximizing $| {\overset{─}{Z}}_{1} - {\overset{─}{Z}}_{2} |$ subject to the constraint that the pooled variance equals one; when the pooled variance-covariance matrix of the predictors is nonsingular, this maximizer is unique up to sign. The cut-off score (CS) is then calculated as the average of ${\overset{─}{Z}}_{1}$ and ${\overset{─}{Z}}_{2}$ using Equation (6). With the weights signed so that ${\overset{─}{Z}}_{1} > {\overset{─}{Z}}_{2}$, a record whose discriminant score exceeds CS is classified as belonging to group 1; otherwise, it is assigned to group 2. Finally, we construct the confusion matrix and compute the accuracy of the model.

\mathrm{Var}(Z) = \frac{(n_{1} - 1) V_{1} + (n_{2} - 1) V_{2} + \dots + (n_{c} - 1) V_{c}}{n_{1} + n_{2} + \dots + n_{c}}

(5)

where V₁, V₂, …, V_c are the sample variances of the discriminant score Z within groups 1, 2, …, c, respectively.

\mathrm{CS} = \frac{{\overset{─}{Z}}_{1} + {\overset{─}{Z}}_{2}}{2}

(6)
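Equations (5) and (6) can be checked on a handful of scores with the following sketch (hypothetical function names; `z` holds discriminant scores and `y` group labels). It follows Equation (5) as written, dividing by N. The textbook pooled estimator divides by N − c instead, but because the two differ only by a constant factor, switching between them rescales the weights and the cut-off score without changing any classification.

```python
import numpy as np


def pooled_variance(z, y):
    """Eq (5): sum over groups of (n_k - 1) * V_k, divided by the total number of records N."""
    total = 0.0
    for g in np.unique(y):
        z_g = z[y == g]
        total += (len(z_g) - 1) * z_g.var(ddof=1)  # V_k: sample variance of Z in group k
    return total / len(z)


def cutoff_score(z, y, g1=1, g2=2):
    """Eq (6): midpoint of the mean discriminant scores of groups g1 and g2."""
    return (z[y == g1].mean() + z[y == g2].mean()) / 2


z = np.array([1.0, 2.0, 3.0, 6.0, 8.0])
y = np.array([1, 1, 1, 2, 2])
print(pooled_variance(z, y))  # (2 * 1.0 + 1 * 2.0) / 5 = 0.8
print(cutoff_score(z, y))     # (2.0 + 7.0) / 2 = 4.5
```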

Another significant contribution of our approach is that we quantify the relative importance of predictor variables in terms of group separation and classification using the Karl Pearson correlation coefficient between the discriminant score and each input variable.
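A minimal sketch of this importance measure follows, assuming the predictors are the columns of an array `X` and `z` holds the discriminant scores (the function name is hypothetical). Ranks are assigned by the absolute value of the correlation, so a strongly negative correlation counts as much as a strongly positive one.

```python
import numpy as np


def importance_ranking(X, z):
    """Pearson r between the discriminant score z and each predictor column of X,
    with rank 1 given to the largest absolute correlation."""
    r = np.array([np.corrcoef(X[:, j], z)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))             # column indices, strongest |r| first
    ranks = np.empty(len(r), dtype=int)
    ranks[order] = np.arange(1, len(r) + 1)
    return r, ranks


rng = np.random.default_rng(0)
z = np.arange(20.0)
X = np.column_stack([z + rng.normal(0, 20, 20),    # weakly related to z
                     -3 * z,                       # perfectly (negatively) related
                     z + rng.normal(0, 0.1, 20)])  # strongly related
r, ranks = importance_ranking(X, z)
print(ranks)  # [3 1 2]
```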

Algorithm 1 handles two-group classification problems. For three or more groups, we resort to an ensemble method with maximum voting. We perform pairwise comparisons in which each group is compared with every other group, one pair at a time, so that c groups give c(c − 1)/2 comparisons. Once all comparisons are completed, we count the number of times each group has been predicted for a given record, and the record is assigned to the group with the most votes. Algorithm 2 presents this more general procedure for LDA with three or more groups.

Algorithm 1. Modelling Approach for Two Group LDA

Input: N, M, X_ij; i = 1, 2, …, N; j = 1, 2, …, M; and k = 1, 2

Output: a₁, a₂, …, a_M, CS, grouping for each record, confusion matrix

1: Compute the mean of each predictor j in each class/group k, ${\overset{─}{X}}_{k j}$, using Equation (1)

2: Compute the discriminant score, Z_i, for each record i using Equation (3)

3: Compute the mean discriminant score for each group k, ${\overset{─}{Z}}_{k}$, using Equation (4)

4: Compute the group variances and the pooled variance, Var(Z), using Equation (5)

5: Employ Excel Solver to find a₁, a₂, …, a_M by maximizing $| {\overset{─}{Z}}_{1} - {\overset{─}{Z}}_{2} |$ subject to Var(Z) = 1

6: Calculate the cut-off score, CS using Equation (6)

7: for i = 1 to N do

7.1: if Z(i) > CS then

Record i belongs to group 1

else

Record i belongs to group 2

8: Compute the accuracy of the prediction using the confusion matrix

9: Rank the predictor variables by the absolute value of the correlation between the discriminant score and each input variable
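A minimal NumPy sketch of Algorithm 1 follows. It is not the authors' spreadsheet: instead of running the GRG solver in step 5, it uses the closed-form maximizer of $| {\overset{─}{Z}}_{1} - {\overset{─}{Z}}_{2} |$ subject to Var(Z) = 1, namely $S^{-1}\mathbf{d}$ rescaled to unit pooled variance, where $S$ is the predictor covariance pooled with the weights of Equation (5) and $\mathbf{d}$ is the difference between the two group mean vectors. The function names are hypothetical, and the weights are signed so that group `g1` has the higher mean score, matching step 7.

```python
import numpy as np


def pooled_cov(X, y):
    """Predictor covariance pooled as in Eq (5): sum of (n_k - 1) S_k over groups, divided by N."""
    p = X.shape[1]
    S = np.zeros((p, p))
    for g in np.unique(y):
        X_g = X[y == g]
        S += (len(X_g) - 1) * np.atleast_2d(np.cov(X_g, rowvar=False))
    return S / len(X)


def two_group_lda(X, y, g1=1, g2=2):
    """Sketch of Algorithm 1, with a closed-form step 5 in place of Excel Solver."""
    mean1 = X[y == g1].mean(axis=0)               # step 1: group means (Eq 1)
    mean2 = X[y == g2].mean(axis=0)
    d = mean1 - mean2
    w = np.linalg.solve(pooled_cov(X, y), d)
    a = w / np.sqrt(d @ w)                        # step 5: max Zbar1 - Zbar2 with Var(Z) = 1
    z = X @ a                                     # step 2: discriminant scores (Eq 3)
    cs = (mean1 @ a + mean2 @ a) / 2              # steps 3 and 6: Eq (4), then Eq (6)
    predicted = np.where(z > cs, g1, g2)          # step 7: Z > CS -> group g1
    return a, cs, z, predicted


def confusion_matrix(actual, predicted, labels):
    """Step 8: rows are actual groups, columns are predicted groups."""
    index = {g: i for i, g in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(actual, predicted):
        cm[index[t], index[p]] += 1
    return cm
```

Step 9 then amounts to correlating `z` with each column of `X` and ranking the absolute values. Because the weights are scaled so that the pooled variance of the scores is exactly one, the output can be compared directly with a Solver run on the same data.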

4. Case Study Analysis & Discussion of the Solution

Having presented the generic algorithm, we test our model using the generalized reduced gradient (GRG) nonlinear method in the MS Excel Solver. Two datasets were used in this study. The first dataset, from Altman (1968), is a two-group classification problem. The second, a larger dataset of 583 records in three groups from the banking sector, was adopted from the work of Viswanathan et al. (2020); we split it into an 80% training set and a 20% testing set. The detailed implementation of our algorithms, with Excel screenshots, is presented in Appendix A.

Algorithm 2. Ensemble Technique for Multiclass LDA

Input: N, M, X_ij; i = 1, 2, …, N; j = 1, 2, …, M; and k = 1, 2 …, c

Output: a₁, a₂, …, a_M and CS for each pairwise comparison, grouping for each record, confusion matrix

Initialization: Votes (i, k) = 0 for i = 1, 2, …, N and k = 1, 2, …, c

1: for k = 1 to c − 1 do

1.1: for k' = k + 1 to c do

1.1.1: Run Algorithm 1 (group k vs group k')

1.1.2: for i = 1 to N do

If record i is allocated to group k then

Votes (i, k) = Votes (i, k) + 1

else

Votes (i, k') = Votes (i, k') + 1

2: for i = 1 to N do

2.1: Group (i) = 1

2.2: for k = 2 to c do

2.2.1: If Votes (i, k) > Votes (i, Group (i)) then

Group (i) = k

3: Compute the accuracy of the prediction using the confusion matrix
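The pairwise-voting scheme can be sketched in the same way. Following the note to Table 4, every pairwise discriminant here uses one common pooled covariance computed from all groups; each pair's weights are the closed-form maximizer (standing in for the Excel Solver), and each record receives one vote per pair. Ties go to the lowest-numbered group, which is an assumption since the article does not state a tie rule. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def pooled_cov(X, y):
    """Eq (5) in matrix form: sum of (n_k - 1) S_k over all groups, divided by N."""
    p = X.shape[1]
    S = np.zeros((p, p))
    for g in np.unique(y):
        X_g = X[y == g]
        S += (len(X_g) - 1) * np.atleast_2d(np.cov(X_g, rowvar=False))
    return S / len(X)


def fit_pairwise(X, y):
    """One discriminant per pair (k, k'), all sharing the common pooled covariance."""
    S = pooled_cov(X, y)
    models = []
    for k, k2 in combinations(np.unique(y), 2):
        mean_k, mean_k2 = X[y == k].mean(axis=0), X[y == k2].mean(axis=0)
        d = mean_k - mean_k2
        w = np.linalg.solve(S, d)
        a = w / np.sqrt(d @ w)                  # unit pooled variance; group k scores higher
        cs = (mean_k @ a + mean_k2 @ a) / 2     # Eq (6) for this pair
        models.append((k, k2, a, cs))
    return models


def predict_by_vote(models, X, labels):
    """Each pairwise model casts one vote per record; the label with most votes wins.
    argmax returns the first maximum, so ties go to the label listed first."""
    column = {g: j for j, g in enumerate(labels)}
    votes = np.zeros((len(X), len(labels)), dtype=int)
    rows = np.arange(len(X))
    for k, k2, a, cs in models:
        winner = np.where(X @ a > cs, column[k], column[k2])
        votes[rows, winner] += 1
    return np.asarray(labels)[votes.argmax(axis=1)], votes


# Toy usage: three well-separated groups in two dimensions
rng = np.random.default_rng(3)
centres = {1: [0.0, 0.0], 2: [3.0, 0.0], 3: [0.0, 3.0]}
X = np.vstack([rng.normal(centres[g], 0.7, size=(50, 2)) for g in (1, 2, 3)])
y = np.repeat([1, 2, 3], 50)
models = fit_pairwise(X, y)
predicted, votes = predict_by_vote(models, X, np.unique(y))
```

At test time, as described for Table 6, only `predict_by_vote` would be applied to the new records, reusing the weights and cut-off scores estimated on the training set.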

4.1 Altman’s Classification Model for Bankruptcy Detection among Companies

The seminal article by Altman (1968) classified and predicted corporate bankruptcy based on a set of financial ratios, using the Z-score of Fisher’s LDA to classify each firm as either “Bankrupt” or “Solvent.” The data were drawn from manufacturing corporations: 33 bankrupt firms and 33 solvent firms. The central goal was to determine whether bankrupt and solvent firms could be sharply differentiated (separated) in terms of five financial ratios: (a) Working Capital/Total Assets (WCTA), (b) Retained Earnings/Total Assets (RETA), (c) Earnings Before Interest and Taxes/Total Assets (EBITTA), (d) Market Value of Equity/Book Value of Total Debt (MVEBVTD), and (e) Sales/Total Assets (SATA). The abbreviations in brackets identify the ratios in the spreadsheet columns (see Appendix A). The data set was taken from Morrison (1976).

4.1.1 Discussion of Results

The linear discriminant score function, Z, is presented in Equation (7). The coefficients in the equation are the optimized values of a₁, a₂, …, a₅. These are the weights attached to the five financial ratios taken to differentiate the groups, “Bankrupt” and “Solvent.”

Z = 0.0059\,\mathrm{WCTA} + 0.0071\,\mathrm{RETA} + 0.0162\,\mathrm{EBITTA} + 0.0030\,\mathrm{MVEBVTD} + 0.4849\,\mathrm{SATA}

(7)

The coefficients (weights) are rounded to four decimal places for display.

Based on the mean discriminant scores of the two groups, the cut-off score value is 1.1489. If the discriminant score of the individual record is <1.1489, the company is classified as “Bankrupt”, else “Solvent.”

From the predictions, 63 of the 66 records are correctly classified, an accuracy of 95.45%. Of the 33 bankrupt companies, the model correctly predicts 31 and misclassifies 2 as solvent. Likewise, of the 33 solvent companies, the model correctly predicts 32 and misclassifies 1 as bankrupt. Fisher’s discriminant scoring model thus sharply differentiates “Bankrupt” companies from “Solvent” companies on this dataset, although this accuracy is measured on the same records used to fit the model. The confusion matrix is presented in Table 1.

Table 1.

Confusion Matrix for Altman’s Data

	Predicted Bankruptcy	Predicted Solvency
Actual bankruptcy	31	2
Actual solvency	1	32

Source: The authors.
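The accuracy quoted above, together with the share of each actual class classified correctly, can be recomputed from Table 1 with a few lines of NumPy (a minimal arithmetic check, not part of the original analysis).

```python
import numpy as np

# Table 1: rows are actual (bankrupt, solvent); columns are predicted (bankrupt, solvent)
cm = np.array([[31, 2],
               [1, 32]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions lie on the diagonal
recall = np.diag(cm) / cm.sum(axis=1)     # share of each actual class classified correctly
print(f"accuracy = {accuracy:.2%}")                              # accuracy = 95.45%
print(f"bankrupt: {recall[0]:.2%}, solvent: {recall[1]:.2%}")    # bankrupt: 93.94%, solvent: 96.97%
```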

Having established the model accuracy, we focus on the relative importance of the predictor variables. This cannot be read from the weights, because, like the slopes of a regression equation, they depend on the measurement scale of each variable. Instead, we use the absolute value of the Karl Pearson correlation coefficient between the discriminant score and each input variable to gauge the relative importance of the predictor variables (see Table 2 for details).

Table 2.

Relative Importance of the Predictor Variables

Predictor Variable	WCTA	RETA	EBITTA	MVEBVTD	SATA
Correlation	0.730	0.870	0.681	0.735	0.259
Rank	3	1	4	2	5
p-Value	<0.001	<0.001	<0.001	<0.001	0.036

Source: The authors.

In terms of relative importance, RETA ranks first and MVEBVTD second, closely followed by WCTA (third), with EBITTA fourth and SATA fifth. These rankings are free of the influence of the original scale of the input variables (in other words, they are invariant to the measurement scale). Except for SATA, all the variables are strongly correlated with the discriminant function. We also tested whether the correlation between the discriminant score and each variable is significant; all variables are significant at the 5% level (p-values are presented in Table 2). We infer that these financial ratios differentiate the two classes sharply and hence serve as good predictors of “Bankruptcy”/“Solvency.”
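Assuming the test reported in Table 2 is the usual t test for a Pearson correlation, the ranks and test statistics can be reproduced from the reported correlations with the standard library alone. The two-sided p-value is twice the upper-tail probability of a t distribution with n − 2 = 64 degrees of freedom (available, for instance, as `scipy.stats.t.sf`, which this sketch does not call). For SATA, t is about 2.15, in line with the reported p-value of 0.036.

```python
import math

# Correlations with the discriminant score reported in Table 2 (Altman's data, n = 66 firms)
corr = {"WCTA": 0.730, "RETA": 0.870, "EBITTA": 0.681, "MVEBVTD": 0.735, "SATA": 0.259}
n = 66


def t_statistic(r, n):
    """Test statistic for H0: no correlation; t distribution with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


by_strength = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
ranks = {name: i + 1 for i, name in enumerate(by_strength)}
t_values = {name: round(t_statistic(r, n), 2) for name, r in corr.items()}
print(ranks)     # {'RETA': 1, 'MVEBVTD': 2, 'WCTA': 3, 'EBITTA': 4, 'SATA': 5}
print(t_values)
```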

4.2 Three-way Classification Problem for Banks

The three-way classification problem on bank data adopted by Viswanathan et al. (2020) classified each bank’s health as low, medium or high based on the credit-deposit ratio (CDR), the ratio of net interest income to total assets (NITA), return on assets (ROA), the Tier I capital adequacy ratio (CART1), the Tier II capital adequacy ratio (CART2), the liquid assets to total assets ratio (LATA) and the gross nonperforming assets to gross advances ratio (GNPATA). The unbalanced dataset comprised 583 records (banks): 133 banks of low health, 363 of medium health and 87 of high health. The central goal was to determine whether the three groups (low/group 1, medium/group 2 and high/group 3) could be sharply differentiated (separated) in terms of the predictor variables listed above. Whereas Altman’s (1968) dataset was run only in training mode (given its small number of records), we split this dataset into training (80%, or 467 records) and testing (20%, or 116 records) datasets to test the efficacy of the model in classifying new records.

In the solution procedure, we solved three pairwise classification problems for the training set and used the voting procedure (Algorithm 2) to measure the accuracy of our training model. The cut-off scores and the hyperplane equations for the training model are presented in Tables 3 and 4, respectively (see Supplementary Excel files Bank-Train-Solver.xls for training data and results, and Bank-Test-Solver.xls for testing data and results).

Table 3.
Cut-off Scores for the Pairwise Group Classification

Pair	Group 1 vs Group 2	Group 1 vs Group 3	Group 2 vs Group 3
Cut-off score	−12.966	−16.191	18.097

Source: The authors.

Table 4.

Hyperplane Equations for the Three Pairwise Classification Problems

Pair	Hyperplane Equation*
Group 1 vs group 2	Z₁ = −0.219 CDR + 0.399 NITA − 0.466 ROA + 0.083 CART1 + 0.209 CART2 − 0.032 LATA − 0.023 GNPATA
Group 1 vs group 3	Z₂ = −0.222 CDR + 0.128 NITA − 0.670 ROA + 0.070 CART1 + 0.167 CART2 − 0.054 LATA − 0.058 GNPATA
Group 2 vs group 3	Z₃ = 0.213 CDR + 0.103 NITA + 0.807 ROA − 0.056 CART1 − 0.124 CART2 + 0.070 LATA + 0.084 GNPATA

Source: The authors.

Note: *A common variance (pooled variance for the three groups) is used for each pairwise classification.

The confusion matrix for the training model is presented in Table 5. The accuracy of the training model was 94.22%. From Table 5, we note that the training model correctly classifies 99 out of 107 banks deemed to be in low health, 275 out of 290 banks deemed to be in medium health and 66 out of 70 banks deemed to be in high health.

Table 5.

Confusion Matrix for the Training Dataset

	Predicted Low	Predicted Medium	Predicted High
Actual low	99	7	1
Actual medium	12	275	3
Actual high	0	4	66

Source: The authors.

We now proceed to the testing model, where we again considered the three pairwise classifications. Note that we do not run the solver model for the testing dataset. Instead, for each pair of groups, we compare the discriminant scores for each record with the cut-off score for that pair to identify the relevant group for that record. Once we are done with each of the three pairwise comparisons, we perform the voting procedure to assign groups to each record. The confusion matrix for the testing model is presented in Table 6.

Table 6.

Confusion Matrix for the Testing Dataset

	Predicted Low	Predicted Medium	Predicted High
Actual low	26	0	0
Actual medium	13	60	0
Actual high	0	1	16

Source: The authors.

From Table 6, the testing model classifies all banks of low health correctly and 60 of the 73 banks of medium health correctly. Of the banks in high health, it classifies 16 of 17 correctly. Overall, 102 of the 116 test records are classified correctly, an accuracy of 87.93%.
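The training and testing accuracies, and the per-class shares quoted above, follow directly from Tables 5 and 6. A short NumPy check (the helper name `summarise` is hypothetical):

```python
import numpy as np

# Rows are actual (low, medium, high); columns are predicted (low, medium, high)
train_cm = np.array([[99, 7, 1], [12, 275, 3], [0, 4, 66]])   # Table 5
test_cm = np.array([[26, 0, 0], [13, 60, 0], [0, 1, 16]])     # Table 6


def summarise(cm):
    """Overall accuracy and per-class recall (share of each actual class predicted correctly)."""
    accuracy = np.trace(cm) / cm.sum()
    recall = np.diag(cm) / cm.sum(axis=1)
    return accuracy, recall


# training: 440 of 467 correct; testing: 102 of 116 correct
for name, cm in [("training", train_cm), ("testing", test_cm)]:
    accuracy, recall = summarise(cm)
    print(f"{name}: n = {cm.sum()}, accuracy = {accuracy:.2%}, recall = {np.round(recall, 3)}")
```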

Additionally, for each pairwise comparison, we calculate the correlation between the discriminant scores and the predictor variables and rank the variables by the absolute value of this correlation. The overall rank of each predictor variable is then assigned by maximum vote across the three comparisons. The result of this analysis is presented in Table 7.

Table 7.

Relative Importance of the Predictor Variables in the Banking Dataset

	CDR	NITA	ROA	CART1	CART2	LATA	GNPATA
Group 1 vs group 2
Correlation	−0.981	−0.135	−0.326	−0.090	0.067	0.402	0.256
Rank	1	5	3	6	7	2	4
p-Value	<0.001	0.003	<0.001	0.052*	0.147*	<0.001	<0.001
Group 1 vs group 3
Correlation	−0.985	−0.242	−0.402	−0.163	0.075	0.343	0.252
Rank	1	5	2	6	7	3	4
p-Value	<0.001	<0.001	<0.001	0.004	0.103*	<0.001	<0.001
Group 2 vs group 3
Correlation	0.975	0.327	0.460	0.221	−0.081	−0.289	−0.244
Rank	1	3	2	6	7	4	5
p-Value	<0.001	<0.001	<0.001	<0.001	0.079*	<0.001	<0.001
Overall ranks for the predictor variables
Rank	1	5	2	6	7	3	4

Source: The authors.

Note: *Refers to correlations that are not significant at 5% level of significance.

Based on Table 7, CDR is ranked first in all three pairwise comparisons, making it the most important factor with an overall rank of 1. ROA gets a rank of 2, as it comes second in two pairwise comparisons. Similarly, GNPATA, NITA, CART1 and CART2 get overall ranks of 4, 5, 6 and 7, respectively, each being the rank they receive in at least two comparisons. LATA, whose ranks in the three comparisons are 2, 3 and 4, gets an overall rank of 3. Further, CART2 is not significant at the 5% level (α = 0.05) in any of the three pairwise comparisons, while CART1 is not significant at α = 0.05 in the first comparison (group 1 vs group 2). This suggests that CART2 could be removed from the classification problem.
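The maximum-vote aggregation can be reproduced from the pairwise ranks in Table 7. The article does not state how a variable with three different pairwise ranks (LATA) is handled; this sketch assumes a median fallback, which gives LATA the overall rank of 3 reported above. Such a rule does not by itself guarantee that the overall ranks are distinct.

```python
from collections import Counter
from statistics import median

# Pairwise ranks from Table 7, in the order (1 vs 2, 1 vs 3, 2 vs 3)
pairwise_ranks = {
    "CDR": [1, 1, 1], "NITA": [5, 5, 3], "ROA": [3, 2, 2], "CART1": [6, 6, 6],
    "CART2": [7, 7, 7], "LATA": [2, 3, 4], "GNPATA": [4, 4, 5],
}


def vote_rank(ranks):
    """Most frequent pairwise rank; if no rank repeats, fall back to the median (assumed tie rule)."""
    value, count = Counter(ranks).most_common(1)[0]
    return value if count > 1 else median(ranks)


overall = {name: vote_rank(r) for name, r in pairwise_ranks.items()}
print(overall)  # {'CDR': 1, 'NITA': 5, 'ROA': 2, 'CART1': 6, 'CART2': 7, 'LATA': 3, 'GNPATA': 4}
```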

5. Conclusions

Fisher’s discriminant analysis is a popular scoring model used to classify records into two or more groups. With the principal goal of maximizing the separation between the groups, we have developed a spreadsheet model using the Solver in MS Excel that avoids computing the two variance-covariance matrices. For two-group classification problems, the two approaches are equivalent (Morrison, 1976). For more than two groups, we first perform pairwise comparisons with a common variance, then ensemble the predictions of all pairwise comparisons, and finally take a maximum vote to predict the class. We therefore expect the accuracy of our ensemble approach to be comparable to that of the variance-covariance approach, although the two need not give identical classifications when there are more than two groups.

Using two different datasets, we demonstrate the statistical parsimony of LDA in the spirit of Fisher, which seeks the largest possible separation between two groups. LDA achieved high accuracy on both datasets (95.45% on Altman’s data, measured in training mode, and 87.93% on the banking testing dataset), and the gist of LDA can be succinctly modelled using the spreadsheet approach.

The spreadsheet modelling approach can be replicated for any discriminant classification example. It also scales with the number of records: irrespective of how many records there are, the optimization model has a single constraint, namely Var(Z) = 1, so even larger datasets can be solved using this approach.

Additionally, we demonstrate how the results obtained from LDA can be used to rank the predictor variables in order of their importance in the classification problem. This ranking indicates which predictor variables matter most, and this information can be used to reduce the number of variables in the study. We have shown how this can be accomplished using two datasets from the finance/banking sectors.

Supplemental Material

Supplemental material for this article is available online.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Appendix

References

Albright, S. C., & Winston, W. L. (2009). Management Science Modeling, International Student Edition. South-Western.

Altman, E. I. (1968). Discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.

Altman, E. I., Haldeman, R. G., & Narayanan (1977). ZETA analysis: A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance, 1, 29–54.

Altman, E. I. (1983). Corporate financial distress: A complete guide to predicting, avoiding and dealing with bankruptcy. Wiley.

Altman, E. I. (1984). The success of business failure prediction models: An international survey. Journal of Banking and Finance, 8(2), 171–198.

Altman, E. I. (1993). Corporate financial distress and bankruptcy: A complete guide to predicting & avoiding distress and profiting from bankruptcy (2nd ed.). Wiley.

Bunyaminu & Issah (2012). Predicting corporate failure of UK’s listed companies: Comparing multiple discriminant analysis and logistic regression. International Research Journal of Finance and Economics, 94, 6–22.

Cardwell, P. M., McGregor, C. C., & Synn, W. J. (2003). Bankruptcy prediction in the textile industry. International Business & Economics Research Journal, 2(8), 31–40.

Cortes & Vapnik (1995). Support-vector networks. Machine Learning, 20, 273–297.

Cramer, J. S. (2004). The early origins of the logit model. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 35(4), 613–626.

David, D. E., Lynne, A. M., Han, & Foley, S. L. (2010). Evaluation of virulence factor profiling in the characterization of veterinary Escherichia coli isolates. Applied and Environmental Microbiology, 76(22), 7509–7513.

Fard, P. A., Shakoorjavan, & Akbari (2018). The relationship between odour intensity and antibacterial durability of encapsulated thyme essential oil by PPI dendrimer on cotton fabrics. The Journal of the Textile Institute, 109(6), 832–841.

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.

Gaja & Liou (2018). Defect classification of laser metal deposition using logistic regression and artificial neural networks for pattern recognition. The International Journal of Advanced Manufacturing Technology, 94(1–4), 315–326.

Gissel, J. L., Giacomino, & Akers, M. D. (2007). A review of bankruptcy prediction studies: 1930-present. Journal of Financial Education, 33, 1–42.

Hastie, Tibshirani, & Friedman (2008). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.

Hotelling (1931). The generalization of Student’s ratio. Annals of Mathematical Statistics, 2(3), 360–378.

Husein, M. F., & Pambekti, G. T. (2014). Precision of models of Altman, Springate, Zmijewski and Grover for predicting the financial distress. Journal of Economics, Business and Accountancy Ventura, 17(3), 405–416.

Jayadevan, Kolhe, S. R., Patil, P. M., & Pal (2011). Offline recognition of Devanagari script: A survey. IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews, 41(6), 782–796.

Johnson, C. Y., Howards, P. P., Strickland, M. J., Waller, D. K., & Flanders, W. D. (2018). Multiple bias analysis using logistic regression: An example from the national birth defects prevention study. Annals of Epidemiology, 28(8), 510–514.

King, L. J. (1970). Discriminant analysis: A review of recent theoretical contributions and applications. Economic Geography, 46, 367–378.

Kiyak & Labanauskaite (2012). Assessment of the practical application of corporate bankruptcy prediction models. Economics and Management, 17(3), 895–906.

Li, Wang, Nie, Wang, & Tan (2018). Distance metric optimization driven convolutional neural network for age invariant face recognition. Pattern Recognition, 75, 51–62.

24.

Liu

, Lu

, & Ma

(2004). Improving kernel Fisher discriminant analysis for face recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1), 42–49.

25.

Mahalanobis

P. C.

(1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences in India, 2(1), 49–55.

26.

Maroco

, Silva

, Rodrigues

, Guerreiro

, Santana

, & de Mendoca

(2011). Data mining methods in the prediction of dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Research Notes, 4, 299.

27.

Mohammed

(2016). Bankruptcy prediction by using the Altman Z-score model in Oman: A case study of Raysut cement company SAOG and its subsidiaries. Australasian Accounting, Business and Finance Journal, 10(4), 70–80.

28.

Mohammed

N. N.

, Khaleel

M. I.

, Latif

, & Khalid

(2018). Face recognition based on PCA with weighted and normalized Mahalanobis distance. 2018 International Conference on Intelligent Informatics and Biomedical, Sciences (ICIIBMS), Bangkok, Thailand.

29.

Morais

C. L. M.

, & Lima

K. M. G.

(2018). Principal component analysis with linear and quadratic discriminant analysis for identification of cancer samples based on mass spectrometry. Journal of the Brazilian Chemical Society, 29(3), 472–481.

30.

Morrison

D. F.

(1976). Multivariate statistical analysis (2nd ed.). McGraw Hill.

Nunthapad (2000). The application of Altman’s and McGurr’s bankruptcy models to small retail firms: A comparative analysis (Dissertation). Nova Southeastern University, ProQuest Dissertations Publishing.

Pantalone & Platt, M. B. (1987). Predicting failure of savings and loan associations. AREUEA, 15(2), 46–64.

Peets, Leito, Pelt, & Vahur (2017). Identification and classification of textile fabric using ATR-FT-IR spectroscopy with chemometric methods. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 173, 175–181.

Pohar, Blas, & Turk (2004). Comparison of logistic regression and linear discriminant analysis: A simulation study. Metodoloski Zvezki, 1(1), 143–161.

Preisner, Guiomar, Machado, Menezes, J. C., & Lopes, J. A. (2010). Application of Fourier transform infrared spectroscopy and chemometrics for differentiation of Salmonella enterica serovar Enteritidis phage types. Applied and Environmental Microbiology, 76(11), 3538–3544.

Ragsdale, C. T., & Stam (1992). Introducing discriminant analysis to the business statistics curriculum. Decision Sciences, 23(3), 724–745.

Rao, C. R. (1948). The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society. Series B (Methodological), 10(2), 159–203.

Read (2016). The early life of Ronald Aylmer Fisher. In Read (Ed.), The econometricians. Great minds in finance (pp. 111–120). Palgrave Macmillan.

Rosenblatt (1962). Principles of neurodynamics. Spartan Books.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1987). Learning internal representations by error propagation. Parallel Distributed Processing, 1, 318–362.

Samarakoon, L. P., & Hasan (2003). Altman’s Z-score models of predicting corporate distress: Evidence from the emerging Sri Lankan stock market. Journal of the Academy of Finance, 1, 119–1258.

Siddiqi, M. H., Ali, Khan, A. M., Park, Y.-T., & Lee (2015). Human facial expression recognition using stepwise linear discriminant analysis and hidden conditional random fields. IEEE Transactions on Image Processing, 24(4), 1386–1398.

Springate, G. L. V. (1978). Predicting the possibility of failure in a Canadian firm. MBA Research Project, Simon Fraser University.

Unsal, M. G., & Nazman (2018). Investigating socio-economic ranking of cities in Turkey using data envelopment analysis (DEA) and linear discriminant analysis (LDA). Annals of Operations Research, 294, 281–295.

VenkataRamana, Azash, S. M., & Ramakrishnaiah (2012). Financial performance and predicting risk of bankruptcy: A case of selected cement companies in India. International Journal of Public Administration and Management Research, 1(1), 40–56.

Venkatesan, Karthigaikumar, Paul, Satheeskumaran, & Kumar (2018). ECG signal preprocessing and SVM classifier-based abnormality detection in remote healthcare applications. IEEE Access, 6, 9764–9773.

Vila & Kuster (2007). The importance of innovation in international textile firms. European Journal of Marketing, 41(1/2), 17–36.

Viswanathan, P. K., Srinivasan, & Hariharan (2020). Predicting financial health of banks for investor guidance using machine learning algorithms. Journal of Emerging Finance, 19(2), 226–261.

Wang, Zheng, Yoon, S. W., & Ko, H. S. (2018). A support vector machine-based ensemble algorithm for breast cancer diagnosis. European Journal of Operational Research, 267(2), 687–699.

Zeina, D. A., & Al-Anzi, F. S. (2018). Employing Fisher discriminant analysis for Arabic text classification. Computers & Electrical Engineering, 66, 474–486.

Table 3. Cut-off Scores for the Pairwise Group Classification

Pair                  Cut-off score
Group 1 vs Group 2    −12.966
Group 1 vs Group 3    −16.191
Group 2 vs Group 3    18.097
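The pairwise cut-offs in Table 3 suggest how a three-group assignment can be assembled from two-group discriminant functions. The following is a minimal one-vs-one majority-vote sketch. Only the three cut-off values come from Table 3. The voting rule, the convention that a score at or above the cut-off votes for the first group of the pair, the function name `vote`, and the pairwise scores in the usage lines are all hypothetical.

```python
from collections import Counter

# Cut-off scores from Table 3, keyed by (first group, second group) of each pair.
CUTOFFS = {(1, 2): -12.966, (1, 3): -16.191, (2, 3): 18.097}


def vote(pair_scores, cutoffs=CUTOFFS):
    """Assign one record to a group by one-vs-one majority vote.

    pair_scores maps each pair (g, h) to the record's score Z from the
    g-vs-h discriminant function. Assumed convention: Z >= cut-off votes
    for g, otherwise for h. Returns None when every group gets one vote.
    """
    votes = Counter()
    for (g, h), cutoff in cutoffs.items():
        votes[g if pair_scores[(g, h)] >= cutoff else h] += 1
    (winner, top), *rest = votes.most_common()
    if rest and rest[0][1] == top:
        return None  # 1-1-1 split: no majority among the three groups
    return winner


# Hypothetical pairwise scores for three records.
record_a = {(1, 2): -10.0, (1, 3): -10.0, (2, 3): 25.0}  # votes: 1, 1, 2
record_b = {(1, 2): -15.0, (1, 3): -20.0, (2, 3): 10.0}  # votes: 2, 3, 3
record_c = {(1, 2): -10.0, (1, 3): -20.0, (2, 3): 25.0}  # votes: 1, 3, 2
print(vote(record_a), vote(record_b), vote(record_c))  # 1 3 None
```

With three groups, each group appears in exactly two pairs, so a record either receives a clear two-vote winner or a three-way tie. Any tie-breaking rule (for example, preferring the pair whose score lies farthest from its cut-off) would be a further assumption.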