A deep metric learning approach for weakly supervised loan default prediction

Abstract

To control the systematic risk of financial institutions and cope with the rapid increase in loan applications, it is becoming extremely important to predict the default probability of a loan automatically. However, this task is non-trivial because of insufficient default samples, hard decision boundaries and numerous heterogeneous features. To the best of our knowledge, existing studies fail to handle these three difficulties simultaneously. In this paper, we propose WEAKLOAN, a weakly supervised loan default prediction model that systematically addresses all of these challenges based on deep metric learning. WEAKLOAN is composed of three key modules, which are used for encoding loan features, learning evaluation metrics and calculating default risk scores. By doing so, WEAKLOAN can not only extract the features of a loan itself, but also model the hidden relationships in loan pairs. Extensive experiments on real-life datasets show that WEAKLOAN significantly outperforms all compared baselines even though the default loans available for training are limited.

Keywords

Metric learning; loan default prediction; weakly supervised learning; pair-wise learning

1 Introduction

To reduce the systematic risk of banks and financial administrations, credit risk analysis and loan-granting decisions are among the most important operations for financial institutions. Every loan proposal should be carefully considered based on the background and financial situation of the applicant. With the rapid increase in online applications, it is intractable to evaluate all loans manually. Thus, automatically predicting whether a loan will default is becoming an extremely urgent task and has drawn much attention from researchers. Although numerous methods have been proposed for this task [3 , 36], almost all of them are fully supervised and assume that the training data are adequate for learning the prediction model. In the real world, there are scenarios where only a limited number of default loans are available, which causes these methods to fail to learn a decent model. For example, a new P2P lending start-up has almost no historical data for training, and previous models from other companies may not be suitable for its business. Directly using these models would lead to poor accuracy. How to learn a prediction model in such a weakly supervised situation is a new problem that has not been well studied.

Nevertheless, weakly supervised loan default prediction is not a trivial task. It faces three challenges that prevent existing models from yielding accurate results: (1) Limited default samples. The default samples in the training data are scarce compared with the total number of loans, which leads to insufficient supervision for obtaining a well-trained prediction model; (2) Hard decision boundary. Loans in the training set are binary labeled, but in the real world the likelihood that a loan will default is more useful than the binary result, for example when determining the interest rate; (3) Heterogeneous features. The features describing a loan cover multiple aspects and have different data types, such as numerical, textual and categorical. It is challenging to model these types of features in a unified manner.

To better understand these three challenges, we conduct a statistical analysis of three months of loans in LendingClub and give some examples in Fig. 1. Because charge-off loans take about half a year to be identified, we omit them in weakly supervised prediction although they are defaults. As illustrated in the left part of Fig. 1, only 8% of loans are defaults. This means that, even though the total number of loans is large, i.e., 42346 loans, the default samples account for only a tiny fraction of them, i.e., 3387 default loans, which is not sufficient for model training. The situation would be even worse if the available loans covered an extremely short period. As for data labels, as shown by the examples in the right part of Fig. 1, loans are only labeled with 0, indicating a normal loan, or 1, indicating a default loan. A prediction model trained on such a hard decision boundary can barely capture how likely a new loan is to default. Moreover, the data types of loan features vary. They include not only numerical features such as income and age, but also categorical features such as gender. In addition, the features contain textual data such as the mailing address. All of this information should be combined to yield accurate predictions.

Fig. 1

Illustration of challenges in weakly supervised loan default prediction.

Although many research efforts have been devoted to predicting default loans, none of them is able to address all of the above-mentioned challenges. Traditional loan prediction methods can be categorized into two lines: classification-based methods [27, 28], which model this problem as a binary classification problem, and regression-based methods [4 , 19], which learn numerical credit scores for evaluating the risk of loans. Because of their interpretability, we argue that credit scores are more appropriate than the result of 0-1 classification. However, methods in both categories assume that the training data contain adequate default loans for model optimization. This assumption does not always hold in our problem. To train with sufficient data, a batch of semi-supervised models [5 , 38] have been proposed that exploit the information of normal loans to optimize the prediction model. Nevertheless, these models only support numerical input and fail to capture the full information in heterogeneous data types. Nowadays, with the emergence of deep representation learning [10, 35], it is promising to solve these challenges in one unified model.

In this paper, we propose a weakly supervised default loan prediction method, namely WEAKLOAN. It learns an accurate prediction model from a small number of default samples while supporting various types of features and yielding a numerical risk score for each loan. The main idea of WEAKLOAN is based on the deep metric learning framework, which constructs a metric to measure the similarity between two given samples. By doing so, WEAKLOAN can not only extract the features of a loan itself, but also model the hidden relationships in loan pairs. Both aspects are critical for predicting the default probability of a loan and help in learning the risk score of a loan in the weakly supervised setting.

WEAKLOAN is composed of three key building blocks: a feature encoding module, a metric learning module and a score calculating module. In the deep neural network-based encoding module, input features of various data types are first transformed into a numerical embedding vector and then fed into a multi-layer perceptron to generate the loan encoding. To learn the encoding module effectively, WEAKLOAN reorganizes the training data as loan pairs and employs two orthogonal objective functions for optimization. In this way, the scale of the training data increases quadratically and the hidden relationships among different categories of loans are modeled explicitly. Once the encoding and metric learning modules are well trained, WEAKLOAN predicts the risk score of a new loan based on the results of its pairs with some pre-selected reference loans. The numerical risk score contains useful information about both the predicted loan itself and the reference loans, which solves the hard decision boundary problem. After that, a pre-defined threshold on the risk score is employed to predict whether a loan will default. Moreover, we also provide a method for determining the threshold on new datasets. By integrating these modules, WEAKLOAN is able to predict default loans effectively and efficiently, even though the labeled default loans are limited.

The main contributions of this paper are as follows:

The loan default prediction task is formulated as a weakly supervised learning problem with a limited number of default loans available for model training, which closely matches real-world conditions.

We propose a weakly supervised default loan prediction model, WEAKLOAN, based on deep metric learning, which systematically handles three challenges in loan default prediction: limited default loans, hard decision boundaries and heterogeneous features.

Extensive experiments on datasets of different scales show that WEAKLOAN is superior to the compared baselines, especially when the number of labeled default loans is small.

The rest of this paper is organized as follows: Section 2 introduces the related work. Section 3 gives the problem statement and an overview of WEAKLOAN, and explains the technical details. We validate the effectiveness of WEAKLOAN in Section 4, and conclude this work in Section 5.

2 Related work

Existing works that are related to this paper can be categorized into three groups: supervised loan default prediction methods, semi-supervised loan default prediction methods and deep metric learning. Methods in the first two groups aim to train a precise prediction model for evaluating the default risk of a new loan. For the third group, we describe some deep metric learning models that are related to the architecture of WEAKLOAN.

2.1 Supervised loan default prediction

Several methods have been proposed to predict the default risk of a loan in the supervised learning setting, such as [2 , 36]. The authors of these works assume that sufficient labeled default loans are available to learn an accurate model. However, this assumption usually does not hold in the real world. To address the data imbalance problem, Chen et al. employ diversified sensitivity under-sampling to reorganize the training data and obtain better results than the baselines [36]. Newer models [33], such as LSTM, are utilized to model the sequential influence of applicants' activities. Furthermore, some boosting techniques [28] have been proposed to train the prediction model accurately. Although extensive supervised learning methods are available for predicting the default risk of loans, labeled default loans are usually scarce in the real world, which causes all these methods to fail to learn a decent model in the weakly supervised setting. Perez-Martin et al. propose a logistic regression-based model for estimating the credit risk of P2P lending platforms in China [6]. A non-parametric statistical method is used to identify the borrower's characteristics and extract the features.

Nevertheless, all these works rely on sufficient labeled loans to train the prediction model. In the real world, labeled data are expensive and hard to obtain. Ignoring the influence of imbalanced data labels would cause the model to fail to yield accurate predictions.

2.2 Semi-supervised loan default prediction

To deal with limited default data, recent years have witnessed an increasing number of semi-supervised algorithms [18 , 38] for loan default prediction. Methods in this group assume that a subset of the training data is labeled (with both normal loans and default loans) and that sufficient unlabeled loans are available. Under this setting, Livieris et al. [18] propose a semi-supervised boosting method for evaluating the default risk. Li et al. [38] employ a semi-supervised SVM to optimize the decision hyperplanes for predicting loan default. Xiao et al. [22] utilize semi-supervised ensemble algorithms to learn a better set of parameters for the prediction model.

Methods in this group treat the unlabeled loans as normal loans. We argue that obtaining labeled normal loans is rather expensive because it takes several months or years to validate whether a loan is normal. Moreover, these unlabeled loans could also be defaults, which introduces erroneous supervision into model training. This situation limits the usage of the above-mentioned methods.

2.3 Deep metric learning

Apart from these loan default prediction methods, WEAKLOAN is highly related to metric learning [1 , 24], which is a paradigm that learns a metric for sample pairs (or triplets) according to pre-given supervised information. With the emergence of deep learning [25, 37], deep metric learning, which replaces the feature selection module with a deep neural network, has been studied and has attracted much attention [14, 17]. In recent years, some efforts have extended deep metric learning methods to solve weakly supervised learning problems [16].

Although many deep metric learning-based methods are used for weakly supervised tasks, methods for weakly supervised loan default prediction have not been well studied. This is the motivation of this paper.

3 Methodology

In this section, we first define the weakly supervised default loan prediction problem. After that, the architecture of WEAKLOAN is overviewed and the details of the three key components in WEAKLOAN will be specified respectively. Moreover, the complexity analysis of WEAKLOAN will be given at the end of this section.

3.1 Problem Definition

Given a historical loan dataset $D = \{L_{1}, \dots, L_{N}\}$ containing N loans, a small number of them are default loans $D^{d} = \{L_{1}^{d}, \dots, L_{M}^{d}\}$. Each loan in $D$ is represented by a set of heterogeneous features L_n = [f₁, …, f_K], whose types can be numerical, categorical and so on. Note that the total number of loans in $D$ is not very large, which means that the historical data are not sufficient for learning the prediction model in a fully supervised manner. With the information provided by $D$, weakly supervised default loan prediction aims to predict the default risks of a new set of loans $D^{'} = \{L_{1}^{'}, \dots, L_{T}^{'}\}$, such that the risks of the default loans in $D^{'}$ are larger than the risks of the others.

3.2 Overview of WEAKLOAN

As shown in Fig. 2, WEAKLOAN is built on a deep metric learning framework, which is suitable for the weakly supervised setting. In other tasks, this framework has been shown to learn better models than other supervised and semi-supervised methods [11, 15]. To adapt it to weakly supervised default loan prediction and solve the existing challenges, our solution is decomposed into three sub-procedures in WEAKLOAN.

Fig. 2

Overview of WEAKLOAN. Feature Encoding Module is to learn the potential features from various data types of loans; Metric Learning Module constructs loan pairs and explores the relationships in these pairs based on labels; Score Calculating Module will predict the default probability of a loan according to its relationship with other reference loans.

Firstly, to fully utilize the information of loans and deal with the heterogeneous modalities of the data, the features of each loan L = [f₁, …, f_K] are encoded into a fixed-length distributed vector by a deep neural network-based feature encoding module. Multiple feature types are supported by the encoding module, including numeric and categorical. It can be easily extended to other types such as text and images.

Secondly, to address the limited default samples and increase the size of the training data, the training loans in $D$ are reorganized into pairs in which each pair contains two loans <L_i, L_j>. Three types of latent relationships are defined based on the labels of the training pairs, namely two normal loans, two default loans, and one of each. The parameters of the encoding module are trained according to these latent relationships.

Thirdly, for predicting the default score of a new loan $L_{t}^{'}$ , WEAKLOAN randomly selects some loans in training data as the references and uses them to construct pairs with $L_{t}^{'}$ . Taking these pairs as the input, WEAKLOAN generates the pair risks and the risk of $L_{t}^{'}$ is defined as the average of them.

After that, we can obtain the label of a new loan by comparing its risk with a threshold. The threshold is determined by the data distribution and can be estimated with the training data.

3.3 Feature encoding module

Taking L = [f₁, …, f_K] as the input, WEAKLOAN transforms L into a representation vector x that encodes multiple types of features. According to the type of features, the input can be divided into several groups, including the numeric feature group $F^{n} = [f_{1}^{n}, \dots, f_{a}^{n}]$, the categorical feature group $F^{c} = [f_{1}^{c}, \dots, f_{b}^{c}]$ and so on. After that, WEAKLOAN encodes each group separately. For conciseness, we only show the processing details of the two above-mentioned feature groups. $\begin{matrix} x^{n} = \mathrm{MLP}(F^{n}; \theta_{n}) \\ x^{c} = \mathrm{MLP}(e^{c}; \theta_{c}), \quad e^{c} = M^{c} \cdot F^{c} \\ x^{nc} = \mathrm{concat}([x^{n}, x^{c}]) \\ x = \mathrm{MLP}(x^{nc}; \theta_{nc}) \end{matrix}$ (1)

As shown in Equation 1, the numeric features are encoded into a representation vector by a multilayer perceptron (MLP). For categorical features, WEAKLOAN first employs an embedding matrix M^c to learn the representation of each class, and then transforms the class embeddings with an MLP to obtain the encodings. Finally, the different types of encodings are concatenated to generate the encoding of L. Note that, in order to balance the influence of different features, we set the same dimension d for the encoding of every feature and let the neural network learn their weights, namely, ∀i ∈ [1, a] and j ∈ [1, b], $\dim (x_{i}^{n}) = \dim (x_{j}^{c}) = d$ .

This feature encoding method can be easily extended to other types of features. Assuming that a feature f^g is an image, we can employ a Convolutional Neural Network (CNN) [3] to encode it into a d-dimensional vector and concatenate it with the existing feature encoding x^nc. If there is a feature f^t in the form of a sentence, we can first extract its keywords and embed these keywords with pre-trained word vectors [21]. Then, these word embeddings can be transformed into a d-dimensional vector by using an RNN [31] or BERT [20] and encoded in the same way as the other features.
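The data flow of Equation 1 can be made concrete with a small sketch. The pure-Python code below is not the authors' implementation (which is built with PyTorch); it is a minimal illustration under assumed choices: a single ReLU layer stands in for each MLP, weights are random and untrained, each categorical feature has its own embedding table, and the names `FeatureEncoder`, `dense`, `init_layer` and the sizes d = 4 and h = 8 are hypothetical.

```python
import random


def dense(vec, weights, bias):
    """One fully connected layer followed by ReLU (stand-in for an MLP)."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero bias; illustrative only, nothing is trained."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class FeatureEncoder:
    """Hypothetical sketch of the encoding flow in Equation 1."""

    def __init__(self, n_numeric, cat_sizes, d=4, h=8, seed=0):
        rng = random.Random(seed)
        self.num_layer = init_layer(n_numeric, d, rng)           # theta_n
        # M^c: one embedding table per categorical feature (assumption)
        self.emb = [[[rng.uniform(-0.5, 0.5) for _ in range(d)]
                     for _ in range(size)] for size in cat_sizes]
        self.cat_layer = init_layer(d * len(cat_sizes), d, rng)  # theta_c
        self.out_layer = init_layer(2 * d, h, rng)               # theta_nc

    def encode(self, numeric, categorical):
        x_n = dense(numeric, *self.num_layer)                    # x^n
        # e^c: look up and concatenate the class embeddings
        e_c = [v for table, idx in zip(self.emb, categorical) for v in table[idx]]
        x_c = dense(e_c, *self.cat_layer)                        # x^c
        x_nc = x_n + x_c                                         # concat([x^n, x^c])
        return dense(x_nc, *self.out_layer)                      # x


# Toy loan: three pre-scaled numeric features and two categorical indices.
encoder = FeatureEncoder(n_numeric=3, cat_sizes=[3, 4])
x = encoder.encode([0.5, 0.1, 0.3], [1, 2])
print(len(x))  # length of the loan encoding, equal to h
```

The numeric inputs are assumed to be scaled beforehand; how the original model normalizes them is not stated in this section.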

3.4 Metric learning module

To optimize the parameters, especially the weights of the MLPs in Equation 1, WEAKLOAN adopts a metric learning framework. The metric learning module takes a pair of loans as input and outputs a default score that indicates how likely the input pair is to contain default loans. Next, we first specify the generation of loan pairs and then detail the metric learning architecture and the optimization method.

3.4.1 Reorganize training data into loan pairs

Due to the limited labeled loans in the training data and the imbalance of data labels, WEAKLOAN reorganizes the training data into loan pairs and utilizes the labels of these pairs to guide the model optimization. According to the binary labels of loans, there exist three kinds of loan pairs: (default, default), (default, normal) and (normal, normal), denoted as $(L_{i}^{d}, L_{j}^{d})$ , $(L_{i}^{d}, L_{j}^{n})$ and $(L_{i}^{n}, L_{j}^{n})$ . Because the number of default loans in $D$ is small, we first take all $(L_{i}^{d}, L_{j}^{d})$ pairs and construct a set of $(L_{i}^{d}, L_{j}^{n})$ pairs by randomly sampling normal loans for each default loan. After that, the $(L_{i}^{n}, L_{j}^{n})$ pairs are randomly generated from the normal loans. $\begin{matrix} (L_{i}^{d}, L_{j}^{d}) \sim D^{d} \\ (L_{i}^{d}, L_{j}^{n}) \sim D \\ (L_{i}^{n}, L_{j}^{n}) \sim D \setminus D^{d} \end{matrix}$ (2)

Once the pairs are generated, we specify their default scores. Let the default scores of $(L_{i}^{d}, L_{j}^{d})$ , $(L_{i}^{d}, L_{j}^{n})$ and $(L_{i}^{n}, L_{j}^{n})$ pairs be s_dd, s_dn and s_nn, respectively. They satisfy s_dd > s_dn > s_nn. After this procedure, the training data are organized into a new set of pairs: $D^{pair} = \{(L_{i}, L_{j}, s_{ij})\}$ , where $L_{i}, L_{j} \in D$ and s_ij ∈ {s_dd, s_dn, s_nn}.
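The pair construction described above can be sketched in a few lines. The function below is an illustration rather than the authors' code: the name `build_pairs`, the sampling sizes `dn_per_default` and `n_nn`, and the use of unordered (default, default) pairs are assumptions, while the target scores follow the values given in Section 4.1.4.

```python
import itertools
import random

# Target default scores for the three pair types (values from Section 4.1.4).
S_DD, S_DN, S_NN = 1.0, 0.5, 0.1


def build_pairs(default_ids, normal_ids, dn_per_default=2, n_nn=5, seed=0):
    """Return a list of (loan_i, loan_j, target_score) training triples."""
    rng = random.Random(seed)
    pairs = []
    # All (default, default) pairs, since default loans are scarce.
    for i, j in itertools.combinations(default_ids, 2):
        pairs.append((i, j, S_DD))
    # For each default loan, sample a few normal partners.
    for i in default_ids:
        for j in rng.sample(normal_ids, min(dn_per_default, len(normal_ids))):
            pairs.append((i, j, S_DN))
    # Randomly generated (normal, normal) pairs.
    for _ in range(n_nn):
        i, j = rng.sample(normal_ids, 2)
        pairs.append((i, j, S_NN))
    return pairs


# Toy example: loans are identified by integer ids.
pairs = build_pairs(default_ids=[0, 1, 2], normal_ids=list(range(3, 13)))
print(len(pairs))  # 3 dd pairs + 6 dn pairs + 5 nn pairs
```

With M default loans, the number of (default, default) pairs alone grows as M(M-1)/2, which is how pairing enlarges the supervision available from a small default set.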

3.4.2 Metric learning and optimization

In the metric learning procedure, the objective function is defined to minimize the difference between the output of WEAKLOAN and the ground-truth default score. Specifically, WEAKLOAN takes a pair of loans (L_i, L_j) and first encodes the two input loans with the feature encoding module to generate their representation vectors x_i and x_j. Then, the representation vectors are fed into a fully connected flatten layer to calculate the default score $\hat{s}_{ij}$ . $\begin{matrix} x_{i}, x_{j} = \mathrm{FEM}(L_{i}), \mathrm{FEM}(L_{j}) \\ x_{ij} = \mathrm{concat}(x_{i}, x_{j}) \\ \hat{s}_{ij} = \mathrm{Flatten}(x_{ij}; \theta_{F}) \end{matrix}$ (3) where FEM is the feature encoding module. Assuming that the dimensions of x_i and x_j are both h, the dimension of x_ij is 2h. The flatten layer then generates the numeric default score $\hat{s}_{ij}$ .

For model training, WEAKLOAN directly fits the estimated default score to the ground truth s_ij. The loss function of the metric learning part is: $L_{\Theta} = \arg\min_{\Theta} \sum_{(L_{i}, L_{j}, s_{ij}) \in D^{pair}} \| \hat{s}_{ij} - s_{ij} \|_{2} + \lambda \cdot |\Theta|$ (4) where ‖ · ‖₂ is the 2-norm of the difference and Θ = {θ_n, θ_c, θ_nc, θ_F}. |Θ| is a regularization term for preventing WEAKLOAN from converging to a trivial point, such as one where all parameters are zero. λ is used to balance the regularization term and the learning target. We empirically set λ = 0.1 in all experiments.
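As a worked illustration of Equation 4, the snippet below evaluates the objective for a handful of pairs. It is a sketch under assumptions: each score is a scalar, so the 2-norm of the difference reduces to an absolute value, and |Θ| is taken to be the sum of absolute parameter values, which the text does not specify. The function name `pair_loss` and the toy numbers are hypothetical.

```python
def pair_loss(predicted, target, params, lam=0.1):
    """Fitting term plus lam times the (assumed L1) parameter penalty."""
    fit = sum(abs(p - t) for p, t in zip(predicted, target))
    reg = sum(abs(w) for w in params)
    return fit + lam * reg


# Three pairs: one (d, d), one (d, n) and one (n, n), with targets 1, 0.5, 0.1.
predicted = [0.9, 0.4, 0.2]
target = [1.0, 0.5, 0.1]
params = [0.5, -1.0, 2.0]  # flattened toy parameters

loss = pair_loss(predicted, target, params)
print(round(loss, 4))  # 0.3 (fit) + 0.1 * 3.5 (penalty) = 0.65
```

In an actual training loop this quantity would be minimized over Θ with a gradient-based optimizer such as Adam, as described in Section 4.1.4.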

3.5 Risk score calculating module

Although the default scores of loan pairs can be predicted accurately, the risk score of a specific new loan $L_{t}^{'}$ cannot be calculated directly within the metric learning framework. To address this problem, we propose a reference-based risk score calculation method that first selects R reference loans in $D$ to construct reference pairs with $L_{t}^{'}$ . Then, with the well-trained WEAKLOAN, we calculate the default scores of all reference pairs and generate the risk score based on these results.

Specifically, with the reference loans $\mathcal{R} = \{L_{1}, \dots, L_{R}\}$ , the default scores of the loan pairs $\{(L_{t}^{'}, L_{1}), \dots, (L_{t}^{'}, L_{R})\}$ can be calculated as $\{\hat{s}_{t1}, \dots, \hat{s}_{tR}\}$ . Then, the risk score of $L_{t}^{'}$ is defined as the average of these default scores: $s_{t}^{'} = \frac{1}{R} \cdot \sum_{r = 1}^{R} \hat{s}_{tr}$ (5)

The selection of reference loans depends on which kinds of defaults are to be detected. In this paper, we give a naive selection policy that randomly chooses equal numbers of default and normal loans, i.e., R/2 loans are default and the other R/2 are normal. The risk score can then be rewritten as: $s_{t}^{'} = \frac{1}{R} \cdot \left[ \sum_{r = 1}^{R / 2} \hat{s}_{tr}^{d} + \sum_{r^{'} = 1}^{R / 2} \hat{s}_{tr^{'}}^{n} \right]$ (6)
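The reference-based averaging of Equations 5 and 6 can be sketched as follows. The trained pair model is replaced here by a hypothetical stand-in, `toy_pair_score`, and each loan is reduced to a single made-up "riskiness" number purely so that the example runs; only the sampling of R/2 default and R/2 normal references and the averaging reflect the text.

```python
import random


def risk_score(new_loan, default_refs, normal_refs, pair_score, n_refs=4, seed=0):
    """Average the pair scores of a new loan against R/2 default and
    R/2 normal reference loans (n_refs = R is assumed to be even)."""
    rng = random.Random(seed)
    half = n_refs // 2
    refs = rng.sample(default_refs, half) + rng.sample(normal_refs, half)
    scores = [pair_score(new_loan, ref) for ref in refs]
    return sum(scores) / len(scores)


def toy_pair_score(loan_a, loan_b):
    """Hypothetical stand-in for the trained metric learning module."""
    return (loan_a + loan_b) / 2


# Toy data: each loan is represented by one made-up riskiness value.
default_refs = [0.9, 0.8, 1.0]
normal_refs = [0.1, 0.2, 0.0]

risky = risk_score(0.95, default_refs, normal_refs, toy_pair_score)
safe = risk_score(0.05, default_refs, normal_refs, toy_pair_score)
print(risky > safe)
```

Fixing the seed keeps the same references for every new loan, so that risk scores of different loans are comparable.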

3.6 Loan default prediction

In use, $s_{t}^{'}$ is compared with a pre-defined threshold φ to determine whether the loan can be granted. For a new loan $L_{t}^{'}$ , if $s_{t}^{'}$ is larger than φ, the loan is more likely to be a default loan.

Moreover, we propose to use a Gaussian Mixture Model (GMM) to estimate a proper value of φ. Specifically, we first utilize the well-trained WEAKLOAN to calculate the risk scores of all loans in the training dataset and obtain a risk score list S = [s₁, s₂, . . . , s_n]. We assume that S is sampled from a GMM that contains two Gaussian components, one denoted as $N (\mu_{n}, \sigma_{n})$ for normal loans and the other denoted as $N (\mu_{d}, \sigma_{d})$ for default loans. Intuitively, μ_d > μ_n, which indicates that default loans are more likely to have larger risk scores than normal ones. We can estimate the parameters of the two components {μ_n, σ_n, μ_d, σ_d} by using the EM algorithm. Then, the threshold can be determined by {μ_d, σ_d}: $\varphi = \mu_{d} + k \cdot \sigma_{d}$ (7) where $k \in \mathbb{Z}$ controls the loan policy. If k is positive, the threshold is large, which means that more loans are likely to be considered normal loans. In contrast, if k is negative, more loans are likely to be considered default loans.
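The threshold estimation can be illustrated with a small, self-contained EM routine for a one-dimensional, two-component Gaussian mixture. This is a sketch rather than the authors' procedure: the initialization, the fixed iteration count, the function names `fit_two_gaussians` and `threshold`, and the synthetic risk scores are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(scores, n_iter=100):
    """EM for a 1-D, two-component GMM; returns (mu_n, sigma_n, mu_d, sigma_d)."""
    lo, hi = min(scores), max(scores)
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sigma = [(hi - lo) / 4 or 1e-3] * 2
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mean, standard deviation and mixing weight.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            mu[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, scores)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
            weight[k] = nk / len(scores)
    # The component with the larger mean is treated as the default component.
    (mu_n, sigma_n), (mu_d, sigma_d) = sorted(zip(mu, sigma))
    return mu_n, sigma_n, mu_d, sigma_d


def threshold(mu_d, sigma_d, k=0):
    """Equation 7: phi = mu_d + k * sigma_d."""
    return mu_d + k * sigma_d


# Synthetic risk scores: 900 "normal" loans near 0.2, 100 "default" loans near 0.8.
rng = random.Random(0)
scores = [rng.gauss(0.2, 0.05) for _ in range(900)]
scores += [rng.gauss(0.8, 0.05) for _ in range(100)]

mu_n, sigma_n, mu_d, sigma_d = fit_two_gaussians(scores)
phi = threshold(mu_d, sigma_d, k=0)
print(round(mu_n, 2), round(mu_d, 2))
```

Choosing a negative k lowers φ below μ_d, so more loans are flagged as defaults; a positive k does the opposite.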

3.7 Complexity analysis

Beyond accuracy, the complexity of a default loan prediction method should not be neglected. High complexity limits the usage of a prediction method, especially in P2P online lending. For predicting the risk score of a new loan, the most expensive part of WEAKLOAN lies in the feature encoding module. With the reference loans involved, the complexity of WEAKLOAN is O ((R + 1) · α), where R is the number of references and α is a constant that indicates the complexity of feature encoding. However, this complexity can be greatly reduced if we fix the reference loans and pre-compute the representation vectors of these references before predicting new loans. In this way, the feature encoding cost of WEAKLOAN is O (α), i.e., O (1) with respect to the number of references, which is suitable for online lending.
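The effect of pre-computing reference encodings can be checked by counting encoder calls. In the sketch below, `CountingEncoder` and `toy_pair_scorer` are hypothetical stand-ins for the trained feature encoding module and the pair scoring layer; only the call counts matter.

```python
class CountingEncoder:
    """Stand-in for the feature encoding module that counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, loan):
        self.calls += 1
        return [float(v) for v in loan]


def toy_pair_scorer(x_a, x_b):
    """Hypothetical pair scorer: mean of the concatenated encodings."""
    joined = x_a + x_b
    return sum(joined) / len(joined)


def score_naive(encoder, pair_scorer, new_loan, references):
    """Encodes the new loan and every reference: R + 1 encoder calls."""
    x_new = encoder(new_loan)
    return sum(pair_scorer(x_new, encoder(ref)) for ref in references) / len(references)


def score_cached(encoder, pair_scorer, new_loan, ref_encodings):
    """Reuses pre-computed reference encodings: one encoder call per new loan."""
    x_new = encoder(new_loan)
    return sum(pair_scorer(x_new, x_ref) for x_ref in ref_encodings) / len(ref_encodings)


references = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.6], [0.2, 0.1]]  # R = 4
new_loans = [[0.5, 0.5], [0.3, 0.9], [0.8, 0.1]]

naive_encoder = CountingEncoder()
naive_scores = [score_naive(naive_encoder, toy_pair_scorer, loan, references)
                for loan in new_loans]

cached_encoder = CountingEncoder()
ref_encodings = [cached_encoder(ref) for ref in references]  # done once, offline
offline_calls = cached_encoder.calls
cached_scores = [score_cached(cached_encoder, toy_pair_scorer, loan, ref_encodings)
                 for loan in new_loans]
online_calls = cached_encoder.calls - offline_calls

print(naive_encoder.calls, online_calls)  # 15 versus 3 encoder calls
```

Note that the R pair-score evaluations remain in both variants; the saving applies to the encoder, which the text identifies as the dominant cost.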

4 Experiments

In this section, we aim to answer the following evaluation questions:

EQ1: Can WEAKLOAN predict the default loans more precisely than existing state-of-the-art baselines?

EQ2: How does WEAKLOAN perform with different numbers of training loans?

EQ3: How do the threshold φ and the number of reference loans R influence the performance of WEAKLOAN?

All the experiments are conducted on a high-end Dell T7920 server with two Nvidia 2080Ti GPUs. Both our method and the compared baselines are implemented with Python 3.7. We utilize PyTorch 1.3.1 to build and train the WEAKLOAN model. For the performance evaluation, we directly use the metrics provided by the sklearn package. Next, we first introduce the experiment settings and then analyze the experimental results to answer the above-mentioned questions.

4.1 Experiment settings

4.1.1 Dataset

The datasets used in this paper consist of loans from LendingClub 2 , an American peer-to-peer lending company that provides the “bridge” between investors and borrowers. The dataset contains complete loan data for all loans issued from 2007 to 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information. The whole dataset has about 2.3 million loans, and there is a 145-dimensional feature set for each loan, including credit scores, number of finance inquiries, address and so on. The distribution and some statistics of the dataset are illustrated in Fig. 3. As shown, the default loans account for only a small fraction of the whole dataset. In our experiments, we also regard the late loans and call-off loans as default loans. The experimental dataset is publicly available here 3 .

Fig. 3

Statistics of the LendingClub dataset.

To simulate the weakly supervised scenario, we use only a small number of loans as the training data and take the other loans as the test data for evaluation. As shown in Fig. 3, the loans are distributed unevenly over time. We first shuffle the dataset and calculate the average number of loans in one day. Then, five sub-datasets, one day $D_{1}$ , one week $D_{2}$ , one month $D_{3}$ , two months $D_{4}$ and three months $D_{5}$ , are constructed by randomly sampling loans from the whole dataset for training. The details of these datasets are shown in Table 1.

Table 1

Sub-datasets to train the prediction model

	$D_{1}$	$D_{2}$	$D_{3}$	$D_{4}$	$D_{5}$
Default Loans	54	342	1440	2812	4259
Total Loans	0.5 k	3.5 k	15 k	30 k	45 k

4.1.2 Compared baselines

In order to show the superiority of WEAKLOAN, we compare it with several existing baselines for default loan prediction. These baselines can be categorized into two groups: supervised methods, which contain SVM, Logistic Regression (LR), Naive Bayes (NB) and Multi-layer Perceptron (MLP); and semi-supervised methods, which include S3VM. We also tested the boosting method [26], but omit the result because it classifies almost all loans as normal.

Naive Bayes (NB): This model assumes that the features of a loan are conditionally independent given the label. We choose the multinomial distribution to estimate the likelihood of the dataset and maximize the likelihood to optimize the model parameters.

SVM [9]: Directly taking the features as input, SVM is a classic max-margin classification method that learns a separating hyperplane to decide whether a loan is a default.

Logistic Regression (LR) [6]: This model transforms a linear combination of the input features with a logistic function and generates the classification result based on the output of the function.

Multi-layer Perceptron (MLP) [2]: This method models the classification problem with multilayer neural networks and outputs the probability of each class for one specific loan.

Semi-supervised SVM (S3VM) [38]: Given a small training set of labeled loans and sufficient unlabeled loans, S3VM is an inductive learning algorithm that estimates the classification function at all possible values and then uses the fixed function to deduce the classes of the unlabeled loans.

To clarify the existing works related to default loan prediction, we summarize the data, methods, evaluation metrics and results reported in their original papers. All of this information is organized in Table 2. As shown, Li et al. [38] use the same dataset as we do. They achieve an accuracy of over 0.9, which is better than the performance we report in Fig. 5. This is because the method in [38] utilizes 70% of the data for training, which amounts to over one million loans. In this paper, we focus on the weakly supervised setting and use at most 45 k loans for training.

Table 2
Summary of the baselines

Paper	Classifier	Metrics	Dataset	Score
Huang et al. [9]	SVM	Accuracy	Australian and German credit datasets in UCI Repository of ML	0.77
Perez-Martin et al. [6]	SVM + LR	RMSE & Accuracy	Simulated Data (Private)	–
Byanjankar et al. [2]	Neural Networks	Accuracy	European P2P lending dataset (Private)	0.74
Li et al. [38]	Semi-Supervised SVM	Accuracy	LendingClub dataset	0.91
Odeh et al. [26]	Boosting	Rule strength	Seventh Farm Credit District customers loan dataset (Private)	–

4.1.3 Evaluation metrics

In this paper, we employ four commonly used metrics, namely precision, recall, accuracy and F1-score, to compare the performance of WEAKLOAN with that of the baselines. Given a set of test loans, the classification results are divided into four groups: true positive (TP), true negative (TN), false positive (FP), and false negative (FN), where true/false indicates whether the loan is classified correctly and positive/negative indicates whether the loan is classified as default or normal. Then, these four evaluation metrics can be formulated as follows: $\begin{matrix} \mathrm{precision} = \frac{TP}{TP + FP}; \quad \mathrm{recall} = \frac{TP}{TP + FN} \\ \mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \\ \mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \end{matrix}$ (8)
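To make Equation 8 concrete, the snippet below computes the four metrics from raw labels on a small made-up example. The experiments use the sklearn implementations; this hand-written version, with the hypothetical name `classification_metrics` and a convention of returning 0 when a denominator is empty, only mirrors the formulas.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, accuracy and F1 with default (label 1) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Toy test set: 3 default loans and 7 normal loans.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # TP = 2, FN = 1, FP = 1, TN = 6
```

Because default loans are rare, accuracy alone can be high for a model that labels every loan as normal, which is why precision, recall and F1-score on the default class are reported alongside it.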

4.1.4 Parameter Settings

The hyperparameters that need to be set in WEAKLOAN consist of three parts: (1) the values of the default scores s_dd, s_dn and s_nn; (2) the number of reference loans R; and (3) the decision threshold φ. For the default scores, we empirically set s_dd = 1, s_dn = 0.5 and s_nn = 0.1. For R and φ, we employ grid search to select the best values. More details of the parameter studies are available in Section 4.4. Because the test loans are accessible in our experiments, we empirically use the φ that yields the highest performance for evaluation and do not report the results of its estimation method.

In model training, we utilize the commonly used Adam optimizer [12] to learn the parameters. Following the recommendation, the learning rate is empirically set to 0.01. For each sub-dataset, we train WEAKLOAN for 50 epochs and report the model that performs best on the test data.

4.2 Feature selection

Before conducting the experiments, we first select a subset of relevant features from the LendingClub dataset, which contains 146 different features for each loan. Exploratory data analysis (EDA) was employed to find which features are relevant to the default loan prediction task. Specifically, two rules are used for feature selection: (1) mutual exclusivity, which requires that the selected features are independent of each other; and (2) relevance, which selects the features that contribute to the prediction of default loans. For the first rule, we compute the correlation matrix of these 146 features and choose mutually independent ones as the candidate features. Figure 4 illustrates part of the results of this procedure. As shown, the loan amount is highly correlated with the installment, and grade, subgrade and interest rate are correlated with each other, i.e., their correlation coefficients are close to 1. Therefore, we keep loan amount and interest rate as selected features. For the second rule, we further filter the candidate features by calculating the correlation coefficient between each feature and the label. The resulting subset consists of 7 relevant features, which WEAKLOAN uses to learn the prediction model. These features, together with some sample loans, are shown in Table 3.
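The two selection rules can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure actually used on the LendingClub data: the thresholds `redundancy=0.9` and `relevance=0.1`, the greedy "keep the first of each correlated group" strategy, the feature `noise_feature` and all numeric values below are assumptions introduced for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def select_features(columns, label, redundancy=0.9, relevance=0.1):
    """Apply the two rules; `columns` maps feature name -> list of values."""
    kept = []
    for name, values in columns.items():
        # Rule 1 (mutual exclusivity): skip a feature that is almost
        # collinear with a feature that has already been kept.
        if any(abs(pearson(values, columns[k])) >= redundancy for k in kept):
            continue
        kept.append(name)
    # Rule 2 (relevance): keep only features correlated with the label.
    return [k for k in kept if abs(pearson(columns[k], label)) >= relevance]


# Toy data (made up for illustration); label 1 = default, 0 = normal.
columns = {
    "loan_amount": [1000, 2000, 3000, 4000, 5000, 6000],
    "installment": [31, 60, 92, 119, 151, 180],      # nearly proportional to amount
    "interest_rate": [14.0, 9.0, 16.0, 8.0, 15.0, 10.0],
    "noise_feature": [1.0, 2.0, 2.0, 1.0, 1.5, 1.5],  # uncorrelated with the label
}
label = [1, 0, 1, 0, 1, 0]
selected = select_features(columns, label)
print(selected)
```

On this toy data the installment is dropped because it is almost perfectly correlated with the loan amount, and the noise feature is dropped because its correlation with the label is zero, leaving loan amount and interest rate. Note that the greedy pass depends on column order, so a different ordering could keep installment instead of loan amount.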

Fig. 4

The correlation matrix of the selected features.

Table 3

Data samples of the selected features

Amount	Interest	Employ Time	Home	Income	Purpose	DTI
5000	10.65	10+ years	Rent	24000	Credit card	27.65
2500	15.27	<1 year	Rent	30000	Car	1
2400	15.96	10+ years	Own	12252	Business	8.72
10000	13.49	10+ years	Rent	49200	Other	20
3000	12.69	1 year	Rent	80000	Other	17.94

4.3 Performance comparison

In this section, we aim to answer the first two EQs. We compare the performance of the baselines and WEAKLOAN under various scales of training data. The results of these experiments are reported in Fig. 5. From these results, we can observe that:

For default loan prediction, WEAKLOAN substantially improves over the state-of-the-art baselines in most cases in terms of precision, accuracy and F1-score, demonstrating the effectiveness of our proposed framework.

As the number of training loans increases, the performance of all methods first increases and then becomes stable. WEAKLOAN outperforms all compared baselines significantly on precision, accuracy and F1-score, which indicates the superiority of WEAKLOAN in the weakly supervised setting. It also shows that WEAKLOAN is able to yield accurate predictions even when the default samples in the training data are insufficient.

Among the compared baselines, the supervised methods are inferior to the semi-supervised method S3VM, especially when the number of training loans is small. This is because semi-supervised methods take the nature of the training data into account and yield better evaluation results for default loans, which validates that the loan default prediction task is well suited to semi-supervised or weakly supervised learning methods. WEAKLOAN outperforms the compared baselines, including the semi-supervised methods, because it involves heterogeneous features and models the interaction of these features in the prediction.

Among the supervised methods, MLP is the strongest baseline, but it is still inferior to WEAKLOAN. MLP can be treated as a variant of WEAKLOAN without the metric learning module. This result indicates that the deep metric learning module in WEAKLOAN can not only learn a soft decision boundary for default loans but also capture the hidden relationships between labels, which is useful for generating more accurate predictions.

One interesting fact observed in the second sub-figure of Fig. 5 is that the recall of WEAKLOAN is lower than that of NB and MLP. This is because these two methods tend to classify loans as default. As shown, their precision is relatively low, which means that many of the loans they flag are not actual defaults.

Fig. 5

Results of performance comparison.

Overall, WEAKLOAN performs best on three evaluation metrics, namely precision, accuracy and F1-score, compared with the baseline models. Although it is inferior to NB and MLP on recall, we can still claim that WEAKLOAN is able to extract enough useful information from the heterogeneous features of the limited default loans available for training, and to produce numerical scores that predict default probabilities.

4.4 Parameter studies

To answer the third EQ, we compare the performance of WEAKLOAN under different parameter settings. Specifically, we vary the values of φ and R and record the corresponding results of WEAKLOAN. Due to space limitations, we only report the results in terms of F1-score, which represents the performance of WEAKLOAN comprehensively.

4.4.1 Influence of φ

The results for parameter φ are illustrated in Fig. 6. From these results, we can observe that as φ increases, the performance of WEAKLOAN first increases and then decreases sharply. The reason is that a high threshold leads to a large number of false default loan alerts, and the F1-score is dominated by these loans. This phenomenon is helpful in selecting the threshold φ on different datasets.

Fig. 6

The influence of φ.

4.4.2 Influence of R

The results for parameter R are illustrated in Fig. 7. From the results, we can observe that as R increases, the F1-score of WEAKLOAN first increases, suggesting that sufficient reference loans can improve the performance of default loan prediction. After this initial increase, the performance becomes stable and then decreases slightly, a pattern that is useful in practice for selecting R.

Fig. 7

The influence of R.

5 Conclusion

In this paper, we investigate how to address three challenges in the loan default prediction task, namely insufficient default samples, hard decision boundaries and numerous heterogeneous features. Along this line, we formulate the task as a weakly supervised loan default prediction problem and propose WEAKLOAN, a framework based on deep metric learning, to solve it. Specifically, WEAKLOAN consists of three modules: the feature encoding module models loan features of multiple types, the metric learning module captures the hidden relationships in loan pairs, and the score calculating module outputs the probability that a loan will default. The experimental results show that WEAKLOAN is superior to the compared baselines in both efficiency and effectiveness. For future research, we plan to explore the influence of different sampling policies for reference loans.

References

1. Bellet, Habrard and Sebban, A survey on metric learning for feature vectors and structured data, arXiv preprint arXiv:1306.6709 (2013).

2. Byanjankar, Heikkilä and Mezei, Predicting credit risk in peer-to-peer lending: a neural network approach, Proceedings of the IEEE Symposium Series on Computational Intelligence (2015), 719–725.

3. Coser, Maer-matei M.M. and Albu, Predictive models for loan default risk assessment, Economic Computation & Economic Cybernetics Studies & Research 53(2) (2019), 149–165.

4. Khandani A.E., Kim A.J. and Lo A.W., Consumer credit-risk models via machine-learning algorithms, Journal of Banking & Finance 34 (2010), 2767–2787.

5. Kim and Cho S.B., An ensemble semi-supervised learning method for predicting defaults in social lending, Engineering Applications of Artificial Intelligence 81 (2019), 193–199.

6. Perez-Martin, Perez-Torregrosa and Vaca, Big data techniques to measure credit banking risk in home equity loans, Journal of Business Research 89 (2018), 448–454.

7. Kulis, Metric learning: A survey, Foundations and Trends in Machine Learning 5 (2013), 287–364.

8. Bravo, Thomas L.C. and Weber, Improving credit scoring by differentiating defaulter behaviour, Journal of the Operational Research Society 66 (2015), 771–781.

9. Huang, Chen and Wang, Credit scoring with a data mining approach based on support vector machines, Expert Systems with Applications 33 (2007), 847–856.

10. Huang, Li, Loy C.C. and Tang X., Learning deep representation for imbalanced classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), 5375–5384.

11. Buniatyan, Popovych, Ih, Macrina, Zung and Seung H.S., Weakly supervised deep metric learning for template matching, Advances in Computer Vision 943 (2020), 39–58.

12. Kingma D.P. and Ba, Adam: A method for stochastic optimization, Proceedings of the International Conference on Learning Representations (2015).

13. Babu D.P.S. and Vijayalakshmi, Enhancement of ecommerce security through asymmetric key algorithm, Computer Communications 153 (2020), 125–134.

14. Hoffer and Ailon, Deep metric learning using triplet network, Proceedings of the International Workshop on Similarity-Based Pattern Recognition (2015), 84–92.

15. Pang, Hengel A.V.D. and Shen, Weakly supervised deep anomaly detection with pairwise relation learning, arXiv preprint (2019a).

16. Pang, Shen, Jin and Hengel A.V.D., Deep weakly-supervised anomaly detection, arXiv preprint (2019b).

17. Song H.O., Jegelka, Rathod and Murphy, Deep metric learning via facility location, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), 5382–5390.

18. Livieris I.E., Kiriakidou, Kanavos, Tampakas and Pintelas, On ensemble ssl algorithms for credit scoring problem, Informatics 5(40) (2018).

19. Sirignano J.A. and Giesecke, Risk analysis for large pools of loans, Management Science 65 (2019), 107–121.

20. Devlin, Chang M.W., Lee and Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint (2018).

21. Pennington, Socher and Manning C.D., Glove: Global vectors for word representation, Proceedings of the Conference on Empirical Methods in Natural Language Processing (2014), 1532–1543.

22. Xiao, Zhou, Zhong, Xie, Gu and Liu, Cost-sensitive semi-supervised selective ensemble model for customer credit scoring, Knowledge-Based Systems 189 (2020), 105–118.

23. Kennedy, Namee B.M. and Delany S.J., Using semi-supervised classifiers for credit scoring, Journal of the Operational Research Society 64 (2013), 513–529.

24. Yang and Jin, Distance metric learning: A comprehensive survey, Ph.D. Dissertation, Michigan State University (2006).

25. Amanullah M.A., Habeeb R.A.A., Nasaruddin F.H., Gani, Ahmed, Nainar A.S.M., Akim N.M. and Imran, Deep learning and big data technologies for iot security, Computer Communications 151 (2020), 495–517.

26. Odeh, Koduru, Featherstone A.M., Das and Welch S.M., A multi-objective approach for the prediction of loan defaults, Expert Systems with Applications 38 (2011), 8850–8857.

27. Moffatt P.G., Hurdle models of loan default, Journal of the Operational Research Society 56 (2005), 1063–1071.

28. Odegua, Predicting bank loan default with extreme gradient boosting, Papers (2020).

29. Maldonado and Paredes, A semi-supervised approach for reject inference in credit scoring using svms, Proceedings of the Industrial Conference on Data Mining (2010), 558–571.

30. Fitzpatrick and Mues, An empirical comparison of classification algorithms for mortgage default prediction: evidence from a distressed mortgage market, European Journal of Operational Research 249 (2016), 427–439.

31. Mikolov, Kombrink, Burget, Černocký and Khudanpur, Extensions of recurrent neural network language model, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (2011), 5528–5531.

32. Sun, Chen, Ye J.X., Zhang, Xu C.Z., Zhang, Wang, Wu, Zhang and Qu, Semi-supervised anti-fraud models for cash pre-loan in internet consumer finance, Proceedings of the IEEE International Conference on Industrial Cyber Physical Systems (2019), 635–640.

33. […], Long, Sun, Yang and Li, Overdue prediction of bank loans based on LSTM-SVM, Proceedings of the IEEE Smart-World (2018), 1859–1863.

34. Liu, Zhang and Yang, Fraud risk measurement of basic medical insurance for urban and rural residents in china, Economic Computation and Economic Cybernetics Studies and Research / Academy of Economic Studies 53 (2019), 277–296.

35. Bengio, Courville and Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 1798–1828.

36. Chen, Zhang and Ng W.W.Y., Loan default prediction using diversified sensitivity undersampling, Proceedings of the International Conference on Machine Learning and Cybernetics (2018), 240–245.

37. LeCun, Bengio and Hinton, Deep learning, Nature 521 (2015), 436–444.

38. Li, Tian, Li, Zhou and Yang, Reject inference in credit scoring using semi-supervised support vector machines, Expert Systems with Applications 74 (2017), 105–114.