Abstract
Research on the risk pricing of Internet finance online loans enriches the theory and methods of online loan pricing and helps improve the practice of online loan risk pricing. To improve the efficiency of Internet financial supervision, this article builds an Internet financial supervision system based on machine learning algorithms and improved neural network algorithms. On the basis of factor analysis and discretization of loan data, this paper selects the relatively mature Logistic regression model to evaluate the credit risk of the borrower, and considers the comprehensive management of credit risk and its matching with income. In addition, following the provisions of the New Basel Agreement on expected losses and economic capital, this article combines the credit risk assessment results with factors obtained from a regional study and conducts an empirical analysis. The results show that the model constructed in this paper has a certain degree of reliability.
Introduction
The fragility of the financial system, together with the regulation-evading innovation that financial institutions undertake in pursuit of greater profits, has increased the risks of the financial system to a certain extent. When enough risk accumulates, a crisis draws closer. After the crisis, the US government issued a series of strict supervision bills. This strict supervisory stance generally suppressed the innovation activities of financial institutions in financial markets and significantly reduced the level of development of various innovative products.
Financial supervision and financial innovation both reinforce and restrict each other. In a sense, market failure is the cause of financial regulation, and the implementation of regulation prompts the market to reach a new equilibrium. Financial innovation, however, can break that equilibrium. When accumulated innovation produces a qualitative change, it breaks through the existing regulatory system and disturbs the orderly development of the financial system, which makes a financial crisis more likely. Governments therefore regard building a harmonious and effective financial regulatory environment as an urgent priority. Objectively speaking, financial regulators need to adjust appropriately to financial innovations that change their operating conditions. While pursuing profits and evading supervision, financial institutions amplify the systemic risk of the financial system and increase the difficulty of supervision. From this point of view, the financial innovation of financial institutions has two opposing effects on the financial market: it promotes the development of the financial system to a certain extent, but it also increases the risk of the financial system. The history of financial crises is accompanied by the history of financial regulatory reform and of financial innovation. The regulatory dialectic describes a cycle that runs from supervision by regulators to innovation by financial institutions, and concludes that deregulation in turn leads to further innovation by financial institutions. This account explains how the two promote and contain each other.
In addition, this dynamic game has led to the continuous deepening and reform of the financial supervision system, and financial innovation and financial supervision have formed an evolutionary game relationship through mutual promotion and mutual restraint. In the current process of global financial crisis governance, it is urgent to rebuild the coordination between regulation and innovation and to prevent new financial crises by seeking an evolutionary game equilibrium between the two. As countries around the world pursue financial regulatory reform, exploring a financial regulatory path suited to China’s conditions is a long and arduous task. At present, China’s regulatory model is separate regulation. This model borrows from the pre-crisis US regulatory model and also matches China’s current regime of separate operations. As the international economic situation develops, China’s financial market is becoming more open and financial innovations keep emerging. However, under the separate supervision model, comprehensive financial innovation business leads to overlapping supervision, and it also leaves more areas unsupervised.
Related work
The literature [1] argued that the supervision of financial institutions by regulatory authorities is influenced by interest groups, and that this influence eventually leads to deviations from the regulatory objectives. The literature [2] pointed out that market failures, especially financial market failures, should be resolved through government regulation. The literature [3] pointed out that while banks pursue profit maximization, systemic risk keeps increasing, which creates inherent instability, and put forward the “financial instability hypothesis”. The literature [4] argued that a supervision mechanism must be introduced to solve the problem of market contract failure. Subsequently, domestic and foreign scholars conducted in-depth research that built on existing financial supervision theory. From the perspective of China’s financial supervision model, the literature [5] pointed out that the root cause of weak supervision lies in the current supervision model. By comparing foreign supervision models, the literature [6] pointed out that China should establish risk-centered financial supervision. The literature [6] also put forward a guiding direction for the reform of China’s financial supervision system on the basis of functional financial supervision. The literature [7] studied several problems in US financial regulatory agencies and, by comparing Sino-US financial regulatory structures, found that the conditions for integrating financial regulatory structures are still immature. The literature [8] analyzed China’s financial regulatory system and pointed out that, in the post-crisis era, macro-prudential regulatory standards must be adhered to in order to establish a coordinated and stable financial regulatory system.
The literature [9] analyzed the problems of financial supervision after the outbreak of the financial crisis and showed that financial supervision needs to be reformed to cope with the crisis. The literature [10] pointed out that banks carry systemic risks and vulnerabilities, and that an effective regulatory system is necessary for the healthy and orderly development of financial markets. The literature [11] tested the effect of improvements in financial market efficiency on the reform of financial supervision. The literature [12] systematically analyzed the problems in China’s financial supervision system and gave relevant suggestions. The literature [13] analyzed the changes in the US credit market from the perspective of financial supervision.
The literature [14] pointed out that, as the economy develops, people’s requirements for the quality of financial products and services become higher and higher. Moreover, to keep winning market share, financial institutions need to adopt financial innovation strategies that meet the latest needs of the financial market and further their pursuit of higher profits. As market competition intensifies, companies that want a net increase in profits can keep reducing their costs by improving their own technology, and this technical improvement is the result of profit-driven financial innovation. The literature [15] explained financial innovation in terms of expected corporate profits and market competition. The literature [16] put forward the theory of circumvention innovation and argued that commercial banks innovate in order to evade existing financial supervision. Building on Kane’s theory of circumvention, the literature [17] further demonstrated the connection between financial innovation and regulation. The study found that strict supervision of investment banks by regulatory authorities further encouraged banks to develop more innovative products.
The literature [18] briefly discussed the connotation and causes of financial innovation, which gradually brought the concept of financial innovation into view. The literature [19] compared financial innovation in China and the West and pointed out that Western financial innovation theory is not fully applicable to China. On the empirical side, it applied the multi-objective optimization method for the first time to build a model of China’s financial innovation, and used empirical evidence to explain the reasons for many problems in China’s financial innovation. Based on the theory of financial innovation, the literature [20] systematically elaborated the development of financial innovation theory and its impact on the evolution of China’s regulatory system, and studied financial innovation with game theory. Moreover, it conducted a dynamic game analysis of the entire process of financial innovation, and analyzed in detail the game relationship between the innovation entities and the resulting optimal equilibrium. Finally, some phenomena in China’s financial innovation were discussed theoretically on the basis of this game relationship. The literature [21] argued that the financial crisis was partly due to speculation in complex financial derivatives, and pointed out that financial innovation and overconfidence in new financial innovation products were the main reasons for the outbreak of the financial crisis. The article [23] implemented an IoT-based smart city by exploiting IoT and big data analytics with the Hadoop ecosystem in real-time environments. The article [24] reflects on IoT and its main role in the development of human behaviors and actions. The paper also deals with the compilation of data from different databases connected to the Internet.
The literature [25] addresses numerous issues in vehicular communication by proposing a mutual, unified and distributed spectrum sensing model. Introducing a mutual cognitive paradigm reduces conflicts and several unknown problems. The literature [26] discusses issues such as the large volume of big data and introduces the SmartBuddy framework for creating smart and adaptive ecosystems from human behaviors and human dynamics. The article [27] discusses the development of a directed acyclic graph for motion estimation in video coding algorithms on parallel reconfigurable computing frameworks [28]. The partitioning algorithm also plays a key part in optimizing the encoding of images [29].
Research design
The basic idea of factor analysis is to group variables according to the strength of their correlations, so that variables in the same group are more strongly correlated with each other than with variables in other groups. The basic structure represented by each group of variables is called a common factor. The core of the method is to capture most of the information in the original variables with a small number of factors that are free of multicollinearity. This article assumes that there are p variables, each standardized to a mean of 0 and a standard deviation of 1, and uses X to denote both the original observed variables and the standardized variables. Moreover, this article assumes that the original common factors are y_1, y_2, …, y_m, that the standardized common factors are F_1, F_2, …, F_m (m < p), and that the following conditions hold:
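(1) The observable random vector is [21]:
$$X = (X_1, X_2, \ldots, X_p)^{\top}$$
The mean vector is:
$$E(X) = 0$$
The covariance matrix is equal to the correlation matrix R, that is, cov(X) = Σ = R.
(2) The unobservable common factor vector is:
$$F = (F_1, F_2, \ldots, F_m)^{\top}, \quad E(F) = 0, \quad \mathrm{cov}(F) = I_m$$
The specific factor vector ε = (ε_1, ε_2, …, ε_p)^T is independent of F, E(ε) = 0, and cov(ε) is diagonal, that is:
$$\mathrm{cov}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$$
It can be seen that the components of ε are also mutually uncorrelated.
In summary, the factor model is as follows:
$$X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \quad i = 1, 2, \ldots, p$$
Its matrix form is:
$$X = AF + \varepsilon$$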
In this mathematical model, F is the common factor vector, A is the factor load matrix, and a_{ij} is called a factor load, that is, the load of the i-th original variable on the j-th factor.
Establishing a factor analysis model is not only to find common factors and group variables, but also to know the actual meaning of each common factor, so as to make a scientific analysis of actual problems.
The common factors reflect the correlation among the original variables, so representing the original variables by the common factors describes the characteristics of the research object. Therefore, we express each common factor as a linear combination of the variables, that is:
$$F_j = \beta_{j1}X_1 + \beta_{j2}X_2 + \cdots + \beta_{jp}X_p, \quad j = 1, 2, \ldots, m$$
The above formula, called the factor score function, is used to calculate the common factor scores, which then support a more in-depth study of the variables. Since the number of equations m is less than the number of variables p, the factor scores cannot be calculated exactly and can only be estimated. Weighted least squares and regression are common methods of estimating factor scores.
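As an illustration of one such estimator, the regression (Thomson) method can be written compactly. The source does not say which estimator it uses, so the choice here is an assumption; A and R are the factor load matrix and correlation matrix defined above, and X is the standardized observation vector.

```latex
\hat{F} = A^{\top} R^{-1} X
```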
Discrete variables are usually handled with dummy variables. However, because this paper has many discrete variables, encoding all of them as dummy variables would complicate the analysis and reduce the accuracy of the results.
The dependent variable of the Logistic regression indicates whether the borrower defaults, so borrowers are divided into default and non-default. Non-default borrowers are marked as 0, and default borrowers are marked as 1.
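WOE (Weight of Evidence) represents the effect of the different values of an independent variable on the default of the borrower, and its calculation formula is shown below [22].
$$\mathrm{woe}_i = \ln\left(\frac{g_i / g}{b_i / b}\right) = \ln\left(\frac{g_i}{b_i}\right) - \ln\left(\frac{g}{b}\right)$$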
Among them, woe_i is the WOE value of the i-th attribute of a variable, g_i is the number of non-default borrowers with the i-th attribute, and b_i is the number of default borrowers with the i-th attribute. Meanwhile, g is the total number of non-default borrowers in the sample, and b is the total number of default borrowers in the sample. The variables are discretized, and the WOE value of each group is calculated. Rearranging the formula shows that WOE compares the ratio of non-default to default borrowers within a group with the same ratio in the whole sample, so it reflects the influence of the value of the independent variable on the borrower’s default. Therefore, the original value can be replaced directly by the WOE value in subsequent calculations. The higher the WOE value, the higher the probability that a borrower in this group is a non-default borrower.
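The WOE calculation can be sketched in a few lines. This is a minimal illustration, assuming the usual form woe_i = ln((g_i/g)/(b_i/b)), which agrees with the statement that a higher WOE means more non-default borrowers. The function name and the bin counts are hypothetical and are not taken from the paper's data.

```python
import math

def woe_table(good_counts, bad_counts):
    """Return the WOE of each bin, ln((g_i / g) / (b_i / b)).

    good_counts[i] is the number of non-default borrowers in bin i,
    bad_counts[i] the number of default borrowers. Every bin must
    contain at least one borrower of each kind, otherwise the
    logarithm is undefined.
    """
    g = sum(good_counts)
    b = sum(bad_counts)
    return [math.log((gi / g) / (bi / b))
            for gi, bi in zip(good_counts, bad_counts)]

# Hypothetical bins of one discretized variable (illustrative counts only).
good = [400, 300, 100]  # non-default borrowers per bin
bad = [20, 30, 50]      # default borrowers per bin
woe = woe_table(good, bad)
```

Bins with a positive WOE hold relatively more non-default borrowers than the sample as a whole, and bins with a negative WOE hold relatively more default borrowers.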
After the WOE value of each variable is calculated, the information value (IV) of the variable can be calculated. The IV value measures the ability of a variable to distinguish borrowers with good credit from those with bad credit, and it can be used to screen variables. The formula for calculating the IV value is:
$$IV = \sum_{i} \left(\frac{g_i}{g} - \frac{b_i}{b}\right) \mathrm{woe}_i$$
The greater the IV value, the greater the difference between the distributions of good-credit and bad-credit borrowers over this variable, that is, the better the variable discriminates. If the IV value is less than 0.02, the variable is considered to have no predictive ability for the explained variable and should be eliminated. If the IV value is greater than 0.5, the variable is considered likely to cause overfitting and should also be eliminated.
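The screening rule can be illustrated with a short sketch. It assumes the common definition IV = Σ (g_i/g − b_i/b)·woe_i and applies the 0.02 and 0.5 cut-offs stated above. The function names and sample counts are hypothetical.

```python
import math

def information_value(good_counts, bad_counts):
    """IV = sum over bins of (g_i/g - b_i/b) * ln((g_i/g) / (b_i/b))."""
    g, b = sum(good_counts), sum(bad_counts)
    iv = 0.0
    for gi, bi in zip(good_counts, bad_counts):
        share_good, share_bad = gi / g, bi / b
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

def keep_variable(iv, low=0.02, high=0.5):
    """Keep a variable only if its IV lies inside the stated band."""
    return low <= iv <= high

# Illustrative counts only: a mildly predictive variable and a suspiciously strong one.
iv_mild = information_value([60, 40], [50, 50])
iv_strong = information_value([400, 300, 100], [20, 30, 50])
```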
The WOE value of each variable is used in place of the original value for logistic regression analysis. Logistic regression analysis is a risk assessment method with high accuracy and robustness. Based on the borrower’s information (such as personal characteristics, liabilities, credit information and other characteristics), it explains past loan outcomes (repayment or default) and predicts, through the regression equation, the default probability of borrowers with given characteristics. The specific regression model is as follows:
$$\ln(\mathrm{odds}) = \ln\left(\frac{1 - PD}{PD}\right) = \alpha + \sum_{i} \beta_i x_i$$
Among them, the probability of default (PD) refers to the probability that the borrower cannot repay the principal and interest of the loan or perform the contract-related obligations according to the contract. odds is the borrower’s good/bad credit ratio, α is a constant, x_i is the i-th explanatory variable that may affect the borrower’s default risk, and β_i is the parameter to be estimated. To improve the predictive power and interpretability of the model, the explanatory variable x is discretized, and the weight of evidence of each variable is used in place of its original value:
$$\ln(\mathrm{odds}) = \alpha + \sum_{i} \beta_i \, \mathrm{woe}_i$$
If ln(odds) is directly regarded as the final score, then the score contributed by the i-th variable is β_i woe_i. In practice, the score is a linear transformation of ln(odds):
$$\mathrm{Score} = A + B \ln(\mathrm{odds})$$
Among them, parameters A and B are constants determined by the risk appetite of each company. In practical applications, however, these two parameters are difficult to interpret, and it is not convenient to specify their values directly. Generally, three parameters are given in advance in a credit score: b is the benchmark score, o is the good/bad ratio corresponding to the benchmark score, and p is the number of points by which the score increases when the odds double. The conversion between A, B and b, o, p is:
$$B = \frac{p}{\ln 2}, \qquad A = b - B \ln o$$
The final credit score model is obtained:
$$\mathrm{Score} = A + B\left(\alpha + \sum_{i} \beta_i \, \mathrm{woe}_i\right)$$
After the credit score of each borrower is obtained, borrowers can be divided into credit ratings, and borrowers with different credit ratings carry different default risks.
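The conversion from the three preset parameters to a score can be sketched as follows. This assumes the standard scorecard scaling Score = A + B·ln(odds), with odds defined as the good/bad ratio, so that B = p/ln 2 and A = b − B·ln(o). The numeric values (benchmark 600 at odds 50, 20 points to double the odds) and the function names are hypothetical.

```python
import math

def scaling_constants(base_score, base_odds, points_to_double):
    """Solve Score = A + B*ln(odds) from the three preset parameters.

    base_score (b) is the score assigned at base_odds (o), and
    points_to_double (p) is the score gained when the odds double.
    """
    B = points_to_double / math.log(2)
    A = base_score - B * math.log(base_odds)
    return A, B

def credit_score(alpha, betas, woes, A, B):
    """Score of one borrower from the fitted intercept, coefficients and WOE values."""
    log_odds = alpha + sum(beta * woe for beta, woe in zip(betas, woes))
    return A + B * log_odds

# Hypothetical scaling: 600 points at good/bad odds of 50, +20 points per doubling.
A, B = scaling_constants(base_score=600, base_odds=50, points_to_double=20)
```

With these constants, a borrower whose fitted ln(odds) equals ln(50) scores 600, and one at ln(100) scores 620.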
The KS (Kolmogorov-Smirnov) indicator tests the discriminating power of the scorecard by checking whether the credit scores of default and non-default borrowers in the sample differ significantly. After the model predicts the credit score of the whole sample, the KS test divides the sample into two parts, default and non-default borrowers, and then uses the KS statistic to test whether the credit score distributions of the two groups are significantly different. The calculation formula is as follows:
$$KS = \max_{s} \left| F_{\mathrm{bad}}(s) - F_{\mathrm{good}}(s) \right|$$
where F_bad(s) and F_good(s) are the cumulative proportions of default and non-default borrowers with a credit score no higher than s.
The value of KS lies between 0 and 1. The larger the value, the better the model distinguishes non-default borrowers from default borrowers. In general, a KS value greater than 0.2 indicates that the model is acceptable, a value greater than 0.4 indicates good discrimination ability, and a value above 0.5 indicates strong discrimination ability. However, an unusually high KS deserves special attention when the model is checked, because it may also be caused by problems in the model itself.
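A minimal sketch of the KS statistic, assuming it is taken as the maximum gap between the empirical cumulative score distributions of default and non-default borrowers. The function name is hypothetical, and the simple loop favours readability over speed.

```python
def ks_statistic(scores_good, scores_bad):
    """Maximum absolute gap between the two empirical score CDFs."""
    thresholds = sorted(set(scores_good) | set(scores_bad))
    ks = 0.0
    for t in thresholds:
        # Share of each group scoring at or below the threshold t.
        cdf_good = sum(s <= t for s in scores_good) / len(scores_good)
        cdf_bad = sum(s <= t for s in scores_bad) / len(scores_bad)
        ks = max(ks, abs(cdf_bad - cdf_good))
    return ks
```

For example, `ks_statistic([700, 720], [500, 520])` gives 1.0 because the two groups are perfectly separated, while two identical score lists give 0.0.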
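The RAROC (risk-adjusted return on capital) model takes economic capital as its core, fully considers the risks of online loans, and matches the risks of online loans with their returns. The expression is:
$$RAROC = \frac{\text{income} - \text{cost} - EL}{EC}$$
where EL is the expected loss and EC is the economic capital.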
Let LGD (Loss Given Default) be the loss rate given default, that is, the share of the exposure that is not recovered after the borrower defaults, EAD (Exposure at Default) be the risk exposure of the online loan, PD be the default probability, and K be the capital requirement coefficient. The formula for calculating the expected loss EL is:
$$EL = PD \times LGD \times EAD$$
Limited by the risk management level of the online loan platform, the economic capital EC of the online loan is calculated with the formula given by the IRB method:
$$EC = K \times EAD$$
Since Internet finance online loans are large in number and small in individual amount, they can be classified as other retail exposures within retail exposures. With a confidence level of 99.9%, the capital requirement coefficient K can be determined according to the following formula:
$$K = LGD \times N\left[\frac{G(PD)}{\sqrt{1 - R}} + \sqrt{\frac{R}{1 - R}}\, G(0.999)\right] - PD \times LGD$$
where N(·) is the standard normal cumulative distribution function and G(·) is its inverse. The correlation R in the above formula changes with the default probability PD:
$$R = 0.03 \times \frac{1 - e^{-35\,PD}}{1 - e^{-35}} + 0.16 \times \left[1 - \frac{1 - e^{-35\,PD}}{1 - e^{-35}}\right]$$
Then, at a certain RAROC level, the online loan pricing interest rates of different grades of borrowers can be obtained:
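The pricing step can be sketched numerically. The correlation and capital functions below follow the Basel II risk-weight function for other retail exposures. The pricing function assumes RAROC = (rate − cost_rate − EL) / EC per unit of exposure (EAD = 1), which is a simplification introduced here and not necessarily the exact expression used in the paper. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1

def correlation(pd):
    """Basel II asset correlation R for other retail exposures."""
    weight = (1 - math.exp(-35 * pd)) / (1 - math.exp(-35))
    return 0.03 * weight + 0.16 * (1 - weight)

def capital_requirement(pd, lgd):
    """Capital requirement coefficient K at the 99.9% confidence level (0 < pd < 1)."""
    r = correlation(pd)
    z = (_STD_NORMAL.inv_cdf(pd)
         + math.sqrt(r) * _STD_NORMAL.inv_cdf(0.999)) / math.sqrt(1 - r)
    return lgd * _STD_NORMAL.cdf(z) - pd * lgd

def raroc_rate(pd, lgd, target_raroc, cost_rate=0.0):
    """Interest rate per unit of exposure that reaches target_raroc.

    Assumes RAROC = (rate - cost_rate - EL) / EC with EAD = 1,
    so EL = pd * lgd and EC = K.
    """
    expected_loss = pd * lgd
    economic_capital = capital_requirement(pd, lgd)
    return target_raroc * economic_capital + expected_loss + cost_rate
```

Under this sketch, a higher default probability raises both the expected loss and the capital charge over the range of PDs typical for retail loans, so the priced rate rises as the credit grade worsens.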
The Wilcoxon signed rank test, a non-parametric test for two paired samples, is run in SPSS to compare the real interest rate of the online loan platform with the RAROC interest rate. By comparing the results of the two pricing methods, this paper examines the effectiveness of the RAROC model based on the New Basel Agreement. The non-parametric test of two paired samples is generally used to compare the effects of different treatments of the same object and to infer whether the two effects differ significantly. The null hypothesis H0 of the Wilcoxon signed rank test is that the results of two different treatments of the same object are not significantly different.
The Wilcoxon signed rank test of two paired samples first removes the sample pairs with equal observation values, as in the sign test. Then, it subtracts each observation of the first group from the corresponding observation of the second group, and records a positive sign if the difference is positive and a negative sign otherwise. Next, it sorts the absolute values of the differences in ascending order to obtain their ranks. Finally, it calculates the sum of positive ranks W+, the sum of negative ranks W-, and the average ranks of the positive and negative signs. If the two rank sums are roughly equal, the positive and negative changes of the paired data are similar and the gap between the two distributions is small. The Z statistic approximately follows the normal distribution, and its calculation formula is:
$$Z = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
Among them, W = min(W+, W-), and n is the number of sample pairs that remain after the pairs with equal observation values are removed.
SPSS software automatically calculates the Z statistic and the corresponding probability value. When the probability value is less than or equal to the significance level α, the null hypothesis H0 is rejected, that is, the results of the two treatments of the same object are significantly different. Otherwise, H0 cannot be rejected.
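The procedure described above can be sketched directly. This is an illustration of the normal approximation without tie or continuity corrections, so its Z value may differ slightly from the output of a statistical package that applies such corrections. The function name is hypothetical.

```python
import math

def wilcoxon_z(first, second):
    """Return (W+, W-, Z) for two paired samples.

    Pairs with equal observations are dropped, absolute differences are
    ranked in ascending order (tied values share their average rank),
    and Z uses the normal approximation. At least one non-zero
    difference is required.
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (min(w_plus, w_minus) - mean) / sd
```

A Z value close to 0 means the positive and negative rank sums are balanced, which is the pattern expected when the two pricing methods do not differ systematically.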
The null hypothesis H0 of the Wilcoxon signed rank test in this paper is that there is no significant difference between the RAROC risk pricing interest rate and the actual interest rate of the online loan platform. The actual pricing of the online loan platform follows the fixed interest rate model under platform pricing. That is, the platform treats all borrowers whose default probability is within its acceptable range as a whole, and uses the cost-plus pricing model to calculate the overall pricing interest rate. On this basis, experts consider the development needs of the online loan platform and the legal restrictions on private lending, and use the borrowers’ credit scores to divide them into 7 credit ratings. Ultimately, an interest rate is set for the best credit rating, the interest rate for the worst credit rating is set at the 24% ceiling protected by civil law, and the difference between the two is divided by 6 to give the interest rate step between adjacent ratings. The resulting interest rates for each grade are shown in Table 1.
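The equal-step grading rule amounts to simple arithmetic, sketched below. The best-grade rate of 12% is a hypothetical input chosen for illustration, since the platform's actual value is given only in the table, and the function name is also hypothetical.

```python
def graded_rates(best_rate, worst_rate=24.0, n_grades=7):
    """Interest rates (in %) spaced evenly from the best grade to the worst."""
    step = (worst_rate - best_rate) / (n_grades - 1)
    return [best_rate + k * step for k in range(n_grades)]

# Hypothetical best-grade rate of 12%: the step is (24 - 12) / 6 = 2 points.
rates = graded_rates(best_rate=12.0)
```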
Table 1. The real interest rate of the online loan platform
The results of the Wilcoxon signed rank test comparing the RAROC interest rate with the actual interest rate are shown in Table 2. The null hypothesis is that the median difference between the RAROC interest rate and the actual interest rate equals 0. Since P = 0.398 > 0.05, the null hypothesis cannot be rejected. That is, there is no significant difference between the RAROC risk pricing interest rate and the actual interest rate of the online loan platform, and the RAROC pricing model based on the New Basel Agreement is suitable for Internet finance online loans.
Further analysis of the RAROC interest rate in Fig. 1 shows that the RAROC interest rate and the borrower’s default probability follow the same trend: both increase as the borrower’s credit rating falls. Over the first five grades the default probability rises relatively slowly, and the increase in the interest rate is also relatively small. The AA grade has the best score and the lowest default probability and interest rate, while the HR grade has the worst score and the highest default probability and interest rate. The interest rates of these two grades differ by 17.40 percentage points. The interest rate of Grade E is 23.32%, which is close to the 24% ceiling protected by national law. The interest rate of the HR grade is 31.02%, which exceeds the 24% red line and falls in the natural-debt zone. As a result, when the borrower defaults, the investor may be unable to obtain legal protection for the interest above 24%. From another perspective, the risks actually borne by investors exceed the theoretical default risks.

Fig. 1. Average default probability and RAROC interest rate of each grade.
The actual interest rate and the RAROC pricing results were further compared with descriptive statistics, and the results are shown in Table 3 and Fig. 2.
Table 2. Wilcoxon signed rank test
Note: The significance level is 0.05.

Fig. 2. Descriptive statistical analysis diagram.
It can be found from the table that the average RAROC interest rate is 6 percentage points lower than the actual interest rate, but its pricing interval is 7.32 percentage points wider. Moreover, the maximum RAROC interest rate of 31.02% and its standard deviation of 6.26% are also clearly higher than the corresponding values of 24% and 3.6% for the actual interest rate. These results indicate that the RAROC interest rate is more flexible than the actual interest rate. This article holds that, given the immature financial market, the insufficient data accumulated by online loan companies and the state’s cap on online loan interest rates, the current fixed interest rate pricing method, which combines the cost-plus pricing model with reasonable expert estimates, is easy to operate and reasonably accurate. However, it tends to overestimate the interest rate of borrowers with better credit ratings and underestimate the interest rate of borrowers with poor credit ratings. Moreover, as the probability of default increases, its ability to cover risks becomes worse and worse, which increases the churn of borrowers with better credit ratings and does not help reduce the overall default risk of online loans. The RAROC pricing method, in contrast, takes into account both the expected loss and the unexpected loss of online loans. Its comprehensive measurement of default risk matches the pricing interest rate to the default risk more precisely. Therefore, it is superior to the current fixed-rate pricing model of online loan platforms.
This article selects a certain area as an example to carry out research and analysis.
The definition of specific indicators is shown in Table 4.
Table 3. Descriptive statistical analysis table
Table 4. Index system of financial structure
The data in Table 5 are calculated according to the variable definitions above.
Table 5. The original data of each indicator
The descriptive statistical analysis of variables is shown in Table 6 and Fig. 3.
Table 6. Descriptive statistical analysis table of variables

Fig. 3. Correlation coefficient matrix of each variable.
This paper measures the financial structure in terms of the scale of the three major financial industries (banking, securities, and insurance), financial assets, currency, and financial efficiency. Since the financial structure is an organic whole, the variables selected to measure it are inevitably related to one another, which raises the problem of multicollinearity. Therefore, before the empirical analysis, the correlation between the variables must be tested to detect multicollinearity. The specific results are shown in Tables 7 and 8 and Figs. 3 and 4.
Table 7. The correlation coefficient matrix of each variable
Table 8. Correlation coefficient matrix P value of each variable

Fig. 4. P-value statistics of correlation coefficient matrix of each variable.
According to the correlation coefficient matrix in Table 7 and the corresponding P values in Table 8, there is a strong correlation between the eight indicators of bank development scale, securities conversion rate, insurance density, insurance depth, financial correlation rate, monetization rate, financial indirect conversion efficiency, and financial direct conversion efficiency, which complicates the study of the financial structure.
To solve the problem of multicollinearity among some variables, this paper uses factor analysis to reduce the dimensionality of the original explanatory variables. While ensuring that the information is fully extracted, it turns the intricately related variables into a few independent factors. In addition, to address the high risk of Internet finance online loan transactions and the high credit risk of borrowers, this article introduces the regulatory indicators of the New Basel Agreement and uses its estimation method to measure expected losses and economic capital. Financial efficiency is an important indicator of the degree of financial development. It is the efficiency with which the financial system channels funds, and it can be divided into micro-financial efficiency and macro-financial efficiency. Micro-financial efficiency refers to how efficiently households and non-financial enterprises allocate the loans they obtain from banks. Its counterpart, macro-financial efficiency, refers to the ratio of the loans issued by commercial banks in a given period to total economic growth during that period, that is, the efficiency with which finance serves the real economy. Therefore, this article selects indicators to measure financial efficiency at these two levels. Because regional money supply is not directly available, this article uses (sum of regional banking deposits and loans / national financial deposits and loans) × national money supply as a proxy for it.
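Written as a formula, the proxy reads as follows. The symbols are introduced here for illustration only: D and L denote deposits and loans, M denotes money supply, and the subscripts r and n denote the region and the nation.

```latex
M_{r} \approx \frac{D_{r} + L_{r}}{D_{n} + L_{n}} \times M_{n}
```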
The case study shows that there is a strong correlation between the eight indicators of bank development scale, securities conversion rate, insurance density, insurance depth, financial correlation rate, monetization rate, financial indirect conversion efficiency, and financial direct conversion efficiency, which complicates the study of the financial structure.
Acknowledgments
Superiority Discipline of Jiangsu Province’s Universities and Colleges, Third Phase of Applied Economics of Nanjing Audit University, No. 2018 87. Major Tendering Projects of National Social Sciences Fund, A Theoretical Framework of Social Sciences for the Aged, International Experience and Research on China’s Path, 17ZD072. School of Government Audit, Nanjing Audit University, An Audit Research into the Protection of the Financial Consumers’ Rights Based on Internet, GAS161027. Audit Research Institute of Audit Administration and Nanjing Audit University: Joint Tackling of Socialist Financial Audit with Chinese Characteristics in the New Era, Research on Internet Financial Risk Supervision and Audit, 20181031.
