The last step before a risk control model goes into use: where 80% of practitioners stumble

2022-07-05 11:06:00 Tomato risk control

The scorecard model is a very common way of expressing a model in credit risk control systems. A score quantifies a user's overall risk level intuitively, and in day-to-day business it gives risk control decisions a readily understandable interpretation.
Scorecard models are divided into application types according to the business scenario, for example the pre-loan application scorecard (A card), the in-loan behavior scorecard (B card), the post-loan collection scorecard (C card), and the pre-loan anti-fraud scorecard (F card). These common types differ in their modeling data, their target definitions, and the way they are applied and deployed, but across the modeling process as a whole (data preprocessing, feature engineering, model training and so on) the steps at each stage are similar. Moreover, the final scores of the A, B, C and F cards are all converted from the predicted probability of the trained model, and the principle of this conversion is the same.
For the scorecard models common in credit risk control, understanding the principle, mechanism and application characteristics of score conversion is basic and important knowledge for anyone in a risk modeling role. Whether in an interview for a modeling position or in day-to-day model development, the scorecard model is an unavoidable topic. It is therefore necessary to master the score conversion principle and to understand the risk control scenarios in which scores are applied. Score conversion is also the last step of the modeling work, and many practitioners stumble here. Against this background, this article focuses on the two forms of score conversion used by scorecard models: the standard scale scorecard and the simple probability scorecard.

1、 Standard scale scorecard
A scorecard score is, in essence, converted from a probability. Specifically, a scoring scale is set so that the score is a linear function of the logarithm of the bad-to-good ratio (odds), as in the formula below, where odds = p/(1-p). Taking the A card as an example, p is the customer's predicted probability of default and 1-p is the predicted probability that the customer performs normally.
Score = A - B*ln(odds)
In this formula, A and B are constants. Intuitively, the higher a user's default probability p, the lower the final score, so high and low scores can be used in business to quantify and assess a user's risk level. Converting the model probability to a score therefore matters in practice for two reasons. First, it expands the original 0~1 probability interval into a score range, and integer scores are easier for business staff to understand than decimal probabilities. Second, the size of the score range can be chosen to suit the actual scenario, which makes score-based risk control strategies more intuitive and concise.
In a real business scenario, obtaining the final Score from Score = A - B*ln(odds) requires the parameters A and B, but these two parameters have no direct business meaning. So we usually specify other parameters instead (the benchmark bad-to-good ratio odds, the benchmark score base_score, and pdo, the number of points by which the score falls when odds doubles) and derive A and B from them. The two conditions are:
Score = A - B*ln(odds)
Score - pdo = A - B*ln(2*odds)
Subtracting the second equation from the first gives pdo = B*ln(2), and A then follows from the first equation:
B = pdo/ln(2)
A = Score + B*ln(odds)
Now, combining scenario requirements with business experience (there is no absolute standard), suppose base_score is 300 at odds of 1:10, and pdo is set to 30 (that is, when odds doubles to 2:10, the score falls from 300 to 270). Substituting these values into the formulas above gives the specific values of A and B:
B = 30/ln(2) = 43.28
A = 300 + 43.28*ln(1/10) = 200.34
This is the process of setting the scorecard scale. The code implementation is shown in Figure 1:
Figure 1 Scorecard scale setting
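
Figure 1 is an image in the original post, so its code is not reproduced here. As a minimal sketch of the same calculation, the snippet below derives A and B from base_score, the benchmark odds and pdo. The function name `scorecard_scale` and its parameter names are hypothetical, chosen only for illustration.

```python
import math


def scorecard_scale(base_score, base_odds, pdo):
    """Return (A, B) for Score = A - B * ln(odds)."""
    # Doubling odds lowers the score by pdo, so B = pdo / ln(2).
    B = pdo / math.log(2)
    # The score equals base_score at base_odds, so A = base_score + B * ln(base_odds).
    A = base_score + B * math.log(base_odds)
    return A, B


# Values from the worked example: base_score = 300 at odds 1:10, pdo = 30.
A, B = scorecard_scale(base_score=300, base_odds=1 / 10, pdo=30)
print(round(A, 2), round(B, 2))  # 200.34 43.28
```

Keeping the three business parameters as inputs means the scale can be changed (for example to a different base score) without touching the formula itself.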

Once A and B are known, the final score of each sample user follows directly from the scorecard formula Score = A - B*ln(p/(1-p)), where p is the response probability (for the A card model, the predicted probability of default). The sample data in Figure 2 illustrates the scoring results, with score rounded to an integer.
Figure 2 Standard scale scores
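
The data behind Figure 2 is not available in text form, so the following is only a sketch of the conversion it illustrates. It uses A = 200.34 and B = 43.28 from the worked example; the function name `prob_to_score` is hypothetical, and every probability in the loop except p = 0.0909 (sample N003) is made up for illustration.

```python
import math

# Scale parameters from the worked example (base_score=300, odds=1:10, pdo=30).
A, B = 200.34, 43.28


def prob_to_score(p, a=A, b=B):
    """Map a predicted default probability p (0 < p < 1) to an integer score."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    odds = p / (1 - p)
    return round(a - b * math.log(odds))


# Hypothetical probabilities; only p = 0.0909 (sample N003) appears in the article.
for p in (0.02, 0.0909, 0.1667, 0.50):
    print(p, prob_to_score(p))
```

The guard on p matters because ln(odds) is undefined at p = 0 and p = 1; in practice model outputs this extreme would need to be clipped before conversion.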

These results show directly that the higher the default probability predicted by the model, the lower the corresponding score. For the sample with id=N003, the model score is 300, exactly the base score set earlier; its odds = p/(1-p) = 0.0909/(1-0.0909) ≈ 1:10, which is the preset benchmark odds. This verifies the scoring scale set above.
This conversion logic is the core idea of the scorecard. What we see on the surface in risk control is the model score; what it represents underneath is the customer's probability of default; and the link between the two is the sample's bad-to-good ratio, odds.

2、 Simple probability scorecard
The standard scale scorecard described above is the conversion most commonly used in practice. There is also a simpler mapping: linearly rescale the predicted probability p directly onto the desired score range, which also has practical business meaning. We again use the data sample from Figure 2. Suppose the score range of the model is 300~600, so that predicted default probabilities p=0 and p=1 map to scores of 600 and 300 respectively. Under this linear relationship, each increase of 0.0001 in p lowers the score by 0.03 points, and vice versa. Applying this transformation to the raw data of Figure 2 gives the scores shown in Figure 3, with score rounded to an integer.
Figure 3 Simple probability scores
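
The linear mapping can be sketched in a few lines. The function name `linear_score` and its `low`/`high` parameters are hypothetical; the 300~600 range is the one assumed in the text.

```python
def linear_score(p, low=300, high=600):
    """Map p in [0, 1] linearly onto [low, high]: p = 0 -> high, p = 1 -> low."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return round(high - (high - low) * p)


print(linear_score(0.0))     # 600
print(linear_score(1.0))     # 300
print(linear_score(0.0909))  # 573
```

Unlike the standard scale, this mapping is defined at p = 0 and p = 1, but a fixed change in score no longer corresponds to a fixed multiple of odds.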

Both conversion methods have risk control value and analytical meaning in practice. Comparing the two, the simple probability score is simpler and easier to understand, but the standard scale score is easier to interpret in business terms and easier to adjust to the business situation. In practice, therefore, the standard scale method is preferred for building a scoring model.

3、 Scorecard application scenarios
Having covered the score conversion methods, we can see the risk control value of model scores in actual business. For this reason the scorecard model is applied at many stages of credit, including customer acquisition, pre-loan, in-loan and post-loan, which is important for the fine-grained management of credit risk. The main application scenarios of the common A, B, C and F cards are as follows:
(1) Pre-loan application scoring model (A card): default risk prediction, credit limit setting
(2) In-loan behavior scoring model (B card): risk monitoring and early warning, product limit adjustment
(3) Post-loan collection scoring model (C card): repayment capacity prediction, collection strategy formulation
(4) Pre-loan anti-fraud scoring model (F card): application fraud prediction
Taking the pre-loan risk control scenario as an example, Figure 4 shows the key indicators by score band for an A card model.
Figure 4 Model scoring indicators

As Figure 4 shows, as the user's application score rises, the default bad debt rate (badrate) of the corresponding band gradually falls; the more monotonic this trend, the stronger the model's discriminating power. Figure 5 plots badrate against score for the Figure 4 sample data, which shows the business interpretation of the model more vividly. In a real business scenario, the model strategy is built around a chosen decision score threshold, for example rejecting an application when the model score <= 440. The choice of threshold has to be made in light of the overall bad debt rate of the sample and the actual business needs.
Figure 5 Model score trend
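
To make the threshold discussion concrete, the sketch below evaluates a candidate cutoff by comparing the bad debt rate of approved applicants with the overall bad debt rate. The function name `evaluate_cutoff` and the ten-row sample are hypothetical and are not the data behind Figures 4 and 5.

```python
def evaluate_cutoff(scores, bad_flags, cutoff):
    """Reject applicants with score <= cutoff and summarise the effect.

    scores    : model scores, one per applicant
    bad_flags : 0/1 labels in the same order (1 = defaulted)
    Returns (rejection_rate, approved_bad_rate, overall_bad_rate).
    """
    if len(scores) != len(bad_flags) or not scores:
        raise ValueError("scores and bad_flags must be non-empty and equal in length")
    approved = [bad for s, bad in zip(scores, bad_flags) if s > cutoff]
    rejection_rate = 1 - len(approved) / len(scores)
    approved_bad_rate = sum(approved) / len(approved) if approved else 0.0
    overall_bad_rate = sum(bad_flags) / len(scores)
    return rejection_rate, approved_bad_rate, overall_bad_rate


# Hypothetical toy sample, for illustration only.
scores = [410, 430, 440, 455, 470, 490, 510, 530, 560, 590]
bad_flags = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(evaluate_cutoff(scores, bad_flags, cutoff=440))
```

Running the same function over several candidate cutoffs shows the trade-off the text describes: a higher cutoff lowers the bad debt rate among approved applicants but rejects more of them.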

4、 Model scoring presentation
When developing a scorecard model, whether in training or in testing, we inevitably analyze the distribution of model scores (as in Figure 4 above). We usually divide the sample scores into several intervals and then examine the sample count, share of samples, and bad debt performance of each interval to assess how the scorecard performs in application. Note that score intervals are generally divided by equal width or by equal frequency. Equal width keeps the score span of every interval the same, but the sample counts may differ; equal frequency keeps the sample count of every interval the same, but the score spans may differ. In practice we usually use equal-width intervals, and the advantages are outlined below.
Figures 6 and 7 show the score distribution of the scorecard model under equal-width and equal-frequency binning respectively, each with 10 score bins.
Figure 6 Equal-width bins of model score

Figure 7 Equal-frequency bins of model score
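
The difference between the two binning schemes can be sketched with plain Python. The helper names (`equal_width_edges`, `equal_frequency_edges`, `bin_counts`) and the twelve scores are hypothetical; 3 bins are used instead of 10 only to keep the toy example short.

```python
def equal_width_edges(scores, n_bins):
    """Bin edges with equal score spacing between the min and max score."""
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def equal_frequency_edges(scores, n_bins):
    """Bin edges chosen so each bin holds roughly the same number of samples.

    Heavily tied scores can produce repeated edges and therefore empty bins.
    """
    ordered = sorted(scores)
    n = len(ordered)
    edges = [ordered[(i * n) // n_bins] for i in range(n_bins)]
    edges.append(ordered[-1])
    return edges


def bin_counts(scores, edges):
    """Count samples per bin; bins are [left, right), the last one also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for s in scores:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= s < edges[i + 1] or (is_last and s == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical scores clustered in the middle of the range.
scores = [300, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 600]
print(bin_counts(scores, equal_width_edges(scores, 3)))      # [2, 9, 1]
print(bin_counts(scores, equal_frequency_edges(scores, 3)))  # [4, 4, 4]
```

The equal-width counts reveal where scores concentrate, while the equal-frequency counts are flat by construction, so they carry no information about the shape of the score distribution.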

The results show the same response trend under equal-width and equal-frequency binning: the sample bad debt rate badrate falls gradually as score rises, which reflects the model's good discriminating power. However, when we go on to examine the share of samples in each score band, the equal-frequency results in Figure 7 clearly cannot support a meaningful analysis, because every band holds about the same share of samples by construction. In general, if the scorecard model is well trained, the distribution of final scores tends to resemble a normal distribution, with the lowest sample shares in the low and high score bands at the two ends. This is very useful when choosing the model's decision threshold or assigning risk grades to sample users. Figure 8 shows the sample share of each score interval under equal-width binning, from which the concentration and dispersion of the model scores can be read directly; the equal-frequency data cannot provide this.
Figure 8 Sample share by equal-width score interval

Therefore, when analyzing the distribution of model scores across a sample, it is more appropriate to present the data in equal-width score intervals. This both describes the relationship between model score and bad debt response effectively and allows a reasonable look at how sample share varies with score. It is also convenient for setting the model's decision threshold, assigning customer risk grades, and presenting model reports.
The above is an introduction to the score conversion logic of credit risk control scorecard models. The key point is the mapping between model probability and score, which anyone doing modeling work must be familiar with. In a real business scenario, for a classification problem, whether we use traditional logistic regression or tree-based algorithms such as XGBoost and LightGBM, once the model outputs a probability we can combine business characteristics with practical experience, set the scorecard scale, and convert the probability into a model score, which makes it easier to formulate the related model strategies. In addition, adopting model practices suited to the scorecard's scenario supports the fine-grained management of credit risk and realizes the value of the risk control model.

~ Original article

Original site

Copyright notice
This article was created by [Tomato risk control]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/186/202207051015401780.html