A thorough understanding of scorecard development: determining Y (Vintage analysis, roll rate analysis, etc.)
2022-07-02 03:43:00 【Ali Yiyang】
Scorecards are already used in production at major banks and companies, and many practitioners have written about them in detail. This article looks at building scorecards for payment and for credit, and compares how the dependent variable Y is defined in each industry, so that readers who want to learn about scorecards can understand them more deeply and, by extension, apply them in other industries.
One、 What is a scorecard?
In risk control, a scorecard measures a customer's risk as a score. A familiar example is the Sesame Credit score, which ranges from 350 to 950 points; the higher the score, the better the credit.
350-550 points: poor credit. It is hard to enjoy Alipay's benefits.
550-600 points: medium credit. Users can enjoy some benefits, such as deposit-free hotel stays and access to the Huabei service.
600-650 points: good credit. Users can enjoy more benefits: in addition to those above, some travel services can be used without a deposit.
650-700 points: good credit. Users can get higher credit limits on Huabei and Jiebei.
700-950 points: excellent credit. Users can get added convenience when applying for visas to some countries.
This article focuses on how to determine Y for scorecards in the payment and credit fields. For how scorecards work and how to implement them in Python, see the author's article on scorecard principles and Python implementation.
Two、 How is the dependent variable Y determined in a scorecard?
The definition of Y in scorecard modeling differs from industry to industry, though there are also similarities. This article uses the payment and credit fields as examples to show how the dependent variable Y is determined.
1 Determining the dependent variable Y in the payment field
In the payment field, the dependent variable Y is relatively simple to define. Suppose a company holds onboarding and transaction data for 100 million merchants and wants to build a model that assesses which merchants' transaction patterns carry gambling risk. When defining Y, merchants whose historical closure reason in the system is "gambling" and whose current status is closed can be defined as 1 (bad samples).
Why must the merchant's current status be closed? Because some merchants' transactions merely resemble gambling transactions, and audits produce false positives. If a merchant flagged as gambling files an appeal with supporting materials and proves that it operates normally and within its registered business scope, its account is reopened and its status is set back to normal. When there are enough bad samples, it is sometimes convenient to keep only the merchants flagged as gambling and closed within the last two years as bad samples for modeling.
With bad samples defined, which cases are defined as 0 (good samples)?
There are two options. One defines every merchant whose current status is normal as 0. The other defines as 0 only merchants whose current status is normal and who have never been flagged as gambling in an audit. Normal merchants are usually far too numerous, so a subset of normal samples is drawn for modeling, chosen according to the number of bad samples and the time period.
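The labeling rules above can be sketched in a few lines of Python. This is only an illustration: the field values `"closed"`, `"normal"` and `"gambling"`, the function name `label_merchant`, and the `strict_good` switch between the two good-sample definitions are all hypothetical names, not part of any real system.

```python
from typing import List, Optional


def label_merchant(status: str, closure_reasons: List[str],
                   strict_good: bool = True) -> Optional[int]:
    """Return 1 (bad), 0 (good) or None (excluded from modeling).

    status          -- current merchant status, assumed "closed" or "normal"
    closure_reasons -- every historical closure/audit reason for the merchant
    strict_good     -- True: a good merchant must never have been flagged
                       as gambling; False: any currently normal merchant
    """
    ever_gambling = "gambling" in closure_reasons
    # Bad: flagged as gambling in the past AND still closed today.
    if status == "closed" and ever_gambling:
        return 1
    if status == "normal":
        # Under the stricter definition, a reopened merchant that was once
        # flagged as gambling is neither good nor bad, so it is excluded.
        if strict_good and ever_gambling:
            return None
        return 0
    # Closed for other reasons: not a gambling sample, not a good sample.
    return None
```

A merchant that was flagged, appealed successfully and was reopened is labeled 0 with `strict_good=False` and excluded with `strict_good=True`, which mirrors the two options described above.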
2 Determining the dependent variable Y in the credit field
For fraud models in the credit field, the dependent variable Y can usually be defined by whether the first installment goes overdue (first payment default). As with the payment scorecard, this definition is relatively simple. For credit risk models, however, determining Y is more complex, and it generally combines roll rate analysis with Vintage analysis.
Roll rate analysis determines how severely overdue a customer must be to be defined as bad, and Vintage analysis determines how long a customer's performance period must be before the customer can enter the model. To make the determination of Y clearer, we first define the terms that will be used.
One、 Definitions of terms
For simplicity, take a single borrower as an example. Suppose a person took out a credit loan of 10,000 yuan on an online platform at 10:08 a.m. on April 12, 2021, to be repaid in 12 monthly installments of equal principal and interest. To show the terms more clearly, they are marked in the figure below:
1. Observation point (obs_date): the time the customer applies for the loan (10:08 a.m. on April 12, 2021). Data available up to the loan application time is used to predict the probability that the customer will become overdue in the future.
2. Observation period: the time interval used to generate customer features (the independent variables).
3. Performance period: the time interval used to define whether a customer is good or bad. Strictly speaking, a customer on a 12-installment loan can only be labeled good or bad after all 12 installments have come due. Vintage analysis, however, shows how long after disbursement the customers who are going to go bad have already done so, with the rest largely repaying on time. This lets the performance period be shortened, which increases the number of customers available for modeling.
4. Performance: how long a customer must be observed before being defined as a "good customer" or a "bad customer".
5. Account age MOB (Month on Book): the number of months since the loan was disbursed.
MOB0: from the disbursement date to the end of that month. In the example, April 12, 2021 to April 30, 2021.
MOB1: the second month of the loan. In the example, May 1, 2021 to May 31, 2021.
MOB2: the third month of the loan. In the example, June 1, 2021 to June 30, 2021.
MOB3: the fourth month of the loan. In the example, July 1, 2021 to July 31, 2021.
And so on.
MOB12: the 13th month of the loan. In the example, April 1, 2022 to April 30, 2022.
If the product has 12 installments, the asset's life cycle is 12 periods and MOB runs up to MOB12. If the product has 24 installments, MOB runs up to MOB24.
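Under the calendar-month convention described here (MOB0 ends at the end of the disbursement month), MOB is simply the difference in calendar months between the disbursement date and the date being evaluated. The helper below is a minimal sketch; the function name `month_on_book` is an assumption made for illustration.

```python
from datetime import date


def month_on_book(loan_date: date, as_of: date) -> int:
    """Calendar-month MOB: 0 in the disbursement month, 1 in the next, etc."""
    return (as_of.year - loan_date.year) * 12 + (as_of.month - loan_date.month)


loan = date(2021, 4, 12)
print(month_on_book(loan, date(2021, 4, 30)))  # last day of MOB0
print(month_on_book(loan, date(2021, 6, 15)))  # falls in MOB2
print(month_on_book(loan, date(2022, 4, 30)))  # last day of MOB12
```

Some institutions count MOB from the disbursement day rather than by calendar month, so the convention should be confirmed before reusing this.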
6. Overdue: if the customer fails to repay the amount due for the current month in full by the due date, the contract is overdue.
7. Days overdue DPD (Days Past Due)
Definition: when the customer has not repaid by the due date, the days overdue is the number of days from the day after the due date up to and including the actual repayment date. If the customer has not repaid the current installment and there is no actual repayment date, the data statistics date is used in its place.
How it is expressed: DPDN+ denotes customers who are N or more days overdue. For example, DPD60+ denotes customers who are 60 or more days overdue.
Example:
If the first repayment date (May 12, 2021) passes without payment, then May 13, 2021 is the first day overdue. If the customer repays on May 17, the first installment is 5 days overdue.
Other notes:
① Any overdue-day threshold can be used in an analysis as needed, such as 3, 7, 15 or 30 days overdue.
② The choice of overdue-day threshold in an analysis depends on how accounts enter collections and on the collection recovery rate.
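The DPD definition can be written as a small function. This is a sketch under the definition given above; the name `days_past_due` and its parameters are hypothetical.

```python
from datetime import date
from typing import Optional


def days_past_due(due_date: date, paid_date: Optional[date],
                  stat_date: date) -> int:
    """Days from the day after the due date through the repayment date.

    If the installment is still unpaid (paid_date is None), the data
    statistics date stands in for the repayment date.
    """
    end = paid_date if paid_date is not None else stat_date
    # Paying on or before the due date means zero days overdue.
    return max((end - due_date).days, 0)


# The example from the text: due May 12, repaid May 17.
print(days_past_due(date(2021, 5, 12), date(2021, 5, 17), date(2021, 6, 30)))
```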
8. Number of overdue periods
Calculation: a fixed number of overdue days counts as one overdue period. For example, 1-30 days overdue corresponds to M1, 31-60 days overdue corresponds to M2, and so on. Because the number of overdue periods maps onto the number of overdue days, it can be computed directly from the overdue days (note: the bucket boundaries may differ between institutions).
Definition: the number of periods from the day after the due date up to and including the actual repayment date. If the customer has not repaid the current installment and there is no actual repayment date, the data statistics date is used in its place.
How it is expressed:
M0: normal assets, currently not overdue (also written as C).
M1: 1-30 days overdue, one period overdue.
M2: 31-60 days overdue, two periods overdue.
M3: 61-90 days overdue, three periods overdue.
M4: 91-120 days overdue, four periods overdue.
M5: 121-150 days overdue, five periods overdue.
M6: 151-180 days overdue, six periods overdue.
Mn: 30n-29 to 30n days overdue, n periods overdue.
Similarly,
M3+: more than 90 days overdue, that is, overdue for more than 3 periods (3 not included).
M4+: more than 120 days overdue, that is, overdue for more than 4 periods (4 not included).
M6+: more than 180 days overdue, that is, overdue for more than 6 periods (6 not included). Also known as bad debt, which is written off.
Mn+: more than 30n days overdue, that is, overdue for more than n periods (n not included).
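Because each bucket spans 30 days, the mapping from overdue days to overdue periods is a ceiling division. The sketch below assumes the 30-day buckets listed above (other institutions may use different boundaries); the function names are made up for this example.

```python
def overdue_bucket(dpd: int) -> str:
    """Map days past due to an M-bucket: 0 -> M0, 1-30 -> M1, 31-60 -> M2..."""
    if dpd <= 0:
        return "M0"
    # Ceiling division by 30 without importing math.
    return f"M{(dpd + 29) // 30}"


def is_m_plus(dpd: int, n: int) -> bool:
    """True when the account is Mn+, i.e. more than 30*n days overdue."""
    return dpd > 30 * n


for days in (0, 1, 30, 31, 90, 91):
    print(days, overdue_bucket(days), is_m_plus(days, 3))
```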
9. Overdue rate
By order count: overdue rate = number of overdue orders / total number of loan orders.
By amount: overdue rate = overdue remaining principal / total loan principal.
Two、 Roll rate analysis
1. Purpose: to give the risk model better discriminating power, we need to decide how long a customer must be overdue to be defined as 1 (bad customer). Customers who are only a few days overdue have often simply forgotten to pay and will repay after a reminder; they do not lack the willingness or ability to repay. If every customer with any overdue record were defined as 1, the definition of a bad customer would be blurred and the model's discriminating power would suffer. Roll rate analysis shows how customers move from one status to another across time windows, which lets us analyze how customers in different overdue statuses evolve.
2. Definition: the transition from the worst status in a period before the observation point (observation period 1) to the worst status in a period after the observation point (observation period 2).
3. Steps of roll rate analysis:
step1: Choose an observation point. Using it as the cutoff, compute from the repayment schedule each customer's longest overdue period within observation period 1 (for example, the past 6 months), and group customers by worst overdue status, such as C, M1, M2, M3, M4+.
step2: Using the observation point as the start, compute each customer's longest overdue period within observation period 2 (for example, the next 6 months), and again group customers by worst overdue status, such as C, M1, M2, M3, M4+.
step3: Cross-tabulate the number of customers into a transition matrix.
step4: Convert the customer counts in the transition matrix into proportions.
step5: Choose different observation points, repeat step1 to step4, and compare the roll rates.
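Steps 3 and 4 amount to a cross-tabulation followed by row normalization. The following sketch assumes each customer has already been reduced to a pair of worst statuses (one per observation period); the state list and the function name `roll_rate_matrix` are illustrative choices, and the toy input is invented.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

STATES: List[str] = ["C", "M1", "M2", "M3", "M3+"]  # assumed status levels


def roll_rate_matrix(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Build row-normalized transition proportions.

    pairs -- one (worst status in period 1, worst status in period 2)
             tuple per customer.
    """
    pairs = list(pairs)
    counts = Counter(pairs)                            # step3: cross counts
    row_totals = Counter(before for before, _ in pairs)
    matrix: Dict[str, Dict[str, float]] = {}
    for before in STATES:
        total = row_totals[before]
        if total == 0:
            continue  # no customers started in this status
        # step4: share of this row that ends in each status
        matrix[before] = {after: counts[(before, after)] / total
                          for after in STATES}
    return matrix


toy = [("C", "C")] * 3 + [("C", "M1"), ("M1", "C"), ("M1", "M2")]
for status, row in roll_rate_matrix(toy).items():
    print(status, row)
```

Each row sums to 1, so the entry in the `"C"` column of a row is the share of customers who return to normal, and the entries to the right of the diagonal are the share that get worse.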
For example, take midnight at the end of June 30, 2021 as the observation point and 20,000 customers as the observation population, and track how each customer's worst overdue status changes from observation period 1 to observation period 2. First, compile the following detail table of customer overdue statuses (illustrative data for explaining the business logic, not real data):
The roll rate matrix below is computed from the overdue status detail table:
Observations from the roll rate matrix:
① Of the customers whose overdue status in observation period 1 is C (normal), 95.29% remain normal over the next 6 months and 4.71% become overdue.
② Of the customers whose overdue status in observation period 1 is M1, 81.16% return to normal (a cure rate of 81.16%), 11.96% remain in M1, and 6.88% get worse.
③ Of the customers whose overdue status in observation period 1 is M2, the cure rate is 25.96%, 6.41% improve to M1, 26.12% remain in M2, and 41.51% get worse.
④ Of the customers whose overdue status in observation period 1 is M3, the cure rate is 19.77%, 10.6% improve to M1 or M2, 11.46% remain in M3, and 58.17% get worse.
⑤ Of the customers whose overdue status in observation period 1 is M3+, the cure rate is 3.36%, 24.16% improve to M1, M2 or M3, and 72.48% remain in M3+.
Judging by the cure rates, customers whose overdue status is M3+ almost never recover. To give the risk model better discriminating power, a bad customer can therefore be defined as one whose overdue status is M3+ (more than 90 days overdue).
In real credit modeling, constraints such as business scale and how recently the product launched may leave the modeling sample small, and therefore short of bad samples. In that case a manual rule is sometimes used: customers more than n days overdue are defined as 1 (bad samples), customers who were never overdue are defined as 0 (good samples), and customers overdue by n days or fewer are treated as grey samples and discarded. We now have a standard for how many days overdue makes a customer bad. What remains is to determine how long a customer's performance period must be before the customer can be included in the evaluation.
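The good / bad / grey rule can be expressed directly in code. This is a minimal sketch; the threshold `n` is a business choice, and the function name and the use of `None` for grey samples are assumptions made for illustration.

```python
from typing import Optional


def label_by_max_dpd(max_dpd: int, n: int) -> Optional[int]:
    """Label a customer from the worst days-past-due seen in the window.

    Returns 1 (bad) if overdue more than n days, 0 (good) if never overdue,
    and None (grey, discarded) for anything in between.
    """
    if max_dpd > n:
        return 1
    if max_dpd == 0:
        return 0
    return None


customers = {"a": 0, "b": 12, "c": 45}  # hypothetical worst DPD per customer
print({name: label_by_max_dpd(dpd, n=30) for name, dpd in customers.items()})
```

Dropping the grey band keeps mildly overdue customers, who are often just forgetful, from blurring the boundary between the two classes.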
Suppose a product has a loan term of 12 periods. Do we have to wait until all 12 periods have passed before deciding whether a customer is bad? Strictly speaking, yes. Before then, all we can say is that the customer has not gone bad so far; we do not know whether the customer will go bad in the remaining installments. Some accounts reach M3+ in the early periods, while others reach M3+ only in later periods. So what we need is a performance period long enough to capture most of the eventual bad customers. Vintage analysis is used to determine how long that performance period should be.
Three、 Vintage (account age) analysis
1. Purpose: track the overdue performance of each month's new loans at each subsequent MOB, compare the performance of different lending months, judge the effectiveness of strategies and models, and analyze how long customer risk takes to mature.
2. How it is expressed: the horizontal axis of a Vintage curve is MOB and the vertical axis is the overdue rate. The overdue rate can be calculated by amount or by order count.
3. Overdue rate calculation (by amount):
Overdue rate = overdue remaining principal / total loan principal.
The denominator is the total principal disbursed in the lending month, that is, the contract amount. It does not change over time (it is not reduced by settlements or write-offs).
The numerator is the principal balance of loans at the time they meet the Bad definition. Assuming Bad is defined as M3+, the numerator can be calculated in two ways.
4. Overdue rate calculation (by order count):
Overdue rate = number of overdue orders / total number of loan orders.
The denominator is the total number of orders disbursed in the lending month. It does not change over time (it is not reduced by settlements or write-offs).
The numerator is the number of orders that meet the Bad definition. Assuming Bad is defined as M3+, the numerator can likewise be calculated in two ways.
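As an illustration of the order-count calculation, the sketch below builds one Vintage row per lending month using the "ever bad" convention: once an order has reached the Bad definition it stays in the numerator at every later MOB, while the denominator stays fixed at the number of orders disbursed. This is one possible convention, chosen here as an assumption; the input layout and the name `vintage_ever_bad_rate` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def vintage_ever_bad_rate(
    loans: Iterable[Tuple[str, Optional[int]]], max_mob: int
) -> Dict[str, List[float]]:
    """Cumulative ever-bad rate by lending month and MOB (order count).

    loans -- (lending month, MOB at which the order first became bad,
             or None if it never did) for every disbursed order.
    Returns {lending month: [rate at MOB0, rate at MOB1, ...]}.
    """
    totals: Dict[str, int] = defaultdict(int)
    first_bad: Dict[str, List[int]] = defaultdict(lambda: [0] * (max_mob + 1))
    for cohort, bad_mob in loans:
        totals[cohort] += 1  # fixed denominator: all disbursed orders
        if bad_mob is not None and bad_mob <= max_mob:
            first_bad[cohort][bad_mob] += 1
    table: Dict[str, List[float]] = {}
    for cohort, total in totals.items():
        running, row = 0, []
        for mob in range(max_mob + 1):
            running += first_bad[cohort][mob]  # ever-bad only accumulates
            row.append(running / total)
        table[cohort] = row
    return table


toy = [("2021-03", 4), ("2021-03", None), ("2021-03", 6), ("2021-03", None)]
print(vintage_ever_bad_rate(toy, max_mob=6))
```

The sketch does not mask MOBs that a recent lending month has not yet reached; in practice those cells would be left empty rather than carried forward.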
5. Building the Vintage table
Consider a cash loan product with a customer-facing interest rate of 36%, a term of 12 periods, an average ticket size of 2,000 yuan, about 10,000 loans disbursed per month, and repayment in equal installments of principal and interest. Roll rate analysis tells us that customers overdue at Mn+ almost never recover, so bad customers can be defined as Mn+ customers. This article assumes that for this product, customers whose overdue status is M3+ almost never recover. Tabulating the product's loan performance from March 2021 to May 2022 (the present) gives the following table:
Rearranging the orders of each lending month along the MOB dimension gives the following table:
Plotting the MOB table as a line chart gives the following Vintage chart:
From the Vintage chart we can see:
① The horizontal axis represents the customer's life cycle and reflects the changes that occur as the account matures.
② The vertical axis shows how accounts of the same age compare over time, that is, how the default rate changes across lending months.
③ Because the product term is 12 periods, the maximum MOB (account age) is 12 months; the same reasoning applies to other terms.
④ The statistic is the Ever M3+ overdue rate, so it is 0 at MOB1 and MOB2.
⑤ From lending month March 2021 to November 2021, the overdue rate keeps falling, which shows that asset quality is steadily improving. A likely reason is that the risk team has gained a fuller understanding of the product's risk dimensions, so the level of risk control keeps rising.
⑥ For every lending month, the M3+ overdue rate levels off after 9 MOBs, which indicates that the account maturity period is 9 months.
⑦ Because the statistic is the Ever M3+ overdue rate, the rate for a given lending month can only rise, never fall.
From the Vintage chart, if we want to build a pre-loan application scorecard (A card) now, the lending months with complete performance (all 12 periods elapsed) are March 2021 to May 2021.
If we model only on loans with complete performance, samples can come only from lending months March 2021 to May 2021. If we instead use the 9-month account maturity period, samples can come from lending months March 2021 to August 2021, which adds three more months of sample data. The Vintage table here is made up, so the pattern looks clean. In practice, the overdue performance of a particular lending month may jump because of changes in traffic sources, the external environment, risk strategy adjustments, and other factors.
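The sample-window arithmetic above can be checked with a short script. It is a sketch that counts whole calendar months between the lending month and the current month; the `"YYYY-MM"` string format and the function names are assumptions for this example.

```python
from typing import List


def months_between(start: str, end: str) -> int:
    """Whole calendar months from start to end, both given as 'YYYY-MM'."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


def eligible_cohorts(cohorts: List[str], now: str,
                     required_mob: int) -> List[str]:
    """Lending months that have accumulated at least required_mob months."""
    return [c for c in cohorts if months_between(c, now) >= required_mob]


lending_months = [f"2021-{m:02d}" for m in range(3, 12)]  # Mar to Nov 2021
print(eligible_cohorts(lending_months, "2022-05", 12))  # full 12-period term
print(eligible_cohorts(lending_months, "2022-05", 9))   # 9-month maturity
```

With the present set to May 2022, the 12-month requirement keeps three lending months and the 9-month requirement keeps six, matching the three extra months of samples noted in the text.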
For example, for a cash loan product aimed at an e-commerce customer group, with a term of 12 periods, an average ticket size of 5,000 yuan and a customer-facing interest rate of 36%, the Vintage performance is as follows (the data has been processed):
The Vintage table shows that the overdue rate of loans disbursed in October 2018 rose sharply compared with the preceding months, possibly because of changes in traffic sources, the external environment, risk strategy adjustments, and other factors.
Four、 Determining the dependent variable Y
1. Definition: the dependent variable Y is the customer's good/bad label.
2. Method: use roll rate analysis to define good and bad customers, and Vintage analysis to determine a suitable performance period.
3. Steps:
step1: Use roll rate analysis to define bad customers. In the case above: customers whose overdue status is M3+ are bad customers.
step2: Using M3+ as the asset quality indicator, build the Vintage table, plot the Vintage curves, and analyze account maturity. In the case above: the account maturity period is 9 months.
step3: Samples whose performance period has reached the maturity period can be used for modeling. Samples with a shorter performance period cannot yet be given an accurate Y and are set aside for now.
4. Conclusion: following the example above, customers with a performance period of 9 months or more who reached M3+ are defined as 1, customers with a performance period of 9 months or more who were never overdue are defined as 0, and all other customers are discarded.
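Putting the conclusion into code gives a single labeling function. This is a sketch of the rule as stated in this example (M3+ as bad, 9 months as maturity); the parameter names are hypothetical, and treating a performance period of exactly 9 months as sufficient is an assumption made to match the sample window discussed earlier.

```python
from typing import Optional


def define_y(performance_mob: int, ever_m3_plus: bool, ever_overdue: bool,
             maturity_mob: int = 9) -> Optional[int]:
    """Return Y = 1 (bad), 0 (good) or None (discarded).

    performance_mob -- months of repayment history observed so far
    ever_m3_plus    -- whether the customer ever reached M3+
    ever_overdue    -- whether the customer was ever overdue at all
    maturity_mob    -- account maturity period from Vintage analysis
    """
    if performance_mob < maturity_mob:
        return None  # too immature to label reliably
    if ever_m3_plus:
        return 1
    if not ever_overdue:
        return 0
    return None      # overdue but never M3+: grey sample


print(define_y(12, ever_m3_plus=True, ever_overdue=True))
print(define_y(10, ever_m3_plus=False, ever_overdue=False))
print(define_y(10, ever_m3_plus=False, ever_overdue=True))
print(define_y(6, ever_m3_plus=True, ever_overdue=True))
```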
This completes the analysis of how the dependent variable Y is determined in the payment and credit fields. Feel free to share it with friends who may find it useful.