Bank products are becoming increasingly homogeneous, and customers have more and more channels for choosing products and services. As a result, customer loyalty is declining, and customer churn has become one of the banking industry's biggest concerns. Acquiring a new customer costs far more than retaining an existing one.
According to surveys, customer churn at commercial banks is serious. At domestic commercial banks, the churn rate can reach 20% or even higher, and acquiring a new customer can cost about 5 times as much as retaining an existing one. It is therefore particularly important to mine the factors that influence churn from massive customer transaction records and to build an efficient early warning system for customer churn.
The following customer churn prediction case from a commercial bank illustrates the data analysis process. The analysis relies on Smartbi, a leading domestic BI software and one-stop data analysis platform. It can greatly reduce the complexity of data analysis, break down data silos, and let users complete visual analysis quickly with simple drag and drop, so that enterprises can see changes in business indicators promptly.
First, we analyze the current business situation. We select the high-value customer group of one retail business line over the past year. If the churn rate turns out to be serious, it is necessary to build a churn early warning model for this group, identify the reasons customers leave, guide the business in strengthening customer maintenance, and increase customers' stickiness to the bank's products.
The analysis starts from dimension data on high-value customers, such as personal information, account information, and transaction information, combined with third-party data. A random forest algorithm is used to build the customer churn early warning model and to output the main factors affecting churn.
Data sources :
The data comes from the customer basic information table, bills, and other tables in the CRM system, together with third-party data. The time window is the most recent year, and all customers are high-value customers. This case uses a subset of the data totaling 100,000 records.

Data dimension information includes :
Bank-owned fields: account information, personal information, deposit information, consumption and transaction information, wealth management information, fund information, counter service, and online banking information;
External third-party data: outbound customer service call data, asset data, and other consumption data;
In this case, churn is defined as follows: a customer who has not done any business with the bank within 3 months is considered churned.
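The churn definition above can be sketched as a simple labeling rule. This is only an illustration: the customer IDs, dates, and the `is_churned` helper are hypothetical, and approximating "3 months" as 90 days is an assumption, not something the case specifies.

```python
from datetime import date, timedelta


def is_churned(last_activity: date, as_of: date, window_days: int = 90) -> bool:
    """Return True if the customer had no activity in the window before as_of.

    window_days=90 is an assumed approximation of the 3-month rule.
    """
    return (as_of - last_activity) > timedelta(days=window_days)


# Hypothetical customers mapped to the date of their most recent transaction.
last_activity = {
    "C001": date(2023, 1, 10),  # inactive for well over 3 months
    "C002": date(2023, 5, 20),  # active within the last 3 months
}
as_of = date(2023, 6, 30)

# Label column: True means the customer is treated as churned.
labels = {cid: is_churned(d, as_of) for cid, d in last_activity.items()}
```

In practice the most recent activity date would be derived from all business types in scope (deposits, transactions, wealth management, and so on), not from a single table.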
The overall data preprocessing flow is shown in the figure:

Correlation analysis
We analyze the feature indicators with the correlation analysis node to support feature selection before model training, as shown in the figure:

The analysis shows that whether the customer is a payroll customer, card level, average monthly payroll amount, maximum payroll amount, monthly average AUM, and beginning-of-period AUM are correlated with churn, while the correlation between the other features and churn is 0.
Therefore, we select the relevant features with the feature selection node, as shown in the figure. The label column is whether the customer churned.
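The idea of keeping only features that correlate with the churn label can be sketched as follows. The case does not say which correlation measure the platform's node uses, so Pearson correlation is an assumption here, and the feature names, toy values, `pearson` and `select_features` helpers, and the 0.1 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(features, label, threshold=0.1):
    """Keep features whose absolute correlation with the label meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, label)) >= threshold
    ]


# Toy data: 1 = churned, 0 = retained.
churned = [1, 1, 0, 0, 1, 0]
features = {
    "monthly_avg_aum": [5, 8, 90, 120, 3, 75],  # low AUM goes with churn
    "constant_flag": [1, 1, 1, 1, 1, 1],        # carries no information
}

selected = select_features(features, churned)
```

A feature with zero correlation, like the constant column here, is dropped, which mirrors the step of excluding the features whose correlation with churn is 0.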
Model training
This case uses the random forest algorithm for model training. A split node divides the data 7:3 into a training set and a validation set.
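As a rough sketch of what a 7:3 split does, the snippet below shuffles the rows and cuts them at 70%. The `split_rows` helper, the fixed seed, and the use of plain integers as stand-ins for customer records are assumptions for illustration; the platform's split node may sample differently (for example, with stratification).

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 100,000 customer records used in this case.
rows = list(range(100_000))
train, valid = split_rows(rows)
```

With 100,000 records this yields 70,000 training rows and 30,000 validation rows, and no row appears in both sets.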
The overall model training flow is shown in the figure:

The parameter configuration is shown in the figure :

Model evaluation
We evaluate the model with the evaluation node, as shown in the model training flow chart. The evaluation results are shown in the figure:

The evaluation results show an F1 score of 0.95, which indicates that the model predicts well.
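For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it on a small made-up set of labels; the `f1_score` helper and the toy predictions are hypothetical and are not the case's actual results.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy validation labels: 1 = churned, 0 = retained.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75. Because churn data is usually imbalanced, F1 tends to be more informative than plain accuracy.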
Business analysis
We use the random forest feature selection node to output the 5 most important features. The result is shown in the figure:
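Outside the BI platform, a comparable ranking can be sketched with scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. This is not the platform's implementation: the synthetic data, the feature names, the rule that generates the churn label, and the parameter values are all assumptions made for illustration, and the sketch assumes scikit-learn is installed.

```python
import random

from sklearn.ensemble import RandomForestClassifier

rng = random.Random(0)
feature_names = ["monthly_avg_aum", "opening_aum", "card_level", "noise"]

# Synthetic customers: churn is driven by low monthly average AUM (toy rule).
X, y = [], []
for _ in range(500):
    monthly = rng.uniform(0, 100)
    opening = monthly + rng.uniform(-10, 10)  # correlated with monthly AUM
    card = rng.randint(1, 4)
    noise = rng.uniform(0, 1)
    X.append([monthly, opening, card, noise])
    y.append(1 if monthly < 30 else 0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Rank features by impurity-based importance, highest first.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_n = 3  # the case outputs the top 5; this toy set has only 4 features
top_features = [name for name, _ in ranked[:top_n]]
```

The importances sum to 1, so each value can be read as a feature's relative share of the model's splitting power. Correlated features such as the two AUM columns tend to share importance between them.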

The churn early warning analysis of high-value customers in this business line shows that the main factors affecting customer churn are monthly average AUM, beginning-of-period AUM, card level, and so on. The main causes may be insufficient product competitiveness and low customer activity.
Therefore, we can recommend measures such as strengthening customer relationship maintenance, product follow-up, maintenance visits, a tracking system, expanded sales, and mechanism maintenance.








