[Basic data mining techniques] Exploratory data analysis
2022-07-24 20:28:00 【Sunny qt01】
1. Descriptive statistics (exploratory data analysis)
Not every project needs complex methods. Descriptive statistics is a simple method that is often enough.
Case study 1: Telecommunication company
Customers are a telecom company's biggest asset, and their behavior is recorded in the call records of the switch.
Understanding customer behavior has become a priority for telecom companies. Two questions matter here:
1. Which customers need more telephone lines? Finding them and getting them to take on several more lines brings the company more business.
2. When will a customer need additional lines?

We have the switch data. We convert the times, recorded in seconds, into a start time and an end time for each call, sort by time, and count the number of busy lines at each point in time. It turns out that the lines are sometimes all in use, so incoming calls to the customer may get a busy signal and the customer loses business. Installing just two more lines solves the problem.
Simple problems are solved in simple ways .
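The busy-line count described above can be sketched as a simple sweep over call start and end events. This is a minimal illustration, not the original analysis: the function names, the (start, duration) record layout, and the sample calls are all assumptions made for the example.

```python
def busy_line_counts(calls):
    """calls: list of (start_second, duration_seconds) tuples (hypothetical layout).

    Returns a list of (time, busy_lines) pairs, one per distinct event time.
    """
    events = []
    for start, duration in calls:
        events.append((start, 1))               # a line becomes busy
        events.append((start + duration, -1))   # a line is released
    # Sort by time; at equal times, releases (-1) are processed before new calls (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    busy = 0
    timeline = []
    for time, delta in events:
        busy += delta
        if timeline and timeline[-1][0] == time:
            timeline[-1] = (time, busy)         # keep only the final count for this time
        else:
            timeline.append((time, busy))
    return timeline


def peak_busy(calls):
    """Largest number of lines in use at the same time."""
    return max(busy for _, busy in busy_line_counts(calls))


# Made-up sample data, not taken from the article's switch records.
sample_calls = [(0, 60), (30, 60), (45, 30), (90, 10), (200, 20)]
print(busy_line_counts(sample_calls))
print(peak_busy(sample_calls))
```

Comparing the peak with the number of lines the customer has installed shows whether, and at what times, the lines are saturated.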
Case study 2:
We have all seen this when shopping at a mall: an offer of a free gift for purchases over 500.
Especially around anniversaries and holidays, this kind of threshold promotion is a favorite marketing scheme. Once consumers hear that they are only a little short of the amount that earns a discount, they may buy a little more to get it.
Customer unit price is a marketing indicator: the total amount a customer spends in a single purchase.
Retailers usually raise the customer unit price, and with it sales performance, through bundling and by placing complementary products together (a flashlight and batteries). Offers such as "spend 500, get a gift" or "spend 1000, get a hundred back" can raise the customer unit price effectively. Salespeople typically choose 500, 1000, 1500: intuitive round multiples meant to entice customers to buy more. But is this really the best promotion scheme? It may well give away a lot of sales for nothing.
This data set contains the total amount spent by each customer of a clothing e-commerce business.
We compute the customer unit price for each consumer.
There are 21598 customer records in total. ID: customer number.
The Total_Spending field is the customer's total spending.

We draw a histogram with the total spending on the horizontal axis and the number of customers on the y axis. The plot shows that the most populated interval is 150 to 200, with a little over 4000 customers.
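The counting behind such a histogram can be sketched with the standard library alone. The bin width of 50 and the sample amounts below are assumptions made for illustration; they are not the article's 21598 records.

```python
from collections import Counter


def spending_histogram(totals, bin_width=50):
    """Count customers per spending bin.

    Each key is the lower edge of a bin [edge, edge + bin_width).
    """
    counts = Counter(int(t // bin_width) * bin_width for t in totals)
    return dict(sorted(counts.items()))


def busiest_bin(hist):
    """Lower edge of the bin that holds the most customers."""
    return max(hist, key=hist.get)


# Made-up Total_Spending values, for illustration only.
sample_totals = [120, 155, 160, 180, 199, 210, 260, 310, 450, 520]
hist = spending_histogram(sample_totals)
print(hist)
print(busiest_bin(hist))
```

With real data, the same dictionary can be passed to any plotting library to draw the bars.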

We then draw the density plot, where the dotted lines mark the promotion thresholds.
If the threshold is set at 500, a customer who normally spends 150 to 200 yuan will feel that they have to spend more than twice as much to get the discount, and will not want to be taken for a fool. A customer who has already spent 450 yuan, on the other hand, is likely to buy a little extra. Since most customers are in the first group, 500 yuan is not a suitable threshold.
Suitable thresholds should be based on consumers' spending habits (200, 300, 500).
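One rough way to compare candidate thresholds is to count how many customers are already close enough to a threshold that a small extra purchase would reach it. The sketch below is only an illustration of that reasoning: the function name, the 1.25 "stretch" factor, and the sample amounts are assumptions, not values from the article.

```python
def share_within_reach(totals, threshold, max_stretch=1.25):
    """Fraction of all customers who are below the threshold but could reach it
    by spending at most max_stretch times their current total (assumed model)."""
    if not totals:
        return 0.0
    reachable = sum(1 for t in totals if t < threshold <= t * max_stretch)
    return reachable / len(totals)


# Made-up spending totals, for illustration only.
sample_totals = [150, 160, 180, 190, 200, 260, 280, 450, 480, 600]
for threshold in (200, 300, 500):
    print(threshold, share_within_reach(sample_totals, threshold))
```

A threshold that sits far above where most customers spend will show a small share here, which matches the argument that a single 500 threshold misses the bulk of customers.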

These peaks show consumers' spending habits.
We can therefore shift the tiered threshold amounts to improve sales performance.
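The article reads the peaks off the plot. As a hedged alternative, peaks can also be located from binned counts by looking for bins that are higher than both neighbours. The function name and the sample counts are assumptions, and the sketch assumes the bins are contiguous (no missing bins).

```python
def local_peaks(hist):
    """hist: dict mapping bin lower edge -> customer count.

    Returns the edges whose count is strictly higher than both neighbours.
    The first and last bins are compared with their single neighbour only.
    """
    edges = sorted(hist)
    peaks = []
    for i, edge in enumerate(edges):
        left = hist[edges[i - 1]] if i > 0 else -1
        right = hist[edges[i + 1]] if i < len(edges) - 1 else -1
        if hist[edge] > left and hist[edge] > right:
            peaks.append(edge)
    return peaks


# Made-up counts per 50-yuan bin, for illustration only.
sample_hist = {100: 5, 150: 40, 200: 22, 250: 30, 300: 12, 350: 8, 400: 15, 450: 6}
print(local_peaks(sample_hist))
```

How far above each peak a threshold should be placed is a business decision; the code only finds where customers cluster.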

So we can use 300, 500 and 800 as the threshold amounts.
(Plots drawn with Python.)
Simple descriptive statistics like this can stratify customers effectively, and can also be used to define membership tiers for credit cards and similar products. Likewise, the reward does not have to be a discount. It can be a gift instead, chosen according to what customers want. For example, one hotel's customers were business travelers whose expenses were paid by their companies, so they preferred gifts and massages.
This raises sales performance and consumer satisfaction at the same time.
Statistics: take the earlier magazine data as an example

Descriptive statistics (the probability of buying each magazine): the averages show that the house magazine is the most popular; for comic the average is 0.081.
Prediction accuracy: the naïve prediction is the baseline. If accuracy is the criterion, the naïve prediction is given by the base probability.
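One common reading of this baseline is: always predict the more frequent outcome, so the accuracy to beat is the larger of the purchase rate and its complement. The sketch below follows that reading; the function names are assumptions, and only the comic rate of 0.081 comes from the text.

```python
def naive_prediction(purchase_rate):
    """Always predict the more common outcome: True means 'buys'."""
    return purchase_rate >= 0.5


def naive_accuracy(purchase_rate):
    """Accuracy obtained by always predicting the more common outcome."""
    return max(purchase_rate, 1 - purchase_rate)


comic_rate = 0.081  # average purchase rate for comic, from the text
print(naive_prediction(comic_rate))   # False: predict that nobody buys
print(naive_accuracy(comic_rate))     # about 0.919
```

Under this reading, a model for comic purchases is only useful if its accuracy is above roughly 0.919.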
Descriptive statistics: the distribution of average values for the five magazines (customer differentiation):

Comparing these with the naïve prediction shows the relationship.
2. Visual analysis (exploratory data analysis)

Distribution of the number of people who buy each magazine

Age distribution of magazine buyers. (Very uniform; it can be compared with the distributions for individual magazines.)

Age distribution of Auto magazine buyers (somewhat concentrated)
Age helps predict whether someone buys the Auto magazine.
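The same check can be made numerically by computing the purchase rate per age band: if the rates differ clearly between bands, age carries predictive information. This is a minimal sketch; the record layout, the 10-year band width, and the sample values are assumptions and not the article's magazine data.

```python
from collections import defaultdict


def purchase_rate_by_age_band(records, band_width=10):
    """records: iterable of (age, bought) pairs, bought being 0 or 1.

    Returns {band_start: purchase_rate}, sorted by band.
    """
    bought = defaultdict(int)
    total = defaultdict(int)
    for age, flag in records:
        band = (age // band_width) * band_width
        total[band] += 1
        bought[band] += flag
    return {band: bought[band] / total[band] for band in sorted(total)}


# Made-up (age, bought) records, for illustration only.
sample_records = [(22, 0), (25, 0), (31, 1), (35, 1), (38, 0),
                  (44, 1), (47, 0), (52, 0), (58, 0), (63, 0)]
rates = purchase_rate_by_age_band(sample_records)
print(rates)
```

A flat set of rates across bands would correspond to the "age has little effect" case.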

Age distribution of sports magazine buyers (age has little effect)

Scatter plot of age and income by whether the music magazine was bought (light dots are purchases, black dots are non-purchases)
Clearly, people who buy the music magazine tend to be younger and have lower incomes. Age and income should help predict purchases of the music magazine.
Summary: simple questions can be answered with exploratory data analysis. Descriptive statistics can be combined with visual analysis.