[Basic Data Mining Techniques] Exploratory Data Analysis
2022-07-24 20:28:00 【Sunny qt01】
1. Descriptive statistics (exploratory data analysis)
Not every project needs complex methods. Descriptive statistics is a simple method that is often enough.
Case study 1: Telecommunication company
Customers are a telecom company's biggest asset, and their behavior is captured in the call records of the switch.
Understanding customer behavior has become a trend among telecom companies. Two questions matter here:
1. Which customers need more telephone lines? Finding them and getting them to take on a few more lines brings the company more business.
2. When do customers need additional lines?

We have the switch data. We convert the times recorded in seconds into a start time and an end time for each call, sort the records by time, and count the number of busy lines at each point in time. This shows that the lines are sometimes all in use, so a customer's incoming calls may hit a busy signal and the customer loses business. Installing just two more lines solves the problem.
Simple problems call for simple solutions.
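The busy-line count described above can be sketched as a sweep over call start and end events. This is only a minimal illustration: the `Call` record layout (a start time and a duration, both in seconds) and the sample calls are assumptions, not the switch's actual format.

```python
from collections import namedtuple

# Hypothetical record layout: start time and duration, both in seconds
Call = namedtuple("Call", ["start", "duration"])

def busy_line_profile(calls):
    """Return (time, busy_lines) pairs, one per time point where the count changes."""
    events = []
    for call in calls:
        events.append((call.start, +1))                  # a line becomes busy
        events.append((call.start + call.duration, -1))  # the line is released
    # Sort by time; at equal times, releases (-1) are processed before new calls (+1)
    events.sort(key=lambda e: (e[0], e[1]))
    profile = []
    busy = 0
    for time, delta in events:
        busy += delta
        if profile and profile[-1][0] == time:
            profile[-1] = (time, busy)  # keep only the final count for this time point
        else:
            profile.append((time, busy))
    return profile

def peak_busy_lines(calls):
    """Largest number of lines in use at the same time."""
    return max((busy for _, busy in busy_line_profile(calls)), default=0)

# Made-up calls for illustration only
calls = [Call(0, 60), Call(30, 60), Call(45, 30), Call(90, 10)]
profile = busy_line_profile(calls)
peak = peak_busy_lines(calls)
```

Comparing `peak` with the number of lines a customer has installed shows whether, and at which times, that customer runs out of lines.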
Case study 2:
We have all seen offers in shopping malls such as "spend over 500 and get a free gift".
Around anniversaries and holidays in particular, this kind of tiered offer is a favorite marketing scheme: once shoppers hear that they are only a little short of a discount, they may well buy a bit more to get it.
Customer unit price is a marketing indicator: the total amount a customer spends in one purchase.
Businesses usually raise the customer unit price, and with it sales, by bundling products and by placing complementary products together (flashlights and batteries). Offers such as "spend 500, get a gift" or "spend 1000, get a hundred back" can also raise the customer unit price effectively. Sales staff typically pick 500, 1000, 1500, intuitive round multiples meant to nudge customers into buying more. But is this really the best scheme? It may well give up a lot of sales for nothing.
This dataset contains the total amount spent by each customer of a clothing e-commerce business.
We calculate the customer unit price for each customer.
There are 21598 customer records in total. ID: customer number.
The Total_Spending field is the customer's total spending.

We draw a histogram with total spending on the horizontal axis and the number of customers on the y axis. The plot shows that the 150-200 interval holds the most customers, a little over 4000 people.
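The counting behind such a histogram can be sketched with the standard library alone. The bin width of 50 and the sample amounts below are assumptions for illustration; they are not the article's data.

```python
from collections import Counter

def spending_histogram(spending, bin_width=50):
    """Count customers per spending interval [k*bin_width, (k+1)*bin_width)."""
    counts = Counter(int(amount // bin_width) * bin_width for amount in spending)
    return dict(sorted(counts.items()))

def busiest_bin(hist, bin_width=50):
    """Return the (start, end) of the interval holding the most customers."""
    start = max(hist, key=hist.get)
    return start, start + bin_width

# Made-up Total_Spending values for illustration only
total_spending = [160, 175, 180, 199, 150, 320, 310, 480, 90, 520]
hist = spending_histogram(total_spending)
top = busiest_bin(hist)
```

Each key of `hist` is the lower edge of an interval, so the bars of the histogram can be drawn directly from it with any plotting library.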

In the density plot, the dotted lines mark the discount thresholds.
If the threshold is set at 500, a customer who spends 150~200 yuan would have to spend more than twice as much to get the discount, and will certainly not want to be taken for a sucker. A customer who has already spent 450 yuan, on the other hand, is likely to buy a little more. So 500 yuan alone is not a suitable threshold.
A suitable scheme should follow consumers' spending habits (200, 300, 500).
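One rough way to compare candidate thresholds is to count how many customers sit just below each one. The rule used here (a customer is "within reach" when spending is at least 80% of the threshold but still below it) is an assumption made for this sketch, not a figure from the article, and the amounts are made up.

```python
def customers_within_reach(spending, threshold, reach_ratio=0.8):
    """Count customers below the threshold but at least reach_ratio of the way there."""
    lower = threshold * reach_ratio  # hypothetical cut-off for "close enough to top up"
    return sum(1 for amount in spending if lower <= amount < threshold)

# Made-up Total_Spending values for illustration only
total_spending = [165, 175, 180, 199, 150, 320, 310, 480, 90, 520]
reach = {t: customers_within_reach(total_spending, t) for t in (200, 300, 500)}
```

A threshold with many customers within reach is more likely to trigger extra purchases than one that most customers are far below.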

These peaks reflect consumers' spending habits.
We can therefore shift the tiered amounts to match them and improve sales.
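Reading peaks off a plot by eye can be backed by a simple check on binned counts. The sketch below treats a bin as a peak when its count is strictly higher than both neighbouring bins; this rule, the bin edges, and the counts are all assumptions for illustration.

```python
def local_peaks(hist):
    """Return bin starts whose count is strictly higher than both neighbouring bins."""
    starts = sorted(hist)
    peaks = []
    for i, start in enumerate(starts):
        # Bins missing from hist are skipped, so neighbours are the adjacent listed bins
        left = hist[starts[i - 1]] if i > 0 else 0
        right = hist[starts[i + 1]] if i < len(starts) - 1 else 0
        if hist[start] > left and hist[start] > right:
            peaks.append(start)
    return peaks

# Made-up customer counts per 50-yuan bin, for illustration only
hist = {100: 900, 150: 4000, 200: 2500, 250: 1800, 300: 2200,
        350: 1200, 400: 700, 450: 600, 500: 950, 550: 300}
peaks = local_peaks(hist)
```

On real data the counts are noisy, so smoothing or wider bins would probably be needed before a rule this simple gives stable peaks.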

So we can use 300, 500, 800 as the thresholds.
(Plots drawn with Python.)
Simple descriptive statistics like these can segment customers effectively, and can also be used to set membership tiers for credit cards and similar products. The reward also does not have to be a discount; it can be a gift instead, chosen to suit the customers. For example, one hotel's guests were business travelers whose expenses were paid by their companies, so they preferred gifts and massages.
Besides raising sales, this can also raise customer satisfaction.
Statistics: taking the earlier magazine data as an example

Descriptive statistics (the probability of buying each magazine): the averages show that the house magazine is the most popular; comic is 0.081.
Prediction accuracy: naïve predictions are the baseline. If accuracy is the criterion, the naïve prediction is the base probability.
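As a worked example of the baseline, assume the naïve prediction means always predicting the more common outcome (an interpretation, not something the article spells out). With a purchase rate of 0.081 for comic, always predicting "does not buy" is right 91.9% of the time, and any model has to beat that.

```python
def naive_baseline_accuracy(purchase_rate):
    """Accuracy of always predicting the majority class for a yes/no outcome."""
    return max(purchase_rate, 1.0 - purchase_rate)

comic_rate = 0.081  # purchase rate for comic quoted in the article
baseline = naive_baseline_accuracy(comic_rate)
```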
Descriptive statistics: the distribution of averages for the five magazines (customer segmentation):

Comparing these with the naïve prediction shows the relationship.
2. Visual analysis (exploratory data analysis)

Distribution of people who buy each magazine

Age distribution of magazine buyers. (Very uniform; it can be compared against individual magazines.)

Age distribution of Auto magazine buyers (somewhat concentrated)
Age helps predict whether someone buys Auto magazine.

Age distribution of sports magazine buyers (age has little effect)

Scatter plot of age and income by whether the customer buys the music magazine (light dots are buyers, black dots are non-buyers)
Clearly, people who buy the music magazine tend to be young and have low incomes. Age and income should help predict music magazine purchases.
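The pattern in the scatter plot can be cross-checked numerically by comparing average age and income for buyers and non-buyers. The field names (`age`, `income`, `music`) and the rows below are made up for illustration; they are not the magazine dataset.

```python
from statistics import mean

def group_means(rows, flag, fields):
    """Average each field separately for every value of the flag column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[flag], []).append(row)
    return {
        key: {field: mean(r[field] for r in members) for field in fields}
        for key, members in groups.items()
    }

# Made-up customers for illustration only; music=1 means the customer buys the magazine
customers = [
    {"age": 22, "income": 30000, "music": 1},
    {"age": 26, "income": 34000, "music": 1},
    {"age": 45, "income": 70000, "music": 0},
    {"age": 51, "income": 82000, "music": 0},
    {"age": 38, "income": 60000, "music": 0},
]
summary = group_means(customers, "music", ["age", "income"])
```

A clear gap between the two groups' averages suggests the field is worth keeping as a predictor, which is the same conclusion the plot supports visually.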
Summary: simple questions can be answered with exploratory data analysis. Descriptive statistics can be combined with visual analysis.