Illustrated Data Analysis | Business Cognition and Data Exploration
2022-06-12 02:01:00 【ShowMeAI】

Author: Han Xinzi @ShowMeAI
Tutorial address: http://www.showmeai.tech/tutorials/33
Article address: http://www.showmeai.tech/article-detail/137
Copyright notice: all rights reserved. To reprint, please contact the platform and the author, and credit the source.

Data analysis consists of three core steps: business cognition and data exploration, data preprocessing, and business analysis and data mining.
This article covers the first step: business cognition and data exploration.
1. Common business objectives

(1) Descriptive analysis: analyzing and describing the characteristics of the data
Descriptive analysis is a good way to aggregate information. Combined with visual analysis, it gives a comprehensive view of the data's structure, and its conclusions can be presented on a dashboard. Evaluating performance by analyzing an enterprise's KPIs is one of the most common applications of descriptive analysis.
(2) Predictive analysis: predicting future outcomes
Predictive analysis is a complex field. It requires large amounts of historical data and relies on technological advances (machine learning in particular) to build prediction models that can run at scale and yield insight into the future.
(3) Diagnostic analysis: diagnosing real problems through data
Diagnostic analysis lets you think critically about the data, identify the actual problems, and then improve management or reduce losses. Logistics companies can use it to reduce delivery delays, and e-commerce companies can use it to update their marketing strategies, cutting costs and improving efficiency.
(4) Prescriptive analysis: the integrator of data analysis
Prescriptive analysis combines all of the techniques above to help a company make decisions based on the conclusions drawn from data. Note that it requires a sufficiently large volume of data and heavy use of AI techniques, so it is mostly found in large internet companies (such as Google) and financial institutions.
2. Common business indicators
2.1 Vanity indicators vs. effective indicators
An indicator is a measure of how well something is performing. There are many kinds of data indicators: North Star indicators, secondary indicators, qualitative indicators, quantitative indicators, vanity indicators, and so on. Good indicators should guide a product or business line through each stage of its development. Some indicators do the opposite.

- Vanity indicators (Vanity Metrics) make people feel good but are superficial and can even be misleading. For example, if you focus only on website traffic (PV, UV) and ignore indicators such as bounce rate and time on site, it is easy to fall into the vanity-indicator trap.
- Effective indicators (Clarity Metrics) are the indicators that genuinely bring benefits, such as conversion rate, retention rate, and the proportion of daily active users. They give better insight into the actual trends of the product and of user behavior.
2.2 Choosing appropriate indicators
(1) Ratio indicators
Ratio indicators are comparative, which makes them the best kind of data indicator. Take driving as an example. Mileage tells you how far you have traveled. Speed (distance / time) tells you how fast you are going, and when tracked over time it shows whether you are speeding up or slowing down.
A ratio indicator can be compared with itself across different periods to see how something is trending. It can also be compared across different objects within the same period to see how different things are growing.

User behavior analysis commonly uses the following ratio indicators (or one of them):
- Time-related indicators, "XXX speed": for example, the growth rate of new users (the number of new users per unit of time).
- Quantity-related proportions, "XXX rate / ratio": for example, the active user ratio (the proportion of active users among all users).
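As a minimal illustration, the two ratio types and the period-over-period self-comparison can be computed as follows. The function names and sample figures are hypothetical and only show the arithmetic.

```python
def new_user_growth_rate(new_users: int, days: int) -> float:
    """'XXX speed' ratio: new users gained per unit of time (here, per day)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return new_users / days


def active_user_ratio(active_users: int, total_users: int) -> float:
    """'XXX rate / ratio': share of active users among all users."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return active_users / total_users


def period_over_period(current: float, previous: float) -> float:
    """Self-comparison across periods: relative change versus the previous period."""
    if previous == 0:
        raise ValueError("previous period value must be non-zero")
    return (current - previous) / previous


if __name__ == "__main__":
    print(new_user_growth_rate(1400, 7))   # 200.0 new users per day
    print(active_user_ratio(3000, 12000))  # 0.25
    print(period_over_period(2, 1))        # 1.0, i.e. +100% from one extra user
```

The last call illustrates the cardinality trap discussed next. Going from one registered user to two is 100% growth, but that number means little on its own.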
(2) Multi-indicator joint analysis
Without the base number, a ratio loses much of its meaning. To quote an example from the book Lean Analytics: "When your product first launches, strictly speaking, if your dad signs up for an account, you've doubled your user base."
Note: ratios are the best data indicators, but they must still be supplemented by other qualitative and quantitative indicators. No indicator stands alone, so evaluate indicators together. This requires understanding how indicators are coupled, designing an indicator system, and uncovering the facts hidden in the data from several angles.
2.3 Combining indicators with application scenarios
When analyzing data and designing key indicators, ground them in the actual business and in specific scenarios. For example, when analyzing user behavior, user stickiness (Stickiness) toward a service or product is an effective indicator. It is measured differently in different scenarios:
Stickiness is the degree to which a customer who has bought a product or service is willing to buy again or recommend it to others.

Bounce rate (Bounce Rate): for content products, the percentage of visitors who land on your web page and leave immediately.
Conversion rate (Conversion Rate): when a product is being promoted, the ratio of the number of registrations to the number of visitors. It shows how many visitors actually become registered users of the website.
Retention rate (Retention Rate): of the new users who start using the product within a given period, the proportion still using it after a certain amount of time.
Activity level is also an effective measure. For social networking sites, the proportion of daily active users (Daily Active Users, DAU) is a key indicator.
2.4 A typical case

Case: At a launch event for a cloud product, the organizer gave a comprehensive demonstration and explanation of a major product update. The goal was to attract more users through offline interaction and an online live stream.
(1) Define the analysis goal
If the goal is to evaluate the benefits of the event, what would you do as a data analyst?
Approach: this scenario calls for cohort analysis. Split the participants by registration status into unregistered people (potential users) and registered people (old users).
After the event, some unregistered participants become new users. Various incentives can be used to get them to register, and the most common is a free trial. You can then track the product usage of users from each source.
Old users are fans of the product and paying users who bring revenue to the company. Paying users may churn. They may also come to rely on the product more and use it more often.
Note: when analyzing user behavior, keep in mind that some users bring no direct revenue but attract other users, and so create revenue for the company indirectly.
(2) Design the analysis indicators
Approach: start from human behavior and design indicators from two angles, quantitative data and qualitative data. To analyze the benefits of the event, the following key indicators can be developed:
[1] Participation (Engagement): how many people took part in the event, used to evaluate the event's reach
The total number of participants, the number of old users, and the number of potential users.
How many users registered (free or trial) after attending the event.
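The participation counts above can be sketched as follows. The sketch assumes a hypothetical attendee record that flags whether each person was registered before the event, and whether a previously unregistered person signed up (free or trial) afterwards. Attendees are de-duplicated by ID because the same person may attend offline and also watch the live stream.

```python
from dataclasses import dataclass


@dataclass
class Attendee:
    user_id: str
    registered_before_event: bool         # True -> old user, False -> potential user
    registered_after_event: bool = False  # sign-up (free or trial) after the event


def participation_summary(attendees: list[Attendee]) -> dict[str, int]:
    # Count each person once even if they appear in several attendance records.
    unique = {a.user_id: a for a in attendees}.values()
    old = [a for a in unique if a.registered_before_event]
    potential = [a for a in unique if not a.registered_before_event]
    signed_up = [a for a in potential if a.registered_after_event]
    return {
        "total": len(old) + len(potential),
        "old_users": len(old),
        "potential_users": len(potential),
        "registered_after_event": len(signed_up),
    }
```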
[2] Behavior of new users: the conversion rate evaluates the benefit of the event, and the retention rate evaluates how sticky new users are
Attendees who were not registered are potential users of the product. Whatever the promotion method, whether a "free trial" or "buy and get extra", a user who registers can be counted as a completed conversion.
Conversion rate of new users (Conversion Rate): the proportion of potential users who registered after attending the event. The higher the conversion rate, the more attractive the product is to users and the greater the benefit of the event.
Sources of new users: analyzing where new users come from and how they convert helps determine which sources bring more effective conversions.
Conversion path of new users (conversion funnel): for new users who registered through a free trial (Free Trial), the path indicators are New Trial, Activated Trial (active the next day), and Activated Trial in 7 days (active within 7 days).
Retention rate of new users: the proportion of the same cohort of new users that is still active over consecutive billing cycles.
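The new-user indicators above reduce to a few ratios. The sketch below assumes the funnel stage counts and the sets of user IDs are already available. The stage names follow the text; the function names and numbers are hypothetical.

```python
def conversion_rate(potential_attendees: int, registered: int) -> float:
    """Share of unregistered attendees who registered after the event."""
    return registered / potential_attendees if potential_attendees else 0.0


def funnel(stages: list[tuple[str, int]]) -> list[tuple[str, float, float]]:
    """For each stage: (name, rate vs. previous stage, rate vs. first stage)."""
    if not stages or stages[0][1] == 0:
        raise ValueError("the first stage must have a positive count")
    top = stages[0][1]
    rows, prev = [], top
    for name, count in stages:
        rows.append((name, count / prev if prev else 0.0, count / top))
        prev = count
    return rows


def retention_rate(cohort: set[str], active_in_cycle: set[str]) -> float:
    """Share of a new-user cohort that is still active in a later billing cycle."""
    return len(cohort & active_in_cycle) / len(cohort) if cohort else 0.0


if __name__ == "__main__":
    for row in funnel([("New Trial", 500), ("Activated Trial", 300),
                       ("Activated Trial in 7 days", 150)]):
        print(row)
```

Reporting both the step-to-step rate and the rate relative to the top of the funnel shows where users drop off and how many survive overall.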
[3] Behavior of old users: loyalty and usage
Old users are users who had already registered before the event, and they are loyal fans of the product. Analyzing their behavior is another way to evaluate how effective the event was.
Loyalty refers to how often users use the product and how much they depend on it. Suppose the event leads old users to use the product more often, so that their consumption increases. Then the event has improved their loyalty and increased revenue.
Change in old users' consumption: after the event, whether old users' consumption rose or fell compared with the previous billing cycle.
Proportion of users with increased consumption: the proportion of old users whose consumption increased after the event.
Also consider some special cases, such as how many old users were won back and how many were lost:
Zombie account reactivation (New billed Customers): how many old users who had left started paying for the product or service again.
Loss of paying users (Churned Customers): users who paid in one billing cycle but no longer pay for the product or service.
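A minimal sketch of the old-user indicators follows. It assumes hypothetical inputs that map each old user's ID to the amount billed in the previous and in the current billing cycle; users who registered at or after the event should be filtered out first. The sketch simplifies the definitions above. A user who paid nothing last cycle but paid this cycle counts as a reactivated account. A user who paid last cycle but nothing this cycle counts as churned.

```python
def old_user_billing(prev: dict[str, float], curr: dict[str, float]) -> dict:
    """prev/curr: old user_id -> amount billed in the previous/current billing cycle."""
    users = set(prev) | set(curr)
    paid_prev = {u for u in users if prev.get(u, 0) > 0}
    paid_curr = {u for u in users if curr.get(u, 0) > 0}
    increased = {u for u in users if curr.get(u, 0) > prev.get(u, 0)}
    return {
        # change in total consumption versus the previous billing cycle
        "consumption_change": sum(curr.values()) - sum(prev.values()),
        # proportion of old users whose consumption went up
        "share_increased": len(increased) / len(users) if users else 0.0,
        # accounts that started paying again (simplified "New billed Customers")
        "reactivated": sorted(paid_curr - paid_prev),
        # paying users who stopped paying (simplified "Churned Customers")
        "churned": sorted(paid_prev - paid_curr),
    }
```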
[4] Product usage: by service type
Product usage is another effective indicator of the event's benefit. When users use the product, they are in effect consuming the services it provides. Adding the "service type" dimension lets you assess whether an increase in consumption is related to the event: compare total usage with usage related to the event's theme.
If theme-related usage rises sharply while total usage grows relatively slowly, the event has driven consumption growth.
If theme-related usage grows slowly and total usage is also flat, the event has had no effect on consumption.
Note: comparisons must be like for like. When preparing the comparison data, exclude the effect of new users. Use the event date as the dividing point and compare only old users' usage in the N (1 to 3) billing cycles before the event with their usage in the N (1 to 3) billing cycles after it. To measure usage more accurately and avoid the influence of one-off factors, use average usage over 2 to 3 billing cycles.
Consumption growth analysis for old users:
- Proportion of old users whose consumption increased
- How much old users' consumption increased after the event, and the growth rate
- Average usage per user
Consumption growth analysis related to the event theme:
- Proportion of old users whose consumption increased (theme-related)
- How much old users' consumption increased after the event, and the growth rate (theme-related)
- Average usage per user (theme-related)
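The comparison rules above can be sketched as follows. The sketch assumes hypothetical per-user usage lists that cover the N billing cycles before and after the event, for old users only. Averaging over each window smooths out one-off spikes. Running the same function once on total usage and once on theme-related usage produces the two sets of indicators to compare.

```python
from statistics import mean


def usage_growth(before: dict[str, list[float]],
                 after: dict[str, list[float]]) -> dict[str, float]:
    """before/after: old user_id -> usage in each of the N cycles before/after the event."""
    users = sorted(before.keys() & after.keys())  # old users observed in both windows
    if not users:
        raise ValueError("no old users present in both windows")
    avg_before = {u: mean(before[u]) for u in users}
    avg_after = {u: mean(after[u]) for u in users}
    total_before = sum(avg_before.values())
    total_after = sum(avg_after.values())
    return {
        "share_increased": sum(avg_after[u] > avg_before[u] for u in users) / len(users),
        "increase": total_after - total_before,
        "growth_ratio": (total_after - total_before) / total_before if total_before else float("inf"),
        "avg_per_user_after": total_after / len(users),
    }


if __name__ == "__main__":
    total = usage_growth({"a": [10, 12], "b": [20, 20]}, {"a": [15, 15], "b": [18, 22]})
    theme = usage_growth({"a": [2, 2], "b": [1, 3]}, {"a": [6, 6], "b": [5, 5]})
    # Theme-related usage grows much faster than total usage, which suggests the event helped.
    print(round(total["growth_ratio"], 3), round(theme["growth_ratio"], 3))
```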
(3) Adjust as needed
Data analysis must be tied to the business. Set key indicators according to the main problems you face, so that they answer the questions that come up in decision-making. For example:
- When the product is not yet well known, website traffic can serve as the key indicator.
- Once the site attracts a large audience, the number of registrations (or trial users) can serve as the key indicator.
- When registrations have grown to a certain level, the conversion rate (that is, the rate at which free users become paying users) can serve as the key indicator.
2.5 Common data indicators for website analysis
Below are the data indicators commonly used in website analysis. Can you tell which of them are vanity indicators?

(1) Website traffic indicators
Page views (Page View, PV): a page view is recorded every time a visitor opens a page.
Unique visitors (Unique Visitors, UV): the number of distinct visitors to the website in a day.
Repeat visitors (Repeat Visitors, RV): the number of visitors who visit the website more than once in a day.
Page views per visitor (Page Views per User): the average number of pages viewed by each visitor. A high value indicates high user stickiness, meaning visitors are interested in the website and willing to stay longer.
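These traffic indicators can be computed from one day of page-view records. A minimal sketch follows. It assumes a hypothetical log format of (visitor_id, session_id, url) and treats a "repeat visitor" as someone with more than one visit (session) that day.

```python
from collections import defaultdict


def daily_traffic(page_views: list[tuple[str, str, str]]) -> dict[str, float]:
    """page_views: one day of (visitor_id, session_id, url) records, one per page opened."""
    sessions_by_visitor: dict[str, set[str]] = defaultdict(set)
    for visitor_id, session_id, _url in page_views:
        sessions_by_visitor[visitor_id].add(session_id)
    pv = len(page_views)                  # every page opened counts once
    uv = len(sessions_by_visitor)         # distinct visitors that day
    rv = sum(1 for s in sessions_by_visitor.values() if len(s) > 1)  # visited more than once
    return {"PV": pv, "UV": uv, "RV": rv, "pages_per_user": pv / uv if uv else 0.0}
```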
(2) User behavior indicators
Bounce rate: the share of visits in which the user views only one page and then leaves the website. It reflects visitors' interest in the site: the lower the bounce rate, the more interested visitors are.
Average visit duration: the average length of each visit. The larger it is, the longer visitors stay on the site and the more interested they are in it.
Average pages per visit: the average number of pages viewed per visit. The larger it is, the more interested visitors are in the website.
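A minimal sketch of the three session-level indicators follows. It assumes a hypothetical mapping from each visit (session) to the timestamps of the pages viewed in it. Visit duration is taken as last page view minus first page view, a convention of this sketch that gives single-page visits a duration of 0 seconds.

```python
def session_metrics(sessions: dict[str, list[float]]) -> dict[str, float]:
    """sessions: session_id -> timestamps (in seconds) of the pages viewed in that visit."""
    n = len(sessions)
    if n == 0:
        raise ValueError("no sessions")
    bounces = sum(1 for hits in sessions.values() if len(hits) == 1)
    # Duration = last page view minus first; single-page visits therefore count as 0 s.
    durations = [max(hits) - min(hits) for hits in sessions.values()]
    pages = [len(hits) for hits in sessions.values()]
    return {
        "bounce_rate": bounces / n,
        "avg_visit_seconds": sum(durations) / n,
        "avg_pages_per_visit": sum(pages) / n,
    }
```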
(3) Conversion indicators
Number of conversions: the number of times visitors reach the conversion target page. A conversion means the visitor has done what the site owner wants, in line with the promotion goals of the website operator.
Conversion rate: how efficiently visits convert. Conversion rate = number of conversions / number of visits. The higher the conversion rate, the better the website's promotion is working.
Conversion path: the series of intermediate pages a visitor passes through before reaching the target page you set. What you want from a prospect is a completed conversion, which is closely tied to your promotion goals and to how you define promotion effectiveness. Tracking the conversion path shows how visitors progress through each step of the conversion.
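Conversion rate and path tracking can be sketched as follows. The sketch assumes each visit is recorded as an ordered list of page URLs, and it uses hypothetical page names. A visit counts as having reached a step only if it passed through the earlier steps in order, and each visit counts at most one conversion. Both rules are assumptions of this sketch, not part of the definitions above.

```python
def conversion_report(visits: list[list[str]], path: list[str]) -> dict:
    """visits: page sequence of each visit; path: intermediate pages ending at the target page."""
    reached = [0] * len(path)  # number of visits that got at least as far as each step
    for pages in visits:
        step = 0
        for page in pages:
            if step < len(path) and page == path[step]:
                step += 1
        for i in range(step):
            reached[i] += 1
    conversions = reached[-1] if path else 0
    return {
        "conversions": conversions,
        "conversion_rate": conversions / len(visits) if visits else 0.0,
        "reached_per_step": dict(zip(path, reached)),
    }


if __name__ == "__main__":
    report = conversion_report(
        [["/landing", "/pricing", "/signup"], ["/landing", "/blog"],
         ["/landing", "/pricing"], ["/home"]],
        ["/landing", "/pricing", "/signup"],  # hypothetical path to a sign-up target page
    )
    print(report)
```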
(4) Retention indicators
Number of retained users: of the users who registered on their first visit, the number who continue to use the website in the next period.
Retention rate (Retention Rate): the number of users who continue to use the website in the next period, as a proportion of the users who registered in the current period. The higher the retention rate, the stickier users are.
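A minimal sketch of next-period retention follows. It assumes periods are numbered with consecutive integers (for example, week numbers) and that the sets of newly registered and active users per period are already known. The input names are hypothetical.

```python
def next_period_retention(new_users: dict[int, set[str]],
                          active: dict[int, set[str]]) -> dict[int, tuple[int, float]]:
    """new_users: period -> users who registered in that period;
    active: period -> users who used the site in that period.
    Returns period -> (retained count, retention rate) measured in the following period."""
    result = {}
    for period, cohort in sorted(new_users.items()):
        if not cohort or period + 1 not in active:
            continue  # empty cohort, or the next period has not been observed yet
        retained = cohort & active[period + 1]
        result[period] = (len(retained), len(retained) / len(cohort))
    return result
```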
(5) Source analysis
Source: how visitors reach the website, for example by direct access or through search engines.
Search term analysis: which search terms, on which search engines, led visitors to find and visit the website.
(6) Visitor attribute analysis
Visitor age
Visitor region
Returning visitors / new visitors
3. Exploratory data analysis
Data analysis rests on understanding the data: first comes the data, then the analysis. Without credible data, the results of an analysis are castles in the air. Data analysis works with variables and data. Variables describe the characteristics of things, and data are the specific values of variables (also called observations).

3.1 Variables
A variable describes a characteristic of the members of a population. In data analysis, variable (Variable) is used interchangeably with attribute, dimension (Dimension), and feature (Feature). Common variables include gender, age, height, and income.
Based on how their values are used, variables can be divided into qualitative and quantitative variables.
(1) Qualitative variables
Qualitative variables describe the characteristics of objects with text. Qualitative data usually adds an angle to the analysis: as an extra dimension, it lets you look at things from a different perspective, segment indicators, and deepen the analysis. Qualitative variables fall into three main categories:
Nominal variables: also called categorical variables, used to assign data objects to categories (Category). Examples include hair color and occupation.
Binary variables: variables with only two categories. If both states have the same value or carry the same weight, the binary variable is symmetric, for example gender. If the outcomes of the two states are not equally important, it is asymmetric. For example, when assessing the effect of a treatment, whether a patient smokes is asymmetric because the two states carry different weight.
Ordinal variables: variables whose order is meaningful, usually used for ratings. Ordinal variables are usually qualitative text, such as official rank or customer satisfaction. However, they can also be derived by dividing a numeric variable into intervals, such as age groups.
- Time variables are an important class of ordinal variables. Common analysis methods, such as time-series analysis and periodicity analysis, are built on time variables.
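One way to represent these variable types in plain Python is sketched below. The satisfaction scale and age cut-offs are hypothetical. The main point is that ordinal values carry a rank, so they can be sorted, while nominal values can only be compared for equality.

```python
from bisect import bisect_right
from datetime import date

# Nominal: unordered categories; only equality comparisons are meaningful.
HAIR_COLORS = {"black", "brown", "blond", "red"}

# Binary: two states, naturally stored as a bool (e.g. is_smoker = True).

# Ordinal: order matters, so keep the categories in rank order (hypothetical scale).
SATISFACTION = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]


def satisfaction_rank(level: str) -> int:
    """Position of a satisfaction level on the ordered scale (0 = lowest)."""
    return SATISFACTION.index(level)


# Ordinal derived from a numeric variable: bin ages into ordered age groups.
AGE_EDGES = [18, 30, 45, 60]
AGE_GROUPS = ["<18", "18-29", "30-44", "45-59", "60+"]


def age_group(age: int) -> str:
    """Map an age to its group; an age equal to an edge falls into the higher group."""
    return AGE_GROUPS[bisect_right(AGE_EDGES, age)]


# Time variable: dates are ordered and can be grouped into periods (here, months).
def month_key(d: date) -> tuple[int, int]:
    return (d.year, d.month)
```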
(2) Quantitative variables
Quantitative variables describe objects with numeric values. Their magnitudes can be compared, and they can be quantified. Quantitative variables usually carry units: height is measured in cm and salary in yuan. Values with the same unit can be compared directly. Values with different units become comparable only after normalization removes the units. Quantitative variables are measured on two main scales:
Interval scale: measurable values expressed as integers or real numbers, such as age or salary.
Ratio scale: values expressed as proportions, such as speed or retention rate.
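The normalization step mentioned above can be done in several ways. Min-max scaling and z-scores, sketched below, are two common choices. The text does not prescribe either, so treat this as one possible approach.

```python
from statistics import mean, pstdev


def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; the result has no unit, so cm and yuan become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: list[float]) -> list[float]:
    """Express each value as a number of (population) standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(min_max([150, 165, 180]))      # heights in cm -> [0.0, 0.5, 1.0]
    print(min_max([3000, 6000, 12000]))  # salaries in yuan, now on the same 0-1 scale
```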
3.2 Data
Data are the specific values of variables.
By variable type, data can be divided into categorical data, ordinal data, and numerical data.
By the purpose of the analysis, data can be divided into a treatment group (Treatment) and a control group (Control).
By data type, data can be divided into text data, numerical data, and date-time data.
3.3 Basic statistical description of data
Statistics is a powerful aid to data analysis. The basic statistical description of a dataset helps us understand the data as a whole and identify its distribution. Because quantitative data can be computed on directly, statistical descriptions of distributions usually apply to quantitative data.
A basic statistical description characterizes data by its central tendency, dispersion, and distribution. Each of these is measured with specific statistics.
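As a minimal sketch, Python's standard statistics module covers the three aspects. Mean, median, and mode measure central tendency. Range, standard deviation, and interquartile range measure dispersion. Quartiles and a mean-versus-median comparison give a rough view of the distribution's shape. The sample values are hypothetical and include one extreme value to show how the mean and median diverge under right skew.

```python
from statistics import mean, median, mode, pstdev, quantiles


def describe(values: list[float]) -> dict:
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4)  # quartiles, default 'exclusive' method
    avg, mid = mean(data), median(data)
    return {
        # central tendency
        "mean": avg, "median": mid, "mode": mode(data),
        # dispersion
        "range": data[-1] - data[0], "std": pstdev(data), "iqr": q3 - q1,
        # distribution shape
        "quartiles": (q1, q2, q3),
        "skew_hint": "right" if avg > mid else "left" if avg < mid else "symmetric",
    }


if __name__ == "__main__":
    print(describe([2, 3, 3, 5, 7, 8, 9, 10, 12, 41]))
```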
Data and code download
The code for this tutorial series can be downloaded from ShowMeAI's GitHub repository and run in a local Python environment. Readers who can access Google services can also run it interactively with one click in Google Colab.
The quick-reference sheets used in this tutorial series can be downloaded at the following address:
Recommended ShowMeAI articles
- Introduction to data analysis
- Data analysis thinking
- Mathematical foundations of data analysis
- Business cognition and data exploration
- Data cleaning and preprocessing
- Business analysis and data mining
- Data analysis tool map
- Introduction to NumPy, the library for statistics and scientific computing
- NumPy and 1-D array operations
- NumPy and 2-D array operations
- NumPy and high-dimensional array operations
- Introduction to Pandas, the data analysis library
- Illustrated Pandas: a complete collection of core functions
- Illustrated Pandas: advanced data transformation functions
- Pandas data grouping and operations
- Principles and methods of data visualization
- Data visualization with Pandas
- Seaborn and data visualization
Recommended ShowMeAI tutorial series
- Illustrated Python Programming: From Beginner to Master
- Illustrated Data Analysis: From Beginner to Master
- Illustrated Mathematical Foundations of AI: From Beginner to Master
- Illustrated Big Data Technology: From Beginner to Master