[Practical guide] 10 statistical methods commonly used in data analysis, with key application scenarios
2022-06-30 19:12:00 【Little fire dragon said data】
Estimated reading time: 6 min
Reading suggestion: This article is a practical summary meant for long-term reference, so consider bookmarking it before you read.
Pain point addressed: I recently received private messages from readers, and one of the most frequent questions was: "What statistics do I need to learn to do data analysis?" So here I share some statistical methods that I use widely in my own work.
00
Preface
In terms of origins, data analysis is an interdisciplinary field between 「statistics」 and 「computer science」 that puts statistical knowledge to use. In terms of day-to-day work, applying statistics lets you measure the value of data to the business more rigorously.
So anyone working in data analysis needs a grasp of basic statistics. Below, Little Fire Dragon summarizes 「10 statistical methods commonly used at work」.
01
Descriptive statistics 「Frequency of use: 5 stars」
Descriptive statistics uses summary measures and charts to describe business data and its distribution. It is the most commonly used method at work.
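As a minimal sketch of the summary measures mentioned above, the snippet below computes a few common descriptive statistics with Python's standard `statistics` module. The `daily_orders` data and the `describe` helper are hypothetical and only serve as an illustration.

```python
import statistics

# Hypothetical daily order counts for two weeks (illustrative data only).
daily_orders = [120, 135, 128, 150, 160, 90, 85, 125, 140, 132, 155, 170, 95, 88]


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


summary = describe(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In a weekly or monthly report, a table like this is usually paired with a chart (histogram or box plot) so that the shape of the distribution is visible as well.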
[Figure: contents of descriptive statistics]
[Figure: main application scenarios of descriptive statistics]
【Reference articles for the application scenarios; click the blue text】
- Weekly / monthly reports: 「How to produce a high-quality weekly / monthly data report」
02
Hypothesis testing 「Frequency of use: 5 stars」
Hypothesis testing is used to judge whether a difference between two samples, or between a sample and the population, is due to sampling error or reflects a real difference. It mainly covers 「parametric tests」 and 「nonparametric tests」, defined as follows:
Parametric test: assumes the data follow a certain distribution (usually the normal distribution) and tests the population parameters using estimates computed from the sample.
Nonparametric test: makes no assumption about the form of the population distribution and tests the distribution of the data directly.
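To make the parametric case concrete, here is a minimal sketch of a two-sided two-proportion z-test, a test often used to read A/B experiments. The function name and the conversion counts are hypothetical, and the normal approximation it relies on is only reasonable for large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 200/2000 conversions in control, 250/2000 in treatment.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A small p-value suggests the gap between the groups is unlikely to be sampling error alone. The 0.05 threshold is a convention, not part of the method.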
[Figure: contents of hypothesis testing]
[Figure: main application scenarios of hypothesis testing]
【Reference articles for the application scenarios; click the blue text】
- Anomaly analysis: 「Three ways to quickly locate abnormal dimensions」
- Causal analysis: 「Best-practice process for A/B experiments」
03
Contingency table analysis 「Frequency of use: 3 stars」
Contingency table analysis is used to judge whether discrete (categorical) variables are significantly associated, for example whether performance grade is associated with gender.
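The performance-grade example can be sketched with Pearson's chi-square test of independence, one common way to analyze a contingency table. The counts below are made up, and `chi_square_statistic` is a hypothetical helper written for this illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = gender, columns = performance grade (A, B, C).
observed = [
    [30, 50, 20],
    [20, 50, 30],
]
stat, dof = chi_square_statistic(observed)

# 5.991 is the tabulated chi-square critical value for dof = 2 at alpha = 0.05.
CRITICAL_VALUE_DOF2 = 5.991
print(f"chi-square = {stat:.2f}, dof = {dof}")
print("associated" if stat > CRITICAL_VALUE_DOF2 else "no significant association")
```

With these counts the statistic is 4.0 on 2 degrees of freedom, which is below the critical value, so this made-up sample would not show a significant association.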
[Figure: contents of contingency table analysis]
[Figure: main application scenarios of contingency table analysis]
04
Correlation analysis 「Frequency of use: 4 stars」
Correlation analysis is used to judge whether phenomena are related and how strongly, for example whether the correlation is positive or negative. It is used frequently in exploratory analysis.
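One common measure of correlation is the Pearson coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The sketch below computes it by hand. The `ad_spend` and `new_users` series are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: daily ad spend vs. number of new users.
ad_spend = [10, 20, 30, 40, 50]
new_users = [120, 190, 310, 390, 520]

r = pearson_r(ad_spend, new_users)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 only says that the two series move together. It does not say that one causes the other.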
[Figure: contents of correlation analysis]
[Figure: main application scenarios of correlation analysis]
【Reference articles for the application scenarios; click the blue text】
- User growth: 「My understanding of user growth『New users』」
05
Analysis of variance 「Frequency of use: 2 stars」
Analysis of variance (ANOVA, which is based on the F-test) is a significance test for differences between the means of two or more samples.
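As a minimal sketch, the F statistic of a one-way ANOVA is the ratio of between-group variance to within-group variance. The three groups below are hypothetical (for example, session minutes under three page designs), and `one_way_anova` is an illustrative helper, not a library function.

```python
def one_way_anova(*groups):
    """Return the F statistic and its two degrees of freedom."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three page designs.
design_a = [4, 5, 6]
design_b = [6, 7, 8]
design_c = [8, 9, 10]

f_stat, df_between, df_within = one_way_anova(design_a, design_b, design_c)
# 5.14 is the tabulated critical value of F(2, 6) at alpha = 0.05.
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

Here F is 12.0, above the 5.14 critical value, so at least one group mean differs in this toy data. ANOVA alone does not say which one.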
[Figure: contents of analysis of variance]
06
Regression analysis 「Frequency of use: 5 stars」
Regression analysis is used to fit day-to-day metrics and to predict future trends. It is widely used at work.
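The simplest case, a straight line fitted by ordinary least squares, can be sketched as follows. The daily-active-user numbers are hypothetical, and this is plain linear regression rather than a dedicated time series model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical metric: daily active users over one week.
days = [1, 2, 3, 4, 5, 6, 7]
dau = [1000, 1040, 1090, 1120, 1170, 1200, 1250]

slope, intercept = fit_line(days, dau)
forecast_day_8 = intercept + slope * 8
print(f"slope = {slope:.2f} users/day, intercept = {intercept:.2f}")
print(f"forecast for day 8 = {forecast_day_8:.0f}")
```

Extrapolating a straight line is only sensible over short horizons. Metrics with seasonality or holidays usually need a model that accounts for them.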
[Figure: contents of regression analysis]
[Figure: main application scenarios of regression analysis]
【Reference articles for the application scenarios; click the blue text】
- Metric forecasting: 「Time series forecasting tool Prophet『Theory』」, 「Time series forecasting tool Prophet『Implementation』」
07
Cluster analysis 「Frequency of use: 4 stars」
Cluster analysis is used to divide users or content into categories without prior guidance, that is, without predefined labels.
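One widely used clustering method is k-means. The sketch below is a bare-bones version of it (Lloyd's algorithm) with caller-supplied starting centroids. The user features are hypothetical, and in practice features on different scales are usually standardized first.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical users: (orders per month, average order value).
users = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]
centroids, labels = k_means(users, centroids=[users[0], users[3]])
print(labels)
print(centroids)
```

No labels are given up front: the two groups (light and heavy buyers in this toy data) come out of the distances alone, and naming them is left to the analyst.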
[Figure: contents of cluster analysis]
[Figure: main application scenarios of cluster analysis]
【Reference articles for the application scenarios; click the blue text】
- Cluster analysis: 「On the application of cluster analysis at work」
08
Discriminant analysis 「Frequency of use: 4 stars」
Discriminant analysis (a classification problem) assigns a category based on the characteristics of the object under study. It differs from cluster analysis in that clustering starts without knowing how many categories exist or what they are, whereas discriminant analysis starts from known categories and decides which group a new sample belongs to.
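A classic form of discriminant analysis is Fisher's linear discriminant for two classes. The sketch below implements it for exactly two features, assuming equal class priors and using the closed-form inverse of a 2x2 matrix. The churn data and all function names are hypothetical.

```python
def mean_vector(rows):
    """Per-feature mean of a list of (x, y) rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter(rows, mean):
    """Entries (sxx, sxy, syy) of the 2x2 within-class scatter matrix."""
    sxx = sum((r[0] - mean[0]) ** 2 for r in rows)
    syy = sum((r[1] - mean[1]) ** 2 for r in rows)
    sxy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows)
    return sxx, sxy, syy


def fit_fisher_lda(class0, class1):
    """Return the projection vector w and the decision threshold."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled scatter [[a, b], [b, c]]
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = S^-1 (m1 - m0), with the 2x2 inverse written out explicitly.
    w = ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)
    # Threshold: projection of the midpoint between the two class means.
    threshold = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, threshold


def predict(model, point):
    """Return 1 if the point projects beyond the threshold, else 0."""
    w, threshold = model
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


# Hypothetical users: (days since last login, sessions last month).
retained = [(1, 20), (2, 18), (3, 22), (2, 25)]  # class 0
churned = [(20, 2), (25, 1), (18, 4), (22, 3)]   # class 1

model = fit_fisher_lda(retained, churned)
print(predict(model, (21, 3)))  # a new user who resembles the churned group
print(predict(model, (2, 20)))  # a new user who resembles the retained group
```

Unlike the clustering case, the two categories are known in advance here. The model only learns where the boundary between them lies and then places new samples on one side of it.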
[Figure: contents of discriminant analysis]
[Figure: main application scenarios of discriminant analysis]
【Reference articles for the application scenarios; click the blue text】
- User growth: 「My understanding of user growth『Churn warning』」
09
Principal component analysis 「Frequency of use: 2 stars」
Principal component analysis (PCA) transforms a set of possibly correlated variables into a set of linearly uncorrelated variables. The transformed variables are called principal components.
The most important use of PCA is 「dimensionality reduction」, and it can also be used to 「explore relationships between variables」. Briefly: when modeling, many variables are often selected as features, and these variables are often correlated with each other, which causes the 「multicollinearity problem」. So we need a way to transform them into variables that are mutually uncorrelated while retaining as much of the original information as possible. PCA is one such method: it transforms the original variables into a few new, mutually uncorrelated variables.
Quick primer: the multicollinearity problem
When independent variables (features) are correlated with each other, model estimates become distorted or unstable. For example, random forest feature importances can differ considerably from one run to the next.
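To show the idea on the smallest possible case, the sketch below runs PCA on two correlated features using the closed-form eigen-decomposition of their 2x2 covariance matrix. The `page_views`/`clicks` data and the `pca_2d` helper are hypothetical. Real projects would normally use a numerical library that handles any number of features.

```python
from math import sqrt


def pca_2d(rows):
    """First principal component of two-feature data.

    Returns (direction, scores, explained_ratio).
    """
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    centered = [(x - mean_x, y - mean_y) for x, y in rows]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread

    # Eigenvector belonging to the larger eigenvalue.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = sqrt(vx * vx + vy * vy)
    direction = (vx / norm, vy / norm)

    # Project each centered point onto the first component.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores, lam1 / (lam1 + lam2)


# Hypothetical, strongly correlated features: (page_views, clicks).
data = [(2.0, 1.1), (4.0, 1.9), (6.0, 3.2), (8.0, 3.8)]
direction, scores, explained = pca_2d(data)
print(f"direction = ({direction[0]:.3f}, {direction[1]:.3f})")
print(f"variance explained by the first component = {explained:.3f}")
```

Because the two features carry almost the same information, a single component keeps nearly all of the variance, which is the sense in which PCA reduces dimensions and sidesteps multicollinearity.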
10
Factor analysis 「Frequency of use: 2 stars」
Factor analysis serves a similar purpose to principal component analysis: 「dimensionality reduction」. The idea is to look for latent factors behind multiple independent variables, group similar variables under one factor, and replace the original variables with the factors.
Similarity to principal component analysis: both sort out the internal relationships among the original independent variables.
Differences from principal component analysis: principal component analysis focuses on aggregating the information in the variables, while factor analysis focuses on explaining it. Principal component analysis is often treated as a special case of factor analysis (for example, as a factor-extraction method), although the two rest on different models.
The 10 statistical methods above are common at work, but there are more, including reliability analysis, survival analysis, multiple response analysis, and distance analysis. These are left for you to explore and apply in your own work.
That is all for this issue.