A one-stop summary: 64 common data analysis terms!
2022-06-25 14:07:00 【Xinyi 2002】
Source: grounding gas school
Hello everyone, I'm Junxin.
In this article, let's go over the common terms used in data analysis.
1、 Absolute and relative numbers
Absolute number: a comprehensive indicator that reflects the total size or overall level of an objective phenomenon at a given time and place. It is one of the most common indicators in data analysis. Examples: GDP, total population.
Relative number: a value calculated from two related indicators. It is a comprehensive indicator that reflects how closely two objective phenomena are related quantitatively. Relative numbers are generally expressed as multiples, percentages, and so on. The formula is:
Relative number = comparison value / base value
2、 Percentages and percentage points
Percentage: a kind of relative number that expresses one number as a share of another, also called a percent ratio. The denominator of a percentage is 100, that is, 1% is used as the unit of measure, so percentages are easy to compare.
Percentage point: the unit for the change, between periods, of a relative indicator that is expressed as a percentage. A change of 1 percentage point is a change of 1% in the indicator's value; for example, a rate that moves from 5% to 6% has risen by 1 percentage point.
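The difference between a change in percentage points and a relative change in percent is easy to mix up. A minimal sketch, assuming both rates are given as percent values; the function names are hypothetical and only for illustration:

```python
def percentage_point_change(old_pct, new_pct):
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_pct(old_pct, new_pct):
    """Relative change of the new percentage over the old one, in percent."""
    return (new_pct - old_pct) * 100 / old_pct


# Hypothetical example: a rate that moves from 5% to 6%.
pp = percentage_point_change(5.0, 6.0)   # change in percentage points
rel = relative_change_pct(5.0, 6.0)      # relative growth in percent
print(pp, rel)
```

The same move is 1 percentage point but a 20% relative increase, which is why reports should say which one they mean.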
3、 Frequency (count) and relative frequency
Frequency (count): the number of times a given value appears in the whole data set.
Relative frequency: the ratio of the number of times an event occurs to the total number of observations. It is usually expressed as a proportion or a percentage.
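As an illustration of the two terms, the sketch below counts values with the standard library's `collections.Counter` and then divides by the total. The `grades` list is made-up sample data.

```python
from collections import Counter

grades = ["A", "B", "A", "C", "A", "B"]  # hypothetical sample data

counts = Counter(grades)                  # frequency: occurrences of each value
total = sum(counts.values())              # total number of observations
relative = {value: n / total for value, n in counts.items()}  # relative frequency

for value in sorted(counts):
    print(value, counts[value], f"{relative[value]:.0%}")
```

The relative frequencies always sum to 1, which makes them comparable across data sets of different sizes.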
4、 Proportion and ratio
Proportion: the share of each part in the total. It usually reflects the composition of the whole, that is, the relationship between part and whole.
Ratio: the comparison between different categories of data within a sample (or population). Because a ratio does not compare a part with the whole, it may be greater than 1.
5、 Multiples and doublings
Multiple: the result of dividing one value by another. Multiples are generally used to describe a rise or growth, and generally not a reduction.
Doubling (fan): the original number multiplied by 2 to the power n. Doubling once gives 2 times the original number, and doubling twice gives 4 times.
6、 Year on year and month on month
Year on year: the ratio obtained by comparing a value with the value for the same period in a previous year. It reflects the relative development of things.
Month on month (period over period, sometimes translated as chain ratio): the value obtained by comparing with the value in the previous statistical period. It mainly reflects the period-by-period development of things.
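Both comparisons are often reported as growth rates. A minimal sketch, assuming the growth-rate form (current minus previous, divided by previous); the sales figures and the helper name `growth_rate` are hypothetical:

```python
def growth_rate(current, previous):
    """Growth of `current` over `previous`, in percent."""
    return (current - previous) / previous * 100


# Hypothetical monthly sales figures keyed by "YYYY-MM".
sales = {"2021-06": 100.0, "2022-05": 120.0, "2022-06": 150.0}

yoy = growth_rate(sales["2022-06"], sales["2021-06"])  # vs. same month last year
mom = growth_rate(sales["2022-06"], sales["2022-05"])  # vs. previous month
print(f"year on year: {yoy:.1f}%, month on month: {mom:.1f}%")
```

The only difference between the two is the choice of base period: the same period one year earlier, or the period immediately before.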
7、 Variable
The concept of a variable comes from mathematics. In a programming language, a variable is an abstraction that can store a calculation result or represent a value. A variable is accessed by its name.
8、 Continuous variable
In statistics, variables are divided into continuous and discrete variables according to whether their values are continuous. A variable that can take any value within an interval is called a continuous variable: its values are continuous, the gap between two adjacent values can be divided without limit, and it can take infinitely many values. Examples: age, weight.
9、 Discrete variable
The values of a discrete variable are separated from one another and are counted in whole units, such as the number of people, the number of factories, or the number of machines. The values of a discrete variable can only be obtained by counting.
10、 Qualitative variable
Also known as a categorical variable. When an observed individual can only belong to one of several mutually exclusive categories, the category is generally expressed with a non-numeric label, and such observations are called qualitative variables. They can be understood as variables that classify, such as education level, gender, and marital status.
11、 Mean
Also called the average. The mean is a measure of the central tendency of a data set: the sum of all values in the set divided by the number of values.
12、 Median
For a finite set of numbers, sort all the observations and take the middle one as the median. If the number of observations is even, the median is usually taken as the average of the two middle values.
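The two measures can differ a lot when the data contain an extreme value. The sketch below uses Python's standard `statistics` module on made-up numbers.

```python
import statistics

values = [1, 2, 3, 4, 100]  # hypothetical data with one extreme value

mean_value = statistics.mean(values)            # sum of values / number of values
median_odd = statistics.median(values)          # middle value after sorting
median_even = statistics.median([1, 2, 3, 4])   # average of the two middle values

print(mean_value, median_odd, median_even)
```

Here the single value 100 pulls the mean far above most of the data, while the median stays at the middle observation.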
13、 Missing value
A value is missing when one or more attributes of a record in the existing data set are incomplete.
14、 Outlier
A measured value in a group whose deviation from the mean exceeds two standard deviations. A value whose deviation from the mean exceeds three standard deviations is called a highly abnormal outlier.
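The two thresholds above translate directly into a simple labelling rule. This is a sketch under stated assumptions: it uses the population standard deviation (`statistics.pstdev`), and the function name `flag_outliers` and the labels are hypothetical.

```python
import statistics


def flag_outliers(values):
    """Label each value by its distance from the mean in standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    labels = []
    for v in values:
        z = abs(v - mean) / std if std else 0.0
        if z > 3:
            labels.append("highly abnormal")
        elif z > 2:
            labels.append("outlier")
        else:
            labels.append("normal")
    return labels


small = [10, 11, 9, 10, 12, 10, 9, 11, 10, 30]  # hypothetical measurements
large = [10] * 19 + [100]                        # hypothetical measurements

print(flag_outliers(small)[-1])
print(flag_outliers(large)[-1])
```

Note that in a small sample a single extreme value also inflates the standard deviation, so it may be unable to exceed the three-standard-deviation threshold at all.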
15、 Variance
A measure of the dispersion of a random variable or a set of data. In probability theory, variance measures how far a random variable deviates from its mathematical expectation (mean). In statistics, the variance (sample variance) is the average of the squared differences between each sample value and the mean of all sample values. In many practical problems, studying the variance, that is, the degree of deviation, is of great significance.
16、 Standard deviation
Also called the root mean square deviation. It is the square root of the arithmetic mean of the squared deviations from the mean, denoted σ. The standard deviation is the arithmetic square root of the variance. It reflects the dispersion of a data set: two data sets with the same mean may have different standard deviations.
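A short sketch of both definitions, computed by hand and cross-checked against the standard `statistics` module. It follows the "average of squared deviations" wording, so it divides by n; `statistics.variance` and `statistics.stdev` divide by n - 1 instead. The data and the helper name are hypothetical.

```python
import math
import statistics


def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


a = [4, 5, 6]  # hypothetical data, mean 5, tightly clustered
b = [1, 5, 9]  # hypothetical data, mean 5, widely spread

var_a, var_b = population_variance(a), population_variance(b)
std_a, std_b = math.sqrt(var_a), math.sqrt(var_b)  # standard deviation = sqrt(variance)

print(f"a: variance={var_a:.3f}, std={std_a:.3f}")
print(f"b: variance={var_b:.3f}, std={std_b:.3f}")
```

Both lists have the same mean, but the second has a much larger standard deviation, which is the point made in the definition.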
17、 Pearson correlation coefficient
The Pearson correlation coefficient is a statistic that reflects the degree of linear correlation between two variables. It is denoted r:
r = Σ(x_i - x̄)(y_i - ȳ) / sqrt( Σ(x_i - x̄)² · Σ(y_i - ȳ)² ), with each sum taken over i = 1, ..., n
where n is the sample size, x_i and y_i are the observed values of the two variables, and x̄ and ȳ are their means. r describes the degree of linear correlation between the two variables: the greater the absolute value of r, the stronger the correlation.
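The standard definition of r (covariance of the two variables divided by the product of their spreads) can be written out in a few lines. A minimal sketch in plain Python; the function name `pearson_r` and the sample data are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


hours = [1, 2, 3, 4]        # hypothetical study hours
scores = [52, 60, 71, 80]   # hypothetical test scores
print(round(pearson_r(hours, scores), 3))
```

The result always lies between -1 and 1; the sketch raises `ZeroDivisionError` if either sequence is constant, since the correlation is undefined in that case.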
18、 PV (Page View)
The total number of times users view the website or a page within a given period. It is usually used to measure the traffic brought by an article or a campaign, and it is an important metric for evaluating a website's daily traffic. PV counts repeats: it is based on each visit to a page, so every refresh by the user is counted again.
19、 UV (Unique Visitor)
The total number of distinct users who come to the website or page. The same user visiting the website at different times is counted as only one unique visitor, with no repeated accumulation. On PC, the count is usually based on the number of cookies.
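The difference between the two metrics is whether repeats are counted. A minimal sketch on a made-up access log, where each entry is one page load identified by a hypothetical visitor ID (for example, a cookie value):

```python
# Hypothetical access log: one (visitor_id, page) tuple per page load.
log = [
    ("u1", "/home"),
    ("u2", "/home"),
    ("u1", "/home"),   # u1 refreshes the page: counts again for PV
    ("u1", "/about"),
    ("u3", "/home"),
]

pv = len(log)                                # every page load counts
uv = len({visitor for visitor, _ in log})    # each visitor counts once

home_pv = sum(1 for _, page in log if page == "/home")
home_uv = len({visitor for visitor, page in log if page == "/home"})

print(pv, uv, home_pv, home_uv)
```

Using a set for UV is what removes the repeated visits by the same visitor.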
20、 Visit
A user arrives at the website through an external link. The whole process, from the moment the user arrives at the website until the user closes the page in the browser, counts as one visit.
21、 Bounce Rate
A user arrives at the website through a link and leaves without any interaction on the current page. This counts as one "bounce" for that page. The bounce rate is generally calculated for a specific page of the website.
Bounce rate = number of users who bounce from this page / PV
22、 Exit rate
Generally calculated for a specific page. After visiting a page of the website, the user closes all pages of this website in the browser. This counts as one "exit" for that page.
Exit rate = number of users who exit from this page / PV
23、 Click
Generally used for paid advertising. It is the number of times users click on a link, page, or banner, and repeats are counted. For example, if I see a news link on PC, click in, read for a while and close it, and then click in and read it again a little later, I have contributed two clicks to that news item.
24、 avr.time (average length of stay)
The average time users stay on a page when they visit it. It is usually used to measure the quality of a page's content.
avr.time = total length of stay of all users / number of visitors
25、 CTR (click-through rate)
The ratio of the number of clicks on an advertisement, banner, or URL to the total number of times it is viewed. It is generally used to assess how well an advertisement drives traffic.
CTR = clicks / number of times users see it (impressions)
26、 Conversion rate
The number of sessions in which the user completes the defined conversion step, as a percentage of the total number of sessions. It is usually used to evaluate the quality of a conversion step: if the conversion rate is low, that step needs to be optimized.
Conversion rate = number of converting sessions / total number of sessions
27、 Funnel
Usually refers to a well-defined sequence of steps that leads to the target conversion. Take shopping on Taobao: from clicking a product link to viewing the details page, then checking customer reviews and collecting merchant coupons, then filling in the address and paying, users may be lost at every step. Businesses therefore need to do well at every conversion step, and the funnel is a tool for evaluating how well each step performs.
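A funnel is usually read as step-to-step conversion rates, which show where users are lost. A minimal sketch with made-up stage names and user counts; `step_conversion` is a hypothetical helper.

```python
# Hypothetical number of users reaching each stage, in order.
funnel = [
    ("click product link", 1000),
    ("view details page", 600),
    ("fill in address", 300),
    ("pay", 150),
]


def step_conversion(stages):
    """Conversion rate from each stage to the next one."""
    rates = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        rates.append((f"{prev_name} -> {name}", n / prev_n))
    return rates


rates = step_conversion(funnel)
overall = funnel[-1][1] / funnel[0][1]  # share of entrants who complete the funnel

for label, rate in rates:
    print(f"{label}: {rate:.0%}")
print(f"overall: {overall:.0%}")
```

The step with the lowest rate is the first candidate for optimization, and the overall rate is the product of the step rates.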
28、 Return on investment (ROI)
Reflects the relationship between input and output. It measures whether an investment is worthwhile and how much value it can bring (not just profit), seen from the perspective of investment or long-term business.
The formula is: ROI = annual profit or average annual profit / total investment × 100%. It is usually used to evaluate the value of an activity to the enterprise; a high ROI indicates that the project is of high value.
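The ROI formula is a one-line calculation. The figures and the function name `roi_percent` below are hypothetical.

```python
def roi_percent(annual_profit, total_investment):
    """ROI = annual (or average annual) profit / total investment x 100%."""
    if total_investment == 0:
        raise ValueError("total investment must be non-zero")
    return annual_profit / total_investment * 100


# Hypothetical campaign: 100,000 invested, 20,000 average annual profit.
campaign_roi = roi_percent(20_000, 100_000)
print(f"ROI: {campaign_roi:.1f}%")
```

A negative result means the activity lost money over the period considered.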
29、 Repeat purchase rate
Refers to how often consumers make repeat purchases on the website.
30、 Churn analysis (Churn Analysis / Attrition Analysis)
Describes which customers are likely to stop using the company's products or services, and identifies which customers would be the most costly to lose. The results of churn analysis are used to prepare new offers for customers who are at risk of leaving.
31、 Customer segmentation and profiling (Customer Segmentation & Profiling)
Based on existing customer data, customers with similar characteristics and behaviors are classified into groups. The groups are then described and compared.
32、 Customer lifetime value (Lifetime Value, LTV)
The expected discounted profit that a customer generates for a company over his or her lifetime as a customer.
33、 Market basket analysis (Market Basket Analysis)
Identifies combinations of goods or services that often occur together in transactions, for example products that are frequently purchased together. The results are used to recommend additional products and to support decisions such as how to display goods.
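The simplest form of this analysis counts how often each pair of items appears in the same transaction. This is only a co-occurrence sketch on made-up baskets, not a full association-rule algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the content of one basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    # Sort so that each pair is always stored in the same order.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: share of all transactions that contain the pair.
support = {pair: n / len(transactions) for pair, n in pair_counts.items()}

for pair, n in pair_counts.most_common(2):
    print(pair, n, f"{support[pair]:.0%}")
```

Pairs with high support are candidates for cross-selling or for being displayed together.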
34、 Real-time decisioning (Real Time Decisioning, RTD)
Helps enterprises make the best sales or marketing decisions in real time (with almost no delay). For example, a real-time decisioning system (scoring system) can use a variety of business rules or models to score and rank customers at the moment they interact with the company.
35、 Retention / customer retention (Retention / Customer Retention)
The percentage of customer relationships that are maintained over the long term after they have been established.
36、 Correlation analysis (Correlation analysis)
A data analysis method used to determine whether variables are positively or negatively correlated.
37、 Survival analysis (Survival Analysis)
Estimates how long a customer will continue to use a service, or how likely the customer is to churn in subsequent periods. This information lets the enterprise assess customer retention for the period being forecast and introduce appropriate loyalty policies.
38、 Algorithm (Algorithms)
A mathematical formula or procedure that carries out a particular kind of data analysis.
39、 Business intelligence (Business Intelligence)
The applications, infrastructure, tools, and processes that analyze data and present information to help executives, managers, and others make better-informed business decisions.
40、 Classification analysis (Classification analysis)
The systematic process of extracting important, relevant information from data. This kind of information is also called metadata (meta data), that is, data that describes data.
41、 Cluster analysis (Clustering analysis)
The process of grouping similar objects together, so that each group of similar objects forms a cluster. The purpose of this method is to analyze the differences and similarities within the data.
42、 Comparative analysis (Comparative analysis)
When matching patterns in a very large data set, the analysis results are obtained through a step-by-step process of comparison and calculation.
43、 Data analysis (Data Analysis)
The process of using appropriate analytical methods and tools, according to the purpose of the analysis, to process and analyze data, extract valuable information, and form valid conclusions.
44、 Data processing (Data Processing)
Processing and organizing the collected data, according to the purpose of the analysis, into a form suitable for data analysis. It is an essential stage before data analysis.
45、 Data mining (Data mining)
Using sophisticated pattern recognition techniques to find meaningful patterns in large amounts of data and to derive insights from them.
46、 Data cleaning (Data cleansing)
The process of re-examining and verifying data in order to remove duplicate information, correct existing errors, and ensure data consistency.
47、 Data quality (Data Quality)
The processes and techniques that ensure the reliability and practical value of data. High-quality data should faithfully reflect the transactions behind it and be fit for its intended use in operations, decision making, and planning.
48、 Data modeling (Data modelling)
Using data modeling techniques to analyze data objects in order to understand the underlying meaning of the data.
49、 Data set (Data set)
A collection of a large amount of data.
50、 Discriminant analysis (Discriminant analysis)
Classifies data: according to the classification, data can be assigned to different groups, categories, or directories. It is a statistical method that analyzes the known information about certain groups or clusters in the data and derives classification rules from it.
51、 Exploratory analysis (Exploratory analysis)
Finding patterns in data without a standard process or method. It is a way to discover the main characteristics of data and data sets.
52、 Machine learning (Machine learning)
A part of artificial intelligence in which machines learn from the tasks they complete and improve themselves through long-term accumulation.
53、 Network analysis (Network analysis)
Analyzes the relationships between nodes in a network or graph, that is, the connections between nodes and the strength of those connections.
54、 Outlier detection (Outlier detection)
An outlier is an object that deviates significantly from the overall average of a data set or a combination of data, and that is very different from the other objects in the data set. The occurrence of an outlier may therefore indicate a problem with the system and calls for additional analysis.
55、 Pattern recognition (Pattern Recognition)
Identifying patterns in data with algorithms, and making predictions for new data from the same data source.
56、 Predictive analysis (Predictive analysis)
One of the most valuable analysis methods in big data. It helps predict an individual's behavior in the (near) future, for example that someone is likely to buy certain goods, visit certain websites, or take certain actions. It identifies risks and opportunities by using a variety of data sets, such as historical data, transaction data, social data, and customers' personal information.
57、 Regression analysis (Regression analysis)
Determines the dependency between two variables. This method assumes a one-way causal relationship between the two variables (translator's note: the independent variable and the dependent variable are not interchangeable).
58、 Routing analysis (Routing analysis)
Finds an optimal route for a given mode of transport by analyzing a number of different variables, in order to reduce fuel costs and improve efficiency.
59、 Sentiment analysis (Sentiment Analysis)
Using algorithms to analyze how people feel about certain topics.
60、 SQL
A programming language used to retrieve data in a relational database.
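As a small illustration of retrieving data with SQL, the sketch below runs an aggregate query against an in-memory SQLite database through Python's standard `sqlite3` module. The `orders` table and its rows are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 20.0), ("alice", 50.0)],
)

# Total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

The same `SELECT ... GROUP BY` pattern underlies many of the metrics earlier in this list, such as PV per page or orders per customer.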
61、 Time series analysis (Time series analysis)
Analyzes well-defined data obtained from repeated measurements over time. The data must be well defined and taken at consecutive points in time with equal intervals.
62、 Text mining (Text Mining)
The analysis of data that contains natural language. Words and phrases in the source data are counted statistically so that the structure of the text can be expressed in mathematical terms, and traditional data mining techniques are then used to analyze that structure.
63、 Visualization (Visualization)
Raw data can only be put to use with the right visualization. "Visualization" here does not mean an ordinary chart or pie chart: it refers to complex charts that contain a large amount of information but can still be read and understood easily.
64、 Dashboard (Dashboard)
Uses algorithms to analyze data and displays the results as charts on a dashboard.