Data standardization processing
2022-06-28 20:15:00 【Burger jingle】
1. Why data should be standardized
In a multi-index evaluation system, the evaluation indexes differ in nature, so they usually have different dimensions (units) and orders of magnitude. When the levels of the indexes differ greatly, using the original values directly in the analysis exaggerates the role of indexes with large values and relatively weakens the role of indexes with small values. Therefore, to ensure reliable results, the original index data must be standardized.
2. What data standardization is
Data standardization scales the data so that it falls into a small, specific interval. It is often used when processing indexes for comparison and evaluation: it removes the units of the data and converts the values into dimensionless pure numbers, which makes it convenient to compare and weight indexes that have different units or scales.
3. Data standardization methods
Common methods are: min-max standardization (min-max normalization), log function conversion, atan function conversion, z-score standardization (zero-mean normalization, the most commonly used method), and fuzzy quantification. This article introduces only the min-max method (standardized method), the z-score method (normalization method), and a further normalization method.
Method 1: Standardized method (min-max method)
Min-max standardization (min-max normalization), also called dispersion standardization, is a linear transformation of the original data that maps the result into the interval [0, 1]. The conversion function is:
x' = (x - min) / (max - min)
where max is the maximum value of the sample data and min is the minimum value of the sample data. One drawback of this method is that adding new data may change max and min, in which case they must be redefined and the data rescaled.

Characteristic: it is a linear transformation of the original data, and the result falls in the interval [0, 1].
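The min-max formula and its drawback can be illustrated with a short sketch. The function name `min_max_scale`, the sample values, and the handling of the case where all values are equal are assumptions made for this example; they are not part of the original article.

```python
def min_max_scale(values):
    """Linearly rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero.
        # Returning 0.0 for every element is an assumption of this sketch.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


scores = [10, 20, 30, 50]  # hypothetical index values
scaled = min_max_scale(scores)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]

# A new, larger observation changes max, so every earlier value is rescaled.
rescaled = min_max_scale(scores + [90])
print(rescaled)  # [0.0, 0.125, 0.25, 0.5, 1.0]
```

In the second call the value 50 drops from 1.0 to 0.5, which is the "max and min must be redefined" drawback described above.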
Method 2: Normalization method (z-score method)
The most common way to standardize is Z standardization, which is also the most commonly used standardization method in SPSS. Z-score standardization (zero-mean normalization) is also called standard deviation standardization. The processed data have a mean of 0 and a standard deviation of 1; they follow the standard normal distribution only if the original data were normally distributed. The transformation function is:
x' = (x - μ) / σ
where μ is the mean of all sample data and σ is the standard deviation of all sample data.

- This method standardizes the data using the mean and standard deviation of the original data. An original value x of attribute A is standardized to x' with the z-score.
- Z-score standardization is suitable when the maximum and minimum values of attribute A are unknown, or when there are outliers beyond the expected value range.
- The default standardization method in SPSS is z-score standardization.
- Z-score standardization in Excel: Excel has no single function that standardizes a whole column in one step, so the calculation is done step by step. The standardization formula itself is very simple.
The steps are as follows:
1. Compute the arithmetic mean (mathematical expectation) xi and the standard deviation si of each variable (indicator);
2. Standardize:
zij=(xij-xi)/si
where zij is the standardized variable value and xij is the actual variable value.
3. Reverse the sign of inverse indicators (indicators for which a smaller value is better).
The standardized variable values fluctuate around 0: a value greater than 0 is above average, and a value less than 0 is below average.
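The three steps above can be sketched in Python with the standard library. The function name `z_score`, the `inverse` flag, and the sample values are hypothetical. The article does not say whether the population or the sample standard deviation is meant; this sketch assumes the population form (`statistics.pstdev`), and `statistics.stdev` can be swapped in for the sample form.

```python
import statistics


def z_score(values, inverse=False):
    """Standardize values as (x - mean) / std; flip the sign for inverse indicators."""
    mean = statistics.fmean(values)   # step 1: arithmetic mean
    std = statistics.pstdev(values)   # step 1: population standard deviation (assumption)
    if std == 0:
        raise ValueError("standard deviation is zero; z-score is undefined")
    z = [(x - mean) / std for x in values]     # step 2: standardize
    return [-v for v in z] if inverse else z   # step 3: reverse sign if requested


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical indicator values, mean 5, std 2
z = z_score(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# For an indicator where smaller is better, the signs are reversed.
z_inv = z_score(data, inverse=True)
```

Values above the mean of 5 come out positive and values below it come out negative, matching the interpretation given above.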
Method 3: Normalization method
