Feature scaling for machine learning
2022-06-21 06:54:00 【One question per day】
The reason for data normalization
In the example shown in the figure, the distance between samples is dominated by the time-of-discovery feature, because its values are much larger than those of the other feature. Any distance-based result is then distorted. The solution is to map all features to the same scale. Two methods are introduced next.
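The effect can be reproduced with a minimal sketch. The two samples below are hypothetical values chosen for illustration, not the data from the original figure: the first column is a small-valued feature and the second is the discovery time in days.

```python
import numpy as np

# Hypothetical samples (not from the original figure):
# column 0 is a small-valued feature, column 1 is the discovery time in days.
a = np.array([1.0, 200.0])
b = np.array([5.0, 100.0])

diff = a - b
# Euclidean distance between the two samples.
distance = np.sqrt(np.sum(diff ** 2))

# Share of the squared distance contributed by each feature.
contribution = diff ** 2 / np.sum(diff ** 2)

print(distance)
print(contribution)
```

With these numbers the time feature accounts for more than 99% of the squared distance, so the other feature has almost no influence on which samples count as close.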
Min-max normalization (normalization)
Definition: map all data to the range 0 to 1.
Note: this method suits distributions with clear boundaries, but it is strongly affected by outliers.
Code:
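Written as a formula, the mapping applied to each feature is the following. The symbol names are notation assumed here: x_min and x_max are the smallest and largest values of that feature, and x_scale is the scaled value.

```latex
x_{\text{scale}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
```

The smallest value maps to 0 and the largest to 1, which is also why a single extreme value changes the scaling of every other sample.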
import numpy as np
import matplotlib.pyplot as plt
X = np.random.randint(0, 100, (50, 2))
X = np.array(X, dtype = float)
X[:,0] = (X[:,0] - np.min(X[:,0])) / (np.max(X[:,0]) - np.min(X[:,0]))
X[:,1] = (X[:,1] - np.min(X[:,1])) / (np.max(X[:,1]) - np.min(X[:,1]))
Note: a NumPy array holds a single data type, so the integer array must be converted to float first. Otherwise the normalized values would be truncated to integers when written back into the array.
Then we plot the normalized data:
plt.scatter(X[:,0], X[:,1])
plt.show()

All values now clearly lie between 0 and 1.
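The same scaling can be applied to every column at once, which also makes the sensitivity to outliers easy to see. This is a sketch, not the author's code: the helper name min_max_scale and the sample values are assumptions made for illustration, and the helper assumes no column is constant.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1] (hypothetical helper).

    Assumes every column has at least two distinct values,
    otherwise the denominator would be zero.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

# Same kind of random data as in the article, scaled column by column.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 100, size=(50, 2)).astype(float)
X_scaled = min_max_scale(X_demo)

# A single outlier (1000) squeezes the remaining values close to 0.
values = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
scaled_outlier = min_max_scale(values)
print(scaled_outlier.ravel())
```

In the second example the four ordinary values all end up below 0.01, because the range is set almost entirely by the outlier.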
Mean-variance normalization (standardization)
Definition: transform all data to a distribution with mean 0 and variance 1.
Applicable when: the data distribution has no clear boundaries, or extreme values may be present.
Code:
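As a formula, the transformation applied to each feature in the code that follows can be written as below. The notation is assumed here: \bar{x} is the mean of the feature and s is its standard deviation, as computed by np.mean and np.std.

```latex
x_{\text{scale}} = \frac{x - \bar{x}}{s}
```

Because the result depends on the mean and standard deviation rather than on the minimum and maximum, a single extreme value shifts the scaling less than it does in min-max normalization.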
X2 = np.random.randint(0, 100, (50, 2))
# Convert X2 itself to float (not the already-normalized X)
X2 = np.array(X2, dtype = float)
X2[:,0] = (X2[:,0] - np.mean(X2[:,0])) / np.std(X2[:,0])
X2[:,1] = (X2[:,1] - np.mean(X2[:,1])) / np.std(X2[:,1])
Then we plot the result:
plt.scatter(X2[:,0],X2[:,1])
plt.show()

The data are no longer confined to the range 0 to 1.
Next, let's look at the mean and standard deviation of the standardized data:
The mean is almost exactly 0 and the standard deviation is 1.
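A check of this kind can be reproduced with the short sketch below. It generates its own random data, so the names X_check and X_std and the seed are assumptions for illustration rather than the article's variables.

```python
import numpy as np

# Random integer data of the same shape as in the article, stored as float.
rng = np.random.default_rng(42)
X_check = rng.integers(0, 100, size=(50, 2)).astype(float)

# Standardize every column: subtract the column mean, divide by the column std.
X_std = (X_check - X_check.mean(axis=0)) / X_check.std(axis=0)

# The means are 0 up to floating-point rounding; the standard deviations are 1.
print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```

The printed means are usually tiny nonzero numbers rather than exactly 0, which is ordinary floating-point rounding.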