Machine learning - principal component analysis (PCA)
2022-06-24 10:10:00 【Cpsu】
# Start by creating a random dataset whose two features are correlated
import numpy as np
from matplotlib import pyplot as plt
from numpy import linalg
np.random.seed(2)
# Construct data set
x1=[i for i in np.arange(1,10,0.1)]
x2=[np.random.uniform(2,4)*i+np.random.randn() for i in x1]
plt.scatter(x1,x2)
# np.zeros creates an all-zero matrix of the given shape
# (np.empty is the function that leaves the values uninitialized)
# Transform the data set into matrix form
x=np.zeros((90,2))
x[:,0]=np.array(x1)
x[:,1]=np.array(x2)
x.shape
#(90, 2)

The first step is to center the data
# axis selects the direction of the mean: axis=0 averages over the rows, giving one mean per column (feature)
data_array=x
mean_array=np.mean(data_array,axis=0)
center_array=data_array-mean_array
# Equivalently, use np.subtract
center_array=np.subtract(data_array,np.mean(data_array,axis=0) )
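A quick way to confirm the centering step is to check that every column mean is zero afterwards. The following is a minimal sketch on hypothetical toy data (the `rng` generator and the `data` array are assumptions for illustration, not the dataset built above):

```python
import numpy as np

# Hypothetical toy data: 90 samples, 2 features with different means
rng = np.random.default_rng(0)
data = rng.normal(loc=[5.0, 15.0], scale=[2.0, 6.0], size=(90, 2))

# data.mean(axis=0) has shape (2,), so it broadcasts across all 90 rows
centered = data - data.mean(axis=0)

# After centering, each column mean should be zero up to rounding error
col_means = centered.mean(axis=0)
print(np.allclose(col_means, 0.0))
```

Centering does not change the shape of the data or its spread; it only moves the cloud of points so that it is centered on the origin.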
The second step is to compute the covariance matrix and its eigenvalues and eigenvectors
# rowvar selects whether rows or columns are the variables; rowvar=False means each column is a variable and each row is a sample
cov_array=np.cov(center_array,rowvar=False)
eig_vals, eig_vects = linalg.eig(cov_array)
"""
# Eigenvalues
array([ 1.23589914, 80.95385223])
# Eigenvectors (one per column)
array([[-0.96430755, -0.26478471],
       [ 0.26478471, -0.96430755]])
The eigenvalue 1.23589914 corresponds to the eigenvector array([-0.96430755, 0.26478471])
"""
# In general, keep the eigenvectors of the K largest eigenvalues as the principal components
# To make the algorithm easier to follow, all eigenvalues are kept here
# Get the indices that sort the eigenvalues in ascending order
val_index=np.argsort(eig_vals)
# Reverse to descending order
val_index=val_index[::-1]
# Reorder the eigenvectors (columns) to match
eig_vect=eig_vects[:,val_index]
# Project the centered data and take the first principal component
np.dot(center_array, eig_vect)[:,0]
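The steps above can be wrapped into one function that keeps only the top K components. This is a minimal sketch rather than the code used in this post: the function name `pca_eig` and its return values are hypothetical, and it uses `np.linalg.eigh` (meant for symmetric matrices such as a covariance matrix) in place of `linalg.eig`.

```python
import numpy as np

def pca_eig(data, k):
    """Project data of shape (n_samples, n_features) onto its top-k principal components."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns real eigenvalues in ascending order for a symmetric matrix
    eig_vals, eig_vects = np.linalg.eigh(cov)
    # Indices of the k largest eigenvalues, largest first
    order = np.argsort(eig_vals)[::-1][:k]
    components = eig_vects[:, order]
    # Share of the total variance carried by each kept component
    explained_ratio = eig_vals[order] / eig_vals.sum()
    return centered @ components, components, explained_ratio

# Rebuild the dataset the same way as at the top of the post
np.random.seed(2)
x1 = np.arange(1, 10, 0.1)
x2 = np.array([np.random.uniform(2, 4) * i + np.random.randn() for i in x1])
x = np.column_stack([x1, x2])

scores, components, ratio = pca_eig(x, k=1)
print(scores.shape, ratio)
```

The explained-variance ratio is a common guide for choosing K: keep adding components until the cumulative ratio is high enough for the task at hand.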

Use scikit-learn to verify the result
from sklearn.decomposition import PCA
data_mat = x
pca = PCA(n_components=1)
pca.fit(data_mat)
# The model is already fitted above, so transform alone is enough
x_p=pca.transform(data_mat)
x_p
# The results are consistent with the manual computation
# (the sign of a component is arbitrary, so the values may be flipped)
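When comparing two PCA implementations, keep in mind that an eigenvector is only defined up to its sign, so the projected values may come out negated. The sketch below cross-checks the eigendecomposition route against a singular value decomposition of the centered data, using NumPy only. It is an alternative check, not the scikit-learn call shown above, and the variable names are assumptions for illustration.

```python
import numpy as np

# Rebuild the dataset the same way as at the top of the post
np.random.seed(2)
x1 = np.arange(1, 10, 0.1)
x2 = np.array([np.random.uniform(2, 4) * i + np.random.randn() for i in x1])
x = np.column_stack([x1, x2])
centered = x - x.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix
eig_vals, eig_vects = np.linalg.eigh(np.cov(centered, rowvar=False))
pc1_eig = centered @ eig_vects[:, np.argmax(eig_vals)]

# Route 2: SVD of the centered data; the rows of vt are the principal directions
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pc1_svd = centered @ vt[0]

# The two projections should agree, possibly with opposite sign
same_up_to_sign = np.allclose(pc1_eig, pc1_svd) or np.allclose(pc1_eig, -pc1_svd)
print(same_up_to_sign)

# Squared singular values divided by (n - 1) recover the eigenvalues
print(np.isclose(s[0] ** 2 / (len(x) - 1), eig_vals.max()))
```

Comparing with a sign-tolerant check like this avoids a false mismatch when one library happens to return the opposite direction for the same component.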