Fundamentals of machine learning - Principal component analysis (PCA) - 16
2022-07-28 12:52:00 【gemoumou】
Principal Component Analysis (PCA)

PCA: A simple example
import numpy as np
import matplotlib.pyplot as plt
# Load data
data = np.genfromtxt("data.csv", delimiter=",")
x_data = data[:,0]
y_data = data[:,1]
plt.scatter(x_data,y_data)
plt.show()
print(x_data.shape)

# Center the data
def zeroMean(dataMat):
    # Mean of each column, i.e. the mean of each feature
    meanVal = np.mean(dataMat, axis=0)
    newData = dataMat - meanVal
    return newData, meanVal
newData, meanVal = zeroMean(data)
# np.cov computes the covariance matrix; rowvar=0 means each row is one sample and each column is one feature
covMat = np.cov(newData, rowvar=0)
# Covariance matrix
covMat
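
As a quick check of what `np.cov(..., rowvar=0)` computes, the sketch below rebuilds the covariance matrix by hand from centered data. The random `demo` array is a hypothetical stand-in for `data.csv`, which is not included here, and the names `manual_cov` and `numpy_cov` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for data.csv: 100 samples, 2 correlated features
rng = np.random.default_rng(0)
x = rng.normal(size=100)
demo = np.column_stack([x, 2.0 * x + rng.normal(scale=0.5, size=100)])

# Center each feature (column) at zero
centered = demo - demo.mean(axis=0)

# Sample covariance by hand: X^T X / (n - 1)
n_samples = centered.shape[0]
manual_cov = centered.T @ centered / (n_samples - 1)

# np.cov with rowvar=0 treats rows as samples and columns as features
numpy_cov = np.cov(centered, rowvar=0)

print(np.allclose(manual_cov, numpy_cov))
```

The division by `n - 1` reflects the default normalization of `np.cov`.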

# np.linalg.eig returns the eigenvalues and eigenvectors of the matrix
eigVals, eigVects = np.linalg.eig(covMat)
# Eigenvalues
eigVals

# Eigenvectors (one per column)
eigVects

# Indices that sort the eigenvalues in ascending order
eigValIndice = np.argsort(eigVals)
eigValIndice

top = 1
# Indices of the `top` largest eigenvalues
n_eigValIndice = eigValIndice[-1:-(top+1):-1]
n_eigValIndice

# Eigenvectors corresponding to the `top` largest eigenvalues
n_eigVect = eigVects[:,n_eigValIndice]
n_eigVect

# Data projected into the low-dimensional feature space
lowDDataMat = newData @ n_eigVect
lowDDataMat

# Reconstruct the data from the low-dimensional representation
reconMat = (lowDDataMat @ n_eigVect.T) + meanVal
reconMat
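
One way to judge how much is lost by keeping only `top` components is to compare the retained eigenvalues with the discarded ones. The sketch below uses synthetic data in place of `data.csv` and swaps in `np.linalg.eigh`, which is designed for symmetric matrices and returns eigenvalues in ascending order. The names `explained`, `recon_error` and `discarded` are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in for data.csv: 200 samples, 2 correlated features
rng = np.random.default_rng(1)
x = rng.normal(size=200)
demo = np.column_stack([x, 0.5 * x + rng.normal(scale=0.3, size=200)])

mean_val = demo.mean(axis=0)
centered = demo - mean_val
cov = np.cov(centered, rowvar=0)

# eigh: eigenvalues in ascending order, eigenvectors in matching columns
eig_vals, eig_vects = np.linalg.eigh(cov)

top = 1
# Reverse the column order, then keep the eigenvectors of the `top` largest eigenvalues
top_vects = eig_vects[:, ::-1][:, :top]

low_d = centered @ top_vects             # projection, shape (200, top)
recon = low_d @ top_vects.T + mean_val   # reconstruction, shape (200, 2)

# Fraction of the total variance kept by the retained components
explained = eig_vals[::-1][:top].sum() / eig_vals.sum()

# Squared reconstruction error divided by (n - 1) equals the sum of discarded eigenvalues
recon_error = ((demo - recon) ** 2).sum() / (demo.shape[0] - 1)
discarded = eig_vals[:-top].sum()

print(explained)
print(np.isclose(recon_error, discarded))
```

If `explained` is close to 1, the red reconstructed points plotted in the next cell lie close to the original data.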

# Load data
data = np.genfromtxt("data.csv", delimiter=",")
x_data = data[:,0]
y_data = data[:,1]
plt.scatter(x_data,y_data)
# Reconstructed data
x_data = np.array(reconMat)[:,0]
y_data = np.array(reconMat)[:,1]
plt.scatter(x_data,y_data,c='r')
plt.show()

Dimensionality-reduction visualization of handwritten digits
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
import numpy as np
import matplotlib.pyplot as plt
digits = load_digits()  # Load data
x_data = digits.data  # Features
y_data = digits.target  # Labels
# Split the data: 1/4 for testing and 3/4 for training (the default split)
x_train,x_test,y_train,y_test = train_test_split(x_data,y_data)
x_data.shape

mlp = MLPClassifier(hidden_layer_sizes=(100,50) ,max_iter=500)
mlp.fit(x_train,y_train)

# Center the data
def zeroMean(dataMat):
    # Mean of each column, i.e. the mean of each feature
    meanVal = np.mean(dataMat, axis=0)
    newData = dataMat - meanVal
    return newData, meanVal
def pca(dataMat, top):
    # Center the data
    newData, meanVal = zeroMean(dataMat)
    # np.cov computes the covariance matrix; rowvar=0 means each row is one sample
    covMat = np.cov(newData, rowvar=0)
    # np.linalg.eig returns the eigenvalues and eigenvectors of the matrix
    eigVals, eigVects = np.linalg.eig(covMat)
    # Indices that sort the eigenvalues in ascending order
    eigValIndice = np.argsort(eigVals)
    # Indices of the `top` largest eigenvalues
    n_eigValIndice = eigValIndice[-1:-(top+1):-1]
    # Eigenvectors corresponding to the `top` largest eigenvalues
    n_eigVect = eigVects[:,n_eigValIndice]
    # Data projected into the low-dimensional feature space
    lowDDataMat = newData @ n_eigVect
    # Reconstruct the data from the low-dimensional representation
    reconMat = (lowDDataMat @ n_eigVect.T) + meanVal
    # Return the low-dimensional data and the reconstructed matrix
    return lowDDataMat, reconMat
lowDDataMat,reconMat = pca(x_data,2)
# Low-dimensional (2-D) data
x = np.array(lowDDataMat)[:,0]
y = np.array(lowDDataMat)[:,1]
plt.scatter(x,y,c='r')
plt.show()

predictions = mlp.predict(x_data)
# Low-dimensional (2-D) data, colored by the true label
x = np.array(lowDDataMat)[:,0]
y = np.array(lowDDataMat)[:,1]
plt.scatter(x,y,c=y_data)
plt.show()

lowDDataMat,reconMat = pca(x_data,3)
from mpl_toolkits.mplot3d import Axes3D
x = np.array(lowDDataMat)[:,0]
y = np.array(lowDDataMat)[:,1]
z = np.array(lowDDataMat)[:,2]
ax = plt.figure().add_subplot(111, projection = '3d')
ax.scatter(x, y, z, c = y_data, s = 10) # Points are colored by label; s sets the marker size
plt.show()
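
As a cross-check, the hand-written projection can be compared with `sklearn.decomposition.PCA`. The sketch below re-implements the same steps with `np.linalg.eigh` (an assumption made for numerical convenience, not the function used above) under the hypothetical name `pca_eigh`. Eigenvectors are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA


def pca_eigh(data, top):
    """Project `data` onto its `top` principal components (illustrative helper)."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=0)
    eig_vals, eig_vects = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eig_vals)[::-1][:top]     # indices of the `top` largest
    return centered @ eig_vects[:, order]


x_data = load_digits().data
manual = pca_eigh(x_data, 2)

# svd_solver="full" requests an exact decomposition rather than a randomized one
reference = PCA(n_components=2, svd_solver="full").fit_transform(x_data)

# Columns may differ by a factor of -1, so compare magnitudes
same = np.allclose(np.abs(manual), np.abs(reference), atol=1e-6)
print(manual.shape, same)
```

A sign flip on a component mirrors the scatter plot along that axis but does not change the structure of the clusters.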

sklearn: Handwritten digit prediction after dimensionality reduction
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
from sklearn import decomposition
import matplotlib.pyplot as plt
digits = load_digits()  # Load data
x_data = digits.data  # Features
y_data = digits.target  # Labels
# Split the data: 1/4 for testing and 3/4 for training (the default split)
x_train,x_test,y_train,y_test = train_test_split(x_data,y_data)
mlp = MLPClassifier(hidden_layer_sizes=(100,50) ,max_iter=500)
mlp.fit(x_train,y_train )

predictions = mlp.predict(x_test)
# Both functions expect the true labels first, then the predictions
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))

pca = decomposition.PCA()
pca.fit(x_data)

# Variance explained by each component
pca.explained_variance_

# Fraction of the total variance explained by each component
pca.explained_variance_ratio_

# Cumulative explained variance versus number of components
variance = []
for i in range(len(pca.explained_variance_ratio_)):
    variance.append(sum(pca.explained_variance_ratio_[:i+1]))
plt.plot(range(1,len(pca.explained_variance_ratio_)+1), variance)
plt.show()
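
The same cumulative curve can be computed with `np.cumsum`, which also makes it easy to read off how many components are needed to reach a given threshold. In this sketch the 0.8 threshold mirrors the `n_components=0.8` setting used in the next cell; `n_needed`, `full_pca` and `auto_pca` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

x_data = load_digits().data

# Fit PCA with all components to get the full variance spectrum
full_pca = PCA(svd_solver="full").fit(x_data)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio exceeds the threshold
threshold = 0.8
n_needed = int(np.searchsorted(cumulative, threshold, side="right")) + 1

# A float n_components asks sklearn to pick the count for that variance fraction
auto_pca = PCA(n_components=threshold, svd_solver="full").fit(x_data)
print(n_needed, auto_pca.n_components_)
```

Both numbers should agree, which gives a way to read the component count directly off the plotted curve.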

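# Keep just enough components to explain 80% of the variance; whiten=True rescales each component to unit variance
pca = decomposition.PCA(whiten=True,n_components=0.8)
pca.fit(x_data)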

pca.explained_variance_ratio_

x_train_pca = pca.transform(x_train)
mlp = MLPClassifier(hidden_layer_sizes=(100,50) ,max_iter=500)
mlp.fit(x_train_pca,y_train )

x_test_pca = pca.transform(x_test)
predictions = mlp.predict(x_test_pca)
# Both functions expect the true labels first, then the predictions
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))
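
The code above fits `pca` on all of `x_data`, which includes the test rows. A minimal sketch of an alternative, assuming you want the projection learned from the training rows only, is to chain PCA and the classifier with `make_pipeline`. The `random_state=0` values are arbitrary choices added for reproducibility and are not part of the original code.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0  # fixed seed, chosen arbitrarily
)

# PCA is fitted inside the pipeline, on the training rows only
model = make_pipeline(
    PCA(n_components=0.8, whiten=True),
    MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500, random_state=0),
)
model.fit(x_train, y_train)

# score() applies the fitted PCA to x_test before predicting
accuracy = model.score(x_test, y_test)
print(accuracy)
```

With this arrangement the test set plays no part in choosing the components, so the reported accuracy is a cleaner estimate of performance on unseen digits.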
