Fundamentals of machine learning - principal component analysis (PCA) - 16
2022-07-28 12:52:00 【gemoumou】
Principal component analysis (PCA)

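The code in this post carries out the following steps. The notation is introduced only for this summary: $X$ is the $n$-by-$d$ data matrix with one sample $x_i$ per row, $\mathbf{1}$ is a column of $n$ ones, and $k$ is the number of components kept (called `top` in the code). The covariance uses the $n-1$ denominator, which is what `np.cov` computes by default.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \tilde{X} = X - \mathbf{1}\,\bar{x}^{\top}
  && \text{center each feature} \\
C &= \frac{1}{n-1}\,\tilde{X}^{\top}\tilde{X}
  && \text{covariance matrix } (d \times d) \\
C\,v_j &= \lambda_j\, v_j, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d
  && \text{eigenvalues and eigenvectors} \\
Z &= \tilde{X}\,V_k, \qquad V_k = [\,v_1, \dots, v_k\,]
  && \text{projection onto the top } k \text{ components} \\
\hat{X} &= Z\,V_k^{\top} + \mathbf{1}\,\bar{x}^{\top}
  && \text{reconstruction}
\end{aligned}
```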
PCA: a simple example
import numpy as np
import matplotlib.pyplot as plt
# Load data
data = np.genfromtxt("data.csv", delimiter=",")
x_data = data[:,0]
y_data = data[:,1]
plt.scatter(x_data,y_data)
plt.show()
print(x_data.shape)

# Center the data: subtract the mean of each column (feature)
def zeroMean(dataMat):
    # Column-wise mean, i.e. the mean of each feature
    meanVal = np.mean(dataMat, axis=0)
    newData = dataMat - meanVal
    return newData, meanVal
newData,meanVal=zeroMean(data)
# np.cov computes the covariance matrix; rowvar=0 means each row is one sample
covMat = np.cov(newData, rowvar=0)
# Covariance matrix
covMat
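With rowvar=0, np.cov treats each row as one sample and divides by n - 1. As a quick sanity check on made-up toy numbers (not the contents of data.csv), it agrees with computing the covariance directly from the centered data:

```python
import numpy as np

# Hypothetical toy data: 5 samples (rows), 2 features (columns)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Center each column, as zeroMean does
Xc = X - X.mean(axis=0)

# Covariance from the formula C = Xc^T Xc / (n - 1)
n = X.shape[0]
C_manual = Xc.T @ Xc / (n - 1)

# np.cov with rowvar=False treats each row as one sample
C_numpy = np.cov(Xc, rowvar=False)

print(np.allclose(C_manual, C_numpy))  # True
```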

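# The covariance matrix is symmetric, so np.linalg.eigh applies: it returns real
# eigenvalues and the matching eigenvectors as columns. np.asmatrix replaces np.mat
# (removed in NumPy 2.0); the `*` products below rely on the np.matrix type.
eigVals, eigVects = np.linalg.eigh(np.asmatrix(covMat))
# Eigenvalues
eigVals

# Eigenvectors (one per column)
eigVects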

# Indices that would sort the eigenvalues in ascending order
eigValIndice = np.argsort(eigVals)
eigValIndice

top = 1
# Indices of the `top` largest eigenvalues, largest first
n_eigValIndice = eigValIndice[-1:-(top+1):-1]
n_eigValIndice
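The slice eigValIndice[-1:-(top+1):-1] walks the ascending argsort result backwards from the end, so it picks the indices of the `top` largest eigenvalues in descending order. A small check on hypothetical eigenvalues, next to an equivalent (arguably clearer) form using the negated values; with tied eigenvalues the two forms may order the tied indices differently:

```python
import numpy as np

# Hypothetical eigenvalues, deliberately unsorted
eigVals = np.array([0.05, 1.30, 0.40, 0.90])
top = 2

# argsort gives ascending order; step -1 from the end walks it backwards
eigValIndice = np.argsort(eigVals)                # [0, 2, 3, 1]
n_eigValIndice = eigValIndice[-1:-(top + 1):-1]   # last `top` indices, largest first

# Equivalent: sort the negated values and keep the first `top`
alt = np.argsort(-eigVals)[:top]

print(n_eigValIndice, alt)  # [1 3] [1 3]
```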

# Eigenvectors (columns) belonging to the `top` largest eigenvalues
n_eigVect = eigVects[:,n_eigValIndice]
n_eigVect

# Project the centered data onto the principal components (low-dimensional data)
lowDDataMat = newData*n_eigVect
lowDDataMat

# Reconstruct the data from its low-dimensional representation
reconMat = (lowDDataMat*n_eigVect.T) + meanVal
reconMat
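A handy check on a reconstruction like reconMat: the total squared error between the original and the reconstructed points equals (n - 1) times the sum of the discarded eigenvalues. The sketch below verifies this on synthetic 2-D data (a stand-in, since the contents of data.csv are not shown), using plain NumPy arrays, np.linalg.eigh and the @ operator rather than np.matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data (200 samples), standing in for data.csv
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])

mean = X.mean(axis=0)
Xc = X - mean
C = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues of a symmetric matrix in ascending order
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]          # largest first
vals, vecs = vals[order], vecs[:, order]

top = 1
V = vecs[:, :top]                       # d x top matrix of kept eigenvectors
Z = Xc @ V                              # low-dimensional data
X_rec = Z @ V.T + mean                  # reconstruction in the original space

sq_error = np.sum((X - X_rec) ** 2)
n = X.shape[0]
print(np.isclose(sq_error, (n - 1) * vals[top:].sum()))  # True
```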

# Load data
data = np.genfromtxt("data.csv", delimiter=",")
x_data = data[:,0]
y_data = data[:,1]
plt.scatter(x_data,y_data)
# Reconstructed data
x_data = np.array(reconMat)[:,0]
y_data = np.array(reconMat)[:,1]
plt.scatter(x_data,y_data,c='r')
plt.show()
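The same principal components can also be obtained from a singular value decomposition (SVD) of the centered data instead of an eigendecomposition of the covariance matrix; this is a different route from the one used above. A small sketch on synthetic data comparing the two: the eigenvalues equal the squared singular values divided by n - 1, and the directions agree up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 100 samples, 3 features
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.3]])
n = X.shape[0]
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
vals, vecs = vals[::-1], vecs[:, ::-1]    # eigh is ascending; flip to descending

# Route 2: SVD of the centered data, Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues are the squared singular values divided by (n - 1)
print(np.allclose(vals, s ** 2 / (n - 1)))                      # True
# Principal directions agree up to sign: |v_j . w_j| = 1 for each j
print(np.allclose(np.abs(np.sum(vecs * Vt.T, axis=0)), 1.0))    # True
```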

Dimensionality reduction and visualization of the handwritten digits
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
import numpy as np
import matplotlib.pyplot as plt
digits = load_digits()  # Load data
x_data = digits.data  # Features: each 8x8 image flattened to 64 values
y_data = digits.target  # Labels: the digits 0-9
x_train,x_test,y_train,y_test = train_test_split(x_data,y_data)  # By default 1/4 of the data is held out for testing and 3/4 is used for training
x_data.shape

mlp = MLPClassifier(hidden_layer_sizes=(100,50) ,max_iter=500)
mlp.fit(x_train,y_train)

# Center the data: subtract the mean of each column (feature)
def zeroMean(dataMat):
    # Column-wise mean, i.e. the mean of each feature
    meanVal = np.mean(dataMat, axis=0)
    newData = dataMat - meanVal
    return newData, meanVal

def pca(dataMat, top):
    # Center the data
    newData, meanVal = zeroMean(dataMat)
    # np.cov computes the covariance matrix; rowvar=0 means each row is one sample
    covMat = np.cov(newData, rowvar=0)
    # eigh suits the symmetric covariance matrix and returns real eigenvalues;
    # np.asmatrix replaces np.mat (removed in NumPy 2.0) so `*` below is a matrix product
    eigVals, eigVects = np.linalg.eigh(np.asmatrix(covMat))
    # Indices that would sort the eigenvalues in ascending order
    eigValIndice = np.argsort(eigVals)
    # Indices of the `top` largest eigenvalues, largest first
    n_eigValIndice = eigValIndice[-1:-(top+1):-1]
    # Eigenvectors (columns) belonging to the `top` largest eigenvalues
    n_eigVect = eigVects[:,n_eigValIndice]
    # Project the centered data onto the principal components
    lowDDataMat = newData*n_eigVect
    # Reconstruct the data from its low-dimensional representation
    reconMat = (lowDDataMat*n_eigVect.T) + meanVal
    # Return the low-dimensional data and the reconstructed matrix
    return lowDDataMat, reconMat
lowDDataMat,reconMat = pca(x_data,2)
# Plot the 2-D projection (the low-dimensional data, not the reconstruction)
x = np.array(lowDDataMat)[:,0]
y = np.array(lowDDataMat)[:,1]
plt.scatter(x,y,c='r')
plt.show()
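The pca function above returns only the projections of the data it was given. To project new samples (for example a held-out test split), they have to be centered with the training mean and multiplied by the same eigenvectors. A minimal array-based sketch of that split into "fit" and "transform"; the names pca_fit, pca_transform and pca_inverse_transform are hypothetical, and the random data only stands in for real features:

```python
import numpy as np

def pca_fit(X, top):
    """Return the column means and the `top` leading eigenvectors (d x top)."""
    mean = X.mean(axis=0)
    C = np.cov(X - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(C)              # ascending eigenvalues
    order = np.argsort(vals)[::-1][:top]        # indices of the `top` largest
    return mean, vecs[:, order]

def pca_transform(X, mean, components):
    """Project samples onto the components using the *training* mean."""
    return (X - mean) @ components

def pca_inverse_transform(Z, mean, components):
    """Map low-dimensional points back to the original feature space."""
    return Z @ components.T + mean

rng = np.random.default_rng(2)
X_train = rng.normal(size=(80, 5))   # hypothetical training data
X_test = rng.normal(size=(20, 5))    # hypothetical new samples

mean, comps = pca_fit(X_train, top=2)
Z_test = pca_transform(X_test, mean, comps)
X_test_rec = pca_inverse_transform(Z_test, mean, comps)
print(Z_test.shape, X_test_rec.shape)  # (20, 2) (20, 5)
```

Keeping the mean and components separate from the data is the same design scikit-learn's decomposition.PCA follows with fit and transform, which the last part of this post relies on.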

predictions = mlp.predict(x_data)
# Same 2-D projection, colored by the true digit label
x = np.array(lowDDataMat)[:,0]
y = np.array(lowDDataMat)[:,1]
plt.scatter(x,y,c=y_data)
plt.show()

lowDDataMat,reconMat = pca(x_data,3)
from mpl_toolkits.mplot3d import Axes3D
x = np.array(lowDDataMat)[:,0]
y = np.array(lowDDataMat)[:,1]
z = np.array(lowDDataMat)[:,2]
ax = plt.figure().add_subplot(111, projection = '3d')
ax.scatter(x, y, z, c = y_data, s = 10)  # Color each point by its digit label
plt.show()

sklearn: dimensionality reduction and prediction on the handwritten digits
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
from sklearn import decomposition
import matplotlib.pyplot as plt
digits = load_digits()  # Load data
x_data = digits.data  # Features: each 8x8 image flattened to 64 values
y_data = digits.target  # Labels: the digits 0-9
x_train,x_test,y_train,y_test = train_test_split(x_data,y_data)  # By default 1/4 of the data is held out for testing and 3/4 is used for training
mlp = MLPClassifier(hidden_layer_sizes=(100,50) ,max_iter=500)
mlp.fit(x_train,y_train )

predictions = mlp.predict(x_test)
# Both functions expect the true labels first: (y_true, y_pred)
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))

pca = decomposition.PCA()
pca.fit(x_data)

# Variance explained by each principal component
pca.explained_variance_

# Fraction of the total variance explained by each component
pca.explained_variance_ratio_

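# Cumulative explained-variance ratio of the first k components, k = 1, ..., 64
variance = []
for i in range(len(pca.explained_variance_ratio_)):
    variance.append(sum(pca.explained_variance_ratio_[:i+1]))
plt.plot(range(1,len(pca.explained_variance_ratio_)+1), variance)
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance ratio")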
plt.show()
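The curve above is the running sum of explained_variance_ratio_. The sketch below shows, on made-up ratios, how a threshold such as 0.8 turns into a number of components. To our understanding this mirrors how scikit-learn's PCA interprets a float n_components (keep enough components that the explained variance exceeds the given fraction), but check the documentation of your version:

```python
import numpy as np

# Hypothetical explained-variance ratios, sorted in decreasing order
ratios = np.array([0.45, 0.25, 0.15, 0.08, 0.05, 0.02])
cumulative = np.cumsum(ratios)      # the same quantity as the `variance` curve

threshold = 0.8
# Smallest k whose cumulative ratio strictly exceeds the threshold
k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
print(k)  # 3
```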

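# whiten=True rescales each component to unit variance; n_components=0.8 keeps the
# smallest number of components that explains more than 80% of the variance.
# Fitting on the training split only keeps the test data out of the fit.
pca = decomposition.PCA(whiten=True,n_components=0.8)
pca.fit(x_train)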

pca.explained_variance_ratio_
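Whitening divides each projected component by its standard deviation, so the transformed features are uncorrelated and have unit variance. A NumPy sketch of that idea on synthetic data; to our understanding, scikit-learn's whiten=True likewise divides by the square root of explained_variance_:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical correlated data: 500 samples, 3 features
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.3, 0.0],
                                          [0.0, 1.0, 0.4],
                                          [0.0, 0.0, 0.5]])
Xc = X - X.mean(axis=0)

vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
vals, vecs = vals[::-1], vecs[:, ::-1]   # largest eigenvalue first

Z = Xc @ vecs                    # plain projection: variances equal the eigenvalues
Z_white = Z / np.sqrt(vals)      # whitening: divide each component by its std

# After whitening the components are uncorrelated with unit variance
print(np.allclose(np.cov(Z_white, rowvar=False), np.eye(3)))  # True
```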

x_train_pca = pca.transform(x_train)
mlp = MLPClassifier(hidden_layer_sizes=(100,50) ,max_iter=500)
mlp.fit(x_train_pca, y_train)

x_test_pca = pca.transform(x_test)
predictions = mlp.predict(x_test_pca)
# Both functions expect the true labels first: (y_true, y_pred)
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))
