Machine learning 07: Interpretation of PCA and its sklearn source code
2022-06-26 05:46:00 【SP FA】
Overview
The main idea of Principal Component Analysis (PCA) is to map n-dimensional features onto k dimensions. These k dimensions are new, mutually orthogonal features, called the principal components; they are k new features reconstructed from the original n-dimensional ones, not simply a subset of them. PCA works by finding a set of mutually orthogonal axes in the original space: the first new axis is the direction of largest variance in the data; the second is chosen, within the subspace orthogonal to the first, to again maximize the variance; the third maximizes the variance among directions orthogonal to the first two, and so on, until n axes are obtained. With axes chosen this way, most of the variance is concentrated in the first k axes, while the variance along the remaining axes is close to 0. We can therefore keep only the first k dimensions and thereby reduce the dimensionality of the data.
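As a concrete illustration, here is a minimal usage sketch of this idea with sklearn's PCA class (the data here is made up for the example):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))         # 100 samples with 5-dimensional features (made-up data)

pca = PCA(n_components=2)             # keep the first k = 2 principal components
X_reduced = pca.fit_transform(X)      # project the samples onto the top-2 axes

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # share of the total variance carried by each kept axis
```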
There are two ways to compute PCA:
- Decompose the covariance matrix. The concrete procedure has three steps:
  - compute the covariance matrix of the data matrix;
  - compute the eigenvalues and eigenvectors of the covariance matrix;
  - take the eigenvectors corresponding to the k largest eigenvalues (i.e., the directions of largest variance) and stack them into a matrix.
- Decompose the inner product matrix.
Decomposing the covariance matrix
Covariance matrix
Covariance describes the correlation between two random variables. A quick comparison of variance and covariance:
Mean: $\bar x=\frac1n\sum\limits^n_{i=1}\vec x_i$

Sample variance: $S^2=\frac1{n-1}\sum\limits^n_{i=1}(\vec x_i-\bar x)^2$

Covariance of samples $X$ and $Y$: $Cov(X,Y)=\frac1{n-1}\sum\limits^n_{i=1}(\vec x_i-\bar x)(\vec y_i-\bar y)$
We can see that variance is computed over a single dimension, while covariance needs at least two dimensions; variance is a special case of covariance. When the covariance is positive, $X$ and $Y$ are positively correlated; when it is negative, they are negatively correlated; when it is 0, the two variables are uncorrelated. When there are more than two dimensions, the pairwise covariances form the covariance matrix.
The covariance matrix is closely related to the divergence (scatter) matrix. From the definition of the scatter matrix, $S=\sum\limits^n_{i=1}(\vec x_i-\bar x)(\vec x_i-\bar x)^T$, we find that for a data set $X$: $S=(n-1)\,Cov(X)$. The two therefore share the same eigenvectors, and their eigenvalues differ only by the factor $(n-1)$.
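A small numpy sketch (my own check, with made-up data) of the relation $S=(n-1)\,Cov(X)$:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))                # 50 samples, 3 features (made-up data)
n = X.shape[0]

Xc = X - X.mean(axis=0)                     # center the data
scatter = Xc.T @ Xc                         # scatter matrix: sum of outer products over samples
cov = np.cov(X, rowvar=False)               # sample covariance, divides by n - 1

print(np.allclose(scatter, (n - 1) * cov))  # True: identical up to the factor (n - 1)
```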
Eigenvalue decomposition and SVD
For the detailed procedure, see my other article: Eigendecomposition and singular value decomposition.
After computing the n eigenvectors, we sort them by their eigenvalues and keep the top k.
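Here is a minimal numpy sketch of this covariance-matrix route; the helper pca_eig and its data are mine, not part of sklearn:

```python
import numpy as np

def pca_eig(X, k):
    """Illustrative PCA via eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                  # step 1: center the data
    cov = np.cov(Xc, rowvar=False)           # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 3: eigenvalues / eigenvectors (ascending order)
    order = np.argsort(eigvals)[::-1][:k]    # sort by decreasing variance and keep the top k
    return Xc @ eigvecs[:, order], eigvals[order]

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
X_k, variances = pca_eig(X, k=2)
print(X_k.shape, variances)                  # (100, 2) and the two largest variances
```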
Decomposing the inner product matrix
The PCA implementation in sklearn uses this method.
Suppose $X$ is the sample matrix with one sample per row, already centered (PCA requires centering the data). The covariance matrix is then $\frac1{n-1}X^TX$.

Take the SVD of $X$: $X=U\Sigma V^T$, so that $X^T=V\Sigma U^T$.

Hence $\frac1{n-1}X^TX=\frac1{n-1}V\Sigma^2V^T$.

So we never need to form the covariance matrix explicitly: an SVD of $X$ gives the eigenvectors of the covariance matrix as the columns of $V$ (the rows of $V^T$), with eigenvalues $\frac{\sigma_i^2}{n-1}$.
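A quick numerical check (again my own sketch, with made-up data) that the SVD of the centered $X$ yields the same eigenvalues as the covariance matrix:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)
n = X.shape[0]

# Route 1: eigenvalues of the covariance matrix, sorted in decreasing order
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]

# Route 2: SVD of the centered data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_eigvals = S**2 / (n - 1)

print(np.allclose(cov_eigvals, svd_eigvals))   # True: the two routes agree
```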
About the source code
I spent an evening reading sklearn's official code and took rough notes. The source is long, so only the key parts are recorded here.
Computing PCA
Through the svd_solver parameter we can choose different ways of decomposing the data, but the basic idea is the same. Here we only look at the svd_solver == 'full' case, i.e. the decomposition is run on the full data.
Implementation of the fit() method (here, its _fit_full path):
Here, explained_variance_ holds the variance of each component, and explained_variance_ratio_ holds each component's contribution ratio, i.e. its share of the total variance; this ratio can be used to decide the concrete value of n.
Note that the n_components parameter can take several kinds of values (a usage sketch follows this list):

- "mle": determine the value of n automatically.
- $0<\text{n\_components}<1$: the fraction of variance you want to keep; PCA automatically selects the smallest number of components whose cumulative explained variance ratio is $\ge\text{n\_components}$.
- $\text{n\_components}\ge1$ and an integer: keep the first n_components components.
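A usage sketch of the three kinds of values (the data is made up; for "mle" I pass svd_solver="full" explicitly, which is the case discussed here):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).normal(size=(200, 10))   # made-up data

PCA(n_components=3).fit(X)                          # integer >= 1: keep the first 3 components
PCA(n_components=0.95).fit(X)                       # fraction in (0, 1): keep enough components for >= 95% of the variance
PCA(n_components="mle", svd_solver="full").fit(X)   # "mle": infer the number of components automatically
```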
def _fit_full(self, X, n_components):
    n_samples, n_features = X.shape
    ...  # validate the input data, omitted here
    # Center the data
    self.mean_ = np.mean(X, axis=0)
    X -= self.mean_
    # Find the eigenvectors by decomposing the inner product matrix, i.e. by an SVD of X
    U, S, Vt = linalg.svd(X, full_matrices=False)
    # The signs returned by the SVD are not unique, so flip the eigenvectors' signs
    # to enforce deterministic output
    U, Vt = svd_flip(U, Vt)
    components_ = Vt
    explained_variance_ = (S**2) / (n_samples - 1)
    total_var = explained_variance_.sum()
    explained_variance_ratio_ = explained_variance_ / total_var
    singular_values_ = S.copy()  # store the singular values
    # Decide how many components to keep, i.e. the value of n, according to n_components
    if n_components == "mle":
        n_components = _infer_dimension(explained_variance_, n_samples)
    elif 0 < n_components < 1.0:
        ratio_cumsum = stable_cumsum(explained_variance_ratio_)
        n_components = np.searchsorted(ratio_cumsum, n_components, side="right") + 1
    # Compute noise covariance using the Probabilistic PCA model
    # (the sigma2 maximum likelihood, cf. eq. 12.46)
    if n_components < min(n_features, n_samples):
        self.noise_variance_ = explained_variance_[n_components:].mean()
    else:
        self.noise_variance_ = 0.0
    ...  # assign the attributes, omitted here
    return U, S, Vt
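To see how the fractional case picks the number of components, here is the cumulative-ratio plus searchsorted logic in isolation (the ratios below are invented):

```python
import numpy as np

# Invented explained-variance ratios for five components
explained_variance_ratio_ = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
ratio_cumsum = np.cumsum(explained_variance_ratio_)         # [0.55, 0.80, 0.92, 0.97, 1.00]

# Index of the first cumulative ratio >= 0.95, plus one to turn the index into a count
n_components = np.searchsorted(ratio_cumsum, 0.95, side="right") + 1
print(n_components)                                         # 4
```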
Implementation of the transform() method:
Whitening can be applied here, i.e. the data is decorrelated to reduce redundancy in the input. Whitened data has two properties: the correlation between features is removed, and every feature has variance 1.
def transform(self, X):
    ...  # validate the estimator and the data, omitted here
    if self.mean_ is not None:
        X = X - self.mean_
    X_transformed = np.dot(X, self.components_.T)
    # Whitening
    if self.whiten:
        X_transformed /= np.sqrt(self.explained_variance_)
    return X_transformed
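A quick check (my own sketch, made-up data) that whitening indeed leaves every transformed feature with unit variance and no correlation:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 6)) * np.array([1, 2, 3, 4, 5, 6])   # features with very different scales

X_white = PCA(n_components=4, whiten=True).fit_transform(X)
print(np.round(X_white.var(axis=0, ddof=1), 3))          # ~[1. 1. 1. 1.]: unit variance
print(np.round(np.corrcoef(X_white, rowvar=False), 3))   # ~identity matrix: decorrelated
```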
Implementation of the fit_transform() method:
The relevant identities are written out in the comments by the sklearn authors.
def fit_transform(self, X, y=None):
    U, S, Vt = self._fit(X)
    U = U[:, : self.n_components_]
    if self.whiten:
        # X_new = X * V / S * sqrt(n_samples) = U * sqrt(n_samples)
        U *= sqrt(X.shape[0] - 1)
    else:
        # X_new = X * V = U * S * Vt * V = U * S
        U *= S[: self.n_components_]
    return U
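The identity used in the non-whitened branch, $X_{new}=XV=US$, can be verified directly with numpy (a sketch with made-up data):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(Xc @ Vt.T, U * S))   # True: projecting onto V equals scaling U by the singular values
```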
Automatically choosing the value of n
The number of dimensions to keep, i.e. the value of n, is decided by computing a log-likelihood for each candidate rank. The relevant paper can be found here: link
def _assess_dimension(spectrum, rank, n_samples):
    """Compute the log-likelihood of a rank ``rank`` dataset.

    The dataset is assumed to be embedded in gaussian noise of shape(n, dimf)
    having spectrum ``spectrum``.

    Parameters
    ----------
    spectrum : ndarray of shape (n_features,)
        Data spectrum.
    rank : int
        Tested rank value. It should be strictly lower than n_features,
        otherwise the method isn't specified (division by zero in equation
        (31) from the paper).
    n_samples : int
        Number of samples.

    Returns
    -------
    ll : float
        The log-likelihood.
    """
    n_features = spectrum.shape[0]
    if not 1 <= rank < n_features:
        raise ValueError("the tested rank should be in [1, n_features - 1]")
    eps = 1e-15
    if spectrum[rank - 1] < eps:
        # When the tested rank is associated with a small eigenvalue, there's
        # no point in computing the log-likelihood: it's going to be very
        # small and won't be the max anyway. Also, it can lead to numerical
        # issues below when computing pa, in particular in log((spectrum[i] -
        # spectrum[j]) because this will take the log of something very small.
        return -np.inf
    pu = -rank * log(2.0)
    for i in range(1, rank + 1):
        pu += (
            gammaln((n_features - i + 1) / 2.0)
            - log(np.pi) * (n_features - i + 1) / 2.0
        )
    pl = np.sum(np.log(spectrum[:rank]))
    pl = -pl * n_samples / 2.0
    v = max(eps, np.sum(spectrum[rank:]) / (n_features - rank))
    pv = -np.log(v) * n_samples * (n_features - rank) / 2.0
    m = n_features * rank - rank * (rank + 1.0) / 2.0
    pp = log(2.0 * np.pi) * (m + rank) / 2.0
    pa = 0.0
    spectrum_ = spectrum.copy()
    spectrum_[rank:n_features] = v
    for i in range(rank):
        for j in range(i + 1, len(spectrum)):
            pa += log(
                (spectrum[i] - spectrum[j]) * (1.0 / spectrum_[j] - 1.0 / spectrum_[i])
            ) + log(n_samples)
    ll = pu + pl + pv + pp - pa / 2.0 - rank * log(n_samples) / 2.0
    return ll
def _infer_dimension(spectrum, n_samples):
    """Infers the dimension of a dataset with a given spectrum.

    The returned value will be in [1, n_features - 1].
    """
    ll = np.empty_like(spectrum)
    ll[0] = -np.inf  # we don't want to return n_components = 0
    for rank in range(1, spectrum.shape[0]):
        ll[rank] = _assess_dimension(spectrum, rank, n_samples)
    return ll.argmax()
Sample scores
The fit is assessed by computing the log-likelihood of each sample. The relevant paper can be found here: link
def get_covariance(self):
    """Compute data covariance with the generative model.

    ``cov = components_.T * S**2 * components_ + sigma2 * eye(n_features)``
    where S**2 contains the explained variances, and sigma2 contains the
    noise variances.

    Returns
    -------
    cov : array of shape=(n_features, n_features)
        Estimated covariance of data.
    """
    components_ = self.components_
    exp_var = self.explained_variance_
    if self.whiten:
        components_ = components_ * np.sqrt(exp_var[:, np.newaxis])
    exp_var_diff = np.maximum(exp_var - self.noise_variance_, 0.0)
    cov = np.dot(components_.T * exp_var_diff, components_)
    cov.flat[:: len(cov) + 1] += self.noise_variance_  # modify diag inplace
    return cov
def get_precision(self):
    """Compute data precision matrix with the generative model.

    Equals the inverse of the covariance but computed with the matrix
    inversion lemma for efficiency.

    Returns
    -------
    precision : array, shape=(n_features, n_features)
        Estimated precision of data.
    """
    n_features = self.components_.shape[1]
    # handle corner cases first
    if self.n_components_ == 0:
        return np.eye(n_features) / self.noise_variance_
    if np.isclose(self.noise_variance_, 0.0, atol=0.0):
        return linalg.inv(self.get_covariance())
    # Get precision using matrix inversion lemma
    components_ = self.components_
    exp_var = self.explained_variance_
    if self.whiten:
        components_ = components_ * np.sqrt(exp_var[:, np.newaxis])
    exp_var_diff = np.maximum(exp_var - self.noise_variance_, 0.0)
    precision = np.dot(components_, components_.T) / self.noise_variance_
    precision.flat[:: len(precision) + 1] += 1.0 / exp_var_diff
    precision = np.dot(components_.T, np.dot(linalg.inv(precision), components_))
    precision /= -(self.noise_variance_**2)
    precision.flat[:: len(precision) + 1] += 1.0 / self.noise_variance_
    return precision
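A quick numerical check (my own sketch, made-up data) that get_precision() agrees with directly inverting get_covariance():

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).normal(size=(200, 6))
pca = PCA(n_components=3).fit(X)

# The matrix-inversion-lemma result should match a direct inverse of the model covariance
print(np.allclose(pca.get_precision(), np.linalg.inv(pca.get_covariance())))   # True
```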
def score_samples(self, X):
    """Return the log-likelihood of each sample.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data.
    """
    ...  # validate the estimator and the data, omitted here
    Xr = X - self.mean_
    n_features = X.shape[1]
    precision = self.get_precision()
    log_like = -0.5 * (Xr * (np.dot(Xr, precision))).sum(axis=1)
    log_like -= 0.5 * (n_features * log(2.0 * np.pi) - fast_logdet(precision))
    return log_like
def score(self, X, y=None):
    """Return the average log-likelihood of all samples."""
    return np.mean(self.score_samples(X))
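A usage sketch of the two scoring methods (made-up data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 5))

pca = PCA(n_components=2).fit(X)
print(pca.score_samples(X[:3]))   # log-likelihood of the first three samples under the fitted model
print(pca.score(X))               # average log-likelihood over all samples
```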
Also published on: SP-FA's blog