R language classification
2022-07-03 10:23:00 【Small tear nevus of atobe】
Problems encountered in R
Operators: `%*%` is the matrix-multiplication operator (element-wise multiplication is plain `*`).
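For example, `*` and `%*%` behave very differently on the same pair of matrices:

```r
# Element-wise vs. matrix multiplication on a small example
A <- matrix(1:4, nrow = 2)  # 2x2 matrix with columns (1,2) and (3,4)
B <- diag(2)                # 2x2 identity matrix
A * B                       # element-wise: keeps only the diagonal of A
A %*% B                     # matrix product: returns A unchanged
```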
- PCA principal component analysis
#1 Import data
data(iris)# Import the built-in data set directly
head(iris)
#2 Center each variable (subtract its mean) and standardize (divide by its standard deviation)
iris2 <- scale(iris[, 1:4], center = TRUE, scale = TRUE)
head(iris2)
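A quick sanity check (not part of the original post) that `scale` did what the comment describes — every column ends up with mean 0 and standard deviation 1:

```r
# Verify centering and standardization of the iris measurements
data(iris)
iris2 <- scale(iris[, 1:4], center = TRUE, scale = TRUE)
round(colMeans(iris2), 10)  # numerically zero for every column
apply(iris2, 2, sd)         # exactly 1 for every column
```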
#3 Compute the correlation matrix (for standardized data this equals the covariance matrix)
cm1 <- cor(iris2)
cm1
#4 Eigendecomposition: obtain the eigenvalues and eigenvectors
rs1<-eigen(cm1)
rs1
eigenvalues <- rs1$values
eigenvector2 <- as.matrix(rs1$vectors)
#5 Compute the proportion of variance explained by each component
(Proportion_of_Variance <- eigenvalues/sum(eigenvalues))
(Cumulative_Proportion <- cumsum(Proportion_of_Variance))
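These two quantities are typically used to decide how many components to keep. A minimal, self-contained sketch (the 95% threshold is an assumed convention, not from the post):

```r
# Choose the smallest number of components explaining at least 95% of the variance
data(iris)
ev <- eigen(cor(scale(iris[, 1:4])))$values  # eigenvalues, as in the steps above
prop <- ev / sum(ev)                         # proportion of variance per component
k <- which(cumsum(prop) >= 0.95)[1]          # smallest k reaching the threshold
k                                            # 2 for the iris data
```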
# Draw the scree plot
par(mar=c(6,6,2,2))
plot(rs1$values,type="b",
cex=2,
cex.lab=2,
cex.axis=2,
lty=2,
lwd=2,
xlab = "Principal components",
ylab="Eigenvalues")
# Compute the principal component scores
dt<-as.matrix(iris2)
PC <- dt %*% eigenvector2
colnames(PC) <- c("PC1","PC2","PC3","PC4")
head(PC)
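As a cross-check (not in the original post), the manual eigendecomposition can be compared with R's built-in `prcomp`; for standardized PCA, `pca$sdev^2` equals the eigenvalues of the correlation matrix (the signs of individual eigenvectors may differ between the two methods):

```r
# PCA via eigendecomposition vs. prcomp on the standardized iris measurements
data(iris)
iris2 <- scale(iris[, 1:4], center = TRUE, scale = TRUE)
eig_manual <- eigen(cor(iris2))$values            # eigenvalues of the correlation matrix
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
all.equal(eig_manual, unname(pca$sdev^2))         # TRUE up to numerical tolerance
```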
# Combine principal component scores and class labels
iris3 <- data.frame(PC, Species = iris$Species)  # the label column is Species, not V5
head(iris3)
# Axis labels showing the variance explained by the first two principal components
xlab<-paste0("PC1(",round(Proportion_of_Variance[1]*100,2),"%)")
ylab<-paste0("PC2(",round(Proportion_of_Variance[2]*100,2),"%)")
# Scatter plot of the first two components with a confidence ellipse per class
library(ggplot2)
p1 <- ggplot(data = iris3, aes(x = PC1, y = PC2, color = iris3[, 5])) +
  stat_ellipse(aes(fill = iris3[, 5]),
               type = "norm", geom = "polygon", alpha = 0.2, color = NA) +
  geom_point() + labs(x = xlab, y = ylab, color = "") +
  guides(fill = "none")
p1
- LDA discriminant analysis
Not to be confused with the LDA (latent Dirichlet allocation) used in text mining: linear discriminant analysis projects the data onto discriminant axes chosen so that between-class variance is as large as possible and within-class variance as small as possible. The number of discriminant axes is at most min(number of classes - 1, number of predictor variables), so LDA reduces dimensionality while classifying; the discriminant axes are the new dimensions.
The difference from the PCA method above is that LDA is supervised: PCA ignores class labels entirely and requires none, whereas LDA uses the labels to find the most discriminative projection.
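The snippets below use `train_raw.df`, `test_raw.df`, `test_x`, and `test_y`, which the original post never defines. A minimal sketch of one way to build them from `iris` (the lowercase `species` column name, the seed, and the 70/30 split are assumptions):

```r
library(MASS)  # provides lda()

data(iris)
iris_df <- iris
names(iris_df)[5] <- "species"   # the post's code uses a lowercase column name

set.seed(42)                               # assumed seed, for reproducibility
idx <- sample(nrow(iris_df), 0.7 * nrow(iris_df))
train_raw.df <- iris_df[idx, ]             # 70% training set
test_raw.df  <- iris_df[-idx, ]            # 30% test set
test_x <- test_raw.df[, 1:4]               # test-set predictors
test_y <- test_raw.df$species              # test-set labels
```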
#LDA model
f <- paste(names(train_raw.df)[5], "~", paste(names(train_raw.df)[-5], collapse = " + "))  # build the model formula
iris_raw.lda <- lda(as.formula(f), data = train_raw.df)  # lda() comes from the MASS package
iris_raw.lda.predict <- predict(iris_raw.lda, newdata = test_raw.df)
# Use the LDA model to make predictions
pred_y<-iris_raw.lda.predict$class
# Plot the LDA predictions in discriminant space
ldaPreds <- iris_raw.lda.predict$x
head(ldaPreds)
library(dplyr)  # for %>% and mutate()
test_raw.df %>%
mutate(LD1 = ldaPreds[, 1],
LD2 = ldaPreds[, 2]) %>%
ggplot(aes(LD1, LD2, col = species)) +
geom_point() +
stat_ellipse() +
theme_bw()
# Build the confusion matrix and compute prediction accuracy
t <- table(pred_y, test_y)
acc1 <- sum(diag(t)) / nrow(test_x) * 100
print(paste0("Model prediction accuracy: ", round(acc1, 4), "%"))
# ROC / AUC (multi-class)
library(pROC)
lda_pre2 <- predict(iris_raw.lda, test_raw.df)  # predict.lda always returns posterior probabilities; it has no type argument
roc_lda <- multiclass.roc(test_y, lda_pre2$posterior)
auc(roc_lda)
- Decision tree
Decision trees come in two kinds, regression trees and classification trees, provided in R by the rpart package. For a classification tree the class label must be converted to a factor. When calling the predict function you can choose type = "prob" or type = "class"; the former returns posterior probabilities, which can then be used to compute ROC curve data.
#Decesion Tree
library(rpart)
library(rpart.plot)
library(caret)
train_raw.df$species <- factor(train_raw.df$species)  # convert the class label to a factor
tree = rpart(species ~ .,data = train_raw.df)# Classification tree model
summary(tree)
rpart.plot(tree,type = 2)# Draw decision tree
tree_pre1 <- predict(tree, test_raw.df, type = "class")  # predicted classes (the default for class trees returns probabilities)
t2 <- table(tree_pre1, test_y)
acc2 <- sum(diag(t2)) / nrow(test_x) * 100
print(paste0("Model prediction accuracy: ", round(acc2, 4), "%"))
tree_pre2 <- predict(tree, test_raw.df, type = "prob")  # matrix of class probabilities (not a list, so no $posterior)
roc_tree <- multiclass.roc(test_y, tree_pre2)
auc(roc_tree)
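As a small aside not in the original post, a fitted `rpart` tree also records how much each predictor contributed to the splits, in its `variable.importance` field:

```r
library(rpart)

data(iris)
tree <- rpart(Species ~ ., data = iris)
# Named numeric vector: total improvement in node impurity attributed to each predictor
sort(tree$variable.importance, decreasing = TRUE)
```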
- Classification algorithm evaluation metrics
To compute ROC (Receiver Operating Characteristic) curves for a multi-class algorithm, call the multiclass.roc function from the pROC package; for binary classification the roc function can be used directly. For multi-class problems you generally only obtain the AUC (multi-class area under the curve), i.e. the area enclosed between the ROC curve and the axes; its value lies between 0.5 and 1. The closer the AUC is to 1.0, the better the model's predictions; at 0.5 the model is no better than chance and has no practical value. To actually draw an ROC curve, you must restrict the problem to two class labels.
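A minimal sketch of such a two-class ROC curve with `pROC::roc` (dropping `setosa` and scoring by `Petal.Length` are illustrative choices, not from the post):

```r
library(pROC)

data(iris)
iris2c <- droplevels(iris[iris$Species != "setosa", ])  # keep two classes only
# roc(response, predictor): score each flower by a single numeric variable
roc_obj <- roc(iris2c$Species, iris2c$Petal.Length)
plot(roc_obj)   # draws the ROC curve
auc(roc_obj)
```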