R language classification
2022-07-03 10:23:00 【Small tear nevus of atobe】
Problems encountered in R

%*% — the matrix multiplication operator

- PCA principal component analysis
#1 Import data
data(iris)# Import the built-in data set directly
head(iris)
#2 Center each variable (subtract its mean) and standardize it (divide by its standard deviation)
iris2 <- scale(iris[, 1:4], center = TRUE, scale = TRUE)
head(iris2)
#3 Compute the correlation matrix (equal to the covariance matrix here, since the data are standardized)
cm1 <- cor(iris2)
cm1
#4 Eigen-decompose the correlation matrix to get eigenvalues and eigenvectors
rs1<-eigen(cm1)
rs1
eigenvalues <- rs1$values
eigenvector2 <- as.matrix(rs1$vectors)
#5 Compute the variance contribution of each principal component
(Proportion_of_Variance <- eigenvalues/sum(eigenvalues))
(Cumulative_Proportion <- cumsum(Proportion_of_Variance))
# Draw the scree plot
par(mar=c(6,6,2,2))
plot(rs1$values,type="b",
cex=2,
cex.lab=2,
cex.axis=2,
lty=2,
lwd=2,
xlab = "Principal components",
ylab="Eigenvalues")
# Compute the principal component scores
dt<-as.matrix(iris2)
PC <- dt %*% eigenvector2
colnames(PC) <- c("PC1","PC2","PC3","PC4")
head(PC)
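Since every step above is also implemented by base R's prcomp(), the manual computation can be cross-checked against it. A small sanity-check sketch (not part of the original post; eigenvector signs are arbitrary, so the score comparison uses absolute values):

```r
# Cross-check the manual eigen-decomposition PCA against prcomp()
data(iris)
iris2 <- scale(iris[, 1:4], center = TRUE, scale = TRUE)
rs1 <- eigen(cor(iris2))
PC_manual <- as.matrix(iris2) %*% rs1$vectors
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
# Eigenvalues of the correlation matrix equal the squared sdev from prcomp()
stopifnot(isTRUE(all.equal(rs1$values, pc$sdev^2)))
# Scores agree column by column, up to sign
stopifnot(isTRUE(all.equal(abs(unname(PC_manual)), abs(unname(pc$x)))))
```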
# Combine the principal component scores with the class labels
iris3 <- data.frame(PC, Species = iris$Species)
head(iris3)
# Axis labels showing the variance contribution of the first two principal components
xlab<-paste0("PC1(",round(Proportion_of_Variance[1]*100,2),"%)")
ylab<-paste0("PC2(",round(Proportion_of_Variance[2]*100,2),"%)")
# Scatter plot of the first two principal components, colored by class
library(ggplot2)
p1 <- ggplot(data = iris3, aes(x = PC1, y = PC2, color = iris3[, 5])) +
  stat_ellipse(aes(fill = iris3[, 5]),
               type = "norm", geom = "polygon", alpha = 0.2, color = NA) +
  geom_point() + labs(x = xlab, y = ylab, color = "") +
  guides(fill = "none")
p1
- LDA discriminant analysis
This LDA (linear discriminant analysis) is distinct from the LDA used in text mining (latent Dirichlet allocation). For classification, the LDA model projects the data onto discriminant axes chosen so that the between-class variance is as large as possible and the within-class variance as small as possible. The number of discriminant axes is at most min(number of classes - 1, number of predictor variables), so LDA performs dimensionality reduction while classifying: the discriminant axes are the new dimensions.
The difference from the PCA method above is that PCA simply finds the directions of greatest overall variance and never uses the class labels, whereas LDA is supervised and chooses its axes using the labels.
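The code below uses train_raw.df, test_raw.df, test_x, and test_y without showing how they were created. A hypothetical 70/30 split matching those names (the seed and proportions are assumptions, not from the original post) could look like:

```r
# Hypothetical train/test split; the original post omits this step
set.seed(42)                               # assumed seed, for reproducibility
data(iris)
names(iris)[5] <- "species"                # the later code uses lowercase 'species'
idx <- sample(nrow(iris), round(0.7 * nrow(iris)))
train_raw.df <- iris[idx, ]
test_raw.df  <- iris[-idx, ]
test_x <- test_raw.df[, 1:4]
test_y <- test_raw.df$species
```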
#LDA model (requires the MASS package)
library(MASS)
f <- paste(names(train_raw.df)[5], "~", paste(names(train_raw.df)[-5], collapse = " + "))  # build the model formula
iris_raw.lda <- lda(as.formula(f), data = train_raw.df)
iris_raw.lda.predict <- predict(iris_raw.lda, newdata = test_raw.df)
# Predicted class labels from the LDA model
pred_y <- iris_raw.lda.predict$class
# Plot the test samples in the discriminant (LD1/LD2) space
library(dplyr)    # for %>% and mutate()
library(ggplot2)
ldaPreds <- iris_raw.lda.predict$x
head(ldaPreds)
test_raw.df %>%
mutate(LD1 = ldaPreds[, 1],
LD2 = ldaPreds[, 2]) %>%
ggplot(aes(LD1, LD2, col = species)) +
geom_point() +
stat_ellipse() +
theme_bw()
# Confusion matrix and prediction accuracy
t1 <- table(pred_y, test_y)
acc1 <- sum(diag(t1)) / nrow(test_raw.df) * 100
print(paste0("Model prediction accuracy: ", round(acc1, 4), "%"))
#ROC / AUC (requires the pROC package; predict.lda always returns posteriors in $posterior)
library(pROC)
lda_pre2 <- predict(iris_raw.lda, test_raw.df)
roc_lda <- multiclass.roc(test_y, lda_pre2$posterior)
auc(roc_lda)
- Decision tree
Decision trees come in two kinds, regression trees and classification trees; in R they are provided by the rpart package. For a classification tree the class label must be converted to a factor. When calling predict() afterwards, type can be set to "prob" or "class": "prob" returns the posterior probabilities needed to compute ROC curve data, while "class" returns hard class labels.
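Before the full script, a minimal self-contained sketch of the two predict() modes just mentioned (fitting on the whole iris data purely for illustration):

```r
# The two prediction modes of a classification tree built with rpart
library(rpart)
data(iris)
fit <- rpart(Species ~ ., data = iris)
prob <- predict(fit, iris, type = "prob")   # posterior probability matrix (for ROC)
cls  <- predict(fit, iris, type = "class")  # hard class labels (for accuracy)
head(prob)
# Resubstitution accuracy (optimistic, since we predict the training data)
mean(cls == iris$Species)
```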
#Decision Tree
library(rpart)
library(rpart.plot)
library(caret)
train_raw.df$species <- factor(train_raw.df$species)  # convert the class label to a factor
tree = rpart(species ~ .,data = train_raw.df)# Classification tree model
summary(tree)
rpart.plot(tree,type = 2)# Draw decision tree
tree_pre1 <- predict(tree, test_raw.df, type = "class")  # predicted class labels
t2 <- table(tree_pre1, test_y)
acc2 <- sum(diag(t2)) / nrow(test_raw.df) * 100
print(paste0("Model prediction accuracy: ", round(acc2, 4), "%"))
tree_pre2 <- predict(tree, test_raw.df, type = "prob")   # posterior probability matrix
roc_tree <- multiclass.roc(test_y, tree_pre2)            # predict() returns a matrix here, not a $posterior element
auc(roc_tree)
- Classification algorithm evaluation index
To compute ROC (receiver operating characteristic) curves for a multi-class algorithm, call the multiclass.roc function from the pROC package; for a binary classifier the roc function can be used directly. For multi-class problems the computation generally yields only the AUC (multi-class area under the curve), i.e. the area enclosed between the ROC curve and the axes. For a useful classifier the AUC lies between 0.5 and 1: the closer it is to 1.0, the better the model's predictions; at 0.5 the model is no better than random guessing and has no practical value. To actually draw an ROC curve, two class labels must be specified.
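For the binary case, a minimal pROC sketch (the logistic model and variable choices here are illustrative assumptions, not from the post; roc() is used directly since there are only two labels):

```r
# Binary ROC/AUC with pROC: setosa vs. the rest, scored by a logistic model
library(pROC)
data(iris)
y <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))
fit <- glm(y ~ Sepal.Length + Sepal.Width, data = iris, family = binomial)
p <- predict(fit, type = "response")
r <- roc(y, p, quiet = TRUE)
auc(r)     # area under the curve
plot(r)    # the curve itself can be drawn in the two-class case
```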