
R Language Book Notes 03, "R Language Data Analysis in Plain Language": Chapter 10 Association Rules; Chapter 11 Random Forests

2022-06-11 21:52:00 Deep bamboo breeze

1. Association rules

1.1 Overview

Association rule mining is one of the simpler approaches to large-scale data problems and belongs to the family of unsupervised learning algorithms. It identifies patterns without requiring any prior knowledge of those patterns, and can be applied to scenarios such as personalized recommendation.

An association rule has two parts: the antecedent (the premise) and the consequent (the result). The antecedent is an item or itemset found in the data; the consequent is an item that appears in combination with the antecedent. By analyzing the data for frequent if/then patterns and using support and confidence to identify the most important relationships, we can create association rules.
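To make the two metrics concrete, here is a hand-rolled sketch in base R (not the arules implementation): support is the fraction of transactions containing an itemset, and the confidence of a rule X => Y is support(X and Y) divided by support(X). The toy baskets below are invented for illustration.

```r
# Toy transactions: each element is the set of items in one basket
transactions <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread"),
  c("milk", "butter"),
  c("milk", "bread")
)

# Support of an itemset: fraction of transactions containing all its items
support <- function(items) {
  mean(sapply(transactions, function(t) all(items %in% t)))
}

# Confidence of rule X => Y: support(X and Y) / support(X)
confidence <- function(x, y) support(c(x, y)) / support(x)

support("milk")               # 4 of 5 baskets contain milk: 0.8
support(c("milk", "bread"))   # 3 of 5 baskets contain both: 0.6
confidence("milk", "bread")   # 0.6 / 0.8 = 0.75
```

A rule like {milk} => {bread} is then reported as interesting when both numbers clear the thresholds passed to the mining function.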

1.2 Code implementation

First, use summary() to display details of the data set, inspect() to view individual transactions, and itemFrequency() to see how often each item appears.
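A minimal sketch of these exploration steps with the arules package (assuming it is installed), using the Groceries transaction data set bundled with the package:

```r
library(arules)

data("Groceries")  # 9835 grocery-store transactions shipped with arules

summary(Groceries)           # number of transactions, most frequent items, etc.
inspect(head(Groceries, 3))  # view the first three transactions
head(itemFrequency(Groceries))             # relative frequency of the first few items
itemFrequencyPlot(Groceries, topN = 10)    # bar chart of the 10 most frequent items
```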

Use the eclat() function to find frequent itemsets. The first argument is the data set to mine; the parameter argument specifies the mining settings, such as the minimum support and the maximum itemset length.

freq.itemsets <- eclat(data, parameter = list(supp = 0.075, maxlen = 15))

Then use the apriori() function to build association rules. The first argument is the data set; parameter specifies the corresponding thresholds, such as the minimum support and confidence a rule must satisfy.

gro <- apriori(data, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))
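As a runnable sketch (assuming the arules package is installed), the same parameter settings can be applied to the bundled Groceries data, and the resulting rules ranked by lift to surface the strongest associations:

```r
library(arules)
data("Groceries")

# Mine rules with the thresholds used above
gro <- apriori(Groceries,
               parameter = list(support = 0.006, confidence = 0.25, minlen = 2))

summary(gro)                              # rule count, length distribution, metric ranges
inspect(head(sort(gro, by = "lift"), 5))  # the five rules with the highest lift
```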

Then use the summary() function to view and evaluate the resulting rules.

1.3 visualization

The arulesViz package can be used to visualize association rules.

library(arulesViz)
plot(gro, method = "graph")

The treemap package can be used to draw treemap charts.
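One way to turn item frequencies into a treemap (a sketch assuming both arules and treemap are installed; the column names item and freq are chosen here purely for illustration):

```r
library(arules)
library(treemap)

data("Groceries")
freq <- itemFrequency(Groceries)  # named vector of relative item frequencies
df <- data.frame(item = names(freq), freq = as.numeric(freq))

# Each tile's area is proportional to the item's frequency in the data
treemap(df, index = "item", vSize = "freq", title = "Item frequency treemap")
```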

1.4 Summary

Association rules are used to discover potential relationships in data. These relationships differ from those found by clustering algorithms, which measure the relationship between data points by similarity; association rules instead carry a certain if/then, cause-and-effect flavor.

2. Random forests

2.1 Basic concepts

A random forest is an ensemble machine learning model and a special case of the Bagging algorithm. It consists of multiple decision trees, which were covered in an earlier book in this series (Notes 01).

2.2 Code implementation

The most common package for building random forests is randomForest, whose randomForest() function builds the model. Its three key arguments are formula (the model formula), ntree (the number of decision trees in the forest), and mtry (the number of features sampled as split candidates at each tree node).

library(randomForest)
library(tidyverse)

tmp <- data.frame(x1 = runif(100, 0, 1), x2 = runif(100, 0, 1),
                  x3 = runif(100, 0, 1), x4 = runif(100, 0, 1),
                  x5 = runif(100, 0, 1), x6 = runif(100, 0, 1),
                  x7 = runif(100, 0, 1), x8 = runif(100, 0, 1),
                  y = sample(c(1, 0), 100, replace = TRUE))
tmp$y <- as.factor(tmp$y)

rf <- randomForest(y ~ ., data = tmp, proximity = TRUE)

Variable importance in a random forest can be visualized with the varImpPlot() function; calling plot() directly on the model instead plots the error rates as trees are added.
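A self-contained sketch of these inspection steps (assuming the randomForest package is installed; it re-fits a small forest on simulated data like the example above so it can run on its own):

```r
library(randomForest)

set.seed(1)
# Eight uniform predictors and a random binary label, as in the book's example
tmp <- data.frame(matrix(runif(800), ncol = 8))
names(tmp) <- paste0("x", 1:8)
tmp$y <- factor(sample(c(0, 1), 100, replace = TRUE))

rf <- randomForest(y ~ ., data = tmp)

importance(rf)  # the numbers behind the plot: mean decrease in Gini per predictor
varImpPlot(rf)  # dot chart of variable importance
plot(rf)        # OOB error rate as a function of the number of trees
```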

Use the tuneRF() function to search for the optimal number of features (mtry). Here x is the feature matrix used to train the model, y is the label, mtryStart is the mtry value at which the search begins, ntreeTry is the number of trees grown at each step, and stepFactor is the factor by which mtry is inflated or deflated at each step.

set.seed(1)
mtry <- tuneRF(x = tmp[, -9], y = tmp[, 9])
pre <- predict(rf, newdata = tmp, type = "prob")

 


Copyright notice: this article was created by [Deep bamboo breeze]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/162/202206112138256556.html