R language [data management]
2022-07-05 21:04:00 【桜キャンドル yuan】
Catalog
1. Getting the type with mode()
2. Special values
3. Type conversion
4. Sorting data
5. Merging data sets
5.1 Direct horizontal merge
5.2 Merging on a common key
5.3 Vertical merge
5.4 Example
6. Getting subsets of a data set
6.1 Selecting variables (columns)
6.2 Dropping variables (columns)
6.3 Selecting observations (rows)
7. Selecting observations with subset()
8. Utility functions for inspecting data objects
9. Splitting a vector into groups
10. Applying a function to each list element
11. Applying a function to rows or columns
12. Applying a function to grouped data
13. Applying a function to groups of rows
14. User-defined functions
15. Loops
1. Getting the type with mode()
stu_name <- "Richard"
yield <- 100
mode(stu_name)
mode(yield)
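A short sketch of what mode() returns for a few other kinds of values; the extra values (TRUE, a list, the sum function) are illustrative additions, not part of the original test.

```r
# mode() reports how R stores a value
stu_name <- "Richard"
yield <- 100
mode(stu_name)         # "character"
mode(yield)            # "numeric"
mode(TRUE)             # "logical"
mode(list(1, "a"))     # "list"
mode(sum)              # "function"
```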
Vector operations with c()
v <- c(1,2,3)
v
v <- c(v,4,5)
v
w <- c(6,7,8,9)
v <- c(v,w)
v[15] <- 0
v
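The last step of the example, v[15] <- 0, assigns past the end of a 9-element vector. The sketch below checks what R does in that case: the vector is extended and the gap is filled with NA.

```r
v <- c(1, 2, 3)
v <- c(v, 4, 5)        # append two elements at the end
w <- c(6, 7, 8, 9)
v <- c(v, w)           # concatenate two vectors: 1 to 9
v[15] <- 0             # assigning past the end extends the vector
length(v)              # 15
which(is.na(v))        # 10 11 12 13 14: the gap is padded with NA
```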
append()
append() inserts elements into a vector. The after argument gives the position after which the new elements are inserted; after=0 inserts them at the front.
x <- 1:10
x
y <- append(x,100,after=5)
y
z <- append(x,100,after=0)
z
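A minimal check of the three placements of append(): after an index, at the front, and at the end (the default, after = length(x)).

```r
x <- 1:10
y <- append(x, 100, after = 5)   # 1 2 3 4 5 100 6 7 8 9 10
z <- append(x, 100, after = 0)   # 100 1 2 ... 10
e <- append(x, 100)              # default after = length(x): 1 ... 10 100
y
z
e
```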
Adding new columns: the example below adds one column holding the sum of x1 and x2 and another holding their mean. Each assignment to mydata$newname creates a new column in the data frame.
mydata <- data.frame( x1 = c(1,2,3,4), x2 = c(5,6,7,8))
mydata
mydata$sumx <- mydata$x1+mydata$x2
mydata
mydata$meanx <- (mydata$x1+mydata$x2)/2
mydata
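The sketch below verifies the computed columns and, as an alternative that is not in the original, uses base R's rowSums() and rowMeans(), which scale better when many columns are involved.

```r
mydata <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(5, 6, 7, 8))
mydata$sumx  <- mydata$x1 + mydata$x2        # element-wise sum, row by row
mydata$meanx <- (mydata$x1 + mydata$x2) / 2  # element-wise mean
# Same results with rowSums()/rowMeans() over a set of columns
alt_sum  <- rowSums(mydata[, c("x1", "x2")])
alt_mean <- rowMeans(mydata[, c("x1", "x2")])
mydata
```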
attach() and detach()
attach() and detach() are used in pairs: attach() adds the data frame to the search path so that its columns can be referred to by name (x1 instead of mydata$x1), and detach() removes it again.
attach(mydata)
mydata$sumx <- x1+x2
mydata$meanx <- (x1+x2)/2
detach(mydata)
transform()
transform() returns a copy of the data frame with the new or modified columns, so the result must be assigned back to keep the changes.
mydata <- transform(mydata, sumx=x1+x2, meanx=(x1+x2)/2)
mydata
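To make the "returns a copy" behaviour concrete, this sketch keeps the input under a separate (hypothetical) name and shows that only the returned data frame gains the new columns.

```r
base <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(5, 6, 7, 8))
# transform() builds a new data frame; base itself is not modified
with_t <- transform(base, sumx = x1 + x2, meanx = (x1 + x2) / 2)
names(base)     # "x1" "x2"
names(with_t)   # "x1" "x2" "sumx" "meanx"
```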
2. Special values
The test below shows the difference between NULL and NA: x <- NULL creates an object with no content (length 0), while y <- NA creates an object of length 1 whose single element is missing.
NA: a missing value, i.e. a value that should exist but is not available
NULL: the empty object; there is no value at all
Inf: infinity; 1/0 gives Inf
NaN: "not a number", a result that cannot be represented as a number; Inf-Inf gives NaN
x <- NULL
y <- NA
length(x)
length(y)
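A compact check of all four special values described above, including the detail that is.na() also returns TRUE for NaN while is.nan() is FALSE for NA.

```r
x <- NULL
y <- NA
length(x)       # 0: NULL is the empty object
length(y)       # 1: one element whose value is missing
1 / 0           # Inf
Inf - Inf       # NaN
is.na(NaN)      # TRUE: is.na() also flags NaN
is.nan(NA)      # FALSE: NA is not NaN
```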
Checking for NA with is.na()
y <- c(1,2,3,NA)
is.na(y)
Arithmetic involving NA returns NA, so both x and z below are NA. Summary functions such as sum() accept na.rm=T (na.rm=TRUE) to drop the NA values before computing.
x <- y[1]+y[2]+y[3]+y[4]
x
z <- sum(y)
z
l <- sum(y, na.rm=T)
l
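The same idea with explicit expected results; mean() takes the same na.rm argument, and sum(is.na(y)) is a common way to count missing values.

```r
y <- c(1, 2, 3, NA)
sum(y)                  # NA: any arithmetic with NA gives NA
sum(y, na.rm = TRUE)    # 6
mean(y, na.rm = TRUE)   # 2
sum(is.na(y))           # 1: number of missing values
```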
na.omit()
na.omit() removes every observation (row) that contains at least one NA.
mydata <- read.table("/Users/yangkailiang/Documents/R/data/leadership.csv",
header=T, sep=",")
mydata
newdata <- na.omit(mydata)
newdata
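The leadership.csv file is not reproduced here, so this sketch uses a small hypothetical data frame instead. It shows which rows na.omit() keeps and uses complete.cases(), which returns the same selection as a logical vector.

```r
# Hypothetical stand-in for the leadership data
df <- data.frame(age = c(32, NA, 25, 39),
                 q1  = c(5, 3, NA, 4))
complete <- na.omit(df)   # rows 2 and 3 contain an NA and are dropped
complete
complete.cases(df)        # TRUE FALSE FALSE TRUE
```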
3. Type conversion
| Check whether it is this type | Convert to this type |
|---|---|
| is.character() | as.character() |
| is.vector() | as.vector() |
| is.matrix() | as.matrix() |
| is.data.frame() | as.data.frame() |
| is.factor() | as.factor() |
| is.logical() | as.logical() |
| is.numeric() | as.numeric() |
a <- c(1,2,3)
a
is.numeric(a)
is.vector(a)
a <- as.character(a)
is.numeric(a)
is.vector(a)
is.character(a)
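One conversion from the table above deserves care: calling as.numeric() on a factor returns the internal level codes, not the values shown. The sketch below (with illustrative values) shows the round trip through character and the NA produced when a string cannot be parsed.

```r
a <- c(1, 2, 3)
b <- as.character(a)          # "1" "2" "3"
as.numeric(b)                 # back to 1 2 3
f <- as.factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1: the level codes, not the values
as.numeric(as.character(f))   # 10 20 10: convert via character first
as.numeric("abc")             # NA, with a warning
```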
4. Sorting data
sort()/order()
The test below shows that sort() does not modify its argument: x keeps its original order, and the sorted values are kept only if the result is assigned to a variable (here y). The default order is ascending.
x <- c(12,4,7,11,2)
sort(x)
x
y <- sort(x)
y
z <- order(x)
x[z]
Alternatively, order() can be used. It does not return the sorted values but the indices that would sort x, so x[z] is needed to obtain the sorted data. Here order(x) is 5 2 3 4 1: the smallest element is x[5], the next is x[2], then x[3], x[4] and finally x[1].
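A sketch that checks the order() result above and contrasts it with rank(). For this particular x the two happen to coincide, so a second illustrative vector u is used to show that order() gives positions to read from, while rank() gives the rank of each element.

```r
x <- c(12, 4, 7, 11, 2)
order(x)                     # 5 2 3 4 1: x[5] is smallest, then x[2], ...
x[order(x)]                  # 2 4 7 11 12, same as sort(x)
sort(x, decreasing = TRUE)   # 12 11 7 4 2
# With another vector, order() and rank() differ
u <- c(30, 10, 20)
order(u)                     # 2 3 1
rank(u)                      # 3 1 2
```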
Sorting a data frame by its columns
# Sort by the age column (ascending)
newdata <- mydata[order(mydata$age),]
# Sort by age in descending order
newdata2 <- mydata[order(mydata$age,decreasing=T),]
# Sort by gender first, then by age in ascending order
newdata3 <- mydata[order(mydata$gender,mydata$age),]
attach(mydata)
# Sort by gender first, then by age in descending order (the minus sign works for numeric columns)
newdata4 <- mydata[order(gender,-age),]
newdata4
detach(mydata)
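A self-contained version of the multi-key sort, using a small hypothetical stand-in for mydata. It avoids attach() by writing df$ explicitly.

```r
# Hypothetical stand-in for mydata
df <- data.frame(gender = c("M", "F", "M", "F"),
                 age    = c(32, 45, 25, 39))
by_age      <- df[order(df$age), ]                       # ascending age
by_age_desc <- df[order(df$age, decreasing = TRUE), ]    # descending age
by_g_age    <- df[order(df$gender, -df$age), ]           # gender, then age descending
by_g_age
```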
5. Merging data sets
5.1 Direct horizontal merge
cbind() places the columns of B next to those of A without matching on any key, so A and B must have the same number of rows.
C = cbind(A,B)
5.2 Merging on a common key
merge() matches rows whose values in the by column(s) are equal; by default only rows present in both tables are kept.
C = merge(A,B, by="key")   # key: the name of the column shared by A and B
C = merge(A,B, by="ID")
C = merge(A,B, by=c("ID","Country"))
5.3 Vertical merge
To stack A on top of B, use rbind(). A and B must have exactly the same variables, although the column order may differ.
C = rbind(A,B)
5.4 Example
The examples below use four data frames, data1 to data4.
# Combine two tables side by side (left and right)
cdata <- cbind(data1,data3)
data5 <- cbind(data1,data4)
# merge() matches rows automatically on the columns the two tables have in common
data6 <- merge(data1,data2)
data7 <- merge(data1,data4)
# Merge on the kids column; the result is sorted by kids
data8 <- merge(data1,data4,by="kids")
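Since data1 to data4 are not shown, this sketch uses two small hypothetical tables A and B to show all three kinds of merge: cbind() without key matching, merge() as an inner join on ID, and rbind() matching columns by name even when their order differs.

```r
# Hypothetical tables sharing an ID column
A <- data.frame(ID = c(1, 2, 3), x = c("a", "b", "c"))
B <- data.frame(ID = c(3, 1, 4), y = c(TRUE, FALSE, TRUE))
side    <- cbind(A, z = c(10, 20, 30))            # horizontal: no key, row counts must agree
joined  <- merge(A, B, by = "ID")                 # inner join: only IDs 1 and 3 are in both
stacked <- rbind(A, data.frame(x = "d", ID = 4))  # vertical: columns matched by name
joined
stacked
```

merge() also takes all = TRUE to keep unmatched rows from both tables, filling the missing cells with NA.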
6. Getting subsets of a data set
6.1 Selecting variables (columns)
# Select all rows of mydata and columns 1, 2 and 4
newdata <- mydata[,c(1,2,4)]
# Select all rows of mydata and the managerID, country and age columns
newdata2 <- mydata[,c("managerID","country","age")]
6.2 Dropping variables (columns)
mydata
# Method 1
# Drop the first, second and fourth columns of mydata
newdata <- mydata[,c(-1,-2,-4)]
newdata
# Method 2
# Drop the managerID, country and age columns
# %in% returns a logical vector: TRUE for each column name that appears in the list
delvar <- names(mydata) %in% c("managerID","country","age")
newdata2 <- mydata[!delvar]
6.3 Selecting observations (rows)
# Select rows 1 to 3 and all columns
newdata <- mydata[1:3,]
newdata
# Select observations where gender is M and age is greater than 30
newdata1 <- mydata[which(mydata$gender=="M" & mydata$age>30),]
newdata1
attach(mydata)
# Same selection, with mydata attached so the columns can be named directly
newdata2 <- mydata[which(gender=="M" & age>30),]
newdata2
detach(mydata)
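The column and row selections above, run on a small hypothetical stand-in for mydata (the column names follow the article; the values are made up).

```r
# Hypothetical stand-in for mydata
mydata <- data.frame(managerID = 1:4,
                     country   = c("US", "UK", "US", "UK"),
                     gender    = c("M", "F", "M", "F"),
                     age       = c(32, 45, 25, 39),
                     q1        = c(5, 3, 3, 4))
keep    <- mydata[, c("managerID", "age")]         # select columns by name
dropped <- mydata[, -c(1, 2)]                      # drop columns 1 and 2
delvar  <- names(mydata) %in% c("country", "q1")   # TRUE for the columns to drop
kept    <- mydata[!delvar]
rows    <- mydata[which(mydata$gender == "M" & mydata$age > 30), ]
rows
```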
7. Selecting observations with subset()
# Select observations in mydata with age >= 35 or age < 24, keeping only columns q1, q2 and q3
newdata <- subset(mydata, age>=35 | age < 24,select=c("q1","q2","q3"))
newdata
# Select observations with gender M and age > 25, keeping all columns from gender through q4
newdata2 <- subset(mydata, gender == "M" & age >25,select=gender:q4)
newdata2
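A runnable version of both subset() calls on hypothetical data. Because this stand-in has no q4 column, the second call selects the range gender:q2 instead.

```r
# Hypothetical stand-in for mydata (no q4 column here)
mydata <- data.frame(gender = c("M", "F", "M", "F"),
                     age    = c(32, 45, 22, 39),
                     q1 = c(5, 3, 3, 4), q2 = c(4, 5, 2, 3), q3 = c(1, 2, 5, 4))
sub1 <- subset(mydata, age >= 35 | age < 24, select = c("q1", "q2", "q3"))
sub2 <- subset(mydata, gender == "M" & age > 25, select = gender:q2)
sub1   # rows 2 and 3
sub2   # row 1, columns gender, age, q1, q2
```

Unlike mydata[cond, ], subset() treats rows where the condition evaluates to NA as not selected.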
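8. Utility functions for inspecting data objects
length(newdata)   # number of elements; for a data frame, the number of columns
dim(newdata)      # number of rows and columns
# str() (short for "structure") gives a compact overview of the object
str(newdata)
class(newdata)    # class of the object, e.g. "data.frame"
mode(newdata)     # storage mode; a data frame is stored as a "list"
names(newdata)    # column names
head(newdata)     # first 6 rows
tail(newdata)     # last 6 rows
ls(newdata)       # column names in sorted order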
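The expected results of these functions on a tiny illustrative data frame with 3 rows and 2 columns.

```r
df <- data.frame(q1 = c(5, 3, 3), q2 = c(4, 5, 2))
length(df)   # 2: number of columns
dim(df)      # 3 2
class(df)    # "data.frame"
mode(df)     # "list"
names(df)    # "q1" "q2"
str(df)      # 'data.frame': 3 obs. of 2 variables
```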
9. Splitting a vector into groups
split() divides a vector (or a data frame) into groups defined by a factor and returns a list with one element per group.
library(MASS)
groups<-split(Cars93$MPG.city,Cars93$Origin)
groups[[1]]
groups[[2]]
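The same pattern without the MASS package, using made-up mileage values. The factor levels are set explicitly so the order of the groups does not depend on the locale.

```r
values <- c(21, 18, 30, 25, 19)
origin <- factor(c("USA", "non-USA", "USA", "non-USA", "USA"),
                 levels = c("USA", "non-USA"))
groups <- split(values, origin)   # list with one element per level
groups[["USA"]]                   # 21 30 19
groups[["non-USA"]]               # 18 25
sapply(groups, mean)              # mean of each group
```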
10. Applying a function to each list element
lapply() always returns a list.
sapply() simplifies the result where possible, returning a vector (or a matrix) instead of a list.
scores <- list(S1=numeric(0),S2=numeric(0),S3=numeric(0),S4=numeric(0))
scores$S1 <- c(89,85,85,62,93,77,85)
scores$S2 <- c(60,100,83,77,86)
scores$S3 <- c(95,86,91,82,63,67,97,64,55)
scores$S4 <- c(67,63,83,89)
lapply(scores,length)
sapply(scores,mean)
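A shortened version of the scores list that shows the three shapes of result: a list from lapply(), a named vector from sapply() when each result has length 1, and a matrix from sapply() when each result has length 2.

```r
scores <- list(S1 = c(89, 85, 85), S2 = c(60, 100))
lapply(scores, length)   # a list: $S1 3, $S2 2
sapply(scores, length)   # a named integer vector: S1 3, S2 2
sapply(scores, range)    # each result has length 2, so this is a 2 x 2 matrix
```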
11. Applying a function to rows or columns
apply()
results <- apply(X, 1, FUN)   # apply FUN to each row of the matrix X
results <- apply(X, 2, FUN)   # apply FUN to each column of the matrix X
apply() also accepts a data frame, but it converts it to a matrix first, so it is only suitable when all columns have the same type.
If the columns of a data frame have different types, use lapply() or sapply() from the previous section instead.
Mat_long <- matrix(c(-1.85,0.94,-0.54,-1.41,1.35,-1.71,-1.01,-0.16,-0.35,-3.72,1.63,-0.28,-0.28,2.45,-1.22),nrow=3,dimnames=list(c("Moe", "Larry","Curly"),c("trial1","trial2","trial3","trial4","trial5")))
apply(Mat_long,1,mean)
apply(Mat_long,1,range)
apply(Mat_long,2,mean)
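A small integer matrix makes the row/column results easy to check by hand. When FUN returns more than one value (as range() does), apply() puts one column in the result per row of the input.

```r
m <- matrix(1:6, nrow = 2)   # columns: (1,2), (3,4), (5,6)
apply(m, 1, sum)             # row sums: 9 12
apply(m, 2, mean)            # column means: 1.5 3.5 5.5
apply(m, 1, range)           # 2 x 2: one column per row of m
rowSums(m)                   # built-in equivalent of apply(m, 1, sum)
```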
12. Applying a function to grouped data
tapply(vector,factor,function)
vector is the data vector, factor is a grouping factor of the same length, and function is the function applied to the elements of vector in each group.
wagedata <- read.csv("/Users/Documents/R/data/wagedata.csv")
attach(wagedata)
# Group educ by female, then compute the mean of each group
tapply(educ,female,mean)
detach(wagedata)
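educ   # after detach(), educ is no longer on the search path, so this fails unless educ is defined elsewhere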
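Since wagedata.csv is not included, this sketch uses four hypothetical observations. The result has one entry per group, named by the group value.

```r
# Hypothetical values for educ and female
educ   <- c(12, 16, 10, 14)
female <- c(0, 1, 0, 1)
res <- tapply(educ, female, mean)
res   # group "0": 11, group "1": 15
```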
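13. Applying a function to groups of rows
by(dataframe,factor,function)
dataframe is a data frame, factor is a grouping factor, and function is applied to the rows of dataframe that belong to each group.
wagedata <- read.csv("/Users/Documents/R/data/wagedata.csv")
# wagedata is no longer attached, so female is referenced as wagedata$female;
# mean() does not accept a whole data frame, so colMeans() is used (it assumes every column is numeric)
by(wagedata, wagedata$female, colMeans)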
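A self-contained by() example on a hypothetical data frame. Each group is passed to colMeans() as a data frame of rows, and the result is indexed by group in level order.

```r
# Hypothetical stand-in for wagedata
df <- data.frame(female = c(0, 1, 0, 1),
                 educ   = c(12, 16, 10, 14),
                 wage   = c(5, 7, 3, 9))
res <- by(df[, c("educ", "wage")], df$female, colMeans)
res
res[[1]]   # group 0: educ 11, wage 4
res[[2]]   # group 1: educ 15, wage 8
```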
14. User-defined functions
myfunction <- function(arg1,arg2,...){
statements
return(object)
}
# Example: the third side of a right triangle by the Pythagorean theorem
# x^2 squares each component of the vector; the square root of the sum of the squares is the length of the third side (the hypotenuse)
lengththm <- function(x) {
thirdl <- sqrt(sum(x^2))
return(thirdl)
}
a <- c(3,4)
lengththm(a)
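A further sketch of the function template with a default argument and an input check. vec_norm and its p parameter are hypothetical names chosen for illustration; with the default p = 2 it computes the same length as lengththm().

```r
# Hypothetical example: p = 2 gives the Euclidean length, p = 1 the sum of absolute values
vec_norm <- function(x, p = 2) {
  if (!is.numeric(x)) stop("x must be numeric")
  result <- sum(abs(x)^p)^(1 / p)
  return(result)
}
vec_norm(c(3, 4))          # 5, same as lengththm(c(3, 4))
vec_norm(c(3, 4), p = 1)   # 7
```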
15. Loops
for (var in seq) {
statements
}
while (condition) {
statements
}
x <- c(2,5,10)
for (n in x)
print(n^2)
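A sketch that fills in both templates: a for loop that stores results in a preallocated vector (equal to the vectorized x^2), and a while loop that doubles n until the condition fails.

```r
x <- c(2, 5, 10)
squares <- numeric(length(x))   # preallocate the result
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}
squares                         # 4 25 100, same as x^2
n <- 1
while (n < 100) {               # keep doubling until n is at least 100
  n <- n * 2
}
n                               # 128
```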