
(4) Data visualization in R: histograms, bar plots, pie charts, scatter plots with linear regression, box plots, and strip charts

2022-07-06 12:21:00 EricFrenzy

Note: This blog shares my personal learning experience. Please forgive any mistakes or irregularities!

Histogram

In statistics, histograms are often used to show the distribution of a continuous variable (for example, measurements such as length or weight). In R, histograms are drawn with the hist() function, as in the following example:

# hist(): the first argument is the data; here it is the column named length in the data frame sampleData (the author's data, not defined in this post)
# main is the plot title; xlab and ylab are the x- and y-axis titles
# freq = TRUE plots frequencies (counts); freq = FALSE plots probability densities
# breaks = 20 asks for about 20 bars; a single number is only a suggestion, which R may adjust to get "pretty" break points
# xlim and ylim set the plotted ranges of the x and y axes
hist(sampleData$length, main = "Histogram of Protein Lengths",
	xlab = "Length (AA)", ylab = "Frequency", freq = TRUE, breaks = 20,
	xlim = c(0, 10000), ylim = c(0, 7000))

The resulting plot:
[Figure: histogram of protein lengths]
The plot shows that most protein lengths are below 2000, a clearly positively (right-) skewed distribution.
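The author's sampleData is not available here, so the following sketch uses a tiny made-up vector (x, hypothetical) to show what the breaks and freq arguments actually compute. With plot = FALSE, hist() returns the bins instead of drawing them. Passing an explicit vector to breaks fixes the bin edges exactly, whereas a single number such as breaks = 20 is only a suggestion.

```r
# Hypothetical toy data, used only to illustrate what hist() computes
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# An explicit vector of break points fixes the bin edges exactly;
# by default bins are right-closed: [0,1], (1,2], (2,3], (3,4], (4,5]
h <- hist(x, breaks = 0:5, plot = FALSE)
print(h$counts)   # bar heights that freq = TRUE would draw
print(h$density)  # bar heights that freq = FALSE would draw: counts / (n * bin width)

# With freq = FALSE the total area of the bars is 1
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```

Replacing plot = FALSE with plot = TRUE (the default) draws the same bins.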

Bar plot

In statistics, bar plots are often used to show the distribution of discrete (non-continuous) variables, such as counts or amounts of money. In R, bar plots are drawn with the barplot() function, as in the following example:

# barplot(): the first argument is again the data; here it is the column named spending in the data frame education
# names.arg gives the label under each bar (the x values); this vector holds the year of each spending value
# ylab is the y-axis title; main is the plot title; ylim is the y-axis range
# width sets the width of each bar and space the gap between bars (neither is used below)
barplot(education$spending,
			names.arg = education$year,
			ylab = "Spending per student($)",
			main = "Education Spending per Student",
			ylim = c(0, 7000))

The resulting plot:
[Figure: bar plot of education spending per student by year]
The sample data show that education spending per student increases year by year.
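The comments above mention width and space, but the example does not use them. Here is a minimal sketch with hypothetical numbers (the heights and labels are made up, not the author's education data). barplot() invisibly returns the x positions of the bar midpoints, which is handy for writing values above the bars with text(). space is measured as a fraction of the average bar width, so with width = 1 and space = 0.5 each bar starts half a bar width after the previous one ends.

```r
# Hypothetical values, for illustration only
heights <- c(3, 5, 7)
labels  <- c("A", "B", "C")

# barplot() returns the x positions of the bar midpoints
bp <- barplot(heights, names.arg = labels,
              width = 1, space = 0.5,
              ylim = c(0, max(heights) * 1.2),  # leave room for the text labels
              main = "barplot() with width and space")

# Write each value just above its bar (pos = 3 means "above")
text(x = as.vector(bp), y = heights, labels = heights, pos = 3)

print(as.vector(bp))
```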

Pie chart

A pie chart gives an intuitive view of the proportions of different categories. R's built-in pie() function is not very powerful. A pie chart is drawn in R as in the following example:

# pie(): the first argument is the data; it can be each category's share of the total or its raw count
# (pie() rescales the values by their sum, so they do not need to add up to 100)
# labels is a vector of text placed outside each slice
# main is the plot title
# col sets the slice colors; rainbow() is a built-in R function that returns the requested number of colors spread across the rainbow
# clockwise sets the drawing direction; init.angle sets the angle (in degrees) at which the first slice starts
percentage <- c(36, 8, 17, 52)
desserts <- c("Ice cream", "Cookie", "Brownie", "Cake")
pie(percentage, labels = desserts, 
		main = "Dessert Preference", 
		col = rainbow(length(percentage)),
		clockwise = FALSE, init.angle = 0)

The resulting plot:
[Figure: pie chart of dessert preferences]
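pie() draws each slice as its value divided by the sum of all values. The four numbers in the example add up to 113 rather than 100, so the slice labeled Cake actually covers about 46% of the pie, not 52%. The sketch below computes the real shares and puts them into the labels. The share variable and the label format are my own choices.

```r
percentage <- c(36, 8, 17, 52)
desserts <- c("Ice cream", "Cookie", "Brownie", "Cake")

# pie() draws value / sum(values); here sum(percentage) is 113
share <- round(100 * percentage / sum(percentage), 1)
print(share)

# Show each slice's actual share of the pie in its label
pie(percentage,
    labels = paste0(desserts, " (", share, "%)"),
    main = "Dessert Preference",
    col = rainbow(length(percentage)))
```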

Scatter plot

Scatter plots are often used in scientific experiments. They visualize an independent variable against a dependent variable, and further analysis can build on them. In R, scatter plots are drawn with the plot() function, as in the following example:

a <- c(12, 17, 25, 33, 39, 45) # independent variable (x)
b <- c(10, 13, 17, 20, 26, 31) # dependent variable (y)
# plot(): the first argument is the independent variable (x values), the second the dependent variable (y values)
# main sets the plot title; xlab and ylab set the x- and y-axis titles
plot(a, b, main="My Plot", xlab="x variable", ylab="y variable")

# lm(y ~ x) fits a linear regression model, including the intercept and slope of the fitted line
# abline() draws that regression line on the current plot; col sets the line color
# summary() prints detailed statistics about the fitted model
fit <- lm(b ~ a)
abline(fit, col="red")
summary(fit)

The resulting plot:
[Figure: scatter plot of b against a with the fitted regression line in red]
The output of summary() in the console:
[Figure: summary() output]
In the output, the Estimate column gives the intercept (2.09396) and the slope (0.61074) of the regression line. Multiple R-squared is the commonly used R-squared value (0.975).
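To connect these numbers to the data, the sketch below recomputes them from the standard least-squares formulas: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x), and, for a single predictor, R-squared = cor(x, y)^2. It then checks them against lm(). The variable names other than a and b are my own.

```r
a <- c(12, 17, 25, 33, 39, 45)  # independent variable (x)
b <- c(10, 13, 17, 20, 26, 31)  # dependent variable (y)

# Closed-form least-squares estimates
slope     <- cov(a, b) / var(a)
intercept <- mean(b) - slope * mean(a)
r_squared <- cor(a, b)^2  # equals Multiple R-squared when there is one predictor

fit <- lm(b ~ a)
print(coef(fit))                       # (Intercept) and slope for a
print(c(intercept, slope, r_squared))

# The hand-computed values agree with lm() and summary()
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(intercept, slope))))
stopifnot(isTRUE(all.equal(r_squared, summary(fit)$r.squared)))
```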

Box-and-whisker plot (box plot)

A box plot summarizes the distribution of data well. In R, box plots are drawn with the boxplot() function, as in the following example:

# The data are the first few terms of the Fibonacci sequence
# boxplot(): the first argument is the data; horizontal controls whether the plot is drawn horizontally
# main (plot title) and xlab (horizontal-axis title) can also be set
data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)
boxplot(data, horizontal=TRUE)

The resulting plot:
[Figure: horizontal box plot of the Fibonacci data]
Reading the vertical lines in the plot from left to right gives the lower whisker end, the lower quartile, the median, the upper quartile, and the upper whisker end. The circle marks an outlier. The plot shows that this data set is positively skewed and contains an outlier.
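boxplot() takes its numbers from boxplot.stats(), which can be called directly to read off the values behind the picture. By default the box spans Tukey's hinges, which coincide with the quartiles for this data. A point is drawn as an outlier if it lies more than 1.5 box lengths beyond the box. The whisker then stops at the most extreme value inside that limit, which here is 21. The sketch below checks this for the Fibonacci data.

```r
data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)

s <- boxplot.stats(data)
print(s$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$out)    # points drawn individually as outliers

# Default rule (coef = 1.5): outliers lie beyond hinge +/- 1.5 * box length
box_length  <- s$stats[4] - s$stats[2]
upper_fence <- s$stats[4] + 1.5 * box_length
print(upper_fence)  # 34 is above this limit, so it is an outlier
```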

Strip chart

A strip chart is similar to a one-dimensional scatter plot and serves a purpose similar to a box plot. It gives a direct view of the distribution when there are only a few data points. In R, strip charts are drawn with the stripchart() function, as in the following example:

# stripchart(): the first argument is the data
# method controls how overlapping points are drawn: "jitter" offsets them randomly so they do not overlap,
# "stack" piles up points with the same value, and the default "overplot" draws every point on a single line
# With method = "jitter", the jitter argument sets how far points are spread out
# With method = "stack", the offset argument sets the spacing between stacked points with the same value
# main (chart title) and xlab (horizontal-axis title) can also be set
data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)
op <- par(mfrow=c(1, 2)) # show two plots side by side: 1 row, 2 columns (previous settings saved in op)
stripchart(data, method = "jitter", jitter=1)
stripchart(data, method = "stack", offset=1)
par(op) # restore the previous layout

The resulting plots:
[Figure: strip charts with method = "jitter" (left) and method = "stack" (right)]
Most of the data points are packed closely together on the left and become more spread out toward the right.
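method = "jitter" offsets the points by random amounts, so the picture changes from run to run. Calling set.seed() first makes it reproducible. The sketch below draws all three method options side by side. It also uses table() to find the values that occur more than once, which are the only ones "stack" actually piles up (in this data, only the value 1). The seed 42 and the jitter and offset values are my own arbitrary choices.

```r
data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)

# Values occurring more than once are the ones "stack" draws on top of each other
counts <- table(data)
repeated <- as.numeric(names(counts)[counts > 1])
print(repeated)

set.seed(42)  # jittering is random; a fixed seed makes the plot reproducible
op <- par(mfrow = c(1, 3))  # 1 row, 3 columns
stripchart(data, method = "overplot", main = "overplot")
stripchart(data, method = "jitter", jitter = 0.2, main = "jitter")
stripchart(data, method = "stack", offset = 0.5, main = "stack")
par(op)  # restore the previous layout
```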

Conclusion

This post introduced several ways to visualize data in R. The most important part of data analysis is choosing a plot that suits the data type and the purpose of the analysis. If you have any questions or ideas, please leave a comment!


Copyright notice
This article was written by EricFrenzy. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/187/202207060913448161.html