[Data analysis and visualization] Key points of data plotting 7: overplotting
2022-06-13 02:34:00 【The winter holiday of falling marks】
Key points of data plotting 7: Overplotting
Overplotting is a common problem in data plotting. When a data set is large, the points of a scatter plot tend to overlap, which makes the graphic unreadable. This article presents several ways to avoid overplotting.
An overplotting example
The following scatter plot illustrates the problem. At first glance one might conclude that there is no obvious relationship between X and Y. The plots later in this article show how wrong that conclusion is.
# Load the libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(patchwork)
# Dataset:
a <- data.frame( x=rnorm(20000, 10, 1.2), y=rnorm(20000, 10, 1.2), group=rep("A",20000))
b <- data.frame( x=rnorm(20000, 14.5, 1.2), y=rnorm(20000, 14.5, 1.2), group=rep("B",20000))
c <- data.frame( x=rnorm(20000, 9.5, 1.5), y=rnorm(20000, 15.5, 1.5), group=rep("C",20000))
# Combine the three groups into one data frame
data <- do.call(rbind, list(a,b,c))
# Plot
ggplot(data,aes(x=x, y=y)) +
  geom_point(color="#69b3a2", size=2) +
  theme(
    legend.position="none"
  )
Solutions
Reduce the size of the points
The simplest solution is often to reduce the size of the points, which can already give very satisfactory results. Three clusters are now clearly visible, although they were hidden in the plot above.
ggplot(data,aes(x=x, y=y)) +
  # Reduce the size of the points
  geom_point(color="#69b3a2", size=0.02) +
  theme(
    legend.position="none"
  )
Transparency
Transparency, which can be combined with a smaller point size, also reduces overplotting: dense regions appear darker because many semi-transparent points stack on top of each other.
ggplot(data,aes(x=x, y=y)) +
  # Set transparency
  geom_point(color="#69b3a2", size=2, alpha=0.01) +
  theme(
    legend.position="none"
  )
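As a rough guide for choosing alpha, and assuming standard alpha compositing of identical points, n overlapping points with transparency alpha give a combined opacity of 1 - (1 - alpha)^n. The sketch below estimates how many points must overlap before a region looks almost solid. The helper name points_to_opacity is hypothetical and not part of ggplot2, and graphics devices may round very small alpha values, so the numbers are only approximate.

```r
# Hypothetical helper: number of overlapping points needed to reach a target
# combined opacity, assuming standard "over" compositing of identical points.
points_to_opacity <- function(alpha, target = 0.95) {
  stopifnot(alpha > 0, alpha < 1, target > 0, target < 1)
  # Solve 1 - (1 - alpha)^n >= target for n
  ceiling(log(1 - target) / log(1 - alpha))
}

n_low  <- points_to_opacity(0.01)  # the alpha used in the plot above
n_high <- points_to_opacity(0.1)   # a less transparent setting
```

A lower alpha needs many more overlapping points to saturate, which is why very small values suit data sets with tens of thousands of points.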
2D density plot
A two-dimensional density plot estimates how many observations fall in each region of the two-dimensional space and represents that density with color, so the distribution of the points can be seen clearly.
# Draw a 2D density plot
ggplot(data, aes(x=x, y=y) ) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_fill_viridis() +
  theme(
    legend.position='none'
  )
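The density above is a smoothed kernel estimate. If literal counts per region are preferred, a binned variant is one option. The following is a minimal sketch using ggplot2's geom_bin2d on a small synthetic data set; the data frame d, the seed, and the choice of 50 bins are assumptions made only for this illustration.

```r
library(ggplot2)

# Small synthetic data set with two clusters (assumed for illustration only)
set.seed(1)
d <- data.frame(
  x = c(rnorm(5000, 10, 1.2), rnorm(5000, 14.5, 1.2)),
  y = c(rnorm(5000, 10, 1.2), rnorm(5000, 14.5, 1.2))
)

# Count the observations in a 50 x 50 grid of rectangles and map the count to fill
p_bins <- ggplot(d, aes(x = x, y = y)) +
  geom_bin2d(bins = 50) +
  scale_fill_viridis_c() +
  theme(legend.position = "none")

# The computed layer data contains the per-bin counts
bin_counts <- layer_data(p_bins, 1)$count
```

Printing p_bins draws the plot. Unlike the kernel density, the fill here is an actual count, so the result depends on the bin size.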
Data sampling
Sometimes less is more. Plotting only a small fraction of the data (5% here) greatly reduces computation time and helps avoid overplotting:
sample_data <- sample_frac(data, 0.05)
ggplot(sample_data, aes(x=x, y=y)) +
  geom_point(color="#69b3a2", size=2) +
  theme(
    legend.position="none"
  )
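Because the sample is random, the plot changes on every run. A minimal sketch of a reproducible variant is shown below, assuming dplyr 1.0.0 or later, where slice_sample() is available as the successor of sample_frac(). The data frame d and the seed value are placeholders for this example.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, chosen only for this example

# Placeholder data with the same number of rows as the article's data set
d <- data.frame(x = rnorm(60000), y = rnorm(60000))

# Keep 5% of the rows, drawn at random without replacement
sample_data <- slice_sample(d, prop = 0.05)
n_sampled <- nrow(sample_data)
```

Fixing the seed makes the same rows appear each time, which helps when comparing plots across runs.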
Highlight a specific group
Another way to reduce graphics complexity is to highlight specific groups .
ggplot( data,aes(x=x, y=y)) +
  # All points in grey as background
  geom_point(color="grey", size=2) +
  # Highlight group B
  geom_point(data = data %>% filter(group=="B"), color="#69b3a2", size=2) +
  theme(
    legend.position="none",
    plot.title = element_text(size=12)
  ) +
  ggtitle('Behavior of group B')
Grouping
If the data is grouped, you can use a different color for the points of each group.
ggplot(data, aes(x=x, y=y, color=group)) +
  geom_point( size=2, alpha=0.1) +
  scale_color_viridis(discrete=TRUE)
Faceting
When the plot contains several groups, another option is to use facets and highlight one group per panel.
ggplot(data, aes(x=x, y=y)) +
  # Points of the highlighted group, colored by group
  geom_point( aes( color=group) , size=2, alpha=0.1) +
  # All points in grey; dropping `group` makes them appear in every panel
  geom_point( data=data %>% dplyr::select(-group), size=1, alpha=0.05, color="grey") +
  scale_color_viridis(discrete=TRUE) +
  theme(
    legend.position="none"
  ) +
  # One panel per group
  facet_wrap(~group)
3D surface plot
The density can also be displayed as a three-dimensional surface. In this case, the position of each group becomes obvious.
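library(plotly)
# MASS is called with the MASS:: prefix instead of library(MASS),
# because attaching it would mask dplyr::select
# Kernel density estimate on a 50 x 50 grid
kd <- with(data, MASS::kde2d(x, y, n = 50))
plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% add_surface()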
Marginal distributions
Adding marginal distributions reveals distributions that are hidden in the overplotted part of the graphic. You can add a box plot, a histogram, or a density plot in the margins.
library(ggExtra)
# Create the scatter plot
p <- ggplot(data, aes(x=x, y=y)) +
  geom_point(color="#69b3a2", size=2, alpha=0.01) +
  theme(
    legend.position="none"
  )
# Add marginal histograms
ggExtra::ggMarginal(p, type = "histogram")