[Data analysis and visualization] Key points of data plotting 3: spaghetti plots
2022-06-13 02:33:00 【The winter holiday of falling marks】
Key points of data plotting 3: spaghetti plots
A line chart with too many lines usually becomes unreadable. This kind of chart is commonly called a spaghetti plot, and it conveys almost no information about the data.
Plotting example
As an example, let's take the evolution of female baby names in the United States from 1880 to 2015.
# Libraries
library(tidyverse)
library(hrbrthemes)
library(kableExtra)
library(babynames)
library(viridis)
library(DT)
library(plotly)
# Display data
data <- babynames
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Anna | 2604 | 0.02667896 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Elizabeth | 1939 | 0.01986579 |
| 1880 | F | Minnie | 1746 | 0.01788843 |
| 1880 | F | Margaret | 1578 | 0.01616720 |
1924665
# Pick data for certain names
data = filter(data,name %in% c("Mary","Emma", "Ida", "Ashley", "Amanda", "Jessica", "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen"))
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
2599
# Keep only the female data
data= filter(data,sex=="F")
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
1593
# Plot
ggplot(data,aes(x=year, y=n, group=name, color=name)) +
geom_line() +
scale_color_viridis(discrete = TRUE) +
theme(
plot.title = element_text(size=14)
) +
ggtitle("A spaghetti chart of baby names popularity")

As the chart shows, it is hard to follow a single line and understand how the popularity of one particular name evolved. Moreover, even if you manage to follow a line, you still have to match it against the legend, which is even harder. Let's look at some ways to improve this chart.
How to improve
Highlighting a specific group
Suppose you plot many groups, but your real aim is to show how one particular group behaves compared with the others. A good solution is then to highlight that group: make it look different and give it an appropriate annotation. Here, the evolution of Amanda's popularity is obvious. It is important to keep the other names, because they let you compare Amanda with all of them.
# Add a highlight column
data = mutate( data, highlight=ifelse(name=="Amanda", "Amanda", "Other"))
head(data)
| year | sex | name | n | prop | highlight |
|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other |
| 1880 | F | Emma | 2003 | 0.02052149 | Other |
| 1880 | F | Ida | 1472 | 0.01508119 | Other |
| 1880 | F | Helen | 636 | 0.00651606 | Other |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other |
ggplot(data,aes(x=year, y=n, group=name, color=highlight, size=highlight)) +
geom_line() +
scale_color_manual(values = c("#69b3a2", "lightgrey")) +
scale_size_manual(values=c(1.5,0.2)) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
geom_label( x=1990, y=55000, label="Amanda reached 3550\nbabies in 1970", size=4, color="#69b3a2") +
theme(
plot.title = element_text(size=14)
)
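
The annotation text in the code above is typed by hand. As a minimal sketch, assuming a data frame with the same `year`, `name` and `n` columns, the peak year and count of the highlighted name could instead be looked up from the data. The `toy` data frame and its values are invented for illustration and are not taken from `babynames`; `peak_label` is a hypothetical helper variable.

```r
# Toy data with the same columns as the filtered babynames data.
# The numbers are invented for illustration only.
toy <- data.frame(
  year = rep(c(1980, 1985, 1990), times = 2),
  name = rep(c("Amanda", "Mary"), each = 3),
  n    = c(100, 300, 200, 500, 400, 350),
  stringsAsFactors = FALSE
)

# Keep the rows of the highlighted name and find the row with its peak count.
target <- toy[toy$name == "Amanda", ]
peak   <- target[which.max(target$n), ]

# Build the annotation text from the data instead of typing it by hand.
peak_label <- sprintf("%s peaked at %d\nbabies in %d", peak$name, peak$n, peak$year)
print(peak_label)
```

The resulting string could then be passed as `label` to `geom_label()`, with `x` and `y` derived from `peak$year` and `peak$n`.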

Using subplots
Area charts can give a more general overview of the dataset, especially when combined with subplots (small multiples, one panel per group). In the chart below, it is easy to get a glimpse of the evolution of any name:
ggplot(data,aes(x=year, y=n, group=name, fill=name)) +
geom_area() +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
theme(
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8),
plot.title = element_text(size=14)
) +
# One panel per name
facet_wrap(~name)

As the chart shows, Linda was an extremely popular name for a very short period. Ida, on the other hand, was never very popular, but was used a little over several decades.
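
Comparisons like the one between Linda and Ida can be easier to read when the panels are ordered by peak popularity rather than alphabetically. The following is a minimal sketch under that assumption, not part of the original code: it relies on `facet_wrap()` laying out panels in factor-level order, and the `toy` data frame holds invented values.

```r
# Toy data; values are invented for illustration only.
toy <- data.frame(
  year = rep(c(1900, 1950, 2000), times = 3),
  name = rep(c("Ida", "Linda", "Mary"), each = 3),
  n    = c(30, 10, 5, 0, 900, 50, 600, 500, 100),
  stringsAsFactors = FALSE
)

# Peak count per name.
peaks <- tapply(toy$n, toy$name, max)

# Order the factor levels from the highest peak to the lowest.
# facet_wrap(~name) would then draw the panels in this order.
toy$name <- factor(toy$name, levels = names(sort(peaks, decreasing = TRUE)))
print(levels(toy$name))
```

Applied to the real data, the same reordering would go before the `ggplot()` call.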
Combining the two approaches
If you want to compare the evolution of each line with all the others, you can combine highlighting a specific group with using subplots.
# Duplicate the name column. name and name2 serve different purposes: name defines the panels, name2 groups the background lines drawn in every panel
tmp <- data %>%
mutate(name2=name)
head(tmp)
| year | sex | name | n | prop | highlight | name2 |
|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other | Mary |
| 1880 | F | Emma | 2003 | 0.02052149 | Other | Emma |
| 1880 | F | Ida | 1472 | 0.01508119 | Other | Ida |
| 1880 | F | Helen | 636 | 0.00651606 | Other | Helen |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other | Betty |
tmp %>%
ggplot( aes(x=year, y=n)) +
# Use name2 to draw all names as grey background lines in every panel
geom_line( data=tmp %>% dplyr::select(-name), aes(group=name2), color="grey", size=0.5, alpha=0.5) +
geom_line( aes(color=name), color="#69b3a2", size=1.2 )+
scale_color_viridis(discrete = TRUE) +
theme(
legend.position="none",
plot.title = element_text(size=14),
panel.grid = element_blank()
) +
ggtitle("A spaghetti chart of baby names popularity") +
# Use name to create one panel per name
facet_wrap(~name)
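
The combined chart works because a layer whose data lacks the faceting column is repeated in every panel. The sketch below, assuming `ggplot2` is installed, checks this on a small invented `toy` data frame by counting the rows each layer ends up with; `background`, `bg_layer` and `hl_layer` are hypothetical names used only here.

```r
library(ggplot2)

# Toy data; values are invented for illustration only.
toy <- data.frame(
  year = rep(c(1900, 1950, 2000), times = 3),
  name = rep(c("Ida", "Linda", "Mary"), each = 3),
  n    = c(30, 10, 5, 0, 900, 50, 600, 500, 100),
  stringsAsFactors = FALSE
)
toy$name2 <- toy$name

# Background data: same rows, but without the faceting column `name`.
background <- toy[, setdiff(names(toy), "name")]

p <- ggplot(toy, aes(x = year, y = n)) +
  geom_line(data = background, aes(group = name2), color = "grey") +
  geom_line(color = "#69b3a2") +
  facet_wrap(~name)

# Layer 1 (background) is copied into every panel;
# layer 2 (highlight) is split across panels by name.
bg_layer <- layer_data(p, 1)
hl_layer <- layer_data(p, 2)
print(c(background = nrow(bg_layer), highlight = nrow(hl_layer)))
```

With three names, the background layer holds three copies of the data (one per panel), while the highlighted layer holds each row once.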
