[Data analysis and visualization] Key points of data plotting 3: spaghetti charts
2022-06-13 02:33:00 【The winter holiday of falling marks】
A line chart with too many lines usually becomes unreadable. Such a chart is commonly called a spaghetti chart, and it conveys almost no information about the data.
Example plot
Let's take the evolution of female baby names in the United States from 1880 to 2015 as an example.
# Libraries (only the packages the code below actually uses)
library(tidyverse)
library(babynames)
library(viridis)
# Load the data and inspect it
data <- babynames
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Anna | 2604 | 0.02667896 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Elizabeth | 1939 | 0.01986579 |
| 1880 | F | Minnie | 1746 | 0.01788843 |
| 1880 | F | Margaret | 1578 | 0.01616720 |
1924665
# Keep only a selection of names
data = filter(data, name %in% c("Mary", "Emma", "Ida", "Ashley", "Amanda", "Jessica", "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen"))
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
2599
# Keep only the female records
data = filter(data, sex == "F")
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
1593
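The two filtering steps above can also be written as a single pipeline. The following is only a minimal sketch: `select_names()` is a hypothetical helper that does not appear in the original post, and `toy` contains made-up counts (not real babynames values) so that the block runs without the babynames package.

```r
library(dplyr)

# Hypothetical helper: keep only the chosen names for one sex.
select_names <- function(df, keep_names, sex_code = "F") {
  df %>%
    filter(name %in% keep_names, sex == sex_code)
}

# Made-up rows with the same columns as babynames (year, sex, name, n).
toy <- data.frame(
  year = c(1880, 1880, 1880, 1881, 1881, 1881),
  sex  = c("F", "M", "F", "F", "F", "M"),
  name = c("Mary", "Mary", "Anna", "Mary", "Ida", "Ida"),
  n    = c(10L, 2L, 5L, 12L, 4L, 1L),
  stringsAsFactors = FALSE
)

selected <- select_names(toy, c("Mary", "Ida"))

# One line is drawn per distinct name, so it helps to count them before plotting.
n_distinct(selected$name)
```

With the article's data, a call such as `select_names(babynames, c("Mary", "Emma"))` would apply both filters at once.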
# Plot: one line per name
ggplot(data, aes(x=year, y=n, group=name, color=name)) +
  geom_line() +
  scale_color_viridis(discrete = TRUE) +
  theme(
    plot.title = element_text(size=14)
  ) +
  ggtitle("A spaghetti chart of baby names popularity")

As the chart shows, it is hard to follow a single line and understand how the popularity of a particular name evolved. Moreover, even if you manage to follow a line, you then have to match it to the legend, which is harder still. Let's look at some ways to improve this chart.
How to improve
Target a specific group
Suppose you plot many groups, but your real aim is to show how one particular group behaves compared with the others. A good solution is then to highlight that group: make it look different and give it a proper annotation. Here, the evolution of Amanda's popularity stands out. Keeping the other names matters, because it lets you compare Amanda with all of them.
# Add a highlight column: "Amanda" for the target name, "Other" for everything else
data = mutate(data, highlight=ifelse(name=="Amanda", "Amanda", "Other"))
head(data)
| year | sex | name | n | prop | highlight |
|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other |
| 1880 | F | Emma | 2003 | 0.02052149 | Other |
| 1880 | F | Ida | 1472 | 0.01508119 | Other |
| 1880 | F | Helen | 636 | 0.00651606 | Other |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other |
# Draw Amanda as a thick colored line and every other name as a thin grey line.
# Note: from ggplot2 3.4.0 onward, `size` for lines is deprecated in favor of `linewidth`.
ggplot(data, aes(x=year, y=n, group=name, color=highlight, size=highlight)) +
  geom_line() +
  scale_color_manual(values = c("#69b3a2", "lightgrey")) +
  scale_size_manual(values=c(1.5,0.2)) +
  theme(legend.position="none") +
  ggtitle("Popularity of American names in the previous 30 years") +
  geom_label(x=1990, y=55000, label="Amanda reached 3550\nbabies in 1970", size=4, color="#69b3a2") +
  theme(
    plot.title = element_text(size=14)
  )
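The annotation text in the plot above is typed by hand. One possible alternative is to derive it from the data, so that the label cannot drift away from the values being plotted. The sketch below assumes a hypothetical helper `peak_label()` (not part of the original post) and uses made-up counts in `toy`, not real babynames values.

```r
library(dplyr)

# Hypothetical helper: build an annotation string from the peak of one name.
peak_label <- function(df, target) {
  peak <- df %>%
    filter(name == target) %>%
    slice_max(n, n = 1, with_ties = FALSE)  # row with the highest count
  paste0(target, " peaked at ", peak$n, "\nbabies in ", peak$year)
}

# Made-up counts for two names over three years.
toy <- data.frame(
  year = c(1970, 1980, 1990, 1970, 1980, 1990),
  name = rep(c("Amanda", "Linda"), each = 3),
  n    = c(30L, 90L, 50L, 80L, 20L, 10L),
  stringsAsFactors = FALSE
)

label_text <- peak_label(toy, "Amanda")
cat(label_text, "\n")
```

The resulting string could then be passed as the `label` argument of `geom_label()` in place of the hard-coded text.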

Use small multiples
Area charts can give a more complete overview of the dataset, especially when combined with small multiples (one panel per group). In the chart below, you can see the evolution of any name at a glance:
ggplot(data, aes(x=year, y=n, group=name, fill=name)) +
  geom_area() +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Popularity of American names in the previous 30 years") +
  theme(
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8),
    plot.title = element_text(size=14)
  ) +
  # One panel per name
  facet_wrap(~name)

The chart shows that Linda was a very popular name for a very short period. Ida, by contrast, was never very popular, although it saw modest use over several decades.
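`facet_wrap()` lays out panels in factor-level order, which is alphabetical for a character column. Patterns like the ones described above can be easier to spot when the panels are ordered by peak popularity instead. The following is a sketch of that idea in base R, using made-up counts (`toy` is illustrative, not real babynames data):

```r
# Made-up counts for three names over three years.
toy <- data.frame(
  year = rep(c(1970, 1980, 1990), times = 3),
  name = rep(c("Amanda", "Ida", "Linda"), each = 3),
  n    = c(30L, 90L, 50L, 5L, 4L, 3L, 200L, 20L, 10L),
  stringsAsFactors = FALSE
)

# Turn name into a factor whose levels are ordered by each name's peak count,
# largest first. reorder() sorts ascending, hence the negated max.
toy$name <- reorder(toy$name, toy$n, FUN = function(v) -max(v))

levels(toy$name)
```

Faceting on the reordered column with `facet_wrap(~name)` would then place the name with the highest peak in the first panel.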
Combining both approaches
To compare the evolution of each line with all the others, you can combine the two ideas: use small multiples and highlight one group in each panel.
# Duplicate the name column. name and name2 serve different purposes:
# name splits the data into panels, name2 groups the grey background lines
tmp <- data %>%
  mutate(name2=name)
head(tmp)
| year | sex | name | n | prop | highlight | name2 |
|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other | Mary |
| 1880 | F | Emma | 2003 | 0.02052149 | Other | Emma |
| 1880 | F | Ida | 1472 | 0.01508119 | Other | Ida |
| 1880 | F | Helen | 636 | 0.00651606 | Other | Helen |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other | Betty |
tmp %>%
  ggplot(aes(x=year, y=n)) +
  # Background: all names in grey. Dropping name means these lines are drawn in every panel; name2 keeps them grouped
  geom_line(data=tmp %>% dplyr::select(-name), aes(group=name2), color="grey", size=0.5, alpha=0.5) +
  # Foreground: the name that belongs to the current panel
  geom_line(aes(color=name), color="#69b3a2", size=1.2) +
  scale_color_viridis(discrete = TRUE) +
  theme(
    legend.position="none",
    plot.title = element_text(size=14),
    panel.grid = element_blank()
  ) +
  ggtitle("A spaghetti chart of baby names popularity") +
  # One panel per name
  facet_wrap(~name)
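The plot above relies on a ggplot2 behavior: a layer whose data lacks the faceting variable is repeated in every panel. The sketch below checks this on made-up data (`toy` and `background` are illustrative names, not from the original post) by inspecting the computed layer data with `layer_data()`.

```r
library(ggplot2)

# Made-up counts for three names over three years.
toy <- data.frame(
  year = rep(c(1970, 1980, 1990), times = 3),
  name = rep(c("Amanda", "Ida", "Linda"), each = 3),
  n    = c(30, 90, 50, 5, 4, 3, 200, 20, 10),
  stringsAsFactors = FALSE
)

# Same rows without the faceting column; name2 only groups the lines.
background <- data.frame(year = toy$year, n = toy$n, name2 = toy$name)

p <- ggplot(toy, aes(x = year, y = n)) +
  geom_line(data = background, aes(group = name2), color = "grey", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  facet_wrap(~name)

bg_layer <- layer_data(p, 1)  # grey background lines
fg_layer <- layer_data(p, 2)  # highlighted line of each panel

# Rows per panel: the background layer carries every name in every panel,
# the foreground layer only the rows of the panel's own name.
table(bg_layer$PANEL)
table(fg_layer$PANEL)
```

This is why the original code removes `name` from the background data with `dplyr::select(-name)` before passing it to the first `geom_line()`.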
