[Data analysis and visualization] Key points of data plotting 3: spaghetti charts
2022-06-13 02:33:00 【The winter holiday of falling marks】
Key points of data plotting 3: spaghetti charts
A line chart with too many lines usually becomes unreadable. Such a plot is commonly called a spaghetti chart, and it conveys almost no information about the data.
Plotting example
As an example, consider the evolution of female baby names in the United States from 1880 to 2015.
# Libraries
library(tidyverse)
library(hrbrthemes)
library(kableExtra)
library(babynames)
library(viridis)
library(DT)
library(plotly)
# Display data
data <- babynames
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Anna | 2604 | 0.02667896 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Elizabeth | 1939 | 0.01986579 |
| 1880 | F | Minnie | 1746 | 0.01788843 |
| 1880 | F | Margaret | 1578 | 0.01616720 |
1924665
# Pick data for certain names
data = filter(data,name %in% c("Mary","Emma", "Ida", "Ashley", "Amanda", "Jessica", "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen"))
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
2599
# Keep only the female records
data = filter(data, sex == "F")
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
1593
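The row count above drops from 2599 to 1593 once the sex filter is applied, which suggests that some of the selected names also have male records. The following is a minimal sketch, assuming the `babynames` and `dplyr` packages are installed, that tallies the rows per sex before filtering and then keeps the female records. The names `names_of_interest`, `both`, and `selected` are illustrative and do not appear in the original code.

```r
library(dplyr)
library(babynames)

# Illustrative variable holding the twelve names used in the article
names_of_interest <- c("Mary", "Emma", "Ida", "Ashley", "Amanda", "Jessica",
                       "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen")

# All records for the selected names, both sexes
both <- filter(babynames, name %in% names_of_interest)

# Rows per sex before the sex filter
table(both$sex)

# Keep only the female records, as in the article
selected <- filter(both, sex == "F")
nrow(selected)
```

The exact counts depend on the installed version of `babynames`, so they may differ from the figures shown above.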
# Plot one line per name
ggplot(data, aes(x=year, y=n, group=name, color=name)) +
  geom_line() +
  scale_color_viridis(discrete = TRUE) +
  theme(
    plot.title = element_text(size=14)
  ) +
  ggtitle("A spaghetti chart of baby names popularity")

As the chart shows, it is hard to follow a single line and understand how the popularity of a particular name evolved. Moreover, even if you manage to follow a line, you still have to match it to the legend, which is harder still. Let's look at some ways to improve this chart.
How to improve
Targeting a specific group
Suppose you plot many groups, but your real goal is to show how one particular group behaves compared with the others. A good solution is to highlight that group: make it look different and give it an appropriate annotation. Here, the evolution of Amanda's popularity stands out clearly. Keeping the other names matters, because they let you compare Amanda with all of them.
# Add a column that flags the name to highlight
data = mutate( data, highlight=ifelse(name=="Amanda", "Amanda", "Other"))
head(data)
| year | sex | name | n | prop | highlight |
|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other |
| 1880 | F | Emma | 2003 | 0.02052149 | Other |
| 1880 | F | Ida | 1472 | 0.01508119 | Other |
| 1880 | F | Helen | 636 | 0.00651606 | Other |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other |
# Amanda is drawn thick and colored; all other names are thin and grey
ggplot(data, aes(x=year, y=n, group=name, color=highlight, size=highlight)) +
  geom_line() +
  scale_color_manual(values = c("#69b3a2", "lightgrey")) +
  scale_size_manual(values = c(1.5, 0.2)) +
  ggtitle("Popularity of American names since 1880") +
  # annotate() draws the label once; geom_label() with fixed x/y would redraw it for every row
  annotate("label", x=1990, y=55000, label="Amanda reached 3550\nbabies in 1970", size=4, color="#69b3a2") +
  theme(
    legend.position = "none",
    plot.title = element_text(size=14)
  )
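The annotation text in the highlight plot is typed by hand, so it can drift out of sync with the data. One possible alternative, sketched here under assumptions, is to derive the label from the data itself. `peak_label` is a hypothetical helper that is not part of the original article, and the toy data frame only mimics the `year`, `name`, and `n` columns of the babynames subset.

```r
library(dplyr)

# Hypothetical helper (not from the original article): build label text
# for the year in which `target` reached its highest count.
peak_label <- function(df, target) {
  peak <- df %>%
    filter(name == target) %>%
    slice_max(n, n = 1, with_ties = FALSE)  # row with the largest count
  paste0(target, " peaked at ", peak$n, "\nbabies in ", peak$year)
}

# Toy data with the same column names as the babynames subset
toy <- tibble(
  year = c(1990, 1991, 1992, 1990, 1991, 1992),
  name = rep(c("A", "B"), each = 3),
  n    = c(10L, 30L, 20L, 5L, 6L, 7L)
)

peak_label(toy, "A")
```

With the article's data frame, the call would be `peak_label(data, "Amanda")`, and the result could be passed as the `label` argument of the annotation.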

Using subplots (small multiples)
Area charts can give a more complete overview of the dataset, especially when combined with subplots that show one group per panel. In the chart below, you can see the evolution of any name at a glance:
ggplot(data, aes(x=year, y=n, group=name, fill=name)) +
  geom_area() +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position="none") +
  ggtitle("Popularity of American names since 1880") +
  theme(
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8),
    plot.title = element_text(size=14)
  ) +
  # One panel per name
  facet_wrap(~name)

As the chart shows, Linda was a very popular name for a very short period of time. Ida, on the other hand, was never very popular, seeing only modest use over several decades.
Combining the two approaches
If you want to compare the evolution of each line with all the others, you can combine highlighting a specific group with subplots:
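Because `name` is a character column, `facet_wrap(~name)` lays the panels out in alphabetical order. If it helps the comparison, the panels can instead be ordered by peak popularity by turning `name` into a factor. The sketch below uses base R only and a toy data frame with invented counts; it is an optional variation, not something the original article does.

```r
# Toy data with the same column names the article's plots use
# (the counts are made up for illustration)
toy <- data.frame(
  year = rep(2000:2002, times = 3),
  name = rep(c("Ida", "Linda", "Mary"), each = 3),
  n    = c(5, 6, 4, 10, 90, 20, 50, 45, 40)
)

# Peak count per name, largest first
peaks <- sort(tapply(toy$n, toy$name, max), decreasing = TRUE)

# facet_wrap() lays out panels in factor-level order,
# so the most popular name gets the first panel
toy$name <- factor(toy$name, levels = names(peaks))
levels(toy$name)
```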
# Duplicate the name column: `name` defines the facet panels, while `name2` groups the grey background lines
tmp <- data %>%
  mutate(name2 = name)
head(tmp)
| year | sex | name | n | prop | highlight | name2 |
|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other | Mary |
| 1880 | F | Emma | 2003 | 0.02052149 | Other | Emma |
| 1880 | F | Ida | 1472 | 0.01508119 | Other | Ida |
| 1880 | F | Helen | 636 | 0.00651606 | Other | Helen |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other | Betty |
tmp %>%
  ggplot(aes(x=year, y=n)) +
  # Background layer: all names in grey, grouped by name2; dropping `name` makes these lines appear in every panel
  geom_line(data=tmp %>% dplyr::select(-name), aes(group=name2), color="grey", size=0.5, alpha=0.5) +
  # Foreground layer: the panel's own name, drawn in color
  geom_line(aes(color=name), color="#69b3a2", size=1.2) +
  scale_color_viridis(discrete = TRUE) +
  theme(
    legend.position="none",
    plot.title = element_text(size=14),
    panel.grid = element_blank()
  ) +
  ggtitle("A spaghetti chart of baby names popularity") +
  # One panel per name
  facet_wrap(~name)
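The combined plot relies on a faceting detail: a layer whose data lacks the faceting variable is drawn in every panel. The sketch below, on a small toy data set, checks this by counting the rows each layer contributes to the built plot. It assumes `ggplot2` and `dplyr` are installed; the toy values are invented for illustration.

```r
library(ggplot2)
library(dplyr)

# Toy data: three names, three years each, plus the duplicated name2 column
toy <- tibble(
  year = rep(2000:2002, times = 3),
  name = rep(c("A", "B", "C"), each = 3),
  n    = c(1, 2, 3, 3, 2, 1, 2, 2, 2)
) %>%
  mutate(name2 = name)

p <- ggplot(toy, aes(x = year, y = n)) +
  # Background layer: `name` is dropped, so this layer has no faceting variable
  geom_line(data = dplyr::select(toy, -name), aes(group = name2), color = "grey") +
  # Foreground layer: keeps `name`, so each row lands in its own panel only
  geom_line(color = "#69b3a2") +
  facet_wrap(~name)

# Rows per layer after faceting
background_rows <- nrow(layer_data(p, 1))  # every row repeated once per panel
foreground_rows <- nrow(layer_data(p, 2))  # every row appears exactly once
c(background = background_rows, foreground = foreground_rows)
```

The background layer ends up with one copy of each row per panel, which is why all the grey lines show up behind every highlighted name.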
