[Data analysis and visualization] Key points of data plotting 3: spaghetti charts
2022-06-13 02:33:00 【The winter holiday of falling marks】
A line chart with too many lines usually becomes unreadable. Such a chart is commonly called a spaghetti chart, and it conveys almost no information about the data.
Example
Take the evolution of female baby names in the United States from 1880 to 2015 as an example.
# Libraries
library(tidyverse)
library(hrbrthemes)
library(kableExtra)
library(babynames)
library(viridis)
library(DT)
library(plotly)
# Display data
data <- babynames
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Anna | 2604 | 0.02667896 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Elizabeth | 1939 | 0.01986579 |
| 1880 | F | Minnie | 1746 | 0.01788843 |
| 1880 | F | Margaret | 1578 | 0.01616720 |
1924665
# Keep only a selection of names
data = filter(data,name %in% c("Mary","Emma", "Ida", "Ashley", "Amanda", "Jessica", "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen"))
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
2599
# Keep only the female records
data= filter(data,sex=="F")
head(data)
nrow(data)
| year | sex | name | n | prop |
|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> |
| 1880 | F | Mary | 7065 | 0.07238359 |
| 1880 | F | Emma | 2003 | 0.02052149 |
| 1880 | F | Ida | 1472 | 0.01508119 |
| 1880 | F | Helen | 636 | 0.00651606 |
| 1880 | F | Amanda | 241 | 0.00246914 |
| 1880 | F | Betty | 117 | 0.00119871 |
1593
# Plot one line per name
ggplot(data, aes(x = year, y = n, group = name, color = name)) +
  geom_line() +
  scale_color_viridis(discrete = TRUE) +
  theme(
    plot.title = element_text(size = 14)
  ) +
  ggtitle("A spaghetti chart of baby names popularity")

As the chart shows, it is hard to follow a single line and understand how the popularity of a particular name evolved. Even if you manage to follow a line, you still have to match it to the legend, which is harder still. Let's look at some ways to improve this chart.
How to improve
Highlighting a specific group
Suppose you plot many groups, but your real goal is to show how one particular group behaves compared with the others. A good solution is then to highlight that group: make it look different and give it a suitable annotation. Here, the evolution of Amanda's popularity stands out clearly. It is important to keep the other names, because they let you compare Amanda with all of them.
# Add a column that flags the name to highlight
data = mutate( data, highlight=ifelse(name=="Amanda", "Amanda", "Other"))
head(data)
| year | sex | name | n | prop | highlight |
|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other |
| 1880 | F | Emma | 2003 | 0.02052149 | Other |
| 1880 | F | Ida | 1472 | 0.01508119 | Other |
| 1880 | F | Helen | 636 | 0.00651606 | Other |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other |
ggplot(data, aes(x = year, y = n, group = name, color = highlight, size = highlight)) +
  geom_line() +
  # Values follow the alphabetical order of the levels: "Amanda", then "Other"
  scale_color_manual(values = c("#69b3a2", "lightgrey")) +
  scale_size_manual(values = c(1.5, 0.2)) +
  ggtitle("Popularity of American names in the previous 30 years") +
  # annotate() draws the label once; geom_label() with fixed x/y would repeat it for every row
  annotate("label", x = 1990, y = 55000, label = "Amanda reached 3550\nbabies in 1970", size = 4, color = "#69b3a2") +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 14)
  )
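The annotation text in the plot above is typed in by hand. As a minimal sketch, assuming a small made-up data frame (`toy`) in place of the filtered `data`, the label could instead be derived from the data. The names `toy`, `target`, `peak`, and `label_text` are hypothetical and the counts are invented for illustration only.

```r
# Hypothetical toy data standing in for the filtered `data` tibble
# (same columns: year, name, n); the counts are made up.
toy <- data.frame(
  year = rep(c(1970, 1980, 1990, 2000), times = 2),
  name = rep(c("Amanda", "Linda"), each = 4),
  n    = c(300, 3500, 2000, 800,
           900, 400, 150, 50),
  stringsAsFactors = FALSE
)

# Rows for the highlighted name only
target <- toy[toy$name == "Amanda", ]

# Row with the highest yearly count
peak <- target[which.max(target$n), ]

# Build the annotation text from the data instead of typing it in
label_text <- sprintf("%s reached %d\nbabies in %d",
                      peak$name, peak$n, peak$year)
cat(label_text, "\n")
```

With the real data, `label_text` could be passed as the `label` argument of the annotation layer, and `peak$year` and `peak$n` could help choose where to place it.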

Using small multiples
Area charts can give a more complete overview of the dataset, especially when combined with small multiples (one panel per group). In the chart below, you can see the evolution of every name at a glance:
ggplot(data, aes(x = year, y = n, group = name, fill = name)) +
  geom_area() +
  scale_fill_viridis(discrete = TRUE) +
  theme(legend.position = "none") +
  ggtitle("Popularity of American names in the previous 30 years") +
  theme(
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8),
    plot.title = element_text(size = 14)
  ) +
  # One panel per name
  facet_wrap(~name)

As the chart shows, Linda was extremely popular for a very short period. Ida, on the other hand, was never very popular and saw only modest use over the decades.
Combining both approaches
If you want to compare the evolution of each line with all the others, you can combine highlighting a specific group with small multiples (one panel per name):
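Visual impressions such as "a sharp, short-lived peak" can also be checked numerically. The following is a rough sketch in base R, assuming a made-up data frame (`toy`) in the same long format as `data`. The helper `summarise_name` and the column `obs_above_half_peak` are hypothetical names, and the counts are invented, so the output only illustrates the method and says nothing about the real names.

```r
# Hypothetical toy counts (made up), in the same long format as `data`
toy <- data.frame(
  year = rep(seq(1940, 1990, by = 10), times = 2),
  name = rep(c("Linda", "Ida"), each = 6),
  n    = c(1000, 90000, 20000, 5000, 2000, 1000,
           900, 700, 500, 400, 300, 200),
  stringsAsFactors = FALSE
)

# Summarise one name: peak count, peak year, and the number of
# observations at or above half of that peak (a crude "width" of the peak)
summarise_name <- function(df) {
  data.frame(
    name                = df$name[1],
    peak_n              = max(df$n),
    peak_year           = df$year[which.max(df$n)],
    obs_above_half_peak = sum(df$n >= max(df$n) / 2),
    stringsAsFactors    = FALSE
  )
}

# Apply the summary to each name and stack the results
summary_tbl <- do.call(rbind, lapply(split(toy, toy$name), summarise_name))
print(summary_tbl)
```

A high `peak_n` with a low `obs_above_half_peak` indicates a tall, narrow peak. A low `peak_n` spread over many observations indicates steady but modest use.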
# Duplicate the name column: `name` defines the facet panels, while `name2` keeps the
# group identity for the grey background lines once `name` is dropped
tmp <- data %>%
  mutate(name2 = name)
head(tmp)
| year | sex | name | n | prop | highlight | name2 |
|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <int> | <dbl> | <chr> | <chr> |
| 1880 | F | Mary | 7065 | 0.07238359 | Other | Mary |
| 1880 | F | Emma | 2003 | 0.02052149 | Other | Emma |
| 1880 | F | Ida | 1472 | 0.01508119 | Other | Ida |
| 1880 | F | Helen | 636 | 0.00651606 | Other | Helen |
| 1880 | F | Amanda | 241 | 0.00246914 | Amanda | Amanda |
| 1880 | F | Betty | 117 | 0.00119871 | Other | Betty |
tmp %>%
  ggplot(aes(x = year, y = n)) +
  # Background: name is dropped, so these lines are drawn in every panel, grouped by name2
  geom_line(data = tmp %>% dplyr::select(-name), aes(group = name2), color = "grey", size = 0.5, alpha = 0.5) +
  # Foreground: the panel's own name; the constant color overrides the color mapping
  geom_line(aes(color = name), color = "#69b3a2", size = 1.2) +
  scale_color_viridis(discrete = TRUE) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 14),
    panel.grid = element_blank()
  ) +
  ggtitle("A spaghetti chart of baby names popularity") +
  # One panel per name
  facet_wrap(~name)
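The key step above is removing the faceting column from the background layer's data: a layer whose data lacks the faceting variable is repeated in every panel. The sketch below tries to make this visible on a tiny made-up data set by counting the rows ggplot2 assigns to each panel. It assumes ggplot2 is installed; `toy`, `background`, and the counts are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data (made-up counts): three names over four years
toy <- data.frame(
  year = rep(2000:2003, times = 3),
  name = rep(c("A", "B", "C"), each = 4),
  n    = c(1, 2, 3, 4,  4, 3, 2, 1,  2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Background data: copy `name` into `name2`, then drop `name`.
# Without the faceting variable, this layer is repeated in every panel.
background <- transform(toy, name2 = name)
background$name <- NULL

p <- ggplot(toy, aes(x = year, y = n)) +
  geom_line(data = background, aes(group = name2), color = "grey") +
  geom_line(color = "#69b3a2") +
  facet_wrap(~name)

# Count the rows each layer contributes to each panel
rows_per_panel_bg <- table(layer_data(p, 1)$PANEL)  # background layer
rows_per_panel_fg <- table(layer_data(p, 2)$PANEL)  # highlighted layer
print(rows_per_panel_bg)
print(rows_per_panel_fg)
```

Each panel receives all 12 background rows (every name in grey) but only the 4 rows of its own name in the highlighted layer.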
