
[Data analysis and visualization] Key points of data plotting 4: problems with pie charts

2022-06-13 02:33:00 【The winter holiday of falling marks】


This article looks at one of the most criticized chart types in history: the pie chart.

Definition and shortcomings

A pie chart is a circle divided into sectors, each representing a part of the whole. It is usually used to show percentages, where the sectors add up to 100%. The problem is that humans are very bad at reading angles. In the adjacent pie chart, try to find the largest group and to rank the groups by value. You will probably find this difficult, which is why pie charts should be avoided.

Let's now compare 3 pie charts. Try to work out which group has the highest value in each of the 3 charts, and how the values change from one chart to the next.
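A quick calculation shows why this is hard. In a pie chart, each sector's angle is its share of the total multiplied by 360 degrees. The base-R sketch below applies this to the values of the second data set (the same numbers as `data2` in the code that follows). All five sectors fall between roughly 65 and 76 degrees, and differences of that size are hard to judge by eye.

```r
# Values of the second data set (same numbers as data2 in the article)
values <- c(a = 20, b = 18, c = 21, d = 20, e = 20)

# Each sector's angle is its share of the total, scaled to 360 degrees
angles <- values / sum(values) * 360

round(angles, 1)
# a: 72.7, b: 65.5, c: 76.4, d: 72.7, e: 72.7

# Gap between the largest and the smallest sector, in degrees
round(diff(range(angles)), 1)
# 10.9
```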

# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(patchwork)

# Create 3 data frames
data1 <- data.frame( name=letters[1:5], value=c(17,18,20,22,24) )
data2 <- data.frame( name=letters[1:5], value=c(20,18,21,20,20) )
data3 <- data.frame( name=letters[1:5], value=c(24,23,21,19,18) )
#  View the data 
data1
data2
data3

A data.frame: 5 × 2
name	value
<fct>	<dbl>
a	17
b	18
c	20
d	22
e	24

A data.frame: 5 × 2
name	value
<fct>	<dbl>
a	20
b	18
c	21
d	20
e	20

A data.frame: 5 × 2
name	value
<fct>	<dbl>
a	24
b	23
c	21
d	19
e	18

# Define the plotting function
# vec: y positions of the text labels, one per group
plot_pie <- function(data, vec){

  # A pie chart is a single stacked bar (constant x)...
  ggplot(data, aes(x="", y=value, fill=name)) +
    geom_bar(width = 1, stat = "identity") +
    # ...drawn in polar coordinates, so the y axis becomes the angle
    coord_polar("y", start=0, direction = -1) +
    # Set the fill colors
    scale_fill_viridis(discrete = TRUE, direction=-1) +
    # Add the group labels at the positions given in vec
    # (first label white, the rest black, for contrast with the fills)
    geom_text(aes(y = vec, label = rev(name)), size = 4,
              color = c("white", rep("black", 4))) +
    theme(
      legend.position="none",
      plot.title = element_text(size=14),
      panel.grid = element_blank(),
      axis.text = element_blank()
    ) +
    xlab("") +
    ylab("")
}

a <- plot_pie(data1, c(10,35,55,75,93))
b <- plot_pie(data2, c(10,35,53,75,93))
c <- plot_pie(data3, c(10,29,50,75,93))
a + b + c
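The hard-coded label positions passed to `plot_pie` (for example `c(10,35,55,75,93)`) appear to approximate the midpoints of the stacked segments. A small helper could compute them from the values instead. This sketch assumes ggplot2's default stacking order, where the first factor level is drawn at the top of the stack, so the bottom segment belongs to the last group; that matches the `rev(name)` labels used in `plot_pie`. `pie_label_positions` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: midpoint of each stacked segment, listed from the
# bottom of the stack up. Assumes the first group is stacked on top, so the
# values are reversed before accumulating.
pie_label_positions <- function(value) {
  v <- rev(value)
  cumsum(v) - v / 2
}

pie_label_positions(c(17, 18, 20, 22, 24))
# returns 12.0 35.0 56.0 75.0 92.5
```

For example, `a <- plot_pie(data1, pie_label_positions(data1$value))` would place each label at the center of its sector without tuning the positions by hand.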


Now let's represent the same data with bar charts (barplots):

# Define the plotting function
plot_bar <- function(data){

  ggplot(data, aes(x=name, y=value, fill=name)) +
    # Draw a bar chart
    geom_bar(stat = "identity") +
    # Set the fill colors
    scale_fill_viridis(discrete = TRUE, direction=-1) +
    theme(
      legend.position="none",
      plot.title = element_text(size=14),
      panel.grid = element_blank()
    ) +
    # Use the same y range for all three charts so they can be compared
    ylim(0,25) +
    xlab("") +
    ylab("")

}

a <- plot_bar(data1)
b <- plot_bar(data2)
c <- plot_bar(data3)
a + b + c


Let's consider why we use charts in the first place.

Charts take information and make it easier to understand.
In general, the purpose of a chart is to make it easier to compare different sets of data.
A chart should convey as much information as possible without adding complexity.

Comparing the figures, the pie charts make it hard to see the differences between the values, whereas the bar charts show them clearly. Pie charts are poor at comparing values and cannot convey much additional information.
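The comparisons that the pie charts obscure are easy to state numerically. This base-R sketch uses the same values as `data1`, `data2` and `data3`. It finds the largest group in each data set and the share of the total that group holds. In every case that share is only a few percentage points above the 20% an even five-way split would give. That is exactly the kind of difference a bar length shows clearly and an angle hides.

```r
# Same values as data1, data2 and data3 in the article
datasets <- list(
  data1 = c(a = 17, b = 18, c = 20, d = 22, e = 24),
  data2 = c(a = 20, b = 18, c = 21, d = 20, e = 20),
  data3 = c(a = 24, b = 23, c = 21, d = 19, e = 18)
)

# Name of the largest group in each data set
largest <- sapply(datasets, function(v) names(which.max(v)))
largest
# data1 "e", data2 "c", data3 "a"

# Share of the total held by the largest group, in percent
share <- sapply(datasets, function(v) round(100 * max(v) / sum(v), 1))
share
# data1 23.8, data2 21.2, data3 22.9
```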

Solution

The bar chart is the best substitute for a pie chart. If you have many values to display, a lollipop chart is, in my opinion, a more elegant option. The following example shows the number of important items sold by several countries/regions around the world:

# Load the data from GitHub
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/7_OneCatOneNum.csv", header=TRUE, sep=",")
# Remove rows with missing values
data <- filter(data, !is.na(Value))
nrow(data)
head(data)
# Sort the rows by Value
data <- arrange(data, Value)
# Convert Country to a factor whose levels follow the sorted order,
# so the countries are plotted by value rather than alphabetically
data <- mutate(data, Country=factor(Country, Country))
# Plot
ggplot(data, aes(x=Country, y=Value)) +
  # Draw the "sticks": segments from 0 up to each value
  geom_segment(aes(x=Country, xend=Country, y=0, yend=Value), color="grey") +
  # Draw a point at the end of each stick
  geom_point(size=3, color="#69b3a2") +
  # Swap the x and y axes so the country names are easy to read
  coord_flip() +
  # Set the theme
  theme(
    # Remove the major and minor y grid lines
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position="none"
  ) +
  # The original x axis (Country) is now vertical; remove its title
  xlab("")

A data.frame: 6 × 2
	Country	Value
	<fct>	<int>
1	United States	12394
2	Russia	6148
3	Germany (FRG)	1653
4	France	2162
5	United Kingdom	1214
6	China	1131
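The `factor(Country, Country)` step matters because ggplot2 orders a discrete axis by factor levels, and a plain character column is ordered alphabetically. The base-R sketch below reproduces the sort-then-factor steps on the six rows shown by `head(data)` above (not the full data set). It compares the default alphabetical levels with levels that follow the sorted values.

```r
# The six rows printed by head(data) above
d <- data.frame(
  Country = c("United States", "Russia", "Germany (FRG)", "France",
              "United Kingdom", "China"),
  Value = c(12394, 6148, 1653, 2162, 1214, 1131),
  stringsAsFactors = FALSE
)

# Sort rows by Value (base-R equivalent of arrange(data, Value))
d <- d[order(d$Value), ]

# Without explicit levels, factor() sorts the levels alphabetically
levels(factor(d$Country))
# "China" "France" "Germany (FRG)" "Russia" "United Kingdom" "United States"

# With levels = Country, the levels follow the sorted row order
d$Country <- factor(d$Country, levels = d$Country)
levels(d$Country)
# "China" "United Kingdom" "Germany (FRG)" "France" "Russia" "United States"
```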


If your goal is to show how a whole is divided into parts, another option is a treemap.

# Package: treemap, a package specialized for treemaps
library(treemap)

# Plot
treemap(data,
        # Data: one rectangle per Country, with area proportional to Value
        index="Country",
        vSize="Value",
        type="index",

        # Title and color palette
        title="",
        palette="Dark2",

        # Borders: color and line width
        border.col=c("black"),
        border.lwds=3,

        # Labels: white, bold (fontface 2), placed at the top left
        fontcolor.labels="white",
        fontface.labels=2,
        align.labels=c("left", "top"),
        # Enlarge labels to fill larger rectangles
        inflate.labels=TRUE,
        # Label font size (acts as the minimum size when inflate.labels = TRUE)
        fontsize.labels=5
)
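A treemap encodes each value as the area of a rectangle. The rectangles together fill the plotting region, just as the sectors of a pie fill the circle. The simplest way to achieve this is a one-level "slice" layout, sketched below in base R with the six rows shown by `head(data)` above. This is not the algorithm the treemap package uses; the package provides its own layouts through its `algorithm` argument. Slicing produces long, thin strips, which is why treemap libraries prefer layouts that keep rectangles closer to square. `slice_layout` is a hypothetical helper.

```r
# Hypothetical one-level "slice" layout: vertical strips whose widths are
# proportional to each value, so each rectangle's area is proportional to it.
slice_layout <- function(values, labels, width = 1, height = 1) {
  w <- values / sum(values) * width
  x0 <- c(0, head(cumsum(w), -1))  # left edge of each strip
  data.frame(label = labels, x0 = x0, x1 = x0 + w, y0 = 0, y1 = height)
}

# The six rows printed by head(data) above
vals <- c(12394, 6148, 1653, 2162, 1214, 1131)
labs <- c("United States", "Russia", "Germany (FRG)", "France",
          "United Kingdom", "China")

rects <- slice_layout(vals, labs)
rects$area <- (rects$x1 - rects$x0) * (rects$y1 - rects$y0)
rects
```

The rectangles could then be drawn on an empty plot with base R's `rect(rects$x0, rects$y0, rects$x1, rects$y1)`.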
