[data processing] boxplot drawing
2022-07-28 02:33:00 【HoveXb】
1. Overview
The boxplot was invented in 1977 by the American statistician John Tukey. It contains five basic elements:
- Minimum (Q0, the 0th quartile): the smallest value after removing outliers
- Maximum (Q4): the largest value after removing outliers
- Median (Q2): the value at the 50% position when all sample values are sorted in ascending order
- First quartile (Q1): also called the "lower quartile"; the value at the 25% position when all sample values are sorted in ascending order.
- Third quartile (Q3): also called the "upper quartile"; the value at the 75% position when all sample values are sorted in ascending order.
Besides these five basic elements, the interquartile range (IQR) is also often used to construct a boxplot. It is defined as the difference between the third quartile and the first quartile, that is, IQR = Q3 - Q1.
As the figure shows, a boxplot consists of a box and whiskers. The upper and lower edges of the box are Q3 and Q1, and the box is divided by a line at the median. The upper and lower ends of the whiskers can be defined in several ways.
One standard definition uses the maximum and minimum values in the dataset.
Another uses upper limit = Q3 + 1.5 IQR and lower limit = Q1 - 1.5 IQR as the whisker boundaries; points outside these boundaries are considered outliers.
Other ways to define the boundaries include:
- The minimum and the maximum value of the data set
- One standard deviation above and below the mean of the data set
- The 9th percentile and the 91st percentile of the data set
- The 2nd percentile and the 98th percentile of the data set
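
The 1.5 IQR rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `whisker_bounds`, the parameter `k`, and the sample data are hypothetical, and Q1 and Q3 are passed in by the caller rather than computed here. The whisker ends are taken as the most extreme data points that still lie within the limits.

```python
def whisker_bounds(values, q1, q3, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) under the k * IQR rule."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Points inside the limits determine where the whiskers end.
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    # Points outside the limits are reported as outliers.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    if not inside:
        raise ValueError("no data points lie within the limits")
    return min(inside), max(inside), outliers


# Hypothetical data; Q1 and Q3 are supplied by the caller.
data = [-5, 2, 3, 4, 5, 6, 7, 13]
low, high, outliers = whisker_bounds(data, q1=2.75, q3=6.25)
print(low, high, outliers)  # 2 7 [-5, 13]
```

Here IQR = 3.5, so the limits are -2.5 and 11.5; the points -5 and 13 fall outside and the whiskers stop at 2 and 7.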

2. Calculation
Prerequisite: sort the data in ascending order.
Let the data length be n.
- Median Q2: the median in the usual statistical sense (for odd n, take the middle value; for even n, average the two middle values).
- First quartile Q1: first compute the position pos of the first quartile, then compute the value at that position. There is no single standard for the position, but two conventions are common. Mode one: $pos = 1 + \frac{n-1}{4}$; mode two: $pos = \frac{n+1}{4}$. When pos is not an integer, the quartile value is obtained by simple linear interpolation between the two neighbouring data points.
- Third quartile Q3: first compute the position pos of the third quartile, then compute the value at that position. Again there is no single standard, but two conventions are common. Mode one: $pos = 1 + \frac{3(n-1)}{4}$; mode two: $pos = \frac{3(n+1)}{4}$. When pos is not an integer, the quartile value is obtained by simple linear interpolation between the two neighbouring data points.
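
A minimal sketch of both position conventions follows. The helper names `value_at` and `quartiles` are hypothetical, positions are 1-based as in the formulas above, and clamping a position that falls outside 1..n (which mode two can produce for very small n) is an assumption of this sketch, not something the text specifies.

```python
def value_at(sorted_values, pos):
    """Linearly interpolate the value at a 1-based fractional position."""
    n = len(sorted_values)
    pos = min(max(pos, 1.0), float(n))  # assumption: clamp to the valid range
    lower = int(pos)                    # 1-based index of the left neighbour
    frac = pos - lower
    if lower >= n:                      # position is exactly the last element
        return float(sorted_values[-1])
    left = sorted_values[lower - 1]
    right = sorted_values[lower]
    return left + frac * (right - left)


def quartiles(values, mode=1):
    """Return (Q1, Q2, Q3) using position convention 1 or 2."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    if mode == 1:
        positions = [1 + k * (n - 1) / 4 for k in (1, 2, 3)]
    elif mode == 2:
        positions = [k * (n + 1) / 4 for k in (1, 2, 3)]
    else:
        raise ValueError("mode must be 1 or 2")
    return tuple(value_at(data, p) for p in positions)


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8], mode=1))  # (2.75, 4.5, 6.25)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8], mode=2))  # (2.25, 4.5, 6.75)
```

For k = 2 both conventions give the position (n + 1) / 2, which reproduces the usual median, so the same helper covers Q2 as well.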
3. Example (using mode one):
The data is num=[1,2,3,4,5,6,7,8], with length n=8. Indices below are 1-based, so num[1]=1.
- Median Q2=(num[4]+num[5])/2=(4+5)/2=4.5
- First quartile Q1: $pos = 1 + \frac{n-1}{4} = 2.75$; $Q1 = num[2] + 0.75 \times (num[3] - num[2]) = 2 + 0.75 \times (3 - 2) = 2.75$
- Third quartile Q3: $pos = 1 + \frac{3(n-1)}{4} = 6.25$; $Q3 = num[6] + 0.25 \times (num[7] - num[6]) = 6 + 0.25 \times (7 - 6) = 6.25$
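
As a cross-check on the hand calculation, Python's standard library exposes both conventions through `statistics.quantiles`: `method="inclusive"` corresponds to mode one and `method="exclusive"` to mode two. The variable names below are illustrative only.

```python
from statistics import quantiles

num = [1, 2, 3, 4, 5, 6, 7, 8]

# method="inclusive" places quartile k at position 1 + k*(n-1)/4 (mode one).
mode_one = quantiles(num, n=4, method="inclusive")
# method="exclusive" places quartile k at position k*(n+1)/4 (mode two).
mode_two = quantiles(num, n=4, method="exclusive")

print(mode_one)  # [2.75, 4.5, 6.25]
print(mode_two)  # [2.25, 4.5, 6.75]
```

The median is identical under both conventions; only Q1 and Q3 differ.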
Code:
The median, first quartile Q1, and third quartile Q3 are computed as shown above. pandas draws the whiskers as follows: it identifies outliers using Q3+1.5IQR and Q1-1.5IQR, then takes the maximum and minimum of the remaining values as the upper and lower ends of the whiskers:
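import pandas as pd
import matplotlib.pyplot as plt

# No outliers here: the whiskers extend to the minimum (1) and maximum (8)
num = [1, 2, 3, 4, 5, 6, 7, 8]
df = pd.DataFrame(num)
boxplot = df.boxplot()
print(df.describe())  # the 25%, 50%, 75% rows are Q1, Q2, Q3
plt.show()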

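import pandas as pd
import matplotlib.pyplot as plt

# -5 and 13 lie outside [Q1-1.5IQR, Q3+1.5IQR] = [-2.5, 11.5], so they are
# drawn as outliers and the whiskers end at 2 and 7
num = [-5, 2, 3, 4, 5, 6, 7, 13]
df = pd.DataFrame(num)
boxplot = df.boxplot()
print(df.describe())  # the 25%, 50%, 75% rows are Q1, Q2, Q3
plt.show()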

Reference: wiki