[data processing] boxplot drawing
2022-07-28 02:33:00 【HoveXb】
1. Overview
The boxplot was invented in 1977 by the American statistician John Tukey. It is built from five basic elements:
- Minimum (Q0, the 0th quartile): the smallest value after outliers are removed
- Maximum (Q4): the largest value after outliers are removed
- Median (Q2): the value at the 50% position when all sample values are sorted in ascending order
- First quartile (Q1): also called the "lower quartile"; the value at the 25% position when all sample values are sorted in ascending order
- Third quartile (Q3): also called the "upper quartile"; the value at the 75% position when all sample values are sorted in ascending order
In addition to these five elements, the interquartile range (IQR) is often used to construct a boxplot. It is the difference between the third and first quartiles: IQR = Q3 - Q1.
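The sketch below illustrates these definitions. It computes Q1, the median, Q3 and the IQR with Python's standard `statistics.quantiles`, assuming `method="inclusive"`. That method uses the same interpolation positions as the NumPy and pandas defaults. The helper name `quartiles_and_iqr` is hypothetical and exists only for this example.

```python
import statistics

def quartiles_and_iqr(values):
    """Return (Q1, median, Q3, IQR) for a list of numbers (hypothetical helper)."""
    # n=4 asks for the three cut points that split the data into quarters;
    # "inclusive" treats the minimum as the 0th and the maximum as the 100th percentile.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

q1, median, q3, iqr = quartiles_and_iqr([1, 2, 3, 4, 5, 6, 7, 8])
print(q1, median, q3, iqr)  # 2.75 4.5 6.25 3.5
```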
As shown in the figure, a boxplot consists of a box and whiskers. The upper and lower edges of the box are Q3 and Q1, and a line at the median splits the box. The upper and lower ends of the whiskers, however, have many variants.
One standard definition places the whisker ends at the maximum and minimum values of the data set.
Another sets the upper limit to Q3 + 1.5 × IQR and the lower limit to Q1 - 1.5 × IQR. Points outside these limits are treated as outliers, and each whisker extends to the most extreme data point that is still inside its limit.
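The 1.5 × IQR rule fits in a few lines of Python. This is an illustrative sketch, not the article's code: `tukey_whiskers` is a hypothetical helper, and Q1 and Q3 are assumed to use the same linear interpolation as pandas (`statistics.quantiles(..., method="inclusive")`). Values outside the fences are reported as outliers, and each whisker ends at the most extreme value still inside its fence.

```python
import statistics

def tukey_whiskers(values, k=1.5):
    """Return (whisker_low, whisker_high, outliers) using Q1 - k*IQR / Q3 + k*IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    # The whiskers stop at the extreme data points inside the fences,
    # not at the fence values themselves.
    return min(inside), max(inside), outliers

print(tukey_whiskers([-5, 2, 3, 4, 5, 6, 7, 13]))  # (2, 7, [-5, 13])
```

For this data Q1 = 2.75 and Q3 = 6.25, so the fences are -2.5 and 11.5. That leaves -5 and 13 as outliers.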
Other ways to define the whisker ends include:
- The minimum and the maximum value of the data set
- One standard deviation above and below the mean of the data set
- The 9th percentile and the 91st percentile of the data set
- The 2nd percentile and the 98th percentile of the data set
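For comparison, NumPy can compute each definition in the list above directly. This is a rough sketch: the `whisker_ends` helper and its rule names are invented for this illustration. "One standard deviation" is taken here as the population standard deviation (`np.std` with its default `ddof=0`), since the text does not say which one.

```python
import numpy as np

def whisker_ends(values, rule):
    """Return (low, high) whisker ends under one of the listed rules.

    The rule names are hypothetical labels chosen for this sketch.
    """
    x = np.asarray(values, dtype=float)
    if rule == "minmax":        # minimum and maximum of the data set
        lo, hi = x.min(), x.max()
    elif rule == "mean_std":    # mean +/- one population standard deviation (ddof=0)
        lo, hi = x.mean() - x.std(), x.mean() + x.std()
    elif rule == "p9_p91":      # 9th and 91st percentiles (linear interpolation)
        lo, hi = np.percentile(x, [9, 91])
    elif rule == "p2_p98":      # 2nd and 98th percentiles (linear interpolation)
        lo, hi = np.percentile(x, [2, 98])
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return float(lo), float(hi)

data = [1, 2, 3, 4, 5, 6, 7, 8]
for rule in ("minmax", "mean_std", "p9_p91", "p2_p98"):
    print(rule, whisker_ends(data, rule))
```

When drawing, matplotlib's `boxplot` accepts a percentile pair such as `whis=(9, 91)`, and `whis=(0, 100)` gives the min/max rule. pandas' `DataFrame.boxplot` forwards extra keyword arguments to matplotlib. There is no `whis` value for the mean ± standard deviation rule.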

2. Calculation
Prerequisite: sort the data in ascending order.
Let n be the number of data points.
- Median Q2: computed as the usual statistical median (the middle value when n is odd, the average of the two middle values when n is even).
- First quartile Q1: first compute the position pos of the first quartile, then the value at that position. There is no single standard for this position, but two formulas are common. Method 1: $pos = 1 + \frac{n-1}{4}$; Method 2: $pos = \frac{n+1}{4}$. Positions count from 1, and the value is found by simple linear interpolation: if $pos = k + f$ with integer $k$ and $0 \le f < 1$, then $Q = x_{(k)} + f\,(x_{(k+1)} - x_{(k)})$, where $x_{(k)}$ is the k-th smallest value.
- Third quartile Q3: compute the position pos of the third quartile in the same way, then its value. Again there is no single standard; the two common formulas are Method 1: $pos = 1 + \frac{3(n-1)}{4}$; Method 2: $pos = \frac{3(n+1)}{4}$. The value is again obtained by linear interpolation.
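Both position formulas, plus the linear interpolation step, fit in one small Python function. This is a sketch under stated assumptions: `quartile` is a hypothetical helper, and positions are 1-based as in the formulas above. pos is also clamped to the range [1, n], because Method 2 can fall outside the data for very small samples (other conventions extrapolate instead).

```python
import math

def quartile(values, k, method=1):
    """k-th quartile (k = 1, 2 or 3) by linear interpolation; hypothetical helper.

    method=1: pos = 1 + k*(n-1)/4
    method=2: pos = k*(n+1)/4
    Positions are 1-based, as in the formulas above.
    """
    x = sorted(values)
    n = len(x)
    if method == 1:
        pos = 1 + k * (n - 1) / 4
    elif method == 2:
        pos = k * (n + 1) / 4
    else:
        raise ValueError("method must be 1 or 2")
    pos = min(max(pos, 1), n)  # clamp so that pos stays within the data
    lo = math.floor(pos)       # 1-based index of the value just below pos
    frac = pos - lo            # fractional part = interpolation weight
    if lo == n:                # pos sits exactly on the largest value
        return float(x[-1])
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])

data = [1, 2, 3, 4, 5, 6, 7, 8]
print([quartile(data, k, method=1) for k in (1, 2, 3)])  # [2.75, 4.5, 6.25]
print([quartile(data, k, method=2) for k in (1, 2, 3)])  # [2.25, 4.5, 6.75]
```

For this data set, Method 1 gives the same results as `statistics.quantiles(data, n=4, method="inclusive")` and the NumPy/pandas default. Method 2 matches `method="exclusive"`. For k = 2 both methods reduce to the ordinary median.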
3. Example (using Method 1):
The data are num = [1, 2, 3, 4, 5, 6, 7, 8], so n = 8. Below, num[k] denotes the k-th value, counting from 1.
- Median Q2 = (num[4] + num[5]) / 2 = (4 + 5) / 2 = 4.5
- First quartile Q1: $pos = 1 + \frac{n-1}{4} = 1 + \frac{7}{4} = 2.75$; Q1 = num[2] + 0.75 × (num[3] - num[2]) = 2 + 0.75 × (3 - 2) = 2.75
- Third quartile Q3: $pos = 1 + \frac{3(n-1)}{4} = 1 + \frac{21}{4} = 6.25$; Q3 = num[6] + 0.25 × (num[7] - num[6]) = 6 + 0.25 × (7 - 6) = 6.25
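NumPy offers a quick check of the hand calculation above. Its default `np.percentile` interpolation (`linear`) uses the same positions as Method 1, and pandas `describe()` reports these values in its 25%/50%/75% rows. This is only a cross-check, not part of the original method.

```python
import numpy as np

num = [1, 2, 3, 4, 5, 6, 7, 8]
# Default linear interpolation: position (n - 1) * p in 0-based terms,
# i.e. 1 + (n - 1) * p in the 1-based terms used above.
q1, q2, q3 = np.percentile(num, [25, 50, 75])
print(float(q1), float(q2), float(q3))  # 2.75 4.5 6.25
```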
Code :
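The median, Q1 and Q3 are computed as described above. In pandas, the whiskers are drawn as follows: values above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR are flagged as outliers, and the whiskers end at the maximum and minimum of the remaining values:
```python
import matplotlib.pyplot as plt
import pandas as pd

# Example 1: no outliers, so the whiskers reach the data minimum and maximum
num = [1, 2, 3, 4, 5, 6, 7, 8]
df = pd.DataFrame(num)
boxplot = df.boxplot()  # drawn with matplotlib (whis=1.5 by default)
print(df.describe())    # the 25%/50%/75% rows are Q1, median and Q3
plt.show()
```

```python
import matplotlib.pyplot as plt
import pandas as pd

# Example 2: -5 and 13 fall outside the 1.5 x IQR fences and are drawn as outliers
num = [-5, 2, 3, 4, 5, 6, 7, 13]
df = pd.DataFrame(num)
boxplot = df.boxplot()
print(df.describe())
plt.show()
```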
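matplotlib's `matplotlib.cbook.boxplot_stats` returns the numbers behind the second plot directly, so they need not be read off the figure. This assumes `DataFrame.boxplot` draws through matplotlib's `boxplot`, which with the default `whis=1.5` computes its statistics this way. The sketch below only prints them.

```python
from matplotlib import cbook

num = [-5, 2, 3, 4, 5, 6, 7, 13]
# boxplot_stats returns one dict per data set; a flat list counts as one data set.
stats = cbook.boxplot_stats(num, whis=1.5)[0]
print("Q1, median, Q3:", stats["q1"], stats["med"], stats["q3"])  # 2.75 4.5 6.25
print("IQR:", stats["iqr"])                                       # 3.5
print("whiskers:", stats["whislo"], stats["whishi"])              # 2 7
print("outliers:", stats["fliers"].tolist())                      # [-5, 13]
```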

Reference: Wikipedia