Exploratory data analysis of heartbeat signal
2022-07-07 23:06:00 【Anny Linlin】
I. Understanding EDA
What is exploratory data analysis, and what is it for?
Exploratory data analysis (EDA) is the practice of examining existing data, especially raw data from surveys or observation, with as few prior assumptions as possible. It uses plots, tables, equation fitting, and summary statistics to uncover the structure and patterns in the data. EDA guides the later data processing and feature engineering steps, so that the structure and feature set of the dataset are more reliable for the prediction problem that follows. Note that EDA only explores the characteristics of the raw data (statistical properties, distributions, correlations, and so on); it does not delete or construct any features.
What does the EDA process look like?
1. Load the data science and visualization libraries:
Data science libraries: pandas, numpy, scipy;
Visualization libraries: matplotlib, seaborn;
2. Load the datasets:
Load the training data and the test data, then take a quick look at them, usually with head() and shape.
3. Data overview:
Use describe() to get familiar with the summary statistics of the data; use info() to get familiar with the data types.
4. Check whether the data has missing or abnormal values:
Check each column for NaN values; detect outliers.
5. Understand the distribution of the target values:
Overall distribution (for example, the unbounded Johnson distribution); skewness and kurtosis; the frequency of each target value.
II. Using EDA
1. Import the libraries
import warnings
warnings.filterwarnings('ignore')
import missingno as msno
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
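2. Load the datasets
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('testA.csv')
# Show the first and last five rows together.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
pd.concat([train_data.head(), train_data.tail()])
train_data.shape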
3. Data overview
train_data.describe()
# info is a method; without the parentheses nothing is summarized
train_data.info()
4. Check for missing values and abnormal data
# Number of NaN values in each column
train_data.isnull().sum()
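The process outline also lists outlier detection, which `isnull().sum()` does not cover. A minimal sketch follows, assuming a numeric column and the common 1.5×IQR rule. Both the rule and the helper name `iqr_outlier_mask` are illustrative choices rather than anything the original post specifies, and the small DataFrame is synthetic stand-in data:

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. Hypothetical helper for illustration."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Synthetic stand-in data: one missing value and one obvious outlier.
demo = pd.DataFrame({"x": [1.0, 1.2, 0.9, 1.1, np.nan, 1.0, 25.0]})

missing_per_column = demo.isnull().sum()                # NaN count per column
outliers = demo.loc[iqr_outlier_mask(demo["x"]), "x"]   # rows flagged by the rule

print(missing_per_column)
print(outliers)
```

The same mask could be applied to any numeric column of `train_data`; whether a flagged value is a genuine anomaly or a valid extreme is a judgment call that depends on the data.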
5. Understand the distribution of the target values
train_data['label']
# Number of samples for each label value
train_data['label'].value_counts()
(1) Overall distribution:
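import scipy.stats as st
y = train_data['label']
# sns.distplot is deprecated in recent seaborn releases; it is kept here
# because it supports the fit= argument.
# Three panels side by side: the grid must be 1x3 for the third panel to exist
# (subplot(123) would ask for panel 3 of a 1x2 grid and raise an error).
plt.figure(figsize=(15, 4))
plt.subplot(131)
sns.distplot(y, rug=True, bins=20)
plt.subplot(132)
sns.distplot(y, kde=False, fit=st.norm)
plt.subplot(133)
sns.distplot(y, kde=False, fit=st.lognorm)
plt.show()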
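The process outline mentions the unbounded Johnson distribution, which the plots above do not include. As a rough, plot-free sketch, the candidate distributions can also be compared numerically by fitting each with `scipy.stats` and summing the log-density of the sample. The data below is synthetic (a right-skewed, strictly positive sample), and the choice of log-likelihood as the comparison score is an assumption made for illustration, not the original author's method:

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
# Synthetic, right-skewed, strictly positive stand-in for a target column.
y = rng.lognormal(mean=0.0, sigma=0.6, size=1000)

# Fit each candidate distribution by maximum likelihood.
fits = {
    "norm": (st.norm, st.norm.fit(y)),
    # floc=0 pins the location so the log-normal support starts at zero.
    "lognorm": (st.lognorm, st.lognorm.fit(y, floc=0)),
    "johnsonsu": (st.johnsonsu, st.johnsonsu.fit(y)),
}

# Total log-likelihood of the sample under each fitted distribution
# (higher means a closer fit to this particular sample).
log_likelihood = {
    name: float(np.sum(dist.logpdf(y, *params)))
    for name, (dist, params) in fits.items()
}

for name, value in sorted(log_likelihood.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {value:.1f}")
```

A comparison like this is only meaningful for a continuous target; for a label with a handful of discrete classes, the frequency counts are usually more informative than a fitted density.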
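(2) Skewness and kurtosis
sns.distplot(train_data['label'])
print("Skewness: %f" % train_data['label'].skew())
print("Kurtosis: %f" % train_data['label'].kurt())
# Restrict to numeric columns; recent pandas versions raise an error on non-numeric ones
train_data.skew(numeric_only=True), train_data.kurt(numeric_only=True)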
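To make the two numbers easier to interpret, here is a small sketch on synthetic data (an exponential sample, chosen only because it is clearly right-skewed). pandas reports bias-corrected sample skewness and excess kurtosis, so a normal distribution scores roughly 0 on both; the SciPy calls are shown as a cross-check with bias correction enabled:

```python
import numpy as np
import pandas as pd
import scipy.stats as st

rng = np.random.default_rng(42)
s = pd.Series(rng.exponential(scale=1.0, size=500))  # synthetic, right-skewed

# pandas: bias-corrected skewness and excess (Fisher) kurtosis.
print("Skewness: %f" % s.skew())
print("Kurtosis: %f" % s.kurt())

# The same quantities from SciPy, with bias correction switched on.
scipy_skew = st.skew(s.to_numpy(), bias=False)
scipy_kurt = st.kurtosis(s.to_numpy(), fisher=True, bias=False)
print("SciPy skewness: %f" % scipy_skew)
print("SciPy kurtosis: %f" % scipy_kurt)
```

A positive skewness indicates a longer right tail, and a positive excess kurtosis indicates heavier tails than a normal distribution.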
(3) Frequency of the target values
# Histogram of how often each label value occurs
plt.hist(train_data['label'], orientation='vertical', histtype='bar', color='red')
plt.show()
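The same frequencies can be read off as a table, which makes class imbalance easy to quantify. A minimal sketch, using made-up labels rather than the real `train_data['label']` column:

```python
import pandas as pd

# Synthetic stand-in labels; the class values are made up.
labels = pd.Series([0, 0, 0, 0, 1, 2, 2, 3, 0, 2], name="label")

# Absolute frequency per class, ordered by class value.
counts = labels.value_counts().sort_index()
# Relative frequency per class (the shares sum to 1).
shares = labels.value_counts(normalize=True).sort_index()

summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```

If one class dominates the shares, that is worth keeping in mind when choosing evaluation metrics later on.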