Exploratory data analysis of heartbeat signal
2022-07-07 23:06:00 【Anny Linlin】
I. Understanding EDA
What is exploratory data analysis (EDA), and what is it for?
Exploratory data analysis examines existing data (especially raw data from surveys or observation) with as few prior assumptions as possible. It uses plotting, tabulation, equation fitting, and computed summary statistics to uncover the structure and regularities in the data. The findings guide the later data processing and feature engineering steps, so that the structure and feature set of the dataset are more reliable for the prediction problem that follows. Note that EDA only mines the characteristics of the original data (summary statistics, distributions, correlations, and so on); no features are deleted or constructed at this stage.
What does the EDA process look like?
1. Load the data science and visualization libraries:
Data science libraries: pandas, numpy, scipy;
Visualization libraries: matplotlib, seaborn.
2. Load the datasets:
Load the training and test data and take a first look, typically with head() and shape.
3. Get an overview of the data:
Use describe() to get familiar with the summary statistics and info() to get familiar with the data types.
4. Check for missing or abnormal data:
Check each column for NaN values; detect outliers.
5. Understand the distribution of the prediction target:
Fit candidate distributions (for example the unbounded Johnson distribution); check skewness and kurtosis; look at the frequency of each target value.
II. Using EDA
1. Import the libraries
import warnings
warnings.filterwarnings('ignore')
import missingno as msno
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
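2. Load the datasets
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('testA.csv')
# Show the first and last five rows; DataFrame.append was removed in pandas 2.0, so use pd.concat
pd.concat([train_data.head(), train_data.tail()])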
train_data.shape
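3. Data overview
train_data.describe()
# info is a method; without the parentheses it is never called
train_data.info()
4. Check for missing values and abnormal data
# Number of missing values in each column
train_data.isnull().sum()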
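Step 4 also calls for outlier detection, which the code above does not show. One common option is the interquartile-range (IQR) rule, sketched here on a small made-up series. The helper name `iqr_outlier_mask`, the factor 1.5, and the sample values are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. Hypothetical helper for illustration."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Made-up numeric column with one obviously extreme value
sample = pd.Series([0.98, 1.02, 1.00, 0.97, 1.01, 1.03, 0.99, 5.00])
mask = iqr_outlier_mask(sample)
outliers = sample[mask]
print(outliers)
```

The same mask could be applied to any numeric column of `train_data`; whether flagged rows are true anomalies or valid extreme readings still has to be judged from the domain.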
5. Understand the distribution of the prediction target
train_data['label']
train_data['label'].value_counts()
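(1) Overall distribution:
import scipy.stats as st
y = train_data['label']
# sns.distplot is deprecated since seaborn 0.11; it is kept here because histplot/displot have no fit= argument
# Three panels side by side (1 row, 3 columns): KDE with rug, normal fit, log-normal fit
plt.figure(figsize=(15, 4))
plt.subplot(131)
sns.distplot(y, rug=True, bins=20)
plt.subplot(132)
sns.distplot(y, kde=False, fit=st.norm)
plt.subplot(133)
sns.distplot(y, kde=False, fit=st.lognorm)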
plt.show()
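The `fit=st.norm` argument overlays a normal density whose parameters are estimated from the data. As a rough sketch of what that estimation yields, `scipy.stats.norm.fit` returns the maximum-likelihood location and scale, which for the normal distribution are the sample mean and the standard deviation computed with `ddof=0`. The array `y_demo` is made-up data standing in for `train_data['label']`.

```python
import numpy as np
import scipy.stats as st

# Made-up stand-in for train_data['label'] (assumption: small integer labels)
y_demo = np.array([0, 0, 0, 1, 1, 2, 2, 3, 3, 3], dtype=float)

# Maximum-likelihood estimates of the normal parameters
mu, sigma = st.norm.fit(y_demo)

# Average log-likelihood under the fitted normal; higher means a better fit,
# which gives one way to compare candidate distributions on the same data
avg_loglik = st.norm.logpdf(y_demo, loc=mu, scale=sigma).mean()

print(f"mu={mu:.3f}, sigma={sigma:.3f}, avg log-likelihood={avg_loglik:.3f}")
```

Repeating the last two steps with another `scipy.stats` distribution and comparing the average log-likelihoods is one simple way to put numbers behind the visual comparison of the panels.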
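(2) Check skewness and kurtosis
sns.distplot(train_data['label'])
print("Skewness: %f" % train_data['label'].skew())
print("Kurtosis: %f" % train_data['label'].kurt())
# Skewness and kurtosis of every numeric column; numeric_only=True skips non-numeric columns
train_data.skew(numeric_only=True), train_data.kurt(numeric_only=True)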
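To make the two numbers less of a black box, the sketch below recomputes them by hand on a made-up sample and compares the result with pandas. It assumes the bias-corrected estimators (adjusted Fisher-Pearson skewness and excess kurtosis, which is 0 for a normal distribution); the sample values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up sample for illustration only
x = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 9.0])
n = len(x)

# Central moments (population form, dividing by n)
dev = x - x.mean()
m2 = (dev ** 2).mean()
m3 = (dev ** 3).mean()
m4 = (dev ** 4).mean()

# Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient)
g1 = m3 / m2 ** 1.5
skew_manual = g1 * np.sqrt(n * (n - 1)) / (n - 2)

# Bias-corrected excess kurtosis
g2 = m4 / m2 ** 2 - 3
kurt_manual = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(skew_manual, x.skew())
print(kurt_manual, x.kurt())
```

A positive skewness indicates a longer right tail, and a positive excess kurtosis indicates heavier tails than a normal distribution.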
(3) Look at the frequency of the target values
# Histogram of the target values
plt.hist(train_data['label'],orientation='vertical',histtype='bar',color='red')
plt.show()
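When the label takes only a few distinct values (an assumption here; the source does not state its range), a frequency table can be easier to read than a histogram with default bins. The sketch below uses made-up labels in place of `train_data['label']`.

```python
import pandas as pd

# Made-up labels standing in for train_data['label']
labels = pd.Series([0, 0, 0, 0, 1, 2, 2, 3, 3, 3])

# Absolute counts per class, ordered by class value
counts = labels.value_counts().sort_index()

# Relative frequencies (proportions that sum to 1)
proportions = labels.value_counts(normalize=True).sort_index()

summary = pd.DataFrame({"count": counts, "proportion": proportions})
print(summary)
```

The proportions show at a glance whether the classes are imbalanced, which matters for how the later prediction model is trained and evaluated.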