Exploratory data analysis of heartbeat signal

2022-07-07 23:06:00 Anny Linlin

I. Understanding EDA

First, what is exploratory data analysis, and what is its purpose?

Exploratory data analysis (EDA) is a way of analyzing existing data, especially raw data from surveys or observations, with as few prior assumptions as possible. It explores the structure and regularities of the data through plotting, tabulation, equation fitting, and computing summary statistics. EDA guides data science practitioners through the data processing and feature engineering steps, so that the structure and feature set of the dataset are more reliable for the prediction problem that follows. Note that EDA mines the characteristics of the raw data (statistical properties, distributions, correlations, and so on), but it does not delete or construct any features.

What does the EDA process look like?

1. Load the data science and visualization libraries:
Data science libraries: pandas, numpy, scipy;
Visualization libraries: matplotlib, seaborn.

2. Load the datasets:
Load the training data and test data and take a quick look at them, typically with head() and shape.

3. Get an overview of the data:
Use describe() to get familiar with the summary statistics of the data; use info() to get familiar with the data types.

4. Check for missing values and anomalies:
Check each column for NaN values; detect outliers.

5. Understand the distribution of the target (the value to be predicted):
Look at the overall distribution (for example, against the unbounded Johnson distribution); check skewness and kurtosis; look at the frequency of each target value.

II. Using EDA

1. Import the libraries

import warnings
warnings.filterwarnings('ignore')
import missingno as msno
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np


2. Load the datasets

train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('testA.csv')

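# Show the first and last five rows together
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
pd.concat([train_data.head(), train_data.tail()])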

train_data.shape

3. Data overview

train_data.describe()

# info is a method; without the parentheses it is never called
train_data.info()


4. Check for missing values and anomalies

train_data.isnull().sum()
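
Step 4 also mentions outlier detection, which the call above does not cover. The following is a minimal sketch of one common approach, the 1.5 × IQR rule, together with a per-column missing-value ratio. The helper names `missing_ratio` and `iqr_outlier_mask`, the toy frame `demo`, and its `amplitude` column are all hypothetical; they stand in for train_data and are not part of the original dataset.

```python
import numpy as np
import pandas as pd


def missing_ratio(df: pd.DataFrame) -> pd.Series:
    """Fraction of NaN values per column, largest first."""
    return df.isnull().mean().sort_values(ascending=False)


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical toy frame standing in for train_data
demo = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "amplitude": [0.9, 1.0, 1.1, np.nan, 1.0, 25.0],
})

ratios = missing_ratio(demo)
observed = demo["amplitude"].dropna()
mask = iqr_outlier_mask(observed)

print(ratios)          # share of missing values per column
print(observed[mask])  # rows flagged as outliers by the IQR rule
```

The factor k = 1.5 is only a convention; whether a flagged value is really an anomaly still has to be judged against the data itself.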


5. Understand the distribution of the target

train_data['label']

train_data['label'].value_counts()
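
Raw counts can be hard to compare at a glance, so it may help to look at class shares as well. This is a small sketch on a hypothetical label series (the values below are made up for illustration, not taken from train.csv); `normalize=True` is a standard option of `value_counts`.

```python
import pandas as pd

# Hypothetical labels standing in for train_data['label']
labels = pd.Series([0, 0, 0, 0, 1, 2, 2, 3], name="label")

counts = labels.value_counts().sort_index()                # rows per class
shares = labels.value_counts(normalize=True).sort_index()  # fraction per class
imbalance = counts.max() / counts.min()                    # largest vs. smallest class

print(counts)
print(shares)
print(f"imbalance ratio: {imbalance:.1f}")
```

A large imbalance ratio suggests checking whether the evaluation metric and the train/validation split account for rare classes.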


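(1) Overall distribution:
import scipy.stats as st
y = train_data['label']
plt.figure(figsize=(15, 4))
# Three panels side by side need a 1x3 grid: 131, 132, 133
# (sns.distplot is deprecated in recent seaborn releases but still supports fit=)
plt.subplot(131)
sns.distplot(y, rug=True, bins=20)            # histogram with KDE and rug
plt.subplot(132)
sns.distplot(y, kde=False, fit=st.norm)       # histogram with fitted normal
plt.subplot(133)
sns.distplot(y, kde=False, fit=st.lognorm)    # histogram with fitted log-normal
plt.show()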
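
The `fit=` argument above draws a fitted curve but does not report the fitted parameters. As a rough sketch of getting those numbers directly, the example below fits candidate distributions with `scipy.stats` and compares their log-likelihoods. The synthetic `sample`, the helper `log_likelihood`, and the choice of `st.laplace` as a second candidate are assumptions made for illustration; they are not from the original article.

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in for the column being examined
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=1000)

# Maximum-likelihood estimates of the normal parameters (loc, scale)
mu, sigma = st.norm.fit(sample)


def log_likelihood(dist, data):
    """Fit `dist` to `data` by maximum likelihood and return the total log-likelihood."""
    params = dist.fit(data)
    return dist.logpdf(data, *params).sum()


# A higher log-likelihood means the fitted distribution describes the data better
scores = {
    "norm": log_likelihood(st.norm, sample),
    "laplace": log_likelihood(st.laplace, sample),
}

print(f"mu={mu:.3f}, sigma={sigma:.3f}")
print(scores)
```

Log-likelihood favors distributions with more parameters, so when candidates differ in parameter count a penalized criterion may be a fairer comparison.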

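(2) Skewness and kurtosis
sns.distplot(train_data['label'])
print("Skewness: %f" % train_data['label'].skew())
print("Kurtosis: %f" % train_data['label'].kurt())

# Skewness and kurtosis of every numeric column
# (numeric_only=True skips any non-numeric columns, which would otherwise raise in pandas 2.x)
train_data.skew(numeric_only=True), train_data.kurt(numeric_only=True)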
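
To make clear what these two numbers measure, here is a small sketch that cross-checks pandas against `scipy.stats`. pandas reports bias-corrected sample skewness and bias-corrected excess kurtosis (a normal distribution gives 0), which corresponds to `bias=False` and `fisher=True` in SciPy. The `values` series is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical right-skewed values standing in for a numeric column
values = pd.Series([0, 0, 0, 0, 1, 1, 2, 3], dtype=float)

pd_skew = values.skew()   # bias-corrected sample skewness
pd_kurt = values.kurt()   # bias-corrected excess kurtosis

# The same estimators computed with SciPy
sp_skew = stats.skew(values.to_numpy(), bias=False)
sp_kurt = stats.kurtosis(values.to_numpy(), fisher=True, bias=False)

print(f"skewness: pandas={pd_skew:.4f}, scipy={sp_skew:.4f}")
print(f"kurtosis: pandas={pd_kurt:.4f}, scipy={sp_kurt:.4f}")
```

A positive skewness means the right tail is longer; a positive excess kurtosis means heavier tails than a normal distribution.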

(3) Frequency of the target values

# Plot how often each target value occurs
plt.hist(train_data['label'], orientation='vertical', histtype='bar', color='red')
plt.show()
