Exploratory data analysis of heartbeat signal
2022-07-07 23:06:00 【Anny Linlin】
One: Understanding EDA
First, what is exploratory data analysis, and what is it for?
Exploratory data analysis (EDA) examines existing data, especially raw data collected through surveys or observation, with as few prior assumptions as possible. It uncovers the structure and regularities of the data by plotting, tabulating, fitting equations and computing summary statistics. EDA guides data science practitioners through data processing and feature engineering, so that the structure and feature set of the data are more reliable for the subsequent prediction task. Note that EDA mines the characteristics of the raw data (statistical properties, distributions, correlations and so on) but does not delete or construct any features.
What does the EDA process look like?
1. Load the data science and visualization libraries:
Data science libraries: pandas, numpy, scipy;
Visualization libraries: matplotlib, seaborn;
2. Load the data sets:
Load the training and test data and take a first look at them, usually with head() and shape.
3. Data overview:
Use describe() to get familiar with the summary statistics of the data, and info() to get familiar with the data types.
4. Check for missing or abnormal data:
Check how many NaN values each column contains; detect outliers.
5. Understand the distribution of the predicted values:
Fit general distributions (e.g. the unbounded Johnson distribution); check the skewness and kurtosis; look at the frequency of each predicted value.
Two: Applying EDA
1. Import the libraries
import warnings
warnings.filterwarnings('ignore')  # silence all warnings to keep the output readable
import missingno as msno  # visualization of missing values
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
2. Load the data sets
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('testA.csv')
# First and last 5 rows; DataFrame.append was removed in pandas 2.0, so use pd.concat
pd.concat([train_data.head(), train_data.tail()])
train_data.shape
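The loading step above treats each row as an opaque record. As an assumption about this dataset (it is not stated above), each heartbeat is often stored as a single comma-separated string in a column such as heartbeat_signals; if so, head() and shape say little about the signal itself until that column is expanded into numbers. The sketch below uses a tiny synthetic frame in place of train.csv, and both the column name and the expand_signals helper are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: the column name "heartbeat_signals"
# and its comma-separated format are assumptions, not taken from the post.
train_data = pd.DataFrame({
    "id": [0, 1, 2],
    "heartbeat_signals": ["0.99,0.94,0.76,0.62",
                          "0.97,0.93,0.73,0.60",
                          "1.00,0.96,0.79,0.65"],
    "label": [0, 0, 2],
})

def expand_signals(df: pd.DataFrame, col: str = "heartbeat_signals") -> pd.DataFrame:
    """Split a comma-separated signal column into float columns s_0, s_1, ..."""
    signals = df[col].str.split(",", expand=True).astype(float)
    signals.columns = [f"s_{i}" for i in range(signals.shape[1])]
    # Keep the other columns unchanged and append one column per sampling point.
    return pd.concat([df.drop(columns=[col]), signals], axis=1)

expanded = expand_signals(train_data)
print(expanded.shape)  # (3, 6)
print(pd.concat([expanded.head(2), expanded.tail(1)]))
```

This only parses the stored values into numeric form; it does not build new features, which keeps it within the scope of EDA described above.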
3. Data overview
train_data.describe()
train_data.info()  # info is a method: without () it only shows the bound method, not the report
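By default describe() only summarizes the numeric columns of a mixed frame, and info() prints its report instead of returning it. A minimal sketch, on a hypothetical synthetic frame (the column signal_mean is made up for illustration), of collecting each column's dtype, non-null count and number of distinct values into one table that can then be sorted or filtered:

```python
import numpy as np
import pandas as pd

def overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, non-null count and number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "n_unique": df.nunique(),
    })

# Synthetic example frame; signal_mean is a hypothetical column.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "signal_mean": [0.5, 0.4, np.nan, 0.6],
    "label": [0, 1, 1, 3],
})
print(df.describe())
print(overview(df))
```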
4. Check for missing values and abnormal data
train_data.isnull().sum()
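The process list also calls for outlier detection, which the line above does not cover. One common rule, used here as an illustrative choice rather than the author's method, flags values outside the Tukey fences [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. The sketch pairs it with a per-column missing-value report; the data and helper names are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of NaN values per column, worst column first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({"n_missing": counts, "pct": counts / len(df) * 100})
    return report.sort_values("n_missing", ascending=False)

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Synthetic data: x has one extreme value, y has two missing entries.
df = pd.DataFrame({
    "x": [1.0, 1.1, 0.9, 1.0, 1.2, 9.0],
    "y": [np.nan, 2.0, 2.1, np.nan, 1.9, 2.0],
})
print(missing_report(df))
print(df.loc[iqr_outliers(df["x"]), "x"])
```

Flagged values are only candidates for inspection; consistent with EDA, nothing is dropped here.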
5. Understand the distribution of the predicted values
train_data['label']
train_data['label'].value_counts()
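value_counts() returns raw counts. Assuming label is a small set of integer classes, it also helps to look at each class's share and at the ratio between the largest and smallest class, since a strong imbalance affects how a classifier should be trained and evaluated. The labels below are synthetic.

```python
import pandas as pd

labels = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 2, 3], name="label")  # synthetic

counts = labels.value_counts().sort_index()                # occurrences per class
shares = labels.value_counts(normalize=True).sort_index()  # fraction per class
imbalance = counts.max() / counts.min()                    # majority / minority

print(pd.DataFrame({"count": counts, "share": shares}))
print(f"majority/minority ratio: {imbalance:.1f}")  # majority/minority ratio: 6.0
```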
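(1) Fit general distributions:
import scipy.stats as st
y = train_data['label']
plt.figure(figsize=(15, 4))
# sns.distplot is deprecated since seaborn 0.11; histplot has no fit= parameter
plt.subplot(131)  # 1 row, 3 columns, panel 1: histogram with rug marks
sns.distplot(y, rug=True, bins=20)
plt.subplot(132)  # panel 2: normal fit overlay
sns.distplot(y, kde=False, fit=st.norm)
plt.subplot(133)  # panel 3: log-normal fit overlay
sns.distplot(y, kde=False, fit=st.lognorm)
plt.show()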
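The plots above compare the data against normal and log-normal overlays by eye, and the process list also mentions the unbounded Johnson distribution (scipy.stats.johnsonsu). A sketch of ranking candidate distributions numerically instead, by maximum-likelihood fit, total log-likelihood and the Kolmogorov-Smirnov statistic. It runs on synthetic right-skewed data rather than the competition labels: fitting continuous families to a discrete label is only a rough check, so this comparison is more informative on a continuous quantity. Fixing the log-normal location at 0 is a choice made for this sketch.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=2000)  # synthetic, right-skewed, positive

# Candidate families and extra fit() keyword arguments for each.
candidates = {
    "norm": (st.norm, {}),
    "lognorm": (st.lognorm, {"floc": 0}),  # fix location at 0 for positive data
    "johnsonsu": (st.johnsonsu, {}),
}

results = {}
for name, (dist, fit_kwargs) in candidates.items():
    params = dist.fit(x, **fit_kwargs)               # maximum-likelihood estimates
    loglik = np.sum(dist.logpdf(x, *params))         # higher is better
    ks = st.kstest(x, name, args=params).statistic   # smaller is better
    results[name] = (loglik, ks)

# Print the candidates from best to worst log-likelihood.
for name, (loglik, ks) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:10s} loglik={loglik:9.1f} KS={ks:.3f}")
```

Families with more parameters (johnsonsu has four) tend to reach a higher likelihood, so a penalized criterion such as AIC is a fairer tie-breaker when the scores are close.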
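(2) Check the skewness and kurtosis
sns.distplot(train_data['label'])
print("Skewness: %f" % train_data['label'].skew())
print("Kurtosis: %f" % train_data['label'].kurt())
# numeric_only=True skips non-numeric columns (pandas >= 2.0 raises TypeError on them otherwise)
train_data.skew(numeric_only=True), train_data.kurt(numeric_only=True)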
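For reading these numbers: pandas' skew() returns the bias-corrected sample skewness and kurt() the bias-corrected excess (Fisher) kurtosis, so a normal distribution scores about 0 on both; a positive skewness means a longer right tail, and a positive kurtosis means heavier tails than a normal distribution. The sketch below checks this correspondence against scipy.stats on a small synthetic series.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

s = pd.Series([0, 0, 0, 0, 1, 1, 2, 3, 3, 3], dtype=float)  # synthetic labels

# pandas: bias-corrected skewness and bias-corrected excess kurtosis
pd_skew, pd_kurt = s.skew(), s.kurt()

# scipy equivalents: bias=False applies the same small-sample correction,
# fisher=True subtracts 3 so that a normal distribution gives 0
sp_skew = st.skew(s, bias=False)
sp_kurt = st.kurtosis(s, fisher=True, bias=False)

print(pd_skew, sp_skew)
print(pd_kurt, sp_kurt)
```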
(3) Look at the frequency of each predicted value
# Plot how often each label value occurs
plt.hist(train_data['label'],orientation='vertical',histtype='bar',color='red')
plt.show()
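With its default of 10 bins, plt.hist spreads a handful of integer labels over ten bins, which leaves empty gaps and makes the bars hard to read. Assuming label takes only a few integer values, a bar chart of value_counts() gives exactly one bar per class with its count printed on top. The data is synthetic, and the Agg backend and output filename are choices made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

labels = pd.Series([0, 0, 0, 1, 2, 2, 3], name="label")  # synthetic stand-in
counts = labels.value_counts().sort_index()  # one entry per class, in label order

fig, ax = plt.subplots()
bars = ax.bar(counts.index.astype(str), counts.values, color="red")
ax.bar_label(bars)  # write each count on top of its bar (matplotlib >= 3.4)
ax.set_xlabel("label")
ax.set_ylabel("frequency")
fig.savefig("label_frequency.png")  # hypothetical output filename
```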