Exploratory data analysis of heartbeat signal
2022-07-07 23:06:00 【Anny Linlin】
One. Understanding EDA
What is exploratory data analysis (EDA), and what is it for?
Exploratory data analysis examines existing data, especially raw data from surveys or observation, with as few prior assumptions as possible. It uses plots, tables, equation fitting, and summary statistics to uncover the structure and regularities in the data. The findings guide the later data-processing and feature-engineering steps, so that the structure and feature set of the dataset are more reliable for the prediction problem that follows. Note that EDA only explores the characteristics of the raw data (summary statistics, distributions, correlations, and so on); it does not delete or construct any features.
What does the EDA process look like?
1. Load the data science and visualization libraries:
Data science libraries: pandas, numpy, scipy;
Visualization libraries: matplotlib, seaborn.
2. Load the datasets:
Load the training and test data and take a first look, typically with head() and shape.
3. Get an overview of the data:
Use describe() to review the summary statistics and info() to review the data types.
4. Check for missing values and anomalies:
Check each column for NaN values; detect outliers.
5. Understand the distribution of the target (the value to be predicted):
Overall distribution (for example, the unbounded Johnson distribution); skewness and kurtosis; the frequency of each target value.
Two. Using EDA
1. Import libraries
import warnings
warnings.filterwarnings('ignore')
import missingno as msno
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
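2. Load the datasets
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('testA.csv')
# Show the first and last five rows together (DataFrame.append no longer exists in pandas 2.x)
pd.concat([train_data.head(), train_data.tail()])
train_data.shape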
3. Data overview
train_data.describe()
# info() is a method; calling it prints column dtypes and non-null counts
train_data.info()
4. Check for missing values and anomalies
train_data.isnull().sum()
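The check above covers missing values only. For the outlier part of this step, one common option is the interquartile-range (IQR) rule. The following is a minimal sketch on a small made-up frame, not the method used in this post: the helper name `iqr_outlier_mask`, the column name `value`, and the multiplier 1.5 are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. NaN values are never flagged."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)


# Hypothetical toy data standing in for a numeric column of train_data.
demo = pd.DataFrame({"value": [1.0, 1.2, 0.9, 1.1, 1.0, 25.0, np.nan]})

missing_per_column = demo.isnull().sum()        # same missing-value check as above
outlier_mask = iqr_outlier_mask(demo["value"])  # IQR rule applied to one column

print(missing_per_column)
print(demo[outlier_mask])
```

Whether a flagged value is a true anomaly or a legitimate extreme depends on the data, so the mask is best treated as a list of rows to inspect rather than rows to drop.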
5. Understand the distribution of the target
train_data['label']
train_data['label'].value_counts()
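Raw counts are easier to compare when shown next to each class's share of the rows. A small sketch, assuming a made-up label series in place of train_data['label'] (the values below are invented for illustration):

```python
import pandas as pd

# Hypothetical labels standing in for train_data['label'].
labels = pd.Series([0, 0, 0, 1, 2, 2, 3, 0], name="label")

# Absolute number of rows per class, ordered by class value.
counts = labels.value_counts().sort_index()
# Fraction of rows per class; normalize=True divides by the total row count.
shares = labels.value_counts(normalize=True).sort_index()

summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```

A strongly uneven `share` column is an early hint that the classes are imbalanced, which matters for how the later model is trained and evaluated.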
(1) Overall distribution:
import scipy.stats as st
y = train_data['label']
# Draw three panels side by side in a 1x3 grid
# Note: sns.distplot is deprecated since seaborn 0.11
plt.subplot(131)
sns.distplot(y, rug=True, bins=20)
plt.subplot(132)
sns.distplot(y, kde=False, fit=st.norm)
plt.subplot(133)
sns.distplot(y, kde=False, fit=st.lognorm)
plt.show()
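The plots compare candidate distributions by eye. The same comparison can be made numerically by fitting each candidate with SciPy and comparing log-likelihoods. The sketch below is an illustration under stated assumptions: the sample `y` is synthetic (drawn from a Johnson SU distribution with made-up parameters) and the choice of candidates is arbitrary; it is not taken from the heartbeat data.

```python
import numpy as np
import scipy.stats as st

# Synthetic, skewed sample standing in for a numeric target column.
y = st.johnsonsu.rvs(a=1.0, b=1.5, size=500, random_state=0)

# Candidate distributions to compare (an arbitrary choice for illustration).
candidates = {"norm": st.norm, "johnsonsu": st.johnsonsu}

log_likelihood = {}
for name, dist in candidates.items():
    params = dist.fit(y)  # maximum-likelihood estimate of the parameters
    log_likelihood[name] = float(np.sum(dist.logpdf(y, *params)))

for name, ll in log_likelihood.items():
    print(f"{name}: log-likelihood = {ll:.2f}")
```

A higher log-likelihood indicates a closer fit to the sample, although a distribution with more free parameters has an inherent advantage in this comparison.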
(2) Skewness and kurtosis
sns.distplot(train_data['label'])
print("Skewness: %f" % train_data['label'].skew())
print("Kurtosis: %f" % train_data['label'].kurt())
# Skewness and kurtosis of every numeric column
train_data.skew(numeric_only=True), train_data.kurt(numeric_only=True)
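When reading these numbers it helps to know which estimators pandas uses: skew() returns the bias-corrected sample skewness, and kurt() returns the bias-corrected excess kurtosis, for which a normal distribution gives 0. The sketch below cross-checks both against SciPy on a synthetic sample; the exponential sample is an assumption made for illustration and has nothing to do with the heartbeat data.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Synthetic right-skewed sample standing in for a numeric column.
rng = np.random.default_rng(0)
x = pd.Series(rng.exponential(scale=1.0, size=1000))

pandas_skew = x.skew()  # bias-corrected sample skewness
pandas_kurt = x.kurt()  # bias-corrected excess kurtosis

# SciPy computes the same estimators when bias=False;
# fisher=True selects excess kurtosis (normal distribution -> 0).
scipy_skew = st.skew(x.to_numpy(), bias=False)
scipy_kurt = st.kurtosis(x.to_numpy(), fisher=True, bias=False)

print(f"skewness: pandas={pandas_skew:.4f}, scipy={scipy_skew:.4f}")
print(f"kurtosis: pandas={pandas_kurt:.4f}, scipy={scipy_kurt:.4f}")
```

Positive skewness means a longer right tail, and positive excess kurtosis means heavier tails than a normal distribution.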
(3) Frequency of the target values
# Histogram of how often each target value occurs
plt.hist(train_data['label'], orientation='vertical', histtype='bar', color='red')
plt.show()