Shuffling pandas data and splitting it into training and test sets
2020-11-06 01:27:00 【Elementary school students in IT field】
Description
In machine learning, once we have a batch of training data, we usually need to split it into a training set and a test set, or into a training set, a cross-validation set, and a test set. To keep the feature distributions of the resulting subsets from being biased by the original row order, we first shuffle the data so that the rows are in random order, and then split it.
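The methods used are as follows.
Note: df denotes a pd.DataFrame.
df = df.sample(frac=1.0): sampling 100% of the rows (without replacement, the default) returns every row in random order, which shuffles the data.
df = df.reset_index(drop=True): after shuffling, the index is out of order too. If the index carries no meaning, reset it with drop=True to discard it. Otherwise call df.reset_index() without drop, which keeps the old index as a new column and generates a fresh, meaningless 0..n-1 index.
train = df.loc[0:a]: split the data; the proportions depend on the situation. Note that .loc slices by label and includes both endpoints, so this selects rows 0 through a.
cv = df.loc[a+1:b]: rows a+1 through b form the cross-validation set.
test = df.loc[b+1:]: the remaining rows form the test set. Do not write df.loc[b+1:-1] here: .loc treats -1 as a label, not as "the last row", so on a freshly reset index that slice comes back empty.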
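The steps above can be sketched end to end as follows. This is a minimal illustration rather than the author's exact code: the helper name `shuffle_and_split`, the parameters `train_frac`, `cv_frac`, and `seed`, and the toy DataFrame are all assumptions made for the example. It also uses positional `.iloc` slicing (half-open, like Python lists) instead of the label-based `.loc` used above, which avoids the off-by-one bookkeeping of `a+1` and `b+1`.

```python
import pandas as pd


def shuffle_and_split(df, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle df, then cut it into train / cv / test parts.

    The function name, the fractions, and the seed are hypothetical
    choices for this sketch; adjust them to your own data.
    """
    # Sample 100% of the rows to shuffle; random_state makes it repeatable.
    # drop=True discards the old (now scrambled) index.
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)

    n = len(shuffled)
    a = int(round(n * train_frac))      # first row after the training set
    b = a + int(round(n * cv_frac))     # first row after the cv set

    # .iloc slices by position and excludes the end point.
    train = shuffled.iloc[:a]
    cv = shuffled.iloc[a:b]
    test = shuffled.iloc[b:]
    return train, cv, test


# Toy data, purely for illustration.
df = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})
train, cv, test = shuffle_and_split(df)
print(len(train), len(cv), len(test))
```

Passing a fixed `random_state` is optional; it only makes the shuffle reproducible between runs, which helps when comparing models on the same split.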