Shuffling data in pandas and splitting it into training and test sets
2020-11-06 01:27:00 【Elementary school students in IT field】
Description
In machine learning, once we have a batch of training data we usually split it into a training set and a test set, or into training, cross-validation, and test sets. To avoid skewing the feature distribution of the resulting subsets, we first shuffle the data so that the row order is random, and only then split it.
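The methods used are as follows.
Note: df stands for a pd.DataFrame.
df = df.sample(frac=1.0): sampling 100% of the rows (without replacement, which is the default) returns every row in random order, which shuffles the data.
df = df.reset_index(drop=True): after shuffling, the index is out of order as well. If the index carries no meaning as a feature, reset it with drop=True to get a fresh 0..n-1 index. Otherwise call df.reset_index() without drop=True, which keeps the old index as a new column and still generates a fresh, meaningless index.
train = df.loc[0:a]: split the data; the proportions depend on the situation. Note that .loc slices by label and includes both endpoints, so this selects rows 0 through a.
cv = df.loc[a+1:b]: the cross-validation set, rows a+1 through b.
test = df.loc[b+1:]: the test set, every row after b. Do not write df.loc[b+1:-1] here: .loc treats -1 as a label, not as "the last row", so that slice would not return the remaining rows.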
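The steps above can be put together in a short sketch. The helper name shuffle_and_split, the parameters train_frac, cv_frac and seed, and the toy DataFrame are assumptions made for illustration; they do not come from the original post. The 60/20/20 proportions are likewise only an example.

```python
import pandas as pd


def shuffle_and_split(df, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle df, then cut it into train / cross-validation / test parts.

    train_frac, cv_frac and seed are illustrative parameters (assumptions).
    """
    # frac=1.0 samples every row without replacement: a random permutation.
    # random_state makes the shuffle reproducible.
    shuffled = df.sample(frac=1.0, random_state=seed)
    # drop=True discards the old index instead of keeping it as a column.
    shuffled = shuffled.reset_index(drop=True)

    n = len(shuffled)
    a = int(n * train_frac) - 1   # last label of the training set
    b = a + int(n * cv_frac)      # last label of the cross-validation set

    # .loc slices by label and includes both endpoints.
    train = shuffled.loc[0:a]
    cv = shuffled.loc[a + 1:b]
    test = shuffled.loc[b + 1:]
    return train, cv, test


# Toy data, purely for demonstration.
df = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})
train, cv, test = shuffle_and_split(df)
print(len(train), len(cv), len(test))
```

Because the index has been reset to 0..n-1, the label-based .loc slices here line up with row positions. If the index were not reset, a position-based split with .iloc would be the safer choice.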