Shuffling pandas data and splitting it into training and test sets
2020-11-06 01:27:00 【Elementary school students in IT field】
Description
In machine learning, once we have a batch of training data we usually need to split it into a training set and a test set, or into a training set, a cross-validation set and a test set. To avoid a biased feature distribution in the resulting subsets, we first shuffle the data so that the row order is random, and then split it.
The methods used are as follows:
Note: df stands for a pd.DataFrame
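df = df.sample(frac=1.0): Sampling 100% of the rows (without replacement, the default) returns every row in random order, which shuffles the data
df = df.reset_index(): After shuffling, the index is out of order as well. A plain reset_index() moves the old index into a new column and generates a fresh 0..n-1 index; if the old index carries no meaning, use reset_index(drop=True) to discard it instead
train = df.loc[0:a]: Split the data; the proportions depend on your situation. Note that .loc slices by label and includes both endpoints, so this selects rows 0 through a
cv = df.loc[a+1:b]: The cross-validation set, rows a+1 through b
test = df.loc[b+1:]: The test set, everything after row b. Do not write df.loc[b+1:-1] here: .loc treats -1 as a label rather than as "the last row", so on a 0..n-1 index that slice comes back empty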
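The steps above can be put together as a minimal sketch. The toy DataFrame, its column names, the random_state seed and the 60/20/20 proportions are all assumptions made for illustration; they do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({"feature": np.arange(10), "label": np.arange(10) % 2})

# Shuffle all rows. random_state is an assumed seed so the run is repeatable.
df = df.sample(frac=1.0, random_state=0)

# Discard the shuffled index and replace it with 0..n-1.
df = df.reset_index(drop=True)

# Assumed 60/20/20 split. .loc slices by label and includes both endpoints,
# so a and b are the LAST labels of the training and cross-validation sets.
n = len(df)
a = int(n * 0.6) - 1
b = int(n * 0.8) - 1

train = df.loc[0:a]
cv = df.loc[a + 1:b]
test = df.loc[b + 1:]

print(len(train), len(cv), len(test))
```

Because .loc is inclusive at both ends, each subset starts one label after the previous one ends, so no row is dropped or duplicated. If you prefer half-open, position-based slices, df.iloc[:k] and df.iloc[k:] behave like ordinary Python slicing.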
Copyright notice
This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.