Shuffling pandas data and splitting it into training and test sets

2020-11-06 01:27:00 Elementary school students in IT field


In machine learning, once we have a batch of training data we usually need to split it into a training set and a test set, or into a training set, a cross-validation set and a test set. To avoid a biased feature distribution in the resulting subsets, we first shuffle the data so that the row order is random, and only then split it.
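To illustrate why shuffling comes first, here is a minimal sketch on made-up data. The DataFrame, its column names and the 80/20 cut are assumptions for illustration and are not from the original post. When the rows happen to be sorted by label, an unshuffled split leaves the test set with a single class.

```python
import pandas as pd

# Hypothetical data sorted by label: eight rows of class 0, then two of class 1.
df = pd.DataFrame({"x": range(10), "label": [0] * 8 + [1] * 2})

# Splitting without shuffling: the last 20% of the rows are all class 1.
unshuffled_test = df.iloc[8:]
print(unshuffled_test["label"].value_counts().to_dict())  # {1: 2}

# Shuffling first makes the split independent of the original row order.
# random_state is an added assumption so the result is reproducible.
shuffled = df.sample(frac=1.0, random_state=0)
shuffled_test = shuffled.iloc[8:]
print(shuffled_test["label"].value_counts().to_dict())
```

The second count depends on the random seed, so no expected output is shown for it.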
The methods used are as follows.
Note: df stands for a pd.DataFrame.

df = df.sample(frac=1.0): Sampling 100% of the rows without replacement returns every row in random order, which shuffles the data.

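df = df.reset_index(): After shuffling, the index is out of order as well. By default, reset_index() moves the old index into a new column and generates a fresh index starting at 0, which is what you want if the old index carries meaning as a feature. If the old index has no meaning, use df.reset_index(drop=True) to discard it instead.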

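train = df.loc[0:a]: Split the data; choose the proportions to suit your situation. Note that .loc slices by label and includes both endpoints, so this selects rows 0 through a.

cv = df.loc[a+1:b]: The cross-validation set, rows a+1 through b.

test = df.loc[b+1:]: The test set, from row b+1 to the end. Leave the end of the slice open: with .loc, -1 is treated as a label rather than as "the last row", so df.loc[b+1:-1] does not select the remaining rows.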
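The steps above can be put together as a short sketch. The toy DataFrame, the 60/20/20 proportions, the random_state value and the use of drop=True are assumptions chosen for illustration; adjust them to your own data.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({"feature": range(10), "label": [0] * 5 + [1] * 5})

# Shuffle all rows. random_state is an added assumption for reproducibility.
df = df.sample(frac=1.0, random_state=0)

# Replace the shuffled index with 0..n-1 and discard the old one.
df = df.reset_index(drop=True)

# Assumed 60/20/20 split: a and b are the last labels of train and cv.
n = len(df)
a = n * 6 // 10 - 1
b = n * 8 // 10 - 1

train = df.loc[0:a]     # .loc slices include both endpoints
cv = df.loc[a + 1:b]
test = df.loc[b + 1:]   # open-ended slice runs to the last row

print(len(train), len(cv), len(test))  # 6 2 2
```

Because .loc is label-based and inclusive, the three slices do not overlap and together cover every row. Position-based slicing with df.iloc[:k] would work equally well and does not depend on the index having been reset.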

This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.