The difference between StratifiedKFold (classification) and KFold (regression)
2022-07-03 13:14:00 【Levi Bebe】
1. The main difference between StratifiedKFold and KFold
StratifiedKFold: stratified sampling. The proportion of samples of each class in every training set and test set matches the proportion in the original data. (Classification problems)
KFold: plain, non-stratified splitting. The data is divided into a training set and a test set without considering whether the class proportions in either set match the original data. (Regression problems)
from sklearn.model_selection import KFold, StratifiedKFold
KFold(n_splits=5, shuffle=False, random_state=None)
Parameters:
n_splits: the number of equal parts (folds) to divide the data into
shuffle: whether to shuffle the data before it is split into folds
    If False, the data is split in its original order, so every run produces the same folds. random_state has no effect in this case, and recent scikit-learn versions raise an error if it is set.
    If True, the samples are shuffled before splitting (random sampling), so the folds differ from run to run unless random_state is fixed.
random_state: the random seed, used only when shuffle=True. Setting it to a fixed integer (0 is a common choice) produces the same split on every run, which is convenient when tuning parameters.
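The shuffle and random_state behaviour described above can be checked with a small sketch. The toy array and the helper name fold_test_indices are made up for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # 6 toy samples with 2 features each


def fold_test_indices(splitter, X):
    """Collect the test indices of every fold as plain lists."""
    return [test.tolist() for _, test in splitter.split(X)]


# shuffle=False: folds are consecutive blocks in the original order
ordered = fold_test_indices(KFold(n_splits=3, shuffle=False), X)

# shuffle=True with a fixed seed: shuffled, but identical on every run
seeded_a = fold_test_indices(KFold(n_splits=3, shuffle=True, random_state=0), X)
seeded_b = fold_test_indices(KFold(n_splits=3, shuffle=True, random_state=0), X)

print(ordered)               # [[0, 1], [2, 3], [4, 5]]
print(seeded_a == seeded_b)  # True
```

Leaving random_state at None with shuffle=True would generally give a different split on each run.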
StratifiedKFold(n_splits=5, shuffle=False, random_state=None)
Parameters:
n_splits: the number of equal parts (folds) to divide the data into
shuffle: whether to shuffle the samples of each class before they are split into folds
    If False, the samples are taken in their original order, so every run produces the same folds. random_state has no effect in this case, and recent scikit-learn versions raise an error if it is set.
    If True, the samples are shuffled before splitting (random sampling), so the folds differ from run to run unless random_state is fixed.
random_state: the random seed, used only when shuffle=True. Setting it to a fixed integer (0 is a common choice) produces the same split on every run, which is convenient when tuning parameters.
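To see the stratification at work, here is a minimal sketch on an imbalanced label vector. The 8-to-4 class split is a made-up example, not data from the original post.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 8 samples of class 0, 4 samples of class 1
y = np.array([0] * 8 + [1] * 4)
X = np.zeros((len(y), 1))  # feature values do not influence the split

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Count how many samples of each class land in every test fold
test_counts = [
    np.bincount(y[test], minlength=2).tolist()
    for _, test in skf.split(X, y)
]
print(test_counts)  # [[2, 1], [2, 1], [2, 1], [2, 1]]
```

Every test fold keeps the original 2:1 class ratio, which plain KFold does not guarantee.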
2. An example comparing StratifiedKFold and KFold
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74]
])
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# Use lowercase instance names so the KFold and StratifiedKFold classes are not overwritten
kfold = KFold(n_splits=4, shuffle=True, random_state=2021)
stratified_kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=2021)

print('---------------------KFold---------------------------')
for train, test in kfold.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print('Training set labels: %s' % y[train])
    print('Test set labels: %s' % y[test])

print('----------------StratifiedKFold----------------------')
for train, test in stratified_kfold.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print('Training set labels: %s' % y[train])
    print('Test set labels: %s' % y[test])

# The output is as follows
'''
---------------------KFold---------------------------
Train: [0 1 2 4 5 6] | test: [3 7]
Training set labels: [1 1 0 1 1 0]
Test set labels: [0 0]
Train: [0 1 3 4 5 7] | test: [2 6]
Training set labels: [1 1 0 1 1 0]
Test set labels: [0 0]
Train: [2 3 4 5 6 7] | test: [0 1]
Training set labels: [0 0 1 1 0 0]
Test set labels: [1 1]
Train: [0 1 2 3 6 7] | test: [4 5]
Training set labels: [1 1 0 0 0 0]
Test set labels: [1 1]
----------------StratifiedKFold----------------------
Train: [0 1 2 3 4 6] | test: [5 7]
Training set labels: [1 1 0 0 1 0]
Test set labels: [1 0]
Train: [0 1 2 3 5 7] | test: [4 6]
Training set labels: [1 1 0 0 1 0]
Test set labels: [1 0]
Train: [0 3 4 5 6 7] | test: [1 2]
Training set labels: [1 0 1 1 0 0]
Test set labels: [1 0]
Train: [1 2 4 5 6 7] | test: [0 3]
Training set labels: [1 0 1 1 0 0]
Test set labels: [1 0]
'''


Summary:
KFold is suitable for splitting regression data.
StratifiedKFold is suitable for splitting classification data.
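One practical reason behind this rule of thumb is that StratifiedKFold needs discrete class labels to stratify on. The sketch below uses randomly generated data purely for illustration and shows KFold accepting a continuous target while StratifiedKFold rejects it.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))        # made-up features
y_continuous = rng.normal(size=10)  # regression-style target

# KFold ignores y when splitting, so a continuous target is fine
n_kfold_splits = sum(1 for _ in KFold(n_splits=5).split(X, y_continuous))

# StratifiedKFold raises a ValueError for a continuous target
try:
    list(StratifiedKFold(n_splits=5).split(X, y_continuous))
    stratified_error = None
except ValueError as exc:
    stratified_error = str(exc)

print(n_kfold_splits)
print(stratified_error)
```

For a regression target, stratification would first require binning the values into discrete groups, which is a separate design decision not covered here.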
References:
https://blog.csdn.net/qq_34107425/article/details/105548800
https://blog.csdn.net/wqh_jingsong/article/details/77896449