The difference between StratifiedKFold (classification) and KFold (regression)
2022-07-03 13:14:00 【Levi Bebe】
1. The main difference between StratifiedKFold and KFold
StratifiedKFold: stratified sampling. The proportion of samples of each class in every training set and test set matches the proportion in the original data. (Classification problems)
KFold: plain, non-stratified splitting. The data is divided into training and test sets without considering whether the class proportions in them match the original data. (Regression problems)
from sklearn.model_selection import KFold,StratifiedKFold
KFold(n_splits, shuffle, random_state)
Parameters:
n_splits: the number of equal parts (folds) the data is divided into
shuffle: whether to shuffle the data before dividing it
if False, the data is divided in its original order, so the result of each division is the same
if True, the data is shuffled first (random sampling), so the result of each division differs unless random_state is fixed
random_state: random seed, only relevant when shuffle is True. Setting it to a fixed value (usually 0) is convenient for parameter tuning, because the same splits are generated every time
StratifiedKFold(n_splits, shuffle, random_state)
Parameters:
n_splits: the number of equal parts (folds) the data is divided into
shuffle: whether to shuffle the samples of each class before dividing them
if False, the data is divided in its original order, so the result of each division is the same
if True, the data is shuffled first (random sampling), so the result of each division differs unless random_state is fixed
random_state: random seed, only relevant when shuffle is True. Setting it to a fixed value (usually 0) is convenient for parameter tuning, because the same splits are generated every time
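One practical consequence of stratification is that n_splits is limited by the class sizes. The sketch below uses made-up data (3 samples per class) and assumes the behaviour of the scikit-learn versions I am familiar with, where asking for more folds than any class has samples raises a ValueError.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.zeros((6, 1))                # placeholder features
y = np.array([0, 0, 0, 1, 1, 1])    # hypothetical labels, 3 samples per class

# 3 folds: every test fold receives one sample of each class.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
fold_labels = [sorted(y[test].tolist()) for _, test in skf.split(X, y)]
print(fold_labels)

# 4 folds: more folds than any class has samples, so stratifying is impossible.
try:
    list(StratifiedKFold(n_splits=4).split(X, y))
    too_many_splits_rejected = False
except ValueError as err:
    too_many_splits_rejected = True
    print("ValueError:", err)
```

KFold has no such restriction based on classes, since it never looks at the labels.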
2. An example of how StratifiedKFold and KFold differ
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# 8 samples with 4 features each
X = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74]
])
# Binary class labels: four 1s and four 0s
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# Instance names must not shadow the imported classes
kf = KFold(n_splits=4, shuffle=True, random_state=2021)
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=2021)

print('---------------------KFold---------------------------')
for train, test in kf.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print('Training set labels: %s' % y[train])
    print('Test set labels: %s' % y[test])

print('----------------StratifiedKFold----------------------')
for train, test in skf.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print('Training set labels: %s' % y[train])
    print('Test set labels: %s' % y[test])

# The output is as follows
'''
---------------------KFold---------------------------
Train: [0 1 2 4 5 6] | test: [3 7]
Training set labels: [1 1 0 1 1 0]
Test set labels: [0 0]
Train: [0 1 3 4 5 7] | test: [2 6]
Training set labels: [1 1 0 1 1 0]
Test set labels: [0 0]
Train: [2 3 4 5 6 7] | test: [0 1]
Training set labels: [0 0 1 1 0 0]
Test set labels: [1 1]
Train: [0 1 2 3 6 7] | test: [4 5]
Training set labels: [1 1 0 0 0 0]
Test set labels: [1 1]
----------------StratifiedKFold----------------------
Train: [0 1 2 3 4 6] | test: [5 7]
Training set labels: [1 1 0 0 1 0]
Test set labels: [1 0]
Train: [0 1 2 3 5 7] | test: [4 6]
Training set labels: [1 1 0 0 1 0]
Test set labels: [1 0]
Train: [0 3 4 5 6 7] | test: [1 2]
Training set labels: [1 0 1 1 0 0]
Test set labels: [1 0]
Train: [1 2 4 5 6 7] | test: [0 3]
Training set labels: [1 0 1 1 0 0]
Test set labels: [1 0]
'''


Summary:
KFold is suitable for splitting data for regression problems
StratifiedKFold is suitable for splitting data for classification problems
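The summary can be checked directly with a small sketch. The continuous target below is a hypothetical example, not data from the original post. To my knowledge, StratifiedKFold only supports binary and multiclass targets and raises a ValueError for a continuous one, while KFold accepts any target because it ignores the labels.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(10).reshape(-1, 1)           # placeholder features
y_continuous = np.linspace(0.5, 9.5, 10)   # hypothetical regression target

# KFold ignores the target values, so a continuous y is fine.
kfold_sizes = [len(test)
               for _, test in KFold(n_splits=5).split(X, y_continuous)]
print("KFold test fold sizes:", kfold_sizes)

# StratifiedKFold needs discrete classes to stratify on.
try:
    list(StratifiedKFold(n_splits=5).split(X, y_continuous))
    stratified_accepts_continuous = True
except ValueError as err:
    stratified_accepts_continuous = False
    print("StratifiedKFold rejected the target:", err)
```

If stratification is still wanted for a regression target, one common workaround is to bin the target into discrete groups first and stratify on the bins; that is an extra design choice and not something the original post covers.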
References:
https://blog.csdn.net/qq_34107425/article/details/105548800
https://blog.csdn.net/wqh_jingsong/article/details/77896449