Pseudo-Label Summary
2022-08-03 20:47:00 【Mark_Aussie】
A pseudo-label is a concept from semi-supervised learning that helps a model learn from unlabeled data.
Unlike fully unsupervised learning, semi-supervised learning works with a small amount of labeled data plus a large amount of unlabeled data, which matches both real-world and competition settings:
In practice, labeled data is scarce and unlabeled data is plentiful;
In a competition, the training set is labeled but the test set is not.
Pseudo-labeling is one semi-supervised method. The idea: first train a model on the existing labeled data; use the trained model to predict labels for the unlabeled data; then add the predicted labels and their data to the training set and train again.
Not all unlabeled data is predicted and added to training at once;
If the labeled set is small in the early stages, do not add too many pseudo-labels in each round;
The predict-and-add process above is run iteratively, not just once.
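The iterative procedure above can be sketched in a few lines of plain Python. This is a toy illustration, not the method of any particular competition solution: the nearest-centroid classifier, the function names, and the `per_round`, `threshold`, and `max_rounds` parameters are all assumptions made for the example.

```python
import math

def fit_centroids(xs, ys):
    """Train a toy model: one mean point (centroid) per class, for 2-D inputs."""
    sums, counts = {}, {}
    for (x0, x1), y in zip(xs, ys):
        s = sums.setdefault(y, [0.0, 0.0])
        s[0] += x0
        s[1] += x1
        counts[y] = counts.get(y, 0) + 1
    return {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}

def predict_proba(centroids, x):
    """Turn distances to each centroid into class probabilities (softmax of -distance)."""
    scores = {y: math.exp(-math.dist(x, c)) for y, c in centroids.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

def pseudo_label(labeled_x, labeled_y, unlabeled_x,
                 per_round=2, threshold=0.7, max_rounds=10):
    """Iteratively add a few confident pseudo-labels per round, then retrain."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        if not pool:
            break
        centroids = fit_centroids(train_x, train_y)
        # Score every remaining unlabeled point as (confidence, label, point).
        scored = []
        for x in pool:
            proba = predict_proba(centroids, x)
            label = max(proba, key=proba.get)
            scored.append((proba[label], label, x))
        scored.sort(key=lambda item: item[0], reverse=True)
        # Keep at most `per_round` points, and only those above the threshold.
        accepted = [item for item in scored[:per_round] if item[0] >= threshold]
        if not accepted:
            break  # nothing confident enough is left
        for _, label, x in accepted:
            train_x.append(x)
            train_y.append(label)
            pool.remove(x)
    return fit_centroids(train_x, train_y), train_x, train_y

labeled_x = [(0.0, 0.0), (10.0, 10.0)]
labeled_y = [0, 1]
unlabeled_x = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (10.0, 9.0), (5.0, 5.0)]
model, train_x, train_y = pseudo_label(labeled_x, labeled_y, unlabeled_x)
print(len(train_x), "training points after pseudo-labeling")
```

The ambiguous point `(5.0, 5.0)` sits halfway between the two classes, so its confidence stays near 0.5 and it is never added. Capping the additions per round keeps a small labeled set from being swamped by pseudo-labels early on.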
Pseudo-labels are not a panacea in competitions. They generally suit:
Unstructured data, where deep learning is commonly used;
Models that are already accurate, so that the added pseudo-labels are accurate too.
In competitions, pseudo-labeling falls into two kinds according to Kaggle's mechanism:
Non-Kernel competitions: offline pseudo-labeling, i.e. predict offline, train with the pseudo-labels, then predict again;
Kernel competitions: online pseudo-labeling, i.e. predict online, retrain with the pseudo-labels, then predict again.
Pseudo-labels and soft labels:
A pseudo-label is a model's prediction on unlabeled data, which is then used for a second round of training;
A soft label converts a label into continuous values (class probabilities) for a second round of training.
Soft labels are generally used in model distillation and when training on some datasets, letting the model learn the overall class distribution of each sample. Compared with hard labels, soft labels also help prevent overfitting, and they can be combined with mixup.
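To show why mixup pairs naturally with soft labels, here is a minimal sketch: mixing two samples with a weight `lam` also mixes their one-hot labels, and the result is a soft label. The function name, the feature vectors, and the choice `alpha = 0.2` are assumptions made for illustration only.

```python
import random

def mixup(x1, y1, x2, y2, lam):
    """Blend two samples and their label vectors with weight lam in [0, 1]."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# Two hypothetical samples with one-hot (hard) labels over [car, person].
car_x, car_y = [0.9, 0.1, 0.4], [1.0, 0.0]
person_x, person_y = [0.2, 0.8, 0.5], [0.0, 1.0]

# mixup usually draws lam from a Beta(alpha, alpha) distribution.
alpha = 0.2
lam = random.betavariate(alpha, alpha)
mixed_x, mixed_y = mixup(car_x, car_y, person_x, person_y, lam)
print("mixed label:", mixed_y)  # a soft label that still sums to 1
```

Because the mixed label is no longer one-hot, the training loss has to accept a probability vector as its target, which is the same requirement soft labels impose.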
Soft labels and pseudo-labels can be used together. In the example picture, the photo's original label is car, but the photo also contains a person. Training directly on the hard label introduces some label noise. Replacing the picture's original label with the model's predicted probabilities (a probability distribution over the classes) gives a more reasonable label and makes training more stable.
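The effect of swapping a hard label for a soft one can be seen in the cross-entropy loss itself. The sketch below uses made-up probabilities for the car/person example (the values `0.7` and `0.3` are assumptions, not taken from any real model) and compares how each kind of target scores a cautious prediction and an overconfident one.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Class order: [car, person].
hard_label = [1.0, 0.0]  # one-hot: the photo is labeled "car" only
soft_label = [0.7, 0.3]  # hypothetical predicted distribution used as the new label

cautious = [0.7, 0.3]        # prediction that admits a person is present
overconfident = [0.99, 0.01]  # prediction that is almost certain of "car"

for name, target in (("hard", hard_label), ("soft", soft_label)):
    print(name,
          "cautious:", round(cross_entropy(target, cautious), 3),
          "overconfident:", round(cross_entropy(target, overconfident), 3))
```

With the hard label, the overconfident prediction has the lower loss, so training keeps pushing the model toward "car" alone. With the soft label, the loss is lowest when the prediction matches the whole distribution, which is what lets the model account for the person in the photo.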
Try pseudo-labels only when no other way to raise your score in a competition remains; otherwise they are not recommended;
Pseudo-labels suit deep learning methods, and samples with high prediction confidence are generally the ones selected for training;
Whether pseudo-labels may be used depends on the organizer's rules.
Semi-supervised training process:
Step 1: Train the model with labeled data
Step 2: Use the trained model to predict labels for unlabeled data
Step 3: Retrain the model on both the pseudo-labeled and the labeled data, then use that model for the final predictions on the test data.
Reference: Kaggle Knowledge Point: Pseudo Label, 墨天轮 (Motian Wheel)