Pseudo-Label Summary
2022-08-03 20:47:00 【Mark_Aussie】
Pseudo-labeling is a semi-supervised learning technique that helps a model learn from unlabeled data.
Unlike fully unsupervised learning, semi-supervised learning combines a small amount of labeled data with a large amount of unlabeled data, which matches both real-world and competition settings:
In practice, labeled data is scarce and unlabeled data is plentiful;
In a competition, the training set is labeled but the test set is not.
Pseudo-labeling is one semi-supervised method. The idea is as follows: first train a model on the existing labeled data; then use the trained model to predict labels for the unlabeled data; finally add the predicted labels, together with their data, to the training set and train again.

Not all of the unlabeled data has to be predicted and added to training at once:
If the labeled dataset is relatively small at the start, do not add too many pseudo-labels in each round;
The predict-and-retrain process on unlabeled data is performed iteratively over several rounds, not just once.
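The iterative process described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the nearest-centroid classifier is a toy stand-in for whatever model is actually used, and the names `iterative_pseudo_label`, `rounds`, and `per_round` are hypothetical, not taken from the original article. Each round retrains on the current training set and adds only the `per_round` most confident predictions.

```python
import math

def centroids(X, y):
    """Toy 'training': compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_proba(model, x):
    """Softmax over negative distances to each class centroid."""
    scores = {c: -math.dist(x, mu) for c, mu in model.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def iterative_pseudo_label(X_lab, y_lab, X_unl, rounds=3, per_round=2):
    """Add at most `per_round` pseudo-labeled samples per round (hypothetical helper)."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unl)
    for _ in range(rounds):
        if not pool:
            break
        model = centroids(X_train, y_train)  # train on the current training set
        scored = []
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            scored.append((probs[label], x, label))
        # Most confident predictions first; only the top few are added this round.
        scored.sort(key=lambda t: t[0], reverse=True)
        for _, x, label in scored[:per_round]:
            X_train.append(x)
            y_train.append(label)
        pool = [x for _, x, _ in scored[per_round:]]
    # Final model trained on labeled plus pseudo-labeled data.
    return centroids(X_train, y_train), X_train, y_train

# Made-up toy data: two labeled points and five unlabeled ones.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["a", "b"]
X_unl = [[0.2, 0.1], [3.9, 4.2], [1.0, 1.2], [3.0, 2.8], [2.1, 1.9]]

model, X_train, y_train = iterative_pseudo_label(X_lab, y_lab, X_unl)
print(len(X_train), "training samples after pseudo-labeling")
```

Keeping `per_round` small while the labeled set is small limits how many wrong pseudo-labels can enter the training set in any one round.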
Pseudo-labels are not a panacea in competitions. They are generally suited to:
Unstructured data, where deep learning is commonly used;
Models that are already accurate, so that the added pseudo-labels are mostly correct.
In competitions, pseudo-labeling can be divided according to Kaggle's mechanism into:
Non-Kernel competitions: offline pseudo-labeling, i.e. predict offline, train on the pseudo-labels, then predict again;
Kernel competitions: online pseudo-labeling, i.e. predict online, retrain on the pseudo-labels, then predict again.
Pseudo-labels and soft labels:
Pseudo Label: predict labels for unlabeled data, then train a second time;
Soft Label: convert labels into continuous probability values, then train a second time.
Soft labels are generally used in model distillation and when training on some datasets, letting the model learn the full class distribution of a sample. Compared with hard labels, soft labels can also help prevent overfitting and can be used together with mixup.
Soft labels and pseudo-labels can be used together. In the example picture, the photo's original label is car, but the photo also contains a person. Training directly on the hard label introduces some label noise. Replacing the original label with the model's predicted probabilities (the probability distribution over classes) makes the label more reasonable and the training process more stable.
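To make the car/person example concrete, here is a small sketch of training targets as hard versus soft labels. The probability values and the helper names `cross_entropy` and `soften` are assumptions made up for illustration. The `alpha` blending parameter is also an addition: with `alpha=0.0` the label is fully replaced by the predicted distribution, which is the case the text describes.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def soften(hard_label, predicted_probs, alpha=0.0):
    """Blend a one-hot label with predicted probabilities.

    alpha is the weight kept on the hard label (hypothetical parameter);
    alpha=0.0 replaces the label with the predicted distribution entirely.
    """
    return [alpha * h + (1 - alpha) * q for h, q in zip(hard_label, predicted_probs)]

classes = ["car", "person"]
hard = [1.0, 0.0]        # original annotation: car only
predicted = [0.7, 0.3]   # assumed model output for the same photo
soft = soften(hard, predicted, alpha=0.0)

# Output of a model being trained on this photo (assumed values).
student = [0.65, 0.35]
print("loss against hard label:", cross_entropy(hard, student))
print("loss against soft label:", cross_entropy(soft, student))
```

With the hard label, any probability the model assigns to person is penalized even though a person is in the photo; with the soft label, the loss is smallest when the model reproduces the full distribution.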

If nothing else improves your score in a competition, pseudo-labels are worth trying; otherwise they are not recommended;
Pseudo-labels suit deep learning methods, and samples with high prediction confidence are generally selected for training;
Whether pseudo-labels may be used depends on the organizer's rules.
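Selecting high-confidence samples, as mentioned above, might look like the sketch below. The function name `select_confident`, the threshold of 0.9, and the probability values are illustrative assumptions; a suitable threshold depends on the model and the task.

```python
def select_confident(probabilities, threshold=0.9):
    """Return (row index, predicted class index, confidence) for confident rows.

    A row is kept only if its highest class probability reaches the threshold.
    """
    selected = []
    for i, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((i, best, probs[best]))
    return selected

# Assumed predicted probabilities for three unlabeled samples over three classes.
test_probs = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
]
pseudo = select_confident(test_probs, threshold=0.9)
print(pseudo)  # [(0, 0, 0.97), (2, 1, 0.92)]
```

The second sample is left out because its top probability is only 0.40, so its predicted label is too uncertain to use as a pseudo-label.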
Semi-supervised training process:
Step 1: Train the model with labeled data
Step 2: Use the trained model to predict labels for unlabeled data
Step 3: Retrain the model on both the pseudo-labeled and labeled datasets, then use it for the final predictions on the test data.
Reference: Kaggle Knowledge Point: Pseudo Label Pseudo Label -Motian Wheel