Pseudo-Label Summary
2022-08-03 20:47:00 【Mark_Aussie】
Pseudo-labeling is a technique in semi-supervised learning that helps a model learn from unlabeled data.
Unlike fully unsupervised learning, semi-supervised learning works with a small amount of labeled data and a large amount of unlabeled data, which matches both real-world and competition scenarios:
In practice, labeled data is scarce and unlabeled data is plentiful;
In a competition, the training set is labeled but the test set is not;
The idea behind pseudo-labeling is as follows: first train a model on the existing labeled data; then use the trained model to predict labels for the unlabeled data; finally add the predicted labels and their data to the training set and train again;

Not all of the unlabeled data has to be predicted and added to training at once;
If the labeled dataset is small at the start, do not add too many pseudo-labels in each round;
Predicting on unlabeled data and adding it to training is done iteratively over several rounds, not in a single pass.
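The iterative procedure described above can be sketched in plain Python. This is only an illustration under stated assumptions: the nearest-centroid classifier, the distance-based confidence score, and the parameters `max_per_round`, `threshold` and `rounds` are hypothetical choices made for the example, not something prescribed by the original article.

```python
import math

def fit_centroids(X, y):
    """Train a toy nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_proba(centroids, features):
    """Normalized exp(-distance) per class, used here as a confidence score."""
    scores = {c: math.exp(-math.dist(features, m)) for c, m in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def iterative_pseudo_label(X_lab, y_lab, X_unlab,
                           max_per_round=2, threshold=0.8, rounds=5):
    """Add a capped number of confident pseudo-labels per round, then retrain."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        model = fit_centroids(X_train, y_train)
        scored = []
        for idx, x in enumerate(pool):
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            scored.append((proba[label], idx, label))
        scored.sort(reverse=True)  # most confident predictions first
        chosen = [(i, lab) for conf, i, lab in scored if conf >= threshold]
        chosen = chosen[:max_per_round]  # do not add too many at once
        if not chosen:
            break  # nothing confident enough is left
        for i, lab in chosen:
            X_train.append(pool[i])
            y_train.append(lab)
        taken = {i for i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return fit_centroids(X_train, y_train), X_train, y_train, pool

# Tiny made-up dataset: two labeled points and five unlabeled ones.
X_lab = [(0.0, 0.0), (10.0, 10.0)]
y_lab = ["a", "b"]
X_unlab = [(1.0, 1.0), (9.0, 9.0), (2.0, 0.0), (8.0, 10.0), (5.0, 5.0)]
model, X_train, y_train, pool = iterative_pseudo_label(X_lab, y_lab, X_unlab)
```

In this toy run the ambiguous midpoint `(5.0, 5.0)` never reaches the confidence threshold, so it stays in the unlabeled pool instead of being added with an unreliable label.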
Pseudo-labels are not a panacea in competitions. They are generally suited to:
Unstructured data, where deep learning is commonly used;
Models whose accuracy is already high, so that the added pseudo-labels are accurate;
In competitions, pseudo-labeling can be divided by Kaggle's submission mechanism into:
Non-Kernel competitions: offline pseudo-labeling, i.e. predict offline, train on the pseudo-labels, then predict again;
Kernel competitions: online pseudo-labeling, i.e. predict online, retrain on the pseudo-labels, then predict again;
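Pseudo-labels and soft labels:
A pseudo-label is a model's prediction on unlabeled data, which is then used for a second round of training;
A soft label replaces a hard label with continuous values (class probabilities), which are then used for a second round of training;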
Soft labels are generally used in model distillation and when training on certain datasets, allowing the model to learn the overall class distribution of a sample. Compared with hard labels, soft labels can also help prevent the model from overfitting, and they can be used together with mixup.
Soft labels and pseudo-labels can be used at the same time; in the picture below, the original label of the photo is car, but the photo also contains a person. Training directly on the hard label introduces some label noise. The model's predicted probabilities (the probability distribution over classes) can instead replace the original label of the picture, so that the label is more reasonable and the training process is more stable.
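To make the difference between hard and soft labels concrete, here is a minimal sketch of cross-entropy computed against each kind of target. The class names, the soft-label values and the two candidate predictions are made-up numbers chosen for illustration; in practice the soft label would come from a model's predicted probabilities.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

classes = ["car", "person", "dog"]
hard_label = [1.0, 0.0, 0.0]   # the photo is annotated only as "car"
soft_label = [0.7, 0.3, 0.0]   # hypothetical model output: mostly car, partly person

# Two candidate predictions for the same photo.
overconfident = [0.98, 0.01, 0.01]  # nearly all mass on "car"
balanced = [0.70, 0.29, 0.01]       # also acknowledges the person

hard_loss_overconfident = cross_entropy(hard_label, overconfident)
hard_loss_balanced = cross_entropy(hard_label, balanced)
soft_loss_overconfident = cross_entropy(soft_label, overconfident)
soft_loss_balanced = cross_entropy(soft_label, balanced)

# Probabilities derived from raw scores are also valid soft targets.
probabilities = softmax([2.0, 1.0, -1.0])
```

With the hard label, the overconfident prediction gets the lower loss, so training pushes the model toward ignoring the person. With the soft label, the balanced prediction gets the lower loss, which matches the content of the photo more closely.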

If nothing else is raising your score in a competition, pseudo-labels are worth trying; otherwise they are not recommended;
Pseudo-labels suit deep learning methods, and samples with high prediction confidence are generally selected for training;
Whether pseudo-labels may be used depends on the organizer's rules;
Semi-supervised training process:
Step 1: Train the model with labeled data
Step 2: Use the trained model to predict labels for unlabeled data
Step 3: Retrain the model on the pseudo-labeled and labeled data together, then use it for the final predictions on the test data.
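The three steps can be traced on a toy one-dimensional problem. This is a sketch only: the nearest-mean model and all data values below are invented for the example, and a real pipeline would use an actual model and usually filter pseudo-labels by confidence.

```python
def train(xs, ys):
    """Toy 1-D model: remember the mean value of each class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means

def predict(model, xs):
    """Assign each point to the class whose mean is closest."""
    return [min(model, key=lambda c: abs(x - model[c])) for x in xs]

# Made-up data: labeled set, unlabeled set, and test set.
labeled_x, labeled_y = [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]
unlabeled_x = [0.5, 3.0, 7.0, 9.5]
test_x = [2.5, 7.5]

# Step 1: train the model with labeled data.
model = train(labeled_x, labeled_y)

# Step 2: use the trained model to predict labels for the unlabeled data.
pseudo_y = predict(model, unlabeled_x)

# Step 3: retrain on labeled plus pseudo-labeled data, then predict on the test set.
final_model = train(labeled_x + unlabeled_x, labeled_y + pseudo_y)
test_pred = predict(final_model, test_x)
```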
Reference: Kaggle Knowledge Point: Pseudo Label - Motian Wheel