Pseudo-Labeling: A Summary
2022-08-03 20:47:00 【Mark_Aussie】
Pseudo-labeling is a technique from semi-supervised learning that helps a model learn from unlabeled data.
Unlike fully unsupervised learning, semi-supervised learning works with a small amount of labeled data plus a large amount of unlabeled data, which matches both real-world and competition settings:
In practice, labeled data is scarce and unlabeled data is plentiful;
In a competition, the training set is labeled but the test set is not.
The idea is as follows: first train a model on the existing labeled data; then use the trained model to predict labels for the unlabeled data; finally add the predicted labels, together with their data, to the training set and train again.

Not all of the unlabeled data has to be predicted and added to training in one go:
If the labeled dataset is relatively small at the start, do not add too many pseudo-labels in each round;
The predict-and-retrain process on the unlabeled data is carried out iteratively, not just once.
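The iterative procedure above can be sketched roughly as follows. This is a minimal illustration, not the method of any particular competition solution: the nearest-centroid classifier is a toy stand-in for a real model, and the names `fit_centroids`, `predict_proba`, `pseudo_label_rounds`, `per_round`, and `rounds` are all hypothetical.

```python
import math

def fit_centroids(X, y):
    """Toy stand-in model: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, m))
              for c, m in centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def pseudo_label_rounds(X_lab, y_lab, X_unlab, per_round=2, rounds=3):
    """Each round: retrain, predict the pool, move the most confident few
    predictions into the training set. `per_round` caps how many
    pseudo-labels are added at once (an assumed knob)."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        model = fit_centroids(X_train, y_train)
        scored = []
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            scored.append((proba[label], x, label))
        scored.sort(key=lambda item: item[0], reverse=True)
        for _, x, label in scored[:per_round]:
            X_train.append(x)
            y_train.append(label)
        pool = [x for _, x, _ in scored[per_round:]]
    return fit_centroids(X_train, y_train), X_train, y_train, pool

# Illustrative data: two labeled points, five unlabeled points.
X_lab, y_lab = [[0.0, 0.0], [4.0, 4.0]], [0, 1]
X_unlab = [[0.2, 0.1], [3.9, 4.2], [1.0, 1.2], [3.0, 2.8], [2.1, 1.9]]
model, X_train, y_train, pool = pseudo_label_rounds(
    X_lab, y_lab, X_unlab, per_round=2, rounds=2)
```

With these settings the ambiguous point `[2.1, 1.9]` stays in the unlabeled pool after two rounds, because it is never among the most confident predictions. Capping the number added per round keeps early mistakes from dominating a small labeled set.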
Pseudo-labels are not a panacea in competitions. They generally work best with:
Unstructured data, where deep learning is typically used;
Models that are already accurate, so that the added pseudo-labels are mostly correct.
In competitions, pseudo-labeling can be divided into two kinds according to Kaggle's mechanics:
Non-Kernel competitions: offline pseudo-labeling, i.e. predict offline, train on the pseudo-labels, then predict again;
Kernel competitions: online pseudo-labeling, i.e. predict online, retrain on the pseudo-labels, then predict again.
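Pseudo-labels and soft labels:
A pseudo-label is a model's prediction on unlabeled data, used for a second round of training;
A soft label converts a label into continuous values (a probability distribution over the classes), which are then used for a second round of training;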
Soft labels are generally used in model distillation and when training on some datasets; they let the model learn the overall class distribution of a sample. Compared with hard labels, soft labels can also reduce overfitting, and they can be combined with mixup.
Soft labels and pseudo-labels can be used together. For example, suppose a photo's original label is car, but the photo also contains a person. Training directly on the hard label introduces label noise. Instead, the model's predicted probabilities (a distribution over the classes) can replace the original label, which makes the label more reasonable and the training process more stable.
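A small sketch of the car/person case may help. The probabilities below are made-up illustrative numbers, and `soften` with its mixing weight `alpha` is a hypothetical helper; setting `alpha=0.0` corresponds to replacing the original label with the model's prediction outright, as described above.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def soften(hard_label, model_probs, alpha=0.5):
    """Blend a one-hot label with a model's predicted distribution.

    alpha is an assumed mixing weight: 1.0 keeps the hard label,
    0.0 uses the model's prediction alone.
    """
    return [alpha * h + (1 - alpha) * p
            for h, p in zip(hard_label, model_probs)]

# Classes: [car, person]. The photo is labeled "car" but also shows a person.
hard = [1.0, 0.0]
teacher_probs = [0.7, 0.3]     # illustrative prediction, not a measured result
soft = soften(hard, teacher_probs, alpha=0.5)   # approximately [0.85, 0.15]

# The same prediction scored against the hard and the soft target.
student_probs = [0.8, 0.2]
loss_hard = cross_entropy(hard, student_probs)
loss_soft = cross_entropy(soft, student_probs)
```

Against the hard target, only the "car" probability contributes to the loss; against the soft target, the "person" probability also contributes, so the model is no longer pushed to ignore the second object.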

If nothing else is improving your competition score, pseudo-labels are worth trying; otherwise they are not recommended;
Pseudo-labels suit deep learning methods, and generally only samples predicted with high confidence are selected for training;
Whether pseudo-labels may be used depends on the organizer's rules.
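Selecting only high-confidence predictions, as mentioned above, might look like the sketch below. The function name `select_confident` and the 0.9 threshold are assumptions for illustration; a suitable threshold depends on the model and the task.

```python
def select_confident(probabilities, threshold=0.9):
    """Return (index, label) pairs whose top class probability
    is at least `threshold`."""
    selected = []
    for i, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected

# Illustrative predicted probabilities for four unlabeled samples.
test_probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92], [0.40, 0.60]]
pseudo = select_confident(test_probs, threshold=0.9)
# pseudo == [(0, 0), (2, 1)]: samples 1 and 3 are too uncertain to use
```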
Semi-supervised training process:
Step 1: Train the model on the labeled data.
Step 2: Use the trained model to predict labels for the unlabeled data.
Step 3: Retrain the model on the pseudo-labeled and labeled data together, and use the retrained model for the final predictions on the test data.
Reference: Kaggle Knowledge Point: Pseudo Label - Motian Wheel