What is label encoding? How to distinguish and use one hot encoding and label encoding?
2022-07-03 15:01:00 【Hali_ Botebie】
What is label encoding?
Label encoding means encoding with labels: each original feature value is mapped to a user-defined numeric label, which turns a categorical feature into a number.
For example:
Suppose a color feature takes three values: red, yellow, and blue. Machine learning algorithms generally need vectorized or numeric input, so you might set red = 1, yellow = 2, blue = 3. That is label encoding: assigning a numeric label to each category.
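The color example can be sketched in a few lines of plain Python. The helper name `label_encode` is made up for illustration, and numbering labels from 1 in order of first appearance is just one arbitrary choice; libraries such as scikit-learn provide their own encoders with different conventions.

```python
def label_encode(values, mapping=None):
    """Map each category to an integer label.

    If no mapping is given, labels 1, 2, 3, ... are assigned in order of
    first appearance (an arbitrary choice made for this sketch).
    """
    if mapping is None:
        mapping = {}
        for value in values:
            if value not in mapping:
                mapping[value] = len(mapping) + 1
    return [mapping[value] for value in values], mapping


colors = ["red", "yellow", "blue", "yellow", "red"]
codes, mapping = label_encode(colors)
print(codes)    # [1, 2, 3, 2, 1]
print(mapping)  # {'red': 1, 'yellow': 2, 'blue': 3}
```

Passing an explicit `mapping` lets you reuse the same labels on new data instead of deriving them again.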
Characteristics
- Advantage: It solves the problem of encoding categories, and the numeric labels can be chosen freely. This is also a drawback, because the values carry no meaning of their own; they only impose an order. For example, large, medium, and small can be coded as 1, 2, 3 or just as well as 3, 2, 1, so the magnitude is meaningless.
- Disadvantage: Poor interpretability. For example, if we encode [dog, cat, dog, mouse, cat] as [1, 2, 1, 3, 2], an odd artifact appears: the average of dog and mouse is cat. For this reason, label encoding has fairly limited use cases.
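The dog/cat/mouse artifact is easy to reproduce. This is a minimal sketch using the same labels as in the text; the variable names are illustrative only.

```python
# Labels taken from the example in the text: dog=1, cat=2, mouse=3.
mapping = {"dog": 1, "cat": 2, "mouse": 3}
animals = ["dog", "cat", "dog", "mouse", "cat"]

codes = [mapping[animal] for animal in animals]
print(codes)  # [1, 2, 1, 3, 2]

# Arithmetic on the labels yields a statement with no real-world meaning:
# the "average" of dog and mouse lands exactly on the label for cat.
midpoint = (mapping["dog"] + mapping["mouse"]) / 2
print(midpoint == mapping["cat"])  # True
```

Any model that averages or compares these numbers inherits the same artificial relationship.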
One-hot encoding vs. label encoding: how to distinguish and choose between them?
1. Feature data type
For categorical (nominal) data, one-hot encoding is recommended. Such categories are pure categories with no order and no logical relationship. For example, gender is divided into male and female; there is no ordering between them, and we cannot say that one ranks above the other. Likewise, China's provinces and cities have no logical ordering, so one-hot encoding fits them better. Note, however, that one of the resulting variables is usually dropped: if a sample is not male it must be female, so the female column repeats information already in the male column, and keeping one of the two is enough.
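A minimal sketch of one-hot encoding for the gender example, including the option to drop one redundant column. The function `one_hot_encode` and its `drop_first` flag are hypothetical names for this illustration; columns are ordered alphabetically here, which is an assumption of the sketch rather than a requirement.

```python
def one_hot_encode(values, drop_first=False):
    """Return (rows, column_names) with one 0/1 column per category.

    Categories are sorted alphabetically. With drop_first=True the first
    category gets no column of its own and is represented by all zeros.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories


genders = ["male", "female", "female", "male"]

full_rows, full_columns = one_hot_encode(genders)
print(full_columns)  # ['female', 'male']
print(full_rows)     # [[0, 1], [1, 0], [1, 0], [0, 1]]

# Dropping one column loses nothing: a 0 in 'male' already implies female.
reduced_rows, reduced_columns = one_hot_encode(genders, drop_first=True)
print(reduced_columns)  # ['male']
print(reduced_rows)     # [[1], [0], [0], [1]]
```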
For ordinal data, label encoding is recommended. Ordinal features are also categorical, but the categories have a ranking. For example, education level is divided into primary school, junior high school, high school, undergraduate, and graduate; there is a clear order among these categories, with graduate the highest and primary school the lowest. Label encoding is more appropriate here, because the numeric labels can be assigned to follow the original order rather than destroy it.
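For ordinal data the key point is that the labels are assigned from an explicit ranking, not from alphabetical order or order of appearance. The sketch below assumes the education ranking given in the text; `EDUCATION_ORDER` and `ordinal_encode` are names invented for this example.

```python
# Ranking from lowest to highest, as described in the text.
EDUCATION_ORDER = [
    "primary school",
    "junior high school",
    "high school",
    "undergraduate",
    "graduate",
]


def ordinal_encode(values, ordered_categories):
    """Encode each value as its 1-based position in the given ranking."""
    rank = {category: position
            for position, category in enumerate(ordered_categories, start=1)}
    return [rank[value] for value in values]


levels = ["high school", "graduate", "primary school", "undergraduate"]
education_codes = ordinal_encode(levels, EDUCATION_ORDER)
print(education_codes)  # [3, 5, 1, 4]
```

Because larger codes correspond to higher education levels, comparisons such as `code_a < code_b` remain meaningful after encoding.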
2. The model used
Models that are sensitive to numeric magnitude should use one-hot encoding. Typical examples are LR and SVM. Their loss functions are sensitive to feature values, and differences in magnitude between values are treated as meaningful. The numbers produced by label encoding have no meaningful magnitude, only an order, so one-hot encoding is used for these models.
Models that are insensitive to numeric magnitude are not well suited to one-hot encoding. These are generally tree models. With many categories, one-hot encoding produces a large number of feature variables. If the depth of the tree is limited, the model cannot keep splitting, and some of those variables may never be used. In this case, label encoding is worth considering.
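To make the "many feature variables" point concrete, the sketch below counts how many columns each encoding produces for a single high-cardinality feature. The data is synthetic (50 made-up city names) and `encoded_width` is a hypothetical helper, not a library function.

```python
def encoded_width(values, method):
    """Number of feature columns produced for one categorical feature."""
    n_categories = len(set(values))
    if method == "one-hot":
        return n_categories  # one 0/1 column per distinct category
    if method == "label":
        return 1             # a single integer column
    raise ValueError(f"unknown method: {method}")


# Synthetic feature: 1000 samples spread over 50 distinct categories.
cities = [f"city_{i % 50}" for i in range(1000)]

one_hot_width = encoded_width(cities, "one-hot")
label_width = encoded_width(cities, "label")
print(one_hot_width)  # 50
print(label_width)    # 1
```

A depth-limited tree can split on only a handful of those 50 binary columns, which is why the text suggests considering label encoding in this situation.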
These two considerations should be weighed together rather than separately. In other words, choose the encoding method according to both the data type and the model.
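One possible way to combine the two considerations is a small rule-of-thumb helper. This is only one reading of the guidance above, not a definitive rule: the function name, its parameters, and the `max_one_hot_categories` threshold of 10 are all assumptions introduced for illustration.

```python
def suggest_encoding(is_ordered, model_is_magnitude_sensitive,
                     n_categories, max_one_hot_categories=10):
    """Suggest 'one-hot' or 'label' from the data type and the model.

    The threshold max_one_hot_categories is an arbitrary illustrative
    value, not a recommendation from the article.
    """
    if model_is_magnitude_sensitive:
        # e.g. LR or SVM: label magnitudes would be treated as meaningful.
        return "one-hot"
    if is_ordered:
        # Ordinal feature with a magnitude-insensitive (tree) model.
        return "label"
    if n_categories > max_one_hot_categories:
        # Too many one-hot columns for a depth-limited tree to use.
        return "label"
    return "one-hot"


print(suggest_encoding(False, True, 34))   # one-hot (nominal, SVM-like)
print(suggest_encoding(True, False, 5))    # label   (ordinal, tree)
print(suggest_encoding(False, False, 34))  # label   (many categories, tree)
print(suggest_encoding(False, False, 2))   # one-hot (few categories, tree)
```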
————————————————
Copyright notice: This is an original article by CSDN blogger 「Plain paper and breeze」, licensed under the CC 4.0 BY-SA agreement. When reprinting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_45834085/article/details/102991983