What is one-hot encoding? Why use it, and when?
2022-07-28 18:47:00 【Xiaobai learns vision】

When you work on ML models, you will run into the term "one-hot encoding" everywhere. The sklearn documentation for its one-hot encoder says it will "encode categorical integer features using a one-hot aka one-of-K scheme". Not very clear, right? Or at least it wasn't for me. So let's look at what one-hot encoding actually is.
One-hot encoding converts categorical variables into a form that can be fed to ML algorithms so that they make better predictions.
Suppose the dataset is as follows:
╔═════════════╦═══════════════════╦═══════╗
║ CompanyName ║ Categorical value ║ Price ║
╠═════════════╬═══════════════════╬═══════╣
║ VW          ║ 1                 ║ 20000 ║
║ Acura       ║ 2                 ║ 10011 ║
║ Honda       ║ 3                 ║ 50000 ║
║ Honda       ║ 3                 ║ 10000 ║
╚═════════════╩═══════════════════╩═══════╝
The categorical value is an integer code assigned to each distinct entry in the dataset. For example, if the dataset contained another company, it would get the categorical value 4. As the number of unique entries grows, the categorical values grow with it.
The table above is only an illustration. In practice, the categorical values run from 0 to N-1, where N is the number of categories.
As you may already know, sklearn's LabelEncoder can assign these categorical values.
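As a minimal sketch of that step, assuming scikit-learn is installed, the snippet below runs `LabelEncoder` on the company column from the table. The `companies` list is just the illustrative data from this article. Note that `LabelEncoder` numbers the classes in sorted order starting from 0, so the codes differ from the 1, 2, 3 shown in the table.

```python
from sklearn.preprocessing import LabelEncoder

# Example column from the table above (illustrative data only).
companies = ["VW", "Acura", "Honda", "Honda"]

encoder = LabelEncoder()
codes = encoder.fit_transform(companies)

# Classes are stored in sorted order; each code is an index into this array.
print(encoder.classes_.tolist())  # ['Acura', 'Honda', 'VW']
print(codes.tolist())             # [2, 0, 1, 1]
```

Both rows for Honda receive the same code, which is the "categorical value" idea described above.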
Now back to one-hot encoding. Say we apply one-hot encoding as described in the sklearn documentation and then do a little cleanup. We end up with the following:
╔════╦═══════╦═══════╦═══════╗
║ VW ║ Acura ║ Honda ║ Price ║
╠════╬═══════╬═══════╬═══════╣
║ 1  ║ 0     ║ 0     ║ 20000 ║
║ 0  ║ 1     ║ 0     ║ 10011 ║
║ 0  ║ 0     ║ 1     ║ 50000 ║
║ 0  ║ 0     ║ 1     ║ 10000 ║
╚════╩═══════╩═══════╩═══════╝
0 indicates absence and 1 indicates presence. Before going any further, can you think of a reason why label encoding alone is not enough to train the model? Why do we need one-hot encoding?
The problem with label encoding is that it assumes the higher the categorical value, the better the category. "Wait, what!?"
Let me explain: this form of encoding implies an ordering based on the categorical values, here VW < Acura < Honda. Suppose your model internally computes an average. Then we get (1 + 3) / 2 = 2, which implies that the average of VW and Honda is Acura. That is definitely a recipe for disaster, and the model's predictions will contain many errors.
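The averaging problem can be reproduced in a few lines of plain Python. This is only an illustration of the arithmetic, not how any particular model works internally; the `label_codes` mapping is copied from the first table.

```python
# Illustrative label codes from the first table (not produced by any library).
label_codes = {"VW": 1, "Acura": 2, "Honda": 3}

# A model that averages the codes treats them as quantities.
mean_code = (label_codes["VW"] + label_codes["Honda"]) / 2

# Look up which company happens to own that code.
code_to_company = {code: name for name, code in label_codes.items()}
print(mean_code)                        # 2.0
print(code_to_company[int(mean_code)])  # Acura
```

The result is numerically valid but meaningless: nothing about VW and Honda makes their "midpoint" an Acura.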
That is why we use a one-hot encoder to "binarize" the categories and include the result as features to train the model.
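A minimal sketch of this step, assuming scikit-learn and NumPy are installed, uses `OneHotEncoder` on the same company column. The encoder orders its output columns by sorted category name (Acura, Honda, VW), so the column order differs from the table above, but each row still contains a single 1.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Example column from the first table, shaped as (n_samples, 1).
companies = np.array([["VW"], ["Acura"], ["Honda"], ["Honda"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it for display.
one_hot = encoder.fit_transform(companies).toarray()

# Columns follow the sorted category order: Acura, Honda, VW.
print(encoder.categories_[0].tolist())  # ['Acura', 'Honda', 'VW']
print(one_hot.astype(int).tolist())
# [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
```

Because every category gets its own 0/1 column, no category is numerically "larger" than another, and averaging the columns no longer produces a different company.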
Another example: suppose you have a "flower" feature that can take the values "daffodil", "lily" and "rose". One-hot encoding converts the "flower" feature into three features, "is_daffodil", "is_lily" and "is_rose", all of which are binary.
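The flower example can be written by hand to show what the transformation does. The helper `one_hot_flower` and the constant `FLOWER_VALUES` are hypothetical names introduced for this sketch; in practice a library encoder would normally do this work.

```python
# Values the "flower" feature can take, in a fixed column order.
FLOWER_VALUES = ["daffodil", "lily", "rose"]

def one_hot_flower(value):
    """Return a dict of binary is_<value> features for one flower."""
    if value not in FLOWER_VALUES:
        raise ValueError(f"unknown flower: {value!r}")
    return {f"is_{name}": int(name == value) for name in FLOWER_VALUES}

print(one_hot_flower("lily"))
# {'is_daffodil': 0, 'is_lily': 1, 'is_rose': 0}
```

Exactly one of the three features is 1 for any valid input, which is where the name "one-hot" comes from.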
