Understanding Cutting-Edge Embedding Methods
2022-07-31 06:14:00 【Young_win】
Content from https://mp.weixin.qq.com/s/j34nJGomvR23ZJiqIFMoAQ
Q: With massive sparse features, how do we find a good feature Embedding representation?
(1) For Item Embeddings in behavior-sequence models, what kind of Embedding representation works better?
(2) For recommendation models that do not use behavior sequences, the usual practice is to treat each feature's Embedding size as a hyperparameter and find a good value by manual testing. Is there a better way?
A1: Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling.
Res-embedding first proves theoretically that the generalization error of a neural network CTR model is closely related to how Items are distributed in the Embedding space. For Items associated with similar user interests, the smaller their envelope radius in the Embedding space, the smaller the generalization error. In other words, the more compactly Items of the same interest cluster in the Embedding space, the better the model generalizes. This conclusion is useful because it can be turned into a constraint on Item Embeddings during training, which improves the model. Building on it, Res-embedding proposes a more general method: the Embedding of each Item is the sum of two parts. One is the Central Embedding, the interest center shared by all Items that belong to the same interest. The other is the Residual Embedding, which is specific to the Item itself.
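A minimal sketch of the central-plus-residual decomposition described above, in plain Python. The item names, the item-to-interest assignment, the embedding size, and the initialization scales are all hypothetical and chosen only for illustration; the source does not say how interests are assigned or how the residual is kept small.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size

# Hypothetical assignment of items to interest groups.
item_to_interest = {"item_a": 0, "item_b": 0, "item_c": 1}

rng = random.Random(0)


def random_vector(scale):
    """Vector with coordinates drawn uniformly from [-scale, scale]."""
    return [rng.uniform(-scale, scale) for _ in range(EMBED_DIM)]


# One Central Embedding per interest, shared by every item of that interest.
central = {g: random_vector(1.0) for g in set(item_to_interest.values())}

# One Residual Embedding per item, kept small here by its initialization
# scale (an assumption made for this sketch).
residual = {item: random_vector(0.1) for item in item_to_interest}


def item_embedding(item):
    """Final embedding = shared central part + item-specific residual."""
    c = central[item_to_interest[item]]
    r = residual[item]
    return [ci + ri for ci, ri in zip(c, r)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


same_interest_gap = distance(item_embedding("item_a"), item_embedding("item_b"))
```

Because `item_a` and `item_b` share a central vector, the distance between them depends only on their residuals, so keeping residuals small keeps the interest cluster compact.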
A2: Neural Input Search for Large Scale Recommendation Models (NIS).
First, imagine a near-ideal Embedding allocation scheme. If one exists, it should look like this. High-frequency features get a longer Embedding so that they can encode and express more information. Low-frequency features get a shorter Embedding: because they appear rarely in the training data, a longer Embedding is more likely to overfit and hurt the model's generalization. Very low-frequency features offer almost nothing to learn and mostly contribute noise, so they can be given no Embedding of their own or made to share a common Embedding.
How large is the decision (search) space? In the illustrated scheme, each decision step has 5 choices and there are 4 decision steps, so the decision space contains 5^4 = 625 allocation schemes. ENAS evaluates each allocation scheme by its AUC on validation data and by the size of the Embedding space it consumes. Schemes with good validation metrics and low space consumption are encouraged, and the reinforcement learning Reward is designed around this idea. In this way, a reinforcement learning procedure can be set up to search for the optimal Embedding scheme.
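The size of the search space and the shape of the Reward can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the candidate sizes, the rows per step, the linear penalty, and its weight `lam` are hypothetical, and plain enumeration stands in for the reinforcement learning controller. No AUC is computed here; `val_auc` is whatever a validation run would report.

```python
import itertools

NUM_STEPS = 4                 # decision steps, as in the text
CHOICES = [0, 8, 16, 32, 64]  # hypothetical sizes; 0 means no dedicated Embedding
ROWS_PER_STEP = [1000, 1000, 1000, 1000]  # hypothetical number of features per step

# Every allocation scheme picks one of 5 choices at each of the 4 steps.
search_space = list(itertools.product(CHOICES, repeat=NUM_STEPS))


def embedding_cost(scheme):
    """Number of Embedding parameters a scheme consumes (rows * size per step)."""
    return sum(size * rows for size, rows in zip(scheme, ROWS_PER_STEP))


def reward(val_auc, cost, lam=1e-6):
    """Assumed Reward form: higher validation AUC helps, more space hurts."""
    return val_auc - lam * cost


num_schemes = len(search_space)
```

With these numbers `num_schemes` equals 5 ** 4, which is small enough to enumerate; the source relies on reinforcement learning because realistic search spaces are far larger.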