[Advertising system] Incremental training & feature admission / feature elimination
2022-07-05 10:57:00 【CC‘s World】
1. Incremental training
Training sets are often very large; tens of millions of records is common. Tens of millions of rows may not sound like much on its own, but with hundreds of features per row the dataset becomes enormous, and loading it as a dense numpy float array will exhaust memory. I ran into exactly this situation, which is what led me to look into incremental training.
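To make the memory concern concrete, here is a rough estimate. The row count, feature count, and 8-byte float64 width are illustrative assumptions, not measurements from a real dataset.

```python
# Back-of-the-envelope memory estimate for a dense float64 matrix.
# The row and column counts below are assumed values for illustration only.

def dense_matrix_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold an n_rows x n_cols dense matrix in memory."""
    return n_rows * n_cols * bytes_per_value


n_rows = 10_000_000  # assumed: ten million records
n_cols = 300         # assumed: a few hundred features
total_bytes = dense_matrix_bytes(n_rows, n_cols)
print(f"{total_bytes / 1e9:.1f} GB")  # prints "24.0 GB"
```

Under these assumptions the raw matrix alone needs about 24 GB, before any copies made during preprocessing or training.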
For very large datasets there are usually three options: 1. reduce the dimensionality of the data; 2. train incrementally, using streaming or stream-like processing; 3. use a bigger machine with more memory, or a Spark cluster.
Incremental training means essentially the same thing as online learning. The typical example of online learning is logistic regression optimized with SGD: the parameters are first initialized from existing data, and then each sample that arrives online updates them, so the model keeps improving as time passes. This avoids having to retrain and redeploy the model offline.
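The idea can be sketched in a few lines. This is a minimal illustration rather than a production trainer; the class name `OnlineLogisticRegression`, the learning rate, and the feature IDs are all hypothetical, and features are assumed to arrive as sparse `{feature_id: value}` dictionaries.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression over sparse features, updated one sample at a time."""

    def __init__(self, learning_rate: float = 0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # feature id -> weight, created lazily
        self.bias = 0.0

    def predict_proba(self, features: dict) -> float:
        z = self.bias + sum(self.weights.get(f, 0.0) * v for f, v in features.items())
        z = max(-35.0, min(35.0, z))  # clamp so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, features: dict, label: int) -> None:
        """One SGD step on the log loss for a single (features, label) sample."""
        error = self.predict_proba(features) - label
        for f, v in features.items():
            self.weights[f] = self.weights.get(f, 0.0) - self.learning_rate * error * v
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression()

# Step 1: initialize the parameters from existing (offline) data.
offline_samples = [({"ad:1": 1.0}, 1), ({"ad:2": 1.0}, 0)] * 50
for x, y in offline_samples:
    model.partial_fit(x, y)

# Step 2: keep updating as each new sample arrives online.
model.partial_fit({"ad:1": 1.0, "user:7": 1.0}, 1)
```

Because each update touches only the features present in the current sample, the full dataset never has to be held in memory at once.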
Incremental training serves two main purposes: one is to make it possible to use all of the data, the other is to make timely use of new data. It improves the timeliness of the model, increases the usable sample size, and saves cluster resources.
Recommendation scenarios usually have a large number of sparse parameters because they introduce many ID-type features. For example, the classic YouTube DNN model uses the videos a user has watched and the user's historical search tokens as its main embedding features. According to the discussion in the paper, the candidate videos and the search tokens in YouTube DNN each number in the millions. Adding cross features on top of this further aggravates the parameter explosion.
In recommendation scenarios, low-frequency ID-type features also bring a risk of overfitting. To address this, we designed a feature admission / elimination mechanism, which makes it easy to adjust the influence of low-frequency sparse parameters on the model according to the expressive capacity intended for that model.
2. Feature admission
In a business scenario, new samples are produced continuously, and new samples bring new features. Some of these features appear only rarely. Adding all of them to the model strains memory on the one hand, and on the other hand low-frequency features lead to overfitting. So feature admission mechanisms are put in place, such as probability-based filtering and Bloom filters.
The training framework sets an admission threshold for new features to keep low-frequency features from being admitted all the time. We provide two mechanisms to limit the admission of new features:
- Probability-based admission: each time a new feature is encountered, a probability is generated from a preset distribution and used to decide whether the feature is admitted;
- Counting Bloom Filter (CBF): the occurrences of each new feature are counted with a Counting Bloom Filter, and the feature is admitted once the count exceeds the threshold.
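The first mechanism can be sketched as follows. This is one possible reading, assuming the "preset distribution" is a Bernoulli trial with a fixed probability; the class name `ProbabilisticAdmission` and its parameters are hypothetical.

```python
import random


class ProbabilisticAdmission:
    """Admit each not-yet-admitted feature with a fixed probability per occurrence."""

    def __init__(self, admit_probability: float, seed=None):
        if not 0.0 <= admit_probability <= 1.0:
            raise ValueError("admit_probability must be in [0, 1]")
        self.admit_probability = admit_probability
        self.rng = random.Random(seed)  # seeded for reproducible experiments
        self.admitted = set()

    def should_admit(self, feature_id: str) -> bool:
        # Features that were admitted earlier stay admitted.
        if feature_id in self.admitted:
            return True
        # Otherwise run one Bernoulli trial for this occurrence.
        if self.rng.random() < self.admit_probability:
            self.admitted.add(feature_id)
            return True
        return False


gate = ProbabilisticAdmission(admit_probability=0.1, seed=0)
for _ in range(100):
    gate.should_admit("ad_id:42")  # a frequent feature gets many chances
gate.should_admit("ad_id:999")     # a rare feature gets only one
```

With probability p per occurrence, a feature needs about 1/p occurrences on average before it is admitted, so frequent features get in quickly while rare ones mostly stay out, and no per-feature counter has to be stored.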
The CBF principle in brief: suppose the capacity is 16 and two hash functions map a Feature ID to a slot index. When querying a feature's frequency, Feature1 is mapped by Hash Function1 and Hash Function2 to Slot 3 and Slot 6 respectively; both slots hold the value 1, so Feature1 is taken to have occurred 1 time. Feature2 is mapped by Hash Function1 and Hash Function2 to Slot 6 and Slot 15, whose values are 1 and 0 respectively, so Feature2 is taken to have occurred 0 times. In other words, the estimated count is the minimum value among all the slots the feature maps to.
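A minimal sketch of a Counting Bloom Filter used as an admission gate is shown below. The class and the helper `observe_and_check` are hypothetical names, and salting SHA-256 to obtain two hash functions is an illustrative assumption, not the hashing a real training framework would use.

```python
import hashlib


class CountingBloomFilter:
    """Approximate per-feature occurrence counter backed by a fixed slot array."""

    def __init__(self, capacity: int = 16, num_hashes: int = 2):
        self.capacity = capacity
        self.num_hashes = num_hashes
        self.slots = [0] * capacity

    def _indexes(self, feature_id: str) -> list:
        # Derive num_hashes slot indexes by salting SHA-256 with the hash number.
        indexes = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{feature_id}".encode("utf-8")).digest()
            indexes.append(int.from_bytes(digest[:8], "big") % self.capacity)
        return indexes

    def add(self, feature_id: str) -> None:
        # set() avoids incrementing one slot twice when both hashes collide.
        for idx in set(self._indexes(feature_id)):
            self.slots[idx] += 1

    def count(self, feature_id: str) -> int:
        # The estimate is the minimum over all slots the feature maps to.
        return min(self.slots[idx] for idx in self._indexes(feature_id))


def observe_and_check(cbf: CountingBloomFilter, feature_id: str, threshold: int) -> bool:
    """Record one occurrence; return True once the count exceeds the threshold."""
    cbf.add(feature_id)
    return cbf.count(feature_id) > threshold


# The worked example from the text, with slot values set by hand.
slots = [0] * 16
slots[3] = 1
slots[6] = 1
feature1_count = min(slots[i] for i in (3, 6))   # both slots hold 1 -> 1
feature2_count = min(slots[i] for i in (6, 15))  # slot 15 holds 0 -> 0
```

Hash collisions can only inflate a slot, so the minimum may overestimate a feature's count but never underestimates it. A rare feature may therefore occasionally be admitted early, in exchange for a fixed memory footprint.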
3. Feature elimination
Features become stale if they go without updates for a long time. To relieve memory pressure and improve the timeliness of the model, obsolete features need to be eliminated, which requires elimination rules.
For features that have already been admitted, there are three ways to judge whether a feature is in a low-frequency state:
- Update time. If a feature has not been updated for a long time, it is considered to be in a low-frequency state;
- L2 norm. If the L2 norm of a feature's parameters is too small, it is considered to be in a low-frequency state;
- Composite score of statistics. User-defined functions are supported: a composite score is computed from the feature's statistics (number of impressions, clicks, likes, comments, and so on), and if the score is below the threshold, the feature is considered to be in a low-frequency state.
Features judged to be in a low-frequency state are eliminated and masked; the next time such a feature reappears, it is treated as a new feature.
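The three criteria above can be sketched as a single check. Everything here is hypothetical: the `FeatureState` record, the function names, and the thresholds are illustrative, and combining the three criteria with a logical OR is an assumption, since a real framework might enable only one of them.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FeatureState:
    embedding: List[float]  # the feature's sparse parameters
    last_update: float      # timestamp of the most recent update, in seconds
    stats: Dict[str, float] = field(default_factory=dict)  # e.g. impressions, clicks


def l2_norm(vector: List[float]) -> float:
    return math.sqrt(sum(v * v for v in vector))


def is_low_frequency(
    state: FeatureState,
    now: float,
    max_idle_seconds: float,
    min_l2_norm: float,
    score_fn: Optional[Callable[[Dict[str, float]], float]] = None,
    min_score: float = 0.0,
) -> bool:
    if now - state.last_update > max_idle_seconds:  # criterion 1: update time
        return True
    if l2_norm(state.embedding) < min_l2_norm:      # criterion 2: L2 norm
        return True
    if score_fn is not None and score_fn(state.stats) < min_score:  # criterion 3
        return True
    return False


def evict(table: Dict[str, FeatureState], now: float, **criteria) -> List[str]:
    """Delete low-frequency features from the table and return their IDs."""
    stale = [fid for fid, st in table.items() if is_low_frequency(st, now, **criteria)]
    for fid in stale:
        del table[fid]  # a later reappearance goes through admission again
    return stale


table = {
    "fresh": FeatureState([0.5, 0.5], last_update=1000.0, stats={"clicks": 10}),
    "idle": FeatureState([0.5, 0.5], last_update=0.0, stats={"clicks": 10}),
    "tiny": FeatureState([1e-6, 0.0], last_update=1000.0, stats={"clicks": 10}),
    "unpopular": FeatureState([0.5, 0.5], last_update=1000.0, stats={"clicks": 0}),
}
evicted = evict(
    table,
    now=1000.0,
    max_idle_seconds=500.0,
    min_l2_norm=1e-3,
    score_fn=lambda stats: stats.get("clicks", 0.0),  # user-defined score
    min_score=1.0,
)
```

In this example only `fresh` survives: `idle` fails the update-time check, `tiny` fails the L2 norm check, and `unpopular` fails the user-defined score.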
After enabling feature admission & elimination, a recommendation model can generally be reduced to about a quarter of its size without them, while online prediction AUC remains flat to the thousandths place.