[advertising system] incremental training & feature access / feature elimination
2022-07-05 10:57:00 【CC‘s World】
One 、 Incremental training
Training sets are sometimes very large; tens of millions of records are common. Tens of millions of rows may not sound like much when you only count the records, but with hundreds of features the dataset becomes huge, and if it is saved as numpy.float type it will certainly blow up the memory. I ran into exactly this situation and started to consider training the model incrementally.
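To make the memory problem concrete, here is a rough back-of-the-envelope estimate. The row count and feature count are assumed values chosen for illustration only (they are not figures from a real dataset), and the helper name is hypothetical. Under these assumptions a dense float64 matrix already needs about 22 GiB.

```python
def dense_matrix_bytes(n_rows: int, n_features: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold a dense n_rows x n_features matrix in memory."""
    return n_rows * n_features * bytes_per_value


GIB = 1024 ** 3

# Assumed sizes for illustration: 10 million rows, 300 features.
n_rows, n_features = 10_000_000, 300

as_float64 = dense_matrix_bytes(n_rows, n_features, bytes_per_value=8)
as_float32 = dense_matrix_bytes(n_rows, n_features, bytes_per_value=4)

print(f"float64: {as_float64 / GIB:.1f} GiB")
print(f"float32: {as_float32 / GIB:.1f} GiB")
```

Halving the precision halves the footprint, but it does not change the fact that the whole matrix has to fit in memory at once, which is what incremental training avoids.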
For very large datasets there are usually several options: 1. reduce the dimensionality of the data; 2. train incrementally, using streaming or stream-like processing; 3. use a big machine with a lot of memory, or a Spark cluster.
Incremental training means essentially the same thing as online learning. The typical representative of online learning is logistic regression optimized with SGD: the parameters are first initialized with existing data, and are then updated once for every new sample that arrives online, so the model keeps getting better as time passes. This avoids the problem of having to update the model offline.
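The idea of initializing on existing data and then updating once per incoming sample can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of any particular framework; the class name `OnlineLogisticRegression`, the learning rate, and the synthetic data generator `make_sample` are all assumptions made for the example.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression trained one sample at a time with SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y):
        """One SGD step on a single sample (x, y) with y in {0, 1}."""
        err = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err


def make_sample(rng):
    """Synthetic sample (hypothetical data): label is 1 when x0 + x1 > 0."""
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    return x, 1 if x[0] + x[1] > 0 else 0


rng = random.Random(0)
model = OnlineLogisticRegression(n_features=2)

# Initialize on a first batch of existing data ...
for _ in range(200):
    model.partial_fit(*make_sample(rng))
# ... then keep updating as each new sample "arrives online".
for _ in range(2000):
    model.partial_fit(*make_sample(rng))

held_out = [make_sample(rng) for _ in range(500)]
accuracy = sum((model.predict_proba(x) > 0.5) == bool(y) for x, y in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because `partial_fit` only ever touches one sample, the full dataset never has to be held in memory, and new data can be folded in without retraining from scratch.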
Incremental training serves two main purposes: one is to find a way to use all of the data, the other is to make timely use of new data. It improves the timeliness of the model, increases the usable sample size, and saves cluster resources.
Recommendation scenarios usually have a large number of sparse parameters because they introduce many ID-type features. In the classic YouTube DNN model, for example, the videos a user has watched and the tokens from the user's search history are the main embedding features. According to the discussion in the paper, the candidate videos and the search tokens in YouTube DNN are both on the order of millions. Adding cross features on top of this further aggravates the parameter explosion.
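A quick calculation shows why cross features make the problem so much worse. The vocabulary sizes below follow the "millions" order of magnitude mentioned above; the embedding dimension of 64 is an assumed value for illustration, and the cross-feature count is the worst case in which every pair of IDs actually occurs.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    """Number of parameters in an embedding table of shape (vocab_size, dim)."""
    return vocab_size * dim


videos = 1_000_000         # order of magnitude from the text
search_tokens = 1_000_000  # order of magnitude from the text
dim = 64                   # assumed embedding dimension

separate = embedding_params(videos, dim) + embedding_params(search_tokens, dim)

# Worst case for a video x token cross feature: every pair gets its own row.
cross_vocab = videos * search_tokens
crossed = embedding_params(cross_vocab, dim)

print(f"separate tables: {separate:,} parameters")
print(f"cross feature (upper bound): {crossed:,} parameters")
```

In practice only a fraction of the pairs ever appear, but most of those that do appear are rare, which is exactly the low-frequency sparse-parameter problem the access and elimination mechanisms target.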
In recommendation scenarios, low-frequency ID-type features also bring a risk of overfitting to the system. To address this, we designed feature access / exit mechanisms, which make it easy to adjust the influence of low-frequency sparse parameters on the model according to the expressive capacity preset for the specific model.
Two 、 Feature access
In a business scenario, new samples are produced all the time, and new samples bring new features. Some features appear only rarely. Adding all of them to the model is a challenge for memory on the one hand, and on the other hand low-frequency features cause overfitting. Feature access (admission) mechanisms are therefore put in place, including probability-based filtering, Bloom filters, and so on.
The training framework sets an access "threshold" for new features to keep low-frequency features from being admitted all the time. We provide two mechanisms to limit the access of new features:
- Probability-based access: every time a new feature is encountered, a probability is generated from a preset distribution to control whether the feature is admitted;
- Counting Bloom Filter (CBF): a Counting Bloom Filter counts the occurrences of new features, and a feature is admitted once its count exceeds the threshold.
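The probability-based mechanism can be sketched as below. The text does not specify which distribution is used, so this example assumes the simplest case: each occurrence of a not-yet-admitted feature is admitted with a fixed probability `admit_prob`. The class name and parameters are hypothetical.

```python
import random


class ProbabilisticAdmission:
    """Admit each unseen feature with a fixed probability per occurrence.

    Assumption: a constant Bernoulli probability stands in for the
    "preset distribution" mentioned in the text.
    """

    def __init__(self, admit_prob, seed=None):
        if not 0.0 <= admit_prob <= 1.0:
            raise ValueError("admit_prob must be in [0, 1]")
        self.admit_prob = admit_prob
        self.admitted = set()
        self._rng = random.Random(seed)

    def observe(self, feature_id):
        """Record one occurrence; return True if the feature is admitted."""
        if feature_id in self.admitted:
            return True
        if self._rng.random() < self.admit_prob:
            self.admitted.add(feature_id)
            return True
        return False


gate = ProbabilisticAdmission(admit_prob=0.05, seed=42)
for _ in range(200):
    gate.observe("frequent_feature")  # seen many times
gate.observe("rare_feature")          # seen once
print(sorted(gate.admitted))
```

With this scheme a feature seen n times is admitted with probability 1 - (1 - p)^n, so frequent features almost always get in while one-off features rarely do, and no per-feature counter has to be stored.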
The picture above briefly illustrates the principle of the CBF. Suppose the capacity is 16 and two hash functions map a Feature ID to a slot index. When querying feature frequencies, Feature1 is mapped by Hash Function1 and Hash Function2 to Slot 3 and Slot 6 respectively; both slots hold the value 1, so Feature1 is considered to have appeared 1 time. Feature2 is mapped by Hash Function1 and Hash Function2 to Slot 6 and Slot 15 respectively; the two slots hold 1 and 0, so Feature2 is considered to have appeared 0 times. In other words, the count of a feature is the minimum value over all the slots it maps to.
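A minimal Counting Bloom Filter along these lines might look like the sketch below. It is an illustration rather than the framework's actual implementation: the class name, the use of salted `hashlib.blake2b` digests as the hash functions, and the `should_admit` helper are all assumptions.

```python
import hashlib


class CountingBloomFilter:
    """Approximate occurrence counter using a fixed array of slots."""

    def __init__(self, capacity=16, num_hashes=2):
        self.capacity = capacity
        self.num_hashes = num_hashes
        self.slots = [0] * capacity

    def _indexes(self, feature_id):
        # One salted hash per hash function, each mapped to a slot index.
        for salt in range(self.num_hashes):
            data = f"{salt}:{feature_id}".encode("utf-8")
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.capacity

    def add(self, feature_id):
        # set() avoids double counting when two hashes hit the same slot.
        for idx in set(self._indexes(feature_id)):
            self.slots[idx] += 1

    def count(self, feature_id):
        # Estimated frequency: the minimum over all mapped slots.
        return min(self.slots[idx] for idx in self._indexes(feature_id))


def should_admit(cbf, feature_id, threshold):
    """Record one occurrence and admit once the count exceeds the threshold."""
    cbf.add(feature_id)
    return cbf.count(feature_id) > threshold


cbf = CountingBloomFilter(capacity=16, num_hashes=2)
decisions = [should_admit(cbf, "feature_1", threshold=2) for _ in range(4)]
print(decisions)
```

Taking the minimum means the estimate can be too high when slots are shared with other features, but never too low, so a feature that truly crossed the threshold is never rejected. The price is that with a small capacity some low-frequency features slip through because of collisions.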
Three 、 Feature elimination
Some features become stale if they are not updated for a long time. To relieve memory pressure and improve the timeliness of the model, obsolete features need to be eliminated, which requires elimination rules.
For features that have already been admitted, there are three ways to judge whether a feature is in a low-frequency state:
- Update time. If a feature has not been updated for a long time, it is considered to be in a low-frequency state;
- L2 norm. If the L2 norm computed for a feature is too small, it is considered to be in a low-frequency state;
- Composite score of statistics. User-defined functions are supported: a composite score is computed from the feature's statistics (number of exposures, clicks, likes, comments, etc.), and if the score is below the threshold the feature is considered to be in a low-frequency state.
Features judged to be in a low-frequency state are eliminated and masked; the next time such a feature reappears, it is treated as a new feature.
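The three criteria can be sketched as a single check over a feature table. Everything here is illustrative: the `FeatureState` record, the step-based notion of "update time", the thresholds, and the scoring function are assumptions, and combining the three criteria with a logical OR is a choice made for this example (the text does not say how they are combined). The L2 norm is taken over the feature's embedding vector, which is also an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FeatureState:
    embedding: List[float]
    last_update_step: int
    stats: Dict[str, float] = field(default_factory=dict)


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def is_low_frequency(state, current_step, max_idle_steps, min_l2_norm,
                     score_fn: Optional[Callable] = None, min_score=0.0):
    """Return True if any of the three criteria marks the feature as stale."""
    if current_step - state.last_update_step > max_idle_steps:
        return True  # not updated for a long time
    if l2_norm(state.embedding) < min_l2_norm:
        return True  # embedding has shrunk to almost nothing
    if score_fn is not None and score_fn(state.stats) < min_score:
        return True  # user-defined composite score is too low
    return False


def evict(table, current_step, max_idle_steps, min_l2_norm,
          score_fn=None, min_score=0.0):
    """Delete stale features in place and return their IDs."""
    stale = [fid for fid, s in table.items()
             if is_low_frequency(s, current_step, max_idle_steps,
                                 min_l2_norm, score_fn, min_score)]
    for fid in stale:
        del table[fid]  # a later occurrence starts over as a new feature
    return stale


def score(stats):
    """Hypothetical composite score: clicks weigh more than exposures."""
    return 5 * stats.get("clicks", 0) + stats.get("exposures", 0)


table = {
    "fresh": FeatureState([0.5, 0.5], 95, {"exposures": 100, "clicks": 10}),
    "idle": FeatureState([0.5, 0.5], 10, {"exposures": 100, "clicks": 10}),
    "tiny": FeatureState([1e-4, 1e-4], 99, {"exposures": 100, "clicks": 10}),
    "low_score": FeatureState([0.5, 0.5], 99, {"exposures": 1, "clicks": 0}),
}
evicted = evict(table, current_step=100, max_idle_steps=50,
                min_l2_norm=1e-3, score_fn=score, min_score=10)
print(sorted(evicted), sorted(table))
```

Because an evicted ID is simply removed from the table, it has to pass the access check again if it reappears, which matches the behavior described above.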
With feature access & elimination in use, the recommendation model can generally be shrunk to about a quarter of its size without them, while the online prediction AUC stays flat at the thousandths place.