10. DCN introduction
2022-06-13 12:11:00 【nsq1101】
Preface
Conventional CTR prediction models require a lot of manual feature engineering, which is time-consuming. After DNNs were introduced, the strong learning ability of neural networks made it possible to learn feature combinations automatically to some extent. However, the drawback of a DNN is that the implicitly learned feature combinations are hard to interpret, and the learning is inefficient (not all feature combinations are useful).
Early on, FM modeled combined features with the inner product of latent vectors; FFM extended this by introducing the concept of a field and using a different latent vector for each field. However, both only model low-order feature combinations.
The features learned by a DNN, on the other hand, are highly nonlinear, high-order composite features whose meaning is hard to interpret.
1、DCN Introduction
DCN stands for Deep & Cross Network, a model for ad click prediction proposed by Google and Stanford University in 2017. DCN learns bounded-degree combined features very efficiently, requires no manual feature engineering, and introduces only minimal additional complexity.
2、DCN Model structure
The DCN architecture begins with an Embedding and Stacking layer, followed by a Cross Network and a Deep Network in parallel, and ends with a Combination Layer that merges the outputs of the Cross Network and the Deep Network into the final prediction.
2.1 Embedding and Stacking Layer
Why Embed?
- In web-scale recommendation systems and CTR estimation, most input features are categorical. The usual treatment is one-hot encoding, but after one-hot encoding the input becomes very high-dimensional and very sparse.
- Embedding is therefore used to greatly reduce the input dimension, converting these binary features into dense real-valued vectors.
- The embedding operation is simply a multiplication of a matrix with the one-hot input, which is equivalent to a table lookup. The embedding matrix, like the other parameters in the network, is learned jointly with the rest of the network (see the sketch below).
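A minimal NumPy sketch of this equivalence (the sizes and names here are made up for illustration):

import numpy as np

vocab_size, emb_dim = 5, 3                        # hypothetical vocabulary size and embedding dimension
W_emb = np.random.randn(vocab_size, emb_dim)      # embedding matrix, learned with the network

category_id = 2                                   # a label-encoded categorical value
one_hot = np.eye(vocab_size)[category_id]         # its one-hot encoding, shape (5,)

dense_by_matmul = one_hot @ W_emb                 # multiplying by the embedding matrix ...
dense_by_lookup = W_emb[category_id]              # ... selects exactly one row, i.e. a lookup
assert np.allclose(dense_by_matmul, dense_by_lookup)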
Why Stack?
After the categorical features are handled, the continuous features still have to be dealt with. We normalize the continuous features and stack (concatenate) them with the embedding vectors to obtain the input vector x_0 = [x_embed,1^T, ..., x_embed,k^T, x_dense^T], as in the sketch below:
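Continuing the toy NumPy example above, stacking is just a concatenation (the numbers are made up):

emb_c1 = np.array([0.1, -0.4, 0.3])    # embedding of categorical feature C1
emb_c2 = np.array([0.7, 0.2, -0.1])    # embedding of categorical feature C2
dense = np.array([0.25, 0.9])          # continuous features I1, I2 after min-max normalization

x0 = np.concatenate([emb_c1, emb_c2, dense])   # the stacked input vector, here with d = 8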
2.2 Cross Network
The Cross Network is the core of the paper. It is designed to learn combinatorial features efficiently, and the key is how to do the feature crossing efficiently. Formally, each cross layer computes
x_{l+1} = x_0 x_l^T w_l + b_l + x_l
where x_l and x_{l+1} are the outputs of the l-th and (l+1)-th cross layers, and w_l and b_l are the parameters connecting the two layers. Note that all variables in this formula are column vectors; w_l is also a column vector, not a matrix.
- How to understand it?
It is not hard: x_{l+1} = f(x_l, w_l, b_l) + x_l. The output of each layer is the output of the previous layer plus a feature-crossing term f, so f fits the residual between the outputs of the two layers, much like a residual connection. (A numeric sketch of a cross layer is given at the end of this subsection.)
- High-degree Interaction Across Features:
The special structure of the Cross Network makes the degree of the cross features grow with layer depth: relative to the input x_0, an l-layer cross network produces cross features of degree l+1.
- Complexity analysis:
Suppose there are L_c cross layers and the input x_0 has dimension d. Then the total number of parameters of the cross network is d × L_c × 2, because in every layer both w and b are d-dimensional.
This complexity is linear in the input dimension d, so compared with the deep network the extra complexity introduced by the cross network is negligible, which keeps the overall complexity of DCN at the same level as a plain DNN. The paper notes that the Cross Network can learn combination features efficiently because x_0 x_l^T has rank 1, so all the cross terms can be obtained without computing or storing the whole matrix.
However, precisely because the cross network has relatively few parameters, its expressive power is limited. To be able to learn highly nonlinear combinatorial features, DCN introduces a Deep Network in parallel.
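As promised above, a minimal NumPy sketch of one cross layer and of stacking several of them (dimensions and initialization are arbitrary, purely for illustration):

import numpy as np

def cross_layer(x0, xl, w, b):
    # x_{l+1} = x0 * (xl^T w) + b + xl, where all vectors are d-dimensional
    return x0 * (xl @ w) + b + xl

d, n_layers = 8, 3
rng = np.random.default_rng(0)
x0 = rng.normal(size=d)

x = x0
for _ in range(n_layers):
    w = rng.normal(size=d)     # one d-dimensional weight vector per layer
    b = rng.normal(size=d)     # one d-dimensional bias vector per layer
    x = cross_layer(x0, x, w, b)

print(x.shape)   # (8,) -- the output keeps the input dimension; each layer adds only 2*d parameters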
2.3 Deep Network
There is nothing special about this part: it is an ordinary feed-forward, fully connected neural network, and we can estimate its complexity by counting parameters. Assume the input x_0 has dimension d, there are L_d deep layers, and each layer has m neurons. Then the total number of parameters, i.e. the complexity, is d × m + m + (L_d − 1) × (m² + m).
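A quick check of that count with made-up numbers:

d, m, L_d = 8, 32, 3     # input dimension, neurons per layer, number of deep layers (toy values)
n_params = d * m + m + (L_d - 1) * (m * m + m)
print(n_params)          # 2400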
2.4 Combination Layer
The Combination Layer concatenates the outputs of the Cross Network and the Deep Network, applies a weighted sum to obtain the logit, and then passes it through a sigmoid function to get the final prediction probability. Formally:
p = sigmoid([x_{L1}^T, h_{L2}^T] · w_logits)
where p is the final prediction probability, x_{L1} is the d-dimensional final output of the Cross Network, h_{L2} is the m-dimensional final output of the Deep Network, and w_logits is the weight vector of the Combination Layer.
The loss function is the log loss with a regularization term:
loss = -(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ] + λ Σ_l ||w_l||²
In addition, the Cross Network and the Deep Network are trained jointly, so each network is aware of the existence of the other during training.
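A minimal sketch of the combination layer and of this loss for a single example, with made-up sizes:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x_L1 = rng.normal(size=8)      # final output of the cross network (d-dimensional)
h_L2 = rng.normal(size=32)     # final output of the deep network (m-dimensional)

w_logits = rng.normal(size=8 + 32)
p = sigmoid(np.concatenate([x_L1, h_L2]) @ w_logits)   # predicted click probability

y, lam = 1.0, 1e-4             # true label and regularization strength (hypothetical)
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p)) + lam * np.sum(w_logits ** 2)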
3、Example
The code below is based on the open-source DeepCTR library; the corresponding API documentation and examples can be found here:
https://deepctr-doc.readthedocs.io/en/latest/Examples.html
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from deepctr.models.dcn import DCN
from deepctr.feature_column import SparseFeat, DenseFeat, get_feature_names
# 1. Load the Criteo sample data and fill in missing values
data = pd.read_csv('./criteo_sample.txt')
sparse_features = ['C' + str(i) for i in range(1, 27)]   # 26 categorical features
dense_features = ['I' + str(i) for i in range(1, 14)]    # 13 continuous features
data[sparse_features] = data[sparse_features].fillna('-1')
data[dense_features] = data[dense_features].fillna(0)
target = ['label']
# 2. Label-encode the categorical features and min-max scale the continuous ones
for feat in sparse_features:
    lbe = LabelEncoder()
    data[feat] = lbe.fit_transform(data[feat])
mms = MinMaxScaler(feature_range=(0, 1))
data[dense_features] = mms.fit_transform(data[dense_features])
# 3. Describe the features for DeepCTR: an embedding for each sparse feature, raw values for dense ones
sparse_feature_columns = [SparseFeat(feat, vocabulary_size=data[feat].nunique(), embedding_dim=4)
                          for feat in sparse_features]
# Alternatively, use feature hashing; vocabulary_size is then usually set larger to avoid too many hash collisions
# sparse_feature_columns = [SparseFeat(feat, vocabulary_size=1e6, embedding_dim=4, use_hash=True)
#                           for feat in sparse_features]  # the dimension can be set according to the data
dense_feature_columns = [DenseFeat(feat, 1)
                         for feat in dense_features]
# 4. Build the model input: every feature name maps to its column of values
dnn_feature_columns = sparse_feature_columns + dense_feature_columns
linear_feature_columns = sparse_feature_columns + dense_feature_columns
feature_names = get_feature_names(linear_feature_columns + dnn_feature_columns)
train, test = train_test_split(data, test_size=0.2)
train_model_input = {name: train[name].values for name in feature_names}
test_model_input = {name: test[name].values for name in feature_names}
# 5. Define, train and predict with the DCN model
model = DCN(linear_feature_columns, dnn_feature_columns, task='binary')
model.compile("adam", "binary_crossentropy",
              metrics=['binary_crossentropy'])
history = model.fit(train_model_input, train[target].values,
                    batch_size=256, epochs=10, verbose=2, validation_split=0.2)
pred_ans = model.predict(test_model_input, batch_size=256)
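A common follow-up, as in the DeepCTR classification examples, is to score the predictions on the held-out split:

from sklearn.metrics import log_loss, roc_auc_score

print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))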
4、Summary
The main characteristics of DCN are as follows:
- It uses a cross network that applies explicit feature crossing at every layer, learning bounded-degree combination features efficiently with no need for manual feature engineering.
- The network structure is simple and efficient; the polynomial degree of the features it can express is determined by the layer depth.
- Compared with a plain DNN, DCN achieves a lower logloss with nearly an order of magnitude fewer parameters.