Recommendation Model Reproduction (II): Ranking Models DeepFM and DIN
2022-06-29 12:39:00 【GoAI】
1. DeepFM Model
1.1 Background of the DeepFM model
- DNN has too many parameters: when one-hot features are mapped to dense vectors, the number of network parameters becomes very large.
- FNN and PNN model few cross features: connecting a pre-trained FM module to a DNN forms the FNN model; adding a product layer between the embedding layer and hidden layer 1, so that the product layer replaces the FM pre-training layer, forms the PNN model.
- PNN uses products for feature crossing because, in CTR scenarios, feature interactions are better described as an "and" relationship, while addition expresses an "or" relationship; the product form therefore tends to work better.
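As a toy illustration of this "and" vs. "or" distinction, consider crossing two embeddings (a minimal sketch; the tensors and names here are hypothetical, not taken from any FNN/PNN implementation):

import torch

e_user = torch.randn(4)  # hypothetical user-side embedding
e_item = torch.randn(4)  # hypothetical item-side embedding
cross_add = e_user + e_item             # additive ("or"-style) combination
cross_prod = torch.dot(e_user, e_item)  # inner-product ("and"-style) crossing, as in PNN's product layer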
1.2 DeepFM Model
DeepFM builds on FNN and PNN by combining the FM layer and the Deep layer in parallel, improving the efficiency of model computation.

1.2.1 FM part
Main function: effectively learn the weights of cross features.
Model formula:

$$\hat{y}(x) = \mathrm{sigmoid}\Big(w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j\Big)$$

The FM layer is composed of a first-order term and a second-order interaction term; their sum is passed through a sigmoid to produce the prediction.
Advantages of the FM layer:
- The inner product of latent vectors serves as the weight of each cross feature, so cross-feature weights can be trained effectively even when the data is very sparse (the two features do not need to be non-zero in the same sample).
- Computation is very efficient (see the sketch after this list).
- Although the overall feature space in recommendation scenarios is very large, FM training and prediction only need to process the non-zero features in each sample, which also speeds up offline training and online prediction.
- Because the model is computationally efficient and can automatically mine long-tail, low-frequency items in sparse scenarios, it is applicable to all three stages: recall, pre-ranking, and ranking. When applied at different stages, the sample construction, fitting objective, and online serving all differ.
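The efficiency claims in this list stem from FM's reformulation of the pairwise term:

$$\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2}\sum_{k}\Big[\Big(\sum_{i} v_{ik} x_i\Big)^2 - \sum_{i} v_{ik}^2 x_i^2\Big]$$

which is linear in the number of non-zero features. Below is a minimal sketch of this second-order computation in PyTorch; it illustrates the square-of-sum trick and is not necessarily what torch_rechub's FM layer does internally:

import torch

class FMInteraction(torch.nn.Module):
    """Minimal sketch of FM's second-order term via the square-of-sum trick."""
    def forward(self, x):
        # x: (batch_size, num_fields, embed_dim) field embeddings
        square_of_sum = x.sum(dim=1) ** 2       # (batch, embed_dim)
        sum_of_square = (x ** 2).sum(dim=1)     # (batch, embed_dim)
        pairwise = 0.5 * (square_of_sum - sum_of_square)
        return pairwise.sum(dim=1, keepdim=True)  # (batch, 1) interaction logit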
1.2.2 Deep part
- Dense embeddings are fed into the hidden layers through full connections, which solves the parameter-explosion problem a DNN faces on raw one-hot inputs.
- The output of the embedding layer is the concatenation of the embedding vectors of all ID features, which is then fed into the DNN.
1.3 DeepFM Code
from torch_rechub.basic.layers import FM, MLP, LR, EmbeddingLayer
from tqdm import tqdm
import torch

class DeepFM(torch.nn.Module):
    def __init__(self, deep_features, fm_features, mlp_params):
        """
        The Deep part and the FM part handle two different feature sets,
        deep_features and fm_features, respectively.
        mlp_params holds the parameters of the MLP (multi-layer perceptron).
        """
        super().__init__()
        self.deep_features = deep_features
        self.fm_features = fm_features
        self.deep_dims = sum([fea.embed_dim for fea in deep_features])
        self.fm_dims = sum([fea.embed_dim for fea in fm_features])
        # LR models first-order feature interactions
        self.linear = LR(self.fm_dims)
        # FM models second-order feature interactions
        self.fm = FM(reduce_sum=True)
        # Embedded representation of the features
        self.embedding = EmbeddingLayer(deep_features + fm_features)
        # Set up the MLP (multi-layer perceptron)
        self.mlp = MLP(self.deep_dims, **mlp_params)

    def forward(self, x):
        # Dense embeddings
        input_deep = self.embedding(x, self.deep_features, squeeze_dim=True)
        input_fm = self.embedding(x, self.fm_features, squeeze_dim=False)
        y_linear = self.linear(input_fm.flatten(start_dim=1))
        y_fm = self.fm(input_fm)
        y_deep = self.mlp(input_deep)
        # The final prediction combines the first-order interactions,
        # the second-order interactions, and the deep model
        y = y_linear + y_fm + y_deep
        # Use sigmoid to squash the predicted score into (0, 1)
        return torch.sigmoid(y.squeeze(1))
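A hypothetical construction sketch follows, assuming torch_rechub's SparseFeature(name, vocab_size, embed_dim) feature class; the feature names, sizes, and MLP settings here are illustrative only and should be checked against your installed version:

from torch_rechub.basic.features import SparseFeature

# Hypothetical feature definitions
fm_features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
               SparseFeature("item_id", vocab_size=5000, embed_dim=16)]
deep_features = fm_features  # the two parts may share the same input fields
model = DeepFM(deep_features=deep_features,
               fm_features=fm_features,
               mlp_params={"dims": [256, 128], "dropout": 0.2, "activation": "relu"})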
2. DIN (Deep Interest Network)
2.1 Background of DIN
- Earlier models pay insufficient attention to historical behavior information.
- One-sided click prediction makes it difficult to capture a user's broad interests.
- The amount of historical behavior data is large.
2.2 DIN Model

2.2.1 Base model

Activation Unit:
Role: introduces an attention mechanism between the current candidate ad and the user's historical behaviors, so that historical behaviors more relevant to the current item contribute more to the predicted click probability.
Example: if the candidate ad is a coat, the user's past clicks on other clothing items should receive higher attention weights than, say, past clicks on phones.

- Embedding Layer: transforms the high-dimensional sparse input into low-dimensional dense vectors.
- Pooling Layer and Concat Layer: turns the embeddings of the user's variable-length historical behaviors into a fixed-length vector and concatenates it as the input to the MLP.
- MLP: fully connected layers that learn the various interactions among features.
- Loss: computed with the following (binary cross-entropy) formula:

$$L = -\frac{1}{N}\sum_{(x,y)\in S}\big(y\log p(x) + (1-y)\log(1-p(x))\big)$$
2.3 DIN Code
# Implementation of the attention (Activation Unit) part
import torch
import torch.nn as nn
from torch_rechub.basic.layers import MLP, EmbeddingLayer

class ActivationUnit(torch.nn.Module):
    def __init__(self, emb_dim, dims=[36], activation="dice", use_softmax=False):
        super(ActivationUnit, self).__init__()
        self.emb_dim = emb_dim
        self.use_softmax = use_softmax
        # MLP with a Dice(36) hidden layer
        self.attention = MLP(4 * self.emb_dim, dims=dims, activation=activation)

    def forward(self, history, target):
        seq_length = history.size(1)
        # Broadcast the target item to every position of the behavior sequence
        target = target.unsqueeze(1).expand(-1, seq_length, -1)
        # Concat: [target, history, target - history, target * history]
        att_input = torch.cat([target, history, target - history, target * history], dim=-1)
        # Dice(36) -> Linear(1): one attention score per historical behavior
        att_weight = self.attention(att_input.view(-1, 4 * self.emb_dim))
        att_weight = att_weight.view(-1, seq_length)
        if self.use_softmax:
            att_weight = att_weight.softmax(dim=-1)
        # Weighted sum pooling over the sequence -> (batch_size, emb_dim)
        output = (att_weight.unsqueeze(-1) * history).sum(dim=1)
        return output
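The activation="dice" above refers to DIN's Dice activation. torch_rechub ships its own implementation; the following is only a minimal sketch of the idea from the DIN paper, a data-adaptive PReLU whose rectify point follows the batch statistics:

import torch

class Dice(torch.nn.Module):
    """Sketch of Dice: p(s) = sigmoid((s - E[s]) / sqrt(Var[s] + eps))."""
    def __init__(self, dim):
        super().__init__()
        self.bn = torch.nn.BatchNorm1d(dim, affine=False)  # estimates E[s] and Var[s]
        self.alpha = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, s):
        p = torch.sigmoid(self.bn(s))
        # Smoothly interpolate between the identity and the alpha-scaled branch
        return p * s + (1 - p) * self.alpha * s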
# Implementation of DIN
class DIN(torch.nn.Module):
    def __init__(self, features, history_features, target_features, mlp_params, attention_mlp_params):
        super().__init__()
        self.features = features
        self.history_features = history_features
        self.target_features = target_features
        # Number of historical-behavior feature fields
        self.num_history_features = len(history_features)
        # Total embedding dimension across all feature fields
        self.all_dims = sum([fea.embed_dim for fea in features + history_features + target_features])
        # Build the embedding layer
        self.embedding = EmbeddingLayer(features + history_features + target_features)
        # Build one attention layer per historical-behavior feature
        self.attention_layers = nn.ModuleList(
            [ActivationUnit(fea.embed_dim, **attention_mlp_params) for fea in self.history_features])
        self.mlp = MLP(self.all_dims, activation="dice", **mlp_params)

    def forward(self, x):
        embed_x_features = self.embedding(x, self.features)
        embed_x_history = self.embedding(x, self.history_features)
        embed_x_target = self.embedding(x, self.target_features)
        attention_pooling = []
        for i in range(self.num_history_features):
            # Attention-weighted sum pooling over each behavior sequence
            attention_seq = self.attention_layers[i](embed_x_history[:, i, :, :], embed_x_target[:, i, :])
            attention_pooling.append(attention_seq.unsqueeze(1))
        # Stack the pooled vector of each history field
        attention_pooling = torch.cat(attention_pooling, dim=1)
        # Concat & Flatten all parts as the MLP input
        mlp_in = torch.cat([
            attention_pooling.flatten(start_dim=1),
            embed_x_target.flatten(start_dim=1),
            embed_x_features.flatten(start_dim=1)
        ], dim=1)
        # mlp_params can specify the hidden dims, e.g. [80, 200]
        y = self.mlp(mlp_in)
        # This code uses sigmoid(1) + BCELoss, whose effect is similar to the
        # softmax(2) + CELoss used in the original DIN paper
        return torch.sigmoid(y.squeeze(1))
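Consistent with the sigmoid + BCELoss comment in the code, a hypothetical training step might look like the sketch below; model, batch, and labels are placeholder names, not part of the original code:

criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

pred = model(batch)                     # sigmoid output in (0, 1)
loss = criterion(pred, labels.float())  # labels are 0/1 click indicators
optimizer.zero_grad()
loss.backward()
optimizer.step()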
3. Summary
- DeepFM builds on FNN and PNN by combining, in parallel, the Deep part with the FM part's ability to model cross features effectively, which improves the model's prediction quality.
- DIN incorporates historical information: it uses the similarity between the current candidate item and the user's historical behaviors to determine how much attention each historical behavior receives, making effective use of user history and improving click prediction.
References:
"Recommendation models DeepFM and DIN", Levi babe's blog on CSDN