Recommendation Model Reproduction (II): Ranking Models DeepFM and DIN
2022-06-29 12:39:00 【GoAI】
1. DeepFM Model
1.1 Background of the DeepFM model
- DNN has too many parameters: when One-Hot features are mapped to dense vectors, the number of network parameters becomes very large.
- FNN and PNN capture too few cross features: FNN connects a pre-trained FM module to a DNN; PNN then inserts a product layer between the embedding layer and hidden layer 1, replacing the FM pre-training layer with the product layer.
FNN:
PNN: PNN crosses features with product operations because, in a CTR scenario, feature interactions are closer to an "AND" relationship, whereas addition expresses an "OR" relationship, so the product form works better (see the toy sketch below).
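As a toy illustration of the "AND" vs. "OR" intuition (a standalone sketch, not from the original post; the embedding values are made up):

import torch

emb_a = torch.randn(4)                      # embedding of feature A
emb_b = torch.randn(4)                      # embedding of feature B
add_interaction = emb_a + emb_b             # "OR"-like: either feature alone shifts the result
prod_interaction = torch.dot(emb_a, emb_b)  # "AND"-like: the inner-product crossing used by FM/PNN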
1.2 DeepFM Model
DeepFM builds on FNN and PNN by combining an FM layer and a Deep layer in parallel, which improves the model's computational efficiency.

1.2.1 FM part
Main function: effectively learn the weights of cross features.
Model formula:

$$\hat{y}_{FM} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
The FM layer combines a first-order term and a second-order term; their sum is passed through a Sigmoid to produce the output.
Advantages of the FM layer:
- It uses the inner product of latent vectors as the weight of each cross feature, so cross-feature weights can be trained effectively even on very sparse data (the two features need not be non-zero in the same sample).
- Computation is highly efficient (see the sketch after this list).
- Although the overall feature space in recommendation scenarios is huge, FM training and inference only need to process the non-zero features of each sample, which speeds up both offline training and online prediction.
- Because the model is computationally cheap and can automatically mine long-tail, low-frequency items in sparse scenarios, FM is applicable to all three stages: recall, pre-ranking, and ranking. When applied at different stages, the sample construction, fitting objective, and online serving differ.
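The efficiency claim rests on the standard FM rewrite of the pairwise term, $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2}\big[(\sum_i v_i x_i)^2 - \sum_i (v_i x_i)^2\big]$ summed over the embedding dimension, which cuts the cost from O(kn²) to O(kn). A minimal sketch, assuming the per-field embeddings (already multiplied by their feature values) arrive as a tensor of shape (batch, num_fields, k):

import torch

def fm_second_order(v_x: torch.Tensor) -> torch.Tensor:
    """v_x: (batch, num_fields, k) field embeddings; returns the (batch,) second-order term."""
    square_of_sum = v_x.sum(dim=1) ** 2      # (batch, k): square of the field sum
    sum_of_square = (v_x ** 2).sum(dim=1)    # (batch, k): sum of squared fields
    return 0.5 * (square_of_sum - sum_of_square).sum(dim=1)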
1.2.2 Deep part
- Dense embeddings are fed into the hidden layers through full connections, which avoids the parameter explosion a DNN would suffer on raw one-hot inputs.
- The embedding layer's output concatenates the embedding vectors of all ID features, and the result is fed into the DNN (a toy sketch follows).
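A toy sketch of this concatenation (tensor sizes are made up for illustration):

import torch

emb_user = torch.randn(8, 16)   # (batch, embed_dim) embedding of user_id
emb_item = torch.randn(8, 16)   # (batch, embed_dim) embedding of item_id
dnn_input = torch.cat([emb_user, emb_item], dim=1)  # (batch, 32), fed into the DNN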
1.3 DeepFM Code
from torch_rechub.basic.layers import FM, MLP, LR, EmbeddingLayer
from tqdm import tqdm
import torch


class DeepFM(torch.nn.Module):
    def __init__(self, deep_features, fm_features, mlp_params):
        """
        The Deep part and the FM part handle deep_features and fm_features,
        two different feature groups, separately;
        mlp_params holds the parameters of the MLP (multi-layer perceptron).
        """
        super().__init__()
        self.deep_features = deep_features
        self.fm_features = fm_features
        self.deep_dims = sum([fea.embed_dim for fea in deep_features])
        self.fm_dims = sum([fea.embed_dim for fea in fm_features])
        # LR models first-order feature interactions
        self.linear = LR(self.fm_dims)
        # FM models second-order feature interactions
        self.fm = FM(reduce_sum=True)
        # Embedding representation of the features
        self.embedding = EmbeddingLayer(deep_features + fm_features)
        # Set up the MLP (multi-layer perceptron)
        self.mlp = MLP(self.deep_dims, **mlp_params)

    def forward(self, x):
        # Dense embeddings
        input_deep = self.embedding(x, self.deep_features, squeeze_dim=True)
        input_fm = self.embedding(x, self.fm_features, squeeze_dim=False)
        y_linear = self.linear(input_fm.flatten(start_dim=1))
        y_fm = self.fm(input_fm)
        y_deep = self.mlp(input_deep)
        # The final prediction combines the first-order interactions,
        # the second-order interactions, and the deep model
        y = y_linear + y_fm + y_deep
        # Use sigmoid to squash the prediction into the (0, 1) interval
        return torch.sigmoid(y.squeeze(1))
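A minimal usage sketch, under the assumption that features are declared with torch-rechub's SparseFeature class and that the model's input is a dict keyed by feature name (all names, sizes, and MLP settings below are made up):

from torch_rechub.basic.features import SparseFeature

fm_features = [SparseFeature("user_id", vocab_size=100, embed_dim=16),
               SparseFeature("item_id", vocab_size=200, embed_dim=16)]
deep_features = fm_features  # the two parts may share the same feature group

model = DeepFM(deep_features, fm_features, mlp_params={"dims": [256, 128], "dropout": 0.2})
x = {"user_id": torch.randint(0, 100, (8,)),  # a batch of 8 samples
     "item_id": torch.randint(0, 200, (8,))}
print(model(x).shape)  # expected: torch.Size([8]), click probabilities in (0, 1)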
2. DIN (Deep Interest Network)
2.1 Background of DIN
- Historical behavior information receives too little attention in earlier models
- Predicting clicks alone makes it hard to capture a user's broad interests
- Large amounts of historical behavior data are available
2.2 DIN Model

2.2.1 Base model

Activation Unit:
Function: introduces an attention mechanism between the current candidate ad and the user's historical behaviors; historical behaviors that are more relevant to the candidate item are taken as stronger evidence for a click.
Example: if the candidate ad is a pair of swimming goggles, a past purchase of a bathing suit is far more relevant than a past purchase of a book, so the bathing suit should receive a much larger attention weight.

- Embedding Layer: maps the high-dimensional sparse input to low-dimensional dense vectors
- Pooling Layer and Concat Layer: turn the embeddings of the user's variable-length historical behaviors into a fixed-length vector and concatenate it with the other features as the input of the MLP
- MLP: fully connected layers that learn interactions among the features
- Loss: the negative log-likelihood over the training set S of N samples:

$$L = -\frac{1}{N} \sum_{(x, y) \in S} \Big( y \log p(x) + (1 - y) \log\big(1 - p(x)\big) \Big)$$
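A minimal training-step sketch for this loss, using PyTorch's BCELoss on sigmoid outputs (the tensors below are stand-ins for real model outputs and labels):

import torch

logits = torch.randn(8, requires_grad=True)  # stand-in for the model's pre-sigmoid output
y_pred = torch.sigmoid(logits)               # p(x) in (0, 1)
y_true = torch.randint(0, 2, (8,)).float()   # binary click labels
loss = torch.nn.BCELoss()(y_pred, y_true)    # the negative log-likelihood above, averaged
loss.backward()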
2.3 DIN Code
# Implementation of the attention part (Activation Unit)
import torch
import torch.nn as nn
from torch_rechub.basic.layers import MLP, EmbeddingLayer


class ActivationUnit(torch.nn.Module):
    def __init__(self, emb_dim, dims=[36], activation="dice", use_softmax=False):
        super(ActivationUnit, self).__init__()
        self.emb_dim = emb_dim
        self.use_softmax = use_softmax
        # Dice(36)
        self.attention = MLP(4 * self.emb_dim, dims=dims, activation=activation)

    def forward(self, history, target):
        seq_length = history.size(1)
        target = target.unsqueeze(1).expand(-1, seq_length, -1)
        # Concat the four interaction signals along the last dimension
        att_input = torch.cat([target, history, target - history, target * history], dim=-1)
        # Dice(36)
        att_weight = self.attention(att_input.view(-1, 4 * self.emb_dim))
        # Linear(1)
        att_weight = att_weight.view(-1, seq_length)
        if self.use_softmax:
            att_weight = att_weight.softmax(dim=-1)
        # Weighted sum pooling over the sequence -> (batch_size, emb_dim)
        output = (att_weight.unsqueeze(-1) * history).sum(dim=1)
        return output
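A quick shape check of the unit (a sketch with made-up sizes; it assumes torch-rechub's MLP ends in a 1-unit output layer by default, which is what makes the view(-1, seq_length) reshape valid):

unit = ActivationUnit(emb_dim=8)
history = torch.randn(2, 5, 8)      # (batch=2, seq_len=5, emb_dim=8)
target = torch.randn(2, 8)          # (batch, emb_dim)
print(unit(history, target).shape)  # expected: torch.Size([2, 8])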
# Implementation of DIN
class DIN(torch.nn.Module):
    def __init__(self, features, history_features, target_features, mlp_params, attention_mlp_params):
        super().__init__()
        self.features = features
        self.history_features = history_features
        self.target_features = target_features
        # Number of historical-behavior feature fields
        self.num_history_features = len(history_features)
        # Total embedding dimension of all features
        self.all_dims = sum([fea.embed_dim for fea in features + history_features + target_features])
        # Build the embedding layer
        self.embedding = EmbeddingLayer(features + history_features + target_features)
        # Build one attention layer (Activation Unit) per history feature field
        self.attention_layers = nn.ModuleList(
            [ActivationUnit(fea.embed_dim, **attention_mlp_params) for fea in self.history_features])
        self.mlp = MLP(self.all_dims, activation="dice", **mlp_params)

    def forward(self, x):
        embed_x_features = self.embedding(x, self.features)
        embed_x_history = self.embedding(x, self.history_features)
        embed_x_target = self.embedding(x, self.target_features)
        attention_pooling = []
        for i in range(self.num_history_features):
            attention_seq = self.attention_layers[i](embed_x_history[:, i, :, :], embed_x_target[:, i, :])
            attention_pooling.append(attention_seq.unsqueeze(1))
        # SUM pooling
        attention_pooling = torch.cat(attention_pooling, dim=1)
        # Concat & Flatten
        mlp_in = torch.cat([
            attention_pooling.flatten(start_dim=1),
            embed_x_target.flatten(start_dim=1),
            embed_x_features.flatten(start_dim=1)
        ], dim=1)
        # mlp_params can pass hidden dims such as [80, 200]
        y = self.mlp(mlp_in)
        # This code uses sigmoid(1) + BCELoss; the effect is similar to the
        # softmax(2) + CELoss used in the original DIN model
        return torch.sigmoid(y.squeeze(1))
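A minimal construction sketch, assuming torch-rechub's SequenceFeature declares behavior sequences (pooling="concat" keeps the sequence un-pooled for the attention layers, and shared_with reuses the target item's embedding table); every name and size here is made up:

from torch_rechub.basic.features import SparseFeature, SequenceFeature

features = [SparseFeature("user_id", vocab_size=50, embed_dim=8)]
target_features = [SparseFeature("target_item", vocab_size=100, embed_dim=8)]
history_features = [SequenceFeature("hist_item", vocab_size=100, embed_dim=8,
                                    pooling="concat", shared_with="target_item")]

model = DIN(features, history_features, target_features,
            mlp_params={"dims": [80, 200]}, attention_mlp_params={"dims": [36]})
x = {"user_id": torch.randint(0, 50, (4,)),
     "target_item": torch.randint(0, 100, (4,)),
     "hist_item": torch.randint(0, 100, (4, 10))}  # a batch of 4 users, 10 behaviors each
print(model(x).shape)  # expected: torch.Size([4])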
3. Summary
- DeepFM builds on FNN and PNN by combining, in parallel, the FM component's ability to learn cross features effectively, which noticeably improves the model's prediction quality.
- DIN incorporates historical information: it uses the similarity between the current candidate item and each of the user's historical behaviors to decide how much attention that behavior receives, making effective use of user history and improving click-through prediction.
References:
Recommendation models DeepFM and DIN, CSDN blog