Recommendation Model Reproduction (II): Ranking Models DeepFM and DIN
2022-06-29 12:39:00 【GoAI】
1. DeepFM Model
1.1 Background of the DeepFM model
- DNN has too many parameters: when One-Hot features are mapped to dense vectors, the number of network parameters becomes very large.
- FNN and PNN capture too few cross features: FNN connects a pre-trained FM module to a DNN; PNN then inserts a product layer between the embedding layer and hidden layer 1, replacing the FM pre-training layer with the product layer.
FNN:
PNN: PNN crosses features with product operations because, in a CTR scenario, feature interactions are closer to an "AND" relationship, whereas addition expresses an "OR" relationship, so the product form works better (see the toy sketch below).
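As a toy illustration of the "AND" vs. "OR" intuition (a standalone sketch, not from the original post; the embedding values are made up):

import torch

emb_a = torch.randn(4)                      # embedding of feature A
emb_b = torch.randn(4)                      # embedding of feature B
add_interaction = emb_a + emb_b             # "OR"-like: either feature alone shifts the result
prod_interaction = torch.dot(emb_a, emb_b)  # "AND"-like: the inner-product crossing used by FM/PNN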
1.2 DeepFM Model
DeepFM builds on FNN and PNN by combining an FM layer and a Deep layer in parallel, which improves the model's computational efficiency.

1.2.1 FM part
Main function: effectively learn the weights of cross features.
Model formula:

$$\hat{y}_{FM} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
The FM layer combines a first-order term and a second-order term; their sum is passed through a Sigmoid to produce the output.
Advantages of the FM layer:
- It uses the inner product of latent vectors as the weight of each cross feature, so cross-feature weights can be trained effectively even on very sparse data (the two features need not be non-zero in the same sample).
- Computation is highly efficient (see the sketch after this list).
- Although the overall feature space in recommendation scenarios is huge, FM training and inference only need to process the non-zero features of each sample, which speeds up both offline training and online prediction.
- Because the model is computationally cheap and can automatically mine long-tail, low-frequency items in sparse scenarios, FM is applicable to all three stages: recall, pre-ranking, and ranking. When applied at different stages, the sample construction, fitting objective, and online serving differ.
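The efficiency claim rests on the standard FM rewrite of the pairwise term, $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2}\big[(\sum_i v_i x_i)^2 - \sum_i (v_i x_i)^2\big]$ summed over the embedding dimension, which cuts the cost from O(kn²) to O(kn). A minimal sketch, assuming the per-field embeddings (already multiplied by their feature values) arrive as a tensor of shape (batch, num_fields, k):

import torch

def fm_second_order(v_x: torch.Tensor) -> torch.Tensor:
    """v_x: (batch, num_fields, k) field embeddings; returns the (batch,) second-order term."""
    square_of_sum = v_x.sum(dim=1) ** 2      # (batch, k): square of the field sum
    sum_of_square = (v_x ** 2).sum(dim=1)    # (batch, k): sum of squared fields
    return 0.5 * (square_of_sum - sum_of_square).sum(dim=1)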
1.2.2 Deep part
- Dense embeddings are fed into the hidden layers through full connections, which avoids the parameter explosion a DNN would suffer on raw one-hot inputs.
- The embedding layer's output concatenates the embedding vectors of all ID features, and the result is fed into the DNN (a toy sketch follows).
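A toy sketch of this concatenation (tensor sizes are made up for illustration):

import torch

emb_user = torch.randn(8, 16)   # (batch, embed_dim) embedding of user_id
emb_item = torch.randn(8, 16)   # (batch, embed_dim) embedding of item_id
dnn_input = torch.cat([emb_user, emb_item], dim=1)  # (batch, 32), fed into the DNN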
1.3 DeepFM Code
from torch_rechub.basic.layers import FM, MLP, LR, EmbeddingLayer
from tqdm import tqdm
import torch


class DeepFM(torch.nn.Module):
    def __init__(self, deep_features, fm_features, mlp_params):
        """
        The Deep part and the FM part handle deep_features and fm_features,
        two different feature groups, separately;
        mlp_params holds the parameters of the MLP (multi-layer perceptron).
        """
        super().__init__()
        self.deep_features = deep_features
        self.fm_features = fm_features
        self.deep_dims = sum([fea.embed_dim for fea in deep_features])
        self.fm_dims = sum([fea.embed_dim for fea in fm_features])
        # LR models first-order feature interactions
        self.linear = LR(self.fm_dims)
        # FM models second-order feature interactions
        self.fm = FM(reduce_sum=True)
        # Embedding representation of the features
        self.embedding = EmbeddingLayer(deep_features + fm_features)
        # Set up the MLP (multi-layer perceptron)
        self.mlp = MLP(self.deep_dims, **mlp_params)

    def forward(self, x):
        # Dense embeddings
        input_deep = self.embedding(x, self.deep_features, squeeze_dim=True)
        input_fm = self.embedding(x, self.fm_features, squeeze_dim=False)
        y_linear = self.linear(input_fm.flatten(start_dim=1))
        y_fm = self.fm(input_fm)
        y_deep = self.mlp(input_deep)
        # The final prediction combines the first-order interactions,
        # the second-order interactions, and the deep model
        y = y_linear + y_fm + y_deep
        # Use sigmoid to squash the prediction into the (0, 1) interval
        return torch.sigmoid(y.squeeze(1))
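A minimal usage sketch, under the assumption that features are declared with torch-rechub's SparseFeature class and that the model's input is a dict keyed by feature name (all names, sizes, and MLP settings below are made up):

from torch_rechub.basic.features import SparseFeature

fm_features = [SparseFeature("user_id", vocab_size=100, embed_dim=16),
               SparseFeature("item_id", vocab_size=200, embed_dim=16)]
deep_features = fm_features  # the two parts may share the same feature group

model = DeepFM(deep_features, fm_features, mlp_params={"dims": [256, 128], "dropout": 0.2})
x = {"user_id": torch.randint(0, 100, (8,)),  # a batch of 8 samples
     "item_id": torch.randint(0, 200, (8,))}
print(model(x).shape)  # expected: torch.Size([8]), click probabilities in (0, 1)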
2. DIN (Deep Interest Network)
2.1 Background of DIN
- Historical behavior information receives too little attention in earlier models
- Predicting clicks alone makes it hard to capture a user's broad interests
- Large amounts of historical behavior data are available
2.2 DIN Model

2.2.1 Base model

Activation Unit:
Function: introduces an attention mechanism between the current candidate ad and the user's historical behaviors; historical behaviors that are more relevant to the candidate item are taken as stronger evidence for a click.
Example: if the candidate ad is a pair of swimming goggles, a past purchase of a bathing suit is far more relevant than a past purchase of a book, so the bathing suit should receive a much larger attention weight.

- Embedding Layer: maps the high-dimensional sparse input to low-dimensional dense vectors
- Pooling Layer and Concat Layer: turn the embeddings of the user's variable-length historical behaviors into a fixed-length vector and concatenate it with the other features as the input of the MLP
- MLP: fully connected layers that learn interactions among the features
- Loss: the negative log-likelihood over the training set S of N samples:

$$L = -\frac{1}{N} \sum_{(x, y) \in S} \Big( y \log p(x) + (1 - y) \log\big(1 - p(x)\big) \Big)$$
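A minimal training-step sketch for this loss, using PyTorch's BCELoss on sigmoid outputs (the tensors below are stand-ins for real model outputs and labels):

import torch

logits = torch.randn(8, requires_grad=True)  # stand-in for the model's pre-sigmoid output
y_pred = torch.sigmoid(logits)               # p(x) in (0, 1)
y_true = torch.randint(0, 2, (8,)).float()   # binary click labels
loss = torch.nn.BCELoss()(y_pred, y_true)    # the negative log-likelihood above, averaged
loss.backward()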
2.3 DIN Code
# Implementation of the attention part (Activation Unit)
import torch
import torch.nn as nn
from torch_rechub.basic.layers import MLP, EmbeddingLayer


class ActivationUnit(torch.nn.Module):
    def __init__(self, emb_dim, dims=[36], activation="dice", use_softmax=False):
        super(ActivationUnit, self).__init__()
        self.emb_dim = emb_dim
        self.use_softmax = use_softmax
        # Dice(36)
        self.attention = MLP(4 * self.emb_dim, dims=dims, activation=activation)

    def forward(self, history, target):
        seq_length = history.size(1)
        target = target.unsqueeze(1).expand(-1, seq_length, -1)
        # Concat the four interaction signals along the last dimension
        att_input = torch.cat([target, history, target - history, target * history], dim=-1)
        # Dice(36)
        att_weight = self.attention(att_input.view(-1, 4 * self.emb_dim))
        # Linear(1)
        att_weight = att_weight.view(-1, seq_length)
        if self.use_softmax:
            att_weight = att_weight.softmax(dim=-1)
        # Weighted sum pooling over the sequence -> (batch_size, emb_dim)
        output = (att_weight.unsqueeze(-1) * history).sum(dim=1)
        return output
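A quick shape check of the unit (a sketch with made-up sizes; it assumes torch-rechub's MLP ends in a 1-unit output layer by default, which is what makes the view(-1, seq_length) reshape valid):

unit = ActivationUnit(emb_dim=8)
history = torch.randn(2, 5, 8)      # (batch=2, seq_len=5, emb_dim=8)
target = torch.randn(2, 8)          # (batch, emb_dim)
print(unit(history, target).shape)  # expected: torch.Size([2, 8])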
# Implementation of DIN
class DIN(torch.nn.Module):
    def __init__(self, features, history_features, target_features, mlp_params, attention_mlp_params):
        super().__init__()
        self.features = features
        self.history_features = history_features
        self.target_features = target_features
        # Number of historical-behavior feature fields
        self.num_history_features = len(history_features)
        # Total embedding dimension of all features
        self.all_dims = sum([fea.embed_dim for fea in features + history_features + target_features])
        # Build the embedding layer
        self.embedding = EmbeddingLayer(features + history_features + target_features)
        # Build one attention layer (Activation Unit) per history feature field
        self.attention_layers = nn.ModuleList(
            [ActivationUnit(fea.embed_dim, **attention_mlp_params) for fea in self.history_features])
        self.mlp = MLP(self.all_dims, activation="dice", **mlp_params)

    def forward(self, x):
        embed_x_features = self.embedding(x, self.features)
        embed_x_history = self.embedding(x, self.history_features)
        embed_x_target = self.embedding(x, self.target_features)
        attention_pooling = []
        for i in range(self.num_history_features):
            attention_seq = self.attention_layers[i](embed_x_history[:, i, :, :], embed_x_target[:, i, :])
            attention_pooling.append(attention_seq.unsqueeze(1))
        # SUM pooling
        attention_pooling = torch.cat(attention_pooling, dim=1)
        # Concat & Flatten
        mlp_in = torch.cat([
            attention_pooling.flatten(start_dim=1),
            embed_x_target.flatten(start_dim=1),
            embed_x_features.flatten(start_dim=1)
        ], dim=1)
        # mlp_params can pass hidden dims such as [80, 200]
        y = self.mlp(mlp_in)
        # This code uses sigmoid(1) + BCELoss; the effect is similar to the
        # softmax(2) + CELoss used in the original DIN model
        return torch.sigmoid(y.squeeze(1))
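A minimal construction sketch, assuming torch-rechub's SequenceFeature declares behavior sequences (pooling="concat" keeps the sequence un-pooled for the attention layers, and shared_with reuses the target item's embedding table); every name and size here is made up:

from torch_rechub.basic.features import SparseFeature, SequenceFeature

features = [SparseFeature("user_id", vocab_size=50, embed_dim=8)]
target_features = [SparseFeature("target_item", vocab_size=100, embed_dim=8)]
history_features = [SequenceFeature("hist_item", vocab_size=100, embed_dim=8,
                                    pooling="concat", shared_with="target_item")]

model = DIN(features, history_features, target_features,
            mlp_params={"dims": [80, 200]}, attention_mlp_params={"dims": [36]})
x = {"user_id": torch.randint(0, 50, (4,)),
     "target_item": torch.randint(0, 100, (4,)),
     "hist_item": torch.randint(0, 100, (4, 10))}  # a batch of 4 users, 10 behaviors each
print(model(x).shape)  # expected: torch.Size([4])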
3. Summary
- DeepFM builds on FNN and PNN by combining, in parallel, the FM component's ability to learn cross features effectively, which noticeably improves the model's prediction quality.
- DIN incorporates historical information: it uses the similarity between the current candidate item and each of the user's historical behaviors to decide how much attention that behavior receives, making effective use of user history and improving click-through prediction.
References:
Recommendation models DeepFM and DIN, CSDN blog