
7. Introduction to Field-aware Factorization Machines (FFM)

2022-06-13 12:10:00 nsq1101

Preface

The FFM algorithm, short for Field-aware Factorization Machines, is an improved version of FM (Factorization Machines).

Origin:
The concept came from Yu-Chin Juan (a graduate of National Taiwan University, now working at Criteo in the US) and his competition teammates. Borrowing the notion of a field from a paper by Michael Jahrer, they proposed FFM as an upgraded version of the FM model. By introducing fields, FFM assigns features of the same nature to the same field.

1. FFM Principle

In CTR prediction we often encounter one-hot encoded variables, which produce sparse feature data. To address this, FFM improves on FM by introducing the notion of a category, i.e. a field: the features produced by one-hot encoding the same original variable are placed in the same field. In FFM, each feature learns a separate latent vector for each field of the other features, so a latent vector depends not only on the feature itself but also on the field it interacts with.
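For reference, the FFM model scores a sample x as follows (notation as in Juan et al.'s FFM paper; f_j denotes the field of feature j):

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n}
    \langle \mathbf{v}_{i,f_j}, \mathbf{v}_{j,f_i} \rangle \, x_i x_j
```

Here v_{i,f_j} is the latent vector that feature i uses when interacting with a feature from field f_j; in plain FM the same v_i would be used for every interaction.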

1.1 Introducing fields

Take a feed-stream recommendation scenario as an example, and add a user-dimension feature for age. Gender and age both belong to user-dimension features, while tag belongs to item-dimension features. In the explanation of FM, the latent interactions "male & basketball" and "male & age" are treated identically by default, but in practice they may well differ. The FM algorithm cannot capture this difference, because it lacks the broader notion of a field and computes every interaction as a dot product of the same per-feature latent vectors.
In FFM (Field-aware Factorization Machines), every feature belongs to a specific field; the relationship between fields and features is one-to-many, as shown in the table below:
(image not available in this copy: field-to-feature mapping table)

1.2 Feature combinations

(image not available in this copy: feature-combination illustration)
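Since the original illustration is unavailable, here is the classic worked example from the FFM paper (Juan et al.): a sample with Publisher = ESPN, Advertiser = Nike, Gender = Male, where P, A, G denote the three fields. FFM scores the pairwise interactions as:

```latex
\phi_{\text{FFM}} =
  \langle \mathbf{v}_{\text{ESPN},A}, \mathbf{v}_{\text{Nike},P} \rangle
+ \langle \mathbf{v}_{\text{ESPN},G}, \mathbf{v}_{\text{Male},P} \rangle
+ \langle \mathbf{v}_{\text{Nike},G}, \mathbf{v}_{\text{Male},A} \rangle
```

Each feature contributes a different latent vector per interaction, chosen by the field of its partner; FM would instead reuse one vector per feature for all three terms.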

2. Practical Example

On macOS:

  • pip install cmake
  • If prompted to install developer tools, click Agree
  • pip install xlearn
  • After installation succeeds, run the following example:
import xlearn as xl

# Training
ffm_model = xl.create_ffm()  # use the FFM model
ffm_model.setTrain(r"./small_train.txt")     # training data
ffm_model.setValidate(r"./small_test.txt")   # validation data

# param:
#  0. binary classification
#  1. learning rate: 0.2
#  2. regularization lambda: 0.002
#  3. evaluation metric: accuracy
param = {'task': 'binary', 'lr': 0.2,
         'lambda': 0.002, 'metric': 'acc'}

# Start training
ffm_model.fit(param, './model.out')

# Prediction
ffm_model.setTest(r"./small_test.txt")  # test data
ffm_model.setSigmoid()  # map raw outputs into [0, 1]

# Start predicting
ffm_model.predict("./model.out", "./output.txt")

xlearn user manual:
https://xlearn-doc-cn.readthedocs.io/en/latest/python_api/index.html#id4

3. FFM Applications

3.1 Scenario overview

In DSP and recommendation scenarios, FFM is mainly used to estimate CTR and CVR, i.e. a user's potential click-through rate and conversion rate for an item.

The CTR and CVR prediction models are trained offline and used for online prediction. The two models use similar features, which fall into three main categories:

  • User features
    Basic information such as age, gender, occupation, interests, category preferences, and browsed/purchased categories, plus statistics such as recent click counts, purchase counts, and spending
  • Item features
    Item category, sales volume, price, rating, historical CTR/CVR, etc.
  • User-item matching features
    Browse/purchase category match, browse/purchase merchant match, interest-preference match, etc.

To apply the FFM model, every feature must be converted into the "field_id:feat_id:value" format, where field_id is the number of the field the feature belongs to, feat_id is the feature's number, and value is the feature's value.

  • Numerical features are easy to handle: just assign each its own field number, e.g. a user's rating or an item's historical CTR/CVR.
  • Categorical features must be one-hot encoded into numerical form. All features generated from the same original variable belong to the same field, and the feature value can only be 0 or 1, e.g. the user's gender, age bracket, or the item's category.
  • There is also a third kind of feature, such as the categories a user has browsed or purchased: it involves multiple category ids, with a number measuring how many items the user browsed or bought in each category. These are handled like categorical features, except that the value is no longer 0 or 1 but the browse or purchase count.
  • After assigning field_ids as above, number the transformed features sequentially to obtain feat_ids; the feature values are obtained as described.
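The steps above can be sketched in a few lines of Python. The field layout and feat_id numbering here are purely illustrative assumptions, not taken from the original post; zero-valued one-hot features would simply not be emitted:

```python
# Hypothetical field layout: field 0 = gender (one-hot), field 1 = item
# category (one-hot), field 2 = item historical CTR (numerical, own field).
gender_feats = {"male": 0, "female": 1}     # feat_ids within field 0
category_feats = {"shoes": 2, "shirts": 3}  # feat_ids within field 1
HIST_CTR_FEAT = 4                           # feat_id for the numeric field

def to_ffm(rec, label):
    """Convert one user-item record into a libffm-format line."""
    parts = [str(label)]
    parts.append(f"0:{gender_feats[rec['gender']]}:1")      # one-hot -> value 1
    parts.append(f"1:{category_feats[rec['category']]}:1")  # one-hot -> value 1
    parts.append(f"2:{HIST_CTR_FEAT}:{rec['hist_ctr']}")    # numeric value
    return " ".join(parts)

record = {"gender": "male", "category": "shoes", "hist_ctr": 0.032}
print(to_ffm(record, 1))  # -> "1 0:0:1 1:2:1 2:4:0.032"
```

Each emitted line is exactly the "field_id:feat_id:value" format that xlearn's FFM reader expects.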

The labels for CTR and CVR samples are obtained differently. For CTR estimation, positive samples are user-item records with a click, and negative samples are records that were shown but not clicked; for CVR estimation, positive samples are user-item records with an in-site payment (conversion), and negative samples are records that were clicked but not paid for.

3.2 Practical details

When training FFM, several small details deserve special attention.

  • First, sample normalization.
    FFM normalizes each sample by default, i.e. pa.norm is true. If this parameter is set to false, the data can easily overflow to inf, which then produces nan errors in the gradient computation. Sample-level normalization is therefore recommended.

  • Second, feature normalization.
    The CTR/CVR models use features from many sources, both numerical and categorical. After encoding, categorical features only take the values 0 or 1, so a large numerical feature causes sample normalization to shrink the categorical features' values to almost nothing, leaving them no discriminative power. For example, take a user-item record where the user is male and the item's sales volume is 5000 (assume all other features are zero). After normalization, the feature "sex=male" is slightly less than 0.0002, while "volume" is approximately 1. The feature "sex=male" then plays an almost negligible role in this sample, which is quite unreasonable. It is therefore necessary to first normalize the source numerical features to [0, 1].
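A quick numeric check of this example (a stdlib-only sketch of L2 sample normalization):

```python
import math

# With sex=male one-hot encoded as 1 and raw sales volume 5000,
# dividing by the sample's Euclidean norm nearly erases the one-hot feature.
sex_male, volume = 1.0, 5000.0
norm = math.sqrt(sex_male ** 2 + volume ** 2)

print(sex_male / norm)  # ~0.0002: the gender feature almost vanishes
print(volume / norm)    # ~1.0
```

Rescaling volume to [0, 1] before sample normalization keeps both features on a comparable scale.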

  • Third, omit zero-valued features.
    From the FFM model equation it is clear that zero-valued features contribute nothing: every first-order term and every interaction term containing a zero-valued feature is zero, so they affect neither parameter training nor target estimation. Zero-valued features can therefore be omitted, which speeds up FFM training and prediction; this is also the obvious advantage of applying FFM to sparse samples.

4. FFM vs FM

  • FM is a special case of FFM
  • FFM adds the field concept on top of FM: each feature in FM has a single latent vector, while in FFM each feature has multiple latent vectors, and the dot product selects the one matching the other feature's field
  • In terms of computational complexity, FM can be reduced to O(kn), while FFM is O(kn²)
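The contrast can be made concrete with a naive NumPy sketch (illustrative only, not xlearn's implementation; sizes and the field assignment are made-up assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, k = 4, 2, 3            # n features, f fields, latent dimension k
field_of = [0, 0, 1, 1]      # field_of[i]: the field feature i belongs to
V_fm = rng.normal(size=(n, k))       # FM: one latent vector per feature
V_ffm = rng.normal(size=(n, f, k))   # FFM: one per (feature, field) pair

def fm_pairs(x):
    # FM reuses v_i for every interaction; the shared vectors are what
    # allow the well-known O(kn) sum trick (written naively here).
    return sum(V_fm[i] @ V_fm[j] * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

def ffm_pairs(x):
    # FFM picks v_{i, field(j)} and v_{j, field(i)} per pair, so the
    # sum trick no longer applies and the cost stays O(kn^2).
    return sum(V_ffm[i, field_of[j]] @ V_ffm[j, field_of[i]] * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

x = np.array([1.0, 0.0, 1.0, 0.5])
print(fm_pairs(x), ffm_pairs(x))
```

Note that any pair involving the zero-valued feature contributes nothing, which is exactly why sparse samples can skip zero features entirely.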

Advantages and disadvantages of FFM

  • Advantages:
    Introducing fields lets the same feature use different latent vectors against different fields, so the model captures interactions more accurately
  • Disadvantages:
    Higher computational cost: the number of parameters is nfk, and the complexity is O(kn²)

5. Summary

  • Theoretically, FFM's field-aware factorization has clear advantages: it is especially well suited to sparse samples while maintaining strong performance;
  • In practice, applying FFM to in-site CTR/CVR estimation is very reasonable, and all metrics show that FFM performs excellently at click-through estimation.

Copyright notice
This article was created by [nsq1101]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/164/202206131205315078.html