An Introduction to Commonly Used Recommender System Models and Frameworks
2022-08-02 09:34:00 【timerring】
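For recent research, see the KDD conference and the Recent Recommender Systems Papers list.
Overview of Recommender Systems
Traditional Recommendation Models (Old-School Models)
Collaborative Filtering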
Collaborative filtering uses the relationships between users, together with users' feedback (ratings) on items, to filter information collaboratively and find the items a target user is likely to be interested in.

The user-item rating matrix (usually very sparse):
| User \ Item | | | |
|---|---|---|---|
| x | x | | |
| x | x | | |
| x | x | | |
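Each row vector represents a user's preferences, and each column vector represents an item's attributes.
Similarity is computed from the rows (users) or columns (items) of the rating matrix. Common similarity measures include:
- Cosine similarity
- Pearson correlation coefficient
- Euclidean distance
- Manhattan distance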
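The four measures can be sketched in plain Python as follows. This assumes two dense, aligned rating vectors (both users rated every item); on a real sparse matrix you would usually restrict the computation to co-rated items. Note that Pearson correlation is simply cosine similarity applied to mean-centered vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pearson_correlation(a, b):
    """Cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical ratings of the same four items by two users
u1 = [5, 3, 4, 4]
u2 = [3, 1, 2, 3]
print(manhattan_distance(u1, u2))  # 7
print(euclidean_distance(u1, u2), cosine_similarity(u1, u2), pearson_correlation(u1, u2))
```

Similarities (cosine, Pearson) grow as users agree, while distances (Euclidean, Manhattan) shrink, so distances are typically converted, e.g. as 1 / (1 + d), before being used as weights.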
There are two main variants: user-based collaborative filtering and item-based collaborative filtering.
Matrix Factorization
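A minimal sketch of user-based collaborative filtering, assuming a hypothetical toy rating dictionary: find the users most similar to the target user (cosine similarity over co-rated items), then predict the missing rating as the user's mean plus a similarity-weighted average of the neighbors' mean-centered ratings. This weighting scheme is one common textbook choice, not the only one.

```python
import math

# Hypothetical toy ratings; a missing key means "not rated".
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 3, "i2": 1, "i3": 2, "i4": 3},
    "carol": {"i1": 4, "i2": 3, "i3": 4, "i4": 5},
}

def cosine_on_common(u, v):
    """Cosine similarity computed only over items both users rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def mean_rating(u):
    return sum(ratings[u].values()) / len(ratings[u])

def predict_user_based(user, item, k=2):
    """Mean-centered, similarity-weighted average over the k most similar users who rated `item`."""
    neighbors = [v for v in ratings if v != user and item in ratings[v]]
    neighbors.sort(key=lambda v: cosine_on_common(user, v), reverse=True)
    neighbors = neighbors[:k]
    num = sum(cosine_on_common(user, v) * (ratings[v][item] - mean_rating(v)) for v in neighbors)
    den = sum(abs(cosine_on_common(user, v)) for v in neighbors)
    return mean_rating(user) + (num / den if den else 0.0)

print(predict_user_based("alice", "i4"))
```

Item-based CF follows the same pattern with the roles swapped: compare item columns instead of user rows, then average the target user's own ratings of the most similar items.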
Matrix factorization approximates the rating matrix as the product of two low-rank matrices; each missing rating is filled in with the inner product of the corresponding user row and item row of the two factor matrices.
Pros: the idea is simple, and predictions are easy to compute.
Cons: hard to train incrementally (when the number of samples surges, the matrix may have to be rebuilt), and other features are hard to incorporate.

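Here k is the number of latent factors, a hyperparameter.
Logistic Regression
Click prediction is cast as a binary classification problem: will the user click on the item? The linear score φ(x) is passed through the sigmoid function, σ(φ(x)) = 1 / (1 + e^(−φ(x))), to give the click probability.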
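A minimal sketch of matrix factorization trained with stochastic gradient descent on the observed entries only, assuming a small hypothetical rating matrix where `None` marks a missing rating. `lr` (learning rate), `reg` (L2 penalty), and `steps` are illustrative values, not tuned ones.

```python
import math
import random

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn P (users x k) and Q (items x k) so that P[u] . Q[i] ~ R[u][i] on observed entries."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    observed = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r is not None]
    for _ in range(steps):
        for u, i, r in observed:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))

# Hypothetical 5 users x 4 items rating matrix
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]
P, Q = factorize(R)
# Keep known ratings and fill every missing cell with the learned inner product
filled = [[r if r is not None else round(predict(P, Q, u, i), 2) for i, r in enumerate(row)]
          for u, row in enumerate(R)]
for row in filled:
    print(row)
```

The incremental-training drawback mentioned above is visible here: a new user or item needs its own row in `P` or `Q`, and in the simplest setup the whole loop is rerun.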
$$\phi(x) = w_0 + w_1 x_1 + \cdots + w_n x_n = w_0 + \sum_{i=1}^{n} w_i x_i$$
Pros: simple model, highly interpretable, fast to train (with SGD).
Cons: limited modeling capacity (it ignores correlations and interactions between features), so manual feature engineering is required.
Feature-Crossing Models
POLY2
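A minimal sketch of logistic regression for click prediction, trained with plain SGD on the log loss. The three binary features and the click labels are hypothetical toy data; in this toy set a click happens exactly when `likes_category` is 1, so a linear model can separate it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(X, y, lr=0.1, epochs=200):
    """SGD on the log loss, with phi(x) = w0 + sum_i w_i x_i."""
    n = len(X[0])
    w0, w = 0.0, [0.0] * n
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # derivative of the log loss with respect to phi(x)
            w0 -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w0, w

def predict_proba(w0, w, x):
    return sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical features: [is_mobile, is_evening, likes_category]; label = clicked
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0], [0, 1, 1]]
y = [1, 1, 0, 0, 0, 1]
w0, w = train_lr(X, y)
print(w0, w)
print(predict_proba(w0, w, [0, 0, 1]))
```

Each learned weight maps to a single feature, which is where the interpretability comes from; it also shows the limitation, since no weight can express "mobile AND evening" unless that cross is engineered by hand.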
$$\phi(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} x_i x_j$$
POLY2 adds all second-order feature crosses to logistic regression by brute force.
Pros: the added second-order features strengthen the model's modeling capacity.
Cons: high time complexity; the number of terms grows from n to n².
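The POLY2 score can be sketched directly from the formula. The weights below are hypothetical; the point is that every feature pair (i, j) with i < j carries its own independent weight w_ij, so the cross-term parameter count grows quadratically.

```python
def poly2_score(x, w0, w, W):
    """phi(x) = w0 + sum_i w_i x_i + sum_{i<j} w_ij x_i x_j.

    W is an n x n matrix; only its upper triangle (i < j) is used.
    """
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    cross = sum(W[i][j] * x[i] * x[j] for i in range(n - 1) for j in range(i + 1, n))
    return linear + cross

def num_cross_weights(n):
    """Number of independent pairwise weights: n(n-1)/2."""
    return n * (n - 1) // 2

# Hypothetical 3-feature example
x = [1, 0, 1]
w0, w = 0.1, [0.2, 0.3, 0.4]
W = [[0.0, 0.7, 0.5],
     [0.0, 0.0, 0.6],
     [0.0, 0.0, 0.0]]
print(poly2_score(x, w0, w, W))
print(num_cross_weights(1000))  # 499500
```

With sparse one-hot inputs, most pairs x_i x_j are rarely both nonzero, so many w_ij receive almost no training signal; this is the gap that FM addresses.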
Factorization Machine (FM)
$$\phi(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
FM assigns each feature a latent vector v_i and uses the inner product of two features' latent vectors as the weight of their cross term.
Pros: compared with POLY2, the number of cross-term parameters drops from n² to nk (k is the latent-vector dimension), and feature crosses are learned automatically.
Cons: feature crossing is limited to second order.
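Beyond reducing parameters, FM's pairwise term can be evaluated in O(kn) instead of O(kn²) using the identity Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j = ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²]. The sketch below computes both forms on hypothetical latent vectors (k = 2) to show they agree.

```python
def fm_pairwise_naive(x, V):
    """sum_{i<j} <v_i, v_j> x_i x_j, computed directly in O(k n^2)."""
    n, k = len(x), len(V[0])
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """Same quantity in O(k n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - s2
    return 0.5 * total

def fm_score(x, w0, w, V):
    """Full FM score phi(x): bias + first-order terms + pairwise terms."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x)) + fm_pairwise_fast(x, V)

x = [1.0, 0.0, 2.0, 1.0]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]  # hypothetical latent vectors
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
print(fm_score(x, 0.0, [0.1, 0.1, 0.1, 0.1], V))
```

Because the weight of pair (i, j) is shared through v_i and v_j, a pair that never co-occurs in training can still receive a sensible weight.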
GBDT+LR
GBDT: acts as a feature encoder. It filters and encodes the input features, producing a discrete feature vector that records which leaf the sample reaches in each tree.
LR (logistic regression): trained on the encoded features.

Pros: flexible and well suited to adding new features (the tree model performs the feature combination).
Cons: the tree model adds considerable complexity.
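The encoding step can be sketched without a training library. The two trees below are hypothetical, hand-written stand-ins for an already-trained GBDT, and the feature names `age` and `clicks_7d` are invented for illustration. Each tree contributes a one-hot block marking the leaf the sample lands in, and the concatenated blocks become the LR input. With a real model, leaf indices can come from a trained gradient-boosting estimator (for example scikit-learn's `apply` method).

```python
# Two hypothetical, already-trained trees written out by hand.
# Each returns the index of the leaf that sample x falls into.
def tree_1(x):
    if x["age"] < 30:
        return 0 if x["clicks_7d"] < 5 else 1
    return 2

def tree_2(x):
    return 0 if x["clicks_7d"] < 2 else 1

TREES = [(tree_1, 3), (tree_2, 2)]  # (tree, number of leaves)

def gbdt_encode(x):
    """Concatenate one one-hot block per tree, with a 1 at the reached leaf."""
    encoded = []
    for tree, n_leaves in TREES:
        block = [0] * n_leaves
        block[tree(x)] = 1
        encoded.extend(block)
    return encoded

print(gbdt_encode({"age": 25, "clicks_7d": 7}))  # [0, 1, 0, 0, 1]
```

Each leaf corresponds to a conjunction of conditions along its path (e.g. "age < 30 AND clicks_7d >= 5"), which is how the trees supply feature combinations that a plain LR cannot form by itself.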
Deep Recommendation Models
Deep Collaborative Filtering (Neural CF, NCF)
Predicting a user's rating of (or interaction with) an item is treated as a classification problem.
Fully connected layers learn the interaction between user and item.

NCF replaces the inner-product step of matrix factorization with a multi-layer neural network.
A fully connected network may model user-item interactions somewhat better than a plain inner product.
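A minimal, untrained forward pass of the MLP idea described above, assuming hypothetical sizes (4 users, 5 items, embedding size 8, hidden layers of 16 and 8 units) and randomly initialized weights. The user and item embeddings are concatenated and passed through fully connected ReLU layers, and a sigmoid gives the interaction probability. This shows only the MLP tower; the full NeuMF model in the NCF paper also combines a GMF branch.

```python
import math
import random

rng = random.Random(42)

def rand_matrix(rows, cols):
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def dense(x, W, b, activation=None):
    """Fully connected layer: y = activation(W x + b)."""
    y = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    if activation == "relu":
        y = [max(0.0, v) for v in y]
    return y

N_USERS, N_ITEMS, EMB = 4, 5, 8
user_emb = rand_matrix(N_USERS, EMB)
item_emb = rand_matrix(N_ITEMS, EMB)
W1, b1 = rand_matrix(16, 2 * EMB), [0.0] * 16
W2, b2 = rand_matrix(8, 16), [0.0] * 8
W3, b3 = rand_matrix(1, 8), [0.0]

def ncf_score(user_id, item_id):
    """concat(user embedding, item embedding) -> MLP -> sigmoid."""
    h = user_emb[user_id] + item_emb[item_id]  # list concatenation, length 2 * EMB
    h = dense(h, W1, b1, "relu")
    h = dense(h, W2, b2, "relu")
    logit = dense(h, W3, b3)[0]
    return 1.0 / (1.0 + math.exp(-logit))

print(ncf_score(0, 1))
```

In training, the embeddings and layer weights would be learned jointly with a binary cross-entropy loss on observed versus sampled negative interactions.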
Wide & Deep
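Now largely superseded.
The Wide part is a linear model and the Deep part is a deep neural network, combining
a shallow model (memorization) with a deep model (generalization).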

The Wide part can memorize IDs (and their crosses) and models them directly, similar to LR.
The Deep part can be viewed as a fully connected network, similar to NCF.
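A minimal forward pass showing how the two parts are combined: the Wide logit (a linear model over sparse, possibly crossed features) and the Deep logit (an MLP over dense embeddings) are summed before a single sigmoid. All weights and inputs below are hypothetical hand-picked numbers so the result can be checked by hand.

```python
import math

def relu_layer(x, W, b):
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + bi) for row, bi in zip(W, b)]

def wide_and_deep(wide_x, deep_x, wide_w, wide_b, deep_layers, out_w, out_b):
    """P(click) = sigmoid(wide linear logit + deep MLP logit)."""
    wide_logit = wide_b + sum(w * v for w, v in zip(wide_w, wide_x))
    h = deep_x
    for W, b in deep_layers:
        h = relu_layer(h, W, b)
    deep_logit = out_b + sum(w * v for w, v in zip(out_w, h))
    return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit)))

p = wide_and_deep(
    wide_x=[1, 0, 1],                 # hypothetical sparse / crossed ID features
    deep_x=[0.2, -0.1, 0.4, 0.3],     # hypothetical dense embedding input
    wide_w=[0.5, -0.3, 0.2], wide_b=-0.1,
    deep_layers=[([[1, 0, 0, 0], [0, 0, 1, 1]], [0.0, 0.0])],
    out_w=[0.5, 0.5], out_b=0.0,
)
print(p)  # sigmoid(0.6 + 0.45)
```

Because both logits feed one output, the two parts are trained jointly rather than as a post-hoc ensemble.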
DeepFM
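DeepFM consists of an FM part and a DNN part, and the two parts share the same input features. It replaces the Wide part of Wide & Deep with FM.
DeepFM: first-order features + second-order features + deep features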

It drops the earlier Wide part in favor of FM, which strengthens the combination of shallow features by covering both first- and second-order terms.
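A minimal, untrained DeepFM forward pass, assuming hypothetical sizes (10 feature IDs, 3 fields with one active ID each, embedding size 4, one hidden layer of 8 units). The key detail is that the same embedding table `E` feeds both the FM second-order term and the DNN, and the three logits (first-order, second-order, deep) are summed before the sigmoid.

```python
import math
import random

rng = random.Random(0)
N_FEATURES, K = 10, 4   # hypothetical: 10 one-hot feature ids, embedding size 4
N_FIELDS, HIDDEN = 3, 8
w0 = 0.0
w = [rng.gauss(0, 0.1) for _ in range(N_FEATURES)]                       # first-order weights
E = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(N_FEATURES)]   # shared embeddings
W1 = [[rng.gauss(0, 0.1) for _ in range(N_FIELDS * K)] for _ in range(HIDDEN)]
w_out = [rng.gauss(0, 0.1) for _ in range(HIDDEN)]

def deepfm(active_ids):
    """active_ids: one active feature id per field (value 1), e.g. [user, item, category]."""
    embs = [E[i] for i in active_ids]
    # FM part: first order + second order (O(kn) identity) on the shared embeddings
    first = w0 + sum(w[i] for i in active_ids)
    second = 0.0
    for f in range(K):
        s = sum(e[f] for e in embs)
        s2 = sum(e[f] ** 2 for e in embs)
        second += 0.5 * (s * s - s2)
    # Deep part: the same embeddings, concatenated and fed to a one-hidden-layer MLP
    h = [v for e in embs for v in e]
    h = [max(0.0, sum(a * b for a, b in zip(row, h))) for row in W1]
    deep = sum(a * b for a, b in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-(first + second + deep)))

print(deepfm([1, 4, 7]))
```

Sharing `E` means the model needs no hand-crafted cross features for a Wide part: FM supplies the low-order interactions and the DNN the high-order ones.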
DIN
The first of these models to add an attention mechanism.
It adjusts the weights of a user's historical behaviors according to the candidate item.
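A simplified sketch of the attention idea, assuming hypothetical 2-dimensional item embeddings: each item in the user's history is weighted by its relevance to the candidate item, and the weighted sum becomes the user representation for that candidate. For brevity this scores relevance with a dot product and normalizes with softmax; the DIN paper instead uses a small feed-forward "activation unit" and deliberately does not normalize the weights to sum to 1.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(history, candidate):
    """Weight each historical item embedding by its relevance to the candidate item.

    Simplification: relevance = dot product, normalized with softmax.
    """
    scores = [dot(h, candidate) for h in history]
    m = max(scores)
    exp_scores = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    dim = len(candidate)
    user_vec = [sum(wt * h[d] for wt, h in zip(weights, history)) for d in range(dim)]
    return user_vec, weights

history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # hypothetical embeddings of clicked items
candidate = [1.0, 0.0]                          # hypothetical candidate item embedding
vec, weights = attention_pool(history, candidate)
print(weights)
print(vec)
```

The same history therefore yields a different user vector for each candidate item, which is the "adjust weights based on the item" behavior described above.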

Recommender System Frameworks & Tools
DeepCTR
https://github.com/shenweichen/DeepCTR
https://github.com/shenweichen/DeepCTR-Torch
https://deepctr-torch.readthedocs.io/en/latest/Quick-Start.html
Implements the classic recommendation models, with support for Keras and PyTorch.
It encapsulates the models and output processing well, which makes it suitable for competitions.
xlearn
https://github.com/aksnzhy/xlearn
https://xlearn-doc-cn.readthedocs.io/en/latest/
Efficient implementations of LR, FM, and FFM, suitable for offline modeling.
RecBole
RecBole (伯乐): a unified, comprehensive, and efficient recommender-system library.
https://recbole.io/cn/
Supports 72 models and 28 datasets; suited to academic use.

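Text Encoding Methods
Count: counts the number of characters and words in a text.
LabelEncoder: maps each distinct label to a unique integer ID.
Multi One-Hot: multi-value label encoding (one-hot encode each value, then add the vectors), e.g. AB: 011, BC: 110, AC: 101
One-Hot: e.g. A: 001, B: 010, C: 100
CountVectorizer: like Multi One-Hot, but also records how many times each value occurs.
TfidfVectorizer: weights term counts (term frequency) by inverse document frequency (TF-IDF).
Word2Vec: maps each word to a word vector, then aggregates the vectors (e.g. by averaging).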
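The encodings in the list above can be sketched in plain Python on a hypothetical multi-valued tag field. Here index 0 stands for A, so the bit order is reversed relative to the A: 001 notation used above. The TF-IDF shown is the textbook count/length × log(N/df) form, not scikit-learn's smoothed and L2-normalized variant, and the "Word2Vec" vectors are made-up stand-ins for trained embeddings.

```python
import math
from collections import Counter

docs = ["A B", "B C", "A C A"]  # hypothetical multi-valued tag fields
tokens = [d.split() for d in docs]
vocab = sorted({t for doc in tokens for t in doc})  # ['A', 'B', 'C']

def text_stats(text):
    """Count: number of characters and number of words."""
    return len(text), len(text.split())

# LabelEncoder: one integer id per distinct label
label_ids = {t: i for i, t in enumerate(vocab)}

def one_hot(t):
    v = [0] * len(vocab)
    v[label_ids[t]] = 1
    return v

def multi_hot(doc):
    """Multi One-Hot: add the one-hot vectors of the distinct values."""
    v = [0] * len(vocab)
    for t in set(doc):
        v[label_ids[t]] = 1
    return v

def count_vector(doc):
    """CountVectorizer-style: how many times each vocabulary entry occurs."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]

N_DOCS = len(tokens)
doc_freq = Counter(t for doc in tokens for t in set(doc))

def tfidf(doc):
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    counts = Counter(doc)
    return [counts[t] / len(doc) * math.log(N_DOCS / doc_freq[t]) for t in vocab]

# Stand-in word vectors (a trained Word2Vec model would supply these)
word_vectors = {"A": [0.1, 0.3], "B": [0.2, -0.1], "C": [0.0, 0.4]}

def mean_vector(doc):
    """Aggregate word vectors by averaging them."""
    dim = len(next(iter(word_vectors.values())))
    return [sum(word_vectors[t][d] for t in doc) / len(doc) for d in range(dim)]

print(text_stats(docs[2]))        # (5, 3)
print(multi_hot(tokens[0]))       # [1, 1, 0]
print(count_vector(tokens[2]))    # [2, 0, 1]
print(tfidf(tokens[2]), mean_vector(tokens[0]))
```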