Collaborative Filtering Evolved: NeuralCF and Its TensorFlow 2 Implementation
2022-06-26 21:27:00 【浪漫的数据分析】
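Goal:
Understand how NeuralCF improves on traditional matrix-factorization-based collaborative filtering, and what the algorithm's strengths and weaknesses are.
Content:
The previous post covered the most classic recommendation algorithm, collaborative filtering, and used matrix factorization to obtain user and item embedding vectors. Their dot product gives a similarity score that can be used to rank recommendations. However, traditional collaborative filtering predicts directly from a very sparse co-occurrence matrix, so its generalization ability is weak: for users with very little interaction history it cannot produce accurate recommendations. Matrix factorization, in turn, models the interaction between the user vector and the item vector with a very simple inner product, so its fitting capacity is also limited.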
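To make the matrix-factorization baseline concrete, here is a minimal NumPy sketch (not the code from the previous post). It factorizes a tiny, made-up rating matrix `R` into user and item embedding matrices `P` and `Q`, fitting only the observed cells with full-batch gradient descent, and then ranks a user's unseen items by the dot product. The toy data, the helper names `train_mf`, `masked_mse` and `recommend`, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_mf(R, mask, k=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Factorize R ≈ P @ Q.T, fitting only the observed cells (mask == 1)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))  # user embedding vectors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item embedding vectors
    for _ in range(epochs):
        E = mask * (R - P @ Q.T)        # residuals on observed cells only
        P += lr * (E @ Q - reg * P)     # gradient step on user embeddings
        Q += lr * (E.T @ P - reg * Q)   # gradient step on item embeddings
    return P, Q

def masked_mse(R, mask, P, Q):
    """Mean squared reconstruction error over the observed cells."""
    return float(((mask * (R - P @ Q.T)) ** 2).sum() / mask.sum())

def recommend(P, Q, user, mask, top_k=2):
    """Rank the items a user has not interacted with by the dot product."""
    scores = Q @ P[user]                                 # one score per item
    scores = np.where(mask[user] == 1, -np.inf, scores)  # exclude seen items
    return np.argsort(-scores)[:top_k]

# Toy user x item rating matrix; 0 means "no interaction observed".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = (R > 0).astype(float)

P, Q = train_mf(R, mask)
print("masked MSE:", masked_mse(R, mask, P, Q))
print("top items for user 1:", recommend(P, Q, 1, mask))
```

The only user-item interaction in this model is the fixed inner product `P[u] @ Q[i]`, which is exactly the limited fitting capacity described above.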
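- Improvements
1. Can deep learning be used to improve collaborative filtering, both in computing the embedding vectors and in the final dot product that measures user-item similarity?
2. Researchers at the National University of Singapore did exactly that: they improved traditional collaborative filtering with a deep neural network and named the result NeuralCF (Neural Collaborative Filtering).
Algorithm idea:
Comparing several approaches
- 1. How matrix factorization works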

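It factorizes the co-occurrence matrix into the product of two smaller matrices; the rows of those smaller matrices are the user and item embedding vectors.
- 2. Traditional dot-product similarity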

- 3. The basic idea of NeuralCF

The improvement is to replace the original dot product with an MLP.
- 4. Improved version 1: the two-tower model

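- The output of the user-side layers is used as the user embedding.
- The output of the item-side layers is used as the item embedding.
- Advantage: the item and user embeddings can be cached, so during online recommendation the similarity is obtained directly as the dot product of the cached embeddings.
- 5. Improved version 2: two-tower model + MLP
The dot product is still too simple to capture complex user-item interactions, so an MLP replaces it.
- 6. Improved version 3: two-tower model + multiple features + MLP
The embeddings are generated only from user/item IDs or the co-occurrence matrix, which ignores other inherent attributes of users and items and uses too few features. More features can therefore be fed into the user-side and item-side multilayer networks so that the available information is fully used.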
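Items 2, 3 and 5 differ only in the interaction function applied to a user vector `u` and an item vector `v`. The NumPy sketch below contrasts the fixed dot product with a small MLP over the concatenated vectors, which is the core NeuralCF idea. The weights are random placeholders (in a real model they are learned jointly with the embeddings), and the layer sizes and the names `dot_score` and `mlp_score` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, hidden = 8, 16

u = rng.standard_normal(dim)  # user embedding (e.g. output of the user tower)
v = rng.standard_normal(dim)  # item embedding (e.g. output of the item tower)

def dot_score(u, v):
    """Matrix-factorization style interaction: a parameter-free inner product."""
    return float(u @ v)

# Parameters of a one-hidden-layer MLP; random here, learned in practice.
W1 = 0.1 * rng.standard_normal((2 * dim, hidden))
b1 = np.zeros(hidden)
w2 = 0.1 * rng.standard_normal(hidden)
b2 = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_score(u, v):
    """NeuralCF style interaction: concat -> ReLU layer -> sigmoid output."""
    x = np.concatenate([u, v])          # shape (2 * dim,)
    h = np.maximum(0.0, x @ W1 + b1)    # ReLU hidden layer, shape (hidden,)
    return float(sigmoid(h @ w2 + b2))  # probability-like score in (0, 1)

print("dot product:", dot_score(u, v))
print("MLP score:  ", mlp_score(u, v))
```

The dot product has no parameters of its own, so everything must be encoded in the embeddings; the MLP adds trainable parameters to the interaction itself, at the cost that a score can no longer be computed from two cached vectors with a single inner product.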
Model code:
GitHub repository: source code on GitHub
Examples:
1. Basic NeuralCF model
```python
import tensorflow as tf

# neural cf model arch two. only embedding in each tower, then MLP as the interaction layers
def neural_cf_model_1(feature_inputs, item_feature_columns, user_feature_columns, hidden_units):
    # Each "tower" is just the dense embedding of its feature columns (e.g. item ID / user ID)
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    # Concatenate both embeddings and let an MLP learn the user-item interaction
    interact_layer = tf.keras.layers.concatenate([item_tower, user_tower])
    for num_nodes in hidden_units:
        interact_layer = tf.keras.layers.Dense(num_nodes, activation='relu')(interact_layer)
    # Sigmoid output: predicted probability of a positive interaction
    output_layer = tf.keras.layers.Dense(1, activation='sigmoid')(interact_layer)
    neural_cf_model = tf.keras.Model(feature_inputs, output_layer)
    return neural_cf_model
```
2. Improved version 1: two-tower model
```python
import tensorflow as tf

# neural cf model arch one. embedding+MLP in each tower, then dot product layer as the output
def neural_cf_model_2(feature_inputs, item_feature_columns, user_feature_columns, hidden_units):
    # Item tower: embedding followed by its own MLP; its output is the item embedding
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        item_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(item_tower)
    # User tower: same structure, separate weights; its output is the user embedding
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        user_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(user_tower)
    # The towers only meet here, in a dot product of the two embeddings
    output = tf.keras.layers.Dot(axes=1)([item_tower, user_tower])
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    # output = tf.keras.layers.Dense(1)(output)
    neural_cf_model = tf.keras.Model(feature_inputs, output)
    return neural_cf_model
```

The results show that accuracy is not very high; the model underfits considerably.
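Even with its weaker accuracy, the two-tower structure is what enables the serving advantage mentioned earlier: the towers never see each other's inputs until the final dot product. The NumPy sketch below assumes hypothetical one-layer towers over random ID embeddings. It runs the item tower once for every item and caches the outputs ("offline"), then at request time runs only the user tower and scores all cached items with one matrix-vector product. Table sizes and the names `tower` and `recommend_cached` are illustrative and not part of the original GitHub code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, emb_dim, out_dim = 100, 500, 16, 8

# Randomly initialized ID embedding tables and tower weights (learned in practice).
user_emb = rng.standard_normal((n_users, emb_dim))
item_emb = rng.standard_normal((n_items, emb_dim))
W_user = rng.standard_normal((emb_dim, out_dim)) / np.sqrt(emb_dim)
W_item = rng.standard_normal((emb_dim, out_dim)) / np.sqrt(emb_dim)

def tower(x, W):
    """A one-layer ReLU tower; works on a single vector or a batch of rows."""
    return np.maximum(0.0, x @ W)

# Offline: run the item tower once for every item and cache the results.
item_cache = tower(item_emb, W_item)       # shape (n_items, out_dim)

def recommend_cached(user_id, top_k=5):
    """Online: one user-tower pass, then dot products against the cache."""
    u = tower(user_emb[user_id], W_user)   # shape (out_dim,)
    scores = item_cache @ u                # shape (n_items,)
    return np.argsort(-scores)[:top_k]

print(recommend_cached(user_id=7))
```

This shortcut only works because the final interaction is a plain dot product; with the MLP interaction of versions 2 and 3, every (user, item) pair has to pass through the MLP at request time.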
3. Improved version 2: two-tower model + MLP
```python
import tensorflow as tf

# neural cf model arch one. embedding+MLP in each tower, then MLP layer as the output
def neural_cf_model_3(feature_inputs, item_feature_columns, user_feature_columns, hidden_units):
    # Item tower: embedding + MLP
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        item_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(item_tower)
    # User tower: embedding + MLP
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        user_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(user_tower)
    # Replace the dot product with concatenation + MLP as the interaction layers
    output = tf.keras.layers.concatenate([item_tower, user_tower])
    # output = tf.keras.layers.Dot(axes=1)([item_tower, user_tower])
    for num_nodes in hidden_units:
        output = tf.keras.layers.Dense(num_nodes, activation='relu')(output)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    # output = tf.keras.layers.Dense(1)(output)
    neural_cf_model = tf.keras.Model(feature_inputs, output)
    return neural_cf_model
```

According to the results, this model reaches a lower loss and higher accuracy than the previous version.
Test Loss 0.19877538084983826, Test Accuracy 0.6881847977638245, Test ROC AUC 0.7592607140541077, Test PR AUC 0.7094590663909912
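4. Improved version 3: two-tower model + multiple features + MLP
Final version:
```python
import tensorflow as tf

# neural cf model arch one. embedding+MLP in each tower (plus extra features), then MLP layer as the output
# item_f / user_f: extra dense feature tensors for the item and user towers. The original
# snippet used them without defining them (and misspelled item_f as iterm_f); here they are
# passed in as arguments and must be computed from tensors in feature_inputs so that the
# Keras graph stays connected.
def neural_cf_model_4(feature_inputs, item_feature_columns, user_feature_columns, hidden_units, item_f, user_f):
    # Item tower: ID embedding concatenated with extra item features, then MLP
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    item_tower = tf.keras.layers.concatenate([item_tower, item_f])
    for num_nodes in hidden_units:
        item_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(item_tower)
    # User tower: ID embedding concatenated with extra user features, then MLP
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    user_tower = tf.keras.layers.concatenate([user_tower, user_f])
    for num_nodes in hidden_units:
        user_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(user_tower)
    # Interaction: concatenation + MLP instead of a dot product
    output = tf.keras.layers.concatenate([item_tower, user_tower])
    # output = tf.keras.layers.Dot(axes=1)([item_tower, user_tower])
    for num_nodes in hidden_units:
        output = tf.keras.layers.Dense(num_nodes, activation='relu')(output)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    # output = tf.keras.layers.Dense(1)(output)
    neural_cf_model = tf.keras.Model(feature_inputs, output)
    return neural_cf_model
```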
Final results:
Test Loss 0.6841861605644226, Test Accuracy 0.6669825315475464, Test ROC AUC 0.715860903263092, Test PR AUC 0.6257403492927551
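The accuracy and AUC values are close to those of the third model, though every metric is somewhat worse in this run and the test loss is noticeably higher. In theory, however, the fourth model should perform best once more data is available.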
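The fourth model assumes the extra item and user features already exist as dense tensors that can be concatenated into the towers. As a hedged illustration of what such an input could look like, the NumPy sketch below builds an item-tower input by concatenating an ID embedding with a multi-hot genre vector and a scaled numeric feature (average rating). The genre vocabulary, the rating scale and the helper names `multi_hot` and `item_tower_input` are hypothetical.

```python
import numpy as np

GENRES = ["Action", "Comedy", "Drama", "Romance", "Sci-Fi"]  # hypothetical vocabulary
rng = np.random.default_rng(1)
item_id_emb = rng.standard_normal((1000, 10))  # ID embedding table, learned in practice

def multi_hot(genres):
    """Encode a list of genre names as a 0/1 vector over GENRES."""
    vec = np.zeros(len(GENRES))
    for g in genres:
        vec[GENRES.index(g)] = 1.0
    return vec

def item_tower_input(item_id, genres, avg_rating, max_rating=5.0):
    """Concatenate ID embedding, genre multi-hot and a scaled numeric feature."""
    # The extra part plays the role of the item-side extra feature tensor.
    extra = np.concatenate([multi_hot(genres), [avg_rating / max_rating]])
    return np.concatenate([item_id_emb[item_id], extra])

x = item_tower_input(42, ["Comedy", "Romance"], avg_rating=3.9)
print(x.shape)  # (16,)
```

In the TensorFlow code, a similar effect can usually be achieved without separate tensors by adding more feature columns (for example an `indicator_column` over a categorical genre column, which yields a multi-hot vector for multivalent inputs) to `item_feature_columns` and `user_feature_columns`, since `DenseFeatures` concatenates the outputs of all the columns it is given.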