Keras Deep Learning Practice: Recommendation System Data Encoding
2022-07-27 13:48:00 【Hope Xiaohui】
0. Preface
In "Autoencoders Explained in Detail", we introduced the necessity of data encoding and, using image encoding as an example, implemented the autoencoder (AutoEncoder) and several of its variants. A recommendation system uses customer and product information to recommend items of interest to users based on their interest profiles and purchase behavior. In this section, we will encode the users and movies in a movie-rating dataset.
1. The Necessity of Data Encoding in Recommendation Systems
To understand why data encoding is necessary in a recommendation system, consider the scenario of recommending movies to users. As in text analysis, if we encoded every movie/user individually, then with thousands of movies we would end up with a vector of thousands of dimensions for each movie.
Encoding users in a lower-dimensional space according to their viewing habits lets us group movies by similarity, which helps us discover the movies a user is more likely to watch. The same idea applies to e-commerce recommendation systems and to recommending products to supermarket customers.
2. Recommendation System Data Encoding
In this section, we again consider recommending movies to users. Such a database may contain millions of users and thousands of movies, so we cannot encode each of them individually. This is where data encoding comes in handy. One of the most popular encoding techniques in recommendation systems is matrix factorization. Next, we will see how it works and use it to generate embeddings for users and movies.
2.1 Encoding Users and Movies in the Recommendation System
The principle behind encoding users and movies is as follows: considering users' preferences for different movies, if two users like similar movies, their encoding vectors should be similar; by the same logic, if two movies are similar, for example they belong to the same genre or share a similar cast, their encoding vectors should also be similar.
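To illustrate this principle with cosine similarity (using hypothetical toy rating rows, not data from the actual dataset): two users whose rating patterns agree should score higher similarity than two users with opposite tastes. A minimal sketch:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical rating rows for three users over five movies
user_a = np.array([5.0, 4.0, 1.0, 0.0, 5.0])  # likes movies 0, 1, 4
user_b = np.array([4.0, 5.0, 0.0, 1.0, 4.0])  # similar taste to user_a
user_c = np.array([0.0, 1.0, 5.0, 4.0, 0.0])  # roughly opposite taste

assert cosine(user_a, user_b) > cosine(user_a, user_c)
```

The same comparison, applied to learned low-dimensional encoding vectors rather than raw rating rows, is exactly what the model in this section makes possible.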
2.2 Dataset Introduction
The dataset used for model training contains user information and ratings for movies the users have watched. The first column is the user ID, the second is the movie ID, the third is the user's rating of the movie, and the last is a timestamp. The figure below shows a sample of the downloaded data.

The dataset can be downloaded from the following link: https://pan.baidu.com/s/1yYQw6uuXVsj9PHsT68rx1w, extraction code: ifjr.
2.3 Encoding Strategy for the Recommendation System
Before implementing the model, we first outline the workflow of the encoding strategy, which recommends new movies based on the history of movies a user has watched:
- Load the dataset and assign IDs to users and movies
- Convert each user and each movie into a 32-dimensional encoding vector
- Use the Keras functional API to compute the dot product of the 32-dimensional movie and user vectors:
    - If there are 1,000,000 users and 10,000 movies, the encoded movie matrix has size 10000 × 32 and the user matrix has size 1000000 × 32
    - The dot product of the two then has size 1000000 × 10000
- Flatten the dot-product output and pass it through a fully connected layer, then connect it to the output layer; the output layer uses a linear activation function and outputs values in the range 1 to 5, representing the predicted rating the user would give the movie
- Fit the model
- Extract the embedding weights of the users and movies separately
- Find movies similar to a given movie by computing the similarity between a movie the user is interested in and the other movies in the dataset
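The matrix shapes in the steps above can be checked with a small NumPy sketch (tiny toy sizes stand in for the 1,000,000 users and 10,000 movies):

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_movies, n_factors = 6, 4, 32  # toy stand-ins for 1,000,000 and 10,000
user_matrix = rng.normal(size=(n_users, n_factors))
movie_matrix = rng.normal(size=(n_movies, n_factors))

# Dot product of every user vector with every movie vector;
# with the real sizes this would be a 1000000 x 10000 score matrix
scores = user_matrix @ movie_matrix.T
assert scores.shape == (n_users, n_movies)
```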
Next, we encode users and movies into embedding vectors in the recommendation system.
2.4 Implementing the Encoding Model of the Recommendation System
(1) First, import the dataset and the required libraries:
import numpy as np
import pandas as pd
from keras.layers import Input, Embedding, Dense, Flatten, dot
from keras.models import Model
from keras.optimizers import Adam

# Load the ratings file (tab separated, no header row)
column_names = ['User', 'Movies', 'rating', 'timestamp']
ratings = pd.read_csv('u.data', sep='\t', names=column_names)
print(ratings.head())
The first few rows of the dataset, printed with the head() method, are as follows:
User Movies rating timestamp
0 196 242 3 881250949
1 186 302 3 891717742
2 22 377 1 878887116
3 244 51 2 880606923
4 166 346 1 886397596
(2) Convert the users and movies into categorical variables, creating two new columns, User2 and Movies2:
ratings['User2'] = ratings['User'].astype('category')
ratings['Movies2'] = ratings['Movies'].astype('category')
(3) Assign a unique ID to each user and each movie:
users = ratings.User.unique()
movies = ratings.Movies.unique()
print(len(users))
print(len(movies))

# Map raw IDs to contiguous indices, and back
userid2idx = {o: i for i, o in enumerate(users)}
moviesid2idx = {o: i for i, o in enumerate(movies)}
idx2userid = {i: o for i, o in enumerate(users)}
idx2moviesid = {i: o for i, o in enumerate(movies)}
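The mappings are mutual inverses: `userid2idx` turns a raw user ID into a contiguous index, and `idx2userid` turns the index back. A quick check on toy stand-in IDs (the first five user IDs from the sample shown earlier):

```python
# Toy stand-ins for ratings.User.unique(); the dicts are built
# exactly as in the snippet above
users = [196, 186, 22, 244, 166]
userid2idx = {o: i for i, o in enumerate(users)}
idx2userid = {i: o for i, o in enumerate(users)}

assert userid2idx[196] == 0              # first unique ID gets index 0
assert userid2idx[22] == 2
assert idx2userid[userid2idx[22]] == 22  # round trip recovers the raw ID
```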
(4) Add the mapped IDs to the table, overwriting the User2 and Movies2 columns created above:
ratings['Movies2'] = ratings.Movies.apply(lambda x: moviesid2idx[x])
ratings['User2'] = ratings.User.apply(lambda x: userid2idx[x])
print(ratings.head())
Using head() again, the data after adding the new columns looks as follows:
User Movies rating timestamp User2 Movies2
0 196 242 3 881250949 0 0
1 186 302 3 891717742 1 1
2 22 377 1 878887116 2 2
3 244 51 2 880606923 3 3
4 166 346 1 886397596 4 4
(5) Define embeddings for each user ID and movie ID:
n_users = ratings.User.nunique()
n_movies = ratings.Movies.nunique()
The code above extracts the number of distinct users and distinct movies in the dataset. Next, we define the function embedding_input, which takes an ID as input and converts it into an embedding vector of dimension n_out, where n_in is the number of possible input values:
def embedding_input(name, n_in, n_out):
    inp = Input(shape=(1,), dtype='int64', name=name)
    return inp, Embedding(n_in, n_out, input_length=1)(inp)
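Conceptually, the Embedding layer created here is a trainable lookup table: each ID selects one row of an n_in × n_out weight matrix. A minimal NumPy sketch of the mechanics (toy sizes, random weights standing in for the trained ones):

```python
import numpy as np

n_in, n_out = 5, 3  # toy vocabulary size and embedding dimension
rng = np.random.default_rng(1)
table = rng.normal(size=(n_in, n_out))  # stands in for the trainable weight matrix

def embed(idx):
    # An embedding lookup is just a row selection from the weight matrix
    return table[idx]

vec = embed(2)
assert vec.shape == (n_out,)
assert np.array_equal(vec, table[2])
```

During training, backpropagation updates only the rows of the table that were looked up, which is what makes embeddings practical for large ID spaces.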
Next, extract a 32-dimensional encoding vector for each user and each movie:
n_factors = 32
user_in, u = embedding_input('user_in', n_users, n_factors)
movie_in, a = embedding_input('article_in', n_movies, n_factors)
(6) Build the neural network model:
import keras.backend as K

def rmse(y_true, y_pred):
    # Root mean squared error metric
    return K.sqrt(K.mean(K.pow(y_true - y_pred, 2)))

# Dot product of the user and movie embeddings
x = dot([u, a], axes=1)
x = Flatten()(x)
x = Dense(500, activation='relu')(x)
x = Dense(1)(x)
model = Model([user_in, movie_in], x)
adam = Adam(lr=0.01)
model.compile(adam, loss='mse', metrics=[rmse])
model.summary()
The model's architecture summary is as follows:
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
user_in (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
article_in (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
embedding (Embedding) (None, 1, 32) 30176 user_in[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, 1, 32) 53824 article_in[0][0]
__________________________________________________________________________________________________
dot (Dot) (None, 32, 32) 0 embedding[0][0]
embedding_1[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 1024) 0 dot[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 500) 512500 flatten[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1) 501 dense[0][0]
==================================================================================================
Total params: 597,001
Trainable params: 597,001
Non-trainable params: 0
__________________________________________________________________________________________________
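As a sanity check, the parameter counts in the summary can be reproduced by hand. The embedding sizes imply 943 distinct users and 1682 distinct movies (30176 / 32 and 53824 / 32), and the Flatten layer feeds 32 × 32 = 1024 values into the 500-unit dense layer:

```python
n_users, n_movies, n_factors = 943, 1682, 32  # sizes implied by the summary

user_emb = n_users * n_factors                 # 30,176
movie_emb = n_movies * n_factors               # 53,824
dense = (n_factors * n_factors) * 500 + 500    # 1024 inputs * 500 units + biases
dense_out = 500 + 1                            # 500 inputs + 1 bias

assert user_emb == 30176
assert movie_emb == 53824
assert dense == 512500
assert user_emb + movie_emb + dense + dense_out == 597001  # total params
```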
(7) Fit the model:
model.fit([ratings.User2,ratings.Movies2], ratings.rating,
epochs=50,
batch_size=128)
(8) Extract the vectors of the users and movies:
# Extracting user vectors
model.get_weights()[0]
# Extracting movie vectors
model.get_weights()[1]
(9) Finally, we verify that similar movies have similar embeddings. Cosine similarity is the usual measure of similarity between embeddings. The following code compares the movie at index 600 with the first 600 movies:
from sklearn.metrics.pairwise import cosine_similarity

movie_vectors = model.get_weights()[1]
print(np.argmax(cosine_similarity(movie_vectors[600].reshape(1, -1),
                                  movie_vectors[:600])))
The code above prints the index of the movie most similar to the movie at position 600:
89
Checking the movie ID list confirms that the movie most similar to movie 600 is indeed movie 89.
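The same nearest-neighbor check can be wrapped in a small helper. The sketch below (the `most_similar` function is our own illustration, not part of the original code) normalizes the embedding rows so cosine similarity becomes a plain dot product, then compares the query row against all rows before it, mirroring the comparison of movie 600 with movies 0..599 above:

```python
import numpy as np

def most_similar(embeddings, query_idx, top_k=1):
    # Normalize rows; cosine similarity is then a plain dot product
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e[:query_idx] @ e[query_idx]       # similarity against earlier rows
    return np.argsort(sims)[::-1][:top_k]     # indices of the most similar rows

# Hypothetical embedding matrix: make row 7 a scaled copy of row 3,
# so their cosine similarity is exactly 1
rng = np.random.default_rng(2)
emb = rng.normal(size=(10, 4))
emb[7] = 2.0 * emb[3]

assert most_similar(emb, 7, top_k=1)[0] == 3
```

Applied to `model.get_weights()[1]`, this would return the indices of the movies whose embeddings are closest to a given movie's embedding.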
Related links
Keras Deep Learning Practice (1): Fundamentals of Neural Networks and the Model Training Process
Keras Deep Learning Practice (2): Building Neural Networks with Keras
Keras Deep Learning Practice (7): Convolutional Neural Networks Explained and Implemented
Keras Deep Learning Practice (16): Autoencoders Explained in Detail