Preprocessing a dataset: using LabelEncoder to remap all IDs to start from 0
2022-07-03 02:43:00 【strawberry47】
In recommender-system datasets, IDs usually start from 1 or are long strings of digits, so every time you work with one you need an extra user2id mapping step. It's a real hassle.
A simpler approach is to remap the IDs once, before using the dataset, and save the user2id dictionary for later lookups.
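A toy illustration of what LabelEncoder does here, using a few hypothetical raw user IDs (not taken from ml-100k): it sorts the distinct IDs and assigns them 0, 1, 2, … in that order. The raw-to-new mapping is therefore just each entry of `classes_` paired with its position.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw user IDs: 1-based and with gaps, as in many recommender datasets
raw_ids = np.array([7, 1, 42, 7, 3, 42])

lbe = LabelEncoder()
encoded = lbe.fit_transform(raw_ids)

# classes_ holds the distinct raw IDs in sorted order; the position is the new ID
user2id = {int(raw): new for new, raw in enumerate(lbe.classes_)}

print(lbe.classes_)  # [ 1  3  7 42]
print(encoded)       # [2 0 3 2 1 3]
print(user2id)       # {1: 0, 3: 1, 7: 2, 42: 3}
```

Because of this, the new IDs are always contiguous, from 0 to (number of distinct IDs - 1).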
Notes:
- `sep`: change it to the separator used by your dataset (e.g. `' '` or `'\t'`)
- `names`: change it to the column names of your dataset
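A sketch of how `sep` and `names` work together: the snippet parses two made-up tab-separated rows held in memory, with `io.StringIO` standing in for the real file path. For a space-separated file you would pass `sep=' '` instead. `names` must list one name per column, in file order.

```python
import io

import pandas as pd

# Two made-up rows in the layout used below: user, item, rating, timestamp
raw = "1\t10\t4\t1000\n2\t20\t5\t2000\n"

df = pd.read_csv(
    io.StringIO(raw),
    header=None,  # the file has no header row
    sep='\t',     # change to match your dataset's separator
    names=['user_id', 'item_id', 'rating', 'time'],  # change to your column names
)
print(df)
```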
The code is as follows:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def load_mat():
    data_path = '../dataset/ml-100k/u.data'
    # Adjust sep and names to match your dataset
    df_data = pd.read_csv(data_path, header=None, sep='\t',
                          names=['user_id', 'item_id', 'rating', 'time'])

    # Encode user IDs as consecutive integers starting from 0
    lbe_user = LabelEncoder()
    lbe_user.fit(df_data['user_id'].unique())
    converted_user = lbe_user.transform(df_data['user_id'])

    # Encode item IDs the same way
    lbe_item = LabelEncoder()
    lbe_item.fit(df_data['item_id'].unique())
    converted_item = lbe_item.transform(df_data['item_id'])

    # Keep user, item and rating (the time column is dropped)
    converted_data = pd.DataFrame()
    converted_data['user_id'] = converted_user
    converted_data['item_id'] = converted_item
    converted_data['rating'] = df_data['rating']

    # Mapping from raw ID to new 0-based ID
    user2id = dict(zip(lbe_user.classes_, lbe_user.transform(lbe_user.classes_)))
    item2id = dict(zip(lbe_item.classes_, lbe_item.transform(lbe_item.classes_)))
    return converted_data, user2id, item2id


def save(converted_data, user2id, item2id):
    sort = converted_data.sort_values(by=['user_id'])
    sort.to_csv('../dataset/ml-100k/data_converted', header=False, index=False)
    # np.save pickles the dicts; load them with np.load(path, allow_pickle=True).item()
    np.save('../dataset/ml-100k/user2id.npy', user2id)
    np.save('../dataset/ml-100k/item2id.npy', item2id)
    print('successfully saved')


if __name__ == '__main__':
    converted_data, user2id, item2id = load_mat()
    save(converted_data, user2id, item2id)
```
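`np.save` stores a dict as a pickled 0-d object array, so reading the mapping back needs `allow_pickle=True` plus `.item()`. Below is a minimal round-trip sketch, assuming a temporary directory in place of `../dataset/ml-100k/` and a small hypothetical mapping. It also shows how to invert `user2id` to recover the original ID from a remapped one.

```python
import os
import tempfile

import numpy as np

# Hypothetical mapping with the same shape as user2id: raw ID -> 0-based ID
user2id = {1: 0, 3: 1, 7: 2, 42: 3}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'user2id.npy')
    np.save(path, user2id)  # the dict is wrapped in a 0-d object array and pickled

    # allow_pickle=True is required for object arrays; .item() unwraps the dict
    loaded = np.load(path, allow_pickle=True).item()

# Reverse lookup: 0-based ID -> raw ID
id2user = {new: raw for raw, new in loaded.items()}

print(loaded[42])  # 3
print(id2user[0])  # 1
```

If the fitted `LabelEncoder` is still in memory, `lbe_user.inverse_transform(...)` performs the same reverse lookup without a saved dictionary.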