Preprocessing a dataset with LabelEncoder so that all IDs start from 0
2022-07-03 02:43:00 【strawberry47】
Datasets in the recommender-systems field usually number their IDs starting from 1, or use long arbitrary strings of digits, so every time you work with one you need an extra user2id mapping step. That gets tedious.
It is simpler to process the dataset once before using it and save the user2id dictionary, so the mapping can be looked up later.
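As a quick illustration of what LabelEncoder does here, the sketch below encodes a handful of made-up IDs (the values are hypothetical, not taken from MovieLens). LabelEncoder sorts the unique values into classes_ and assigns each one its position in that sorted list, so the new IDs always run from 0 to n-1 with no gaps.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical user IDs: start at 1, have gaps, and repeat
raw_ids = [7, 1, 42, 7, 1000]

lbe = LabelEncoder()
encoded = lbe.fit_transform(raw_ids)

# classes_ holds the sorted unique original IDs; the ID at position i is encoded as i
user2id = {int(raw): int(new) for raw, new in zip(lbe.classes_, lbe.transform(lbe.classes_))}

print(encoded.tolist())  # [1, 0, 2, 1, 3]
print(user2id)           # {1: 0, 7: 1, 42: 2, 1000: 3}
```

The same applies to string IDs; they are simply sorted lexicographically before being numbered.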
Notes:
- Change sep to the separator used by your dataset (for example ' ' or '\t').
- Change names to the column names of your dataset, in order.
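For example, a dataset that is space-separated, has only three columns and uses string user IDs would be read as in the sketch below. The file contents and the u_ prefix are hypothetical; io.StringIO stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical dataset: space-separated, three columns, string user IDs
raw = io.StringIO(
    "u_17 305 4\n"
    "u_03 12 5\n"
    "u_17 12 3\n"
)

# sep matches the file's separator; names lists its columns in order
df = pd.read_csv(raw, header=None, sep=' ', names=['user_id', 'item_id', 'rating'])

# String IDs are encoded too: sorted order puts 'u_03' first, so it becomes 0
df['user_id'] = LabelEncoder().fit_transform(df['user_id'])
print(df)
```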
The code is as follows:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def load_mat():
    data_path = '../dataset/ml-100k/u.data'
    # Adjust sep and names to match your dataset
    df_data = pd.read_csv(data_path, header=None, sep='\t',
                          names=['user_id', 'item_id', 'rating', 'time'])

    # Encode user IDs as consecutive integers 0..n_users-1
    lbe_user = LabelEncoder()
    lbe_user.fit(df_data['user_id'].unique())
    converted_user = lbe_user.transform(df_data['user_id'])

    # Encode item IDs the same way
    lbe_item = LabelEncoder()
    lbe_item.fit(df_data['item_id'].unique())
    converted_item = lbe_item.transform(df_data['item_id'])

    converted_data = pd.DataFrame()
    converted_data['user_id'] = converted_user
    converted_data['item_id'] = converted_item
    # Keep the rating; the time column is not carried over
    converted_data['rating'] = df_data['rating']

    # Mappings from original ID to new ID
    user2id = dict(zip(lbe_user.classes_, lbe_user.transform(lbe_user.classes_)))
    item2id = dict(zip(lbe_item.classes_, lbe_item.transform(lbe_item.classes_)))
    return converted_data, user2id, item2id


def save(converted_data, user2id, item2id):
    sort = converted_data.sort_values(by=['user_id'])
    sort.to_csv('../dataset/ml-100k/data_converted', header=False, index=False)
    # np.save stores each dict as a pickled 0-d object array
    np.save('../dataset/ml-100k/user2id.npy', user2id)
    np.save('../dataset/ml-100k/item2id.npy', item2id)
    print('successfully saved')


if __name__ == '__main__':
    converted_data, user2id, item2id = load_mat()
    save(converted_data, user2id, item2id)
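The saved files need a little care when they are read back. The sketch below writes small stand-ins for what save() produces into a temporary directory (the toy values and the temp location are assumptions, not the real ml-100k output), then loads them: the .npy dictionary needs allow_pickle=True plus .item(), and the header-less CSV needs its column names supplied again. It also builds the reverse mapping from new ID to original ID; if the LabelEncoder object is still available, its inverse_transform method does the same job.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the files save() writes, placed in a temp dir
tmp_dir = tempfile.mkdtemp()
user2id = {1: 0, 7: 1, 42: 2}
converted = pd.DataFrame({'user_id': [0, 1, 2], 'item_id': [3, 0, 1], 'rating': [5, 3, 4]})
converted.to_csv(os.path.join(tmp_dir, 'data_converted'), header=False, index=False)
np.save(os.path.join(tmp_dir, 'user2id.npy'), user2id)

# A dict saved with np.save comes back as a 0-d object array; .item() unwraps it
loaded_user2id = np.load(os.path.join(tmp_dir, 'user2id.npy'), allow_pickle=True).item()

# The CSV has no header row, so pass the column names when reading it
data = pd.read_csv(os.path.join(tmp_dir, 'data_converted'), header=None,
                   names=['user_id', 'item_id', 'rating'])

# Reverse mapping: new 0-based ID -> original ID, e.g. for reporting results
id2user = {new: raw for raw, new in loaded_user2id.items()}
print(loaded_user2id == user2id, id2user[data['user_id'][2]])  # True 42
```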