Preprocessing a dataset: using LabelEncoder to convert all ids to start from 0
2022-07-03 02:39:00 【strawberry47】
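In datasets for recommendation algorithms, the ids usually start from 1 or are arbitrary strings of digits, so every time I process one I need an extra user2id step, which is a real nuisance.
It is simpler to do the conversion once before using the dataset and save the user2id dictionary for later lookups.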
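To make the idea concrete, here is a minimal sketch of what the conversion does, using a handful of made-up user ids (the list `raw_user_ids` is purely illustrative and not taken from any dataset):

```python
from sklearn.preprocessing import LabelEncoder

# Made-up raw user ids, as they might appear in a ratings file
raw_user_ids = [196, 186, 22, 196, 244, 22]

encoder = LabelEncoder()
encoded = encoder.fit_transform(raw_user_ids)

# classes_ holds the sorted unique raw ids; the position of each raw id
# in classes_ is its new 0-based id
user2id = {int(raw): idx for idx, raw in enumerate(encoder.classes_)}

print(encoded.tolist())  # [2, 1, 0, 2, 3, 0]
print(user2id)           # {22: 0, 186: 1, 196: 2, 244: 3}
```

Because the classes are sorted, the new ids follow the numeric order of the raw ids rather than the order in which they first appear.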
Things to note:
- change sep to the delimiter used by your dataset (' ', '\t')
- change names to the column names of your dataset
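As a rough illustration of these two settings, the sketch below reads a hypothetical space-separated file with three columns. The sample text and column names are assumptions for the example only, and `io.StringIO` stands in for a real file path:

```python
import io
import pandas as pd

# Hypothetical space-separated ratings data with no header row
raw_text = "1 10 5\n1 20 3\n2 10 4\n"

df = pd.read_csv(
    io.StringIO(raw_text),  # stands in for a real file path
    header=None,            # the file has no header row
    sep=" ",                # delimiter of this particular dataset
    names=["user_id", "item_id", "rating"],  # column names of this dataset
)

print(df.shape)  # (3, 3)
```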
The code is as follows:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def load_mat():
    data_path = '../dataset/ml-100k/u.data'
    df_data = pd.read_csv(data_path, header=None, sep='\t',
                          names=['user_id', 'item_id', 'rating', 'time'])

    # Encode raw user ids as consecutive integers starting from 0
    lbe_user = LabelEncoder()
    lbe_user.fit(df_data['user_id'].unique())
    converted_user = lbe_user.transform(df_data['user_id'])

    # Same for item ids: make them discrete and 0-based
    lbe_item = LabelEncoder()
    lbe_item.fit(df_data['item_id'].unique())
    converted_item = lbe_item.transform(df_data['item_id'])

    converted_data = pd.DataFrame()
    converted_data['user_id'] = converted_user
    converted_data['item_id'] = converted_item
    converted_data['rating'] = df_data['rating']

    # Mapping from raw id to encoded id; the position of a raw id in
    # classes_ is exactly the value transform() assigns to it
    user2id = {user: idx for idx, user in enumerate(lbe_user.classes_)}
    item2id = {item: idx for idx, item in enumerate(lbe_item.classes_)}
    return converted_data, user2id, item2id


def save(converted_data, user2id, item2id):
    sorted_data = converted_data.sort_values(by=['user_id'])
    # Write without a header row and without the index column
    sorted_data.to_csv('../dataset/ml-100k/data_converted', header=False, index=False)
    # np.save pickles each dict into a 0-d object array
    np.save('../dataset/ml-100k/user2id.npy', user2id)
    np.save('../dataset/ml-100k/item2id.npy', item2id)
    print('successfully saved')


if __name__ == '__main__':
    converted_data, user2id, item2id = load_mat()
    save(converted_data, user2id, item2id)
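Since the dictionaries are written with np.save, reading them back needs a little care: NumPy stores a dict as a pickled 0-d object array, so loading requires `allow_pickle=True` followed by `.item()`. The sketch below shows one way to do this. It uses a made-up mapping and a temporary directory rather than the paths above, and the reverse dictionary `id2user` is a hypothetical helper that is not part of the original code:

```python
import os
import tempfile
import numpy as np

user2id = {1: 0, 2: 1, 5: 2}  # made-up mapping for illustration

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "user2id.npy")
np.save(path, user2id)  # the dict is pickled inside a 0-d object array

# allow_pickle=True is required for object arrays; .item() unwraps the dict
loaded = np.load(path, allow_pickle=True).item()

# Reverse lookup: encoded id back to the raw id
id2user = {idx: raw for raw, idx in loaded.items()}

print(loaded[5])   # 2
print(id2user[2])  # 5
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.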