Processing a dataset: using LabelEncoder to renumber all IDs starting from 0
2022-07-03 02:39:00 [strawberry47]
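Datasets in the recommender-system field almost always number their IDs from 1, or use long strings of digits as IDs, so every time I process one I have to add an extra user2id mapping step, which is a real hassle.
So I decided to process the dataset once, before using it, and save the user2id dictionary for later lookups.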
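To make the idea concrete, here is a minimal sketch on made-up IDs (not from any real dataset). LabelEncoder sorts the unique raw IDs into `classes_`, and each raw ID is replaced by its position in that sorted array, so the new IDs are consecutive integers starting from 0. The `int(...)` conversion is only there so the toy dictionary prints with plain integer keys.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up raw user IDs: they start at 1, have gaps, and repeat
raw_user_ids = np.array([1, 5, 5, 42, 1, 7])

lbe = LabelEncoder()
encoded = lbe.fit_transform(raw_user_ids)

# classes_ holds the sorted unique raw IDs; raw ID classes_[i] becomes new ID i
user2id = {int(raw): new for new, raw in enumerate(lbe.classes_)}

print(encoded)  # [0 1 1 3 0 2]
print(user2id)  # {1: 0, 5: 1, 7: 2, 42: 3}
```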
Note:
- Change sep to the delimiter your dataset actually uses (e.g. ' ' or '\t').
- Change names to your dataset's column names.
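As a quick illustration of those two parameters, the sketch below reads a small space-separated ratings file with no header row. The file contents are invented and are wrapped in `io.StringIO` only so the example runs without a real file on disk.

```python
import io
import pandas as pd

# Invented space-separated ratings data with no header row
raw_text = "10 200 4 881250949\n10 305 3 891717742\n27 200 5 878887116\n"

# sep must match the file's delimiter; names supplies the column names
df = pd.read_csv(io.StringIO(raw_text), header=None, sep=' ',
                 names=['user_id', 'item_id', 'rating', 'time'])

print(df.shape)  # (3, 4)
```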
The code is as follows:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def load_mat():
    data_path = '../dataset/ml-100k/u.data'
    # u.data is tab-separated with no header row
    df_data = pd.read_csv(data_path, header=None, sep='\t',
                          names=['user_id', 'item_id', 'rating', 'time'])

    # Map raw user IDs to consecutive integers 0..n_users-1
    lbe_user = LabelEncoder()
    converted_user = lbe_user.fit_transform(df_data['user_id'])

    # Map raw item IDs to consecutive integers 0..n_items-1
    lbe_item = LabelEncoder()
    converted_item = lbe_item.fit_transform(df_data['item_id'])

    # Keep user_id, item_id and rating; the time column is not carried over
    converted_data = pd.DataFrame()
    converted_data['user_id'] = converted_user
    converted_data['item_id'] = converted_item
    converted_data['rating'] = df_data['rating']

    # Raw ID -> new ID; classes_ is sorted, so raw ID classes_[i] maps to i
    user2id = dict(zip(lbe_user.classes_, range(len(lbe_user.classes_))))
    item2id = dict(zip(lbe_item.classes_, range(len(lbe_item.classes_))))
    return converted_data, user2id, item2id


def save(converted_data, user2id, item2id):
    # Sort by the new user ID and write comma-separated, without header or index
    sorted_data = converted_data.sort_values(by=['user_id'])
    sorted_data.to_csv('../dataset/ml-100k/data_converted', header=False, index=False)
    # np.save stores each dict as a pickled 0-d object array
    np.save('../dataset/ml-100k/user2id.npy', user2id)
    np.save('../dataset/ml-100k/item2id.npy', item2id)
    print('successfully saved')


if __name__ == '__main__':
    converted_data, user2id, item2id = load_mat()
    save(converted_data, user2id, item2id)
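Because np.save writes a dict as a pickled 0-d object array, reading it back needs `allow_pickle=True` plus `.item()` to unwrap the dict. The sketch below shows that round trip with a small stand-in for user2id (made-up values) in a temporary directory, and builds the reverse mapping from new ID back to raw ID.

```python
import os
import tempfile

import numpy as np

# Small stand-in for the user2id dict returned by load_mat() (made-up values)
user2id = {1: 0, 5: 1, 7: 2, 42: 3}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'user2id.npy')
    np.save(path, user2id)  # stored as a 0-d object array (pickled)
    # allow_pickle=True is required to load object arrays;
    # .item() unwraps the 0-d array back into the original dict
    loaded = np.load(path, allow_pickle=True).item()

# Reverse lookup: new ID -> raw ID
id2user = {new: raw for raw, new in loaded.items()}

print(loaded[42], id2user[3])  # 3 42
```

If you keep the fitted LabelEncoder object around (the load_mat() above does not return it), its inverse_transform method performs the same new-ID-to-raw-ID lookup.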