Preprocess a dataset with LabelEncoder so that all IDs start from 0
2022-07-03 02:43:00 【strawberry47】
Datasets in the recommender-systems field often number their IDs starting from 1, or use long strings of digits as IDs, so every time you work with one you need an extra user2id mapping step. It's a real hassle.
The simplest fix is to re-encode the IDs once, before using the dataset, and save the user2id dictionary so it can be looked up later.
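As a quick illustration of what `LabelEncoder` does here (the raw IDs below are made up for this sketch, not taken from any dataset): it sorts the distinct raw IDs into `classes_` and encodes each ID as its position in that sorted array, so the new IDs are consecutive and start at 0.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw user IDs: 1-based and with gaps
raw_ids = [1, 5, 3, 5, 1001]

lbe = LabelEncoder()
new_ids = lbe.fit_transform(raw_ids)

print(lbe.classes_.tolist())  # sorted distinct raw IDs: [1, 3, 5, 1001]
print(new_ids.tolist())       # consecutive codes from 0: [0, 2, 1, 2, 3]

# Raw ID -> new ID lookup table (a "user2id" dictionary)
user2id = {raw: idx for idx, raw in enumerate(lbe.classes_.tolist())}
print(user2id)  # {1: 0, 3: 1, 5: 2, 1001: 3}

# inverse_transform maps new IDs back to the raw ones
print(lbe.inverse_transform([0, 3]).tolist())  # [1, 1001]
```

Because the code of each class equals its index in `classes_`, the dictionary can be built with `enumerate` instead of calling `transform` once per ID.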
Things to note:
- `sep`: change it to the separator used by the current dataset (e.g. `' '` or `'\t'`)
- `names`: change it to the column names of the current dataset
The code is as follows:
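For example, a dataset stored as space-separated `user item rating` lines without a header (a made-up layout for this sketch, read from an in-memory string instead of a file) would need `sep=' '` and a three-element `names` list:

```python
import io

import pandas as pd

# Hypothetical space-separated ratings data without a header row
raw_text = "u10 i7 4\nu3 i7 5\nu10 i2 3\n"

df = pd.read_csv(io.StringIO(raw_text), header=None, sep=' ',
                 names=['user_id', 'item_id', 'rating'])
print(df)
```

Separators longer than one character (such as the `::` used in MovieLens 1M's `ratings.dat`) are interpreted by pandas as regular expressions and require the Python parsing engine, so pass `engine='python'` explicitly in that case.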
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def load_mat():
    data_path = '../dataset/ml-100k/u.data'
    # u.data is tab-separated with no header row
    df_data = pd.read_csv(data_path, header=None, sep='\t',
                          names=['user_id', 'item_id', 'rating', 'time'])

    # Re-encode raw user IDs as consecutive integers starting from 0
    lbe_user = LabelEncoder()
    converted_user = lbe_user.fit_transform(df_data['user_id'])

    # Re-encode raw item IDs as consecutive integers starting from 0
    lbe_item = LabelEncoder()
    converted_item = lbe_item.fit_transform(df_data['item_id'])

    # Keep user, item and rating; the 'time' column is dropped
    converted_data = pd.DataFrame()
    converted_data['user_id'] = converted_user
    converted_data['item_id'] = converted_item
    converted_data['rating'] = df_data['rating']

    # Correspondence: raw ID -> new ID. classes_ is sorted and each class
    # is encoded as its index in classes_.
    user2id = {user: idx for idx, user in enumerate(lbe_user.classes_)}
    item2id = {item: idx for idx, item in enumerate(lbe_item.classes_)}
    return converted_data, user2id, item2id


def save(converted_data, user2id, item2id):
    # Sort by the new user ID and write without header or index
    sorted_data = converted_data.sort_values(by=['user_id'])
    sorted_data.to_csv('../dataset/ml-100k/data_converted', header=False, index=False)
    # np.save pickles the dicts into 0-d object arrays
    np.save('../dataset/ml-100k/user2id.npy', user2id)
    np.save('../dataset/ml-100k/item2id.npy', item2id)
    print('successfully saved')


if __name__ == '__main__':
    converted_data, user2id, item2id = load_mat()
    save(converted_data, user2id, item2id)
```
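Because `np.save` stores a dict as a pickled 0-d object array, reading `user2id.npy` back needs `allow_pickle=True` plus `.item()` to unwrap the dict. A minimal round-trip sketch, using a temporary directory and a small made-up dictionary instead of the `../dataset/ml-100k/` files above:

```python
import os
import tempfile

import numpy as np

# Hypothetical small mapping standing in for the real user2id
user2id = {1: 0, 3: 1, 5: 2}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'user2id.npy')
    np.save(path, user2id)  # the dict is pickled into a 0-d object array
    # allow_pickle defaults to False since NumPy 1.16.3
    loaded = np.load(path, allow_pickle=True).item()

print(loaded[5])  # 2
```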