Process the dataset and use LabelEncoder to convert all IDs to start from 0
2022-07-03 02:43:00 【strawberry47】
Datasets in the recommender-systems field usually have IDs that start from 1, or IDs that are long strings of digits. Every time you use one, you need an extra user2id mapping step, which is a real hassle.
A simpler approach is to process the dataset once before using it and save the user2id dictionary, so that IDs are easy to look up later.
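As a quick illustration of what the encoding does, here is a minimal sketch on a hypothetical toy table (the raw IDs are made up). `LabelEncoder` sorts the distinct raw IDs and assigns them 0, 1, 2, and so on, so both IDs that start from 1 and IDs that are digit strings end up as consecutive integers starting from 0.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy interactions: user IDs start from 1, item IDs are digit strings
df = pd.DataFrame({
    'user_id': [1, 3, 1, 7],
    'item_id': ['1002', '0458', '1002', '9001'],
})

lbe_user = LabelEncoder()
df['user_idx'] = lbe_user.fit_transform(df['user_id'])

lbe_item = LabelEncoder()
df['item_idx'] = lbe_item.fit_transform(df['item_id'])

# classes_ holds the sorted distinct raw IDs; the raw ID at position i is encoded as i
user2id = {int(raw): idx for idx, raw in enumerate(lbe_user.classes_)}
item2id = {str(raw): idx for idx, raw in enumerate(lbe_item.classes_)}

print(df['user_idx'].tolist())  # [0, 1, 0, 2]
print(user2id)                  # {1: 0, 3: 1, 7: 2}
print(item2id)                  # {'0458': 0, '1002': 1, '9001': 2}
```

Note that string IDs are sorted as strings, so `'0458'` comes before `'1002'`.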
Notes:
- Change `sep` to the separator used by the current dataset (e.g. `' '` or `'\t'`).
- Change `names` to the column names of the current dataset.
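For instance, suppose a dataset were space-separated and stored only user, item and rating columns. This is a hypothetical layout, read here from an in-memory string instead of a file. In that case only `sep` and `names` in the `read_csv` call would need to change:

```python
import io

import pandas as pd

# Hypothetical space-separated data with three columns and no header row
raw_text = "10 200 4\n10 305 5\n42 200 3\n"

df_data = pd.read_csv(io.StringIO(raw_text), header=None, sep=' ',
                      names=['user_id', 'item_id', 'rating'])

print(df_data.shape)             # (3, 3)
print(df_data.columns.tolist())  # ['user_id', 'item_id', 'rating']
```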
The code is as follows:
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def load_mat():
    data_path = '../dataset/ml-100k/u.data'
    # Adjust sep and names to match the current dataset
    df_data = pd.read_csv(data_path, header=None, sep='\t',
                          names=['user_id', 'item_id', 'rating', 'time'])

    # Encode raw user IDs as consecutive integers starting from 0
    lbe_user = LabelEncoder()
    lbe_user.fit(df_data['user_id'].unique())
    converted_user = lbe_user.transform(df_data['user_id'])

    # Same for item IDs (make them discrete and 0-based)
    lbe_item = LabelEncoder()
    lbe_item.fit(df_data['item_id'].unique())
    converted_item = lbe_item.transform(df_data['item_id'])

    converted_data = pd.DataFrame()
    converted_data['user_id'] = converted_user
    converted_data['item_id'] = converted_item
    converted_data['rating'] = df_data['rating']

    # Correspondence: raw ID -> encoded ID, built in one vectorized transform
    # (same result as transforming each class one at a time)
    user2id = dict(zip(lbe_user.classes_, lbe_user.transform(lbe_user.classes_)))
    item2id = dict(zip(lbe_item.classes_, lbe_item.transform(lbe_item.classes_)))
    return converted_data, user2id, item2id


def save(converted_data, user2id, item2id):
    sort = converted_data.sort_values(by=['user_id'])
    sort.to_csv('../dataset/ml-100k/data_converted', header=False, index=False)
    # np.save pickles each dict into a 0-d object array
    np.save('../dataset/ml-100k/user2id.npy', user2id)
    np.save('../dataset/ml-100k/item2id.npy', item2id)
    print('successfully saved')


if __name__ == '__main__':
    converted_data, user2id, item2id = load_mat()
    save(converted_data, user2id, item2id)
```
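Because `np.save` stores a Python dict as a pickled 0-d object array, reading the mapping back for later lookups needs `allow_pickle=True` and `.item()`. Below is a minimal sketch of that follow-up step. It also builds a reverse `id2user` dictionary for turning encoded IDs back into raw ones. It uses a made-up mapping and a temporary directory rather than the ml-100k paths above.

```python
import os
import tempfile

import numpy as np

# Hypothetical mapping standing in for the saved user2id
user2id = {1: 0, 3: 1, 7: 2}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'user2id.npy')
    np.save(path, user2id)  # the dict is pickled into a 0-d object array

    # allow_pickle=True is required for object arrays; .item() unwraps the dict
    loaded = np.load(path, allow_pickle=True).item()

# Reverse lookup: encoded ID -> raw ID
id2user = {idx: raw for raw, idx in loaded.items()}

print(loaded[7])   # 2
print(id2user[0])  # 1
```

Without `allow_pickle=True`, `np.load` refuses to load object arrays and raises a `ValueError`.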