当前位置:网站首页>Example of embedding code for continuous features
Example of embedding code for continuous features
2022-08-03 07:04:00 【WGS.】
Why do continuous features tooemb?
● On the one hand continuous featuresembIt can then be more fully intersected with other features;
● On the other hand, it can make learning more fully,Avoid drastic changes in forecast results due to small changes in values.

实现思路:
- Normalize continuous values;
- Then add a new column to do itlabel encoder;
- codedtensor做emb;
- Take out consecutive valuestensor,然后相乘;
ddd = pd.DataFrame({
'x1': [0.001, 0.002, 0.003], 'x2': [0.1, 0.1, 0.2]})
'''Spell back the encoded value'''
dense_cols = ['x1', 'x2']
dense_cols_enc = [c + '_enc' for c in dense_cols]
for i in range(len(dense_cols)):
enc = LabelEncoder()
ddd[dense_cols_enc[i]] = enc.fit_transform(ddd[dense_cols[i]].values).copy()
print(ddd)
'''计算fields'''
dense_fields = ddd[dense_cols_enc].max().values + 1
dense_fields = dense_fields.astype(np.int32)
offsets = np.array((0, *np.cumsum(dense_fields)[:-1]), dtype=np.longlong)
print(dense_fields, offsets)
'''Do it with the encoded oneemb'''
tensor = torch.tensor(ddd.values)
emb_tensor = tensor[:, -2:] + tensor.new_tensor(offsets).unsqueeze(0)
emb_tensor = emb_tensor.long()
embedding = nn.Embedding(sum(dense_fields) + 1, embedding_dim=4)
torch.nn.init.xavier_uniform_(embedding.weight.data)
dense_emb = embedding(emb_tensor)
print('---', dense_emb.shape)
print(dense_emb.data)
# print(embedding.weight.shape)
# print(embedding.weight.data)
# print(embedding.weight.data[1])
'''Take the original numerical features and increase the dimension for multiplication'''
dense_tensor = torch.unsqueeze(tensor[:, :2], dim=-1)
print('---', dense_tensor.shape)
print(dense_tensor)
dense_emb = dense_emb * dense_tensor
print(dense_emb)
x1 x2 x1_enc x2_enc
0 0.001 0.1 0 0
1 0.002 0.1 1 0
2 0.003 0.2 2 1
[3 2] [0 3]
--- torch.Size([3, 2, 4])
tensor([[[-0.1498, -0.5054, 0.0211, -0.2746],
[ 0.0133, 0.3257, -0.2117, -0.0956]],
[[-0.1296, -0.4524, 0.5334, 0.0894],
[ 0.0133, 0.3257, -0.2117, -0.0956]],
[[ 0.5597, 0.3630, -0.7686, -0.1408],
[ 0.6840, -0.5328, 0.0422, -0.6365]]])
--- torch.Size([3, 2, 1])
tensor([[[0.0010],
[0.1000]],
[[0.0020],
[0.1000]],
[[0.0030],
[0.2000]]], dtype=torch.float64)
tensor([[[-1.4985e-04, -5.0542e-04, 2.1051e-05, -2.7457e-04],
[ 1.3284e-03, 3.2572e-02, -2.1174e-02, -9.5578e-03]],
[[-2.5924e-04, -9.0472e-04, 1.0668e-03, 1.7884e-04],
[ 1.3284e-03, 3.2572e-02, -2.1174e-02, -9.5578e-03]],
[[ 1.6790e-03, 1.0891e-03, -2.3059e-03, -4.2229e-04],
[ 1.3679e-01, -1.0656e-01, 8.4448e-03, -1.2731e-01]]],
dtype=torch.float64, grad_fn=<MulBackward0>)
reference:
https://www.zhihu.com/question/352399723/answer/869939360
有关offsets可以看:
https://blog.csdn.net/qq_42363032/article/details/125928623?spm=1001.2014.3001.5501
边栏推荐
- Scala 基础 (三):运算符和流程控制
- 【EA Price strategy OC1】以实时价格为依据的EA,首月翻仓!】
- ES6中 async 函数、await表达式 的基本用法
- SQLSERVER将子查询数据合并拼接成一个字段
- Content type ‘applicationx-www-form-urlencoded;charset=UTF-8‘ not supported“【已解决】
- PCB制造常用的13种测试方法,你了解几种?
- pyspark --- 空串替换为None
- el-tree设置利用setCheckedNodessetCheckedKeys默认勾选节点,以及通过setChecked新增勾选指定节点
- empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType),
- el-table获取读取数据表中某一行的数据属性
猜你喜欢

MySQL 日期时间类型精确到毫秒

MySQL性能优化(硬件,系统配置,表结构,SQL语句)

【IoU loss】IoU损失函数理解

Nacos下载与安装

MySQL 操作语句大全(详细)

ClickHouse删除数据之delete问题详解

Getting started with el-tabs (tab bar)

信息学奥赛一本通T1450:Knight Moves

【dllogger bug】AttributeError: module ‘dllogger‘ has no attribute ‘StdOutBackend‘

torch.nn.modules.activation.ReLU is not a Module subclass
随机推荐
IFM网络详解及torch复现
MySQL 数据库基础知识(系统化一篇入门)
process.env环境变量配置方式(配置环境变量区分开发环境和生产环境)
El - tree set using setCheckedNodessetCheckedKeys default check nodes, and a new check through setChecked specified node
关于NOI 2022的报到通知
empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType),
pyspark @udf 循环使用变量问题
el-tree设置利用setCheckedNodessetCheckedKeys默认勾选节点,以及通过setChecked新增勾选指定节点
docker-compose部署mysql
el-table gets the data attribute of a row in the read data table
MySQL的安装(详细教程)
El - table column filter functions, control columns show and hide (effect and easy to implement full marks)
【云原生 · Kubernetes】搭建Harbor仓库
链表之打基础--基本操作(必会)
连续型特征做embedding代码示例
Flink对比Spark
Content type ‘applicationx-www-form-urlencoded;charset=UTF-8‘ not supported“【已解决】
信息学奥赛一本通T1446:素数方阵
502 bad gateway原因、解决方法
SQLSERVER将子查询数据合并拼接成一个字段