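Bert-whitening: vector dimensionality reduction and usage
2022-06-24 13:04:00 [loong_XL]
References: https://kexue.fm/archives/8069
https://kexue.fm/archives/9079
https://zhuanlan.zhihu.com/p/531476789
Input: vv is a 3-D array made up of multiple vectors.
Output: v_data1, reduced to 256 dimensions.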
import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)  # [embedding_size, embedding_size] covariance
    u, s, _ = np.linalg.svd(cov)
    # W = U * diag(1/sqrt(s)); only the first n_components columns are kept,
    # so scale just those (avoids dividing by near-zero trailing singular values)
    W = u[:, :n_components] / np.sqrt(s[:n_components])
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then L2-normalize each vector."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5
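# vv[0] is a 2-D matrix [num_samples, embedding_size] holding many vectors.
# A matrix with only one vector fails: its covariance is undefined (NaN).
# To keep 256 meaningful dimensions it should hold more than 256 vectors.
v_data = np.array(vv[0])
kernel, bias = compute_kernel_bias(v_data)
v_data1 = transform_and_normalize(v_data, kernel=kernel, bias=bias)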
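For reference, here is the math the two functions implement, written out with $\mu$ as the sample mean and $\Sigma$ as the covariance (`mu` and `cov` in the code), $x_i$ as row vectors, and $W_k$ as the first $k$ columns of $W$ ($k$ = `n_components`). Since $\Sigma$ is symmetric positive semi-definite, its SVD coincides with its eigendecomposition, so `u` is $U$ and `s` holds the eigenvalues in descending order; keeping the first $k$ columns keeps the directions of largest variance. Note that `bias` $= -\mu$, so `x + bias` is $x - \mu$.

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\Sigma = \frac{1}{N-1}\sum_{i=1}^{N} (x_i-\mu)^\top (x_i-\mu) = U \Lambda U^\top \\
W &= U \Lambda^{-1/2}, \qquad
W^\top \Sigma W = \Lambda^{-1/2} U^\top U \Lambda U^\top U \Lambda^{-1/2} = I \\
\tilde{x} &= (x - \mu)\, W_k, \qquad
y = \frac{\tilde{x}}{\lVert \tilde{x} \rVert_2}
\end{aligned}
```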
For a single vector online, reuse the kernel and bias computed above over the whole set and simply call transform_and_normalize(v_data, kernel=kernel, bias=bias).
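At serving time the online process needs the kernel and bias from the offline fit. One way to do that, sketched here as an assumption rather than the author's setup, is to save them with `np.savez` after fitting and load them once at startup. The random kernel/bias and the file name `whitening_params.npz` are hypothetical stand-ins.

```python
import os
import tempfile

import numpy as np

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply y = (x + bias).dot(kernel), then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

rng = np.random.default_rng(0)

# Offline: hypothetical stand-in kernel/bias (in practice they come from
# compute_kernel_bias on a large batch of embeddings)
kernel = rng.normal(size=(768, 256))
bias = rng.normal(size=(1, 768))
path = os.path.join(tempfile.mkdtemp(), "whitening_params.npz")  # hypothetical file name
np.savez(path, kernel=kernel, bias=bias)

# Online: load the saved parameters once, then transform each incoming vector
with np.load(path) as params:
    online_kernel = params["kernel"]
    online_bias = params["bias"]

query = rng.normal(size=768)   # a single embedding arriving as shape [768]
query = query.reshape(1, -1)   # keep the [num_samples, embedding_size] layout: [1, 768]
vec = transform_and_normalize(query, online_kernel, online_bias)
print(vec.shape)  # (1, 256)
```

Loading once and keeping the arrays in memory avoids re-reading the file per request; the transform itself is just one matrix product and a normalization.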
import numpy as np

# 100 random 768-dim vectors; whitening to 64 dims needs more than 64 samples
data = np.random.rand(100, 768)
print('data.shape = ')
print(data.shape, data)

def compute_kernel_bias(vecs):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, _ = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then L2-normalize each vector."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

kernel, bias = compute_kernel_bias(data)
kernel = kernel[:, :64]  # keep the first 64 components -> 64-dim output
print('kernel.shape = ')
print(kernel.shape)  # (768, 64)
print('bias.shape = ')
print(bias.shape)    # (1, 768)
data = transform_and_normalize(data, kernel, bias)
print('data.shape = ')
print(data.shape, data)
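A sketch for sanity-checking a 64-dim whitening on made-up random data (100 samples of 768 dims; variable names are illustrative). It checks two properties that follow from the math: the covariance of N samples has at most N - 1 non-negligible singular values, and the first k whitened dimensions have identity covariance before L2 normalization. The kernel here is computed the same way as the first 64 columns of `compute_kernel_bias`; the 1e-10 relative cutoff is an arbitrary threshold chosen for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 768))   # made-up data: 100 samples, 768 dims
k = 64

mu = data.mean(axis=0, keepdims=True)
u, s, _ = np.linalg.svd(np.cov(data.T))

# 1) Rank: 100 centered samples span at most 99 directions, so only 99
#    singular values are non-negligible; the rest are numerical noise.
n_meaningful = int(np.sum(s > s[0] * 1e-10))
print(n_meaningful)  # 99

# 2) Whitening: the first k columns of W = U diag(1/sqrt(s)) give outputs
#    whose covariance is the k x k identity (before L2 normalization).
kernel = u[:, :k] / np.sqrt(s[:k])
y = (data - mu).dot(kernel)
print(np.allclose(np.cov(y.T), np.eye(k), atol=1e-6))  # True
```

In general, keeping k components needs at least k + 1 fitting vectors, and in practice considerably more for a stable covariance estimate.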

Dimensionality reduction for a single vector online:
data1 = np.random.rand(1,768)
data1_1 = transform_and_normalize(data1, kernel, bias)
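Because both the shift and the projection act on each row independently, transforming one vector online gives the same result as transforming it inside a batch. The sketch below checks this on made-up data with a random stand-in kernel (hypothetical, not a fitted one), then uses the unit-length outputs for cosine similarity via a plain dot product, a common way to use the whitened vectors.

```python
import numpy as np

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply y = (x + bias).dot(kernel), then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

rng = np.random.default_rng(0)
corpus = rng.random((10, 768))            # made-up "stored" vectors
kernel = rng.normal(size=(768, 64))       # hypothetical stand-in for a fitted kernel
bias = -corpus.mean(axis=0, keepdims=True)

corpus_out = transform_and_normalize(corpus, kernel, bias)
single_out = transform_and_normalize(corpus[3:4], kernel, bias)  # slice keeps shape [1, 768]

# The transform is row-wise, so one vector alone matches its row in the batch
print(np.allclose(single_out[0], corpus_out[3]))  # True

# Rows are unit length, so a dot product is cosine similarity
scores = corpus_out.dot(single_out[0])
print(int(np.argmax(scores)))  # 3
```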
