BERT-whitening: vector dimensionality reduction and usage
2022-06-24 13:04:00 【loong_XL】
References: https://kexue.fm/archives/8069
https://kexue.fm/archives/9079
https://zhuanlan.zhihu.com/p/531476789
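Input: vv is a 3-D array made up of multiple vectors, so vv[0] is a 2-D array of shape [num_samples, embedding_size].
Result: v_data1 holds the same vectors reduced to 256 dimensions.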
import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size];
    the final transform is y = (x + bias).dot(kernel).
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    # Keep only the first n_components columns to reduce the dimension.
    return W[:, :n_components], -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform (if given), then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

# vv[0] is a 2-D array of many vectors. Passing a 2-D array that holds
# only one vector makes the computation fail.
v_data = np.array(vv[0])
kernel, bias = compute_kernel_bias(v_data)
# print(kernel, bias)
v_data1 = transform_and_normalize(v_data, kernel=kernel, bias=bias)
Note: for a single vector at serving time, reuse the kernel and bias computed from the whole set above and call transform_and_normalize(v_data, kernel=kernel, bias=bias) directly.
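As a sanity check on what the kernel and bias do, the sketch below applies the same transform to synthetic data and confirms that the result (before L2 normalization) has near-zero mean and a covariance close to the identity. The data, its shape, and the choice of 8 components are assumptions made for illustration; they are not from the original post.

```python
import numpy as np

def compute_kernel_bias(vecs, n_components=None):
    """Return (kernel, bias) such that y = (x + bias).dot(kernel)."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu

# Synthetic, correlated data (illustrative assumption): 2000 samples, 16 dims.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(16, 16))
vecs = rng.normal(size=(2000, 16)).dot(mixing) + 3.0

kernel, bias = compute_kernel_bias(vecs, n_components=8)
whitened = (vecs + bias).dot(kernel)   # shape (2000, 8)

# After whitening, each kept dimension should have mean ~0 and unit variance,
# and different dimensions should be uncorrelated.
mean_abs_max = np.abs(whitened.mean(axis=0)).max()
cov_error = np.abs(np.cov(whitened.T) - np.eye(8)).max()
print(mean_abs_max, cov_error)
```

Both printed values should be very small. The L2 normalization in transform_and_normalize is applied after this step, so that inner products between the output rows are cosine similarities.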
import numpy as np

data = np.random.rand(5, 768)
print('data.shape = ')
print(data.shape, data)

def compute_kernel_bias(vecs):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size];
    the final transform is y = (x + bias).dot(kernel).
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

kernel, bias = compute_kernel_bias(data)
kernel = kernel[:, :64]  # keep the first 64 components
print('kernel.shape = ')
print(kernel.shape)
print('bias.shape = ')
print(bias.shape)
data = transform_and_normalize(data, kernel, bias)
print('data.shape = ')
print(data.shape, data)
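One caveat about the toy example: with only 5 samples in 768 dimensions, the sample covariance has rank at most 4, so most singular values are numerically zero and 1 / np.sqrt(s) becomes extremely large for those directions. The sketch below shows this and adds a guard that caps the number of kept components. The relative threshold 1e-10 and the guard itself are illustrative assumptions, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((5, 768))          # same shape as the toy example

cov = np.cov(data.T)                 # (768, 768), rank at most num_samples - 1
u, s, vh = np.linalg.svd(cov)

# Count singular values that are meaningfully above zero.
# The relative threshold 1e-10 is an illustrative assumption.
effective_rank = int((s > s[0] * 1e-10).sum())
print(effective_rank)

# Hypothetical guard: never keep more components than the effective rank.
n_components = min(64, effective_rank)
W = np.dot(u[:, :n_components], np.diag(1 / np.sqrt(s[:n_components])))
print(W.shape)
```

In practice the kernel and bias would normally be fitted on far more sentences than the embedding dimension, in which case this guard rarely changes anything.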

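Reducing a single vector online:
# Reuse the kernel and bias fitted above; the input must stay 2-D, shape (1, 768).
data1 = np.random.rand(1, 768)
data1_1 = transform_and_normalize(data1, kernel, bias)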
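Since the online step only needs the fitted kernel and bias, one possible workflow is to save them once offline and load them in the serving process. The sketch below uses np.savez and np.load for this; the file name whitening_params.npz, the temporary directory, and the random stand-in parameters are all hypothetical and exist only to keep the example self-contained.

```python
import os
import tempfile
import numpy as np

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform (if given), then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

# Stand-ins for a fitted kernel and bias (random values, illustration only).
rng = np.random.default_rng(0)
kernel = rng.normal(size=(768, 64))
bias = -rng.random((1, 768))

# Offline: store the fitted parameters. The file name is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "whitening_params.npz")
np.savez(path, kernel=kernel, bias=bias)

# Online: load once at startup, then transform each incoming vector.
params = np.load(path)
query = rng.random(768)              # a single 1-D embedding
# Reshape to (1, 768): the normalization sums over axis=1 and needs 2-D input.
reduced = transform_and_normalize(query[None, :], params["kernel"], params["bias"])
print(reduced.shape)
```

The reshape matters because transform_and_normalize sums over axis=1, which raises an error on a 1-D array.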
