BERT whitening: vector dimension reduction and its application
2022-06-24 14:25:00 【loong_ XL】
References: https://kexue.fm/archives/8069
https://kexue.fm/archives/9079
https://zhuanlan.zhihu.com/p/531476789
Input: vv, a three-dimensional array made up of multiple vectors (vv[0] is the two-dimensional matrix used below)
Result: v_data1, the same vectors reduced to 256 dimensions
import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transformation: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    # Scale each principal direction by 1 / sqrt(singular value)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    # Keep only the first n_components columns to reduce the dimension
    return W[:, :n_components], -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transformation (if given), then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

# vv[0] is a 2-D matrix made up of multiple vectors; passing a single 1-D vector raises an error in the matrix computation
v_data = np.array(vv[0])
kernel, bias = compute_kernel_bias(v_data)
# print(kernel, bias)
v_data1 = transform_and_normalize(v_data, kernel=kernel, bias=bias)
Note: for a single vector at serving time (online), reuse the kernel and bias that were computed over the whole dataset and call transform_and_normalize(v_data, kernel=kernel, bias=bias) directly.
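The transformation that compute_kernel_bias and transform_and_normalize carry out can be written out as below. This is a sketch reconstructed from the code, not a formula taken from the post; the symbols (mu, Sigma, U, Lambda, k for n_components, N for the number of samples) are introduced here for illustration, and the decomposition assumes a positive-definite covariance matrix.

```latex
% x_i: embedding row vectors (i = 1..N); k = n_components
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\Sigma = \frac{1}{N-1}\sum_{i=1}^{N} (x_i-\mu)^{\top}(x_i-\mu) \\
\Sigma &= U \Lambda U^{\top} \quad \text{(SVD of the symmetric covariance matrix)} \\
W &= U \Lambda^{-1/2}, \qquad \text{kernel} = W_{[:,\,1:k]}, \qquad \text{bias} = -\mu \\
y &= (x + \text{bias})\,\text{kernel} = (x-\mu)\,W_{[:,\,1:k]} \\
\operatorname{Cov}(y) &= W_{[:,\,1:k]}^{\top}\,\Sigma\,W_{[:,\,1:k]} = I_k \\
\hat{y} &= \frac{y}{\lVert y \rVert_2}
\end{aligned}
```

Because np.linalg.svd returns singular values in descending order, keeping the first k columns keeps the k directions with the largest variance.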
import numpy as np

# Note: 5 samples give a covariance of rank at most 4, so this only demonstrates shapes;
# fit the kernel on far more samples in practice
data = np.random.rand(5, 768)
print('data.shape = ')
print(data.shape, data)

def compute_kernel_bias(vecs):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transformation: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transformation, then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

kernel, bias = compute_kernel_bias(data)
kernel = kernel[:, :64]  # keep the first 64 dimensions
print('kernel.shape = ')
print(kernel.shape)
print('bias.shape = ')
print(bias.shape)
data = transform_and_normalize(data, kernel, bias)
print('data.shape = ')
print(data.shape, data)
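One way to check that the kernel really whitens the data is to look at the covariance of the transformed vectors before normalization, which should be close to the identity matrix. The sketch below assumes synthetic data (2000 correlated 32-dimensional vectors standing in for sentence embeddings) and uses more samples than dimensions so that the covariance matrix has full rank. The variable names and sizes are illustrative and not from the original post.

```python
import numpy as np

def compute_kernel_bias(vecs, n_components=None):
    """Whitening kernel and bias, as in the post; n_components=None keeps all columns."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu

rng = np.random.default_rng(0)
# Synthetic stand-in for embeddings: mildly correlated, shifted away from zero
mixing = np.eye(32) + 0.05 * rng.normal(size=(32, 32))
vecs = rng.normal(size=(2000, 32)).dot(mixing) + 3.0

kernel, bias = compute_kernel_bias(vecs, n_components=8)
whitened = (vecs + bias).dot(kernel)  # before L2 normalization

# Covariance of the whitened vectors should be (numerically) the 8x8 identity
cov_after = np.cov(whitened.T)
max_dev = np.abs(cov_after - np.eye(8)).max()
print(whitened.shape, max_dev)
```

If max_dev is not tiny on real data, a likely cause is too few samples relative to the embedding size, which makes some singular values close to zero.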

Dimension reduction of a single vector online (at serving time), reusing the kernel and bias computed above:
data1 = np.random.rand(1,768)
data1_1 = transform_and_normalize(data1, kernel, bias)
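For online use, the kernel and bias are usually fitted once offline and then loaded by the serving process. A minimal sketch of that workflow follows, assuming the parameters are stored with np.savez; the file name whitening.npz, the temporary directory, and the random placeholder data are hypothetical and not from the original post.

```python
import os
import tempfile
import numpy as np

def compute_kernel_bias(vecs, n_components):
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

rng = np.random.default_rng(0)

# Offline: fit on the whole corpus (random placeholder data here) and save
corpus = rng.random((2000, 768))
kernel, bias = compute_kernel_bias(corpus, n_components=64)
path = os.path.join(tempfile.mkdtemp(), "whitening.npz")  # hypothetical location
np.savez(path, kernel=kernel, bias=bias)

# Online: load the saved parameters and reduce one incoming vector
with np.load(path) as params:
    kernel_loaded, bias_loaded = params["kernel"], params["bias"]
query = rng.random((1, 768))  # must be 2-D: shape (1, embedding_size)
reduced = transform_and_normalize(query, kernel_loaded, bias_loaded)

# Rows are unit length, so a dot product gives cosine similarity
corpus_reduced = transform_and_normalize(corpus, kernel_loaded, bias_loaded)
scores = corpus_reduced.dot(reduced.T)
print(reduced.shape, scores.shape)
```

The single vector is kept as a (1, 768) matrix so that the axis=1 normalization in transform_and_normalize works unchanged.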
