Vector similarity evaluation method

2022-07-29 02:17:00 【Algorithm with temperature】

Link to the original text: Vector similarity evaluation method

Similarity measures come up very frequently in day-to-day work. Today I'll introduce four commonly used ways of evaluating vector similarity in PyTorch:

CosineSimilarity
DotProductSimilarity
BiLinearSimilarity
MultiHeadedSimilarity

1 Cosine similarity

Cosine similarity is familiar to everyone. It uses the cosine of the angle between two vectors to measure how different two items are. The closer the cosine is to 1, the closer the angle is to 0 degrees, and the more similar the two vectors are.

import torch
import torch.nn as nn
import math

class CosineSimilarity(nn.Module):
 
    def forward(self, tensor_1, tensor_2):
        normalized_tensor_1 = tensor_1 / tensor_1.norm(dim=-1, keepdim=True)
        normalized_tensor_2 = tensor_2 / tensor_2.norm(dim=-1, keepdim=True)
        return (normalized_tensor_1 * normalized_tensor_2).sum(dim=-1)
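As a quick sanity check, the normalize-then-sum computation can be compared against PyTorch's built-in `torch.nn.functional.cosine_similarity`. This is only a sketch: the function name `cosine_similarity_manual` and the `eps` clamp are assumptions added here (not part of the original class) so that an all-zero vector gives 0 instead of NaN.

```python
import torch
import torch.nn.functional as F


def cosine_similarity_manual(tensor_1, tensor_2, eps=1e-8):
    # Hypothetical helper: clamp the norms so a zero vector yields 0, not NaN.
    norm_1 = tensor_1.norm(dim=-1, keepdim=True).clamp(min=eps)
    norm_2 = tensor_2.norm(dim=-1, keepdim=True).clamp(min=eps)
    return ((tensor_1 / norm_1) * (tensor_2 / norm_2)).sum(dim=-1)


# Three pairs: same direction, orthogonal, and one all-zero vector.
a = torch.tensor([[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
b = torch.tensor([[2.0, 0.0], [-1.0, 1.0], [3.0, 4.0]])

manual = cosine_similarity_manual(a, b)
builtin = F.cosine_similarity(a, b, dim=-1)
print(manual, builtin)
```

The first pair points in the same direction (similarity 1) and the second pair is orthogonal (similarity 0), regardless of the vectors' lengths.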

2 `DotProductSimilarity`

This similarity function computes the dot product of each pair of vectors, with optional scaling to reduce the variance of the output.

class DotProductSimilarity(nn.Module):
 
    def __init__(self, scale_output=False):
        super(DotProductSimilarity, self).__init__()
        self.scale_output = scale_output
 
    def forward(self, tensor_1, tensor_2):
        result = (tensor_1 * tensor_2).sum(dim=-1)
        if self.scale_output:
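            # Divide by sqrt(d), as in scaled dot-product attention, so the output
            # variance does not grow with the vector dimension d.
            # (Open question from the original author: why does AllenNLP multiply here?)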
            result /= math.sqrt(tensor_1.size(-1))
        return result
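To see why the scaling helps, here is a small experiment, assuming the vector components are independent standard normal values (an assumption made only for this illustration). Under that assumption the raw dot product of two d-dimensional vectors has variance of about d, and dividing by sqrt(d) brings it back to about 1.

```python
import math
import torch

torch.manual_seed(0)
dim = 64
x = torch.randn(10000, dim)
y = torch.randn(10000, dim)

# Unscaled dot product: the variance grows with the dimension.
raw = (x * y).sum(dim=-1)
# Scaled dot product: the variance stays close to 1.
scaled = raw / math.sqrt(dim)

raw_var = raw.var().item()
scaled_var = scaled.var().item()
print(raw_var, scaled_var)
```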

Cosine similarity and the dot product are the most commonly used mathematical methods. In more complex scenarios, we can bring neural-network ideas into the similarity computation.

3 `BiLinearSimilarity`

This similarity function applies a bilinear transformation to the two input vectors, which amounts to adding a neural-network linear layer. It has a weight matrix "W" and a bias "b", and the similarity between two vectors x and y is computed as:
$x^TWy+b$

An activation function can be applied to the result; by default no activation is used.

class BiLinearSimilarity(nn.Module):
 
    def __init__(self, tensor_1_dim, tensor_2_dim, activation=None):
        super(BiLinearSimilarity, self).__init__()
        self.weight_matrix = nn.Parameter(torch.Tensor(tensor_1_dim, tensor_2_dim))
        self.bias = nn.Parameter(torch.Tensor(1))
        self.activation = activation
        self.reset_parameters()
 
    def reset_parameters(self):
        nn.init.xavier_uniform_(self.weight_matrix)
        self.bias.data.fill_(0)
 
    def forward(self, tensor_1, tensor_2):
        intermediate = torch.matmul(tensor_1, self.weight_matrix)
        result = (intermediate * tensor_2).sum(dim=-1) + self.bias
        if self.activation is not None:
            result = self.activation(result)
        return result
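The forward pass evaluates $x^TWy+b$ in two steps: first `x W`, then a row-wise dot product with `y`. The sketch below checks that this matches a direct evaluation with `torch.einsum`. The shapes and the random `W` are made up for illustration only.

```python
import torch

torch.manual_seed(0)
batch, dim_x, dim_y = 4, 3, 5
x = torch.randn(batch, dim_x)
y = torch.randn(batch, dim_y)
W = torch.randn(dim_x, dim_y)
b = torch.zeros(1)

# Two-step computation, as in the forward pass: (x W), then dot with y.
two_step = (torch.matmul(x, W) * y).sum(dim=-1) + b

# Direct evaluation of x^T W y for every row of the batch.
direct = torch.einsum("bi,ij,bj->b", x, W, y) + b

print(two_step.shape, torch.allclose(two_step, direct, atol=1e-5))
```

Because `W` has shape `(dim_x, dim_y)`, the two inputs do not need to have the same dimension, unlike the cosine and dot-product similarities.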

Following the same idea, we can derive a trilinear transformation:
$W^T[x,y,x*y]+b$

The only change is that the original features x and y, together with their interaction x*y (the element-wise product), are concatenated and used as the input. Interested readers can implement it themselves.
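One possible sketch of this trilinear similarity follows. The class name `TriLinearSimilarity`, its constructor arguments, and the use of `nn.Linear` for the weights are all assumptions made for illustration, not code from the original article. It also assumes both inputs have the same last dimension, which the element-wise product requires.

```python
import torch
import torch.nn as nn


class TriLinearSimilarity(nn.Module):
    """Hypothetical sketch of W^T [x, y, x*y] + b."""

    def __init__(self, input_dim, activation=None):
        super().__init__()
        # One weight per feature of the concatenated vector, plus a scalar bias.
        self.linear = nn.Linear(3 * input_dim, 1)
        self.activation = activation

    def forward(self, tensor_1, tensor_2):
        # Concatenate x, y and their element-wise product along the last axis.
        combined = torch.cat([tensor_1, tensor_2, tensor_1 * tensor_2], dim=-1)
        result = self.linear(combined).squeeze(-1)
        if self.activation is not None:
            result = self.activation(result)
        return result


torch.manual_seed(0)
similarity = TriLinearSimilarity(input_dim=8)
x = torch.randn(2, 5, 8)
y = torch.randn(2, 5, 8)
scores = similarity(x, y)
print(scores.shape)
```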

4 `MultiHeadedSimilarity`

This similarity function borrows the multi-"head" idea from the Transformer to compute similarity. We project the input tensors into several new tensors (one per head) and compute the similarity for each pair of projected tensors.

class MultiHeadedSimilarity(nn.Module):
 
    def __init__(self,
                 num_heads,
                 tensor_1_dim,
                 tensor_1_projected_dim=None,
                 tensor_2_dim=None,
                 tensor_2_projected_dim=None,
                 internal_similarity=DotProductSimilarity()):
        super(MultiHeadedSimilarity, self).__init__()
        self.num_heads = num_heads
        self.internal_similarity = internal_similarity
        tensor_1_projected_dim = tensor_1_projected_dim or tensor_1_dim
        tensor_2_dim = tensor_2_dim or tensor_1_dim
        tensor_2_projected_dim = tensor_2_projected_dim or tensor_2_dim
        if tensor_1_projected_dim % num_heads != 0:
            raise ValueError("Projected dimension not divisible by number of heads: %d, %d"
                             % (tensor_1_projected_dim, num_heads))
        if tensor_2_projected_dim % num_heads != 0:
            raise ValueError("Projected dimension not divisible by number of heads: %d, %d"
                             % (tensor_2_projected_dim, num_heads))
        self.tensor_1_projection = nn.Parameter(torch.Tensor(tensor_1_dim, tensor_1_projected_dim))
        self.tensor_2_projection = nn.Parameter(torch.Tensor(tensor_2_dim, tensor_2_projected_dim))
        self.reset_parameters()
 
    def reset_parameters(self):
        torch.nn.init.xavier_uniform_(self.tensor_1_projection)
        torch.nn.init.xavier_uniform_(self.tensor_2_projection)
 
    def forward(self, tensor_1, tensor_2):
        projected_tensor_1 = torch.matmul(tensor_1, self.tensor_1_projection)
        projected_tensor_2 = torch.matmul(tensor_2, self.tensor_2_projection)
 
        last_dim_size = projected_tensor_1.size(-1) // self.num_heads
        new_shape = list(projected_tensor_1.size())[:-1] + [self.num_heads, last_dim_size]
        split_tensor_1 = projected_tensor_1.view(*new_shape)
        last_dim_size = projected_tensor_2.size(-1) // self.num_heads
        new_shape = list(projected_tensor_2.size())[:-1] + [self.num_heads, last_dim_size]
        split_tensor_2 = projected_tensor_2.view(*new_shape)
 
        return self.internal_similarity(split_tensor_1, split_tensor_2)
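The head-splitting step can be traced with plain tensors. In this sketch the projection matrices are random stand-ins for the learned parameters, and the sizes are arbitrary. With a dot product as the internal similarity, each input pair produces one score per head.

```python
import torch

torch.manual_seed(0)
batch, dim, num_heads = 2, 8, 4
x = torch.randn(batch, dim)
y = torch.randn(batch, dim)
# Random stand-ins for the learned projection matrices.
proj_x = torch.randn(dim, dim)
proj_y = torch.randn(dim, dim)

head_dim = dim // num_heads
# Project, then split the last dimension into (num_heads, head_dim).
split_x = torch.matmul(x, proj_x).view(batch, num_heads, head_dim)
split_y = torch.matmul(y, proj_y).view(batch, num_heads, head_dim)

# Dot product within each head gives one similarity score per head.
per_head = (split_x * split_y).sum(dim=-1)
print(per_head.shape)
```

The output gains a trailing `num_heads` dimension, so downstream code has to combine the per-head scores (for example by summing or concatenating them) if a single score is needed.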

Summary

The more complex approaches apply additional linear transformations, and combinations of them, on top of the raw vectors. In practice we can design our own similarity computation to fit the business scenario, since the advantage of neural networks is that we can assemble them however we like.

Original site

Copyright notice
This article was written by [Algorithm with temperature]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/210/202207290130075105.html