[Machine Learning Q&A] Cosine similarity, cosine distance, Euclidean distance, and the meaning of distance in machine learning
2022-06-30 01:34:00 【Sickle leek】
Cosine similarity, cosine distance, Euclidean distance, and the meaning of distance in machine learning

In machine learning problems, features are usually represented as vectors, so cosine similarity is commonly used to measure the similarity between two feature vectors. Cosine similarity lies in the range [-1, 1], and two identical vectors have similarity 1. Cosine distance is defined as 1 minus the cosine similarity, so its range is [0, 2], and the cosine distance between two identical vectors is 0.

Question 1: Why use cosine similarity instead of Euclidean distance in some scenarios?
For two vectors $A$ and $B$, cosine similarity is defined as

$$cos(A, B) = \frac{A \cdot B}{\|A\|_2 \|B\|_2}$$

that is, the cosine of the angle between the two vectors. It captures only the angular relationship between the vectors and ignores their absolute magnitudes; its range is [-1, 1].
When two texts are similar in content but very different in length, their word-frequency (or word-vector) features are usually far apart in Euclidean distance; yet the angle between them can be small, so the cosine similarity is high.
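A minimal Python sketch of this effect, using hypothetical word-count vectors where the longer document simply repeats the shorter one's vocabulary ten times:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between vectors a and b
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical word-count vectors: same content, very different lengths
short_doc = np.array([1.0, 2.0, 1.0, 0.0])
long_doc = 10 * short_doc  # ten times as long, same word proportions

print(np.linalg.norm(short_doc - long_doc))    # Euclidean distance: large (~22.0)
print(cosine_similarity(short_doc, long_doc))  # cosine similarity: 1.0
```

The Euclidean distance grows with document length, while the cosine similarity stays at 1 because the direction is unchanged.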
Moreover, in domains such as text, image, and video, feature dimensions are often very high. Cosine similarity keeps its interpretation in high dimensions (1 for identical directions, 0 for orthogonal, -1 for opposite), whereas the value of the Euclidean distance depends on the dimension, has no fixed range, and its magnitude is hard to interpret.
In some scenarios, such as Word2Vec, vectors are normalized to unit length. In that case Euclidean distance and cosine distance are monotonically related:

$$\|A - B\|_2 = \sqrt{2\,(1 - cos(A, B))}$$

where $\|A-B\|_2$ is the Euclidean distance, $cos(A,B)$ is the cosine similarity, and $1 - cos(A,B)$ is the cosine distance. In this scenario, selecting the nearest neighbor by minimum distance (maximum similarity) gives the same result whether cosine similarity or Euclidean distance is used.
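This monotonic relationship can be checked numerically; a small sketch with randomly generated vectors normalized to unit length (the seed and dimension are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)
b = rng.normal(size=5)
a /= np.linalg.norm(a)  # normalize to unit length, as in Word2Vec-style embeddings
b /= np.linalg.norm(b)

cos_ab = np.dot(a, b)            # cosine similarity of unit vectors is just the dot product
euclid = np.linalg.norm(a - b)   # Euclidean distance

# For unit vectors: ||A - B|| = sqrt(2 * (1 - cos(A, B)))
print(np.isclose(euclid, np.sqrt(2 * (1 - cos_ab))))  # True
```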
In short, Euclidean distance reflects the absolute difference in magnitude, while cosine distance reflects the relative difference in direction.
Question 2: Is cosine distance a strictly defined distance?

Note: cosine distance is not a strictly defined distance!

Definition of a distance (metric): in a set, if every pair of elements is assigned a unique real number such that the three distance axioms (positive definiteness, symmetry, the triangle inequality) hold, then that real number is called the distance between the two elements.
(1) Positive definiteness

By the definition of cosine distance,

$$dist(A, B) = 1 - cos\,\theta = \frac{\|A\|_2 \|B\|_2 - A \cdot B}{\|A\|_2 \|B\|_2}$$

Since $\|A\|_2 \|B\|_2 - A \cdot B \ge 0$ (by the Cauchy-Schwarz inequality), we always have $dist(A, B) \ge 0$.
(2) Symmetry

By the definition of cosine distance,

$$dist(A, B) = \frac{\|A\|_2 \|B\|_2 - A \cdot B}{\|A\|_2 \|B\|_2} = \frac{\|B\|_2 \|A\|_2 - B \cdot A}{\|B\|_2 \|A\|_2} = dist(B, A)$$

so symmetry is satisfied.
(3) Triangle inequality

This property does not hold; a counterexample suffices. Take $A=(1,0)$, $B=(1,1)$, $C=(0,1)$. Then

$$dist(A,B) = 1 - \frac{\sqrt{2}}{2}, \quad dist(B,C) = 1 - \frac{\sqrt{2}}{2}, \quad dist(A,C) = 1$$

so that

$$dist(A,B) + dist(B,C) = 2 - \sqrt{2} \approx 0.586 < 1 = dist(A,C)$$
Therefore, cosine distance satisfies positive definiteness and symmetry, but not the triangle inequality.
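The counterexample above can be verified with a few lines of Python:

```python
import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([1.0, 0.0])
B = np.array([1.0, 1.0])
C = np.array([0.0, 1.0])

lhs = cosine_distance(A, B) + cosine_distance(B, C)  # 2 - sqrt(2), about 0.586
rhs = cosine_distance(A, C)                          # 1.0
print(lhs < rhs)  # True: the triangle inequality is violated
```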
In addition, we know that on the unit sphere, Euclidean distance and cosine distance satisfy

$$\|A - B\| = \sqrt{2\,(1 - cos(A, B))} = \sqrt{2\,dist(A, B)}$$

which gives the relationship

$$dist(A, B) = \frac{1}{2}\|A - B\|^2$$

Clearly, on the unit sphere both cosine distance and Euclidean distance range over [0, 2]. The Euclidean distance is a valid metric, but the cosine distance is a quadratic function of it, and this quadratic relationship naturally breaks the triangle inequality.
In machine learning, many quantities are casually called distances yet fail the three distance axioms. Besides cosine distance, the KL divergence (Kullback-Leibler divergence), also called relative entropy, is often used to measure the difference between two distributions, but it satisfies neither symmetry nor the triangle inequality.
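The asymmetry of the KL divergence is easy to demonstrate numerically; a minimal sketch with two arbitrary discrete distributions (chosen here purely for illustration, with all probabilities strictly positive so the logarithm is defined):

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); assumes p, q strictly positive
    return np.sum(p * np.log(p / q))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print(kl_divergence(p, q))  # D_KL(P || Q)
print(kl_divergence(q, p))  # D_KL(Q || P): a different value, so KL is not symmetric
```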
In the field of machine learning, A/B testing is the main means of verifying a model's final effect.
Question 1: After a thorough offline evaluation of a model, why still run an online A/B test?

(1) Offline evaluation cannot eliminate the influence of model overfitting; therefore, its conclusions cannot fully replace the results of online evaluation.

(2) Offline evaluation cannot fully reproduce the online engineering environment. In general, it does not account for conditions of the online environment such as latency, data loss, and missing label data.

(3) Some business metrics of the online system cannot be computed in an offline evaluation. For example, when a new recommendation algorithm is launched, offline evaluation typically focuses on the ROC curve, the P-R curve, and so on, while only online evaluation reveals the changes in user click-through rate, dwell time, PV (page view) traffic, and other metrics brought by the algorithm.
Question 2: How to run an online A/B test?

(1) The main method of A/B testing is user bucketing: divide users into an experimental group and a control group, apply the new model to users in the experimental group, and the old model to users in the control group.

(2) When bucketing, pay attention to the independence of the samples and unbiased sampling, and make sure the same user is always assigned to the same bucket.
Question 3: How to divide the experimental group and the control group?

Suppose an algorithm engineer at a company has developed a new video recommendation model A for American users, while model B is currently serving all users.

The correct approach: split all U.S. users into an experimental group and a control group according to the parity of the last digit of their user_id, apply models A and B respectively, and verify the effect of model A.
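A minimal sketch of such a bucketing rule (the mapping of odd digits to the experimental group is a hypothetical choice; either assignment works as long as it is fixed):

```python
def assign_bucket(user_id: int) -> str:
    # Deterministic split: the same user_id always lands in the same bucket.
    # Hypothetical rule: odd last digit -> experimental group (new model A),
    # even last digit -> control group (old model B).
    last_digit = user_id % 10
    return "A" if last_digit % 2 == 1 else "B"

print(assign_bucket(10237))  # A (last digit 7 is odd)
print(assign_bucket(10238))  # B (last digit 8 is even)
```

Because the assignment is a pure function of user_id, it automatically satisfies the requirement that each user stays in the same bucket across requests.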
References

[1] Baimian Machine Learning (《百面机器学习》), Chapter 2: Model Evaluation

[2] Entropy: Shannon entropy, relative entropy, cross entropy, conditional entropy