[Machine Learning Q&A] Cosine similarity, cosine distance, Euclidean distance, and the meaning of distance in machine learning
2022-06-30 01:34:00 【Sickle leek】
Cosine similarity, cosine distance, Euclidean distance, and the meaning of distance in machine learning
In machine learning problems, features are usually represented as vectors, so cosine similarity is commonly used to measure the similarity between two feature vectors. Cosine similarity takes values in the range [-1, 1], and the similarity between two identical vectors is 1. Subtracting the cosine similarity from 1 gives the cosine distance, so the cosine distance takes values in [0, 2], and the cosine distance between two identical vectors is 0.

Question 1: Why is cosine similarity used instead of Euclidean distance in some scenarios?
For two vectors $A$ and $B$, the cosine similarity is defined as $cos(A, B)=\frac{A\cdot B}{||A||_2||B||_2}$, i.e., the cosine of the angle between the two vectors. It depends only on the angular relationship between the vectors, not on their absolute magnitudes, and its value lies in the range $[-1, 1]$.
When two texts differ greatly in length but are similar in content, and word frequencies or word vectors are used as features, their Euclidean distance in feature space is usually large; with cosine similarity, however, the angle between them may be small, so the measured similarity is high.
Besides, in domains such as text, images, and video, the feature dimension of the objects under study is often very high. In high dimensions, cosine similarity still retains the property of being 1 for identical directions, 0 for orthogonal directions, and -1 for opposite directions, whereas the value of the Euclidean distance depends on the dimensionality, its range is not fixed, and its meaning is harder to interpret.
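As a minimal sketch of this point (using NumPy and made-up word-count vectors: doc_b is a three-times-longer document with the same topic mix as doc_a), the Euclidean distance between the two feature vectors is large while their cosine similarity is 1:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical word-count features: doc_b repeats doc_a's content three times,
# so the counts are scaled but the "direction" (topic mix) is the same.
doc_a = np.array([2.0, 1.0, 0.0, 3.0])
doc_b = 3 * doc_a

print("Euclidean distance:", np.linalg.norm(doc_a - doc_b))        # large (about 7.48)
print("Cosine similarity: ", cosine_similarity(doc_a, doc_b))      # 1.0
print("Cosine distance:   ", 1 - cosine_similarity(doc_a, doc_b))  # 0.0
```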
In some scenarios, such as Word2Vec, the vectors are normalized to unit length, and Euclidean distance then has a monotonic relationship with cosine distance:

$||A-B||_2=\sqrt{2(1-cos(A,B))}$

where $||A-B||_2$ denotes the Euclidean distance, $cos(A,B)$ the cosine similarity, and $(1-cos(A,B))$ the cosine distance. In this scenario, selecting the nearest neighbor by minimum distance (maximum similarity) gives the same result whether cosine similarity or Euclidean distance is used.
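A quick numerical check of this identity, assuming nothing beyond NumPy and two randomly generated unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors normalized to unit length (as Word2Vec vectors are after normalization).
a = rng.normal(size=5)
a /= np.linalg.norm(a)
b = rng.normal(size=5)
b /= np.linalg.norm(b)

cos_ab = np.dot(a, b)            # for unit vectors the dot product is the cosine similarity
euclid = np.linalg.norm(a - b)   # Euclidean distance

print(euclid, np.sqrt(2 * (1 - cos_ab)))  # the two printed values agree
```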
Overall, Euclidean distance reflects the absolute difference in magnitude, while cosine distance reflects the relative difference in direction.
Question 2: Is cosine distance a strictly defined distance?
Note: cosine distance is not a strictly defined distance!
Definition of a distance: on a set, if every pair of elements uniquely determines a real number such that the three distance axioms (positive definiteness, symmetry, and the triangle inequality) hold, then that real number can be called the distance between the two elements.
(1) Positive definiteness
By the definition of cosine distance, we have

$dist(A, B)=1-cos\theta = \frac{||A||_2||B||_2-A\cdot B}{||A||_2||B||_2}$

Since $||A||_2||B||_2-A\cdot B \ge 0$ (by the Cauchy-Schwarz inequality), it follows that $dist(A, B)\ge 0$ always holds.
(2) Symmetry

By the definition of cosine distance, we have

$dist(A, B)=\frac{||A||_2||B||_2-A\cdot B}{||A||_2||B||_2}=\frac{||B||_2||A||_2-B\cdot A}{||B||_2||A||_2}=dist(B, A)$

so symmetry is satisfied.
(3) Triangle inequality
This property does not hold; here is a counterexample. Let A=(1,0), B=(1,1), C=(0,1). Then $dist(A,B)=1-\frac{\sqrt{2}}{2}$, $dist(B,C)=1-\frac{\sqrt{2}}{2}$, and $dist(A,C)=1$, so

$dist(A,B)+dist(B,C)=2-\sqrt{2}<1=dist(A,C)$

which violates the triangle inequality.
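The counterexample can also be checked numerically; a minimal sketch (the helper cosine_distance is defined here just for illustration):

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance: 1 minus cosine similarity."""
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([1.0, 0.0])
B = np.array([1.0, 1.0])
C = np.array([0.0, 1.0])

lhs = cosine_distance(A, B) + cosine_distance(B, C)  # 2 - sqrt(2), about 0.586
rhs = cosine_distance(A, C)                          # 1.0
print(lhs, rhs, lhs >= rhs)                          # prints False: the triangle inequality fails
```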
Therefore, cosine distance satisfies positive definiteness and symmetry but not the triangle inequality.
In addition, on the unit circle (i.e., for unit vectors), Euclidean distance and cosine distance satisfy

$||A-B||=\sqrt{2(1-cos(A,B))}=\sqrt{2\,dist(A,B)}$

that is, they are related by

$dist(A, B)=\frac{1}{2}||A-B||^2$

Clearly, on the unit circle both cosine distance and Euclidean distance take values in [0, 2]. Euclidean distance is a valid distance, but cosine distance is a quadratic function of Euclidean distance, and the square of a valid distance does not in general satisfy the triangle inequality.
In machine learning, cosine distance is not the only commonly used "distance" that fails to satisfy the three distance axioms. The KL divergence (Kullback-Leibler divergence), also known as relative entropy, is often used to measure the difference between two distributions, but it satisfies neither symmetry nor the triangle inequality.
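As a small illustration, here is a sketch with two made-up discrete distributions showing that KL divergence is not symmetric:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Two made-up discrete distributions over three outcomes.
P = [0.7, 0.2, 0.1]
Q = [0.3, 0.4, 0.3]

print(kl_divergence(P, Q))  # KL(P || Q)
print(kl_divergence(Q, P))  # KL(Q || P): a different value, so KL divergence is not symmetric
```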
In machine learning, A/B testing is the main means of verifying the final effect of a model.
Question 1: After a thorough offline evaluation of a model, why is an online A/B test still necessary?
(1) Offline evaluation cannot eliminate the influence of model overfitting, so the results of offline evaluation cannot fully replace those of online evaluation.
(2) Offline evaluation cannot fully reproduce the online engineering environment. In general, offline evaluation does not take into account delays, data loss, missing label data, and other issues present in the online environment.
(3) Some business metrics of the online system cannot be computed in an offline evaluation. For example, after a new recommendation algorithm is launched, offline evaluation usually focuses on the ROC curve, the P-R curve, and so on, whereas online evaluation gives a full picture of the changes in click-through rate, dwell time, PV traffic, and other metrics brought by the recommendation algorithm.
Question 2: How is an online A/B test conducted?
(1) The main method of conducting an A/B test is user bucketing: users are divided into an experimental group and a control group, the new model is applied to users in the experimental group, and the old model to users in the control group.
(2) When bucketing, attention must be paid to sample independence and unbiased sampling, ensuring that the same user is always assigned to the same bucket.
Question 3: How should the experimental group and the control group be divided?
Suppose an algorithm engineer at a company has developed a new video recommendation model A aimed at American users, while the recommendation model currently in use for all users is model B.
The correct approach: divide all American users into an experimental group and a control group according to whether the last digit of their user_id is odd or even, apply models A and B to the two groups respectively, and compare the results to validate the effect of model A.
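A minimal sketch of this bucketing rule (the user_ids are hypothetical). Because the assignment depends only on user_id, it is deterministic, so the same user always falls into the same bucket, as required in Question 2:

```python
# Bucketing by the parity of the last digit of user_id, as described above.

def assign_bucket(user_id: int) -> str:
    """Deterministic split: the same user always lands in the same bucket."""
    last_digit = user_id % 10
    return "experiment (model A)" if last_digit % 2 == 1 else "control (model B)"

us_user_ids = [10234, 10237, 10240, 10241, 10248]  # made-up ids of American users
for uid in us_user_ids:
    print(uid, "->", assign_bucket(uid))
```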
References
[1] 《百面机器学习》 (Baimian Machine Learning), Chapter 2: Model Evaluation
[2] Entropy: Shannon entropy, relative entropy, cross entropy, conditional entropy