Johnson–Lindenstrauss Lemma
2022-07-04 02:14:00 【FakeOccupational】
Johnson–Lindenstrauss lemma

Lemma: Given $0 < \varepsilon < 1$, the squared length of a random vector concentrates around 1, and the probability of deviating from 1 decays exponentially in $n$.
Let $x \in \mathbb{R}^n$ be a random vector whose coordinates are sampled i.i.d. from $N(0, \frac{1}{n})$. Then
$$P\left(\left|\|x\|^2 - 1\right| \geq \varepsilon\right) \leq 2\exp\left(-\frac{\varepsilon^2 n}{8}\right)$$
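A quick Monte Carlo check makes this bound concrete. The sketch below is plain Python (standard library only); the function names `deviation_rate` and `norm_bound` are hypothetical, and the trial count and seed are arbitrary choices. It estimates $P(|\|x\|^2 - 1| \geq \varepsilon)$ empirically and prints it next to $2\exp(-\varepsilon^2 n/8)$.

```python
import math
import random


def deviation_rate(n, eps, trials=1000, seed=0):
    """Empirical P(| ||x||^2 - 1 | >= eps) for x in R^n with i.i.d. N(0, 1/n) coordinates."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(n)  # N(0, 1/n) has standard deviation 1/sqrt(n)
    hits = 0
    for _ in range(trials):
        sq_norm = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))
        if abs(sq_norm - 1.0) >= eps:
            hits += 1
    return hits / trials


def norm_bound(n, eps):
    """Right-hand side of the lemma: 2 * exp(-eps^2 * n / 8)."""
    return 2.0 * math.exp(-eps ** 2 * n / 8.0)


if __name__ == "__main__":
    eps = 0.3
    for n in (50, 200, 800):
        # empirical deviation rate next to the theoretical bound
        print(n, deviation_rate(n, eps), round(norm_bound(n, eps), 4))
```

The bound is conservative: for small $n$ it exceeds 1 and says nothing, while the empirical rate is typically far below it once $n$ grows.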
Lemma: Two random vectors $x_1, x_2$ sampled independently in the same way are approximately orthogonal:
$$P\left(|\langle x_1, x_2\rangle| \geq \varepsilon\right) \leq 4\exp\left(-\frac{\varepsilon^2 n}{8}\right)$$
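The factor 4 comes from reducing this lemma to the previous one. A short derivation, assuming $x_1, x_2$ are independent with i.i.d. $N(0, \frac{1}{n})$ coordinates (the auxiliary vectors $u, w$ are introduced here only for the argument):

```latex
\begin{aligned}
&u = \frac{x_1 + x_2}{\sqrt{2}},\quad w = \frac{x_1 - x_2}{\sqrt{2}}
  \quad (\text{each coordinate is again } N(0, \tfrac{1}{n})),\\
&\langle x_1, x_2\rangle = \frac{\|x_1 + x_2\|^2 - \|x_1 - x_2\|^2}{4} = \frac{\|u\|^2 - \|w\|^2}{2},\\
&|\langle x_1, x_2\rangle| \geq \varepsilon \;\Longrightarrow\;
  \big|\|u\|^2 - 1\big| \geq \varepsilon \ \text{or}\ \big|\|w\|^2 - 1\big| \geq \varepsilon,\\
&P(|\langle x_1, x_2\rangle| \geq \varepsilon)
  \leq 2\exp\!\left(-\frac{\varepsilon^2 n}{8}\right) + 2\exp\!\left(-\frac{\varepsilon^2 n}{8}\right)
  = 4\exp\!\left(-\frac{\varepsilon^2 n}{8}\right).
\end{aligned}
```

The implication in the third line holds because if both norms were within $\varepsilon$ of 1, their difference would be below $2\varepsilon$; the last line is a union bound over the two events, each controlled by the norm lemma.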
Johnson–Lindenstrauss Lemma
Given $0 < \varepsilon < 1$ and $N$ points $v_i \in \mathbb{R}^m$ $(i = 1, \dots, N)$, sample a random matrix $A \in \mathbb{R}^{n \times m}$ as above (entries i.i.d. $N(0, \frac{1}{n})$) with $n > \frac{24 \log N}{\varepsilon^2}$. Then, with probability at least $1 - \frac{1}{N}$, for all $i, j$:
$$(1-\varepsilon)\|v_i - v_j\|^2 \leq \|Av_i - Av_j\|^2 \leq (1+\varepsilon)\|v_i - v_j\|^2$$
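Where the constant 24 comes from: apply the norm lemma to each unit vector $(v_i - v_j)/\|v_i - v_j\|$ (its image under $A$ again has i.i.d. $N(0, \frac{1}{n})$ coordinates), then take a union bound over the $N(N-1)/2$ pairs; $n > 24 \ln N / \varepsilon^2$ pushes the total failure probability below $1/N$. The plain-Python sketch below (hypothetical helper names, arbitrary seed and sizes, natural log assumed for $\log$) builds such an $A$ and measures the worst pairwise distortion.

```python
import math
import random
from itertools import combinations


def jl_dim(num_points, eps):
    """Smallest integer n with n > 24 ln(N) / eps^2 (natural log assumed)."""
    return math.floor(24.0 * math.log(num_points) / eps ** 2) + 1


def gaussian_matrix(n, m, rng):
    """n x m matrix whose entries are i.i.d. N(0, 1/n), as in the lemma."""
    sigma = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(n)]


def matvec(A, v):
    """Matrix-vector product A v for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def max_distortion(points, A):
    """Largest | ||Av_i - Av_j||^2 / ||v_i - v_j||^2 - 1 | over all pairs i < j."""
    projected = [matvec(A, p) for p in points]
    worst = 0.0
    for i, j in combinations(range(len(points)), 2):
        ratio = sq_dist(projected[i], projected[j]) / sq_dist(points[i], points[j])
        worst = max(worst, abs(ratio - 1.0))
    return worst


if __name__ == "__main__":
    rng = random.Random(42)
    N, m, eps = 20, 500, 0.5
    points = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(N)]
    n = jl_dim(N, eps)  # 288 for N = 20, eps = 0.5
    A = gaussian_matrix(n, m, rng)
    # the lemma says this is below eps with probability at least 1 - 1/N
    print(n, max_distortion(points, A))
```

The target dimension depends only on $N$ and $\varepsilon$, not on the original dimension $m$; in this toy run $m = 500$ is only somewhat larger than $n = 288$, so the saving becomes meaningful only when $m$ is much larger.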
Applications
Cosine similarity
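To compute the similarity of two sentences, first use the TF-IDF algorithm to turn each sentence into a weighted term-frequency vector, then compute the cosine of the angle between the two vectors: the smaller the angle (the closer the cosine is to 1), the more similar the sentences.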
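A minimal sketch of this pipeline in plain Python, assuming the sentences are already split into words (Chinese text would first need a word segmenter). The weighting, raw count times the smoothed IDF $\log\frac{1+N}{1+df} + 1$, is one common variant chosen here as an assumption; `tfidf_vectors` and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists -> one sparse {term: weight} dict per doc.

    Assumed weighting: raw term count * smoothed IDF, log((1 + N) / (1 + df)) + 1.
    With the unsmoothed log(N / df) and only N = 2 sentences, every shared word
    would get weight 0.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (1 = same direction)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat".split(),
        "the cat lay on the rug".split(),
        "stock prices fell sharply today".split(),
    ]
    v = tfidf_vectors(sentences)
    for j in (1, 2):
        c = cosine(v[0], v[j])
        # cosine value and the corresponding angle in degrees
        print(j, round(c, 3), round(math.degrees(math.acos(min(1.0, c))), 1))
```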
Hash functions
A hash function (MD5, etc.) turns an article into a fixed-length string, e.g. 32 hexadecimal characters for MD5. In a front-end encryption exercise, I have already applied this to the string “123456”.
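For reference, Python's standard `hashlib` module computes MD5 directly. The sketch (the helper name `md5_hex` is hypothetical) also shows why a plain hash cannot measure similarity: changing one character yields an unrelated-looking digest. MD5 is a one-way hash rather than encryption, so the digest cannot be decrypted back to the input.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a UTF-8 string, as 32 hexadecimal characters (128 bits)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(md5_hex("123456"))  # e10adc3949ba59abbe56e057f20f883e
    print(md5_hex("123457"))  # one character changed, completely different digest
```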
simhash
A traditional hash function cannot measure how similar two articles are. simhash is the technique Google used to deduplicate web pages at large scale. Its final result is a fingerprint of 0/1 bits, and two fingerprints are compared with XOR, counting the bits that differ.
Johnson–Lindenstrauss lemma + discretization: after the same random projection, $N$ points in Euclidean space approximately keep their original relative positions (pairwise distances). The projection result is then discretized: for each random projection direction, the bit is 1 if the angle between the vector and that direction is less than 90° and 0¹ if it is greater than 90° (similar → 1, otherwise 0), which makes the result cheap to compute and store.
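What this paragraph describes closely matches random-hyperplane hashing (Charikar's sign random projection). A plain-Python sketch, assuming Gaussian projection directions shared by all vectors and the 0/1 bit encoding from the text; the function names and bit count are hypothetical. For Gaussian directions, two vectors at angle $\theta$ disagree on a bit with probability $\theta/\pi$, so the fraction of differing bits estimates the angle.

```python
import math
import random


def random_directions(num_bits, dim, rng):
    """num_bits random Gaussian projection directions, shared by all vectors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def sign_bits(vec, directions):
    """Bit k = 1 if the angle between vec and direction k is below 90 degrees
    (positive dot product), else 0."""
    return [1 if sum(d * x for d, x in zip(direction, vec)) > 0 else 0
            for direction in directions]


def angle_from_bits(bits_a, bits_b):
    """A bit differs with probability theta / pi, so theta ~ pi * (differing fraction)."""
    differing = sum(a != b for a, b in zip(bits_a, bits_b))
    return math.pi * differing / len(bits_a)


def true_angle(u, v):
    """Exact angle between u and v, for comparison."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


if __name__ == "__main__":
    rng = random.Random(0)
    dirs = random_directions(1024, 3, rng)
    u = [1.0, 0.0, 0.0]
    v = [math.cos(math.pi / 3), math.sin(math.pi / 3), 0.0]  # 60 degrees from u
    est = angle_from_bits(sign_bits(u, dirs), sign_bits(v, dirs))
    print(round(math.degrees(true_angle(u, v)), 1), round(math.degrees(est), 1))
```

More bits give a tighter angle estimate at the cost of a longer fingerprint.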
Building on this, simhash segments an article into words, hashes each word, weights each hash result, merges the word vectors by adding them position by position, and finally reduces the sum to the fingerprint (in the common description, a position becomes 1 if its sum is positive and 0 otherwise). Some explanations of the simhash algorithm describe the computation steps slightly differently.
¹ Or −1: using −1 instead of 0 keeps all the vectors from being concentrated in a single quadrant.
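A compact sketch of the word-level simhash steps above, in plain Python. Several details are assumptions rather than the canonical algorithm: words are given as a list (no segmenter), each word's weight is its count (TF-IDF weights are also common), and the hypothetical `word_hash` takes a 64-bit value from the top bits of MD5. Fingerprints are compared by XOR and counting the 1 bits, i.e. the Hamming distance.

```python
import hashlib
from collections import Counter


def word_hash(word, bits=64):
    """Hypothetical per-word hash: the top `bits` bits of MD5 (requires bits <= 128)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)


def simhash(words, bits=64):
    """Fingerprint of a word list; each word is weighted by its count (an assumption)."""
    totals = [0] * bits
    for word, weight in Counter(words).items():
        h = word_hash(word, bits)
        for k in range(bits):
            # a 1 bit adds +weight at position k, a 0 bit adds -weight
            totals[k] += weight if (h >> k) & 1 else -weight
    # reduce the summed vector to 0/1: positive total -> 1, otherwise 0
    fingerprint = 0
    for k in range(bits):
        if totals[k] > 0:
            fingerprint |= 1 << k
    return fingerprint


def hamming_distance(a, b):
    """XOR the fingerprints and count the differing bits."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc = [f"w{i}" for i in range(100)]
    near_dup = doc[:-1] + ["changed"]
    other = [f"z{i}" for i in range(100)]
    print(hamming_distance(simhash(doc), simhash(near_dup)))
    print(hamming_distance(simhash(doc), simhash(other)))
```

For a near-duplicate the Hamming distance is typically small, while for unrelated texts it tends to sit around half the number of bits.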
References
High-dimensional randomness
To what extent has the theory of machine learning progressed?
Du, S. S., Kakade, S. M., Wang, R., & Yang, L. F. (2019). Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Database-friendly random projections: Johnson-Lindenstrauss with binary coins
Applications in attention
DGBR Algorithm
IJCAI’21 Secure Deep Graph Generation with Link Differential Privacy
Markov inequality: $P(x \geq a) \leq \frac{\mathbb{E}[x]}{a}$ for nonnegative $x$ and $a > 0$
Chebyshev inequality: $P\left((x - \mathbb{E}[x])^2 \geq a^2\right) \leq \frac{\mathbb{E}[(x - \mathbb{E}[x])^2]}{a^2} = \frac{\mathrm{Var}[x]}{a^2}$
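Chebyshev's inequality is Markov's inequality applied to the nonnegative variable $(x - \mathbb{E}[x])^2$. Applying it to the norm lemma at the top of this post (using $\mathbb{E}[x_k^4] = 3/n^2$ for $x_k \sim N(0, \frac{1}{n})$) shows why a stronger tool is needed there:

```latex
\begin{aligned}
&P\big(|x - \mathbb{E}[x]| \geq a\big) = P\big((x - \mathbb{E}[x])^2 \geq a^2\big)
  \leq \frac{\mathbb{E}[(x - \mathbb{E}[x])^2]}{a^2} = \frac{\mathrm{Var}[x]}{a^2},\\[4pt]
&x_k \sim N(0, \tfrac{1}{n}):\quad
  \mathrm{Var}[x_k^2] = \mathbb{E}[x_k^4] - \big(\mathbb{E}[x_k^2]\big)^2 = \frac{3}{n^2} - \frac{1}{n^2} = \frac{2}{n^2},\\
&\mathrm{Var}\big[\|x\|^2\big] = n \cdot \frac{2}{n^2} = \frac{2}{n}
  \quad\Longrightarrow\quad
  P\big(\big|\|x\|^2 - 1\big| \geq \varepsilon\big) \leq \frac{2}{n\varepsilon^2}.
\end{aligned}
```

This bound decays only like $1/n$, whereas the lemma's $2\exp(-\varepsilon^2 n/8)$ decays exponentially; exponential bounds of that kind come from applying Markov's inequality to $e^{\lambda \|x\|^2}$ (the Chernoff method), which is also the idea behind inequalities such as Bernstein's.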
Bernstein inequality