Johnson–Lindenstrauss Lemma
2022-07-04 02:14:00 【FakeOccupational】
Johnson–Lindenstrauss lemma
Lemma: Given $\varepsilon>0$, the norm of a random vector converges to 1 exponentially fast as $n$ grows.
Let $x\in \mathbb{R}^n$ be a random vector in which each coordinate is sampled from $N(0,\frac{1}{n})$. Then
$$P(|\Vert x\Vert^2 - 1| \geq \varepsilon) \leq 2\exp\left(-\frac{\varepsilon^2 n}{8}\right)$$
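The concentration of the norm can be checked empirically. The following is a minimal sketch, assuming the coordinates are drawn independently; the helper names (`squared_norm_sample`, `deviation_frequency`, `lemma_bound`), the seed, and the trial counts are illustrative choices, not part of the lemma.

```python
import math
import random

def squared_norm_sample(n, rng):
    """Draw x in R^n with i.i.d. N(0, 1/n) coordinates and return ||x||^2."""
    sigma = 1.0 / math.sqrt(n)  # standard deviation, so the variance is 1/n
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))

def deviation_frequency(n, eps, trials, rng):
    """Fraction of trials in which | ||x||^2 - 1 | >= eps."""
    hits = sum(abs(squared_norm_sample(n, rng) - 1.0) >= eps for _ in range(trials))
    return hits / trials

def lemma_bound(n, eps):
    """Right-hand side of the lemma: 2 * exp(-eps^2 * n / 8)."""
    return 2.0 * math.exp(-eps ** 2 * n / 8.0)

rng = random.Random(0)  # fixed seed, only for reproducibility
eps = 0.5
for n in (10, 100, 1000):
    freq = deviation_frequency(n, eps, trials=200, rng=rng)
    print(n, freq, lemma_bound(n, eps))
```

For small $n$ the bound exceeds 1 and says nothing; it only becomes informative once $\varepsilon^2 n$ is large.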
Lemma: Two random vectors sampled in the same way are approximately orthogonal.
$$P(|\langle x_1, x_2\rangle| \geq \varepsilon) \leq 4\exp\left(-\frac{\varepsilon^2 n}{8}\right)$$
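Near-orthogonality can be observed in the same way. This is a small sketch under the assumption that both vectors have independent $N(0,\frac{1}{n})$ coordinates; `random_vector` and `inner` are hypothetical helper names and $\varepsilon = 0.25$ is an arbitrary choice.

```python
import math
import random

def random_vector(n, rng):
    """Vector in R^n with i.i.d. N(0, 1/n) coordinates."""
    sigma = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def inner(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(0)  # fixed seed, only for reproducibility
eps = 0.25
for n in (10, 100, 10000):
    x1, x2 = random_vector(n, rng), random_vector(n, rng)
    bound = 4.0 * math.exp(-eps ** 2 * n / 8.0)
    # The inner product shrinks towards 0 as n grows.
    print(n, inner(x1, x2), bound)
```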
Johnson–Lindenstrauss Lemma
Given $\varepsilon>0$ and points $v_i \in \mathbb{R}^m$ $(i=1,\dots,N)$, sample a random matrix $A\in \mathbb{R}^{n\times m}$ as above (each entry from $N(0,\frac{1}{n})$), with $n > \frac{24\log N}{\varepsilon^2}$. Then, with high probability, for all $i, j$:
$$(1-\varepsilon)\Vert v_i - v_j\Vert^2 \leq \Vert Av_i - A v_j\Vert^2 \leq (1+\varepsilon)\Vert v_i - v_j\Vert^2$$
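One way to see where the constant 24 can come from is to combine the norm lemma with a union bound over all pairs of points. This is a sketch under the assumption that the entries of $A$ are independent $N(0,\frac{1}{n})$; it is not necessarily the argument the original references use. Write $u_{ij}$ for the normalized difference of two points, so that $Au_{ij}$ has i.i.d. $N(0,\frac{1}{n})$ coordinates:

$$
\begin{aligned}
u_{ij} &= \frac{v_i - v_j}{\Vert v_i - v_j\Vert},\\
P\left(|\Vert A u_{ij}\Vert^2 - 1| \geq \varepsilon\right) &\leq 2\exp\left(-\frac{\varepsilon^2 n}{8}\right) < 2\exp(-3\log N) = \frac{2}{N^3},\\
P(\text{some pair fails}) &\leq \binom{N}{2}\cdot\frac{2}{N^3} = \frac{N-1}{N^2} < \frac{1}{N}.
\end{aligned}
$$

Multiplying $|\Vert A u_{ij}\Vert^2 - 1| < \varepsilon$ by $\Vert v_i - v_j\Vert^2$ gives the two-sided inequality, which therefore holds for all pairs simultaneously with probability greater than $1-\frac{1}{N}$.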
Applications
Cosine similarity
To measure the similarity of two sentences, first use the TF-IDF algorithm to build a term-weight vector for each sentence, then compute the cosine of the angle between the two vectors. The smaller the angle, the more similar the sentences.
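The two steps can be sketched as follows. This is a minimal illustration, not a reference implementation: whitespace tokenization, the smoothed IDF formula, and the function names `tfidf_vectors` and `cosine` are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # term frequency times a smoothed inverse document frequency (an assumed variant)
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors([s.split() for s in sentences])
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A cosine close to 1 corresponds to a small angle, so the first pair of sentences scores higher than the second pair, which shares no words.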
hash function
A hash function (MD5, etc.) turns an article into a fixed-length string, for example 32 characters. In the post on front-end encryption, I implemented this for the string "123456".
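As a small illustration using Python's standard `hashlib` module (the helper name `md5_hex` is made up for this example), the MD5 digest of "123456" is always 32 hexadecimal characters, and a one-character change in the input gives an unrelated digest:

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

digest = md5_hex("123456")
print(digest, len(digest))
# A nearby input produces a completely different digest,
# so digests say nothing about how similar two inputs are.
print(md5_hex("123457"))
```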
simhash
A traditional hash function cannot be used to compare the similarity of two articles. simhash is an algorithm invented by Google to solve large-scale web page deduplication. It represents the final result as a sequence of 0s and 1s, and two results are compared with an XOR operation.
Johnson–Lindenstrauss lemma + discretization: N points in Euclidean space still keep their original relative positions after being mapped by the same random projection. The result of the random projection is then discretized (an angle of less than 90° gives 1 and an angle greater than 90° gives 0; that is, similar gives 1, otherwise 0), which is convenient for computation and storage.
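The projection and discretization step can be sketched as follows. This assumes Gaussian random directions and a sign test on the inner product (positive means an angle below 90° with that direction); the names `random_hyperplanes`, `sign_bits`, and `hamming` are illustrative.

```python
import random

def random_hyperplanes(n_bits, dim, rng):
    """One random Gaussian direction per output bit."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def sign_bits(vec, planes):
    """Bit k is 1 if vec makes an angle below 90 degrees with direction k, else 0."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) > 0 else 0 for plane in planes]

def hamming(a, b):
    """Number of positions where two bit lists differ (the XOR count)."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)  # fixed seed, only for reproducibility
planes = random_hyperplanes(64, 3, rng)
a = [1.0, 0.0, 0.0]
b = [0.9, 0.1, 0.0]   # small angle to a
c = [-1.0, 0.2, 0.0]  # nearly opposite to a
bits_a, bits_b, bits_c = (sign_bits(v, planes) for v in (a, b, c))
print(hamming(bits_a, bits_b), hamming(bits_a, bits_c))
```

Vectors separated by a small angle disagree on few bits, while nearly opposite vectors disagree on most of them, so the bit strings retain information about the angles.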
Building on the above, simhash segments the article into words, hashes every word, weights the hash results, merges the word vectors (adding them position by position), and obtains the final result through dimensionality reduction and further processing. Other explanations of the simhash algorithm exist, and their computation steps appear to differ.
Note on the discretization: the value 0 can instead be $-1$. Using $-1$ avoids concentrating all vectors in a single quadrant.
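The simhash steps described above can be sketched as follows. This is one common formulation and only an assumption about the details: tokens are hashed with MD5 truncated to 64 bits, each bit votes $+w$ or $-w$, and the sign of each total gives the output bit. The function names and the optional `weights` dict are hypothetical.

```python
import hashlib

def simhash(tokens, weights=None, n_bits=64):
    """Weighted bitwise vote over per-token hashes; returns an integer fingerprint.

    n_bits must be at most 64 because only 8 bytes of each digest are used.
    """
    totals = [0.0] * n_bits
    for tok in tokens:
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for k in range(n_bits):
            # A 1 bit votes +w, a 0 bit votes -w (the -1 convention).
            totals[k] += w if (h >> k) & 1 else -w
    # Keep only the sign of each accumulated total.
    return sum(1 << k for k in range(n_bits) if totals[k] > 0)

def hamming_distance(a, b):
    """Number of differing bits, computed from the XOR of two fingerprints."""
    return bin(a ^ b).count("1")

doc1 = "the quick brown fox jumps over the lazy dog near the quiet river bank".split()
doc2 = "the quick brown fox jumps over the lazy cat near the quiet river bank".split()
doc3 = "quarterly revenue grew while operating costs fell across every major region".split()
print(hamming_distance(simhash(doc1), simhash(doc2)))
print(hamming_distance(simhash(doc1), simhash(doc3)))
```

Near-duplicate texts are expected to differ in only a few bits, while unrelated texts differ in roughly half of them.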
References
High dimensional random
To what extent has the theory of machine learning progressed ?
Du, S. S., Kakade, S. M., Wang, R., & Yang, L. F. (2019). Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Database-friendly random projections: Johnson-Lindenstrauss with binary coins
Applications in attention
DGBR Algorithm
IJCAI’21 Secure Deep Graph Generation with Link Differential Privacy
Markov inequality: $P(x\geq a)\leq \frac{\mathbb{E}[x]}{a}$
Chebyshev inequality: $P((x - \mathbb{E}[x])^2\geq a^2) \leq \frac{\mathbb{E}[(x - \mathbb{E}[x])^2]}{a^2}=\frac{\mathrm{Var}[x]}{a^2}$
Bernstein inequality