Johnson–Lindenstrauss Lemma
2022-07-04 02:14:00 【FakeOccupational】
Johnson–Lindenstrauss lemma
Lemma: Given $0 < \varepsilon < 1$, the squared norm of a random vector concentrates around 1, with the failure probability decaying exponentially in the dimension $n$.
Let $x \in \mathbb{R}^n$ be a random vector whose coordinates are each sampled independently from $N(0, \frac{1}{n})$. Then
$$P(|\Vert x\Vert^2 - 1| \geq \varepsilon) \leq 2\exp\left(-\frac{\varepsilon^2 n}{8}\right)$$
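The norm lemma can be checked empirically. The following is a minimal sketch, not a proof: it samples vectors with $N(0, \frac{1}{n})$ coordinates and compares the observed frequency of large deviations with the stated bound. The values of `n`, `eps`, and `trials`, and the function names, are illustrative assumptions.

```python
import math
import random


def squared_norm_sample(n, rng):
    """Squared norm of one random vector in R^n with N(0, 1/n) coordinates."""
    sigma = 1.0 / math.sqrt(n)  # standard deviation, so the variance is 1/n
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))


def empirical_tail(n, eps, trials, rng):
    """Fraction of sampled vectors with | ||x||^2 - 1 | >= eps."""
    hits = sum(
        1 for _ in range(trials)
        if abs(squared_norm_sample(n, rng) - 1.0) >= eps
    )
    return hits / trials


rng = random.Random(0)            # fixed seed so the run is repeatable
n, eps, trials = 200, 0.3, 2000   # illustrative values, not from the post
observed = empirical_tail(n, eps, trials, rng)
bound = 2.0 * math.exp(-eps ** 2 * n / 8.0)
print(f"observed frequency: {observed:.4f}, lemma bound: {bound:.4f}")
```

In runs like this the observed frequency is usually far below the bound, which is expected: the lemma gives an upper bound, not an estimate.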
Lemma: Two random vectors sampled in the same way are approximately orthogonal.
$$P(|\langle x_1, x_2\rangle| \geq \varepsilon) \leq 4\exp\left(-\frac{\varepsilon^2 n}{8}\right)$$
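One way to see where the factor 4 comes from is the polarization identity. The sketch below assumes $x_1$ and $x_2$ are sampled independently; the auxiliary vectors $u$ and $w$ are names introduced here for illustration.

$$u = \frac{x_1 + x_2}{\sqrt{2}}, \quad w = \frac{x_1 - x_2}{\sqrt{2}}, \quad \langle x_1, x_2\rangle = \frac{\Vert u\Vert^2 - \Vert w\Vert^2}{2} = \frac{(\Vert u\Vert^2 - 1) - (\Vert w\Vert^2 - 1)}{2}$$

Each coordinate of $u$ and of $w$ is again distributed as $N(0, \frac{1}{n})$, so the norm lemma applies to both. If $|\langle x_1, x_2\rangle| \geq \varepsilon$, then at least one of $|\Vert u\Vert^2 - 1| \geq \varepsilon$ or $|\Vert w\Vert^2 - 1| \geq \varepsilon$ must hold, and the union bound gives

$$P(|\langle x_1, x_2\rangle| \geq \varepsilon) \leq P(|\Vert u\Vert^2 - 1| \geq \varepsilon) + P(|\Vert w\Vert^2 - 1| \geq \varepsilon) \leq 4\exp\left(-\frac{\varepsilon^2 n}{8}\right)$$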
Johnson–Lindenstrauss Lemma
Given $0 < \varepsilon < 1$ and vectors $v_i \in \mathbb{R}^m$ $(i = 1, \dots, N)$, sample a random matrix $A \in \mathbb{R}^{n \times m}$ as above (each entry from $N(0, \frac{1}{n})$), with $n > \frac{24\log N}{\varepsilon^2}$. Then, with high probability, for all $i \neq j$:
$$(1-\varepsilon)\Vert v_i - v_j\Vert^2 \leq \Vert Av_i - Av_j\Vert^2 \leq (1+\varepsilon)\Vert v_i - v_j\Vert^2$$
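The statement can be tried out numerically. The sketch below is illustrative only: it draws $N$ arbitrary points in $\mathbb{R}^m$, projects them with a Gaussian matrix of the size the lemma asks for, and reports the worst relative distortion of the squared pairwise distances. The values of `N`, `m`, and `eps`, the uniform test points, and all function names are assumptions made for this example.

```python
import itertools
import math
import random


def gaussian_matrix(n, m, rng):
    """n x m matrix with independent N(0, 1/n) entries."""
    sigma = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(n)]


def project(matrix, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)
N, m, eps = 10, 500, 0.5  # illustrative values
# Smallest integer strictly greater than 24 log N / eps^2.
n = math.floor(24 * math.log(N) / eps ** 2) + 1

points = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(N)]
A = gaussian_matrix(n, m, rng)
projected = [project(A, p) for p in points]

# Ratio ||A v_i - A v_j||^2 / ||v_i - v_j||^2 for every pair i < j.
ratios = [
    squared_distance(projected[i], projected[j]) / squared_distance(points[i], points[j])
    for i, j in itertools.combinations(range(N), 2)
]
worst = max(abs(r - 1.0) for r in ratios)
print(f"n = {n}, worst relative distortion = {worst:.3f} (target eps = {eps})")
```

Note that the required $n$ depends only on $N$ and $\varepsilon$, not on the original dimension $m$.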
Applications
Cosine similarity
To measure the similarity of two sentences, first use the TF-IDF algorithm to turn each sentence into a weighted term-frequency vector, then compute the cosine of the angle between the two vectors. The smaller the angle, the more similar the sentences.
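A minimal sketch of this pipeline, using only the standard library, might look like the following. Whitespace tokenization and the smoothed IDF formula are simplifying assumptions chosen for this example; real systems typically use a proper tokenizer and a library implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): stays positive even for terms
        # that occur in every document.
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(sentences)
sim_close = cosine_similarity(vecs[0], vecs[1])
sim_far = cosine_similarity(vecs[0], vecs[2])
print(f"similar pair: {sim_close:.3f}, unrelated pair: {sim_far:.3f}")
```

The first two sentences share most of their words, so their cosine is close to 1 (small angle), while the third shares none and gets a cosine of 0 (an angle of 90°).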
Hash function
A hash function (MD5, etc.) turns an article into a fixed-length string, for example 32 hexadecimal characters. In a post on front-end encryption, I implemented this for the string "123456".
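As a small illustration (a sketch with Python's standard `hashlib`, not the front-end implementation mentioned above), hashing "123456" with MD5 gives a 32-character hexadecimal string, and changing a single character of the input gives a completely different one. The helper name `md5_hex` is introduced here for the example.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


digest = md5_hex("123456")
other = md5_hex("123457")  # one character changed
print(digest, len(digest))
print(other, len(other))
```

Because nearby inputs map to unrelated digests, such a hash can detect exact duplicates but says nothing about how similar two different texts are.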
simhash
Traditional hash functions cannot be used to compare the similarity of two articles. simhash is an algorithm that Google used to solve large-scale web page deduplication. The final result is represented with 0s and 1s, and two results are compared with an XOR operation.
Johnson–Lindenstrauss lemma + discretization: after the same random projection, N points in Euclidean space still approximately keep their original relative positions. The result of the random projection is then discretized (1 for an angle less than 90°, 0 for an angle greater than 90°; that is, 1 if similar and 0 otherwise), which makes it convenient to compute and store.
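The projection-then-discretize idea can be sketched as follows. This is an illustrative example under stated assumptions: each bit records whether a vector makes an angle of less than 90° with one random Gaussian direction, and the dimension, number of bits, and function names are chosen for the example rather than taken from the post.

```python
import random


def random_directions(bits, dim, rng):
    """One random Gaussian direction per output bit."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sign_bits(directions, v):
    """1 if the angle between v and the direction is below 90 degrees, else 0."""
    return [
        1 if sum(d * x for d, x in zip(direction, v)) > 0 else 0
        for direction in directions
    ]


def hamming(a, b):
    """Number of positions where two bit lists differ (XOR, then count)."""
    return sum(x ^ y for x, y in zip(a, b))


rng = random.Random(0)
dim, bits = 50, 256  # illustrative sizes
directions = random_directions(bits, dim, rng)

u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
near = [x + 0.1 * rng.gauss(0.0, 1.0) for x in u]  # small perturbation of u
opposite = [-x for x in u]                         # points the other way

code_u = sign_bits(directions, u)
code_near = sign_bits(directions, near)
code_opposite = sign_bits(directions, opposite)
print("differing bits, near vector:    ", hamming(code_u, code_near))
print("differing bits, opposite vector:", hamming(code_u, code_opposite))
```

Vectors separated by a small angle disagree on few bits, while vectors pointing in opposite directions disagree on essentially all of them, so the Hamming distance between codes acts as a cheap proxy for the angle.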
Building on this, simhash segments the article into words, hashes each word, weights the hash results, merges the word vectors (adding them position by position), and obtains the final result through dimensionality reduction and similar processing. Other explanations of the simhash algorithm exist, and their computation steps appear to differ.
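The steps just listed can be sketched as below. This is one common variant, not necessarily the one the post has in mind: whitespace splitting stands in for word segmentation, the term count is used as the weight, the first 64 bits of MD5 serve as the per-word hash, and the "dimension reduction" step is taken to be keeping only the sign of each accumulated position. All of these choices and the function names are assumptions.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint length (an assumption)


def word_hash(word):
    """64-bit integer hash of a word, taken from the first 8 bytes of MD5."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    """Fingerprint of a text as a BITS-bit integer."""
    weights = Counter(text.lower().split())  # weight = term count
    totals = [0] * BITS
    for word, weight in weights.items():
        h = word_hash(word)
        for i in range(BITS):
            # A 1 bit adds the weight, a 0 bit subtracts it.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:  # keep only the sign of each position
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits, computed with XOR."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
doc_c = "johnson lindenstrauss lemma for random projections"
print(hamming_distance(simhash(doc_a), simhash(doc_b)))
print(hamming_distance(simhash(doc_a), simhash(doc_c)))
```

Two fingerprints are then compared by XOR and a bit count; near-duplicate texts tend to differ in few bits, and a threshold on that count decides whether two documents are treated as duplicates.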
Note on the discretization: the value 0 can also be replaced by $-1$; using $-1$ avoids concentrating all the vectors in a single quadrant.
References
High dimensional random
To what extent has the theory of machine learning progressed?
Du, S. S., Kakade, S. M., Wang, R., & Yang, L. F. (2019). Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Database-friendly random projections: Johnson-Lindenstrauss with binary coins
Application in attention
DGBR Algorithm
IJCAI’21 Secure Deep Graph Generation with Link Differential Privacy
Markov inequality (for a nonnegative random variable $x$ and $a > 0$):
$$P(x \geq a) \leq \frac{\mathbb{E}[x]}{a}$$
Chebyshev inequality (Markov applied to $(x - \mathbb{E}[x])^2$):
$$P((x - \mathbb{E}[x])^2 \geq a^2) \leq \frac{\mathbb{E}[(x - \mathbb{E}[x])^2]}{a^2} = \frac{\mathrm{Var}[x]}{a^2}$$
Bernstein inequality