
Knowledge points on sparsity

2022-06-10 08:50:00 Itchy heart

Sparsity (sparse)

Definition: sparsity means that, among a model's parameters, only a few elements are non-zero, or only a few elements are much larger than zero.

WHY: why should we introduce sparsity into the model?

Example: a top student preparing for the postgraduate entrance exam may have a vocabulary of 10,000 words, but the words actually used on the exam are only a small part of that 10,000-word vocabulary bank.

Example:
Number to represent: 123.456

The first set of basis numbers:
[100, 10, 1] $\Rightarrow$ $123.456 \approx 100 \times 1 + 10 \times 2 + 1 \times 3$ (error = 0.456)

The second set of basis numbers:
[100, 50, 10, 1, 0.5, 0.1, 0.03, 0.01, 0.001]
$123.456 = 100 \times 1 + 50 \times 0 + 10 \times 2 + 1 \times 3 + 0.5 \times 0 + 0.1 \times 4 + 0.03 \times 0 + 0.01 \times 5 + 0.001 \times 6$ (error = 0)

Here the sparse features (available but unused) are the three basis numbers 50, 0.5, and 0.03, whose coefficients in the representation are zero.
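
To make the example concrete, here is a minimal NumPy sketch (the coefficient vectors are copied from the example above) that checks both decompositions; in the over-complete basis most coefficients are zero, which is exactly the sparse code:

```python
import numpy as np

x = 123.456                                      # number to represent

# First basis (complete, only 3 elements): dense coefficients, non-zero error
basis1 = np.array([100, 10, 1])
coef1 = np.array([1, 2, 3])
print(basis1 @ coef1, abs(x - basis1 @ coef1))   # 123, error = 0.456

# Second basis (over-complete, 9 elements): sparse coefficients, zero error
basis2 = np.array([100, 50, 10, 1, 0.5, 0.1, 0.03, 0.01, 0.001])
coef2 = np.array([1, 0, 2, 3, 0, 4, 0, 5, 6])
print(basis2 @ coef2, abs(x - basis2 @ coef2))   # 123.456, error ≈ 0
```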

Comparison with PCA (Principal Component Analysis)
PCA (a complete set of basis vectors: a complete dictionary)
The original data are reconstructed from the basis vectors in this complete dictionary.

Sparse Representation (an over-complete set of basis vectors: an over-complete dictionary, in contrast to the complete one)
The number of basis vectors is much larger than the dimension of the input vector.
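
To illustrate the contrast (a sketch only; the data, dictionary size, and regularization strength are arbitrary assumptions), scikit-learn's PCA uses at most as many components as the input dimension, while SparseCoder works with an over-complete dictionary and produces codes that are mostly zero:

```python
import numpy as np
from sklearn.decomposition import PCA, SparseCoder

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))                # 100 samples, 10-dimensional

# PCA: a complete basis, at most 10 components for 10-dimensional data
pca = PCA(n_components=10).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))   # near-perfect reconstruction

# Sparse representation: an over-complete dictionary with 50 atoms (>> 10)
D = rng.standard_normal((50, 10))
D /= np.linalg.norm(D, axis=1, keepdims=True)     # normalize the atoms
coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars",
                    transform_alpha=0.1)
codes = coder.transform(X)                        # shape (100, 50)
print((codes != 0).mean())                        # fraction of active coefficients
```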

How do we ensure sparsity?

Machine learning model $\Rightarrow$ optimize the parameters on the training set (for example, by reducing the Loss) $\Rightarrow$ add a regularization term to the Loss that penalizes the parameter values and pushes them toward 0

Common choices:
Loss = Training Loss + $\lambda \|W\|_0$ ($L_0$ norm)

Loss = Training Loss + $\lambda \|W\|_1$ ($L_1$ norm)
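
As a quick numerical illustration (a NumPy sketch; the weight vector and $\lambda$ are made up for the example), the $L_0$ "norm" counts non-zero entries, while the $L_1$ norm sums absolute values and is the convex surrogate that is usually added to the Loss in practice:

```python
import numpy as np

w = np.array([0.0, 1.5, 0.0, -0.3, 0.0, 2.0])    # hypothetical model weights

l0 = np.count_nonzero(w)     # L0 "norm": number of non-zero weights -> 3
l1 = np.abs(w).sum()         # L1 norm: sum of absolute values -> 3.8

lam = 0.01
training_loss = 1.23                             # placeholder value
loss_l0 = training_loss + lam * l0
loss_l1 = training_loss + lam * l1
```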

Sparse Coding (sparse coding Loss)
Loss = $\sum_{j=1}^m \left\| x^{(j)} - \sum_{i=1}^k a_i^{(j)} \phi_i \right\|^2 + \lambda \sum_{i=1}^k \| a_i \|_1$

Here the first term, $\sum_{j=1}^m \| x^{(j)} - \sum_{i=1}^k a_i^{(j)} \phi_i \|^2$, is the reconstruction error, and $\lambda \sum_{i=1}^k \| a_i \|_1$ is the sparsity penalty ($L_1$ norm).
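
A direct NumPy sketch of this objective (the shapes and variable names are my own choices, not from the original post):

```python
import numpy as np

def sparse_coding_loss(X, Phi, A, lam):
    """X: (m, d) data x^{(j)}; Phi: (k, d) dictionary atoms phi_i; A: (m, k) codes a^{(j)}."""
    recon = A @ Phi                          # sum_i a_i^{(j)} * phi_i for each sample j
    recon_error = np.sum((X - recon) ** 2)   # squared reconstruction error
    sparsity = lam * np.sum(np.abs(A))       # L1 sparsity penalty on the codes
    return recon_error + sparsity

m, d, k = 100, 10, 50                        # k >> d: over-complete dictionary
rng = np.random.default_rng(0)
X = rng.standard_normal((m, d))
Phi = rng.standard_normal((k, d))
A = rng.standard_normal((m, k))
print(sparse_coding_loss(X, Phi, A, lam=0.1))
```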

In the era of convolutional networks, we also add an $L_1$ norm penalty to the convolutional layers to encourage sparsity, and we increase the depth and width of the model so that it contains more over-complete dictionaries.
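
A minimal PyTorch sketch of adding an $L_1$ penalty on a convolutional layer's weights to the training loss (the layer sizes, data, and $\lambda$ are placeholder assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)                # dummy input batch
target = torch.randn(8, 16, 32, 32)          # dummy regression target

lam = 1e-4
out = conv(x)
train_loss = F.mse_loss(out, target)
l1_penalty = conv.weight.abs().sum()         # L1 norm of the convolution kernels
loss = train_loss + lam * l1_penalty
loss.backward()                              # gradients now include the sparsity term
```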

Is blindly imposing sparsity good or bad?

An over-complete dictionary $\Rightarrow$ requires a large amount of high-quality data.
Too many inactive parameters $\Rightarrow$ the training process becomes very long.

The $L_1$ norm makes the Loss non-differentiable at some points $\Rightarrow$ at zero the derivative is not unique (only a subgradient exists), so the model can be difficult to converge.
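
A tiny sketch of the issue (the scalar values are arbitrary): the derivative of $|w|$ is $\pm 1$ away from zero but undefined at $w = 0$, where a framework has to pick one subgradient (PyTorch, for instance, returns 0 there):

```python
import torch

for v in (-2.0, 0.0, 3.0):
    w = torch.tensor(v, requires_grad=True)
    w.abs().backward()
    # prints -1.0 at w=-2, 1.0 at w=3; at w=0 one subgradient (0.0) is chosen
    print(v, w.grad.item())
```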

All in all, in large-scale deep learning models, the $L_2$ norm is usually preferred to prevent overfitting.
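
In practice the $L_2$ penalty is usually applied through the optimizer's weight decay; a minimal PyTorch sketch (model and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)
# weight_decay applies an L2-style penalty by adding a decay term to each gradient
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```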
