
Supplementary Notes on Auto-Encoders

2022-07-04 12:24:00 【hello_ JeremyWang】

1. Asking More of the Auto-Encoder

In "PyTorch in Practice: Image Dimensionality Reduction and Clustering", I briefly introduced how an Auto-Encoder works. For the simplest Auto-Encoder, the only requirement is to minimize the reconstruction loss: the restored image or article should be as close as possible to the original.
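To make "minimize the reconstruction loss" concrete, here is a minimal PyTorch sketch. The layer sizes, the mean-squared-error loss, and the random stand-in batch are my own assumptions for illustration, not the setup from the earlier post.

```python
import torch
from torch import nn

torch.manual_seed(0)


class AutoEncoder(nn.Module):
    """Tiny fully connected auto-encoder (sizes are illustrative assumptions)."""

    def __init__(self, in_dim=784, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, in_dim)
        )

    def forward(self, x):
        z = self.encoder(x)         # compressed embedding
        return self.decoder(z), z   # reconstruction and embedding


model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)  # stand-in for a batch of flattened images

losses = []
for step in range(20):
    recon, z = model(x)
    # Reconstruction loss: how far the restored input is from the original
    loss = nn.functional.mse_loss(recon, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The only training signal here is the distance between `recon` and `x`; nothing constrains what the embedding `z` looks like, which is exactly the gap the rest of this post is about.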

But beyond that, can we ask more of the Auto-Encoder? The answer is yes. Here are two further requirements:

More than just minimizing the reconstruction loss
A more interpretable embedding

1.1 Requirement 1

Requirement 1 asks for more than a low reconstruction loss: the embedding we obtain should be representative of the original picture or text (just as the Sharingan represents the Uchiha clan). How can we get the machine to do this?

As the slide below shows, we build an additional classifier, a Discriminator, to measure how well an embedding matches its original picture. Concretely, an Encoder with parameters $\theta$ compresses the picture, and the resulting embedding is fed to the Discriminator together with a picture; the Discriminator classifies whether the two belong together. For each $\theta$, we tune the Discriminator's parameters $\phi$ to make its training loss $L_D$ as small as possible, and we call that minimum $L_D^{*} = \min_{\phi} L_D$. Finally, we tune $\theta$ so that $L_D^{*}$ is as small as possible, that is, $\theta^{*} = \arg\min_{\theta} L_D^{*}$. Intuitively, a small $L_D^{*}$ means the embeddings are distinctive enough that the Discriminator can easily tell which picture each one belongs to.
[Figure: lecture slide]
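The pairing idea can be sketched in PyTorch as follows. This is only a rough illustration under my own assumptions: the network sizes are made up, mismatched pairs are built by shifting the embeddings by one position within the batch, and the "for each $\theta$, find the best $\phi$" inner step is approximated by updating both sets of parameters jointly, which is possible because both are trying to make the same loss small.

```python
import torch
from torch import nn

torch.manual_seed(0)

IMG_DIM, EMB_DIM = 784, 32  # hypothetical sizes

# Encoder with parameters theta
encoder = nn.Sequential(nn.Linear(IMG_DIM, 128), nn.ReLU(), nn.Linear(128, EMB_DIM))
# Discriminator with parameters phi: scores an (image, embedding) pair
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM + EMB_DIM, 128), nn.ReLU(), nn.Linear(128, 1)
)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(discriminator.parameters()), lr=1e-3
)
bce = nn.BCEWithLogitsLoss()


def discriminator_loss(x):
    """Binary classification loss L_D: matched pairs are 1, mismatched pairs are 0."""
    z = encoder(x)
    # Mismatched pairs: each image gets the embedding of its neighbour in the batch
    z_wrong = torch.roll(z, shifts=1, dims=0)
    pos_logits = discriminator(torch.cat([x, z], dim=1))
    neg_logits = discriminator(torch.cat([x, z_wrong], dim=1))
    logits = torch.cat([pos_logits, neg_logits], dim=0)
    labels = torch.cat(
        [torch.ones_like(pos_logits), torch.zeros_like(neg_logits)], dim=0
    )
    return bce(logits, labels)


x = torch.rand(64, IMG_DIM)  # stand-in for a batch of flattened images
losses = []
for step in range(100):
    loss = discriminator_loss(x)
    optimizer.zero_grad()
    loss.backward()   # gradients flow into both theta and phi
    optimizer.step()
    losses.append(loss.item())
```

If the embeddings carried no information about their pictures, the Discriminator could not do better than guessing, so a falling `loss` is a sign that each embedding is becoming specific to its own image.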

1.2 Requirement 2

Requirement 2 asks for a more interpretable embedding. The embedding we usually get looks like a jumble of numbers, like the picture in the upper-right corner of the slide below. We want to know what information each part of the embedding carries. For example, in speech, the embedding of an utterance may contain both information about the speaker (pronunciation habits and so on) and the content of the utterance itself, and we want to separate the two.
[Figure: lecture slide]
How do we do this in practice? A simple, natural idea is to train two Encoders: one extracts only the content of the utterance, and the other extracts only the speaker information. What is this good for? For example, we can combine one speaker's information with the content of another speaker's utterance to achieve voice conversion.
[Figure: lecture slide]
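The two-Encoder wiring and the voice-swapping trick might look like the sketch below. Everything named here (`content_encoder`, `speaker_encoder`, `convert`, the feature sizes, the single linear layers) is a hypothetical stand-in chosen for brevity, and the networks are untrained.

```python
import torch
from torch import nn

torch.manual_seed(0)

FEAT_DIM, CONTENT_DIM, SPEAKER_DIM = 80, 16, 8  # hypothetical sizes

content_encoder = nn.Linear(FEAT_DIM, CONTENT_DIM)  # what is being said
speaker_encoder = nn.Linear(FEAT_DIM, SPEAKER_DIM)  # who is saying it
decoder = nn.Linear(CONTENT_DIM + SPEAKER_DIM, FEAT_DIM)


def reconstruct(utterance):
    """Ordinary auto-encoding: both embedding parts come from the same utterance."""
    content = content_encoder(utterance)
    speaker = speaker_encoder(utterance)
    return decoder(torch.cat([content, speaker], dim=-1))


def convert(source_utterance, target_utterance):
    """Keep the source's content but decode it with the target's speaker part."""
    content = content_encoder(source_utterance)
    speaker = speaker_encoder(target_utterance)
    return decoder(torch.cat([content, speaker], dim=-1))


utt_a = torch.rand(4, FEAT_DIM)  # stand-in features from speaker A
utt_b = torch.rand(4, FEAT_DIM)  # stand-in features from speaker B
converted = convert(utt_a, utt_b)  # A's words, B's voice (once trained)
```

Note that this wiring alone does not force the two Encoders to split the information the way their names suggest; some extra training pressure is needed for that.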
How do we train such Encoders? One way is adversarial training. As before, we create a binary Discriminator; its job is to take the part of the embedding that is supposed to represent the content of the utterance and decide who said it. If our Encoder can fool the Discriminator, so that it cannot tell who the speaker is, then that part of the embedding no longer contains speaker information.
[Figure: lecture slide]
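One possible way to set up this adversarial game in PyTorch is sketched below. The alternating update scheme, the negated-loss trick for fooling the Discriminator, the weight `ADV_WEIGHT`, and all sizes are assumptions I picked for illustration; the slides do not prescribe them.

```python
import torch
from torch import nn

torch.manual_seed(0)

FEAT_DIM, CONTENT_DIM, SPEAKER_DIM = 80, 16, 8  # hypothetical sizes
ADV_WEIGHT = 0.1  # hypothetical weight of the "fool the Discriminator" term

content_encoder = nn.Linear(FEAT_DIM, CONTENT_DIM)
speaker_encoder = nn.Linear(FEAT_DIM, SPEAKER_DIM)
decoder = nn.Linear(CONTENT_DIM + SPEAKER_DIM, FEAT_DIM)
# Binary Discriminator: guesses the speaker (0 or 1) from the content part only
speaker_discriminator = nn.Sequential(
    nn.Linear(CONTENT_DIM, 32), nn.ReLU(), nn.Linear(32, 1)
)

ae_params = [
    *content_encoder.parameters(),
    *speaker_encoder.parameters(),
    *decoder.parameters(),
]
ae_opt = torch.optim.Adam(ae_params, lr=1e-3)
d_opt = torch.optim.Adam(speaker_discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()


def train_step(utterances, speaker_ids):
    # 1) Discriminator step: learn to identify the speaker from the content part.
    #    detach() keeps this update from touching the Encoder.
    content = content_encoder(utterances).detach()
    d_loss = bce(speaker_discriminator(content), speaker_ids)
    d_opt.zero_grad()  # also clears gradients left over from the previous step 2
    d_loss.backward()
    d_opt.step()

    # 2) Auto-encoder step: reconstruct well AND make the Discriminator fail.
    content = content_encoder(utterances)
    speaker = speaker_encoder(utterances)
    recon = decoder(torch.cat([content, speaker], dim=-1))
    recon_loss = nn.functional.mse_loss(recon, utterances)
    # Negated loss: the Encoder is rewarded when the Discriminator is wrong
    fool_loss = -bce(speaker_discriminator(content), speaker_ids)
    ae_loss = recon_loss + ADV_WEIGHT * fool_loss
    ae_opt.zero_grad()
    ae_loss.backward()
    ae_opt.step()  # only Encoder and Decoder parameters are updated here
    return d_loss.item(), recon_loss.item()


utterances = torch.rand(8, FEAT_DIM)                     # stand-in features
speaker_ids = torch.tensor([0.0, 1.0] * 4).unsqueeze(1)  # two speakers
for step in range(5):
    d_loss, recon_loss = train_step(utterances, speaker_ids)
```

Negating the Discriminator's loss is just the simplest choice to write down. Since the Decoder still has to reconstruct the full utterance, the speaker information squeezed out of the content part has to travel through `speaker_encoder` instead.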