Using Hypersphere Embedding to Boost Adversarial Training
2022-07-02 07:58:00 【MezereonXP】
This post introduces a NeurIPS 2020 paper, "Boosting Adversarial Training with Hypersphere Embedding", whose first author is Tianyu Pang from Tsinghua University.
The work proposes a technique called Hypersphere Embedding (HE), which is the name used throughout this post.
The method is orthogonal to existing variants of adversarial training, meaning it can be combined with them to further improve robustness.
The adversarial training variants referred to here include ALP, TRADES, etc.
The adversarial training framework
First, as shown in the figure below, we list AT and its variants, with the differences in their training objectives highlighted in pink.
Here, $x^*$ is an adversarial example, and the adversarial objective on the right can be understood as the loss function used to generate adversarial examples.
From this we can see how these variants are designed:
- ALP adds a regularization term to the cross-entropy loss on clean samples; here $z$ is in fact $f(x)$.
- TRADES keeps the cross-entropy loss on clean samples but modifies the loss on the adversarial example: the target is changed from the original label $y$ to the clean-sample output $f(x)$.
HE modifies two main parts:
- the model $f$ itself
- the cross-entropy loss $\mathcal{L}_{CE}$
Method
Notation
We first introduce some basic notation to simplify the later description.
We consider a classification task with $L$ labels and write the model as:
$$f(x) = \mathbb{S}(\mathbf{W}^\top z+b)$$
Here, $z = z(x;\omega)$ denotes the features extracted under model parameters $\omega$, the matrix $\mathbf{W} = (W_1,\dots,W_L)$ and bias $b$ form the last linear layer, and $\mathbb{S}(\cdot)$ is the softmax function.
The cross-entropy loss is written as:
$$\mathcal{L}_{CE}(f(x),y)=-1^\top_y \log f(x)$$
where $1_y$ is the one-hot encoding of label $y$, i.e., the entry at position $y$ is 1 and all others are 0.
We write $\angle(u,v)$ for the angle between vectors $u$ and $v$.
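To make the notation concrete, here is a minimal PyTorch sketch of a model of the form $f(x) = \mathbb{S}(\mathbf{W}^\top z + b)$ together with the cross-entropy loss above. PyTorch, the placeholder backbone, and the dimensions are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    """f(x) = softmax(W^T z + b), where z = z(x; omega) comes from a feature extractor."""
    def __init__(self, in_dim=3 * 32 * 32, feat_dim=128, num_classes=10):
        super().__init__()
        # placeholder backbone standing in for z(x; omega)
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)   # holds W and b

    def forward(self, x):
        z = self.backbone(x)      # features z(x; omega)
        return self.head(z)       # logits W^T z + b (softmax is applied inside the loss)

model = Classifier()
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
# L_CE(f(x), y) = -1_y^T log f(x); F.cross_entropy fuses the softmax and this loss
loss = F.cross_entropy(model(x), y)
```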
The adversarial training framework with HE
First, most adversarial training methods can be written in the following two-stage framework:
$$\min_{\omega,\mathbf{W}}\mathbb{E}\big[\mathcal{L}_T(\omega,\mathbf{W}\mid x,x^*,y)\big], \quad \text{where } x^*=\arg\max_{x'\in\mathbf{B}(x)} \mathcal{L}_A(x'\mid x,y,\omega,\mathbf{W})$$
In other words, we first generate adversarial examples, and then optimize the training objective on them.
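As a concrete illustration, here is a minimal sketch of this two-stage loop, using a standard PGD inner maximization. PyTorch, the $L_\infty$ threat model, the $[0,1]$ pixel range, and the hyperparameter values are assumptions made for the example, not settings taken from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # inner maximization: approximately solve max_{x' in B(x)} L_A(x'), with L_A = cross entropy
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent on L_A
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into B(x)
            x_adv = x_adv.clamp(0, 1)                  # keep a valid pixel range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    x_star = pgd_attack(model, x, y)                   # stage 1: generate x*
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_star), y)           # stage 2: training objective L_T
    loss.backward()                                    # update omega and W
    optimizer.step()
    return loss.item()
```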
After many iterations, $\mathbf{W}$ and $\omega$ gradually converge. To improve the performance of such adversarial training, some works introduce metric learning into adversarial learning; however, these approaches are computationally expensive, introduce a degree of class bias, and remain fragile under stronger adversarial attacks.
Related materials:
- NeurIPS 2019: Metric learning for adversarial robustness.
- IWSBPR 2015: Deep metric learning using triplet network.
- A stronger adversarial attack: https://github.com/Line290/FeatureAttack
(In fact, the motivation here is not quite sufficient; the reasons given are still not very compelling.)
Next, we give the form of HE directly. In essence, it normalizes the features $z$ and the weights $\mathbf{W}$.
$$\mathbf{W}^\top z=(W_1^\top z,\ W_2^\top z,\ \dots,\ W_L^\top z)$$
where $W_i^\top z=\Vert W_i\Vert\,\Vert z\Vert \cos\theta_i$ and $\theta_i = \angle(W_i,z)$.
We define
$$\widetilde{W_i}=\frac{W_i}{\Vert W_i\Vert},\qquad \widetilde{z}=\frac{z}{\Vert z\Vert}$$
so that
$$\widetilde{f}(x) = \mathbb{S}(\widetilde{\mathbf{W}}^\top \widetilde{z}) = \mathbb{S}(\cos\theta),\qquad \cos\theta = (\cos\theta_1,\cos\theta_2,\dots,\cos\theta_L)$$
When computing the cross-entropy loss, a margin $m$ is introduced:
$$\mathcal{L}_{CE}^{m}(\widetilde{f}(x),y)=-1^\top_y\log\Big(\mathbb{S}\big(s\cdot(\cos\theta-m\cdot 1_y)\big)\Big)$$
where $s > 0$ is a scaling coefficient used to improve numerical stability during training.
The margin $m$ is borrowed from a CVPR 2018 paper, "CosFace: Large Margin Cosine Loss for Deep Face Recognition".
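Putting the two modifications together, a minimal sketch of this HE-style loss could look as follows. PyTorch is assumed, and the values of $s$ and $m$ are illustrative placeholders rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def he_margin_cross_entropy(z, W, y, s=15.0, m=0.2):
    """L_CE^m(f~(x), y): cross entropy on scaled, margin-shifted cosine logits."""
    z_tilde = F.normalize(z, dim=1)          # z~ = z / ||z||
    W_tilde = F.normalize(W, dim=0)          # W~_i = W_i / ||W_i|| (columns of W)
    cos_theta = z_tilde @ W_tilde            # (batch, L) matrix of cos(theta_i)
    one_hot = F.one_hot(y, num_classes=W.shape[1]).float()
    logits = s * (cos_theta - m * one_hot)   # s * (cos(theta) - m * 1_y)
    return F.cross_entropy(logits, y)        # -1_y^T log softmax(logits)

# usage: z from the feature extractor, W the (feat_dim x L) weight of the last linear layer
z = torch.randn(4, 128)
W = torch.randn(128, 10)
y = torch.randint(0, 10, (4,))
loss = he_margin_cross_entropy(z, W, y)
```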
Theoretical analysis
First, we define a vector-valued function $\mathbb{U}_p$:
$$\mathbb{U}_p(u)=\arg\max_{\Vert v\Vert_p\leq 1}u^\top v,\quad\text{which satisfies } u^\top\mathbb{U}_p(u)=\Vert u\Vert_q$$
where $\frac{1}{p}+\frac{1}{q}=1$.
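For the two common cases $p=2$ and $p=\infty$, $\mathbb{U}_p$ has a simple closed form. A small NumPy sketch (restricting to these two norms is an assumption of the example) that also checks the identity $u^\top\mathbb{U}_p(u)=\Vert u\Vert_q$:

```python
import numpy as np

def U_p(u, p):
    """arg max over {v : ||v||_p <= 1} of u^T v, for p = 2 or p = inf."""
    if p == 2:
        return u / np.linalg.norm(u)     # unit vector in the direction of u
    if p == np.inf:
        return np.sign(u)                # saturate every coordinate to +-1
    raise NotImplementedError

u = np.random.default_rng(0).normal(size=6)
# identity u^T U_p(u) = ||u||_q with 1/p + 1/q = 1
print(np.isclose(u @ U_p(u, 2), np.linalg.norm(u, 2)))        # p = 2   -> q = 2
print(np.isclose(u @ U_p(u, np.inf), np.linalg.norm(u, 1)))   # p = inf -> q = 1
```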
Lemma 1: Given an adversarial objective loss $\mathcal{L}_A$, let $\mathbf{B}(x)=\{x'\mid\Vert x-x'\Vert_p\leq\varepsilon\}$. Using a first-order Taylor expansion, the solution of $\max_{x'\in \mathbf{B}(x)}\mathcal{L}_A(x')$ is $x^*=x+\varepsilon\, \mathbb{U}_p(\nabla_x\mathcal{L}_A)$. Furthermore, $\mathcal{L}_A(x^*) = \mathcal{L}_A(x) + \varepsilon \Vert \nabla_x\mathcal{L}_A(x) \Vert_q$.
Proof:
Let $x'=x+\varepsilon v$ with $\Vert v\Vert_p\leq 1$.
Then $\mathcal{L}_A(x')=\mathcal{L}_A(x + \varepsilon v)$.
Taking a first-order Taylor expansion around $x$, we obtain $\mathcal{L}_A(x+\varepsilon v) \approx \mathcal{L}_A(x) + \varepsilon v^\top \nabla_x\mathcal{L}_A(x)$.
Hence $\max_{x'\in \mathbf{B}(x)}\mathcal{L}_A(x') = \mathcal{L}_A(x) + \varepsilon \max_{\Vert v\Vert_p\leq 1} v^\top\nabla_x \mathcal{L}_A(x)$.
Here we use a result from the ICML 2019 paper "First-order Adversarial Vulnerability of Neural Networks and Input Dimension", namely $\max_{\delta:\Vert \delta\Vert_p\leq\epsilon} |\partial_x\mathcal{L}\cdot \delta| = \epsilon\Vert\partial_x\mathcal{L} \Vert_q$ with $\frac{1}{p}+\frac{1}{q}=1$.
Lemma 1 thus tells us how much the adversarial example $x^*$ increases the loss $\mathcal{L}_A$, and at the same time gives the direction from $x$ to $x^*$.
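As a sanity check, the first-order relation of Lemma 1 can be verified numerically on a toy smooth loss (a quadratic chosen purely for illustration; NumPy assumed), for both $p=\infty$ ($q=1$) and $p=2$ ($q=2$):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10
A = rng.normal(size=(d, d))
b = rng.normal(size=d)

def loss(x):                      # a smooth toy stand-in for L_A
    return 0.5 * np.sum((A @ x) ** 2) + b @ x

def grad(x):
    return A.T @ (A @ x) + b

x = rng.normal(size=d)
eps = 1e-3

for p, q, U in [(np.inf, 1, np.sign),
                (2, 2, lambda g: g / np.linalg.norm(g))]:
    x_star = x + eps * U(grad(x))                        # x* = x + eps * U_p(grad_x L_A)
    increase = loss(x_star) - loss(x)                    # actual increase of L_A
    predicted = eps * np.linalg.norm(grad(x), ord=q)     # eps * ||grad_x L_A||_q
    print(p, increase, predicted)                        # the two agree up to O(eps^2)
```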
Lemma 2: Let $W_{ij} = W_i-W_j$ be the difference between two weight vectors, and let $z' = z(x';\omega)$ be the feature vector of $x'$. Then
$$\nabla_{x'}\mathcal{L}_{CE}(f(x'),f(x)) = -\sum_{i\neq j}f(x)_i f(x')_j\,\nabla_{x'}(W_{ij}^\top z')$$
Proof:
$$\begin{aligned} -\nabla_{x'}\mathcal{L}_{CE}(f(x'),f(x))&=\nabla_{x'}(f(x)^\top\log f(x'))\\ &=\sum_{i\in [L]} f(x)_i\nabla_{x'}(\log f(x')_i)\\ &=\sum_{i\in [L]}f(x)_i\nabla_{x'}\Big(\log \Big[\frac{\exp(W_i^\top z')}{\sum_{j\in [L]}\exp(W_j^\top z')}\Big]\Big)\\ &=\sum_{i\in [L]}f(x)_i\nabla_{x'}\Big(W_i^\top z' - \log\big(\textstyle\sum_{j\in [L]}\exp(W_j^\top z')\big)\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big(\nabla_{x'}W_i^\top z' - \nabla_{x'}\log\big(\textstyle\sum_{j\in [L]}\exp(W_j^\top z')\big)\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big(\nabla_{x'}W_i^\top z' - \frac{1}{\sum_{j\in [L]}\exp(W_j^\top z')}\nabla_{x'}\big(\textstyle\sum_{j\in [L]}\exp(W_j^\top z')\big)\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big(\nabla_{x'}W_i^\top z' - \frac{1}{\sum_{j\in [L]}\exp(W_j^\top z')}\big(\textstyle\sum_{j\in [L]}\exp(W_j^\top z')\nabla_{x'}(W_j^\top z')\big)\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big(\nabla_{x'}W_i^\top z' - \textstyle\sum_{j\in [L]}f(x')_j\nabla_{x'}(W_j^\top z')\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big((1-f(x')_i)\nabla_{x'}W_i^\top z' - \textstyle\sum_{j\neq i}f(x')_j\nabla_{x'}(W_j^\top z')\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big(\Big(\frac{\sum_{j\neq i}\exp(W_j^\top z')}{\sum_{t\in [L]}\exp(W_t^\top z')}\Big)\nabla_{x'}W_i^\top z' - \textstyle\sum_{j\neq i}f(x')_j\nabla_{x'}(W_j^\top z')\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big(\Big(\frac{\sum_{j\neq i}\exp(W_j^\top z')}{\sum_{t\in [L]}\exp(W_t^\top z')}\Big)\nabla_{x'}W_i^\top z' - \textstyle\sum_{j\neq i}\frac{\exp(W_j^\top z')}{\sum_{t\in [L]}\exp(W_t^\top z')}\nabla_{x'}(W_j^\top z')\Big)\\ &=\sum_{i\in [L]}f(x)_i\Big(\textstyle\sum_{j\neq i}f(x')_j\nabla_{x'}(W_{ij}^\top z')\Big)\\ &=\sum_{i\neq j}f(x)_i f(x')_j\nabla_{x'}(W_{ij}^\top z') \end{aligned}$$
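Lemma 2 is easy to check numerically. Below is a NumPy sketch that compares the claimed expression against finite-difference gradients; the identity feature extractor $z'=x'$ and the absence of a bias term are assumptions made only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 5, 8                      # number of classes, feature dimension
W = rng.normal(size=(d, L))      # last linear layer (no bias, as in the lemma)
x, x_adv = rng.normal(size=d), rng.normal(size=d)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def ce(x_prime):
    # L_CE(f(x'), f(x)) = -f(x)^T log f(x'), with identity feature extractor z' = x'
    return -softmax(W.T @ x) @ np.log(softmax(W.T @ x_prime))

# right-hand side of Lemma 2: -sum_{i != j} f(x)_i f(x')_j (W_i - W_j)
fx, fxp = softmax(W.T @ x), softmax(W.T @ x_adv)
rhs = np.zeros(d)
for i in range(L):
    for j in range(L):
        if i != j:
            rhs -= fx[i] * fxp[j] * (W[:, i] - W[:, j])

# left-hand side via central finite differences
eps = 1e-6
lhs = np.array([(ce(x_adv + eps * e) - ce(x_adv - eps * e)) / (2 * eps)
                for e in np.eye(d)])

print(np.allclose(lhs, rhs, atol=1e-5))   # True
```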
Following Lemma 2, let $y^*$ denote the predicted label of the adversarial example $x^*$, where $y\neq y^*$.
Based on prior observations, the probability of the predicted label (the top-1 probability) is usually much larger than that of all other labels.
Therefore,
$$\nabla_{x'}\mathcal{L}_{CE}(f(x'),f(x))\approx -f(x)_y f(x')_{y^*}\nabla_{x'}(W_{yy^*}^\top z')$$
where $W_{yy^*}=W_y-W_{y^*}$.
Let $\theta_{yy^*}'=\angle(W_{y y^*},z')$, so that $W_{y y^*}^\top z'=\Vert W_{y y^*}\Vert\,\Vert z'\Vert \cos(\theta_{y y^*}')$; note that $W_{y y^*}$ does not depend on $x'$.
Thus, in each attack iteration the increment applied to $x$ is
$$\mathbb{U}_p[\nabla_{x'}\mathcal{L}_{CE}(f(x'),f(x))]\approx-\mathbb{U}_p[\nabla_{x'}(\Vert z'\Vert \cos(\theta_{y y^*}'))]$$
The method introduced above enforces $\Vert z'\Vert = 1$, so the generated adversarial examples move more directly toward the decision boundary.
As shown in the figure above, $\Vert z'\Vert$ otherwise affects the update direction, which weakens the generated adversarial examples and in turn limits the effectiveness of adversarial training.
Experimental analysis
First, white-box attacks are evaluated on CIFAR-10.
We can see that adding HE improves the defense in most cases, with a slight drop in a few of them.
Next come the experiments on ImageNet.
Compared with FreeAT, the gain in robustness is clear.