
A Detailed Explanation of Code for Detecting Adversarial Examples with Fisher Information

2022-07-04 13:54:00 PaperWeekly


PaperWeekly original · Author | guiguzi


Introduction

The previous article, "The Application of Fisher Information in Adversarial Examples", elaborated on the application of Fisher information to adversarial attack, defense, and detection, and analyzed three representative papers. Fisher information is an excellent mathematical tool for probing the underlying causes of adversarial behavior in deep learning models.

This article is a deep dive into the code of one Fisher-information-based paper on detecting adversarial examples, "Inspecting adversarial examples using the Fisher information". The paper proposes three indicators for detecting adversarial examples: the trace of the Fisher information matrix, the Fisher information quadratic form, and the Fisher information sensitivity. This article supplies the intermediate derivations for results the paper states directly, and explains the important implementation details of the code in the corresponding sections.


Fisher Information Matrix Trace

Given an input sample $x$, the output of the neural network is a $K$-dimensional probability vector $p(x;\theta) = (p_1,\dots,p_K)$, where $\theta$ denotes the parameters of the network. The continuous and discrete forms of the Fisher information matrix with respect to $\theta$ are as follows:

$$\mathcal{F} = \int p(y\mid x;\theta)\,\nabla_\theta\log p(y\mid x;\theta)\,\nabla_\theta\log p(y\mid x;\theta)^{\top}\,\mathrm{d}y,\qquad \mathcal{F} = \sum_{k=1}^{K} p_k\,\nabla_\theta\log p_k\,\nabla_\theta\log p_k^{\top}$$

It can be seen that $\mathcal{F}\in\mathbb{R}^{|\theta|\times|\theta|}$ and that $\mathcal{F}$ is symmetric positive semi-definite. Note that even for a very small neural network, computing the full Fisher information matrix is already costly, since it has $|\theta|^2$ entries, let alone for networks that routinely have hundreds of millions of parameters. Because the purpose of the paper is only to detect adversarial examples, there is no need to compute every exact entry of the Fisher information matrix; a scalar summary of the Fisher information of a given sample is enough to serve as a detection indicator. The paper therefore uses the trace of the Fisher information matrix as the detection index, computed as follows:

$$\operatorname{tr}(\mathcal{F}) = \sum_{k=1}^{K} p_k\,\big\|\nabla_\theta\log p_k\big\|^2 = \sum_{k=1}^{K} \nabla_\theta\log p_k\cdot\nabla_\theta p_k$$

Theoretical analysis and practical programming always differ somewhat. The derivation above treats all the weights of the network as a single flat parameter vector, whereas in actual code the parameters are organized layer by layer; for computing the Fisher information, the two views are equivalent. Suppose a neural network has four hidden layers with parameters $\theta_1,\theta_2,\theta_3,\theta_4$. The corresponding parameters and gradients are as follows:

$$\theta = (\theta_1,\theta_2,\theta_3,\theta_4),\qquad \nabla_\theta\log p_k = \big(\nabla_{\theta_1}\log p_k,\ \nabla_{\theta_2}\log p_k,\ \nabla_{\theta_3}\log p_k,\ \nabla_{\theta_4}\log p_k\big)$$

It follows that the traces of the Fisher information matrix in the two cases are equal:

$$\operatorname{tr}(\mathcal{F}) = \sum_{k=1}^{K}\nabla_\theta\log p_k\cdot\nabla_\theta p_k = \sum_{k=1}^{K}\sum_{l=1}^{4}\nabla_{\theta_l}\log p_k\cdot\nabla_{\theta_l} p_k$$

At this point it can be seen that computing the trace of the Fisher information matrix by back-propagation costs on the order of $K\,|\theta|$ operations (one backward pass per output class), far less than the $O(|\theta|^2)$ cost of computing the full Fisher information matrix.
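
The paired-gradient identity used for the trace can be verified numerically. Below is a minimal sketch (illustrative only, not the paper's code; the tiny softmax "model" and all variable names are assumptions) that builds the Fisher information matrix explicitly for a handful of parameters and compares its trace against the sum of paired gradients:

import torch

# Minimal sketch (illustrative names): build F = sum_k p_k g_k g_k^T explicitly
# for a tiny "model" and check tr(F) = sum_k (grad log p_k) . (grad p_k).
torch.manual_seed(0)
theta = torch.randn(6, requires_grad=True)   # flattened "parameter vector"
x = torch.randn(6)
p = torch.softmax(theta * x, dim=0)          # a deliberately tiny 6-class output

log_grads, prob_grads = [], []
for k in range(p.shape[0]):
    g_log, = torch.autograd.grad(torch.log(p[k]), theta, retain_graph=True)
    g_p, = torch.autograd.grad(p[k], theta, retain_graph=True)
    log_grads.append(g_log)
    prob_grads.append(g_p)

# Full FIM, built explicitly (only feasible because theta is tiny)
F = sum(p[k].detach() * torch.outer(g, g) for k, g in enumerate(log_grads))
trace_direct = F.trace()
trace_paired = sum((gl * gp).sum() for gl, gp in zip(log_grads, prob_grads))
print(torch.allclose(trace_direct, trace_paired))  # True

Because the explicit matrix has $|\theta|^2$ entries, this direct construction is only feasible at toy sizes, which is exactly why the paper works with the trace instead.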


Fisher Information Quadratic Form

The trace of the matrix $\mathcal{F}$ can be written as $\operatorname{tr}(\mathcal{F})=\sum_i e_i^{\top}\mathcal{F}e_i$, where $e_i$ is the $i$-th unit vector, i.e., its $i$-th element is $1$ and all the other elements are $0$. The trace can thus be understood as an aggregate of the divergence induced by changing each parameter individually. Inspired by this, instead of averaging over a complete orthogonal basis, the authors can choose one specific direction to measure, which gives the following quadratic form:

$$v^{\top}\mathcal{F}v = \sum_{k=1}^{K} p_k\,(v\cdot\nabla_\theta\log p_k)^2 = \sum_{k=1}^{K}(v\cdot\nabla_\theta\log p_k)(v\cdot\nabla_\theta p_k)$$

where the chosen vector $v$ depends on the parameters $\theta$ and the data point $x$; in the code below it is taken to be the gradient, with respect to $\theta$, of the network's largest output:

$$v = \nabla_\theta \log p_{\hat{k}}(x;\theta),\qquad \hat{k} = \arg\max_{k} p_k(x;\theta)$$

When $v$ is normalized, the quadratic form becomes:

$$\hat{v}^{\top}\mathcal{F}\hat{v},\qquad \hat{v} = \frac{v}{\|v\|}$$

Note that the chosen direction is not unique. The maximum value of the quadratic form over unit vectors is the largest eigenvalue of the Fisher matrix, attained when the direction is the corresponding eigenvector. It should also be pointed out that the trace of the Fisher matrix is never smaller than its largest eigenvalue, which is proved as follows:

$$\operatorname{tr}(\mathcal{F}) = \operatorname{tr}(U\Lambda U^{\top}) = \operatorname{tr}(\Lambda U^{\top}U) = \operatorname{tr}(\Lambda) = \sum_i \lambda_i \ \ge\ \lambda_{\max},\qquad \lambda_i \ge 0$$

where $\Lambda$ is the diagonal matrix of eigenvalues of $\mathcal{F}$ and $U$ is an orthonormal matrix; the inequality holds because $\mathcal{F}$ is positive semi-definite, so every eigenvalue is non-negative. In actual programming, to simplify the computation, a finite-difference estimate replaces the back-propagated gradient. By Taylor's formula:

$$\log p_k(x;\theta\pm\epsilon v) = \log p_k(x;\theta) \pm \epsilon\, v\cdot\nabla_\theta\log p_k(x;\theta) + O(\epsilon^2)$$

Hence:

$$v\cdot\nabla_\theta\log p_k \approx \frac{\log p_k(\theta+\epsilon v)-\log p_k(\theta-\epsilon v)}{2\epsilon},\qquad v^{\top}\mathcal{F}v \approx \sum_{k=1}^{K}\frac{\big[\log p_k(\theta+\epsilon v)-\log p_k(\theta-\epsilon v)\big]\big[p_k(\theta+\epsilon v)-p_k(\theta-\epsilon v)\big]}{4\epsilon^2}$$
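
Both facts used above can be checked numerically: that the trace of a positive semi-definite matrix bounds its largest eigenvalue, and that a central difference approximates a directional derivative to $O(\epsilon^2)$. A minimal sketch (illustrative; the random matrix and the stand-in function f are assumptions, not the paper's code):

import torch

torch.manual_seed(0)

# (1) trace >= largest eigenvalue for a PSD matrix (random stand-in for F)
A = torch.randn(8, 8)
F = A @ A.T                                           # symmetric positive semi-definite
eigenvalues = torch.linalg.eigvalsh(F)
print(F.trace().item() >= eigenvalues.max().item())   # True: trace = sum of nonneg eigenvalues

# (2) central difference vs. autograd for a directional derivative v . grad f(theta)
theta = torch.randn(5, requires_grad=True)
v = torch.randn(5)
eps = 1e-3
f = lambda t: torch.log_softmax(t * t, dim=0)[0]      # arbitrary smooth scalar stand-in
with torch.no_grad():
    fd = (f(theta + eps * v) - f(theta - eps * v)) / (2 * eps)
g, = torch.autograd.grad(f(theta), theta)
print(fd.item(), (g @ v).item())                      # agree up to O(eps^2)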


Fisher Information Sensitivity

To extract further usable Fisher information, the authors inject a single scalar random variable $z$ into the input sample, that is:

$$\tilde{x} = x + \epsilon\, z\, \delta$$

where $z$ is a scalar random variable and the direction $\delta$ has the same dimensions as $x$. For this perturbed input, the Fisher information matrix can be written as:

$$\mathcal{F}(\tilde{x}) = A(\tilde{x})^{\top}A(\tilde{x})$$

where the element in the $k$-th row and $i$-th column of the matrix $A(x)$ can be expressed as:

$$A(x)_{ki} = \sqrt{p_k(x;\theta)}\;\frac{\partial \log p_k(x;\theta)}{\partial \theta_i}$$

Likewise, the element in the $k$-th row and $i$-th column of $A(\tilde{x})$ is:

$$A(\tilde{x})_{ki} = \sqrt{p_k(\tilde{x};\theta)}\;\frac{\partial \log p_k(\tilde{x};\theta)}{\partial \theta_i}$$

Taylor-expanding about $x$ then gives:

$$A(\tilde{x}) = A(x) + \epsilon\, z\, B(x) + O(\epsilon^2)$$

where the matrix $B(x)$ in the second term of the above formula can be expressed as:

$$B(x)_{ki} = \delta\cdot\nabla_x A(x)_{ki} = \sum_{r}\delta_r\,\frac{\partial}{\partial x_r}\left[\sqrt{p_k}\,\frac{\partial \log p_k}{\partial \theta_i}\right]$$

Since $z$ is a random variable with mean $0$ and variance $1$, taking the expectation over $z$ gives:

$$\mathbb{E}_z\big[\mathcal{F}(\tilde{x})\big] = A^{\top}A + \epsilon\,\mathbb{E}[z]\,(A^{\top}B + B^{\top}A) + \epsilon^2\,\mathbb{E}[z^2]\,B^{\top}B + O(\epsilon^3) = \mathcal{F}(x) + \epsilon^2 B^{\top}B + O(\epsilon^3)$$

Combining the derivations above:

$$\mathbb{E}_z\big[\mathcal{F}(\tilde{x})\big] - \mathcal{F}(x) \approx \epsilon^2\, B(x)^{\top}B(x)$$

which finally yields the same result as in the paper:

$$\Big(\mathbb{E}_z\big[\mathcal{F}(\tilde{x})\big] - \mathcal{F}(x)\Big)_{ij} \approx \epsilon^2\sum_{k=1}^{K}\big(\delta\cdot\nabla_x A_{ki}\big)\big(\delta\cdot\nabla_x A_{kj}\big)$$

As with the Fisher matrix quadratic form of the previous section, the authors also evaluate the quadratic form of the perturbed sample's Fisher matrix, which gives:

$$\hat{v}^{\top}\,\mathbb{E}_z\big[\mathcal{F}(\tilde{x})\big]\,\hat{v} \approx \hat{v}^{\top}\mathcal{F}(x)\,\hat{v} + \epsilon^2\,\big\|B(x)\,\hat{v}\big\|^2$$

where:

$$\big(B(x)\,\hat{v}\big)_k = \delta\cdot\nabla_x\Big[\sqrt{p_k}\,\big(\hat{v}\cdot\nabla_\theta\log p_k\big)\Big]$$

Suppose the given perturbation direction $\delta$ is a unit vector, namely $\delta = e_r$, which singles out the $r$-th input node. Using, as before, central finite differences in place of back-propagated gradients in the actual implementation, we obtain:

$$\mathrm{FIS}_r \approx \sum_{k=1}^{K}\frac{\big[\partial_{x_r}\log p_k(\theta+\epsilon\hat{v}) - \partial_{x_r}\log p_k(\theta-\epsilon\hat{v})\big]\,\big[\partial_{x_r} p_k(\theta+\epsilon\hat{v}) - \partial_{x_r} p_k(\theta-\epsilon\hat{v})\big]}{4\epsilon^2}$$

The above formula is called the Fisher information sensitivity (FIS); it is mainly used to evaluate the importance of individual input nodes.


Code Example

Code for the trace of the Fisher information matrix, the Fisher information quadratic form, and the Fisher information sensitivity is given below, together with the experimental output. It corresponds section by section to the principles introduced above, which should make the implementation details of each indicator easier to follow.

import torch
import torch.nn.functional as F
from copy import deepcopy


class FISHER_OPERATION(object):
        def __init__(self, input_data, network, vector, epsilon = 1e-3):
                self.input = input_data
                self.network = network
                self.vector = vector
                self.epsilon = epsilon

        # Computes the fisher matrix quadratic form along the specific vector
        def fisher_quadratic_form(self):
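                # Note: perturbing one layer at a time computes the block-diagonal sum
                # over layers, sum_l v_l^T F_ll v_l; cross-layer terms of v^T F v are omitted.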
                fisher_sum = 0
                ## Computes the gradient of parameters of each layer
                for i, parameter in enumerate(self.network.parameters()):
                        ## Store the original parameters
                        store_data = deepcopy(parameter.data)
                        parameter.data += self.epsilon * self.vector[i]
                        log_softmax_output1 = self.network(self.input)
                        softmax_output1 = F.softmax(log_softmax_output1, dim=1)
                        parameter.data -= 2 * self.epsilon * self.vector[i]
                        log_softmax_output2 = self.network(self.input)
                        softmax_output2 = F.softmax(log_softmax_output2, dim=1)
                        parameter.data = store_data
                        # Finite-difference estimate: (v_l . grad log p)(v_l . grad p), summed over classes
                        fisher_sum += (((log_softmax_output1 - log_softmax_output2)/(2 * self.epsilon))*((softmax_output1 - softmax_output2)/(2 * self.epsilon))).sum()
                return fisher_sum


        # Computes the fisher matrix trace
        def fisher_trace(self):
                fisher_trace = 0
                output = self.network(self.input)
                output_dim = output.shape[1]
                parameters = self.network.parameters()
                ## Computes the gradient of parameters of each layer
                for parameter in parameters:
                        for j in range(output_dim):
                                self.network.zero_grad()
                                log_softmax_output = self.network(self.input)
                                log_softmax_output[0,j].backward()
                                log_softmax_grad = parameter.grad.clone()  # clone: zero_grad() below may clear .grad in place
                                self.network.zero_grad()
                                softmax_output = F.softmax(self.network(self.input), dim=1)
                                softmax_output[0,j].backward()
                                softmax_grad = parameter.grad
                                fisher_trace += (log_softmax_grad * softmax_grad).sum()
                return fisher_trace


        # Computes fisher information sensitivity for x and v.
        def fisher_sensitivity(self):
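                # Returns a tensor shaped like the input x: for each input node, a finite-difference
                # estimate of sum over classes k and layers l of
                # (v_l . grad_theta d/dx log p_k)(v_l . grad_theta d/dx p_k).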
                output = self.network(self.input)
                output_dim = output.shape[1]
                parameters = self.network.parameters()
                x = deepcopy(self.input.data)
                x.requires_grad = True
                fisher_sum = 0
                for i, parameter in enumerate(parameters):
                        for j in range(output_dim):
                                store_data = deepcopy(parameter.data)
                                # plus eps
                                parameter.data += self.epsilon * self.vector[i]
                                log_softmax_output1 = self.network(x)
                                log_softmax_output1[0,j].backward()
                                new_plus_log_softmax_grad = deepcopy(x.grad.data)
                                x.grad.zero_()
                                self.network.zero_grad()
                                softmax_output1 = F.softmax(self.network(x), dim=1)
                                softmax_output1[0,j].backward()
                                new_plus_softmax_grad = deepcopy(x.grad.data)
                                x.grad.zero_()
                                self.network.zero_grad()
                                # minus eps
                                parameter.data -= 2 * self.epsilon * self.vector[i]
                                log_softmax_output2 = self.network(x)
                                log_softmax_output2[0,j].backward()
                                new_minus_log_softmax_grad = deepcopy(x.grad.data)
                                x.grad.zero_()
                                self.network.zero_grad()
                                softmax_output2 = F.softmax(self.network(x), dim=1)
                                softmax_output2[0,j].backward()
                                new_minus_softmax_grad = deepcopy(x.grad.data)
                                x.grad.zero_()
                                self.network.zero_grad()
                                # reset and evaluate
                                parameter.data = store_data
                                fisher_sum += 1/(2 * self.epsilon)**2 * ((new_plus_log_softmax_grad - new_minus_log_softmax_grad)*(new_plus_softmax_grad - new_minus_softmax_grad))
                return fisher_sum

import torch
import torch.nn as nn
import fisher  # assumes the FISHER_OPERATION class above is saved as fisher.py

network = nn.Sequential(
                nn.Linear(15,4),
                nn.Tanh(),
                nn.Linear(4,3),
                nn.LogSoftmax(dim=1)
    )
epsilon = 1e-3
input_data = torch.randn((1,15))
network.zero_grad()
output = network(input_data).max()
output.backward()
vector = []
for parameter in network.parameters():
    vector.append(parameter.grad.clone())

FISHER = fisher.FISHER_OPERATION(input_data, network, vector, epsilon)
print("The fisher matrix quadratic form:", FISHER.fisher_quadratic_form())
print("The fisher matrix trace:", FISHER.fisher_trace())
print("The fisher information sensitivity:", FISHER.fisher_sensitivity())

(Output screenshot: the three printed indicator values for the random input above.)
