
Paper reading report

2022-07-05 06:23:00 mentalps

2022/6/23 – 2022/6/25

1. FLAME: Taming Backdoors in Federated Learning

1.1 Contributions of this article

  1. We propose FLAME, a defense framework against backdoor attacks in FL that eliminates backdoors without degrading the benign performance of the aggregated model. In contrast to earlier backdoor defenses, FLAME applies to a generic adversary model: it neither relies on strong assumptions about the adversary's attack strategy nor on the underlying data distribution of benign and adversarial datasets.
  2. We show that the amount of Gaussian noise required can be fundamentally reduced by (a) applying our clustering method to remove potentially malicious model updates, and (b) clipping the weights of the local models to an appropriate level, limiting the impact of any single (especially malicious) model on the aggregated model.
  3. We provide a proof of the noise bound, i.e., the amount of injected (DP-inspired) Gaussian noise required to eliminate the contribution of the backdoor.
  4. We extensively evaluate the defense framework on real-world datasets from three very different application domains. We show that FLAME reduces the amount of noise required, so the benign performance of the aggregated model is not significantly degraded, which is an important advantage over state-of-the-art defenses that directly inject DP noise.

1.2 Problem setting and goals

Backdoor characterization:
[Figure: deviations of backdoored model updates from the global model]
Benign models: models trained honestly by benign clients;
Backdoored models: models manipulated to carry a backdoor;
Deviations of backdoored models: how the backdoored models deviate from the benign ones;
$G_{t-1}$: the global model from the previous round, relative to which the deviations of the local models are measured;
$W_{1}', W_{2}', W_{3}'$: three models representing three different backdoor attacks;
Defense goals:
In the FL setting, a general defense that effectively mitigates backdoor attacks must achieve the following goals: (i) Effectiveness: to prevent the adversary from reaching its attack goal, the influence of backdoored model updates must be eliminated, so that the aggregated global model exhibits no backdoor behavior. (ii) Performance: the benign performance of the global model must be preserved to keep it useful. (iii) Independence from data distribution and attack strategy: the defense must apply to a generic adversary model, i.e., it must not require prior knowledge of the backdoor attack method, nor assumptions about the specific data distribution of the local clients (for example, whether the data is iid or non-iid).

1.3 FLAME overview and design

Motivation:
Earlier work removes backdoors from the aggregated model with differential-privacy-inspired noise, determining a sufficient noise level empirically. In the FL setting this is challenging, because the aggregator usually cannot be assumed to have access to training data, let alone to the poisoned datasets. A general method is therefore needed to determine how much noise suffices to effectively remove the backdoor. On the other hand, the more noise is injected into the model, the more its benign performance suffers.
FLAME overview:
FLAME estimates the noise level required to remove backdoors in the FL setting without extensive empirical evaluation and without access to training data. In addition, to effectively limit the amount of noise required, FLAME uses a novel clustering-based method to identify and remove adversarial model updates with high impact, and applies dynamic weight clipping to limit the influence of models the adversary has scaled up. As described in §3, we cannot guarantee that all backdoored models are detected, because the adversary fully controls the angular and magnitude deviation of its models and can make them arbitrarily hard to detect. Our clustering method therefore aims to remove the models with high attack impact (large angular deviation), not all malicious models. Figure 3 illustrates the high-level concept of FLAME: filtering, clipping, and noising. We emphasize, however, that each of these components must be applied very carefully, because a naive combination of noising, clustering, and clipping easily fails to mitigate the backdoor and/or worsens the benign performance of the model.
FLAME design:
[Figure: FLAME design overview]

FLAME measures the angular difference between all model updates with pairwise cosine distance and applies the HDBSCAN clustering algorithm. The advantage is that even if the adversary scales up its model update to amplify its influence, the cosine distance is unaffected, because scaling does not change the angle between the update weight vectors. The HDBSCAN algorithm clusters the models based on the density of the cosine-distance distribution and determines the required number of clusters dynamically.
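The scale-invariance that motivates using cosine distance can be checked with a minimal sketch (the update vectors below are made-up illustrative values, not taken from the paper):

```python
import math

def cosine_distance(u, v):
    """Cosine distance: 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

# Illustrative update vectors.
benign = [0.1, -0.2, 0.05]
malicious = [0.3, 0.1, -0.4]
boosted = [10.0 * x for x in malicious]  # adversary scales up its update

# Scaling changes the magnitude but not the angle, so the cosine
# distance to the benign update stays the same.
d_plain = cosine_distance(benign, malicious)
d_boosted = cosine_distance(benign, boosted)
```

This is why boosting an update (a common backdoor amplification trick) does not help the adversary hide from cosine-distance-based clustering, whereas it would distort Euclidean-distance-based comparisons.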
Steps:
1. The server collects the $n$ user models.
2. Compute the pairwise cosine distances between the $n$ models.
3. Cluster the pairwise cosine distances with the dynamic clustering algorithm HDBSCAN; the cluster containing more than 50% of the models is treated as benign updates. All other models are treated as outliers and removed, leaving $L$ accepted models.
4. For each of the $n$ models, compute its Euclidean distance to the current global model, $(e_{1}, e_{2}, \dots, e_{n})$, and let $S_{t}$ be the median of these distances.
5. For each of the $L$ models kept by the clustering step in this round, set the dynamic adaptive clipping factor $\gamma = S_{t}/e_{l}$.
6. Compute the clipped local models $W_{l} = G_{t-1} + (W_{l} - G_{t-1}) \cdot \min(1, \gamma)$.
7. Aggregate the clipped local models with equal weights to obtain the global model $G_{t}$.
8. From the differences (distances) between the local models, derive the dynamic adaptive noise level $\sigma = \lambda \cdot S_{t}$, where the hyperparameter $\lambda$ is a noise-level factor set empirically.
9. Add noise to obtain the final global model $G_{t} = G_{t} + N(0, \sigma^{2})$.
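The clipping and noising stages of the steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: models are plain Python lists, the clustering of steps 2–3 is assumed to have already produced the `accepted` index set, $S_t$ is taken as the median distance, and `lam` stands for the noise factor $\lambda$:

```python
import math
import random

def l2_dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def flame_aggregate(global_prev, local_models, accepted, lam=0.001, seed=0):
    """Clip the accepted local models toward the previous global model,
    average them, and add Gaussian noise (steps 4-9 above)."""
    rng = random.Random(seed)
    # Step 4: Euclidean distance of every model to the previous global
    # model; S_t is the median distance and serves as the clipping bound.
    e = [l2_dist(w, global_prev) for w in local_models]
    s_t = sorted(e)[len(e) // 2]
    # Steps 5-6: clip each accepted model by gamma = S_t / e_l.
    clipped = []
    for l in accepted:
        gamma = min(1.0, s_t / e[l]) if e[l] > 0 else 1.0
        clipped.append([g + (w - g) * gamma
                        for w, g in zip(local_models[l], global_prev)])
    # Step 7: equal-weight aggregation.
    g_t = [sum(col) / len(clipped) for col in zip(*clipped)]
    # Steps 8-9: Gaussian noise with sigma = lambda * S_t.
    sigma = lam * s_t
    return [x + rng.gauss(0.0, sigma) for x in g_t]

# Toy example: two well-behaved updates and one heavily scaled one.
prev = [0.0, 0.0]
models = [[1.0, 0.0], [0.9, 0.1], [5.0, -5.0]]
aggregated = flame_aggregate(prev, models, accepted=[0, 1])
```

Even if a scaled model slipped past clustering, the $\min(1, \gamma)$ clipping bounds its distance from $G_{t-1}$ by the median distance $S_t$, which is what keeps the required noise level low.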

2022/6/27

1 The Limitations of Federated Learning in Sybil Settings

The main idea:

1. For each client $i$, the server maintains its aggregated update history $H_{i} = \sum_{t}\Delta_{i,t}$.
2. Identify the indicative features of the updates.
3. Compute the pairwise cosine similarity $cs_{i,j}$ between the indicative features of the clients' update histories, and let $v_{i}$ be the maximum cosine similarity between client $i$ and all other clients.
4. After the pairwise cosine similarities are computed, for each client $i$: if there exists another client $j$ with $v_{j} > v_{i}$, rescale $cs_{i,j}$ to $cs_{i,j} \cdot v_{i}/v_{j}$. This re-weighting further reduces the similarity between honest clients and malicious clients while leaving the similarity among malicious clients essentially unchanged, thereby reducing false positives.
5. Compute each client's learning rate $a_{i} = 1 - \max_{j}(cs_{i,j})$, then divide by the maximum of the $a_{i}$ to normalize them into the range 0 to 1.
6. Apply a logit function centered at 0.5 to each learning rate $a_{i}$, so that rates close to 0 and 1 are pushed further apart.
7. Multiply each client's model update by its final learning rate, aggregate the weighted updates, and use the result to update the global model.
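Steps 3–6 (similarity, pardoning, normalization, and logit scaling) can be sketched as below. The `histories` vectors stand in for the indicative features of each client's update history; `kappa` and `eps` are hypothetical knobs for the logit scaling and numerical clamping, not names from the paper:

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sybil_learning_rates(histories, kappa=1.0, eps=1e-6):
    """Per-client learning rates from the pairwise similarity of
    update histories (steps 3-6 above)."""
    n = len(histories)
    # Step 3: pairwise cosine similarities; v_i is client i's maximum.
    cs = [[cos_sim(histories[i], histories[j]) if i != j else 0.0
           for j in range(n)] for i in range(n)]
    v = [max(row) for row in cs]
    # Step 4: "pardoning" -- rescale cs[i][j] when j looks more suspicious.
    for i in range(n):
        for j in range(n):
            if i != j and v[j] > v[i]:
                cs[i][j] *= v[i] / v[j]
    # Step 5: learning rates, normalized into [0, 1].
    a = [1.0 - max(row) for row in cs]
    m = max(a) or eps
    a = [x / m for x in a]
    # Step 6: logit centered at 0.5 spreads rates near 0 and 1 apart.
    rates = []
    for x in a:
        x = min(max(x, eps), 1.0 - eps)  # keep the logit finite
        y = kappa * (math.log(x / (1.0 - x)) + 0.5)
        rates.append(min(max(y, 0.0), 1.0))
    return rates
```

Two sybils that submit near-identical histories end up with similarity near 1 and therefore a learning rate near 0, while a client whose history points in a distinct direction keeps a rate near 1.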

2022/7/1

1 Defending Against Backdoors in Federated Learning with Robust Learning Rate

The main idea:

  1. The server collects the local updates of the $n$ users.
  2. For each dimension $j$, the server takes the sign of the $j$-th coordinate of each of the $n$ updates and sums the results: $\sum_{k}\mathrm{sgn}(\Delta_{j}^{k})$.
  3. Set a learning-threshold hyperparameter $\theta$; if $|\sum_{k}\mathrm{sgn}(\Delta_{j}^{k})| < \theta$, then $\eta_{j} = -\eta$, otherwise $\eta_{j} = \eta$.
  4. Aggregate the local updates with any aggregation function; the only difference is that the learning rate of each parameter is not the fixed $\eta$ but the $\eta_{j}$ obtained above.
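The per-dimension sign-voting rule can be sketched as follows, with a FedAvg-style mean assumed as the base aggregation function; `eta` and `theta` correspond to the learning rate $\eta$ and the vote threshold $\theta$:

```python
def robust_lr_aggregate(global_model, updates, eta=0.1, theta=2):
    """Per-dimension sign voting (steps 2-4 above): flip the learning
    rate on dimensions where too few clients agree on the sign."""
    def sgn(x):
        return (x > 0) - (x < 0)
    new_model = []
    for j in range(len(global_model)):
        vote = sum(sgn(u[j]) for u in updates)           # sum of signs
        eta_j = eta if abs(vote) >= theta else -eta      # robust learning rate
        avg = sum(u[j] for u in updates) / len(updates)  # FedAvg-style mean
        new_model.append(global_model[j] + eta_j * avg)
    return new_model
```

On dimensions where clients agree, the update is applied normally; on contested dimensions, the negated learning rate pushes the model away from the (likely backdoored) direction rather than toward it.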

Copyright notice
This article was created by [mentalps]. When reposting, please include a link to the original:
https://yzsam.com/2022/186/202207050618520834.html