CCVR: Easing Heterogeneous Federated Learning with Classifier Calibration
2022-08-05 09:12:00 【I love computer vision】
This post covers the NeurIPS 2021 paper "No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data", which calibrates the classifier of a model trained by federated learning on non-IID data.
Paper link: https://arxiv.org/abs/2106.05001
01
Background and Overview
Motivation: the authors set out to tackle the non-IID problem, because non-IID data distributions can significantly degrade global model performance. Existing schemes mainly add regularization to local optimization or improve model aggregation on the server. These are effective, but they lack a deep understanding of how data heterogeneity affects each layer of a deep classification model. The authors first analyze experimentally the representations learned by different layers and find that: 1. the classifier layer of the model is more biased than the other layers; 2. classification performance can be improved significantly by calibrating the classifier after federated training.
Based on these findings, the authors propose CCVR, which tunes (retrains) the classifier with virtual representations sampled from an approximate Gaussian mixture model, greatly improving global model performance.
Federated learning usually trains on non-IID data. Because users behave differently, the local data of different clients can be highly heterogeneous, which makes training unstable and convergence slow.
Existing methods fall roughly into four categories:
Client drift mitigation: modify the clients' local objectives so that the local models stay consistent with the global model to some extent (most methods add a regularization term);
Aggregation schemes: improve the model fusion mechanism on the server;
Data sharing: bring in public datasets or synthetic data to help build a more balanced data distribution on the clients or the server;
Personalized federated learning: train personalized models for individual clients instead of one shared global model.
However, existing algorithms still fall short of good performance. To find out why, the authors study each layer of the deep neural network experimentally. Specifically, they measure the centered kernel alignment (CKA) similarity between representations of the same layer from different clients' local models. Comparing layers across client models, they find that the classifier has the lowest feature similarity across local models.
Based on this finding, the authors further study how the classifier weights change and confirm that the classifier tends to be biased toward specific classes, which is the main cause of performance degradation in classification. They then verify experimentally that retraining (calibrating) the classifier after training is especially useful: a small amount of IID data significantly improves classification accuracy. That approach raises privacy concerns, however, so it cannot be deployed directly in practice.
The authors therefore propose a new privacy-preserving method, Classifier Calibration with Virtual Representations (CCVR), which corrects the decision boundary (the classifier) of a deep network after federated training. CCVR generates virtual representations from an approximate Gaussian mixture model (GMM) in the feature space of the learned feature extractor and then recalibrates the classifier with them.
In summary, the paper makes the following contributions:
The first systematic study of the hidden feature representations of different layers of neural networks (NNs) trained with FedAvg on non-IID data;
Showing that the classifier is the main cause of the performance degradation of neural networks trained on non-IID data;
Proposing CCVR, a simple and general classifier calibration algorithm for federated learning. It does not transmit representations of the raw data, so it raises no additional privacy concerns.
02
The CCVR Method
Retraining the classifier with virtual features
2.1 Understanding Classifier Bias
To understand how non-IID data affects classification models in federated learning, the authors run an experimental study on heterogeneous local models, using CIFAR-10 with 10 clients and a 7-layer convolutional neural network, with the data partitioned by a Dirichlet distribution (α=0.1). Specifically, for each layer of the model, the recently proposed centered kernel alignment (CKA) measures the similarity of the output features of two local models given the same test inputs. The models are trained with FedAvg for 100 communication rounds, and each client optimizes for 10 local epochs per round.
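As a rough illustration of the similarity measure, here is a minimal sketch of linear CKA between two feature matrices computed on the same inputs. The linear-kernel variant and the function name `linear_cka` are assumptions for illustration; the post does not say which kernel the authors used.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, n_features)."""
    # Center every feature dimension over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(200, 16))              # toy features of one model
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
feats_rotated = feats_a @ rotation                # same representation, rotated
feats_other = rng.normal(size=(200, 16))          # unrelated representation

cka_same = linear_cka(feats_a, feats_rotated)
cka_diff = linear_cka(feats_a, feats_other)
```

Linear CKA is invariant to orthogonal transformations, so the rotated copy scores close to 1 while the unrelated features score much lower. In the study above, the two matrices would be the outputs of the same layer in two clients' local models.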
Figure 1 below shows the CKA feature similarity of three different layers across the local models. Features output by deeper layers show lower CKA similarity. This indicates that, for federated models trained on non-IID data, deeper layers are more heterogeneous across clients.
Figure 1: CKA similarity
Averaging the CKA similarities gives a single value that approximates how similar each layer's output features are across clients. Figure 2 below shows this approximate layer-wise feature similarity. Compared with models trained on IID data, models trained on non-IID data have consistently lower feature similarity in every layer. Moreover, for non-IID training, the classifier shows the lowest feature similarity of all layers. The low CKA similarity of the classifier layer indicates that the local classifiers change greatly to fit their local data distributions.
Figure 2: Layer-wise feature similarity
Next, the authors analyze the L2 norms of the local classifier weight vectors, shown in Figure 3 below. The first panel of Figure 3 shows the label distribution across clients. In the initial training phase, the classifier's weight norms are biased toward classes with more training samples. By the end of training, models trained on non-IID data suffer from a much heavier bias than models trained on IID data.
Figure 3: L2 norms of the local classifier weight vectors
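The quantity plotted in this analysis is simple to compute. The sketch below assumes the classifier is a single linear layer whose weight matrix has one row per class; the toy weights are made up for illustration.

```python
import numpy as np

def class_weight_norms(classifier_weight: np.ndarray) -> np.ndarray:
    """L2 norm of each class's weight vector; input shape (n_classes, n_features)."""
    return np.linalg.norm(classifier_weight, axis=1)

# Hypothetical 3-class classifier whose first class dominated local training.
w = np.array([[3.0, 4.0],
              [0.6, 0.8],
              [0.3, 0.4]])
norms = class_weight_norms(w)
```

A large spread between the per-class norms, as in this toy matrix, is the kind of bias the analysis above looks for.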
From these observations the authors hypothesize that, because the classifier is the layer closest to the local label distribution, it is easily biased toward the heterogeneous local data, which shows up as low feature similarity between different local classifiers and as biased weight L2 norms. They therefore argue that debiasing the classifier can directly improve classification performance.
2.2 Regularization and Calibration Methods for Classifiers
To remove the classifier's bias effectively, the authors consider the following regularization and calibration methods:
Classifier Weight L2-normalization: the weight vectors of the classifier are L2-normalized during both training and inference (Formula 1);
Classifier Quadratic Regularization: a penalty term keeps the classifier weights close to the global classifier weight vector received from the server (Formula 2);
Classifier Post-calibration with IID Samples: a post-processing technique that tunes the learned classifier. After federated training, the feature extractor is fixed and the classifier is calibrated by optimizing a cross-entropy loss with SGD on IID samples. This strategy requires collecting IID raw features from the heterogeneous clients, so it cannot be applied to real federated learning systems.
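The first two methods can be sketched as follows. This is only an illustration built from the verbal descriptions above, not the paper's Formulas 1 and 2: the function names, the coefficient name `mu`, and the assumption of a bias-free linear classifier are all hypothetical.

```python
import numpy as np

def normalized_logits(z: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Logits with every class weight vector scaled to unit L2 norm (clsnorm)."""
    w_unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    return z @ w_unit.T

def prox_penalty(w_local: np.ndarray, w_global: np.ndarray, mu: float) -> float:
    """Quadratic penalty (mu / 2) * ||w_local - w_global||^2 (clsprox)."""
    return 0.5 * mu * float(np.sum((w_local - w_global) ** 2))

z = np.array([[1.0, 0.0]])                         # one toy feature vector
w_local = np.array([[3.0, 4.0], [0.0, 2.0]])       # toy local classifier
w_global = np.array([[3.0, 3.0], [0.0, 1.0]])      # toy global classifier

logits = normalized_logits(z, w_local)
penalty = prox_penalty(w_local, w_global, mu=0.1)
```

In training, the penalty would be added to each client's cross-entropy loss, while the normalized logits would replace the ordinary ones before the softmax.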
Table 1 below shows the results of these three methods. Normalizing the L2 norm of the classifier weights (clsnorm) is effective on mildly heterogeneous data, but the benefit shrinks as heterogeneity increases and it can even hurt model performance. Regularizing the classifier parameters (clsprox) is consistently effective but yields only a very small improvement. Calibrating the classifier of the FedAvg model with all training samples (calibration) improves performance significantly at every degree of data heterogeneity.
Table 1: Improvement from the different methods
To understand classifier calibration further, the authors also calibrate with different numbers of data samples and on different existing federated models trained with FedAvg and FedProx. The results are shown in Figure 4 below. Even when only 1/50 of the data samples are used for calibration, data-based classifier calibration still performs consistently well. The significant gain after tuning the classifier strongly supports the hypothesis above: the damage to model performance lies mainly in the classifier layer.
Figure 4: Effect of the amount of calibration data
2.3 Classifier calibration with virtual representation
Based on these observations, the authors propose Classifier Calibration with Virtual Representations (CCVR), which runs on the server after the global model has been trained by federated learning. CCVR draws virtual features from an estimated Gaussian mixture model (GMM) without accessing any real images. Let f and g be the feature extractor and the classifier of the global model, respectively. f is used to extract features and estimate the corresponding feature distribution, and the generated virtual representations are then used to retrain g.
Step 1, feature distribution estimation: for semantically related tasks such as classification, the features learned by a deep neural network can be approximated by a mixture of Gaussian distributions. In CCVR, the features of each class in D are assumed to follow a Gaussian distribution. The server estimates this distribution by collecting from the clients the local mean μ and covariance Σ of each class, without access to the real data samples or their features. The server first sends the feature extractor f of the trained global model to the clients. Client k holds N_{c,k} samples of class c. For sample j, with features z_{c,k,j} = f(x_{c,k,j}), the client computes the local mean μ_{c,k} and covariance Σ_{c,k} of each class.
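Written out, the local statistics described in this step could take the following form, with μ for the mean and Σ for the covariance. The unbiased 1/(N_{c,k} - 1) normalization of the covariance is an assumption here, since the text only names the two quantities.

```latex
\mu_{c,k} = \frac{1}{N_{c,k}} \sum_{j=1}^{N_{c,k}} z_{c,k,j},
\qquad
\Sigma_{c,k} = \frac{1}{N_{c,k} - 1} \sum_{j=1}^{N_{c,k}}
  \left( z_{c,k,j} - \mu_{c,k} \right) \left( z_{c,k,j} - \mu_{c,k} \right)^{T}
```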
Each client then uploads its means μ_{c,k} and covariances Σ_{c,k}, and the server computes the global mean μ_c and global covariance Σ_c of each class from them.
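One way to combine per-client statistics into global ones is sketched below for a single class. It assumes unbiased local covariances and relies on the fact that the pooled mean and covariance can be recovered exactly from each client's sample count, mean, and covariance. The helper names `local_stats` and `aggregate_stats` are hypothetical.

```python
import numpy as np

def local_stats(z: np.ndarray):
    """Per-client statistics of one class: sample count, mean, unbiased covariance."""
    return z.shape[0], z.mean(axis=0), np.cov(z, rowvar=False)

def aggregate_stats(stats):
    """Combine per-client (n, mean, cov) triples into the global mean and covariance."""
    n_total = sum(n for n, _, _ in stats)
    mean = sum(n * m for n, m, _ in stats) / n_total
    # Sum of raw second moments: within-client scatter plus n * mean mean^T.
    second_moment = sum((n - 1) * c + n * np.outer(m, m) for n, m, c in stats)
    cov = (second_moment - n_total * np.outer(mean, mean)) / (n_total - 1)
    return mean, cov

rng = np.random.default_rng(0)
# Toy features of one class on three clients with different local distributions.
client_feats = [rng.normal(loc=k, size=(n, 4)) for k, n in enumerate([30, 50, 20])]
global_mean, global_cov = aggregate_stats([local_stats(z) for z in client_feats])

# Reference: statistics computed as if the server could see all features.
pooled = np.concatenate(client_feats)
```

The aggregated values match the statistics of the pooled features, so the server loses nothing by receiving only the per-client summaries. Each client needs at least two samples of the class for its unbiased covariance to be defined.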
Step 2, virtual representation generation: once it has the global mean and covariance, the server draws from the Gaussian distribution N(μ_c, Σ_c) a set G_c of virtual features with ground-truth label c. The last step of CCVR takes the classifier g out of the global model and retrains it on the virtual representations.
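The sampling and retraining step can be illustrated with a toy example. The sketch below assumes a linear softmax classifier trained by full-batch gradient descent on cross-entropy, which stands in for whatever optimizer and classifier head the authors actually used. The class means, covariances, and function names are made up for illustration.

```python
import numpy as np

def sample_virtual_features(means, covs, m_per_class, rng):
    """Draw m_per_class virtual features per class from N(mean_c, cov_c)."""
    feats = [rng.multivariate_normal(mu, cov, size=m_per_class)
             for mu, cov in zip(means, covs)]
    labels = np.repeat(np.arange(len(means)), m_per_class)
    return np.concatenate(feats), labels

def retrain_classifier(w, b, feats, labels, lr=0.1, epochs=500):
    """Full-batch gradient descent on cross-entropy for a linear softmax classifier."""
    onehot = np.eye(w.shape[0])[labels]
    for _ in range(epochs):
        logits = feats @ w.T + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(labels)          # d(loss) / d(logits)
        w = w - lr * grad.T @ feats
        b = b - lr * grad.sum(axis=0)
    return w, b

def accuracy(w, b, feats, labels):
    return float(np.mean((feats @ w.T + b).argmax(axis=1) == labels))

rng = np.random.default_rng(0)
means = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]  # toy global class means
covs = [np.eye(2), np.eye(2)]                          # toy global covariances
virtual_x, virtual_y = sample_virtual_features(means, covs, m_per_class=100, rng=rng)

# A deliberately biased classifier: the large bias sends every input to class 0.
w0, b0 = np.zeros((2, 2)), np.array([5.0, 0.0])
w1, b1 = retrain_classifier(w0, b0, virtual_x, virtual_y)

acc_before = accuracy(w0, b0, virtual_x, virtual_y)
acc_after = accuracy(w1, b1, virtual_x, virtual_y)
```

The biased starting point predicts class 0 for everything, and retraining on the virtual features moves the decision boundary back between the two class means. Only the classifier parameters change; the feature extractor is never touched.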
CCVR protects privacy to a certain extent, because each client uploads only its local Gaussian statistics rather than the original representations. It can also be combined easily with privacy-preserving techniques for further protection.
03
Experiments
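Federated simulation: image classification on the CIFAR-10, CIFAR-100 and CINIC-10 datasets. CINIC-10 is built from ImageNet and CIFAR-10, whose samples are very similar but not drawn from the same distribution. The data is partitioned with a Dirichlet distribution, with α set to 0.5 by default. A simple 4-layer CNN with a 2-layer MLP is used as the model for CIFAR-10, and MobileNetV2 is used for CIFAR-100 and CINIC-10.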
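A Dirichlet-based partition of this kind is often implemented along the following lines. This is a common recipe in federated learning code and not necessarily the authors' exact procedure; the function name and the toy labels are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, rng):
    """Split sample indices across clients, drawing per-class client shares from Dir(alpha)."""
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        shares = rng.dirichlet(alpha * np.ones(n_clients))
        # Cumulative shares give the cut points inside this class's indices.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 100)      # toy labels: 10 classes, 100 samples each
parts = dirichlet_partition(labels, n_clients=10, alpha=0.5, rng=rng)
```

A smaller α concentrates each class on fewer clients, giving a more heterogeneous split, while a larger α approaches a uniform split.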
Baselines and implementation: FedAvg, FedProx, FedAvgM and MOON serve as baselines. The number of virtual feature samples generated per class, M_c, is set to 100, 500 and 1000 on CIFAR-10, CIFAR-100 and CINIC-10, respectively.
Performance gains: Table 2 below gives the test accuracy on all datasets before and after applying CCVR. Oracle means that the whole dataset is used for classifier calibration, which gives the upper bound of classifier calibration. First, applying CCVR improves the accuracy of every baseline method. Comparing the accuracy gains of the different methods after applying CCVR and after full-data calibration, FedAvg and MOON gain the most. On CINIC-10, the Oracle result of FedAvg even outperforms that of all other baselines. This suggests that FedAvg focuses on learning high-quality features but neglects learning a fair classifier, which further confirms the need for classifier calibration.
Table 2: CCVR performance
When does CCVR work best: applying CCVR improves the results on CIFAR-100, but the gain is very small compared with the other two datasets. This is because the final accuracy obtained by classifier calibration depends not only on the classifier but also on the feature representation. CIFAR-100 has only 500 training images per class, so the classification task itself is hard and the feature representation learned by the model is weak. Even so, the accuracy that CCVR reaches on CIFAR-100 is very close to the upper bound, which shows that CCVR works well even with a poor feature extractor.
The authors also note that CCVR brings a huge improvement on CINIC-10. To analyze this further, Figure 5 below shows a t-SNE visualization of the features learned by FedAvg on CINIC-10. In the second subplot, a few classes dominate the classification results (misclassifications, compared with the first subplot). After CCVR, some of the misclassified samples are corrected. The classifier's bias mainly concerns the green and purple classes, and the grey and red classes.
After CCVR, confusing features close to the decision boundary are no longer misclassified as the majority class, which indicates that the classifier weights have been adjusted to be fairer to each class. In short, CCVR may be most effective when applied to models with good representations but heavily biased classifiers.
Figure 5: t-SNE visualization
How many virtual feature samples to generate: an important hyperparameter in CCVR is M_c, the number of virtual features generated for each class c. Figure 6 below shows that, in general, sampling even a few features improves classification accuracy significantly. In the two more non-uniform settings (smaller α), more samples yield higher accuracy. In the third panel, however, accuracy drops when a moderate number of virtual samples is used, which suggests that M_c is more sensitive on more balanced datasets.
This can be explained by the nature of CCVR, which uses a virtual feature distribution to simulate the original feature distribution. If the number of virtual samples is limited, the simulated distribution may deviate from the true feature distribution. The NIID-0.5 results show that when CCVR deals with a more balanced original distribution, the virtual distribution may deviate from the true one.
Figure 6: Effect of the number of virtual feature samples
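The point about limited sample counts can be seen in a small simulation. The sketch below draws M virtual samples from a known Gaussian and measures how far their empirical mean lies from the true mean. The 8-dimensional standard Gaussian and the sample counts are arbitrary choices for illustration, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = np.zeros(8)
true_cov = np.eye(8)

def mean_estimation_error(m: int) -> float:
    """Distance between the true mean and the mean of m virtual samples."""
    samples = rng.multivariate_normal(true_mean, true_cov, size=m)
    return float(np.linalg.norm(samples.mean(axis=0) - true_mean))

# Error of the sampled distribution's mean for increasing sample counts.
errors = {m: mean_estimation_error(m) for m in (10, 100, 10000)}
```

The error shrinks roughly with the square root of the sample count, so a small M_c leaves the sampled set visibly off from the distribution it is meant to represent.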
04
Summary
The authors offer a new perspective on why the performance of deep learning-based classification models drops when they are trained on non-IID data. They first dissect the neural network and use CKA similarity to study how similar the different layers of different clients' models are. They observe that the classifiers of different local models differ more than any other layer, and that the classifiers are significantly biased.
The authors then propose Classifier Calibration with Virtual Representations (CCVR), which samples virtual features from an approximate Gaussian mixture model (GMM) for classifier calibration, so that raw features never have to be uploaded to the server. The experiments confirm the effectiveness of CCVR: it effectively alleviates the performance degradation of the global model in heterogeneous federated learning.