Federated Meta-Learning with Fast Convergence and Efficient Communication

2022-06-12 07:16:00 【Programmer long】

1. Introduction

Data in federated learning is not independent and identically distributed (non-IID). Following the success of the FedAvg algorithm, the authors observed that the meta-learning algorithm MAML suits scenarios where clients hold little data and the data is unevenly distributed, and they proposed the FedMeta framework as a bridge between meta learning and federated learning. In meta learning, a parameterized algorithm learns slowly from a large number of tasks through the meta-training process, while within each task the algorithm quickly trains a task-specific model. A task consists of a support set and a query set that do not overlap. The task-specific model is trained on the support set and then tested on the query set, and the test result is used to update the algorithm. In FedMeta, the algorithm is maintained on the server and distributed to the clients for training. After training, the test results on the query set are uploaded to the server to update the algorithm.

2. Algorithm

First, some definitions:
$D_S^T$: support set of task $T$
$D_Q^T$: query set of task $T$
$A$: meta-learning algorithm
$\phi$: meta-learning parameters
$\theta_T$: model parameters for task $T$
Following the idea of meta learning, we first use $D_S^T$ to train the model $f$ with algorithm $A$, which outputs the updated model parameters $\theta_T$. This step is called the inner update. The trained $\theta_T$ is then evaluated on the query set $D_Q^T$, giving the test loss $L_{D_Q^T}(\theta_T)$. This loss reflects how well the algorithm $A_\phi$ trains a model. Finally, we update the parameters $\phi$ to minimize this test loss. This step is called the outer update. Expressed mathematically, the algorithm $A_\phi$ is optimized with the following objective:

$\min_\phi E_{T}[L_{D_Q^T}(\theta_T)]=\min_\phi E_{T}[L_{D_Q^T}(A_\phi (D_S^T))]$

Taking MAML as an example, we start with $\phi=\theta$, that is, the meta-learning parameters are the initial model parameters. We then train on $D_S^T$ (a few steps of gradient descent) using the loss $L_{D_S^T}(\theta)=\frac{1}{|D_S^T|}\sum_{(x,y)\in D_S^T}l(f_\theta(x),y)$, which turns $\theta$ into $\theta_T$. Next, $f_{\theta_T}$ is tested on $D_Q^T$ to obtain the test loss $L_{D_Q^T}(\theta_T)=\frac{1}{|D_Q^T|}\sum_{(x',y')\in D_Q^T}l(f_{\theta_T}(x'),y')$. With a single gradient step of size $\alpha$, the minimization objective defined above becomes:

$\min_\phi E_{T}[L_{D_Q^T}(\theta\ -\ \alpha\nabla L_{D_S^T}(\theta))]$ .
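To make the one-step objective concrete, here is a minimal sketch for a single task. It assumes a toy scalar model $f_\theta(x)=\theta x$ with squared-error loss; the data, the step size, and every function name are made up for illustration and do not come from the paper or its code.

```python
# Toy model f_theta(x) = theta * x with squared-error loss (illustrative only).

def loss(theta, data):
    """Mean squared error of f_theta(x) = theta * x over (x, y) pairs."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def grad(theta, data):
    """Derivative of loss with respect to theta."""
    return sum(2 * x * (theta * x - y) for x, y in data) / len(data)

def hessian(data):
    """Second derivative of loss (constant for this quadratic loss)."""
    return sum(2 * x * x for x, _ in data) / len(data)

def inner_update(theta, support, alpha):
    """One gradient step on the support set: theta_T = theta - alpha * grad."""
    return theta - alpha * grad(theta, support)

def meta_objective(theta, support, query, alpha):
    """L_{D_Q}(theta - alpha * grad L_{D_S}(theta)) for a single task."""
    return loss(inner_update(theta, support, alpha), query)

def meta_gradient(theta, support, query, alpha):
    """Exact derivative of meta_objective with respect to theta (chain rule)."""
    theta_T = inner_update(theta, support, alpha)
    return (1 - alpha * hessian(support)) * grad(theta_T, query)

support = [(1.0, 2.0), (2.0, 4.1)]  # hypothetical task data, roughly y = 2x
query = [(3.0, 5.9), (4.0, 8.2)]
theta, alpha = 0.0, 0.05

theta_T = inner_update(theta, support, alpha)
print("theta_T:", theta_T)
print("query loss before / after:", loss(theta, query), loss(theta_T, query))
print("meta-gradient:", meta_gradient(theta, support, query, alpha))
```

The inner update moves $\theta$ toward the task, and the outer update would then move the initialization $\theta$ along the negative meta-gradient so that one inner step works better on the next visit.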

That concludes the meta-learning part. Next comes the federated learning part. How are the two combined? The authors' idea is that each client, after testing on its query set, obtains the test loss, computes the corresponding gradient from that loss, and sends the gradient to the server. The server averages the gradients, updates its parameters with the averaged gradient, and finally sends the parameters back to the clients. In other words, the client performs the inner update and the gradient computation of the outer update, while the server performs the rest of the outer update (aggregating the gradients and applying the update).
The algorithm procedure is shown in the figure below.
[Figure: FedMeta algorithm procedure]
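The round described above can be sketched in a few lines. This is only an illustration under several assumptions: the same toy scalar model $f_\theta(x)=\theta x$, a first-order query gradient (taken at $\theta_T$ only), plain gradient descent on the server instead of Adam, and made-up client data and function names.

```python
def loss(theta, data):
    """Mean squared error of f_theta(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def grad(theta, data):
    """Derivative of loss with respect to theta."""
    return sum(2 * x * (theta * x - y) for x, y in data) / len(data)

def client_update(theta, support, query, alpha):
    """Inner update on the support set, then the query-set gradient at theta_T."""
    theta_T = theta - alpha * grad(theta, support)
    return grad(theta_T, query)  # assumption: first-order gradient only

def server_round(theta, clients, alpha, beta):
    """Average the uploaded client gradients and take one server step."""
    grads = [client_update(theta, s, q, alpha) for s, q in clients]
    return theta - beta * sum(grads) / len(grads)

def avg_adapted_loss(theta, clients, alpha):
    """Mean query loss after each client adapts theta on its support set."""
    losses = [loss(theta - alpha * grad(theta, s), q) for s, q in clients]
    return sum(losses) / len(losses)

# Two hypothetical clients whose tasks are y = 2x and y = 3x.
clients = [
    ([(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]),
    ([(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]),
]
alpha, beta = 0.05, 0.01

theta_init = 0.0
loss_before = avg_adapted_loss(theta_init, clients, alpha)
theta_final = theta_init
for _ in range(50):
    theta_final = server_round(theta_final, clients, alpha, beta)
loss_after = avg_adapted_loss(theta_final, clients, alpha)
print("theta:", theta_final, "adapted loss:", loss_before, "->", loss_after)
```

Only gradients travel from client to server and only parameters travel back, which matches the division of labor in the paragraph above.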
If you are not familiar with MAML and meta learning, or have questions about the query set and the support set, please see my previous blog post.

4. Code explanation

The code for this algorithm is available on GitHub. A large part of it implements the interaction between the client and the server, which I won't go into here. The focus is on the client training process and the server update process.
First, let's look at the client training (corresponding to the inner update):

import torch

support_loss, support_correct, support_num_sample = [], [], []
loss_sum = 0.0
for batch_idx, (x, y) in enumerate(support_data_loader):
    x, y = x.to(self.device), y.to(self.device)
    num_sample = y.size(0)
    pred = self.model(x)
    loss = self.criterion(pred, y)
    # evaluate
    correct = self.count_correct(pred, y)
    # record statistics; this loss is the batch mean
    support_loss.append(loss.item())
    support_correct.append(correct)
    support_num_sample.append(num_sample)
    # accumulate the sample-weighted loss over the support set
    loss_sum += loss * num_sample
# compute the gradient of the mean support loss w.r.t. the current parameters,
# then update the parameters of the current network (written back into the model)
grads = torch.autograd.grad(loss_sum / sum(support_num_sample), list(self.model.parameters()), create_graph=True, retain_graph=True)
for p, g in zip(self.model.parameters(), grads):
    p.data.add_(g.data, alpha=-self.inner_lr)

This is the update based on the support set. The first for loop accumulates the sample-weighted loss, torch.autograd.grad then computes the gradient of the mean loss, and the second for loop applies the update to the parameters.
The updated parameters are then used to compute the loss on the query set (the gradient-computation part of the outer update):

import numpy as np

query_loss, query_correct, query_num_sample = [], [], []
loss_sum = 0.0
for batch_idx, (x, y) in enumerate(query_data_loader):
    x, y = x.to(self.device), y.to(self.device)
    num_sample = y.size(0)
    pred = self.model(x)
    loss = self.criterion(pred, y)
    # evaluate
    correct = self.count_correct(pred, y)
    # record statistics; this loss is the batch mean
    query_loss.append(loss.item())
    query_correct.append(correct)
    query_num_sample.append(num_sample)
    # accumulate the sample-weighted loss over the query set
    loss_sum += loss * num_sample
spt_sz = np.sum(support_num_sample)
qry_sz = np.sum(query_num_sample)
# the only role of this optimizer would be to clear stale gradient information from the network
# self.optimizer.zero_grad()
# gradient of the mean query loss; returned as a tuple of tensors, one per parameter
grads = torch.autograd.grad(loss_sum / qry_sz, list(self.model.parameters()))
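
For reference, differentiating the one-step MAML objective $L_{D_Q^T}(\theta-\alpha\nabla L_{D_S^T}(\theta))$ with respect to $\theta$ by the chain rule gives:

$\nabla_\theta L_{D_Q^T}(\theta_T)=\left(I-\alpha\nabla_\theta^2 L_{D_S^T}(\theta)\right)\nabla_{\theta_T}L_{D_Q^T}(\theta_T),\quad \theta_T=\theta-\alpha\nabla_\theta L_{D_S^T}(\theta)$

As far as can be read from the two snippets alone, the inner step is written through p.data, which autograd does not track, so the final torch.autograd.grad call returns only the factor $\nabla_{\theta_T}L_{D_Q^T}(\theta_T)$. That corresponds to the first-order approximation, in which the Hessian term is dropped. This reading is an assumption about the excerpt shown here; the full repository may handle the second-order term elsewhere.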

The server then aggregates the gradients and applies the update:

def aggregate_grads_weighted(self, solns, num_samples, weights_before):
    # uses Adam
    m = len(solns)
    g = []
    for i in range(len(solns[0])):
        # i is the index of the current gradient (one entry per parameter tensor)
        # the first client's gradient always provides the shape
        grad_sum = torch.zeros_like(solns[0][i])
        total_sz = 0
        for ic, sz in enumerate(num_samples):
            grad_sum += solns[ic][i] * sz
            total_sz += sz
        # after accumulation, store the sample-weighted average gradient
        g.append(grad_sum / total_sz)
    # plain gradient descent would be: [u - (v * self.outer_lr / m) for u, v in zip(weights_before, g)]
    self.outer_opt.increase_n()
    for i in range(len(weights_before)):
        # outer_opt updates weights_before[i] in place
        self.outer_opt(weights_before[i], g[i], i=i)

In effect, the server computes the average of the client gradients, weighted by each client's number of training samples, and outer_opt then applies the parameter update. The update here uses Adam.
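
The aggregation and update can be sketched without PyTorch as follows. This is a minimal illustration, not the repository's implementation: the Adam class is a textbook version standing in for outer_opt, whose internals are not shown in the post, and all names, hyperparameters, and numbers are hypothetical.

```python
import math

def weighted_average(client_grads, num_samples):
    """Average per-parameter gradients, weighting each client by its sample count."""
    total = sum(num_samples)
    n_params = len(client_grads[0])
    return [
        sum(g[i] * n for g, n in zip(client_grads, num_samples)) / total
        for i in range(n_params)
    ]

class Adam:
    """Textbook Adam on a flat list of floats (hypothetical stand-in for outer_opt)."""

    def __init__(self, n_params, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.t = 0                 # step counter

    def step(self, weights, grads):
        """Update weights in place with one bias-corrected Adam step."""
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            weights[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

# Two hypothetical clients with 1 and 3 samples and two scalar parameters each.
client_grads = [[1.0, -2.0], [4.0, 2.0]]
num_samples = [1, 3]

g = weighted_average(client_grads, num_samples)
weights = [0.0, 0.0]
opt = Adam(n_params=2)
opt.step(weights, g)
print("averaged gradient:", g)
print("weights after one step:", weights)
```

Weighting by sample count means a client with three times as much data pulls the average three times as hard, which is the same rule FedAvg applies to model weights.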

Copyright notice

This article was written by [Programmer long]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/03/202203010557274696.html
