PyTorch with Automatic Mixed Precision (AMP)
Automatic Mixed Precision examples — PyTorch 1.9.1 documentation
torch.cuda.amp provides a more convenient mixed-precision training mechanism:

The user does not need to manually cast model parameters between dtypes; amp automatically selects the appropriate numerical precision for each operator.
For backpropagation, where FP16 gradients suffer from numerical overflow, amp provides a gradient scaling operation and automatically unscales the gradients before the optimizer updates the parameters, so the scaling has no effect on the hyperparameters (such as the learning rate) used for optimization.

These two points are implemented by amp.autocast and amp.GradScaler respectively.
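A quick illustration of the first point (a sketch assuming a CUDA device; the tensor shapes are arbitrary): inside an autocast region, ops on autocast's float16 list, such as matmul, run in half precision, while the same op outside the region stays in float32.

import torch
from torch.cuda.amp import autocast

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")

with autocast():
    c = a @ b
print(c.dtype)        # torch.float16 -- autocast chose half precision for matmul
print((a @ b).dtype)  # torch.float32 -- default precision outside the region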
basic
from torch.cuda.amp import autocast, GradScaler

# Creates model and optimizer in default precision
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

# Creates a GradScaler once at the beginning of training.
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()

        # Runs the forward pass with autocasting.
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)

        # Scales loss. Calls backward() on scaled loss to create scaled gradients.
        # Backward passes under autocast are not recommended.
        # Backward ops run in the same dtype autocast chose for corresponding forward ops.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
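As a side note, GradScaler does not report skipped steps directly; a common heuristic (a sketch, not part of the example above) is to compare scaler.get_scale() before and after update(), since the scale is reduced whenever infs/NaNs were found:

scale_before = scaler.get_scale()
scaler.step(optimizer)
scaler.update()
# The scale shrinks when infs/NaNs were found, so a smaller scale here
# indicates that this iteration's optimizer.step() was skipped.
step_was_skipped = scaler.get_scale() < scale_before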
gradient clipping

scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()

        # Unscales the gradients of optimizer's assigned params in-place
        scaler.unscale_(optimizer)

        # Since the gradients of optimizer's assigned params are unscaled, clips as usual:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        # optimizer's gradients are already unscaled, so scaler.step does not unscale them,
        # although it still skips optimizer.step() if the gradients contain infs or NaNs.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
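Incidentally, torch.nn.utils.clip_grad_norm_ returns the total norm of the parameters' gradients computed before clipping, so after scaler.unscale_ the true, unscaled gradient norm can be logged essentially for free; a minimal sketch (the logging itself is illustrative):

scaler.unscale_(optimizer)
# clip_grad_norm_ returns the total gradient norm before clipping.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
print(f"unscaled grad norm: {float(grad_norm):.4f}")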
gradient accumulation

scaler = GradScaler()

for epoch in epochs:
    for i, (input, target) in enumerate(data):
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
            # Normalizes the loss so accumulated gradients match a full effective batch.
            loss = loss / accumulate_steps

        # Accumulates scaled gradients.
        scaler.scale(loss).backward()

        # Steps once every accumulate_steps iterations (i is 0-based, hence i + 1).
        if (i + 1) % accumulate_steps == 0:
            # may unscale_ here if desired (e.g., to allow clipping unscaled gradients)
            # Unscaling keeps the clipping threshold independent of the scale factor.
            scaler.unscale_(optimizer)
            # clip gradient
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
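One more practical point, not in the snippets above but part of GradScaler's public API: the scaler carries state (the current scale and its growth tracker), so when checkpointing mid-training it can be saved and restored with state_dict()/load_state_dict(). A hedged sketch; the filename and dict keys are illustrative:

checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scaler": scaler.state_dict(),  # preserves the current loss scale
}
torch.save(checkpoint, "ckpt.pt")

# ...later, when resuming:
checkpoint = torch.load("ckpt.pt")
scaler.load_state_dict(checkpoint["scaler"])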
AMP in DDP

autocast is designed to be "thread local", so an autocast region opened only in the main thread does not take effect in the worker threads; the model's forward therefore also needs to be decorated:
class MyModel(nn.Module):
    ...
    @autocast()
    def forward(self, input):
        ...

Or open an autocast region inside forward:
class MyModel(nn.Module):
    ...
    def forward(self, input):
        with autocast():
            ...

The first approach raised an error when used with DDP (it reported that some of forward's parameters could not be obtained normally; unresolved...).
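For reference, a minimal self-contained sketch of the second approach; the layer size and module names are made up for illustration, and the DDP wrapping is shown commented out since it requires process-group setup:

import torch
from torch import nn
from torch.cuda.amp import autocast

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)  # hypothetical layer for illustration

    def forward(self, input):
        # Each worker thread/process enters autocast itself.
        with autocast():
            return self.fc(input)

# Typical single-GPU-per-process DDP wrapping (process group setup omitted):
# model = nn.parallel.DistributedDataParallel(MyModel().cuda(local_rank),
#                                             device_ids=[local_rank])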