Be careful with your dictionaries and boilerplate code

2022-07-30 21:33:00 【InfoQ】

Original document:

https://www.yuque.com/lart/blog/gbd39h

This article discusses some low-level mistakes I made while coding with dictionaries and boilerplate code.

Mismatched dictionary keys

While modifying some code today, I found a very low-level mistake I had made earlier: a function returns a dictionary, and the keys of that dictionary were paired with the wrong values. As a result, the data my later code pulled out of the dictionary was not the data I thought it was.

The crux is that when the dictionary is used later, the wrong pairing neither makes the program misbehave visibly nor raises an error, which is why it took me so long to notice.

After an emergency fix, I restarted the validation runs to see how far off the earlier results were.

Errors like this are really hard to track down.

Incorrect version:

data = {
    "image1.5": image_0_5,  # wrong: should be image_1_5
    "image1.0": image_1_0,
    "image0.5": image_1_5,  # wrong: should be image_0_5
}

Correct version:

data = {
    "image1.5": image_1_5,
    "image1.0": image_1_0,
    "image0.5": image_0_5,
}

It all comes down to relying too much on the IDE's completion feature.

Because the variable names for these values are so similar, after typing "image" the option highlighted in the completion list is not necessarily the right one.

When you are in a hurry to get the code written and validated, the subconscious takes over even more. I quickly pressed Tab to accept the currently highlighted option. That unthinking action let an "obvious" error slip into the code. Like the "elephant in the room", it was there all along but was ignored. Only when I next had to implement a new feature and go through the original code line by line did I finally come face to face with it, with no way to avoid it any longer.

The elephant in the room is an English idiom. It is a metaphor for a matter or risk that is obvious yet collectively ignored and left undiscussed, or for a kind of groupthink in which nobody dares to challenge an obvious problem. Although it is an English expression, it has also been used or mentioned in Chinese in recent years.
The phrase describes a matter as plain and visible as an elephant that nevertheless, for whatever reason, seems to be ignored. It can also refer to how a particular social background or social psychology, acting on the wider environment, leads people to deliberately turn a blind eye to a problem.
Wikipedia

So can this kind of error be avoided?

Intuitively, this kind of error is like the so-called "careless mistakes" we make in exams: it can only be reduced as much as possible; it is hard to eliminate entirely.

For now, it feels hard to address from the tooling side.

What is needed is strict coding discipline. Be careful whenever you create several parallel correspondences, and check them carefully once you have finished.

This checking includes not only reading the code by eye, but also defensive programming.

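Defensive programming is a concrete form of defensive design. Its aim is to ensure that unforeseen use of a program does not damage its functionality. It can be seen as an idea for reducing or eliminating the effect of Murphy's law. Defensive programming is mainly used for programs that may be misused, tampered with for mischief, or inadvertently made to cause catastrophic effects.
Baidu Baike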

The keys of a dictionary stand for different things, and in actual use it often matters which value goes with which key. For example, when the values are passed on as arguments in order, each value has to go to a different parameter.

There are special cases, such as a+b+c, in which the values are combined in an order-independent way, so even a wrong pairing causes no problem. But these are the minority, and such cases can just as well be written in a consistent order.
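As a small illustration of the difference, here is a sketch with two hypothetical functions, total and weighted (neither comes from the original project). The first combines its arguments symmetrically, so swapped values go unnoticed and do no harm; the second gives each parameter a different role, so the same swap silently changes the result.

```python
def total(a, b, c):
    # Symmetric combination: any pairing of values to names gives the same sum.
    return a + b + c


def weighted(a, b, c):
    # Each parameter has its own role, so the pairing matters.
    return 1.5 * a + 1.0 * b + 0.5 * c


correct = {"a": 1, "b": 2, "c": 3}
swapped = {"a": 3, "b": 2, "c": 1}  # the values for a and c are exchanged

print(total(**correct) == total(**swapped))        # True
print(weighted(**correct) == weighted(**swapped))  # False
```

Neither call raises an error, which is exactly why the second case is dangerous.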

If later code clearly depends on the correspondence between keys and values, we need to check that relationship in the code itself. It can be constrained with an assertion (assert) or by raising an exception in the unexpected case (if and raise).

Take the data listed above: when the data is returned or used, the main difference between the values behind the three keys is the image size, so the three sizes can be constrained. A simple example is

assert data['image1.5'].shape[-1] > data['image1.0'].shape[-1] > data['image0.5'].shape[-1]

Writing such checks requires a thorough understanding of the code's logic; only then can we come up with more general constraints on these relationships.
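The same constraint can also be written in the if and raise form. The following is only a sketch: the helper name check_scale_order is made up for illustration, and SimpleNamespace objects stand in for real image tensors, since only their shape attribute is inspected. One reason to prefer this form is that assert statements are skipped when Python runs with the -O flag, whereas an explicit raise is not.

```python
from types import SimpleNamespace


def check_scale_order(data, keys=("image1.5", "image1.0", "image0.5")):
    """Return `data` unchanged if the last dimension shrinks along `keys`, else raise."""
    widths = [data[key].shape[-1] for key in keys]
    for i in range(len(keys) - 1):
        if widths[i] <= widths[i + 1]:
            raise ValueError(
                f"{keys[i]!r} has width {widths[i]}, expected it to be larger than "
                f"{keys[i + 1]!r} with width {widths[i + 1]}"
            )
    return data


# Stand-ins for image tensors: only the `.shape` attribute matters here.
image_1_5 = SimpleNamespace(shape=(3, 480, 480))
image_1_0 = SimpleNamespace(shape=(3, 320, 320))
image_0_5 = SimpleNamespace(shape=(3, 160, 160))

# Correct pairing passes through untouched.
good = check_scale_order({"image1.5": image_1_5, "image1.0": image_1_0, "image0.5": image_0_5})

# The swapped pairing from the incorrect version is rejected immediately.
try:
    check_scale_order({"image1.5": image_0_5, "image1.0": image_1_0, "image0.5": image_1_5})
except ValueError as err:
    print("caught:", err)
```

Calling such a check right where the dictionary is built or returned turns a silent mismatch into an immediate, readable error.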

Omitted boilerplate code

An earlier project suffered from another mistake that caused me a lot of grief. PyTorch code contains some boilerplate; a typical example is the following:

loss = loss_fn(model(X), Y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

With mixed precision, it takes this form:

optimizer.zero_grad()
with autocast(enabled=args.use_fp16):
    loss = loss_fn(model(X), Y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

My mistake was to leave out the call that resets the gradients to zero, optimizer.zero_grad(). The gradients then keep accumulating, which easily leads to gradient explosion. Combined with mixed precision, where the gradients go through a scaling step, scaler.scale(loss).backward(), no abnormal log output appears, presumably because scaler.step(optimizer) silently skips updates whose gradients contain inf or NaN, yet the model cannot train properly.

This error is also hard to spot, especially when the main training script is long; it is easily overlooked when skimming the code.
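To make the effect concrete without needing PyTorch, here is a toy sketch in plain Python. It is not the original training code: the function train, the learning rate and the objective f(w) = w**2 are all assumptions chosen for illustration. The variable grad imitates a parameter's gradient buffer, which each backward pass adds to and which only an explicit reset clears.

```python
def train(steps, zero_grad, lr=0.1):
    """Minimise f(w) = w**2 with plain gradient descent on a single float."""
    w, grad = 1.0, 0.0
    weights, applied = [w], []
    for _ in range(steps):
        if zero_grad:
            grad = 0.0       # stands in for optimizer.zero_grad()
        grad += 2.0 * w      # stands in for loss.backward(): adds d(w**2)/dw to the buffer
        w -= lr * grad       # stands in for optimizer.step()
        applied.append(grad)
        weights.append(w)
    return weights, applied


good_w, good_g = train(50, zero_grad=True)
bad_w, bad_g = train(50, zero_grad=False)

# Gradient actually applied in the second step: the fresh one versus fresh plus stale.
print(good_g[1], bad_g[1])

# Distance from the optimum w = 0 at the end of each run.
print(abs(good_w[-1]), max(abs(w) for w in bad_w[-20:]))
```

In this toy the run with the reset converges towards zero, while the run without it keeps applying stale gradients and never settles. In a real network the accumulated gradients can grow far larger.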

So how can this problem be avoided?

I came up with two strategies: use custom code snippets for auto-completion, or extract the boilerplate into a fixed structure, for example a class or function, and call it. Either way, the point is to avoid typing these lines by hand.

Custom snippets

Custom code snippets are a common and powerful feature of mainstream editors. Some editors even provide a mechanism for calling external programs from snippets. Since we only care about writing boilerplate here, we do not need such complicated mechanisms; defining the cursor positions and the jumps between them is enough. Once a snippet is set up, just type its trigger and complete it by following the prompts.

Modularizing the code

The other approach is to extract the code into a fixed structure, for example by packing the boilerplate into a separate class or function. Then, instead of writing it out each time, you simply import what you need.

The snippet below shows one such wrapper for the code above. It is somewhat more involved: it additionally covers gradient clipping and mixed precision, as well as exporting and loading state. Because the optimizer is not the core of this class, only the state of the gradient scaler is exported and loaded.

from functools import partial
from itertools import chain

import torch
from torch.cuda.amp import GradScaler, autocast


def clip_grad(params, mode, clip_cfg: dict):
    if mode == "norm":
        if "max_norm" not in clip_cfg:
            raise ValueError("`clip_cfg` must contain `max_norm`.")
        torch.nn.utils.clip_grad_norm_(
            params, max_norm=clip_cfg.get("max_norm"), norm_type=clip_cfg.get("norm_type", 2.0)
        )
    elif mode == "value":
        if "clip_value" not in clip_cfg:
            raise ValueError("`clip_cfg` must contain `clip_value`.")
        torch.nn.utils.clip_grad_value_(params, clip_value=clip_cfg.get("clip_value"))
    else:
        raise NotImplementedError


class Scaler:
    def __init__(
        self, optimizer, use_fp16=False, *, set_to_none=False, clip_grad=False, clip_mode=None, clip_cfg=None
    ) -> None:
        self.optimizer = optimizer
        self.set_to_none = set_to_none
        self.autocast = autocast(enabled=use_fp16)
        self.scaler = GradScaler(enabled=use_fp16)

        if clip_grad:
            # `ops` is the project module in which the `clip_grad` function above is defined.
            self.grad_clip_ops = partial(ops.clip_grad, mode=clip_mode, clip_cfg=clip_cfg)
        else:
            self.grad_clip_ops = None

    def calculate_grad(self, loss):
        # Backward pass on the scaled loss; may be called several times when accumulating.
        self.scaler.scale(loss).backward()

    def update_grad(self):
        if self.grad_clip_ops is not None:
            # Unscale once per optimizer step, after all gradients are accumulated.
            # Calling `unscale_` again before `update()` raises a RuntimeError.
            self.scaler.unscale_(self.optimizer)
            self.grad_clip_ops(chain(*[group["params"] for group in self.optimizer.param_groups]))
        self.scaler.step(self.optimizer)
        self.scaler.update()
        # Reset the gradients here so that the step can never be forgotten.
        self.optimizer.zero_grad(set_to_none=self.set_to_none)

    def state_dict(self):
        r"""
        Returns the state of the scaler as a :class:`dict`. It contains five entries:

        * ``"scale"`` - a Python float containing the current scale
        * ``"growth_factor"`` - a Python float containing the current growth factor
        * ``"backoff_factor"`` - a Python float containing the current backoff factor
        * ``"growth_interval"`` - a Python int containing the current growth interval
        * ``"_growth_tracker"`` - a Python int containing the number of recent consecutive unskipped steps.

        If this instance is not enabled, returns an empty dict.

        .. note::
            If you wish to checkpoint the scaler's state after a particular iteration, :meth:`state_dict`
            should be called after :meth:`update`.
        """
        return self.scaler.state_dict()

    def load_state_dict(self, state_dict):
        r"""
        Loads the scaler state. If this instance is disabled, :meth:`load_state_dict` is a no-op.

        Args:
            state_dict(dict): scaler state. Should be an object returned from a call to :meth:`state_dict`.
        """
        self.scaler.load_state_dict(state_dict)

In practice it can be used as follows. Because the key steps are bundled into the wrapper's methods, the chance of leaving out a statement is reduced.

scaler = pipeline.Scaler(
    optimizer=optimizer,
    use_fp16=cfg.train.use_amp,
    set_to_none=cfg.train.optimizer.set_to_none,
    clip_grad=cfg.train.grad_clip.enable,
    clip_mode=cfg.train.grad_clip.mode,
    clip_cfg=cfg.train.grad_clip.cfg,
)

with torch.cuda.amp.autocast(enabled=cfg.train.use_amp):
    probs, loss, loss_str = model(
        data=batch_data, iter_percentage=counter.curr_iter / counter.num_total_iters
    )
    loss = loss / cfg.train.grad_acc_step
scaler.calculate_grad(loss=loss)
if counter.every_n_iters(cfg.train.grad_acc_step):  # Accumulates scaled gradients.
    scaler.update_grad()