Manifold Mixup
Unofficial implementation of ManifoldMixup (Proceedings of ICML 19) for fast.ai (V2), based on Shivam Saboo's PyTorch implementation of manifold mixup and fastai's input mixup implementation, plus some improvements/variants that I developed with lessw2020.
This package provides four additional callbacks to the fastai learner:

- ManifoldMixup, which implements the ManifoldMixup algorithm
- OutputMixup, which implements a variant that applies the mixup only to the output of the last layer (this was shown to be more performant in a benchmark and an independent blog post)
- DynamicManifoldMixup, which lets you use manifold mixup with a schedule to increase difficulty progressively
- DynamicOutputMixup, which lets you use output mixup with a schedule to increase difficulty progressively
Usage
For a minimal demonstration of the various callbacks and their parameters, see the Demo notebook.
Mixup
To use manifold mixup, you need to import manifold_mixup and pass the corresponding callback to the cbs argument of your learner:
from manifold_mixup import ManifoldMixup  # assuming the callback is exported at the package root

learner = Learner(data, model, cbs=ManifoldMixup())
learner.fit(8)
The ManifoldMixup callback takes three parameters:

- alpha=0.4: parameter of the beta law used to sample the interpolation weight
- use_input_mixup=True: whether to also apply mixup to the inputs
- module_list=None: can be used to pass an explicit list of target modules
The OutputMixup variant takes only the alpha parameter.
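As an illustration, here is a minimal sketch of passing those parameters explicitly (the values shown are the defaults listed above; data and model are assumed to already exist):

```python
# Sketch: ManifoldMixup with its documented parameters (values are the defaults).
learner = Learner(data, model,
                  cbs=ManifoldMixup(alpha=0.4,             # beta-law parameter for the interpolation weight
                                    use_input_mixup=True,  # also apply mixup to the inputs
                                    module_list=None))     # or an explicit list of modules to instrument

# The OutputMixup variant only takes alpha:
learner = Learner(data, model, cbs=OutputMixup(alpha=0.4))
```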
Dynamic mixup
Dynamic callbacks, which are available via dynamic_mixup, take three parameters instead of the single alpha parameter:

- alpha_min=0.0: the initial, minimum value of the parameter of the beta law used to sample the interpolation weight (we recommend keeping it at 0)
- alpha_max=0.6: the final, maximum value of the parameter of the beta law used to sample the interpolation weight
- scheduler=SchedCos: the scheduling function that describes the evolution of alpha from alpha_min to alpha_max
The default schedulers are SchedLin, SchedCos, SchedNo, SchedExp and SchedPoly. See the Annealing section of fastai2's documentation for more information on the available schedulers, ways to combine them, and how to provide your own.
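For example (a sketch only, assuming fastai's schedulers such as SchedCos are already in scope via the usual fastai imports, and that data and model exist):

```python
# Sketch: dynamic manifold mixup with a cosine schedule for alpha.
# alpha grows from alpha_min to alpha_max over the course of training.
learner = Learner(data, model,
                  cbs=DynamicManifoldMixup(alpha_min=0.0,       # start with no mixing
                                           alpha_max=0.6,       # end with the strongest mixing
                                           scheduler=SchedCos))
learner.fit(8)
```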
Notes
Which modules will be instrumented by ManifoldMixup?
ManifoldMixup tries to establish a sensible list of modules on which to apply mixup:

- it uses a user-provided module_list if possible
- otherwise it uses only the modules wrapped with ManifoldMixupModule
- if none are found, it defaults to modules with Block or Bottleneck in their name (targeting mostly resblocks)
- finally, if needed, it defaults to all modules that are not included in the non_mixable_module_types list
The non_mixable_module_types list contains mostly recurrent layers, but you can add elements to it in order to define module classes that should not be used for mixup (do not hesitate to create an issue or start a PR to add common modules to the default list).
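As a hedged sketch (the import below assumes non_mixable_module_types is exposed at the package root, which may not be the case), excluding a custom recurrent layer class could look like this:

```python
import torch.nn as nn
from manifold_mixup import non_mixable_module_types  # assumption: the list is importable from the package

# A custom recurrent layer we never want mixup applied to.
class MyGRULayer(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        out, _ = self.gru(x)
        return out

# Add the class to the exclusion list before building the learner.
non_mixable_module_types.append(MyGRULayer)
```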
When can I use OutputMixup?
OutputMixup applies the mixup directly to the output of the last layer. This only works if the loss function contains something like a softmax, and not when the raw output is used directly, as it is for regression. Thus, OutputMixup cannot be used for regression.
A note on skip-connections / residual-blocks
ManifoldMixup (this does not apply to OutputMixup) is greatly degraded when applied inside a residual block, because the mixed-up values become incoherent with the output of the skip connection (which has not been mixed).
While this implementation is equipped to work around the problem for U-Net and ResNet-like architectures, you might run into problems (negligible improvements over the baseline) with other network structures. In that case, the best way to apply manifold mixup is to manually select the modules to be instrumented, as in the sketch below.
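Purely as an illustration (data and model are assumed to exist, and the stage1/stage2 attribute names are hypothetical), explicit selection could look like this:

```python
# Sketch: hand-picking the instrumentation points for a custom architecture,
# avoiding modules that sit inside a residual branch.
# model.stage1 / model.stage2 are hypothetical attribute names on your own model.
target_modules = [model.stage1, model.stage2]
learner = Learner(data, model, cbs=ManifoldMixup(module_list=target_modules))
```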
For more unofficial fastai extensions, see the Fastai Extensions Repository.