R-Drop: Regularized Dropout for Neural Networks
R-Drop is a simple yet effective regularization method built upon dropout: it minimizes the bidirectional KL-divergence between the output distributions of any pair of sub-models sampled from dropout during training.
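The idea can be sketched as follows: forward the same input twice through the network (each pass sampling a different dropout mask), then add the symmetric KL-divergence between the two output distributions to the usual cross-entropy loss. Below is a minimal NumPy illustration on a single linear classifier; all names, shapes, and the `alpha` weight are illustrative, not the repository's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p):
    # inverted dropout: zero units with prob p, rescale survivors
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL(p || q) per example
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def r_drop_loss(h, w, labels, drop_p=0.1, alpha=1.0):
    # two stochastic forward passes of the same input
    p1 = softmax(dropout(h, drop_p) @ w)
    p2 = softmax(dropout(h, drop_p) @ w)
    # cross-entropy averaged over both passes
    idx = np.arange(len(labels))
    ce = -0.5 * (np.log(p1[idx, labels] + 1e-12)
                 + np.log(p2[idx, labels] + 1e-12)).mean()
    # bidirectional KL regularizer between the two sub-models
    kl_term = 0.5 * (kl(p1, p2) + kl(p2, p1)).mean()
    return ce + alpha * kl_term
```

At inference time dropout is disabled as usual, so the two passes coincide and no extra cost is paid; the regularizer only constrains training.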
R-Drop can be applied to a wide range of tasks in both NLP and CV: