【MobileNet V3】《Searching for MobileNetV3》

2022-07-02 06:26:00 【bryant_meng】

ICCV-2019

deliver the next generation of high accuracy efficient neural network models to power on-device computer vision

Hand-designed networks
reducing the number of parameters -> reducing the number of operations (MAdds) -> reducing the actual measured latency
NAS
cell level -> block level
Quantization
knowledge distillation
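
To make the "parameters -> MAdds -> latency" progression concrete, here is a minimal sketch that counts multiply-adds for a standard convolution and for a depthwise separable one. The layer shape (3x3 kernel, 64 -> 128 channels, 56x56 output) is an illustrative assumption and is not taken from the paper. MAdds are only a proxy: measured latency also depends on memory access and on how well each operator is supported on the target hardware.

```python
def conv_madds(h_out, w_out, k, c_in, c_out):
    """Multiply-adds of a standard k x k convolution (bias ignored)."""
    return h_out * w_out * k * k * c_in * c_out


def depthwise_separable_madds(h_out, w_out, k, c_in, c_out):
    """Multiply-adds of a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = h_out * w_out * k * k * c_in
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 56x56 output map.
standard = conv_madds(56, 56, 3, 64, 128)
separable = depthwise_separable_madds(56, 56, 3, 64, 128)
print(standard, separable, round(standard / separable, 2))
```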

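The MobileNetV3 backbone is assembled from NAS plus manual design. The paper also proposes the hard-swish activation function (an improved swish) and the Lite R-ASPP segmentation head (an improved R-ASPP), and reports gains in both speed and accuracy on classification, object detection, and segmentation datasets.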

1）Network Search

Platform-Aware NAS for Blockwise Search (from MnasNet, with the weight in the reward design slightly modified)
NetAdapt for Layerwise Search
search per layer for the number of filters
maximizes $\frac{\Delta \text{Acc}}{|\Delta \text{latency}|}$
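
The two search stages above can be sketched roughly as follows. The reward form `acc * (latency / target) ** w` and the weights `-0.07` (MnasNet) and `-0.15` (the modified weight) are recalled from the papers rather than from these notes, so treat them as assumptions. The proposal names and all numbers are hypothetical.

```python
def mnas_reward(acc, latency_ms, target_ms, w=-0.07):
    """Multi-objective reward: accuracy scaled by a latency penalty.

    w is negative, so a latency above the target lowers the reward.
    """
    return acc * (latency_ms / target_ms) ** w


def pick_proposal(base_acc, base_latency_ms, proposals):
    """Pick the proposal that maximizes delta_acc / |delta_latency|.

    proposals: list of (name, acc, latency_ms) tuples, each faster than the base.
    """
    def ratio(proposal):
        _, acc, latency_ms = proposal
        return (acc - base_acc) / abs(latency_ms - base_latency_ms)

    return max(proposals, key=ratio)


# Hypothetical candidates produced by shrinking one layer at a time.
proposals = [
    ("shrink_expansion_layer", 0.745, 59.4),
    ("shrink_bottleneck", 0.748, 59.5),
    ("shrink_both", 0.740, 59.0),
]
best = pick_proposal(0.75, 60.0, proposals)
print(best[0])
```

Because every proposal loses a little accuracy, maximizing the ratio amounts to choosing the step that gives up the least accuracy per millisecond saved.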

2）Network Improvements

Redesigning Expensive Layers
After the search, the head and tail of the network are relatively heavy, so they are optimized by hand
Head
channels 32 + ReLU or swish is reduced to channels 16 + hard swish
Tail
Nonlinearities
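$\text{swish}(x) = x \cdot \sigma(x)$
The swish activation function improves accuracy, but it is not hardware-friendly to deploy and adds computation time, so the authors replace the sigmoid with a piece-wise linear approximation:
$\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x+3)}{6}$
h-swish is still slower than ReLU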
only use h-swish at the second half of the model（we find that most of the benefits swish are realized by using them only in the deeper layers）
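
As a minimal sketch of the nonlinearities discussed above, the following pure-Python functions implement swish and its piece-wise linear replacement on scalars and measure how far apart they are on a small grid. The function names and the grid are illustrative choices; a real model would apply these element-wise to tensors in a framework.

```python
import math


def relu6(x):
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x):
    """Piece-wise linear stand-in for the sigmoid: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6."""
    return x * hard_sigmoid(x)


def swish(x):
    """swish(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


# Largest gap between the two activations on a grid over [-6, 6].
grid = [i / 10.0 for i in range(-60, 61)]
max_gap = max(abs(swish(x) - hard_swish(x)) for x in grid)
print(round(max_gap, 3))
```

The hard version needs only an add, a clip, and a multiply, which is why it maps more easily onto mobile hardware than an exponential.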
Large squeeze-and-excite
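Compared with v2, v3 adds SE modules, and the sigmoid inside SE also uses the hard form, i.e. $\frac{\text{ReLU6}(x+3)}{6}$
The authors fix the squeeze FC layer in each SE module to 1/4 of the block's expansion channel count (the red check mark in Figure 4)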
no discernible latency cost
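
The SE design described above can be sketched in pure Python as below: global average pooling, a squeeze FC at 1/4 of the expansion width, ReLU, an excite FC, and a hard-sigmoid gate. Biases are omitted, the weights and feature values are made up for the example, and the helper names are hypothetical. Actual implementations may also round the reduced width to a hardware-friendly multiple.

```python
def relu(x):
    return max(x, 0.0)


def hard_sigmoid(x):
    """ReLU6(x + 3) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def se_reduced_channels(expand_channels, ratio=4):
    """Squeeze width fixed to 1/ratio of the block's expansion channels."""
    return max(1, expand_channels // ratio)


def squeeze_excite(feature, w_reduce, w_expand):
    """feature: C channels, each a flat list of H*W values.

    w_reduce: (C // 4) x C matrix, w_expand: C x (C // 4) matrix.
    """
    pooled = [sum(channel) / len(channel) for channel in feature]
    hidden = [relu(sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    return [[gate * v for v in channel] for gate, channel in zip(gates, feature)]


# Toy input: 4 channels with 2 spatial positions each, so the squeeze width is 1.
feature = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
w_reduce = [[0.25, 0.25, 0.25, 0.25]]
w_expand = [[2.0], [0.0], [-2.0], [-2.0]]
scaled = squeeze_excite(feature, w_reduce, w_expand)
print(se_reduced_channels(len(feature)), scaled)
```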
MobileNetV3 Definitions

use single-threaded large core in all our measurements

Figure 1: the upper-left corner is best

1）Impact of non-linearities

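I don't quite understand the 112 here: a larger N should mean more h-swish is used, which should be slower, so why is it faster? (According to the paper's table caption, N in "h-swish @N" is the channel count of the first layer that has h-swish enabled, so a larger N means fewer h-swish layers, which would explain the lower latency.)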

2）Impact of other components

The mAP is absurd, haha

An improvement built on R-ASPP

Tables 7 and 8

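Pareto-optimal (definition from Baidu Baike)
Pareto optimality, also called Pareto efficiency, is an ideal state of resource allocation. Given a fixed group of people and a fixed set of allocatable resources, a change from one allocation to another that makes at least one person better off without making anyone worse off is called a Pareto improvement, or Pareto optimization.
A Pareto-optimal state is one with no room left for further Pareto improvement; in other words, Pareto improvement is the path and method for reaching Pareto optimality. Pareto optimality is the "ideal kingdom" of fairness and efficiency. The concept was proposed by Pareto.
MobileNet V3 = MobileNet v2 + SE + hard-swish activation + half initial layers channel & last block do global average pooling first (from 盖肉特别慌)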
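
Applied to model selection, the Pareto-optimality definition above means a model is on the frontier if no other model is at least as fast and at least as accurate while being strictly better on one of the two. A minimal sketch with made-up (latency, accuracy) pairs follows; the model names and numbers are hypothetical and are not results from the paper.

```python
def pareto_front(models):
    """Return names of models not dominated by any other model.

    models: list of (name, latency_ms, accuracy) tuples.
    Lower latency and higher accuracy are both better.
    """
    front = []
    for name, latency, accuracy in models:
        dominated = any(
            other_latency <= latency
            and other_accuracy >= accuracy
            and (other_latency < latency or other_accuracy > accuracy)
            for _, other_latency, other_accuracy in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: (name, latency in ms, top-1 accuracy).
candidates = [
    ("A", 20, 0.67),
    ("B", 30, 0.70),
    ("C", 35, 0.69),
    ("D", 50, 0.75),
    ("E", 50, 0.72),
]
front = pareto_front(candidates)
print(front)
```

On a latency vs. accuracy plot, the frontier models are the ones with nothing above and to the left of them.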

Copyright notice
This article was written by [bryant_meng]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/bryant_meng/article/details/122304735