当前位置:网站首页>【MobileNet V3】《Searching for MobileNetV3》
【MobileNet V3】《Searching for MobileNetV3》
2022-07-02 06:26:00 【bryant_meng】


ICCV-2019
文章目录
1 Background and Motivation
【MobileNet】《MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications》(CVPR-2017)
【MobileNet V2】《MobileNetV2:Inverted Residuals and Linear Bottlenecks》(CVPR-2018)
deliver the next generation of high accuracy efficient neural network models to power on-device computer vision
2 Related Work
手动设计网络
reducing the number of parameters -> reducing the number of operations (MAdds) -> reducing the actual measured latencyNAS
cell level -> block levelQuantization
knowledge distillation
3 Advantages / Contributions
NAS + 手动设计组装成 mobilenet v3 backbone,提出了 hard swish 激活函数(swish 改进版),提出了 Lite R-ASPP 分割头(R-ASPP 改进版),在分类、目标检测、分割数据集上速度和精度均有提升
4 Method
1)Network Search
Platform-Aware NAS for Blockwise Search(来自 MnastNet,稍微修改了一下 reward design 的权重)
NetAdapt for Layerwise Search
search per layer for the number of filters
maximizes △ A c c △ l a t e n c y \frac{\bigtriangleup Acc}{\bigtriangleup latency} △latency△Acc
2)Network Improvements
Redesigning Expensive Layers
search 后的网络头尾比较重,进行了优化
头部
channels 32 + ReLU or swish 减小到了 channels 16 + hard swish
尾部

Nonlinearities

s w i s h ( x ) = x ⋅ σ ( x ) swish(x) = x \cdot \sigma(x) swish(x)=x⋅σ(x)
swish activation function 虽然提升了网络精度,但对硬件部署不够友好,增加了计算时间,作者采取了如下的改进(piece-wise linear)
h − s w i s h ( x ) = x R e L U 6 ( x + 3 ) 6 h-swish(x) = x \frac{ReLU6(x+3)}{6} h−swish(x)=x6ReLU6(x+3)
比 relu 慢的
only use h-swish at the second half of the model(we find that most of the benefits swish are realized by using them only in the deeper layers)
Large squeeze-and-excite

v3 相比于 v2,采用了 SE 模块,SE 里面的 sigmoid 也是采用的 hard 形式,也即 R e L U 6 ( x + 3 ) 6 \frac{ReLU6(x+3)}{6} 6ReLU6(x+3)

作者把 SE 模块中的 squeeze fc 固定成 block 中 expand 通道数的 1/4(图 4 红√ 处)
no discernible latency cost
MobileNetV3 Definitions
5 Experiments
use single-threaded large core in all our measurements
5.1 Datasets
- ImageNet
- COCO
- Cityscapes
5.2 Classification


左上角最好




1)Impact of non-linearities
这里的 112 看的不是特别懂,N 越大按道理用的 h-swish 越多,速度要慢一些,怎么还快了
2)Impact of other components
5.3 Detection

mAP 离谱,哈哈
5.4 Segmentation
R-ASPP 基础上改进


6 Conclusion(own) / Future work
Pareto-optimal,帕累托最优(来自百度百科)
帕累托最优(Pareto Optimality),也称为帕累托效率(Pareto efficiency),是指资源分配的一种理想状态,假定固有的一群人和可分配的资源,从一种分配状态到另一种状态的变化中,在没有使任何人境况变坏的前提下,使得至少一个人变得更好,这就是帕累托改进或帕累托最优化。
帕累托最优状态就是不可能再有更多的帕累托改进的余地;换句话说,帕累托改进是达到帕累托最优的路径和方法。 帕累托最优是公平与效率的“理想王国”。是由帕累托提出的。MobileNet V3 = MobileNet v2 + SE + hard-swish activation + half initial layers channel & last block do global average pooling first(来自 盖肉特别慌)
边栏推荐
- 聊天中文语料库对比(附上各资源链接)
- Label propagation
- [in depth learning series (8)]: principles of transform and actual combat
- Machine learning theory learning: perceptron
- 【BiSeNet】《BiSeNet:Bilateral Segmentation Network for Real-time Semantic Segmentation》
- Proof and understanding of pointnet principle
- Point cloud data understanding (step 3 of pointnet Implementation)
- 半监督之mixmatch
- Memory model of program
- SSM supermarket order management system
猜你喜欢
![How do vision transformer work? [interpretation of the paper]](/img/93/5f967b876fbd63c07b8cfe8dd17263.png)
How do vision transformer work? [interpretation of the paper]
![[introduction to information retrieval] Chapter II vocabulary dictionary and inverted record table](/img/3f/09f040baf11ccab82f0fc7cf1e1d20.png)
[introduction to information retrieval] Chapter II vocabulary dictionary and inverted record table

常见CNN网络创新点
![[mixup] mixup: Beyond Imperial Risk Minimization](/img/14/8d6a76b79a2317fa619e6b7bf87f88.png)
[mixup] mixup: Beyond Imperial Risk Minimization

【Mixup】《Mixup:Beyond Empirical Risk Minimization》

Faster-ILOD、maskrcnn_benchmark训练coco数据集及问题汇总

【TCDCN】《Facial landmark detection by deep multi-task learning》

Sparksql data skew
![[tricks] whiteningbert: an easy unsupervised sentence embedding approach](/img/8e/3460fed55f2a21f8178e7b6bf77d56.png)
[tricks] whiteningbert: an easy unsupervised sentence embedding approach

自然辩证辨析题整理
随机推荐
Use matlab to realize: chord cut method, dichotomy, CG method, find zero point and solve equation
ABM thesis translation
程序的内存模型
SSM garbage classification management system
Typeerror in allenlp: object of type tensor is not JSON serializable error
Common CNN network innovations
【AutoAugment】《AutoAugment:Learning Augmentation Policies from Data》
[Sparse to Dense] Sparse to Dense: Depth Prediction from Sparse Depth samples and a Single Image
Faster-ILOD、maskrcnn_benchmark训练自己的voc数据集及问题汇总
Deep learning classification Optimization Practice
Optimization method: meaning of common mathematical symbols
【Sparse-to-Dense】《Sparse-to-Dense:Depth Prediction from Sparse Depth Samples and a Single Image》
Mmdetection model fine tuning
【Wing Loss】《Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks》
Thesis writing tip2
Point cloud data understanding (step 3 of pointnet Implementation)
【Hide-and-Seek】《Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization xxx》
Interpretation of ernie1.0 and ernie2.0 papers
Latex formula normal and italic
Installation and use of image data crawling tool Image Downloader


