[Object detection] TPH-YOLOv5: UAV object detection with a Transformer-improved YOLOv5
2022-07-25 16:47:00 【zstar-_】
Introduction
I have recently been working on object detection with the VisDrone dataset and came across TPH-YOLOv5, a model that ranked fifth on the VisDrone2021 testset-challenge with an mAP of 39.18%.
So I read the paper and ran the code.
Paper: https://arxiv.org/pdf/2108.11539.pdf
Code: https://github.com/cv516Buaa/tph-yolov5
VisDrone dataset download: https://pan.baidu.com/s/1JzRTeSi_LgdUVhwtbWhA_w?pwd=8888
Problems addressed
TPH-YOLOv5 targets two problems in UAV imagery:
- Drones fly at varying altitudes, so object scale changes drastically.
- High-speed, low-altitude flight introduces motion blur on densely packed objects.
Main improvements
TPH-YOLOv5 builds on YOLOv5 with the following changes:
- 1. An additional detection head for smaller-scale objects.
- 2. Transformer prediction heads (TPH) in place of the original prediction heads.
- 3. CBAM integrated into YOLOv5 to help the network find regions of interest in images that cover large areas.
- 4. A collection of other small tricks.
New detection head

The new detection head is easy to understand. My earlier blog post "[Object detection] Improved YOLOv5 model for small-object detection / adding frame-rate detection" discusses the same idea.
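To make the benefit of an extra head concrete, here is a minimal sketch that computes the prediction grid of each head for a square input. The strides are an assumption: stock YOLOv5 predicts at strides 8, 16, and 32, and the sketch assumes the added small-object head runs at stride 4. The 640-pixel input and the 12-pixel object are illustrative values, not numbers from the paper.

```python
# Sketch: how an extra high-resolution head changes the prediction grids.
# Assumption: the added head runs at stride 4; the stock heads at 8, 16, 32.

def head_grids(img_size, strides):
    """Return {stride: (cells_per_side, total_cells)} for a square input."""
    return {s: (img_size // s, (img_size // s) ** 2) for s in strides}


def cells_covered(obj_px, stride):
    """Approximate width of an object, measured in grid cells at this stride."""
    return obj_px / stride


if __name__ == "__main__":
    for stride, (side, total) in head_grids(640, [4, 8, 16, 32]).items():
        span = cells_covered(12, stride)
        print(f"stride {stride:>2}: {side}x{side} grid, {total} cells, "
              f"a 12 px object spans {span:.2f} cells")
```

Under these assumptions a 12-pixel object still spans three cells at stride 4 but only 1.5 cells at stride 8, which is the intuition behind adding a finer head for tiny objects.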
The overall structure of the improved network is as follows: [figure: TPH-YOLOv5 network architecture]
TPH
The authors replace some of the convolution and CSP blocks with Transformer encoder blocks. Applying Transformers to vision is the current mainstream trend; the Transformer brings its own attention mechanism, and the results are better than with the original structure.
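The core of a Transformer encoder block is self-attention over the positions of a feature map. The sketch below shows only that core, in plain Python: a C x H x W feature map is flattened into H*W tokens, and each token is rebuilt as a softmax-weighted mix of all tokens. It is a simplified illustration, not the repository's code. It assumes identity query/key/value projections and a single head, and it omits the residual connections, normalization, and MLP that a full encoder block contains. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_feature_map(fmap):
    """Turn a C x H x W nested list into H*W tokens of length C."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(channels)]
            for y in range(height) for x in range(width)]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real block learns three linear projections.
    Returns (outputs, attention_weights).
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all value vectors.
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights


if __name__ == "__main__":
    fmap = [[[1.0, 0.0]], [[0.0, 1.0]]]  # C=2, H=1, W=2
    out, attn = self_attention(flatten_feature_map(fmap))
    print(attn)
```

Because every position attends to every other position, each output token can draw on context from the whole feature map, unlike a convolution with a small kernel.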

CBAM

CBAM (Convolutional Block Attention Module) is a lightweight attention module (originally proposed by Woo et al., ECCV 2018) that the authors integrate into the network. As shown in the figure, before a feature map is passed to the next processing unit, CBAM computes a channel attention map and then a spatial attention map in sequence, multiplying each one with the features, so that later processing units focus on the valuable target regions.
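The gating idea behind CBAM can be sketched in plain Python. This is a deliberately simplified, parameter-free stand-in and not the module used in the repository: in the CBAM paper the pooled channel descriptors pass through a shared MLP and the pooled spatial maps through a convolution, whereas here both learned parts are replaced by a plain sum (an assumption made to keep the example short). The ordering follows the CBAM paper: a channel gate first, then a spatial gate. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(fmap):
    """Scale each channel by a weight built from its average and max response.

    Assumption: real CBAM feeds both pooled values through a shared MLP;
    this sketch simply adds them before the sigmoid.
    """
    gated = []
    for channel in fmap:
        flat = [v for row in channel for v in row]
        weight = sigmoid(sum(flat) / len(flat) + max(flat))
        gated.append([[v * weight for v in row] for row in channel])
    return gated


def spatial_gate(fmap):
    """Scale each pixel by a weight built from the mean and max across channels.

    Assumption: real CBAM applies a convolution to the two pooled maps;
    this sketch simply adds them before the sigmoid.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    gate = []
    for y in range(height):
        row = []
        for x in range(width):
            column = [fmap[c][y][x] for c in range(channels)]
            row.append(sigmoid(sum(column) / channels + max(column)))
        gate.append(row)
    return [[[fmap[c][y][x] * gate[y][x] for x in range(width)]
             for y in range(height)] for c in range(channels)]


def cbam_like(fmap):
    """Channel gate followed by spatial gate, applied to a C x H x W nested list."""
    return spatial_gate(channel_gate(fmap))


if __name__ == "__main__":
    fmap = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 2.0]]]
    print(cbam_like(fmap))
```

Both gates only rescale the features, so the output keeps the input shape and the module can be dropped between existing layers.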
In summary, the paper was written by Chinese authors; its structure and line of reasoning match the way Chinese readers tend to think, so it reads very smoothly.
Hands-on
Next, I train TPH-YOLOv5 on the VisDrone dataset. Because the code is a modification of YOLOv5, readers familiar with YOLOv5 should get it running easily.
Note that the authors provide two model configurations. The first, yolov5l-xs-tph.yaml, does not use CBAM; it only adds a new detection head to YOLOv5 6.0, and I guess it was used for the ablation experiments. For the best results, use yolov5l-xs-tr-cbam-spp-bifpn.yaml.
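As a rough sketch of how the chosen configuration might be passed to training, the helper below assembles a command line. The flags (`--img`, `--batch-size`, `--epochs`, `--data`, `--cfg`, `--weights`, `--name`) follow the upstream YOLOv5 `train.py` interface; that the fork accepts exactly the same flags is an assumption, so check `python train.py --help` in the repository first. The `models/` and `data/` paths, the `yolov5l.pt` weights file, the image size, the batch size, and the run name are illustrative guesses rather than values from this post.

```python
import shlex


def build_train_command(cfg, weights, data="data/VisDrone.yaml",
                        img=640, batch=4, epochs=100, name="tph-yolov5"):
    """Assemble an argument list for a YOLOv5-style train.py (hypothetical helper)."""
    return [
        "python", "train.py",
        "--img", str(img),
        "--batch-size", str(batch),
        "--epochs", str(epochs),
        "--data", data,
        "--cfg", cfg,
        "--weights", weights,
        "--name", name,
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        cfg="models/yolov5l-xs-tr-cbam-spp-bifpn.yaml",  # assumed location
        weights="yolov5l.pt",                            # assumed weights file
    )
    # Print the command for inspection; run it with subprocess.run(cmd) if desired.
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so paths containing spaces survive when the list is handed to `subprocess.run`.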
The authors also provide two pretrained models, which I include at the end of this post for download.
After training on VisDrone for 100 epochs, I ran detection on a video found online and compared the results with YOLOv5 5.0 and 6.1. The video below shows the comparison.
YOLOv5 / TPH-YOLOv5 detection comparison
Bilibili link: https://www.bilibili.com/video/BV17a411u7JD (if you watch it on Bilibili, a like, coin, and favorite are appreciated)
The difference is clearly visible: TPH-YOLOv5 detects dense crowds noticeably better.
The test video is shared here as well: https://pan.baidu.com/s/1jgTonbDYmONkqvLjhLPpRQ?pwd=8888
If you test it with other models, feel free to @ me so I can have a look at the results.
Test results:
| Algorithm | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|
| yolov5-5.0 | 34.9% | 20.6% |
| yolov5-6.1 | 33.1% | 18.7% |
| tph-yolov5 | 37.4% | 21.7% |
Note: these are test results from the best.pt obtained after only 100 epochs, so they do not reflect the best achievable performance.
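The gaps between the rows are easier to read as differences in percentage points. The short sketch below recomputes them from the table values; the dictionary layout and the function name are illustrative choices, and the two numbers per row correspond to the two mAP columns of the table.

```python
# Values copied from the table: (first mAP column, second mAP column), in percent.
RESULTS = {
    "yolov5-5.0": (34.9, 20.6),
    "yolov5-6.1": (33.1, 18.7),
    "tph-yolov5": (37.4, 21.7),
}


def gains_over(baseline, candidate="tph-yolov5"):
    """Return the candidate's gain over the baseline in percentage points."""
    base_a, base_b = RESULTS[baseline]
    cand_a, cand_b = RESULTS[candidate]
    return round(cand_a - base_a, 1), round(cand_b - base_b, 1)


if __name__ == "__main__":
    for baseline in ("yolov5-5.0", "yolov5-6.1"):
        first, second = gains_over(baseline)
        print(f"tph-yolov5 vs {baseline}: +{first} and +{second} points")
```

With these numbers TPH-YOLOv5 gains 2.5 and 1.1 points over YOLOv5 5.0, and 4.3 and 3.0 points over YOLOv5 6.1, after the same 100 epochs.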
Code backup
A local backup of the TPH-YOLOv5 code (including the two pretrained weights provided by the authors): https://pan.baidu.com/s/15mVle5Exghu3jJMFyl9Lyg?pwd=8888