Remote Sensing Image Super-resolution and Object Detection: Benchmark and State of the Art

2022-07-06 03:25:00 leon. shadow

Title

Remote sensing image super-resolution and object detection: benchmark and state of the art.

Abstract

For the past two decades, researchers have been developing object detection methods for remote sensing (RS) images. In most cases, the datasets available for small-object detection in RS images are inadequate. Many researchers repurpose scene classification datasets for object detection, which has limitations: for example, large objects outnumber small objects in the object categories. These datasets therefore lack diversity, which further degrades the performance of small-object detectors on RS images. This paper reviews current datasets and deep-learning-based object detection methods for RS images. We also propose a large-scale, publicly available remote sensing super-resolution object detection (RSSOD) dataset. RSSOD consists of 1,759 hand-annotated images containing 22,091 object instances in very high resolution (VHR) imagery with a spatial resolution of about 0.05 m. There are five classes with different label frequencies, and the images are annotated in both YOLO and COCO formats. The images were extracted from satellite imagery and include real image distortions such as tangential scale distortion and skew distortion. The RSSOD dataset will help researchers benchmark many kinds of state-of-the-art object detection methods, especially those that use image super-resolution for small objects. We also propose a novel multi-class cyclic super-resolution generative adversarial network with residual feature aggregation (MCGR) and an auxiliary YOLOv5 detector to benchmark image-super-resolution-based object detection, and we compare it with existing state-of-the-art image super-resolution (SR) methods.
The proposed MCGR achieves state-of-the-art image SR performance, improving PSNR by 1.2 dB over the state-of-the-art NLSN method. MCGR's best object detection mAPs for the five-class, four-class, two-class, and single-class settings are 0.758, 0.881, 0.841, and 0.983, respectively. These results surpass the state-of-the-art detectors YOLOv5, EfficientDet, Faster RCNN, SSD, and RetinaNet.

Introduction

Object detection and recognition has long been a core problem in computer vision; its goal is to locate objects of interest in an image. Over the past decade, object detection has become an important task in remote sensing. It is closely tied to applications such as geographic resource surveying and mapping, crop yield analysis, disaster management, and transportation planning and navigation. Because RS images cover wide areas, they contain detectable objects ranging from small to large, and this multi-scale nature makes object detection in RS images challenging. Recent advances in deep learning, however, have enabled breakthroughs in object detection and localization.

Deep-learning-based methods are data hungry: their detection performance depends on the quality and quantity of the input data. Comprehensive, challenging datasets drive progress in object detection methods. For example, since their introduction, the ImageNet and MS COCO datasets have been the standard benchmarks for natural-scene classification and object detection, and most state-of-the-art methods are evaluated on them. Similarly, the UC Merced and NWPU-RESISC45 datasets advanced scene classification, while the ISPRS Vaihingen and 38-Cloud datasets guided the development of deep-learning-based semantic segmentation models for RS images. More recent datasets such as DOTA and DIOR cover common objects in RS imagery. Over the past decade, these datasets and many others have encouraged researchers to develop new data-driven object detection and recognition methods.

In optical RS images, the objects of interest usually occupy very few pixels. For example, in a very high resolution (VHR) image with a ground sampling distance (GSD) of 0.5 m, a car with a 4×1.5 m footprint occupies a pixel grid of only 8×3 (24 pixels in total). For small objects such as vehicles, a handful of pixels represents the entire object, which makes recognition and detection very challenging. In a VHR satellite image with a GSD of 0.25 m, the same 4×1.5 m vehicle covers a 16×6 grid of 96 pixels. Recent work on RS object detection in low-resolution (LR) images therefore applies image super-resolution to increase the spatial resolution of LR images before detection, as Courtrai et al. and Bashir et al. did. Such detection methods for LR images need to be benchmarked on challenging datasets. The remote sensing super-resolution object detection (RSSOD) dataset will help these researchers benchmark their methods and detection tasks on high-resolution (HR) images.
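The pixel counts above follow directly from dividing an object's ground footprint by the GSD. The short Python sketch below reproduces them. `footprint_pixels` is a hypothetical helper of our own, and it assumes the object is axis-aligned with the pixel grid and that partial pixels are rounded up.

```python
import math

def footprint_pixels(length_m: float, width_m: float, gsd_m: float) -> tuple:
    """Return (rows, cols, total) pixels covered by a length_m x width_m object.

    Assumes an axis-aligned object and rounds partial pixels up (a simplification).
    """
    rows = math.ceil(length_m / gsd_m)
    cols = math.ceil(width_m / gsd_m)
    return rows, cols, rows * cols

# A 4 x 1.5 m car at the two GSDs discussed above.
print(footprint_pixels(4.0, 1.5, 0.5))   # (8, 3, 24)
print(footprint_pixels(4.0, 1.5, 0.25))  # (16, 6, 96)
# Downscaling the 0.25 m image by a factor of 4 gives an effective GSD of 1 m.
print(footprint_pixels(4.0, 1.5, 0.25 * 4))  # (4, 2, 8)
```

The last call shows the size of the problem that super-resolution-based detection tackles. After 4× downscaling, the same car is reduced to about 8 pixels.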

The main contributions of this paper are as follows:

  1. A large, public dataset, RSSOD, for small-object detection in urban environments (sample images are shown in Figure 1). The dataset has 22,091 manually annotated instances in 5 classes. Because of its geographic information and the orientations of its annotated objects, it offers researchers a challenging test bed.

    Figure 1: sample images from the RSSOD dataset.
  2. Benchmarks of state-of-the-art models on the proposed RSSOD dataset.

  3. A novel multi-class cyclic super-resolution generative adversarial network with residual feature aggregation (MCGR) for object detection in LR images at scale factors of 2 and 4.

Review of remote sensing object detection datasets and methods

Over the past decade, researchers have begun to explore object detection datasets and methods for remote sensing. This is mainly due to advances in satellite sensor design, which have made high-quality VHR images available for deep learning. In this section, we briefly review current datasets and methods for object detection in remote sensing.

Object detection datasets for remote sensing images

RS images are captured from a top-down view, and sensor designs vary across satellite payloads. As a result, object sizes are diverse, and object instances vary greatly in both size and orientation. Earlier datasets include NWPU-RESISC45, NWPU VHR-10, DIOR, and DOTA. In this section, we briefly introduce current datasets for RS object detection and recognition.

  1. TAS dataset. The TAS dataset is used to detect vehicles in aerial images. It contains 30 images with a total of 1,319 cars in arbitrary orientations. Its drawbacks are low resolution and shadows cast by nearby buildings and trees, which hamper detection.
  2. UC Merced Land-Use dataset. This is a 21-class scene classification dataset with 100 images of 256×256 pixels per class. The images were extracted from urban-area imagery in the United States Geological Survey (USGS) National Map, with a spatial resolution of 0.3048 m. UC Merced is one of the most widely used datasets for RS scene classification. Its overlapping categories, such as sparse, medium-density, and dense residential areas, make it richer and more challenging. RS image classification studies using it include the LGFBOVW classifier, the LASC-CNN classifier, a hybrid satellite image classification system, structured quantization learning, the mcODM classifier, and classifiers based on feature engineering.
  3. NWPU datasets. The RS image classification dataset originally proposed by NWPU researchers, NWPU-RESISC45, has 45 classes with 700 images each. Its spatial resolution ranges from about 0.2 m to 30 m, and the image size is fixed at 256×256. Owing to its scale (31,500 images in total), it is widely used for scene classification. However, most of its images have low spatial resolution, so a second dataset with VHR images, NWPU VHR-10, was released. It includes 715 images with a spatial resolution of 2 m and 85 images with a spatial resolution of 8 cm. The dataset has 10 classes and also includes negative samples. Its drawback is that some small objects, such as vehicles, are left unannotated because they span only a few pixels. Unlike NWPU-RESISC45, NWPU VHR-10 is an object detection dataset, annotated with horizontal bounding boxes (HBB). Researchers commonly use these datasets for RS scene classification and object recognition tasks.
  4. VEDAI. This dataset from Razakarivony et al. is used for multi-class vehicle detection. It contains 3,640 vehicle instances in 9 classes: boat, car, camping car, plane, pick-up, tractor, truck, van, and other. It includes 1,210 images of 1024×1024 pixels with a spatial resolution of 12.5 cm, and researchers often use it to benchmark RS object detection. Another vehicle dataset is DLR MVDA, which consists of VHR images captured by the DLR 3K camera system from an altitude of 1,000 m, with a spatial resolution of 13 cm. DLR MVDA is annotated with oriented bounding boxes (OBB) that indicate each vehicle's orientation.
  5. RSC11 dataset. This dataset is extracted from Google Earth and includes 11 scene classes, some of which are very similar, which makes classification on it difficult. Its spatial resolution is 0.2 m. It has 1,232 images of 512×512 pixels, and one of the images has a resolution of 0.1 m.
  6. ISPRS Potsdam. This is a semantic segmentation dataset from the International Society for Photogrammetry and Remote Sensing (ISPRS). It contains 38 VHR images of 6000×6000 pixels with a spatial resolution of 5 cm. It has been widely used for semantic segmentation and object detection, especially in urban environments. It covers six classes: impervious surfaces, buildings, low vegetation, trees, cars, and clutter.
  7. RSOD dataset. Xiao et al. collected 976 images with spatial resolutions of 0.5–2 m. The dataset has four classes: oil tank, aircraft, overpass, and playground.
  8. DOTA dataset. The dataset from Xia et al. is a large object detection dataset with 15 object classes. These include large objects (e.g., bridges, harbors, basketball courts) and small objects (e.g., small vehicles). It contains 2,806 images of varying resolution with more than 188k object instances. Because image size and spatial resolution vary substantially, object scale, orientation, and shape also vary, which makes detection more challenging.
  9. DIOR dataset. The DIOR dataset from Li et al. contains 23,000 images with more than 192,000 instances in 20 object classes (about 1,200 images per class). It was originally annotated with HBBs. The later DIOR-R release uses updated OBB annotations for oriented object detection.
    Table 1: comparison of the proposed RSSOD dataset with existing datasets.

    As Table 1 shows, RSSOD has the highest spatial resolution of all the object detection datasets compared. Its images are large, about 1000×1000 pixels, and emphasize small objects.

Object detection methods for remote sensing images

There are two main approaches to object detection. Generic object detectors treat detection as an end-to-end learning task. SR-based methods use a prior network to enhance image quality before detection. This section reviews current generic object detectors, including some state-of-the-art small-object detectors, as well as image-SR-based object detectors.

  1. Generic object detectors

    In recent years, deep-learning-based methods have succeeded at classification, detection, and recognition tasks. Among these tasks, object detection is one of the most prominent fields: it grows every year, and many new detectors are developed and released. Detectors can be single-stage or two-stage. A two-stage detector first generates region proposals and then performs bounding-box (bbox) classification.

    One of the earliest two-stage detectors is the region-based convolutional neural network (RCNN). RCNN generates region proposals with the selective search algorithm (Uijlings et al., 2013) and passes them to a CNN, which produces a feature vector for each proposed region. A support vector machine (SVM) then performs the final classification. RCNN evolved into Fast RCNN, which made two major changes to the original design. First, it shares a single feature map across the regions of interest. Second, it replaces the SVM with a fully connected (FC) neural network for object classification and bbox regression. Together, these changes enable end-to-end training and much faster detection. A further RCNN variant, Faster RCNN, introduced a nearly cost-free region proposal network (RPN) and produced state-of-the-art results on the Pascal VOC dataset at 5 fps.

    One-stage detectors were first designed to achieve high frame rates at the cost of detection accuracy. For example, the You-Only-Look-Once (YOLO) model runs at 45 fps on Pascal VOC 2007 and 2012 with a mean average precision (mAP) of 63.4%, whereas Fast RCNN reaches an mAP of 70.0% at 0.5 fps. YOLO thus began the era of real-time object detectors and opened up real-time detection on video streams. Another one-stage detector, the Single-Shot Detector (SSD), eliminates the region proposal stage and localizes and classifies objects in a single pass. It detects objects using default bounding boxes of fixed sizes and produces the final detections with non-maximum suppression (NMS). By introducing the focal loss into a one-stage design, RetinaNet surpasses the accuracy of all state-of-the-art two-stage detectors while keeping one-stage speed.
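Greedy NMS, the post-processing step mentioned above, can be sketched in plain Python as follows. This generic, single-class version is written for illustration only. It is not SSD's exact implementation, which runs NMS per class and caps the number of detections kept per image.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119
```

Lowering `iou_threshold` suppresses more aggressively. For tightly packed small objects such as parked cars, a threshold that is too low can merge distinct neighbours.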

    Redmon et al. proposed the first three versions of YOLO: YOLO, YOLOv2, and YOLOv3. The later versions incorporate deeper backbone CNNs, residual blocks, and skip connections, and they reach state-of-the-art performance at 3.8 times the speed of RetinaNet. In 2020, Bochkovskiy et al. proposed YOLOv4. It uses a new backbone network, CSPDarknet53, with spatial attention and Mish activation, and it improves mAP and fps on the MS COCO dataset by 10% and 12%, respectively, compared with YOLOv3. Within a month of YOLOv4's release, Jocher released YOLOv5, which achieves state-of-the-art mAP and fps. The YOLOv5x6 model reports the highest mAP of 55.4% on the MS COCO val2017 dataset, with an inference time of 19.4 ms per image, and currently ranks first among state-of-the-art object detectors. Figure 2 compares the YOLOv5 models with EfficientDet.
    Figure 2: comparison of YOLOv5 models with EfficientDet.

  2. Small-object detectors for RS images

    Here we discuss some state-of-the-art image SR methods for LR-to-HR image SR that different researchers have recently applied to small-object detection in RS images.

    1. State-of-the-art image SR methods

      In 2017, Lim et al. introduced the enhanced deep SR network (EDSR). In recent work (Rabbi et al., 2020; Shermeyer & Van Etten, 2019; Wei & Liu, 2021), EDSR has become an integral part of small-object detection in RS images. However, the detection mAPs reported for such detectors are very low.

      Wang et al. proposed improvements to the existing SR generative adversarial network (GAN) architecture and introduced ESRGAN for realistic image SR. They introduced the residual-in-residual dense block (RRDB) together with adversarial and perceptual losses. Wang et al. later reported further improvements (Real-ESRGAN) by training ESRGAN on purely synthetic data, generated with a high-order degradation model that approximates real-world degradations.

      Liang et al. proposed SwinIR, which addresses image SR with a Swin Transformer-based architecture in three parts. These are shallow feature extraction, deep feature extraction using residual Swin Transformer blocks (RSTB), and high-quality image reconstruction. SwinIR achieves state-of-the-art performance on the DIV2K and Flickr2K datasets.

      In image SR, a model's overall performance depends on the degradation model used to generate the LR images. Most models consider additional factors such as blur, but real-world degradations remain diverse. To address this, Zhang et al. proposed a more practical degradation model in BSRGAN. The LR images produced by BSRGAN's degradation model undergo blur, downsampling, and noise degradations applied in a randomly shuffled order, which makes the degradation more realistic.
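A simplified BSRGAN-style degradation can be sketched with NumPy as follows. The sketch makes these assumptions: a grayscale image in [0, 1], one Gaussian blur, block-average downsampling, and additive Gaussian noise. The real BSRGAN pipeline uses more degradation types (for example, repeated blurs with varied kernels and JPEG compression), so this only illustrates the random-shuffle idea. All function names are hypothetical.

```python
import numpy as np

def gaussian_blur(img: np.ndarray, sigma: float, size: int = 7) -> np.ndarray:
    """Blur a 2-D image with an isotropic Gaussian kernel (reflect padding)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    kernel /= kernel.sum()
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Direct convolution as a weighted sum of shifted copies of the image.
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def area_downsample(img: np.ndarray, sf: int) -> np.ndarray:
    """Downsample by averaging non-overlapping sf x sf blocks."""
    h, w = (img.shape[0] // sf) * sf, (img.shape[1] // sf) * sf
    return img[:h, :w].reshape(h // sf, sf, w // sf, sf).mean(axis=(1, 3))

def degrade(hr: np.ndarray, sf: int, seed: int = 0) -> np.ndarray:
    """Apply blur, downsampling, and noise in a random order (BSRGAN-style sketch)."""
    rng = np.random.default_rng(seed)
    ops = [
        lambda x: gaussian_blur(x, sigma=rng.uniform(0.5, 2.0)),
        lambda x: area_downsample(x, sf),
        lambda x: x + rng.normal(0.0, rng.uniform(1, 10) / 255.0, x.shape),
    ]
    img = hr.astype(np.float64)
    for idx in rng.permutation(len(ops)):  # the random shuffle of degradations
        img = ops[idx](img)
    return np.clip(img, 0.0, 1.0)

hr = np.random.default_rng(1).random((64, 64))
lr = degrade(hr, sf=4)
print(lr.shape)  # (16, 16)
```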

      DRN is a dual regression network that learns both the LR-to-HR mapping and the corresponding degradation mapping (HR to LR), thereby learning the downsampling kernel. Compared with EDSR, RCAN, RRDB, and similar methods, it achieves state-of-the-art performance in terms of PSNR and number of parameters.

      Mei et al. proposed non-local sparse attention (NLSA) for image SR. NLSA uses spherical hashing to partition the input into hash buckets of correlated features, so that the network does not attend to noisy, less informative locations during learning. The resulting network, the non-local sparse network (NLSN), achieves state-of-the-art performance on the Set14, B100, Urban100, and Manga109 datasets.
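The bucketing idea behind NLSA can be illustrated with angular (spherical) locality-sensitive hashing. Each feature vector is normalized and projected with a random Gaussian matrix, and its bucket is the argmax over the projection and its negation. The NumPy sketch below is a simplification in that spirit (a single hash round and no chunking), not NLSN's actual code. `spherical_lsh_buckets` and `bucketed_attention` are hypothetical names.

```python
import numpy as np

def spherical_lsh_buckets(features: np.ndarray, n_buckets: int, seed: int = 0) -> np.ndarray:
    """Hash each row of `features` (n, d) to one of n_buckets buckets.

    Vectors pointing in similar directions tend to share a bucket; because
    rows are L2-normalized first, scaling a vector does not change its bucket.
    Assumes no all-zero rows.
    """
    assert n_buckets % 2 == 0, "n_buckets must be even"
    rng = np.random.default_rng(seed)
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    rotation = rng.normal(size=(features.shape[1], n_buckets // 2))
    projected = x @ rotation
    return np.argmax(np.concatenate([projected, -projected], axis=1), axis=1)

def bucketed_attention(features: np.ndarray, n_buckets: int, seed: int = 0) -> np.ndarray:
    """Softmax dot-product attention computed only among rows in the same bucket."""
    buckets = spherical_lsh_buckets(features, n_buckets, seed)
    out = np.zeros_like(features, dtype=np.float64)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        group = features[idx]
        scores = group @ group.T
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        out[idx] = weights @ group
    return out

feats = np.random.default_rng(1).normal(size=(64, 8))  # e.g. 64 pixels, 8 channels
print(np.bincount(spherical_lsh_buckets(feats, n_buckets=4), minlength=4))
```

Restricting attention to a bucket replaces one n×n attention matrix with several much smaller ones. That is the source of the method's efficiency and of its tendency to ignore unrelated, noisy locations.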

    2. Image-SR-based small-object detectors

      Object size is a key parameter in object detection, especially relative to the whole inference image. Current state-of-the-art detectors work well on large and medium-sized objects. For small objects, however, the performance of generic detectors drops sharply; small objects here means objects a few pixels in size or occupying less than 5% of the image. Because small objects are so small, their features cannot be distinguished from those of other classes, which leads to inaccurate detections. One way to improve detection performance is data augmentation that oversamples small objects of interest by simply copying and pasting them. This augmentation increases the chance that predictions overlap the ground truth, which improves prediction accuracy. However, overlap introduced during augmentation can reduce detection accuracy for other object classes. Even when the pasted objects do not overlap any other object, pasting them onto the background reduces the number of negative samples, which may increase the false-positive rate. Another approach to small-object detection, used by Park et al., is to train on small and large objects at multiple resolutions.
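The copy-paste oversampling described above can be sketched as follows. The sketch assumes NumPy image arrays and axis-aligned (x1, y1, x2, y2, cls) boxes. It rejects any position that would overlap an existing box, so the remaining side effect is the loss of background (negative) area noted in the text. `copy_paste` is a hypothetical helper, not a method from the cited works.

```python
import numpy as np

def boxes_overlap(a, b) -> bool:
    """True if boxes (x1, y1, x2, y2) intersect with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def copy_paste(image, boxes, patch, cls, copies=3, max_tries=100, seed=0):
    """Paste up to `copies` instances of a small-object `patch` into `image`.

    Returns (augmented image, updated list of (x1, y1, x2, y2, cls) boxes).
    The input image and box list are not modified.
    """
    rng = np.random.default_rng(seed)
    out = image.copy()
    boxes = list(boxes)
    ph, pw = patch.shape[:2]
    H, W = image.shape[:2]
    placed = 0
    for _ in range(max_tries):
        if placed == copies:
            break
        x1 = int(rng.integers(0, W - pw + 1))  # upper bound is exclusive
        y1 = int(rng.integers(0, H - ph + 1))
        cand = (x1, y1, x1 + pw, y1 + ph)
        if any(boxes_overlap(cand, b[:4]) for b in boxes):
            continue  # reject: would occlude an existing or already pasted object
        out[y1:y1 + ph, x1:x1 + pw] = patch
        boxes.append((*cand, cls))
        placed += 1
    return out, boxes

img = np.zeros((100, 100))
aug, new_boxes = copy_paste(img, [(0, 0, 50, 50, 0)], np.ones((5, 5)), cls=1)
print(len(new_boxes))  # original box plus the pasted copies
```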

      For small-object detection, an auxiliary image SR network can increase the spatial resolution of the dataset before the actual detection task, as Courtrai et al. and Bashir et al. did. In recent years, image SR networks have achieved great success at scale factors of 2× and 4×. Some deep-learning-based methods generate HR images from LR ones, including single-image SR (SISR), which performs SR on a single input image. For a detailed review of image super-resolution methods, both traditional and deep-learning-based, we refer readers to the articles by Yang et al. and Bashir et al.

      Ferdous et al. proposed a two-stage detector that uses a GAN for image SR and SSD for object detection. Zhang et al. used weakly supervised learning with pseudo-label generation to learn object detection in RS images. In addition, Rabbi et al. developed an end-to-end small-object detection network based on ESRGAN and the edge-enhanced GAN (EEGAN), with Faster RCNN and SSD as detectors.

RSSOD dataset

In this section, we introduce the RSSOD dataset and describe its data collection, class selection, annotation method, data split, image sizes, spatial resolution, and object orientations.

Image collection

We reviewed RS scene classification and object detection datasets (see Table 1) and found a lack of high-resolution urban object detection datasets. We therefore collected data from publicly available datasets built for other tasks, such as scene classification. The urban images were extracted from the ISPRS Potsdam semantic segmentation dataset: from each of the 38 VHR images, we extracted 1000×1000-pixel blocks with an overlap of 100 pixels. For reproducibility, we kept the original dataset's file names and appended two digits for the block number. Each image yields 36 blocks, for a total of 1,368 images. Table 2 shows the overall image selection of the RSSOD dataset. The average image resolution is 856×853.
Table 2: overall image selection of the RSSOD dataset.
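The block extraction can be sketched as follows. The text does not say how tiles at the image border were handled. This sketch assumes that tiles running past the border are dropped, which gives 6 offsets per axis and reproduces the 36 blocks per image (38 × 36 = 1,368) reported above. The stem `top_potsdam_2_10_RGB` is only an illustrative file name, and the helpers are hypothetical.

```python
def tile_origins(size: int, tile: int = 1000, overlap: int = 100) -> list:
    """Top-left offsets along one axis; stride = tile - overlap.

    Assumption: tiles that would extend past the image border are dropped.
    """
    stride = tile - overlap
    return list(range(0, size - tile + 1, stride))

def tiles(stem: str, size: int = 6000, tile: int = 1000, overlap: int = 100) -> list:
    """Return (name, x, y) for each tile; the name appends a two-digit block number."""
    origins = tile_origins(size, tile, overlap)
    grid = [(x, y) for y in origins for x in origins]
    return [(f"{stem}_{idx:02d}", x, y) for idx, (x, y) in enumerate(grid)]

blocks = tiles("top_potsdam_2_10_RGB")
print(tile_origins(6000))    # [0, 900, 1800, 2700, 3600, 4500]
print(len(blocks))           # 36
print(38 * len(blocks))      # 1368
print(blocks[0], blocks[-1])
```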
These images were captured by multiple sensors. Object orientations and positions are random, as shown in Figure 3a, and most objects are relatively small, as shown in Figure 3b.
Figure 3: (a) object orientations and positions; (b) object sizes.

Class selection and data split

In remote sensing, the objects of interest matter most. Common classes in most object detection datasets include vehicles (ground vehicles, ships, and aircraft), trees, buildings, impervious surfaces, and low vegetation. Considering object size, we omitted buildings: because most images come from urban environments, buildings occupy most of the image area. The final classes are therefore vehicle, tree, airplane, ship, and low vegetation. Detailed class examples are given in Table 3.
Table 3: examples of each RSSOD class.
The dataset follows the conventional 70:20:10 split into training, validation, and test subsets. Figure 4 shows the overall distribution of instances across the training, validation, and test images. The training, validation, and test subsets contain 1,232, 351, and 176 images, respectively.
Figure 4: instance distribution across the training, validation, and test sets.
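A reproducible 70:20:10 split can be sketched as follows. This is a hypothetical helper, and the exact rounding the authors used is not stated. This version rounds the validation and test sizes and gives the remainder to training, so 1,759 images split into 1,231/352/176, one image away from the reported 1,232/351/176.

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle deterministically and split into (train, val, test).

    Validation and test sizes are rounded; training takes the remainder, so
    every item lands in exactly one subset.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_val = round(ratios[1] * n)
    n_test = round(ratios[2] * n)
    n_train = n - n_val - n_test
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(1759))
print(len(train), len(val), len(test))  # 1231 352 176
```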

Annotation method

We chose HBB-based annotation and labeled the images with OpenLabeling, an open-source Python labeling tool. All objects were manually annotated in YOLO format, which describes each bounding box (BB) by its center point plus its width and height: <class id, x, y, w, h>. Here (x, y) is the center of the BB, and (w, h) are its width and height.
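Because RSSOD ships both YOLO and COCO annotations, the conversion between the two box conventions is worth spelling out. The sketch below assumes the standard YOLO convention of coordinates normalized by image size; the text above does not state normalization explicitly. It also assumes COCO's absolute [x_min, y_min, width, height] box. `yolo_to_coco` is a hypothetical helper, and `id_offset=1` reflects the common practice of 1-based COCO category ids.

```python
def yolo_to_coco(line: str, img_w: int, img_h: int, id_offset: int = 1) -> dict:
    """Convert one YOLO label line '<cls> <x> <y> <w> <h>' (normalized center
    format) to a COCO-style annotation dict with an absolute
    [x_min, y_min, width, height] bbox."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return {
        "category_id": int(cls) + id_offset,
        "bbox": [xc - w / 2, yc - h / 2, w, h],  # center -> top-left corner
        "area": w * h,
    }

print(yolo_to_coco("0 0.5 0.5 0.25 0.125", img_w=1000, img_h=800))
# {'category_id': 1, 'bbox': [375.0, 350.0, 250.0, 100.0], 'area': 25000.0}
```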

The labels were refined over three rounds of manual review, and any human labeling errors were corrected during review. In addition, the annotations are provided in four groupings with five, four, two, and one class(es). These groupings make it easier to detect different types of objects and to check detector performance on class subsets. Figure 5 shows some annotated images from RSSOD.
Figure 5: annotated sample images from RSSOD.

Small-object detection using a multi-class cyclic GAN with RFA

In this section, we propose a new multi-class cyclic GAN with residual feature aggregation (RFA) for object detection on the RSSOD dataset. The proposed network performs two tasks: image SR and object detection. We first introduce the proposed image SR network.

MCGR for image-SR-based object detection

We first replace the conventional residual blocks in the image SR networks proposed in SRResNet and EDSR with RFA-based blocks (see Figure 6). Following Bashir et al., these blocks aggregate the features of all residual blocks with a 1×1 convolution layer. This feature concatenation improves the network's performance and hence the quality of the SR images.
Figure 6: the RFA-based residual block.
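The aggregation step can be sketched in NumPy at the level of tensor shapes. This is a simplified variant, not the authors' code. It keeps the residual-branch output of every block, concatenates those outputs along the channel axis, fuses them with a 1×1 convolution, and adds the result to the block input. The channel count, number of blocks, and all function names here are illustrative assumptions; the paper's generator uses 48 RFA blocks.

```python
import numpy as np

def conv3x3(x, weight, bias):
    """'Same' 3x3 convolution. x: (C_in, H, W); weight: (C_out, C_in, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            # Each kernel tap mixes the channels of a shifted copy of the input.
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], padded[:, dy:dy + h, dx:dx + w])
    return out + bias[:, None, None]

def conv1x1(x, weight, bias):
    """1x1 convolution = per-pixel channel mixing. weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def rfa_block(x, res_params, fuse_w, fuse_b):
    """Residual feature aggregation sketch.

    res_params: list of (w1, b1, w2, b2), one tuple per residual block.
    Every block's residual-branch output is collected, concatenated along
    channels, fused by a 1x1 conv, and added back to the block input.
    """
    h = x
    branches = []
    for w1, b1, w2, b2 in res_params:
        r = conv3x3(relu(conv3x3(h, w1, b1)), w2, b2)
        branches.append(r)
        h = h + r  # ordinary residual connection feeding the next block
    fused = conv1x1(np.concatenate(branches, axis=0), fuse_w, fuse_b)
    return x + fused

rng = np.random.default_rng(0)
C, H, W, n_blocks = 4, 8, 8, 4
params = [(0.1 * rng.normal(size=(C, C, 3, 3)), np.zeros(C),
           0.1 * rng.normal(size=(C, C, 3, 3)), np.zeros(C)) for _ in range(n_blocks)]
fuse_w = 0.1 * rng.normal(size=(C, C * n_blocks))
y = rfa_block(rng.normal(size=(C, H, W)), params, fuse_w, np.zeros(C))
print(y.shape)  # (4, 8, 8)
```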
In addition, we use a modified form of SRGAN with the RFA-based residual blocks and a cyclic network based on Wasserstein GANs, trained with L1 and L2 loss functions. The proposed cyclic GAN is shown in Figure 7.
Figure 7: the proposed cyclic GAN.
Colors have the same meaning as in Figure 6, with light blue denoting downsampling layers and brown denoting pixel-shuffle layers. The cyclic GAN (Figure 7) uses a secondary GAN, LRGEN, which generates LR images from the HR images produced by HRGEN. The overall loss function of this cyclic model is given in Eq. (1):
Equation (1): overall loss of the cyclic model.
where I_HR and I_LR are the HR and LR images, and HRGEN and LRGEN are the corresponding HR and LR generators shown in Figure 7.

Comparing the real LR image with LRGEN's output ensures that the regenerated LR image resembles the actual LR image. In turn, this keeps the HR image generated by the network similar to the actual HR image. Because each GAN evaluates the other's output, this cyclic approach lets the two GANs jointly minimize the overall loss. Finally, a detection network, a YOLOv5 detector, is attached, as shown in Figure 8.
Figure 8: the complete MCGR network with the YOLOv5 detector.
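The cycle can be illustrated with toy stand-ins for the two generators. In this NumPy sketch, nearest-neighbour upsampling plays HRGEN and block averaging plays LRGEN. Both are hypothetical placeholders, not the learned networks, and the L1 losses are illustrative. The toy case also shows why the cycle term is paired with a direct HR loss such as Eq. (2): the LR cycle can be reproduced perfectly while the generated HR image is still imperfect.

```python
import numpy as np

def hr_gen(lr: np.ndarray, sf: int) -> np.ndarray:
    """Stand-in HR generator (hypothetical): nearest-neighbour upsampling."""
    return np.kron(lr, np.ones((sf, sf)))

def lr_gen(hr: np.ndarray, sf: int) -> np.ndarray:
    """Stand-in LR generator (hypothetical): sf x sf block averaging."""
    h, w = hr.shape
    return hr.reshape(h // sf, sf, w // sf, sf).mean(axis=(1, 3))

def cycle_losses(lr: np.ndarray, hr: np.ndarray, sf: int):
    """Return (L1 loss of generated HR vs real HR, L1 cycle loss LR -> HR -> LR)."""
    sr = hr_gen(lr, sf)
    lr_cycle = lr_gen(sr, sf)
    hr_loss = float(np.mean(np.abs(sr - hr)))
    cycle_loss = float(np.mean(np.abs(lr_cycle - lr)))
    return hr_loss, cycle_loss

hr = np.random.default_rng(0).random((16, 16))
lr = lr_gen(hr, 4)
print(cycle_losses(lr, hr, 4))  # HR loss > 0 while the cycle loss is ~0
```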
The three networks shown in Figure 8 have different loss functions, and the total loss is a weighted sum of the generator, discriminator, and detector losses. The generator loss is given in Eq. (2):
Equation (2): generator loss.
where L(HRGEN) is the generator loss, N is the total number of samples, HRGEN(I_LR^i) is the i-th generated HR image, and I_HR^i is the i-th real HR image.
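Eq. (2) is not reproduced in this copy. Assuming it averages a pixel-wise distance between HRGEN(I_LR^i) and I_HR^i over the N samples, it can be sketched as follows. Because the text says both L1 and L2 losses are used, the helper supports either. `generator_pixel_loss` is a hypothetical name.

```python
import numpy as np

def generator_pixel_loss(sr_batch, hr_batch, norm="l1"):
    """Mean over N samples of the per-sample pixel error.

    sr_batch: generated HR images HRGEN(I_LR^i), shape (N, ...).
    hr_batch: real HR images I_HR^i, same shape.
    """
    diff = np.asarray(sr_batch, dtype=np.float64) - np.asarray(hr_batch, dtype=np.float64)
    flat = diff.reshape(len(diff), -1)
    if norm == "l1":
        per_sample = np.abs(flat).mean(axis=1)
    elif norm == "l2":
        per_sample = (flat ** 2).mean(axis=1)
    else:
        raise ValueError(f"unknown norm: {norm}")
    return float(per_sample.mean())

sr = np.zeros((2, 4, 4))
hr = np.full((2, 4, 4), 2.0)
print(generator_pixel_loss(sr, hr, "l1"), generator_pixel_loss(sr, hr, "l2"))  # 2.0 4.0
```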

We use an improved WGAN with YOLOv5. The generator is based on 48 RFA blocks, with a block size of 64×64 and 3×3 kernels, and the discriminator uses a high gradient penalty coefficient. The discriminator loss L(Dis) is given in Eq. (3):
Equation (3): discriminator loss.
where I_HR, I_SR, and I_ran are the HR image, the SR image, and a random image sampled uniformly from {I_HR, I_SR}. Their probability density distributions are P_HR, P_SR, and P_ran, respectively. The gradient penalty coefficient λ is set to a high value of 10. Finally, the YOLOv5 network has a bbox loss function, given in Eq. (4):
Equation (4): YOLOv5 bbox loss.
where Det and Anc are the numbers of detection grids and anchors. The ground-truth (HR) and predicted (SR) BB coordinates are (x, y, w, h) and (x′, y′, w′, h′). Here (x, y) and (x′, y′) are the BB centers, and (w, h) and (w′, h′) are the widths and heights of the ground-truth and predicted BBs, respectively. The network in Figure 8 is trained with the total loss L(Tot), given in Eq. (5):
Equation (5): total loss.
The weight factors μ1, μ2, and μ3 are 0.90, 10, and 0.10, respectively. These values balance the errors produced by the three networks and thereby stabilize training. The generator produces the SR images, and the detection network then performs object classification and localization on them.
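Equations (3) and (5) are not reproduced in this copy. For reference, the standard WGAN-GP critic loss (Gulrajani et al.) written with the symbols defined above is shown below, together with a weighted total loss. Both are assumptions rather than the authors' exact equations. The sign convention follows WGAN-GP, D denotes the discriminator, and the pairing of μ1, μ2, μ3 with the generator, discriminator, and detector losses follows the order in which the text lists them.

```latex
\mathcal{L}_{\mathrm{Dis}}
  = \mathbb{E}_{I_{SR}\sim P_{SR}}\big[D(I_{SR})\big]
  - \mathbb{E}_{I_{HR}\sim P_{HR}}\big[D(I_{HR})\big]
  + \lambda\,\mathbb{E}_{I_{ran}\sim P_{ran}}\Big[\big(\lVert \nabla_{I_{ran}} D(I_{ran}) \rVert_2 - 1\big)^2\Big],
  \qquad \lambda = 10,

I_{ran} = \epsilon\, I_{HR} + (1-\epsilon)\, I_{SR}, \qquad \epsilon \sim \mathcal{U}[0, 1],

\mathcal{L}_{\mathrm{Tot}}
  = \mu_1 \mathcal{L}_{\mathrm{HRGEN}} + \mu_2 \mathcal{L}_{\mathrm{Dis}} + \mu_3 \mathcal{L}_{\mathrm{Det}},
  \qquad (\mu_1, \mu_2, \mu_3) = (0.90,\ 10,\ 0.10).
```

The interpolation line makes precise what "sampled uniformly from {I_HR, I_SR}" means in WGAN-GP: I_ran lies at a random point on the straight line between a real and a generated image.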

Evaluation methods

The proposed method uses image SR to improve image quality before the detection task. We therefore compare the proposed SR method with existing state-of-the-art methods in two ways: with image quality metrics such as PSNR and SSIM, and with a task-based evaluation (object detection).

For object detection, we measure cross-class detection performance as mAP@0.5 IoU and compare it with existing object detection methods, using YOLOv5 as the auxiliary detector. This metric is commonly used to evaluate object detectors. Because most objects are small, we chose low IoU and confidence thresholds for inference.
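The mAP@0.5 metric can be sketched in plain Python for a single class. Detections are sorted by confidence and greedily matched to unmatched ground-truth boxes with IoU ≥ 0.5. Precision is made monotonically non-increasing (the precision envelope), and the area under the precision/recall curve is summed at every recall change, i.e. all-point interpolation. mAP is the mean over classes. Library evaluators can differ slightly; COCO, for example, interpolates at 101 recall points. All names here are hypothetical.

```python
def iou(a, b):
    """IoU of (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truths, iou_threshold=0.5):
    """detections: list of (image_id, score, box); ground_truths: image_id -> list of boxes."""
    n_gt = sum(len(v) for v in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(gts) for img, gts in ground_truths.items()}
    hits = []
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt)
            if not matched[img][j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[img][best_j] = True  # each ground truth can be matched once
        hits.append(best_j >= 0)
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Precision envelope: best precision achievable at this recall or higher.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_ap(per_class):
    """per_class: class -> (detections, ground_truths); returns mAP over classes."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps)

g1, g2 = (0, 0, 10, 10), (20, 20, 30, 30)
dets = [("a", 0.9, g1), ("a", 0.8, (50, 50, 60, 60)), ("a", 0.7, g2)]
print(average_precision(dets, {"a": [g1, g2]}))  # 0.8333... (= 5/6)
```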

Implementation details

The proposed network was implemented in the PyTorch framework on an Ubuntu 20.04 computer with an Nvidia Titan XP GPU. During training, the total-loss coefficients μ1, μ2, and μ3 were 0.90, 10, and 0.10, respectively, and the network was trained for 100 epochs.

Benchmark results on RSSOD

In this section, we report benchmark results of state-of-the-art image SR and object detection methods on the RSSOD dataset. Models were trained on the training set and validated on the validation set, and the final evaluation was performed on the test set. We also discuss the benchmark results to show how the MCGR network performs for object detection on RS images from the RSSOD dataset.

Image quality assessment

We evaluate the SR results with three main quantitative image quality assessment (IQA) metrics: MSE, PSNR, and SSIM. We also compare how well the proposed MCGR and the state-of-the-art NLSN method restore complex textures. The LR images use a scale factor of 4. Figures 9 and 10 compare MCGR with the HR images and with Real-ESRGAN, SwinIR-L, BSRGAN, DRN, EDSR, and NLSN.
Figure 9: SR comparison on a car image (scale factor 4).
As Figure 9 shows, Real-ESRGAN, SwinIR-L, and BSRGAN over-smooth the image, so the texture of the ground beneath the car is not recovered. DRN mixes information from neighboring pixels, resulting in poor image quality. EDSR and NLSN recover the high-level information in the image, but only the proposed MCGR network recovers the texture information, including high-frequency details. Figure 10 shows SR results on an airplane image, which confirm the advantages of the proposed MCGR network.
Figure 10: SR comparison on an airplane image (scale factor 4).
Figure 11 also compares MCGR's SR results with those of the second-best method, NLSN. The zoomed-in patches show how the proposed MCGR, a cyclic GAN with RFA, restores real-world textures from LR images. Colors indicate object positions. The four boxes on the left are MCGR results, and those on the right are NLSN results. The zoomed-in images in Figure 11 show the quality of the details restored by the two methods.
Figure 11: zoomed-in comparison of MCGR (left) and NLSN (right).
The IQA metrics on the test set are summarized in Table 4. Bicubic interpolation gives an average PSNR of 30.23 dB, and the proposed MCGR achieves state-of-the-art MSE, PSNR, and SSIM. MCGR's best PSNR is 34.68 dB, versus 33.48 dB, 33.13 dB, and 32.69 dB for NLSN, EDSR, and DRN, respectively. MCGR's best MSE is 27.98, versus 31.17, 37.64, and 44.89 for NLSN, EDSR, and DRN, respectively. MCGR also achieves the best SSIM, 0.93, versus 0.88, 0.89, and 0.85 for NLSN, EDSR, and DRN, respectively.
Table 4: IQA metrics (MSE, PSNR, SSIM) on the test set.
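For reference, MSE and PSNR can be computed as follows. The sketch assumes 8-bit images, so the peak value is 255. For SSIM, a library implementation such as scikit-image's `structural_similarity` is the usual choice and is not reimplemented here.

```python
import math
import numpy as np

def mse(a, b) -> float:
    """Mean squared error between two images of equal shape."""
    return float(np.mean((np.asarray(a, np.float64) - np.asarray(b, np.float64)) ** 2))

def psnr(a, b, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(max_val ** 2 / err)

ref = np.zeros((4, 4))
out = np.ones((4, 4))
print(mse(ref, out), psnr(ref, out))  # MSE 1.0 gives PSNR = 20*log10(255), about 48.13 dB
```

Because PSNR is logarithmic in MSE, for a single image pair a 1.2 dB PSNR gain corresponds to dividing the MSE by 10^0.12, about 1.32.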
As Figures 9–11 and Table 4 show, the proposed MCGR is the most effective method at restoring high-frequency information from LR images. We also report, using precision/recall curves, the training performance of four separate networks at a scale factor of 4. The four networks cover five-class, four-class, two-class, and single-class object detection. Training ran for 100 epochs, and the resulting mAP and precision/recall curves are shown in Figure 12.
As the training curves in Figure 12 show, the proposed MCGR converges very quickly for one and two classes (within 25 epochs). For four classes, the network reaches a stable mAP after 40 epochs, with the lowest per-class mAP for trees (0.717) and the highest for vehicles (0.983). For five classes, learning becomes somewhat harder because low vegetation is quite similar to trees, so its mAP at 0.5 IoU reaches only 0.265. The training mAPs of the four models trained on five, four, two and one classes are 0.758, 0.881, 0.841 and 0.983, respectively.
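Deciding which detections count as true positives at mAP@0.5 requires IoU matching. Below is a greedy, per-image, per-class matcher in the style of common VOC/COCO evaluators. It assumes boxes in `(x1, y1, x2, y2)` pixel coordinates and omits details such as crowd regions. It is not the paper's evaluation code. Its output flags, ordered by confidence, are what a precision/recall curve is built from. mAP is then the mean of the per-class APs, which is why one weak class, such as low vegetation, pulls the five-class mAP down.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    detections: Sequence[Tuple[float, Box]],
    ground_truth: Sequence[Box],
    iou_threshold: float = 0.5,
) -> List[int]:
    """Return TP(1)/FP(0) flags in descending-confidence order.

    Each detection claims the best-overlapping ground-truth box that is not
    yet matched, provided IoU >= iou_threshold; otherwise it is a false positive.
    """
    order = sorted(range(len(detections)), key=lambda i: detections[i][0], reverse=True)
    matched = [False] * len(ground_truth)
    flags: List[int] = []
    for i in order:
        box = detections[i][1]
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            flags.append(1)
        else:
            flags.append(0)
    return flags
```

A second detection overlapping an already-matched object counts as a false positive, so duplicate boxes lower precision even when they overlap the object well.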

Benchmark evaluation of object detectors

In this section, we compare object detection results on the RSSOD dataset from state-of-the-art detectors, such as RetinaNet, SSD, Faster R-CNN and EfficientDet-D5, with those of the proposed MCGR. To avoid bias during test inference, we train the object detectors on the HR images of the RSSOD training and validation sets. Table 5 shows the detection results for the five object classes on the RSSOD test set, using 640×640-pixel inputs. On HR test images, YOLOv5 reports the best mAP of 0.76, and the second best, 0.746, is achieved by MCGR. As the scale factor (SF) increases, the performance of general-purpose detectors clearly degrades because the objects become smaller. However, MCGR reaches mAPs of 0.731 and 0.711 at SF 2 and 4, respectively.
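Small objects suffer as the scale factor grows for a simple arithmetic reason. At the roughly 0.05 m ground sampling distance (GSD) reported for RSSOD, each SF multiplies the number of metres covered by one pixel. The sketch computes an object's pixel extent at a given SF and includes a box-filter downsampler to simulate LR inputs. The 4.5 m car length in the example is a hypothetical figure. Box averaging is only a stand-in, because the source does not state how its LR images were generated (bicubic downsampling is a common choice in SR work).

```python
import numpy as np

HR_GSD_M = 0.05  # approximate RSSOD resolution in metres/pixel (from the abstract)


def lr_gsd(scale_factor: int, hr_gsd: float = HR_GSD_M) -> float:
    """Metres per pixel after downsampling by `scale_factor`."""
    return hr_gsd * scale_factor


def object_extent_px(size_m: float, scale_factor: int, hr_gsd: float = HR_GSD_M) -> float:
    """Number of pixels an object of `size_m` metres spans at the given SF."""
    return size_m / lr_gsd(scale_factor, hr_gsd)


def box_downsample(image: np.ndarray, scale_factor: int) -> np.ndarray:
    """Area-average downsampling by an integer factor (stand-in degradation)."""
    h, w = image.shape[:2]
    h, w = h - h % scale_factor, w - w % scale_factor  # crop to a multiple of SF
    img = image[:h, :w].astype(np.float64)
    blocks = img.reshape(h // scale_factor, scale_factor, w // scale_factor, scale_factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))
```

Under these assumptions, a 4.5 m car spans 90 pixels in HR imagery, 45 at SF 2 and only 22.5 at SF 4. At SF 4 that is 16 times fewer pixels by area for the detector to work with.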
The proposed network achieves state-of-the-art performance on both HR and LR images, with a reasonable inference time per image (16.61 ms). Figure 13 shows detection results for some images from the test set.
Detecting small objects with MCGR, which applies image SR first, clearly improves object detection. However, its inference time is four times that of YOLOv5, although it still compares favorably with the other methods, which demonstrates the superior performance of MCGR.
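Per-image inference times such as 16.61 ms (about 1000 / 16.61 ≈ 60 images per second) depend heavily on how they are measured. A minimal timing harness is sketched below. Here `infer` is a hypothetical callable wrapping one forward pass (for MCGR, SR followed by detection). The source does not describe its own timing protocol or hardware.

```python
import statistics
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def mean_latency_ms(infer: Callable[[T], object], inputs: Sequence[T], warmup: int = 5) -> float:
    """Average wall-clock time per call of `infer`, in milliseconds.

    The first `warmup` inputs are run once untimed, because early calls often
    include one-off costs such as memory allocation or kernel compilation.
    """
    if not inputs:
        raise ValueError("need at least one input")
    for x in inputs[:warmup]:
        infer(x)
    timings = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)
```

PyTorch runs GPU kernels asynchronously, so for GPU models `infer` should call `torch.cuda.synchronize()` before returning. Otherwise, the timer mostly measures kernel launches. A comparison such as MCGR versus YOLOv5 is only meaningful when both are timed on the same hardware, input size and batch size.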

Performance of MCGR on independent datasets

We also benchmarked the MCGR network on an independent dataset, DroneDeploy. Figure 14 shows object detection results on 1000×1000-pixel blocks, which verify the performance of the proposed MCGR.
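The source reports DroneDeploy results on 1000×1000-pixel blocks but does not describe how the large orthomosaics were cut. One common approach, sketched here as an assumption, is a sliding-window tiler whose last row and column of tiles are shifted back so that every tile has full size. The `overlap` parameter is hypothetical and does not come from the paper.

```python
from typing import Iterator, List, Tuple

import numpy as np


def tile_image(image: np.ndarray, tile: int = 1000, overlap: int = 0) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (y, x, block) tiles that together cover the whole image.

    The last row/column of tiles is shifted back so each tile is full-sized
    whenever the image is at least `tile` pixels along that axis.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    h, w = image.shape[:2]
    step = tile - overlap

    def starts(length: int) -> List[int]:
        if length <= tile:
            return [0]  # image smaller than a tile: one (smaller) tile
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # final tile flush with the border
        return positions

    for y in starts(h):
        for x in starts(w):
            yield y, x, image[y:y + tile, x:x + tile]
```

Detections from each tile must be shifted by the tile's `(x, y)` offset back into mosaic coordinates. When `overlap > 0`, duplicate boxes along tile borders would then typically be merged with non-maximum suppression.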
We also tested MCGR on another aerial dataset, the Aerial dataset of Floating Objects (AFO) by Gąsienica-Józkowy et al., which contains six categories of tiny objects. The ensembled method proposed by Gąsienica-Józkowy et al. achieves an object detection mAP of 0.8216 at an IoU of 0.50, while our MCGR achieves state-of-the-art performance with an mAP of 0.845, as shown in Figure 15. The key challenge is learning the features of small objects that occupy less than 1% of the total image area.
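The "less than 1% of the image area" criterion is easy to check on YOLO-format labels, the format in which RSSOD is annotated. Each line stores a class id plus a box center, width and height normalized to [0, 1], so a box's share of the image area is simply width × height. The sketch assumes labels in that format, because the source does not state AFO's annotation format. The function names are illustrative.

```python
from typing import List, Tuple


def parse_yolo_label(line: str) -> Tuple[int, float, float, float, float]:
    """Parse 'class x_center y_center width height' (normalized coordinates)."""
    cls, xc, yc, w, h = line.split()
    return int(cls), float(xc), float(yc), float(w), float(h)


def small_object_fraction(lines: List[str], max_area: float = 0.01) -> float:
    """Fraction of labelled boxes covering less than `max_area` of the image.

    With normalized width/height, a box's area relative to the image is w * h.
    Blank lines are ignored.
    """
    boxes = [parse_yolo_label(ln) for ln in lines if ln.strip()]
    if not boxes:
        return 0.0
    small = sum(1 for _, _, _, w, h in boxes if w * h < max_area)
    return small / len(boxes)
```

For a 1000×1000-pixel image, 1% of the area is 10,000 pixels, for example a 100×100-pixel box.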

Conclusion and future directions

Object localization and detection in RS images is an ongoing research topic, so developing state-of-the-art object detectors for environmental remote sensing is of great importance. In this paper, we propose RSSOD, a new benchmark dataset for remote sensing object detectors. It has highly overlapping categories and complex scenes and emphasizes small objects. We also propose an RFA-based MCGR network that achieves state-of-the-art image SR quality and object detection performance. Detection accuracy for vehicles, aircraft and ships is currently satisfactory, whereas the complex characteristics of trees and low vegetation require further exploration.

Extensive experiments show that applying an image SR network before object detection helps improve detection mAP. The proposed MCGR outperforms the state-of-the-art YOLOv5 in mAP by more than 5% and 13% at scale factors of 2 and 4, respectively. In addition, the object detection results on independent datasets demonstrate the flexibility of the proposed MCGR network in detecting objects on other datasets.

Reference article

Wang Y, Bashir S M A, Khan M, et al. Remote sensing image super-resolution and object detection: Benchmark and state of the art[J]. Expert Systems with Applications, 2022: 116793.

Original site

Copyright notice
This article was written by [leon. shadow]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/187/202207060318353895.html