Winners of DrivenData's Overhead Geopose Challenge

Overview



Banner Image

Images shown are from the public Urban Semantic 3D Dataset, provided courtesy of DigitalGlobe

Goal of the Competition

Overhead satellite imagery provides critical, time-sensitive information for use in arenas such as disaster response, navigation, and security. Most current methods for using aerial imagery assume images are taken from directly overhead, known as near-nadir. However, the first images available are often taken from an angle — they are oblique. Effects from these camera orientations complicate useful tasks such as change detection, vision-aided navigation, and map alignment.

In this challenge, participants made satellite imagery taken from a significant angle more useful for time-sensitive applications such as disaster and emergency response.

What's in This Repository

This repository contains code from winning competitors in the Overhead Geopose Challenge.

Winning code for other DrivenData competitions is available in the competition-winners repository.

Winning Submissions

Prediction Contest

All of the models below build on the solution provided in the benchmark blog post: Overhead Geopose Challenge - Benchmark. Additional solution details can be found in the reports folder inside the directory for each submission.

The weights for each winning model can be downloaded from the National Geospatial-Intelligence Agency's (NGA's) DataPort page.

1st place: selim_sef (public score 0.902184, private score 0.902459)
An EfficientNet V2 L encoder is used instead of the ResNet34 encoder because it has much larger capacity and is less prone to overfitting. The decoder is a UNet with more filters and additional convolution blocks for better handling of fine-grained details. An MSE loss would be imbalanced across cities, depending on their building heights, so the model is instead trained with an R2 loss for the AGL/MAG outputs, which reflects the final competition metric and is more robust to noisy training data (a sketch of such a loss appears after this list).

2nd place: bloodaxe (public score 0.889955, private score 0.891393)
I’ve trained a bunch of UNet-like models and averaged their predictions. Sounds simple, yet I used quite heavy encoders (B6 & B7) and custom-made decoders to produce very accurate height map predictions at the original resolution. Another crucial part of the solution was extensive custom data augmentation for height, orientation, scale, GSD, and image RGB values (a sketch of a scale-consistent augmentation also appears after this list).

3rd place: o__@ (public score 0.882882, private score 0.882801)
I ensembled VFlow-UNet models using a large input resolution and a large backbone without downsampling. Better results were obtained when the model was trained on all images from the training set. The test set contains images of the same locations as the training set; this overlap was identified by image matching and used to improve the predictions.

4th place: kbrodt (public score 0.872775, private score 0.873057)
The model uses a UNet architecture with various encoders (efficientnet-b{6,7} and senet154) and has a single above-ground level (AGL) head plus two heads in the bottleneck for scale and angle. The inputs are a random 512x512 crop of an aerial image, a one-hot encoding of the city, and the ground sample distance (GSD). The model is trained with a mean squared error (MSE) loss on all targets (AGL, scale, angle) using the AdamW optimizer with a 1e-4 learning rate.
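
The first-place summary replaces the benchmark's MSE with an R2 loss so that cities with very different building heights contribute comparably. The snippet below is a minimal sketch of one way such a loss could look in PyTorch; the function name, the (N, 1, H, W) tensor shapes, and the per-image normalization are assumptions, not the winner's exact implementation.

```python
import torch


def r2_loss(pred, target, eps=1e-6):
    """Sketch of an R2-style loss: minimizing SS_res / SS_tot maximizes R^2.

    Normalizing by the per-image target variance keeps cities with tall
    buildings from dominating the loss the way a plain MSE would.
    `pred` and `target` are assumed to be (N, 1, H, W) AGL tensors in meters.
    """
    dims = (1, 2, 3)
    ss_res = ((pred - target) ** 2).sum(dim=dims)
    ss_tot = ((target - target.mean(dim=dims, keepdim=True)) ** 2).sum(dim=dims)
    return (ss_res / (ss_tot + eps)).mean()
```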
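
The second-place summary highlights augmentations that jointly perturb the image and the geometric targets. As an illustration only, here is a minimal sketch of a scale-consistent rescaling augmentation, assuming the targets are an AGL map in meters, a GSD in meters per pixel, and a flow scale in pixels per meter; the helper name and argument layout are hypothetical, not the winner's code.

```python
import cv2
import numpy as np


def random_rescale(image, agl, gsd, scale_px_per_m, factor):
    """Resample the image and AGL map by `factor` while keeping the targets consistent.

    AGL values are meters of height, so they are only resampled, not rescaled in value;
    GSD (meters per pixel) shrinks as the image grows, and the flow scale
    (pixels per meter of height) grows with it.
    """
    h, w = image.shape[:2]
    new_size = (int(round(w * factor)), int(round(h * factor)))  # cv2 expects (width, height)
    image = cv2.resize(image, new_size, interpolation=cv2.INTER_LINEAR)
    agl = cv2.resize(agl, new_size, interpolation=cv2.INTER_NEAREST)
    return image, agl, gsd / factor, scale_px_per_m * factor
```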

Model Write-up Bonus

Prediction rank 2: bloodaxe (public score 0.889955, private score 0.891393)
See the "Prediction Contest" section above.

Prediction rank 5: chuchu (public score 0.856847, private score 0.855636)
We conducted an empirical upper-bound analysis, which suggested that the main errors come from height prediction and the rest from angle prediction. To overcome these bottlenecks we proposed HR-VFLOW, which takes HRNet as the backbone and adopts simple multi-scale fusion as multi-task decoders to predict height, magnitude, angle, and scale simultaneously. To handle the height variance, we first pretrained the model on all four cities and then transferred the pretrained model to each specific city for better city-wise performance.

Prediction rank 7: vecxoz (public score 0.852948, private score 0.851828)
First, I implemented training with automatic mixed precision in order to speed up training and facilitate experiments with large architectures. Second, I implemented 7 popular decoder architectures and conducted extensive preliminary research into different combinations of encoders and decoders. For the most promising combinations I ran long trainings of at least 200 epochs to study the best possible scores and training dynamics. Third, I implemented an ensemble using a weighted average for the height and scale targets and a circular average for the angle target (a sketch of circular averaging appears after this list).
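
The rank-7 summary ensembles angle predictions with a circular average, since angles wrap around and cannot be averaged arithmetically. Below is a minimal NumPy sketch of a weighted circular mean; the function name and weighting scheme are assumptions for illustration, not the competitor's exact code.

```python
import numpy as np


def circular_mean(angles, weights=None):
    """Weighted circular average of angles given in radians.

    Averaging the angles as unit vectors ensures that, e.g., 359 degrees and
    1 degree average to 0 degrees rather than 180 degrees.
    """
    angles = np.asarray(angles, dtype=float)
    weights = np.ones_like(angles) if weights is None else np.asarray(weights, dtype=float)
    sin_sum = np.sum(weights * np.sin(angles))
    cos_sum = np.sum(weights * np.cos(angles))
    return np.arctan2(sin_sum, cos_sum) % (2 * np.pi)
```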

Approved for public release, 21-943
