Spatiotemporal resampling methods for mlr3

Overview

mlr3spatiotempcv

This package extends the mlr3 package framework with spatiotemporal resampling and visualization methods.

If you prefer the tidymodels ecosystem, have a look at the {spatialsample} package for spatial sampling methods.

Installation

CRAN version

install.packages("mlr3spatiotempcv")

Development version

remotes::install_github("mlr-org/mlr3spatiotempcv")

# R Universe Repo
install.packages("mlr3spatiotempcv", repos = c(mlrorg = "https://mlr-org.r-universe.dev", CRAN = "https://cloud.r-project.org"))

Get Started

See the "Get Started" vignette for a quick introduction.

For more detailed information, including a usage example, see the "Spatiotemporal Analysis" chapter in the mlr3book.

The article "Spatiotemporal Visualization" shows how 3D subplot grids can be created.

Citation

To cite the package in publications, use the output of citation("mlr3spatiotempcv").

Other spatiotemporal resampling packages

This list does not claim to be comprehensive.

| Name | Language | Resources |
| --- | --- | --- |
| blockCV | R | Paper, CRAN |
| CAST | R | Paper, CRAN |
| ENMeval | R | Paper, CRAN |
| spatialsample | R | CRAN |
| sperrorest | R | Paper, CRAN |
| Pyspatialml | Python | GitHub |
| spacv | Python | GitHub |

FAQ

Which resampling method should I use?
There is no single best resampling method. The choice depends on the characteristics of your dataset and on what the model is meant to predict. The resampling scheme should reflect the final purpose of the model; this concept is called "target-oriented" resampling. For example, if the model was trained on multiple forest plots and its purpose is to predict on unknown forest stands, the resampling structure should reflect this, for instance by holding out entire plots rather than individual observations.
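
To make the forest plot example concrete, here is a minimal base R sketch of a plot-wise split. The data, the column name `plot_id`, and the number of folds are made up for illustration; this is not how {mlr3spatiotempcv} builds its partitions internally.

```r
set.seed(42)

# Hypothetical data: 12 forest plots with 10 observations each
obs <- data.frame(
  plot_id = rep(sprintf("plot_%02d", 1:12), each = 10),
  stringsAsFactors = FALSE
)

n_folds <- 4L
plots <- unique(obs$plot_id)

# Assign every plot (not every observation) to exactly one fold
plot_fold <- setNames(sample(rep_len(seq_len(n_folds), length(plots))), plots)
obs$fold <- unname(plot_fold[obs$plot_id])

# Row indices of the train and test set for fold 1
test_idx <- which(obs$fold == 1L)
train_idx <- which(obs$fold != 1L)

# Plots that appear on both sides of the split (should be none)
shared_plots <- intersect(obs$plot_id[train_idx], obs$plot_id[test_idx])
```

Because folds are assigned per plot, a model evaluated on fold 1 never sees observations from the plots it is tested on, which mirrors the "predict on unknown stands" purpose.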
Are there more resampling methods than the ones {mlr3spatiotempcv} offers?
{mlr3spatiotempcv} aims to offer all resampling methods that exist in R. This does not mean that it covers every resampling method there is. If you are missing one, feel free to open an issue.
How can I use the "blocking" concept of the old {mlr}?
This concept is now supported via the "column roles" concept available in {mlr3} [Task](https://mlr3.mlr-org.com/reference/Task.html) objects. See [this documentation](https://mlr3.mlr-org.com/reference/Resampling.html#grouping-blocking) for more information.
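
As a rough sketch of the column-role approach, the snippet below marks a column as the grouping variable of a regression task. The data and the column name `plot_id` are invented for illustration, and the calls assume a reasonably recent {mlr3} version.

```r
library(mlr3)

# Hypothetical data: 6 plots with 5 observations each
set.seed(1)
dat <- data.frame(
  y = rnorm(30),
  x1 = rnorm(30),
  plot_id = rep(1:6, each = 5)
)

task <- as_task_regr(dat, target = "y", id = "plots")

# Mark `plot_id` as the grouping column; this also removes it from the features
task$set_col_roles("plot_id", roles = "group")

cv <- rsmp("cv", folds = 3)
cv$instantiate(task)

# Plots used for training and testing in the first fold
train_plots <- unique(dat$plot_id[cv$train_set(1)])
test_plots <- unique(dat$plot_id[cv$test_set(1)])
```

With the "group" role set, observations sharing a `plot_id` are kept together, so `train_plots` and `test_plots` should not overlap.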
For the methods that offer buffering, how can an appropriate value be chosen?
There is no easy answer to this question. Buffering reduces the similarity between the train and test sets. The degree of this reduction depends on the dataset itself, and there is no general approach for choosing an appropriate buffer size. Some studies used the distance at which the autocorrelation levels off. This buffer distance often removes quite a lot of observations and needs to be calculated first.
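
To get a feeling for how many observations a given buffer removes, a small base R sketch can help. The coordinates are random, assumed to be projected (metres), and the buffer of 100 m is an arbitrary value chosen for illustration, not a recommendation.

```r
set.seed(1)

# Hypothetical projected coordinates in metres
coords <- cbind(x = runif(200, 0, 1000), y = runif(200, 0, 1000))
buffer <- 100 # assumed buffer distance in metres

# Pairwise Euclidean distances between all observations
d <- as.matrix(dist(coords))

test_id <- 1L
# Training set: every observation farther than `buffer` from the test point
train_ids <- which(d[test_id, ] > buffer)

# Observations discarded by the buffer (the test point itself is not counted)
n_dropped <- nrow(coords) - length(train_ids) - 1L
```

Repeating this for several candidate buffer sizes shows how quickly the training set shrinks as the buffer grows.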
Comments
  • Support resampling method based on predefined spatiotemporal groups

    Just as CAST::CreateSpacetimeFolds() does.

    I am not sure if this approach can work with all currently implemented spatial sampling methods. Even if not, we should support exactly this way of creating resamplings since some people already asked me exactly for this. @HannaMeyer Is there a dedicated name for your method? If not, do you want to make a proposal? :) You can have a look at the current names of the other methods in the README.

    It seems that @jannes-m has added temporal extension support for spcv-coords already. Let's have a look how this works in detail.

    opened by pat-s 15
  • Support presence-background option in "Spatial Buffer CV"

    If the target has a binary outcome, a presence-background approach (see blockCV::buffering) would be possible. Target needs to be transformed to 0/1 before sampling.

    opened by be-marc 11
  • CRS-related warning in autoplot seems incoherent

    autoplot throws the following warning when some of the resampling methods are applied to the ecuador task (see application of methods "spcv_disc" and "spcv_block" in the manuscript):

    CRS not set, transforming to WGS84 (EPSG: 4326).

    This doesn't seem to make sense, as a transformation to WGS84 is not possible when the source CRS is unknown. As far as I can see, the ecuador dataset and task contain only UTM coordinates, not even lat/lon; therefore it is also not possible to just assume that WGS84 is present or to guess which UTM zone is applicable. Just a minor issue, but potentially confusing...

    Priority: Low 
    opened by alexanderbrenning 9
  • Instantiate spcv_coords for AutoTuner

    Dear mlr3 team,

    first of all, thanks for your efforts in developing this extension package, it is very much appreciated.

    I am trying to apply spatial CV using "spcv_coords" to an AutoTuner in order to retrieve nested resampling, following the process described in the mlr3 book:

            RT.at_sp <- AutoTuner$new(
              learner = reg.tree,
              resampling = spatial_CV, 
              measure = opt.mse,
              search_space = param_set_RT,
              terminator = trm.evals,
              tuner = tnr.GridSearch)
    

    However, I end up with the error message:

            "Error: Resampling 'spcv_coords' may not be instantiated". 
    

    The same error message remains, even if I try to instantiate the task manually beforehand using the command

            spatial_CV$instantiate(sp_task)
    

    as described in 2.5.2.

    As I am not an expert: am I doing something wrong, or is spatial CV not yet implemented for use with AutoTuner?

    Thank you very much! BR, Jürgen

    Type: Question 
    opened by jue-d 9
  • Planar versus great circle distance

    When using spatial object with unprojected CRS (i.e. lat/lon), does mlr3spatiotempcv use great circle distance (on the ellipsoid) or Euclidean distances based on lat/lon values? Is this handled consistently across resampling tools, e.g. buffering and clustering? This should be clarified in the paper, but perhaps it should be turned into a feature request...
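
To illustrate why the distinction matters, here is a base R sketch comparing a great-circle distance with a naive Euclidean distance on degrees. It uses the haversine formula on a sphere with an assumed mean Earth radius of 6371 km, which is a simplification of an ellipsoidal calculation, and it says nothing about what mlr3spatiotempcv does internally.

```r
# Great-circle distance on a sphere (haversine formula), in kilometres
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# One degree of longitude at the equator and at 60 degrees north
d_equator <- haversine_km(0, 0, 1, 0)
d_north <- haversine_km(0, 60, 1, 60)

# Naive Euclidean distance on the raw degree values is 1 in both cases
d_naive <- sqrt((1 - 0)^2 + (0 - 0)^2)
```

One degree of longitude is roughly 111 km at the equator but only about half of that at 60° N, while the degree-based Euclidean distance cannot tell the two apart. Buffers or clusters computed on raw lat/lon values would therefore be distorted at higher latitudes.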

    Priority: Low Type: Question 
    opened by alexanderbrenning 8
  • Visualization

    #6

    plot.ResamplingSpCVBlock, plot.ResamplingSpCVEnv, plot.ResamplingSpCVKmeans are the same. We could create a super class to just have one plotting function. plot.ResamplingSpCVBuffer is different because it is a leave-one-out cross-validation, which cannot be visualized in the same way.

    • [x] Single fold plot

    • [x] Single train-test plot

    • [x] Multi train-test plot

    • [x] Unify redundant code

    • [x] Update documentation

    • [x] add tests

    Examples are in inst/mlr3spatiotemporal_test.R at the end of the file.

    opened by be-marc 8
  • Checkerboard pattern with spcv_block?

    Dear mlr3spatiotempcv team,

    First, many thanks for your hard work on this excellent resource.

    I am having issues producing a checkerboard sampling pattern using spcv_block. Instead of getting a checkerboard spatial partitioning, I always get something that looks more like a random sampling pattern. I have been successful creating a checkerboard pattern using the blockCV functions directly.

    Here is a reproducible example that fails to produce a checkerboard sampling pattern:

    library(blockCV)
    library(mlr3)
    library(mlr3spatiotempcv)
    
    x <- runif(5000, -80.5, -75)
    y <- runif(5000, 39.7, 42)
    
    data <- data.frame(spp="test", 
                       label=factor(round(runif(length(x), 0, 1))),
                       x=x,
                       y=y)
    
    testTask <- TaskClassifST$new(id = "test", 
                                  backend = data, 
                                  target = "label",
                                  positive="1",
                                  extra_args = list(coordinate_names=c("x", "y"),
                                                    crs="EPSG: 4326"))
    
    blockSamp <- rsmp("spcv_block",
                      folds=2,
                      range=50000,
                      selection="checkerboard")
    blockSamp$instantiate(testTask)
    autoplot(blockSamp, testTask)
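
For reference, the partitioning expected from a checkerboard selection can be sketched in a few lines of base R. The helper `checkerboard_fold()` and the block size are hypothetical and only illustrate the idea; this is not the blockCV or mlr3spatiotempcv implementation.

```r
# Assign points to two folds in a checkerboard pattern.
# `block_size` is in the same units as the coordinates.
checkerboard_fold <- function(x, y, block_size) {
  col_idx <- floor(x / block_size)
  row_idx <- floor(y / block_size)
  # Neighbouring blocks differ by one in col_idx or row_idx,
  # so the parity of the sum alternates like a checkerboard
  (col_idx + row_idx) %% 2 + 1
}

set.seed(1)
x <- runif(1000, 0, 400)
y <- runif(1000, 0, 400)
fold <- checkerboard_fold(x, y, block_size = 100)
```

Plotting `x` against `y` coloured by `fold` should show alternating square blocks rather than a random scatter.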
    

    Priority: High Status: In Progress Type: Bug 
    opened by fitzLab-AL 7
  • Code Review

    • Be reasonable with dependencies. E.g., we do not need stringr for str_detect, just use grepl() instead.

    • Some examples are in DONTRUN. Can we put them in if (requireNamespace(...)) blocks instead?

    • ResamplingSpCVBuffer looks similar to LOO. If this is the case, the instance should be stored more efficiently.

    • autoplot tests are not working for me or are way too slow for unit tests (I'm stuck in there with 100% CPU)

    • blockCV::spatialBlock() seems to call print(). It is convention (maybe even a CRAN policy?) to use message(), which can be suppressed. This should be reported upstream.

    • blockCV is terribly slow (and has a very long dependency chain). Is there an alternative?

    • Suggested packages should be explicitly attached with require_namespaces.

    opened by mllg 7
  • `.$folds()` of all Repeated* classes returns wrong fold number

    library(mlr3spatiotempcv)
    library(mlr3)
    rsp <- rsmp("repeated_spcv_coords", folds = 3, repeats = 5)
    rsp$instantiate(tsk("ecuador"))
    
    # should return 3
    rsp$folds(6)
    #> [1] 1
    

    Created on 2021-04-17 by the reprex package (v2.0.0)

    This is because in https://github.com/mlr-org/mlr3spatiotempcv/blob/b9ded4ac098655dc00c300b48426bd6d4cd0a97a/R/ResamplingRepeatedSpCVCoords.R#L54 %% is used whereas it should be %/%.

    But there is more to it - I think the method should look like

        folds = function(iters) {
          iters = assert_integerish(iters, any.missing = FALSE, coerce = TRUE)
          n_folds = ((self$iters - 1L) %/% as.integer(self$param_set$values$repeats)) + 1L
    
          if (all(iters <= n_folds)) {
            return(iters)
          } else {
            # modify all entries which are > n_folds
            iters[which(iters > n_folds)] = iters[which(iters > n_folds)] - n_folds
            return(iters)
          }
        }
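
The relationship between a global iteration number, its fold and its repetition can be sketched with plain modular arithmetic. The helpers `iter_to_fold()` and `iter_to_rep()` are hypothetical and assume that iterations are numbered consecutively, repetition by repetition; they are not the code that was eventually merged.

```r
# Fold within a repetition: remainder of the zero-based iteration index
iter_to_fold <- function(iters, n_folds) {
  ((iters - 1L) %% n_folds) + 1L
}

# Repetition: integer quotient of the zero-based iteration index
iter_to_rep <- function(iters, n_folds) {
  ((iters - 1L) %/% n_folds) + 1L
}

# 3 folds x 5 repeats = 15 iterations
iters <- 1:15
mapping <- data.frame(
  iter = iters,
  fold = iter_to_fold(iters, 3L),
  rep = iter_to_rep(iters, 3L)
)
```

Under these assumptions, iteration 6 maps to fold 3 of repetition 2, which matches the value the reprex expects, and the mapping stays valid for iterations beyond the second repetition.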
    
    Priority: High Status: Accepted Type: Bug 
    opened by pat-s 6
  • Expand Table listing all resampling methods

    • [x] add some use cases for each method
    • [x] list more implementations for each method (eventually also in other languages?)

    Also consider to move methods operating in the feature space into a distinct group, e.g.:

    • spatial
    • spatiotemporal
    • feature space
    opened by pat-s 6
  • New `Task*ST` API, consolidate `autoplot()`

    • arguments crs, coordinate_names and coords_as_features are now passed directly in the constructor instead of list extra_args
    • added argument label
    • improved as_task_* converters
    • Column role coordinates was renamed to coordinate to cope with the singular naming of column roles
    • Task printer only returns the first 10 coordinate rows
    • Consolidated autoplot() code internally
    • Improved CLUTO test setup

    fixes #116

    opened by pat-s 5
  • Please remove dependencies on **rgdal**, **rgeos**, and/or **maptools**

    This package depends on (depends, imports or suggests) raster and one or more of the retiring packages rgdal, rgeos or maptools (https://r-spatial.org/r/2022/04/12/evolution.html, https://r-spatial.org/r/2022/12/14/evolution2.html). Since raster 3.6.3, all use of external FOSS library functionality has been transferred to terra, making the retiring packages very likely redundant. It would help greatly if you could remove dependencies on the retiring packages as soon as possible.

    opened by rsbivand 0
  • `as_task_*_st` and friends could allow setting column roles directly

    We could support this via the ellipsis. Otherwise, setting the respective column roles could easily be forgotten. On the other hand, the behaviour would then differ from as_task_*() in mlr3, as such custom conversions are not supported there.

    @be-marc what do you think?

    Example:

    data("cookfarm_sample", package = "mlr3spatiotempcv")
    
    # data.frame
    as_task_regr_st(cookfarm_sample, target = "PHIHOX",
      coords_as_features = FALSE,
      crs = 26911,
      coordinate_names = c("x", "y"),
      column_role_space = "foo",
      column_role_time = "time"
    )
    
    Priority: Low Status: Review Needed Type: Optimization 
    opened by pat-s 2
  • Longterm play of Task*ST and DataBackends

    With the addition of spatial DataBackends (DataBackendVector and DataBackendRaster) from {mlr3spatial} multiple combinations of Tasks and Backends are possible:

    • Task + spatial backends
    • Task*ST + non-spatial backend
    • Task*ST + spatial backend

    Moving forward and to simplify both usage and development, we should pick one combination as the "recommended" one and potentially issue warnings for others.

    cc @be-marc

    Priority: Medium Status: In Progress Type: Optimization 
    opened by pat-s 1
  • New SpCV method Zalazar et al.

    Unfortunately the GH repo leads to a 404. Contacted the author, he wants to fix it.

    https://www.sciencedirect.com/science/article/abs/pii/S0920410521015023

    opened by pat-s 0
  • Temporal CV

    I currently have a task with a column that is a date. As the task is basically to predict values in the future, a cross-validation strategy that takes this into account is required, similar to RollingWindowCV. As this is a very common use case, we should perhaps think about implementing this.

    • This is implemented in mlr3forecasting, but for forecasting tasks instead of regular Classif|Regr Tasks.
    • Where should such a method live? mlr3spatiotempcv ?
    • How would we go about implementing this?
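
As a starting point for the discussion, a rolling-window split can be sketched in base R. The helper `rolling_window_splits()` and its arguments are hypothetical; the sketch only shows the idea of always training on observations that precede the test window and is not an existing mlr3 resampling class.

```r
# Rolling-window splits: each test window directly follows its training window
rolling_window_splits <- function(dates, n_train, n_test, step = n_test) {
  ord <- order(dates) # row indices sorted by time
  n <- length(ord)
  stopifnot(n >= n_train + n_test)
  starts <- seq(1L, n - n_train - n_test + 1L, by = step)
  lapply(starts, function(s) {
    list(
      train = ord[s:(s + n_train - 1L)],
      test = ord[(s + n_train):(s + n_train + n_test - 1L)]
    )
  })
}

# 20 consecutive days, train on 10 days, test on the following 5
dates <- as.Date("2021-01-01") + 0:19
splits <- rolling_window_splits(dates, n_train = 10L, n_test = 5L)
```

Each element of `splits` holds row indices, so the latest training date is always earlier than the earliest test date of the same split.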
    Priority: Medium Status: Accepted Type: Enhancement 
    opened by pfistfl 13
Releases(v2.0.3)
  • v2.0.3(Nov 19, 2022)

  • v2.0.2(Aug 9, 2022)

    • Add error message when trying to create a TaskClassifST or TaskRegrST from an sf object
    • Synchronize TaskClassifST or TaskRegrST with {mlr3spatial}
    • Add support for mlr_reflections changes in {mlr3} > 0.13.4
    • Adjust "Getting Started" vignette to recent API changes
    • autoplot.ResamplingSptCVCstf(): Add missing support for argument axis_label_fontsize for x and y axes
  • v2.0.1(Jun 23, 2022)

    Bugfixes

    • autoplot.ResamplingSptCVCstf: when multiple folds are requested, the subplots are now returned again (before, the return was empty)
    • autoplot.ResamplingSptCVCstf: the legend item for the "omitted" observations now displays the correct color and label again
  • v2.0.0(Jun 15, 2022)

    Breaking

    • Rename task cookfarm to cookfarm_mlr3. This was done to distinguish the cookfarm task implementation in {mlr3} better from the original cookfarm dataset. cookfarm_mlr3 also now comes with all rows of the upstream cookfarm task and not with a random subset as before.
    • Rewrite mlr_resampling_sptcv_cstf implementation. The method will produce different fold results compared to {mlr3spatiotempcv} <= 1.0.1. This is because of a change/fix in the sampling behavior: before, an (unwanted) stratified sampling was done on the time and space variables. While this matched the upstream implementation in {CAST}, it did not match the theoretical underpinning described in the literature.

    Features

    • Add support for DataBackendRaster (@be-marc, #191).
    • mlr_resampling_sptcv_cstf: a log message returns the column roles from the Task which are used for partitioning
    • The help pages for all methods now describe the methods manually rather than importing the upstream documentation of the respective method.
    • Task*ST classes now print column roles space and time (if set) (#198)
    • autoplot() gains plot_time_var argument for 3D visualizations of mlr_resamplings_sptcv_cstf resamplings with only 'space' used for partitioning (#197)
    • Vignette updates

    Bugfixes

    • All {mlr3spatiotempcv} methods now comply with the {mlr3} man file declaration logic.

    Misc

    • Escape all examples and tests for non-installed packages.
    • The cookfarm_mlr3 task now sets column roles "space" and "time" for variables SOURCEID and Date, respectively.
    • Harden CLUTO tests (#182)
    • Large update for the "spatiotemporal" section in the mlr3book
  • v1.0.1(Mar 3, 2022)

    • Fixed an issue which caused coordinates to appear in the feature set when a data.frame was supplied (#166, @be-marc)
    • Add autoplot() support for "groups" column role in rsmp("cv")
  • v1.0.0(Aug 19, 2021)

    Breaking

    • autoplot(): removed argument crs. The CRS is now inferred from the supplied Task. Setting a different CRS than the task might lead to spurious issues and the initial idea of changing the CRS for plotting to have proper axes labeling does not apply (anymore) (#144)

    Features

    • Added autoplot() support for ResamplingCustomCV (#140)

    Bug fixes

    • "spcv_block": Assert error if folds > 2 when selection = "checkerboard" (#150)
    • Fixed row duplication when creating TaskRegrST tasks from sf objects (#152)

    Miscellaneous

    • Upgrade tests to {vdiffr} 1.0.0
    • Add {rgdal} to Suggests and require it in "spcv_block", since it is required by {blockCV} >= 2.1.4 and {sf} >= 1.0
  • v0.4.1(Jun 24, 2021)

    • Upgrade tests to {vdiffr} 1.0.0
    • Add {rgdal} to Suggests and require it in "spcv_block", since it is required by {blockCV} >= 2.1.4 and {sf} >= 1.0
  • v0.4.0(Jun 3, 2021)

    Features

    • Support clustering coords only for "sptcv_cluto"
    • Add as_task_* S3 generics: as_task_classif_st.data.frame(), as_task_classif_st.DataBackend(), as_task_classif_st.sf(), as_task_regr_st.data.frame(), as_task_regr_st.DataBackend(), as_task_regr_st.sf(), as_task_classif.TaskClassifST(), as_task_regr.TaskRegrST() (#99)
    • Add "spcv_tiles" and "repeated_spcv_tiles" (#121)
    • Add "spcv_disc" (#115)

    Bug Fixes

    • Fixed train set issues for sptcv_cstf() with space and time var (#135)
    • Fixed $folds() active binding returning wrong fold number (#120)
    • Add missing man IDs (#122)

    Misc

    • Add example 2D spatial plots to spatiotemp-viz vignette
    • Add {caret} to Suggests
    • "Cstf" methods: remove arguments in favor of param set to align with other methods (#122)
    • Inherit documentation from upstream functions (#117)
    • Vignette: Update and categorize table listing all implemented methods
  • v0.3.0(Apr 13, 2021)

    New Features

    • autoplot.ResamplingSptCVCstf(): add 2D plotting method (#106)
    • autoplot.ResamplingSptCVCstf(): add arguments show_omitted and static_image (#100)
    • autoplot() (all methods): allow adjusting point size via ... (#98)

    Maintenance

    • Remove {GSIF} package due to CRAN archival and host the cookfarm dataset standalone
    • Use Cstf method for spatiotemporal viz vignette
    • Fix help page content of ResamplingRepeatedSptCVCstf (beforehand the Cluto method was referenced accidentally)
    • Fix segfault in autoplot.ResamplingSpcvBlock example when rendering pkgdown site (unclear why this happens when show_labels = TRUE)
    • Update autoplot() examples and related documentation
    • Remove duplicate resources in Tasks "see also" fields
    • Skip a test on Solaris and macOS 3.6
    • Optimize "Spatiotemporal Visualization" vignette
  • v0.2.1(Mar 20, 2021)

    • Add support for rasterLayer argument in blockCV::spatialBlock() (#94)
    • Ensure that blockCV::spatialBlock() actually returns the same result when invoked via {mlr3spatiotempcv} (#93). Among other issues, blockCV::spatialBlock(selection = "checkerboard") was ignored.
    • Get coordinates names from {sf} objects dynamically. Before some functions would have errored if the coordinate names were not named "x" and "y".
  • v0.2.0(Mar 8, 2021)

  • v0.1.1(Jan 6, 2021)

  • v0.1.0(Dec 26, 2020)

  • v0.0.0.9006(Oct 27, 2020)
