The easiest tool for extracting radiomics features and training ML models on them.

Overview


ClassyRadiomics

License CI Build codecov

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd classrad
pip install -e .

Example - Hydronephrosis detection from CT images:

Extract radiomics features and save them to CSV table

df = pd.read_csv(table_dir / "paths.csv")
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target=label_col="Hydronephrosis",
    task_name="Hydronephrosis detection"
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()
Comments
  • Preprocessing features fails during machine learning

    Preprocessing features fails during machine learning

    Describe the bug

    Trying to use Machine Learning in the self-hosted webapp, as well as in example_WORC.ipynb fails.

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
    feature_df = extractor.run()
    
    feature_df.head()
    
    label_df = pd.read_csv(data_dir / "labels.csv")
    label_df.sample(5)
    
    from autorad.data.dataset import FeatureDataset
    
    merged_feature_df = feature_df.merge(label_df, left_on="ID",
        right_on="patient_ID", how="left")
    feature_dataset = FeatureDataset(
        merged_feature_df,
        target="diagnosis",
        ID_colname="ID"
    )
    
    splits_path = result_dir / "splits.json"
    feature_dataset.split(method="train_val_test", save_path=splits_path)
    
    from autorad.models.classifier import MLClassifier
    from autorad.training.trainer import Trainer
    
    models = MLClassifier.initialize_default_sklearn_models()
    print(models)
    
    trainer = Trainer(
        dataset=feature_dataset,
        models=models,
        result_dir=result_dir,
        experiment_name="Fibromatosis_vs_sarcoma_classification",
    )
    trainer.run_auto_preprocessing(
            selection_methods=["boruta"],
            oversampling=False,
            )
    
    

    Expected Results

    Initialising the trainer and running preprocessing on the features

    Actual Results

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [15], in <cell line: 7>()
          1 trainer = Trainer(
          2     dataset=feature_dataset,
          3     models=models,
          4     result_dir=result_dir,
          5     experiment_name="Fibromatosis_vs_sarcoma_classification",
          6 )
    ----> 7 trainer.run_auto_preprocessing(
          8         selection_methods=["boruta"],
          9         oversampling=False,
         10         )
    
    File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
         70 preprocessor = Preprocessor(
         71     normalize=True,
         72     feature_selection_method=selection_method,
         73     oversampling_method=oversampling_method,
         74 )
         75 try:
         76     preprocessed[selection_method][
         77         oversampling_method
    ---> 78     ] = preprocessor.fit_transform(self.dataset.data)
         79 except AssertionError:
         80     log.error(
         81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
         82     )
    
    File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
         64 result_y = {}
         65 all_features = X.train.columns.tolist()
    ---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
         67     X.train, y.train
         68 )
         69 self.selected_features = self.pipeline["select"].selected_features(
         70     column_names=all_features
         71 )
         72 result_X["train"] = pd.DataFrame(
         73     X_train_trans, columns=self.selected_features
         74 )
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
        432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
        433 if hasattr(last_step, "fit_transform"):
    --> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
        435 else:
        436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
         44 def fit_transform(
         45     self, X: np.ndarray, y: np.ndarray
         46 ) -> tuple[np.ndarray, np.ndarray]:
    ---> 47     self.fit(X, y)
         48     return X[:, self.selected_columns], y
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
        122 with warnings.catch_warnings():
        123     warnings.simplefilter("ignore")
    --> 124     model.fit(X, y)
        125 self.selected_columns = np.where(model.support_)[0].tolist()
        126 if not self.selected_columns:
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
        188 def fit(self, X, y):
        189     """
        190     Fits the Boruta feature selection with the provided estimator.
        191 
       (...)
        198         The target values.
        199     """
    --> 201     return self._fit(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
        249 def _fit(self, X, y):
        250     # check input params
    --> 251     self._check_params(X, y)
        252     self.random_state = check_random_state(self.random_state)
        253     # setup variables for Boruta
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
        513 """
        514 Check hyperparameters as well as X and y before proceeding with fit.
        515 """
        516 # check X and y are consistent len, X is Array and y is column
    --> 517 X, y = check_X_y(X, y)
        518 if self.perc <= 0 or self.perc > 100:
        519     raise ValueError('The percentile should be between 0 and 100.')
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
        961 if y is None:
        962     raise ValueError("y cannot be None")
    --> 964 X = check_array(
        965     X,
        966     accept_sparse=accept_sparse,
        967     accept_large_sparse=accept_large_sparse,
        968     dtype=dtype,
        969     order=order,
        970     copy=copy,
        971     force_all_finite=force_all_finite,
        972     ensure_2d=ensure_2d,
        973     allow_nd=allow_nd,
        974     ensure_min_samples=ensure_min_samples,
        975     ensure_min_features=ensure_min_features,
        976     estimator=estimator,
        977 )
        979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
        981 check_consistent_length(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        744         array = array.astype(dtype, casting="unsafe", copy=False)
        745     else:
    --> 746         array = np.asarray(array, order=order, dtype=dtype)
        747 except ComplexWarning as complex_warning:
        748     raise ValueError(
        749         "Complex data not supported\n{}\n".format(array)
        750     ) from complex_warning
    
    ValueError: could not broadcast input array from shape (60,1015) into shape (60,)
    
    opened by wagon-master 3
  • BUG: Time and memory inefficient concating in pandas on every case.

    BUG: Time and memory inefficient concating in pandas on every case.

    In the feature extraction, we concat a pd.DataFrame for every case. AFAIK this construction of a pd.DataFrame leads to a new memory allocation (and copying) every time, which is highly memory inefficient. Especially, when parallelized on many CPUs, combined with the already memory intensive forking in joblib this can lead to OOM-Events (and is slow of course). Wouldn't it be more convenient to return only the feature set, that is currently processed. https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115 These are subsequently collected in results anyways: https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144

    opened by laqua-stack 2
  • Feature/add inference mlflow

    Feature/add inference mlflow

    Major changes:

    • fixed training with autologging of training parameters, preprocessor and classifier in MLFlow
    • webapp: added Predict subpage for inference on a single case, giving out class probability and Shap explanation
    • webapp: moved all steps into subpages
    • webapp: added Getting started in the landing page

    Fixes:

    • webapp: fixed extraction params discarding Feature Names selected from Feature Classes
    opened by pwoznicki 1
  • example_WORC.ipynb not being up to date with the repository

    example_WORC.ipynb not being up to date with the repository

    Describe the bug

    In example_WORC.ipynb there are function calls that do not work due to code in the repository being changed while the example_WORC.ipynb code wasn't updated to reflect those changes

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    
    
    from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
    feature_df = extractor.run()
    
    

    Expected Results

    1: Importing the function get_paths_with_separate_folder_per_case

    2: Using default_MR.yaml as value for extraction_params

    Actual Results

    1:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    Input In [7], in <cell line: 1>()
    ----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
          3 # create a table with all the paths
          4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    
    ModuleNotFoundError: No module named 'autorad.data.utils'
    

    2:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [18], in <cell line: 1>()
    ----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
          2 feature_df = extractor.run()
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
         39 self.dataset = dataset
         40 self.feature_set = feature_set
    ---> 41 self.extraction_params = self._get_extraction_param_path(
         42     extraction_params
         43 )
         44 log.info(f"Using extraction params from {self.extraction_params}")
         45 self.n_jobs = set_n_jobs(n_jobs)
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
         53     result = default_extraction_param_dir / extraction_params
         54 else:
    ---> 55     raise ValueError(
         56         f"Extraction parameter file {extraction_params} not found."
         57     )
         58 return result
    
    ValueError: Extraction parameter file default_MR.yaml not found.
    

    Fix

    1: change from autorad.data.utils to from autorad.utils.preprocessing 2: change extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") to extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")

    opened by wagon-master 1
  • Bugfix/refactor

    Bugfix/refactor

    New features:

    • log feature dataset and splits in MLFlow
    • update docs & add getting-started

    Fixes:

    • fix evaluation in the web app
    • fix docs build in readthedocss
    opened by pwoznicki 0
  • Support various readers (Nibabel, ITK)

    Support various readers (Nibabel, ITK)

    Currently we use Nibabel for loading images. It works only for Nifti images, but a user may want to load a DICOM image, without converting it to Nifti.

    Consider using MONAI LoadImage() function that provides a common interface for loading both Nifti and DICOM images.

    enhancement 
    opened by pwoznicki 0
Releases(v0.2.2)
  • v0.2.2(Jul 30, 2022)

Owner
Piotr Woźnicki
Recently graduated medical doctor, working on medical image analysis.
Piotr Woźnicki
Learning with Noisy Labels via Sparse Regularization, ICCV2021

Learning with Noisy Labels via Sparse Regularization This repository is the official implementation of [Learning with Noisy Labels via Sparse Regulari

Xiong Zhou 38 Oct 20, 2022
Code for testing various M1 Chip benchmarks with TensorFlow.

M1, M1 Pro, M1 Max Machine Learning Speed Test Comparison This repo contains some sample code to benchmark the new M1 MacBooks (M1 Pro and M1 Max) aga

Daniel Bourke 348 Jan 04, 2023
Pywonderland - A tour in the wonderland of math with python.

A Tour in the Wonderland of Math with Python A collection of python scripts for drawing beautiful figures and animating interesting algorithms in math

Zhao Liang 4.1k Jan 03, 2023
Pytorch Implementation of LNSNet for Superpixel Segmentation

LNSNet Overview Official implementation of Learning the Superpixel in a Non-iterative and Lifelong Manner (CVPR'21) Learning Strategy The proposed LNS

42 Oct 11, 2022
CTC segmentation python package

CTC segmentation CTC segmentation can be used to find utterances alignments within large audio files. This repository contains the ctc-segmentation py

Ludwig Kürzinger 217 Jan 04, 2023
Neural Network to colorize grayscale images

#colornet Neural Network to colorize grayscale images Results Grayscale Prediction Ground Truth Eiji K used colornet for anime colorization Sources Au

Pavel Hanchar 3.6k Dec 24, 2022
An implementation of "Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport"

Optex An implementation of Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport for TU Delft CS4240. You c

Hans Brouwer 33 Jan 05, 2023
A basic implementation of Layer-wise Relevance Propagation (LRP) in PyTorch.

Layer-wise Relevance Propagation (LRP) in PyTorch Basic unsupervised implementation of Layer-wise Relevance Propagation (Bach et al., Montavon et al.)

Kai Fabi 28 Dec 26, 2022
Re-implement CycleGAN in Tensorlayer

CycleGAN_Tensorlayer Re-implement CycleGAN in TensorLayer Original CycleGAN Improved CycleGAN with resize-convolution Prerequisites: TensorLayer Tenso

89 Aug 15, 2022
Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR 2019) To make better use of given limited labels, we propo

126 Sep 13, 2022
PyTorch implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation.

PyTorch implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation. Warning: the master branch might collapse. To ob

559 Dec 14, 2022
MAterial del programa Misión TIC 2022

Mision TIC 2022 Esta iniciativa, aparece como respuesta frente a los retos de la Cuarta Revolución Industrial, y tiene como objetivo la formación de 1

6 May 25, 2022
Language Models Can See: Plugging Visual Controls in Text Generation

Language Models Can See: Plugging Visual Controls in Text Generation Authors: Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lin

Yixuan Su 195 Dec 22, 2022
Keras implementation of PersonLab for Multi-Person Pose Estimation and Instance Segmentation.

PersonLab This is a Keras implementation of PersonLab for Multi-Person Pose Estimation and Instance Segmentation. The model predicts heatmaps and vari

OCTI 160 Dec 21, 2022
DGL-TreeSearch and the Gurobi-MWIS interface

Independent Set Benchmarking Suite This repository contains the code for our maximum independent set benchmarking suite as well as our implementations

Maximilian Böther 19 Nov 22, 2022
Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face Manipulation" published in CVPR 2020.

FFD Source Code Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face M

88 Nov 22, 2022
Official implementation for NIPS'17 paper: PredRNN: Recurrent Neural Networks for Predictive Learning Using Spatiotemporal LSTMs.

PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning The predictive learning of spatiotemporal sequences aims to generate future

THUML: Machine Learning Group @ THSS 243 Dec 26, 2022
Official repository for "PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation"

pair-emnlp2020 Official repository for the paper: Xinyu Hua and Lu Wang: PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long

Xinyu Hua 31 Oct 13, 2022
Robust & Reliable Route Recommendation on Road Networks

NeuroMLR: Robust & Reliable Route Recommendation on Road Networks This repository is the official implementation of NeuroMLR: Robust & Reliable Route

4 Dec 20, 2022
High accurate tool for automatic faces detection with landmarks

faces_detanator High accurate tool for automatic faces detection with landmarks. The library is based on public detectors with high accuracy (TinaFace

Ihar 7 May 10, 2022