HistoKT: Cross Knowledge Transfer in Computational Pathology

Exciting News! HistoKT has been accepted to ICASSP 2022.

HistoKT: Cross Knowledge Transfer in Computational Pathology,
Ryan Zhang, Jiadai Zhu, Stephen Yang, Mahdi S. Hosseini, Angelo Genovese, Lina Chen, Corwyn Rowsell, Savvas Damaskinos, Sonal Varma, Konstantinos N. Plataniotis
Accepted at the 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022)

Overview

In computational pathology (CPath), the lack of well-annotated datasets obstructs the application of deep learning techniques. Since pathologist time is expensive, dataset curation is intrinsically difficult. Thus, many CPath workflows involve transferring learned knowledge between various image domains through transfer learning. Currently, most transfer learning research follows a model-centric approach, tuning network parameters to improve transfer results over a few datasets. In this paper, we take a data-centric approach to the transfer learning problem and examine the existence of generalizable knowledge between histopathological datasets. First, we create a standardization workflow for aggregating existing histopathological data. We then measure inter-domain knowledge by training ResNet18 models across multiple histopathological datasets and cross-transferring between them to determine the quantity and quality of innate shared knowledge. Additionally, we use weight distillation to share knowledge between models without additional training. We find that hard-to-learn, multi-class datasets benefit most from pretraining, and that a two-stage learning framework incorporating a large source domain such as ImageNet allows for better utilization of smaller datasets. Furthermore, we find that weight distillation enables models trained on purely histopathological features to outperform models using external natural image data.

Results

We report our ResNet18 transfer learning results across various datasets, with two initialization methods (random and ImageNet initialization). Each item in the matrix represents the Top-1 test accuracy of a ResNet18 model trained on the source dataset and deep-tuned on the target dataset. Items are highlighted in a colour gradient from deep red to deep green, where green represents a significant accuracy improvement after tuning and red represents an accuracy decline after tuning.
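The colour coding described above can be read as a signed accuracy difference per (source, target) pair. The sketch below is only an illustration of that reading: it assumes the comparison is against a model trained on the target dataset without cross-transfer, which this section does not state explicitly. The function name `accuracy_gain`, the dataset names, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Tuple


def accuracy_gain(
    baseline: Dict[str, float],
    transferred: Dict[Tuple[str, str], float],
) -> Dict[Tuple[str, str], float]:
    """Return the Top-1 accuracy change for each (source, target) pair.

    baseline[target] is the accuracy on the target without cross-transfer
    (an assumed reference point); transferred[(source, target)] is the
    accuracy after training on source and deep-tuning on target.
    """
    return {
        (source, target): round(acc - baseline[target], 4)
        for (source, target), acc in transferred.items()
    }


# Placeholder numbers for illustration only; NOT results from the paper.
baseline = {"dataset_a": 0.80, "dataset_b": 0.90}
transferred = {
    ("dataset_b", "dataset_a"): 0.85,  # would be shaded green (improvement)
    ("dataset_a", "dataset_b"): 0.88,  # would be shaded red (decline)
}
gains = accuracy_gain(baseline, transferred)
for pair, delta in gains.items():
    print(pair, f"{delta:+.2f}")
```

A positive value corresponds to the green end of the gradient and a negative value to the red end.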

No Pretraining

ImageNet Initialization

Getting Started

Dependencies

  • Requirements are specified in requirements.txt
argon2-cffi==20.1.0
async-generator==1.10
attrs==21.2.0
backcall==0.2.0
bleach==3.3.0
cffi==1.14.5
colorama==0.4.4
cycler==0.10.0
decorator==4.4.2
defusedxml==0.7.1
entrypoints==0.3
et-xmlfile==1.1.0
h5py==3.2.1
imageio==2.9.0
ipykernel==5.5.4
ipython==7.23.1
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==3.0.0
joblib==1.0.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
MarkupSafe==2.0.0
matplotlib==3.4.2
matplotlib-inline==0.1.2
mistune==0.8.4
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.3.0
numpy==1.20.3
openpyxl==3.0.7
packaging==20.9
pandas==1.2.4
pandocfilters==1.4.3
parso==0.8.2
pickleshare==0.7.5
Pillow==8.2.0
prometheus-client==0.10.1
prompt-toolkit==3.0.18
pyaml==20.4.0
pycparser==2.20
Pygments==2.9.0
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
PyWavelets==1.1.1
pywin32==300
pywinpty==0.5.7
PyYAML==5.4.1
pyzmq==22.0.3
qtconsole==5.1.0
QtPy==1.9.0
scikit-image==0.18.1
scikit-learn==0.24.2
scipy==1.6.3
Send2Trash==1.5.0
six==1.16.0
sklearn==0.0
terminado==0.9.5
testpath==0.4.4
threadpoolctl==2.1.0
tifffile==2021.4.8
torch==1.8.1+cu102
torchaudio==0.8.1
torchvision==0.9.1+cu102
tornado==6.1
traitlets==5.0.5
typing-extensions==3.10.0.0
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1

Running the Code

This codebase was created in collaboration with the RMSGD repository. As such, much of the training pipeline is shared.

Downloading datasets

All available datasets can be found on their respective websites. Some datasets, such as ADP, are available by request.

A list of all datasets used in this paper can be found below:

Preprocessing and Training

To prepare the datasets for training, download all of them, place them in one folder, and then use the functions found in dataset_processing\standardize_datasets.py.

cd HistoKT/dataset_processing
python standardize_datasets.py

A standardized version of each dataset will be created in the dataset folder.

To run the code for training, use the src/adas/train.py file:

cd HistoKT
python src/adas/train.py --config CONFIG --data DATA_FOLDER

Options for Training

--config CONFIG       Set configuration file path: Default = 'configAdas.yaml'
--data DATA           Set data directory path: Default = '.adas-data'
--output OUTPUT       Set output directory path: Default = '.adas-output'
--checkpoint CHECKPOINT
                    Set checkpoint directory path: Default = '.adas-checkpoint'
--resume RESUME       Set checkpoint resume path: Default = None
--pretrained_model PRETRAINED_MODEL
                    Set checkpoint pretrained model path: Default = None
--freeze_encoder FREEZE_ENCODER
                    Set whether to freeze the encoder for post training: Default = True
--root ROOT           Set root path of project that parents all others: Default = '.'
--save-freq SAVE_FREQ
                    Checkpoint epoch save frequency: Default = 25
--cpu                 Flag: CPU bound training: Default = False
--gpu GPU             GPU id to use: Default = 0
--multiprocessing-distributed
                    Use multi-processing distributed training to launch N processes per node, which has N GPUs. This is the fastest way to use PyTorch for either
                    single node or multi node data parallel training: Default = False
--dist-url DIST_URL   url used to set up distributed training: Default = 'tcp://127.0.0.1:23456'
--dist-backend DIST_BACKEND
                    distributed backend: Default = 'nccl'
--world-size WORLD_SIZE
                    Number of nodes for distributed training: Default = -1
--rank RANK           Node rank for distributed training: Default = -1
--color_aug COLOR_AUG
                    override config color augmentation, can also choose "no_aug"
--norm_vals NORM_VALS
                    override normalization values, use dataset string. e.g. "BACH_transformed"
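
For a two-stage run (train on a source dataset, then tune on a target dataset), the options above can be combined into two invocations of src/adas/train.py. The following is a minimal sketch that only assembles the command lines; `build_train_command` is a hypothetical helper, every path and config name is a placeholder, and passing the string "False" to --freeze_encoder is an assumption about how that option is parsed, so check it against train.py before relying on it.

```python
import shlex
from typing import List, Optional


def build_train_command(
    config: str,
    data: str,
    output: Optional[str] = None,
    checkpoint: Optional[str] = None,
    pretrained_model: Optional[str] = None,
    freeze_encoder: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list using only flags listed in the options above."""
    cmd = ["python", "src/adas/train.py", "--config", config, "--data", data]
    optional = {
        "--output": output,
        "--checkpoint": checkpoint,
        "--pretrained_model": pretrained_model,
        "--freeze_encoder": freeze_encoder,
    }
    for flag, value in optional.items():
        if value is not None:  # omit flags the caller did not set
            cmd += [flag, value]
    return cmd


# Stage one: train on the source dataset (all paths are placeholders).
stage_one = build_train_command(
    "source_config.yaml", "datasets", checkpoint="source-checkpoint"
)

# Stage two: start from the stage-one weights and tune on the target dataset.
# The checkpoint file name and the "False" value are assumptions.
stage_two = build_train_command(
    "target_config.yaml",
    "datasets",
    pretrained_model="source-checkpoint/last.pth.tar",
    freeze_encoder="False",
)

print(shlex.join(stage_one))
print(shlex.join(stage_two))
```

The printed lines can be run from the HistoKT folder, or the lists can be passed to subprocess.run directly.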

Training Output

All training output will be saved to the OUTPUT_PATH location. After a full experiment, results will be recorded in the following format:

  • OUTPUT
    • Timestamped xlsx sheet with the record of train and validation (notated as test) acc, loss, and rank metrics for each layer in the network (refer to AdaS)
  • CHECKPOINT
    • checkpoint dictionaries with a snapshot of the model's parameters at a given epoch.
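
Because the results sheets are timestamped, a run directory can accumulate several of them. The sketch below picks the most recently modified sheet; it assumes the xlsx files sit directly inside the output directory and uses the file modification time rather than the file name, since the exact naming scheme is not described here. `latest_results_sheet` is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path
from typing import Optional


def latest_results_sheet(output_dir: str = ".adas-output") -> Optional[Path]:
    """Return the most recently modified .xlsx file in output_dir, if any."""
    sheets = list(Path(output_dir).glob("*.xlsx"))
    if not sheets:
        return None  # no experiment has written results here yet
    return max(sheets, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    sheet = latest_results_sheet()
    print(sheet if sheet is not None else "No results sheet found")
```

The returned path can then be opened with pandas.read_excel (pandas and openpyxl are both listed in requirements.txt).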

Code Organization

Configs

Sample ResNet18 configuration files for all datasets used are provided in configs\NewPretrainingConfigs.

These configs were used for training the model on each dataset from random initialization.

All available options can be found in the config files.

Visualization

We provide sample code to plot training curves in Plots.

We provide sample code that uses the statistical method t-SNE to visualize high-dimensional features in T-sne.

We provide sample code that generates heat maps with the visual explanation algorithm Grad-CAM in gradCAM.

Version History

  • 0.1
    • Initial Release