Tools for computational pathology

Overview

A toolkit for computational pathology and machine learning.

View documentation

Please cite our paper

Installation

There are several ways to install PathML:

  1. pip install (recommended for users)
  2. clone repo to local machine and install from source (recommended for developers/contributors)

Options (1) and (2) require that you first install all external dependencies:

  • openslide
  • JDK 8

We recommend using conda for environment management. Download Miniconda here

Note: these instructions are for Linux. Commands may be different for other platforms.

Installation option 1: pip install

Create conda environment

conda create --name pathml python=3.8
conda activate pathml

Install external dependencies (Linux) with Apt

sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev

Install external dependencies (MacOS) with Brew

brew install openslide

Install OpenJDK 8

conda install openjdk==8.0.152

Optionally install CUDA (instructions here)

Install PathML

pip install pathml

Installation option 2: clone repo and install from source

Clone repo

git clone https://github.com/Dana-Farber-AIOS/pathml.git
cd pathml

Create conda environment

conda env create -f environment.yml
conda activate pathml

Optionally install CUDA (instructions here)

Install PathML:

pip install -e .

CUDA

To use GPU acceleration for model training or other tasks, you must install CUDA. This guide should work, but for the most up-to-date instructions, refer to the official PyTorch installation instructions.

Check the version of CUDA:

nvidia-smi

Install correct version of cudatoolkit:

# update this command with your CUDA version number
conda install cudatoolkit=11.0

After installing PyTorch, optionally verify successful PyTorch installation with CUDA support:

python -c "import torch; print(torch.cuda.is_available())"

Using with Jupyter

Jupyter notebooks are a convenient way to work interactively. To use PathML in Jupyter notebooks:

Set JAVA_HOME environment variable

PathML relies on Java to enable support for reading a wide range of file formats. Before using PathML in Jupyter, you may need to manually set the JAVA_HOME environment variable specifying the path to Java. To do so:

  1. Get the path to Java by running echo $JAVA_HOME in the terminal in your pathml conda environment (outside of Jupyter)
  2. Set that path as the JAVA_HOME environment variable in Jupyter:
    import os
    os.environ["JAVA_HOME"] = "/opt/conda/envs/pathml" # change path as needed
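
A wrong path here tends to surface only later, when a slide is first read, so it can help to check the directory before setting the variable. The following is a minimal sketch, assuming a hypothetical helper named `set_java_home` (it is not part of PathML):

```python
import os
from pathlib import Path


def set_java_home(java_home: str) -> str:
    """Set JAVA_HOME after checking that the directory exists.

    Hypothetical helper for illustration; it is not part of PathML.
    """
    path = Path(java_home).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"JAVA_HOME candidate does not exist: {path}")
    os.environ["JAVA_HOME"] = str(path)
    return os.environ["JAVA_HOME"]
```

For example, `set_java_home("/opt/conda/envs/pathml")` sets the variable if that directory exists and raises `FileNotFoundError` otherwise; change the path as needed.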
    

Register PathML as an IPython kernel

conda activate pathml
conda install ipykernel
python -m ipykernel install --user --name=pathml

This makes PathML available as a kernel in JupyterLab or Jupyter Notebook.

Contributing

PathML is an open source project. Consider contributing to benefit the entire community!

There are many ways to contribute to PathML, including:

  • Submitting bug reports
  • Submitting feature requests
  • Writing documentation and examples
  • Fixing bugs
  • Writing code for new features
  • Sharing workflows
  • Sharing trained model parameters
  • Sharing PathML with colleagues, students, etc.

See contributing for more details.

License

The GNU GPL v2 version of PathML is made available via Open Source licensing. The user is free to use, modify, and distribute under the terms of the GNU General Public License version 2.

Commercial license options are also available.

Contact

Questions? Comments? Suggestions? Get in touch!

[email protected]

Comments
  • Improve performance

    Currently, writing to h5 is the primary performance bottleneck when running a pipeline (see profile here).

    Perhaps by refactoring our h5 integration, we can boost performance. For example, maybe we should store tiles in separate groups instead of in one big array. This would potentially let us write in parallel and also make it trivial to support overlapping tiles (#223).

    Some work on this was being tracked in #200 but I am creating this issue so that we can discuss here instead of on the pull request

    enhancement 
    opened by jacob-rosenthal 16
  • Warnings associated with circulating a keras model among dask workers

    We are getting a set of warnings around the loading of a saved keras checkpoint file. I think they contribute to a subsequent error (https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867) and to the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185.

    Here is the warning we get when we run SegmentMIF:

    WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.

    We believe that the saved keras model is being passed to dask workers in an unclean state (e.g. existing locks not released), causing the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185 and, eventually, the error in https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867.

    To Reproduce: Here is our pipeline. I cannot share the data for regulatory reasons.

    pipeline = Pipeline([
        CollapseRunsVectra(),    
        SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=2, image_resolution=0.5, 
                   gpu=False, postprocess_kwargs_whole_cell=None, 
                   postprocess_kwrags_nuclear=None),
        QuantifyMIF('nuclear_segmentation')   
    ])
    
    bug 
    opened by surya-narayanan 13
  • Docker ci

    Add a Dockerfile that builds a working environment for pathml and starts a JupyterLab instance in the container, so that users can connect to it and get up and running quickly. Also add a GitHub Actions workflow to build the image and publish it to Docker Hub whenever we create a new release.

    This will close #145

    opened by jacob-rosenthal 11
  • Unable to open tile object (object 'array' doesn't exist)

    Describe the bug: Unable to access the tile array from TileDataset.__getitem__(). KeyError: "Unable to open object (object 'array' doesn't exist)"

    To Reproduce (traceback):

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_4463/806735975.py in <module>
    ----> 1 tile_dataset.__getitem__(0)
    
    ~/pathml/pathml/ml/dataset.py in __getitem__(self, ix)
         54         ### this part copied from h5manager.get_tile()
         55         tile_image = self.h5["tiles"][str(k)]["array"][:]
    ---> 56 
         57         # get corresponding masks if there are masks
         58         if "masks" in self.h5["tiles"][str(k)].keys():
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    /opt/conda/envs/wtf/lib/python3.8/site-packages/h5py/_hl/group.py in __getitem__(self, name)
        286                 raise ValueError("Invalid HDF5 object reference")
        287         else:
    --> 288             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
        289 
        290         otype = h5i.get_type(oid)
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/h5o.pyx in h5py.h5o.open()
    
    KeyError: "Unable to open object (object 'array' doesn't exist)"
    

    Expected behavior: Should be able to access the tile array object. I created the h5 file with the following code:

    from pathml.core import SlideData, types

    slide_name = "<Path to Slide>"  # placeholder: replace with the path to your slide
    slide = SlideData(slide_name, backend = "bioformats", slide_type = types.Vectra)
    slide.write(f'/parent_directory/{slide.name}.h5')
    
    bug 
    opened by surya-narayanan 11
  • Adding to h5 file

    Is it possible to re-run a slide with a different pipeline and add to the h5 file, without re-doing tiling? Happy to provide an example, if that would be helpful.

    opened by surya-narayanan 9
  • Resolving dependencies between PathML and Deepcell

    Describe the bug: When we run pipelines for multiparametric images, we often want to include models from deepcell (https://github.com/vanvalenlab/deepcell-tf), especially for the SegmentMIF transform. It is difficult for users to solve the environment, since installing deepcell downgrades packages like numpy to incompatible versions. This has caused installation problems for @MohamedOmar2020 and other internal users.

    To Reproduce

    pipe = Pipeline(
        [
            CollapseRunsVectra(),
            SegmentMIF(
                model="mesmer",
                nuclear_channel=0,
                cytoplasm_channel=7,
                image_resolution=0.5,
            ),
            QuantifyMIF(segmentation_mask="cell_segmentation"),
        ]
    )
    dataset.run(pipe)
    

    Expected behavior: We would expect this to run, but following the default installation instructions (option 1, pip install) and then running pip install deepcell results in a series of numpy errors when we attempt to run the pipeline.

    Working Solution: These dependency problems are resolved (at least to the extent that the above pipeline can run) by upgrading numpy after installing deepcell, as follows:

    conda create --name pathml python=3.8
    conda activate pathml
    sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev
    conda install openjdk==8.0.152
    pip install pathml
    pip install deepcell
    pip install --upgrade numpy
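
    Because the order of these installs decides which numpy version survives, it can be useful to check what actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is only an example:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["pathml", "deepcell", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

    Running this before and after pip install --upgrade numpy shows whether the numpy version changed.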
    

    The question is: should we include this in our installation instructions for users who want to use multiparametric pipelines? Should we create a docker container for multiparametric pipelines? Should we remove our dependency on deepcell and try to wrap the model more directly in PathML (or train our own)?

    enhancement 
    opened by ryanccarelli 8
  • Weird segmentation results

    Hello, I have a problem with the segmentation resulting from the mesmer model. It looks like the model is not identifying cells properly since many cells are too large with too many nuclei. This is the code used to process the image:

    pipe = Pipeline([
        CollapseRunsVectra(),
        SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=7, image_resolution=0.5),
        QuantifyMIF(segmentation_mask='cell_segmentation')
    ])

    slidedata.run(pipe, distributed = False, tile_size= (12784, 13234), tile_pad=False, overwrite_existing_tiles=True)

    img = slidedata.tiles[3].image[10000:10500,12000:12500, :]
    nuc_mask = slidedata.tiles[3].masks['nuclear_segmentation'][10000:10500,12000:12500, :]
    cell_mask = slidedata.tiles[3].masks['cell_segmentation'][10000:10500,12000:12500, :]

    img_fiji = np.expand_dims(img, axis=0)
    nuc_cytoplasm = np.stack((img_fiji[:,:,:,0], img_fiji[:,:,:,7]), axis=-1)
    rgb_image = create_rgb_image(nuc_cytoplasm, channel_colors=['blue', 'green'])
    cell_segmentation_predictions = np.expand_dims(cell_mask, axis=0)
    overlay_cell = make_outline_overlay(rgb_data=rgb_image, predictions=cell_segmentation_predictions)

    This is how it looks when I overlay the segmentation on the original image in Fiji: [image: OverlaySeg1]

    I loaded a small part of the original image in Fiji, adjusted the brightness/contrast, and then used the mesmer model for segmentation (using deepcell directly, not pathml), and the segmentation seems good. This is how it looks: [image: Screenshot 2021-07-19 at 1 39 19 PM]

    Is it right to assume that the bad segmentation shown in the first image has something to do with the brightness/contrast of the raw image? Any ideas how to fix this?

    Thanks in advance

    opened by MohamedOmar2020 8
  • indices should be either on cpu or on the same device as the indexed tensor (cpu)

    Describe the bug: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (CPU)

    To Reproduce

    n_classes_pannuke = 6

    # load the model
    hovernet = HoVerNet(n_classes=n_classes_pannuke)

    # wrap model to use multi-GPU
    hovernet = torch.nn.DataParallel(hovernet)

    # set up optimizer
    opt = torch.optim.Adam(hovernet.parameters(), lr = 1e-4)
    # learning rate scheduler to reduce LR by factor of 10 each 25 epochs
    scheduler = StepLR(opt, step_size=25, gamma=0.1)

    # send model to GPU
    hovernet.to(device);

    n_epochs = 50

    # print performance metrics every n epochs
    print_every_n_epochs = None

    # evaluating performance on a random subset of validation mini-batches
    # this saves time instead of evaluating on the entire validation set
    n_minibatch_valid = 50

    epoch_train_losses = {}
    epoch_valid_losses = {}
    epoch_train_dice = {}
    epoch_valid_dice = {}

    best_epoch = 0

    # main training loop
    for i in tqdm(range(n_epochs)):
        minibatch_train_losses = []
        minibatch_train_dice = []

        ### put model in training mode
        hovernet.train()

        for data in train_dataloader:
            ### send the data to the GPU
            images = data[0].float().to(device)
            masks = data[1].to(device)
            hv = data[2].float().to(device)
            tissue_type = data[3]

            ### zero out gradient
            opt.zero_grad()

            ### forward pass
            outputs = hovernet(images)

            ### compute loss
            loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

            ### track loss
            minibatch_train_losses.append(loss.item())

            ### also track dice score to measure performance
            preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
            truth_binary = masks[:, -1, :, :] == 0
            dice = dice_score(preds_detection, truth_binary.cpu().numpy())
            minibatch_train_dice.append(dice)

            ### compute gradients
            loss.backward()

            ### step optimizer and scheduler
            opt.step()

        ### step LR scheduler
        scheduler.step()

        ### evaluate on random subset of validation data
        hovernet.eval()
        minibatch_valid_losses = []
        minibatch_valid_dice = []
        ### randomly choose minibatches for evaluating
        minibatch_ix = np.random.choice(range(len(valid_dataloader)), replace=False, size=n_minibatch_valid)
        with torch.no_grad():
            for j, data in enumerate(valid_dataloader):
                if j in minibatch_ix:
                    # send the data to the GPU
                    images = data[0].float().to(device)
                    masks = data[1].to(device)
                    hv = data[2].float().to(device)
                    tissue_type = data[3]

                    # forward pass
                    outputs = hovernet(images)

                    # compute loss
                    loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

                    # track loss
                    minibatch_valid_losses.append(loss.item())

                    # also track dice score to measure performance
                    preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
                    truth_binary = masks[:, -1, :, :] == 0
                    dice = dice_score(preds_detection, truth_binary.cpu().numpy())
                    minibatch_valid_dice.append(dice)

        ### average performance metrics over minibatches
        mean_train_loss = np.mean(minibatch_train_losses)
        mean_valid_loss = np.mean(minibatch_valid_losses)
        mean_train_dice = np.mean(minibatch_train_dice)
        mean_valid_dice = np.mean(minibatch_valid_dice)

        ### save the model with best performance
        if i != 0:
            if mean_valid_loss < min(epoch_valid_losses.values()):
                best_epoch = i
                torch.save(hovernet.state_dict(), f"hovernet_best_perf.pt")

        ### track performance over training epochs
        epoch_train_losses.update({i : mean_train_loss})
        epoch_valid_losses.update({i : mean_valid_loss})
        epoch_train_dice.update({i : mean_train_dice})
        epoch_valid_dice.update({i : mean_valid_dice})

        if print_every_n_epochs is not None:
            if i % print_every_n_epochs == print_every_n_epochs - 1:
                print(f"Epoch {i+1}/{n_epochs}:")
                print(f"\ttraining loss: {np.round(mean_train_loss, 4)}\tvalidation loss: {np.round(mean_valid_loss, 4)}")
                print(f"\ttraining dice: {np.round(mean_train_dice, 4)}\tvalidation dice: {np.round(mean_valid_dice, 4)}")

    # save fully trained model
    torch.save(hovernet.state_dict(), f"hovernet_fully_trained.pt")
    print(f"\nEpoch with best validation performance: {best_epoch}")

    Expected behavior: Model training should start.

    Screenshots: [image]

    Additional context: Does anyone else have this problem? I ran this on an HPC cluster with 4 GPUs, each with 16 GB of memory.

    bug 
    opened by luzy05111036 7
  • Issue with distributed processing

    Hello, thank you for fixing the distributed issue with the mesmer model. I am running the pipeline with the 'distributed = True' flag, but I am getting many warnings and errors. Additionally, the pipeline was supposed to return 145 tiles, but it is returning only 3. This is part of the log output:

    def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
    def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
    warnings.warn("Transforming to str index.", ImplicitModificationWarning)
    /Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
    def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_io/h5ad.py:64: FutureWarning: The force_dense argument is deprecated. Use as_dense instead.
    warn(
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
    warnings.warn("Transforming to str index.", ImplicitModificationWarning)
    storing 'coords' as categorical
    storing 'slice' as categorical
    storing 'tile' as categorical
    2021-08-10 00:00:05.176962: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
    distributed.worker - WARNING - Compute Failed
    Function: apply
    args: (Tile(coords=(1598, 6616), name=None, image shape: (1598, 1654, 2), slide_type=SlideType(stain=Fluor, platform=Vectra, tma=None, rgb=None, volumetric=None, time_series=None), labels=None, masks=None, counts=None))
    kwargs: {}
    Exception: OutOfRangeError()

    The last error (from the tensorflow OP_REQUIRES line through Exception: OutOfRangeError()) is repeated many times.

    Thanks in advance

    bug 
    opened by MohamedOmar2020 7
  • Error installing owing to cached version of torch

    If one tries to install pathml after a previously failed installation attempt, one runs into the following error, which I think is due to pip using cached files. One suggested solution (for just torch) is to run pip --no-cache-dir install torchvision, but I don't know whether this will solve the issue, or how to integrate it into installing pathml as a whole without installing each dependency one by one.

    (pathml) [email protected]:~$ pip install pathml
    Collecting pathml
      Using cached pathml-2.0.4-py3-none-any.whl (83 kB)
    Collecting scipy
      Using cached scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.6 MB)
    Collecting python-bioformats>=4.0.0
      Using cached python_bioformats-4.0.5-py3-none-any.whl (41.4 MB)
    Collecting scikit-image
      Using cached scikit_image-0.19.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)
    Collecting scikit-learn
      Using cached scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB)
    Requirement already satisfied: pip in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (22.0.3)
    Collecting openslide-python
      Using cached openslide-python-1.1.2.tar.gz (316 kB)
      Preparing metadata (setup.py) ... done
    Collecting dask[distributed]
      Using cached dask-2022.1.1-py3-none-any.whl (1.1 MB)
    Collecting pandas
      Using cached pandas-1.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
    Collecting matplotlib
      Using cached matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
    Collecting anndata>=0.7.6
      Using cached anndata-0.7.8-py3-none-any.whl (91 kB)
    Requirement already satisfied: numpy>=1.16.4 in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (1.22.2)
    Collecting h5py
      Using cached h5py-3.6.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
    Collecting opencv-contrib-python
      Using cached opencv_contrib_python-4.5.5.62-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (66.6 MB)
    Collecting pydicom
      Using cached pydicom-2.2.2-py3-none-any.whl (2.0 MB)
    Collecting torch
    SystemError: deallocated bytearray object has exported buffers
    ERROR: Exception:
    Traceback (most recent call last):
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 167, in exc_logging_wrapper
        status = run_func(*args)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
        return func(self, options, args)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 339, in run
        requirement_set = resolver.resolve(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
        result = self._result = resolver.resolve(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
        state = resolution.resolve(requirements, max_rounds=max_rounds)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 373, in resolve
        failure_causes = self._attempt_to_pin_criterion(name)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 213, in _attempt_to_pin_criterion
        criteria = self._get_updated_criteria(candidate)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 204, in _get_updated_criteria
        self._add_to_criteria(criteria, requirement, parent=candidate)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
        if not criterion.candidates:
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
        return bool(self._sequence)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
        return any(self)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
        return (c for c in iterator if id(c) not in self._incompatible_ids)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
        candidate = func()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 215, in _make_candidate_from_link
        self._link_candidate_cache[link] = LinkCandidate(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 288, in __init__
        super().__init__(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
        self.dist = self._prepare()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
        dist = self._prepare_distribution()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 299, in _prepare_distribution
        return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 487, in prepare_linked_requirement
        return self._prepare_linked_requirement(req, parallel_builds)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 532, in _prepare_linked_requirement
        local_file = unpack_url(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 214, in unpack_url
        file = get_http_url(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 94, in get_http_url
        from_path, content_type = download(link, temp_dir.path)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 133, in __call__
        resp = _http_get_download(self._session, link)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 116, in _http_get_download
        resp = session.get(target_url, headers=HEADERS, stream=True)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 542, in get
        return self.request('GET', url, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/session.py", line 454, in request
        return super().request(method, url, *args, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 529, in request
        resp = self.send(prep, **send_kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 645, in send
        r = adapter.send(request, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/adapter.py", line 48, in send
        cached_response = self.controller.cached_request(request)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/controller.py", line 151, in cached_request
        resp = self.serializer.loads(request, cache_data)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 95, in loads
        return getattr(self, "_loads_v{}".format(ver))(request, data)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 182, in _loads_v4
        cached = msgpack.loads(data, raw=False)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 128, in unpackb
        ret = unpacker._unpack()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
        ret[key] = self._unpack(EX_CONSTRUCT)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
        ret[key] = self._unpack(EX_CONSTRUCT)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 546, in _unpack
        typ, n, obj = self._read_header()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 488, in _read_header
        obj = self._read(n)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 407, in _read
        ret = self._buffer[i : i + n]
    MemoryError
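
    On the question of applying the cache workaround to the whole install: --no-cache-dir is a general pip option, so passing it to a single pip install pathml invocation disables the cache for pathml and for the dependencies resolved in that same run. Whether that avoids the MemoryError above is an assumption, not something confirmed in this report. A minimal sketch that builds such a command for the current interpreter (the helper name `pip_install_command` is hypothetical):

```python
import sys


def pip_install_command(package, use_cache=False):
    """Build a `pip install` command for the current interpreter.

    With use_cache=False, --no-cache-dir is added so that pip bypasses
    its cache for the whole invocation, dependencies included.
    """
    command = [sys.executable, "-m", "pip", "install"]
    if not use_cache:
        command.append("--no-cache-dir")
    command.append(package)
    return command
```

    The resulting list can be passed to subprocess.run(pip_install_command("pathml"), check=True), which is equivalent to running pip install --no-cache-dir pathml in the active environment.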
    
    
    bug 
    opened by surya-narayanan 6
  • Extracting tile returns multi dim output from H&E qptiff

    I think this is specific to my scanner. When I run the following line on my qptiff H&E svs file:

    region = wsi.slide.extract_region(location = (900, 800), size = (500, 500))

    I get a 5-dimensional array, which is incompatible with downstream analysis.

    Do you think it may be useful to run a np.squeeze right before returning the array in wsi.slide.extract_region?
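
    To make the suggestion concrete, here is a minimal sketch of what np.squeeze would do to such an array. The (500, 500, 1, 3, 1) shape is an assumption for illustration and is not taken from this report:

```python
import numpy as np

# Stand-in for the array returned by extract_region. The shape is an
# assumption for illustration: two singleton axes around an RGB region.
region = np.zeros((500, 500, 1, 3, 1), dtype=np.uint8)

# np.squeeze with no axis argument drops every axis of length 1.
squeezed = np.squeeze(region)
print(squeezed.shape)  # (500, 500, 3)

# Naming the axes explicitly avoids dropping a real axis that happens to
# have length 1 (e.g. the channel axis of a single-channel image).
squeezed_axes = np.squeeze(region, axis=(2, 4))
print(squeezed_axes.shape)  # (500, 500, 3)
```

    One design consideration: an unconditional np.squeeze would also collapse the channel axis of a single-channel image, so squeezing only specific axes may be the safer choice.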

    opened by surya-narayanan 6
  • How to implement HoVer-Net Model in TIAToolBox?

    opened by WilliWespe 0
  • How to train hovernet starting with semantic-level mask image?

    Is your feature request related to a problem? Please describe. I have large WSI data and multi-class jpeg masks, but I have struggled to find a way to make them work with any hovernet implementation.

    Describe the solution you'd like: I'd like to be able to feed in my large WSI data along with the jpeg masks, and have tiling and training then take place.

    Describe alternatives you've considered: If I still need instance masks, I can generate them with watershed. But I still don't know what format PathML would expect (e.g. npy, mat, json, jpeg, etc.).

    Additional context

    Any help is highly appreciated.

    Thanks!

    enhancement 
    opened by OmarAshkar 3
  • Deepcell segmentation without cytoplasm channel

    I note that Deepcell provides both a segmentation model (nuclear channel only) and a mesmer model (nuclear and cytoplasm) (https://www.deepcell.org/predict). Our datasets do not have a single general cytoplasm marker that captures the cytoplasm of all cell types, as the mesmer model requires (e.g. tumour cells vs. immune vs. stromal cells). Can both nuclear_channel and cytoplasm_channel be set to DAPI in the mesmer model? Or can SegmentMIF(model='segmentation') be supported? Thanks!

    enhancement 
    opened by jamesMo84 1
  • Allow SlideData to use existing h5path files

    As motivated by https://github.com/Dana-Farber-AIOS/pathml/issues/332 and https://github.com/Dana-Farber-AIOS/pathml/issues/300, this modifies SlideData to read and update Tiles from an existing h5path file instead of requiring each pipeline run to recreate all tiles from scratch.

    This includes #335 as many transforms (e.g. BoxBlur) require np.uint8 data instead of the default float16 saved to h5path files. I was also working off my load-data-in-workers branch because it had significant performance changes for my use cases. Sorry about the branching messiness, hopefully the changes will be clearer as other branches are merged into dev.

    This makes breaking changes to the SlideData API, namely replacing generate_tiles with get_tiles and moving the tile parameterization from run to the SlideData constructor.

    opened by tddough98 0
  • Load tiles in parallel on workers and add options to `TissueDetectionHE`

    This contains two separate improvements:

    • add drop_empty_tiles and keep_mask options to the TissueDetectionHE transform to bypass saving tiles with no detected H&E tissue and bypass saving masks
    • parallelize tile image loading by using dask.delayed to avoid loading images on the main thread

    The first part is both for convenience and performance. It's possible to generate all tiles and then filter out the empty tiles and remove masks before writing the h5path to disk, but that requires that all the tiles be added to the Tiles which takes IO time. If these tiles and masks are never saved even to in-memory objects, processing can finish faster.
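    The idea behind drop_empty_tiles and keep_mask can be sketched roughly as follows. This is not PathML's implementation: the Tile class and the has_tissue and keep_nonempty helpers are hypothetical stand-ins, and masks are plain nested lists rather than arrays. Tiles whose tissue mask is empty are skipped as they are generated, so they are never stored, not even in memory.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Tile:
    """Hypothetical stand-in for a tile: a name plus a binary tissue mask."""
    name: str
    tissue_mask: List[List[int]]  # 1 = tissue, 0 = background


def has_tissue(tile: Tile) -> bool:
    """Return True if any pixel of the tile's mask is marked as tissue."""
    return any(any(row) for row in tile.tissue_mask)


def keep_nonempty(tiles: Iterable[Tile], keep_mask: bool = True) -> Iterator[Tile]:
    """Yield only tiles with tissue; optionally drop the mask before yielding."""
    for tile in tiles:
        if not has_tissue(tile):
            continue  # skip empty tiles before they are stored anywhere
        if not keep_mask:
            tile = Tile(tile.name, [])  # discard the mask to save space
        yield tile


tiles = [
    Tile("a", [[0, 0], [0, 0]]),
    Tile("b", [[0, 1], [0, 0]]),
    Tile("c", [[1, 1], [1, 1]]),
]
kept = list(keep_nonempty(tiles, keep_mask=False))
print([t.name for t in kept])
```

    Because the filter is a generator, each tile is examined once and discarded immediately if it is empty.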

    The second part is a core performance issue with distributed processing. I believe it's relevant to https://github.com/Dana-Farber-AIOS/pathml/issues/211 and https://github.com/Dana-Farber-AIOS/pathml/issues/299. When processing tiles, I've found that loading time >> processing time. Currently, the main thread loads the tile image data and then scatters each loaded tile to the workers. This prevents any parallelism, as all but one worker are always waiting for the main thread to load data and send them a tile.

    Additionally, as all tiles have to be loaded on the main thread, the block that generates the futures

    for tile in self.generate_tiles(
        level=level,
        shape=tile_size,
        stride=tile_stride,
        pad=tile_pad,
        **kwargs,
    ):
        if not tile.slide_type:
            tile.slide_type = self.slide_type
        # explicitly scatter data, i.e. send the tile data out to the cluster before applying the pipeline
        # according to dask, this can reduce scheduler burden and keep data on workers
        big_future = client.scatter(tile)
        f = client.submit(pipeline.apply, big_future)
        processed_tile_futures.append(f)
    

    has to load all tiles and send them all to workers before ANY tile can be added to the Tiles and the memory can be freed in the next block

    # as tiles are processed, add them to h5
    for future, tile in dask.distributed.as_completed(
        processed_tile_futures, with_results=True
    ):
        self.tiles.add(tile)
    

    causing the dramatic memory leaks seen in https://github.com/Dana-Farber-AIOS/pathml/issues/211.

    I've used dask.delayed to prevent reading from the input file until the image is accessed on the worker. The code that accesses the file and loads the image can now be run by each worker in parallel. To preserve the parallelism, we have to take care not to access and load tile.image on the main thread before loading it on the worker, or to at least wrap accesses in dask.delayed as in SlideData.generate_tiles.
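    A rough sketch of the difference, using the standard library's ThreadPoolExecutor in place of dask so that it runs without a cluster. The load_tile, process, and load_and_process functions are hypothetical stand-ins for reading a region from the slide and applying a pipeline. In the eager version the main thread loads every tile before submitting it. In the deferred version only the tile coordinates are submitted and each worker does its own loading.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

main_thread = threading.current_thread().name
loaded_on = []  # records which thread performed each load


def load_tile(coords):
    """Hypothetical loader: pretend to read a tile at the given coordinates."""
    loaded_on.append(threading.current_thread().name)
    return [coords[0] + coords[1]] * 4  # fake pixel data


def process(image):
    """Hypothetical pipeline step applied to loaded pixel data."""
    return sum(image)


def load_and_process(coords):
    """Runs on a worker: loading is deferred until the task executes there."""
    return process(load_tile(coords))


coords_list = [(i, j) for i in range(3) for j in range(3)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Eager: the calling thread loads each tile, then ships the data to a worker.
    eager = [pool.submit(process, load_tile(c)) for c in coords_list]
    eager_results = [f.result() for f in eager]
    eager_threads = set(loaded_on)

    loaded_on.clear()

    # Deferred: only coordinates are submitted; workers load tiles themselves.
    lazy_results = list(pool.map(load_and_process, coords_list))
    lazy_threads = set(loaded_on)

print("eager loads ran on:", eager_threads)
print("deferred loads ran on:", lazy_threads)
```

    Both versions compute the same results. Only the thread that performs the loading changes, which is what allows the loads to overlap.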

    I had some issues with the backends not being picklable. The Backend has to be sent to each worker so it has access to the code that interfaces with the filesystem. I changed Backend filelike attributes to be lazily evaluated with the @property decorator.
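    A minimal sketch of this pattern, assuming a hypothetical LazyBackend class that is not PathML's actual Backend. The object stores only the file path, opens the file on first access through a property, and drops the handle when pickled. The __getstate__ method is an addition for this sketch, so that an instance stays picklable even after the sender has opened the file.

```python
import os
import pickle
import tempfile


class LazyBackend:
    """Hypothetical backend that stores only a path and opens the file lazily."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # no open file object is held at construction time

    @property
    def handle(self):
        """Open the file on first access, then reuse the same handle."""
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getstate__(self):
        # Open file objects cannot be pickled, so never ship the handle itself.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def close(self):
        """Close the handle if it was ever opened."""
        if self._handle is not None:
            self._handle.close()
            self._handle = None


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"slide-bytes")

backend = LazyBackend(tmp.name)
_ = backend.handle.read()  # the sender may already have opened the file

clone = pickle.loads(pickle.dumps(backend))  # what a worker would receive
data = clone.handle.read()  # the worker opens its own handle on demand
print(data)

backend.close()
clone.close()
os.remove(tmp.name)
```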

    opened by tddough98 4
  • Parameterize dtype for h5path with `SlideData` constructor

    Currently, PathML stores all images with float16, forcing all image inputs to be upcast or downcast to this data type, which increases storage size or loses information. There already is a dtype parameter in the SlideData constructor, but it's only used to assist the BioFormatsBackend in loading images correctly. This repurposes that parameter to control what dtype h5py uses when writing image data.
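    The cost of forcing everything to float16 can be illustrated with a small sketch. It uses the standard library's struct module, whose 'e' format is IEEE 754 half precision, as a stand-in for a float16 array. The via_float16 helper and the sample value are made up for this illustration.

```python
import struct


def via_float16(value):
    """Round-trip a number through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


# Every 8-bit value survives the round trip, but needs 2 bytes instead of 1.
uint8_exact = all(via_float16(v) == v for v in range(256))
bytes_uint8 = struct.calcsize("<B")
bytes_float16 = struct.calcsize("<e")

# A 16-bit value above 2048 may not be representable and gets rounded.
original = 40001
stored = via_float16(original)

print(uint8_exact, bytes_uint8, bytes_float16)
print(original, stored)
```

    So 8-bit images double in size with no gain, while 16-bit images can silently lose low-order bits.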

    I also changed masks to be stored as ENUM and to use the strongest compression setting, as boolean masks are highly compressible. The compression made a huge difference in file size: HDFView (https://www.hdfgroup.org/downloads/hdfview/) showed a compression ratio of 100-200x for masks. The ENUM data type is stored as an 8-bit integer (https://docs.h5py.org/en/stable/special.html#enumerated-types), but at least this is smaller than float16.
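    To give a feel for why binary masks compress so well, here is a small sketch using zlib from the standard library rather than h5py. HDF5's gzip filter is based on the same deflate algorithm, but the mask below is synthetic, so the ratio it prints is only illustrative and will differ from the figures above.

```python
import zlib

# Hypothetical 256x256 binary mask stored one byte per pixel,
# with a 64x64 foreground square in the middle.
size = 256
mask = bytearray(size * size)
for row in range(96, 160):
    start = row * size + 96
    mask[start:start + 64] = b"\x01" * 64

raw = bytes(mask)
compressed = zlib.compress(raw, 9)  # level 9 = strongest deflate setting
ratio = len(raw) / len(compressed)

print(len(raw), len(compressed), round(ratio, 1))
assert zlib.decompress(compressed) == raw  # compression is lossless
```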

    opened by tddough98 4
Releases(v2.1.0)
  • v2.1.0(Apr 22, 2022)

    What's Changed

    • Clean SegmentMIF by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/294
      • Removed GPU argument from SegmentMIF
      • Separated whole_cell and nuclear kwargs
    • Update README.md by @surya-narayanan in https://github.com/Dana-Farber-AIOS/pathml/pull/298
    • Update quantify mif by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/301
      • update the functional implementation F() to not require a tile object.
      • Add "label" property to counts matrix.
    • Fix tiling bug by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/306
      • Fixed bug for generate_tiles() within OpenSlideBackend and BioFormatsBackend. Tile shape evenly divides into slide shape
    • Added logging functionality by @BeeGass in https://github.com/Dana-Farber-AIOS/pathml/pull/304
      • Includes logger customization
    • Don't augment test or valid splits for PanNuke by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/309

    New Contributors

    • @BeeGass made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/304

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.4...v2.1.0

  • v2.0.4(Feb 7, 2022)

    What's Changed

    • Fix bug caused by mixing up (i, j) and (x, y) coordinate systems in BioFormatsBackend (#278)
    • Add option to not normalize image in BioFormatsBackend.extract_region() (#279)
    • Fix logic when inferring correct backend to use from file path which was failing on paths containing periods (#284)
    • Fix bug to correctly pass image_resolution argument to Mesmer model (#286)
    • Fix outdated url for PanNuke dataset (#287) by @Yu-AnChen
    • Fix GitHub Actions configuration which was causing testing suite to hang (#289)

    New Contributors

    • @dependabot made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/275
    • @Yu-AnChen made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/287

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.3...v2.0.4

  • v2.0.3(Jan 7, 2022)

  • v2.0.2(Jan 6, 2022)

    What's Changed

    • Streamline environment setup by removing spams as a dependency (#142) and updating environment.yml to create an environment with both PathML and deepcell (#259 #210)
    • Add a Dockerfile for another installation option, and a GitHub Actions workflow to build and publish it to Dockerhub on new release (#145)
    • Add series_as_channels flag to BioFormatsBackend.extract_region() to fix support for images from the MISI lab (#261)

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.1...v2.0.2

  • v2.0.dev4(Jan 4, 2022)

  • v2.0.dev3(Jan 4, 2022)

  • v2.0.dev2(Jan 4, 2022)

  • 2.0.dev1(Jan 4, 2022)

  • v2.0.1(Dec 25, 2021)

    What's Changed

    • Improve h5path read/write by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/260

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.0...v2.0.1

  • v2.0.0(Dec 19, 2021)

    What's new in v2.0.0:

    • Changed h5path format and refactored h5manager to improve performance (#231)
    • support XYZCT images for TileDataset (#233)
    • Cleaned up versioning tracker (#236)
    • fix bug when reading region from openslide backend at higher levels (#242)
    • Add support for multi-series images with BioformatsBackend (#251)
    • Pin python-bioformats version to avoid any possibility of log4j hacks (#256)
    • Added optional flag in SlideDataset.run() to write slides to h5path as they finish processing (#226)
    • Added GitHub Actions workflow to automatically build package and publish to PyPI when a new release is created (#235)

    Because the file format is changed in this version, .h5path files saved in older versions will not be able to be loaded in this one, and vice versa (i.e. breaking backwards compatibility, hence the bumped major version).

  • v1.0.4(Nov 29, 2021)

  • v1.0.dev4(Nov 29, 2021)

Owner
AI Operations and Data Science Services group