HISSL: HIstology Self-Supervised Learning

=== WORK IN PROGRESS ===

  • Submodule initialization does not work because DLUP is not yet publicly available.
  • Current scripts with singularity calls might not work: DLUP is installed as an editable package inside the container, but the entire hissl repository is mounted over it, which leaves an empty DLUP directory.
    • For the time being, it is recommended to remove --bind $HISSL_ROOT:/hissl from the singularity options (see the sketch below). Changes you make to hissl will then not be reflected inside the container, but this allows you to run all the DeepSMILE reproduction scripts.
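
A minimal sketch of the workaround, assuming the container image is called hissl.sif (a placeholder) and that DLUP is imported as dlup:

    # The existing scripts mount the repository over /hissl; with the bind, the
    # editable DLUP install points at an (empty) mounted directory:
    #   singularity exec --nv --bind $HISSL_ROOT:/hissl hissl.sif \
    #       python -c "import dlup; print(dlup.__file__)"
    # Workaround: drop the bind so the DLUP copy baked into the image is used.
    # Local changes to hissl will then not be visible inside the container.
    singularity exec --nv hissl.sif python -c "import dlup; print(dlup.__file__)"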

=========================

HISSL performs self-supervised pre-training of a feature extractor on histopathology data. It is used for DeepSMILE ("Contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer") and is meant to:

  1. allow reproduction of the results of DeepSMILE
  2. be used on other datasets and other tasks by other researchers

Essentially, the repository contains the following:

  1. DLUP (third_party/dlup): Deep Learning Utilities for Pathology. This repository eases whole-slide image (WSI) preprocessing and can create datasets that read tiles directly from WSIs without any offline preprocessing.
  2. A fork of VISSL (third_party/vissl). VISSL is a repository that enables high-performance self-supervised learning with a large variety of state-of-the-art self-supervised learning methods. The fork contains custom data sources (dataset classes) and saves the features in H5 format with additional metadata.
  3. A docker image (docker/README.md) that has the above repositories properly installed, so that no complex set-up is required, and so that the pre-training can be run on any OS.
  4. Reproduction scripts (tools/reproduce_deepsmile) that can be run with bash and perform all the downloading, preprocessing, pretraining, and feature extraction steps for the feature-learning part of DeepSMILE.
  5. The pre-trained weights of the models trained for DeepSMILE.
  • Free software: MIT license

Installation

  • Please use docker. See docker/README.md
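
For orientation only, a minimal sketch of building and entering the image; the image tag and Dockerfile location are assumptions, and docker/README.md is the authoritative reference:

    # Build the image from the repository root (Dockerfile path assumed).
    docker build -t hissl:latest -f docker/Dockerfile .

    # Start an interactive container with GPU access (requires the NVIDIA
    # container toolkit on the host).
    docker run --gpus all -it --rm hissl:latest /bin/bash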

Reproduce DeepSMILE feature extractors

Run tools/reproduce_deepsmile.sh on

  • a Linux server
  • with more than 5 TB of storage space
  • with 1-4 CUDA-enabled GPUs
  • and singularity installed

This script will:

  0. Check that you have the singularity container
  1. Download all TCGA-BC WSIs from the TCGA repository
  2. Precompute masks for TCGA-BC using DLUP
  3. Download TCGA-CRCk tiles from Zenodo
  4. Create train-val-test splits for TCGA-BC using a label file from Kather (2019)
  5. Pretrain (5x) ShufflenetV2x1_0 on TCGA-BC. These models should be similar to those downloaded by model_zoo/tcga-bc/download_model.sh
  6. Pretrain (1x) Resnet18 on TCGA-CRCk. This model should be similar to the one downloaded by model_zoo/tcga-crck/download_model.sh
  7. Extract and save all features for TCGA-BC and TCGA-CRCk
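
A minimal sketch of launching the reproduction, assuming the script takes no arguments and that the model zoo download scripts are plain bash scripts; the log file name is a placeholder:

    # From the repository root: run the full pipeline and keep a log.
    # This downloads several TB of TCGA data, so use persistent storage.
    bash tools/reproduce_deepsmile.sh 2>&1 | tee reproduce_deepsmile.log

    # Alternatively, skip pre-training and fetch the published weights
    # with the model zoo scripts mentioned in steps 5 and 6.
    bash model_zoo/tcga-bc/download_model.sh
    bash model_zoo/tcga-crck/download_model.sh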

Changes to VISSL for HISSL

Our fork of VISSL contains the changes needed to use it with histopathology data. In summary:

  • We add a ShuffleNet backbone (third_party/vissl/vissl/models/trunks/hissl_shufflenet.py)
  • We add a dataset for TCGA-CRCk (third_party/vissl/vissl/data/kather_msi_dataset.py)
  • We add a DLUP dataset to read tiles directly from WSIs and load/compute masks (third_party/vissl/vissl/data/dlup_dataset.py)
  • We add saving of features as H5 datasets for both TCGA-CRCk and TCGA-BC (third_party/vissl/vissl/trainer/trainer_main_hissl.py)
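
Since the extracted features are written as H5 datasets, the standard HDF5 command-line tools are a quick way to inspect the output; a sketch with a placeholder file name, assuming the HDF5 tools are installed:

    # List all groups and datasets in an extracted feature file, with shapes.
    h5ls -r path/to/extracted_features.h5

    # Print only the header/metadata, without the raw feature arrays.
    h5dump -H path/to/extracted_features.h5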

Reproduce classifiers

See dlup-lightning-mil
