
[ISBI 2024] Accurate Subtyping of Lung Cancers by Modelling Class Dependencies


GeorgeBatch/dependency-mil


Dependency-MIL

Accurate Subtyping of Lung Cancers by Modelling Class Dependencies

[Pre-print] [Code] [BibTeX]

Presented at the 21st International Symposium on Biomedical Imaging (ISBI 2024).

Authors: George Batchkala, Bin Li, Mengran Fan, Mark McCole, Cecilia Brambilla, Fergus Gleeson, Jens Rittscher.

Creation of the Multi-label Dataset

Source files used to make the labels

Dummy label files

Columns include the label (LUAD vs LUSC) and paths to features:

  • features_csv_file_path
  • h5_file_path
  • pt_file_path
# Integer encoding used in the label field
mapping = {
    "LUAD": 0,
    "LUSC": 1,
}
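A minimal sketch of reading a label file, assuming a CSV with a header row and an integer-encoded label column as described above. The column names follow the list above; the dummy slide paths and the read_label_rows helper are hypothetical and not part of the repository.

```python
import csv
import io

# Label encoding used by the label files (same as the mapping above).
mapping = {
    "LUAD": 0,
    "LUSC": 1,
}
# Inverse mapping to turn integer labels back into class names.
inverse_mapping = {v: k for k, v in mapping.items()}

# Hypothetical dummy label file: column names follow the README,
# but the slide paths are invented for illustration only.
DUMMY_LABEL_CSV = """label,features_csv_file_path,h5_file_path,pt_file_path
0,features/csv/slide_a.csv,features/h5/slide_a.h5,features/pt/slide_a.pt
1,features/csv/slide_b.csv,features/h5/slide_b.h5,features/pt/slide_b.pt
"""


def read_label_rows(text):
    """Parse a label file and attach the decoded class name to each row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        label = int(row["label"])
        if label not in inverse_mapping:
            raise ValueError(f"Unexpected label {label!r}")
        row["label"] = label
        row["class_name"] = inverse_mapping[label]
        rows.append(row)
    return rows


rows = read_label_rows(DUMMY_LABEL_CSV)
for row in rows:
    print(row["class_name"], row["pt_file_path"])
```

Validating the integer labels on load catches files that were encoded with a different mapping before they reach training.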

DHMC has only LUAD slides, so all entries in the label field are 0.

TCGA has both LUAD and LUSC slides, so entries in the label field include both 0 and 1.
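The statement above can be turned into a quick sanity check. This is a minimal sketch: the (cohort, label) records are invented, not real slide counts, and label_counts_by_cohort is a hypothetical helper.

```python
from collections import Counter


def label_counts_by_cohort(records):
    """Count integer labels per cohort from (cohort, label) pairs."""
    counts = {}
    for cohort, label in records:
        counts.setdefault(cohort, Counter())[label] += 1
    return counts


# Hypothetical example records, not real slide counts.
records = [("DHMC", 0), ("DHMC", 0), ("TCGA", 0), ("TCGA", 1), ("TCGA", 1)]
counts = label_counts_by_cohort(records)

# DHMC should contain only LUAD (0); TCGA should contain both 0 and 1.
assert set(counts["DHMC"]) == {0}
assert set(counts["TCGA"]) == {0, 1}
print(counts)
```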

Run the creation code

Run the label-creation notebook. It creates the label files in labels/experiment-label-files/.

Note: the combined training/validation dataset differs from the one used in the paper because the in-house DART dataset is not publicly available. The test set, however, is identical to the paper's and is fully available for both the 8-label and the 5-label tasks.

Tiling, Feature Extraction, and Training - Improvements In Progress (last updated: June 4th, 2024)

For publication, I used the tiling and feature-extraction pipeline from the https://github.com/binli123/dsmil-wsi repository. For faster computation, the CSV features should be converted into HDF5 and .pt files, as in https://github.com/mahmoodlab/CLAM. I am currently standardising the tiling and feature-extraction pipeline for the Dependency-MIL model using tiatoolbox.
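A minimal sketch of the CSV-to-HDF5/.pt conversion, assuming each CSV holds one row per tile, one column per feature dimension, and a single header row (check this against your own files). The "features" dataset name follows CLAM's HDF5 layout, which also stores tile coordinates under "coords"; this sketch omits them because the CSV does not contain them. The convert_csv_features helper is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np
import torch


def convert_csv_features(csv_path, h5_path, pt_path):
    """Convert one slide's CSV feature file into HDF5 and .pt files.

    Assumes one row per tile, one column per feature dimension and a single
    header row (hypothetical layout; verify it for your own CSV files).
    """
    feats = np.loadtxt(csv_path, delimiter=",", skiprows=1, ndmin=2).astype(np.float32)
    with h5py.File(h5_path, "w") as f:
        # CLAM-style files also store "coords"; they are not in the CSV,
        # so only the features are written here.
        f.create_dataset("features", data=feats)
    # The .pt file holds a (num_tiles, feature_dim) float32 tensor.
    torch.save(torch.from_numpy(feats), pt_path)
    return feats.shape


# Usage on a tiny synthetic file (2 tiles x 3 feature dimensions).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "slide.csv")
    with open(csv_path, "w") as fh:
        fh.write("0,1,2\n0.1,0.2,0.3\n0.4,0.5,0.6\n")
    shape = convert_csv_features(
        csv_path, os.path.join(tmp, "slide.h5"), os.path.join(tmp, "slide.pt")
    )
    loaded = torch.load(os.path.join(tmp, "slide.pt"))
    print(shape, loaded.shape)
```

Loading a binary tensor from a .pt file avoids re-parsing text on every epoch, which is where the speed-up over CSV comes from.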

For training, I used the code from https://github.com/binli123/dsmil-wsi, modified to accommodate partial labels using the custom_binary_cross_entropy_with_logits function from source.losses.
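The partial-label idea can be sketched as a masked binary cross-entropy in which labels that are not annotated for a slide are excluded from the loss. This is a hypothetical stand-in, not the repository's custom_binary_cross_entropy_with_logits; it assumes missing labels are encoded as -1.

```python
import torch
import torch.nn.functional as F


def masked_bce_with_logits(logits, targets, missing_value=-1):
    """Binary cross-entropy that ignores missing (unannotated) labels.

    Hypothetical stand-in for custom_binary_cross_entropy_with_logits:
    targets use 0/1 for known labels and `missing_value` for unknown ones.
    """
    mask = (targets != missing_value).float()
    # Replace missing entries with 0 so the per-element loss is well defined;
    # their contribution is zeroed out by the mask below.
    safe_targets = torch.where(mask.bool(), targets.float(), torch.zeros_like(logits))
    per_element = F.binary_cross_entropy_with_logits(logits, safe_targets, reduction="none")
    # Average over observed labels only (clamp avoids division by zero).
    return (per_element * mask).sum() / mask.sum().clamp(min=1.0)


logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1, -1, 0]])  # second label unknown for this slide
loss = masked_bce_with_logits(logits, targets)
print(loss)
```

Averaging over observed labels, rather than over all labels, keeps the loss scale comparable between fully and partially labelled slides.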

I will release the code once I finish improving it. If you need the code urgently, please contact me.

PyTorch Dataset and Data Loaders

Code for creating
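As a hypothetical sketch only, a bag-level PyTorch Dataset could load one .pt feature file per slide together with a per-class label vector. The class name SlideFeatureDataset and the (pt_file_path, labels) input format are assumptions, not the repository's code. Slides have different numbers of tiles, so the DataLoader below uses batch_size=1.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset


class SlideFeatureDataset(Dataset):
    """Hypothetical bag-level dataset: one item = (features, label vector).

    `entries` is a list of (pt_file_path, labels) pairs, where labels is a
    list of per-class targets (e.g. -1 for a missing label).
    """

    def __init__(self, entries):
        self.entries = entries

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, idx):
        pt_path, labels = self.entries[idx]
        features = torch.load(pt_path)  # shape: (num_tiles, feature_dim)
        return features, torch.tensor(labels, dtype=torch.float32)


# Usage with two synthetic slides of 5 and 8 tiles (16-dim features).
with tempfile.TemporaryDirectory() as tmp:
    entries = []
    for i, n_tiles in enumerate([5, 8]):
        path = os.path.join(tmp, f"slide_{i}.pt")
        torch.save(torch.randn(n_tiles, 16), path)
        entries.append((path, [1, -1, 0]))
    loader = DataLoader(SlideFeatureDataset(entries), batch_size=1, shuffle=False)
    shapes = [tuple(features.shape) for features, _ in loader]
    print(shapes)
```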

Dependency Modelling architecture

The Dependency-MIL model can be created using the get_model() function from source.models.combined_model.
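The signature of get_model() is not documented here, so no call to it is shown. As a rough illustration of what modelling class dependencies can mean in a multi-label MIL head, the sketch below lets per-class bag embeddings attend to each other before classification. It is a generic, hypothetical design (ClassDependencyHead is an invented name), not the Dependency-MIL architecture from the paper.

```python
import torch
from torch import nn


class ClassDependencyHead(nn.Module):
    """Illustrative multi-label MIL head (not the Dependency-MIL architecture).

    1. One learnable query per class attends over the tile features of a bag,
       giving one bag embedding per class.
    2. Self-attention across the class embeddings lets each class prediction
       depend on the others.
    3. A shared linear layer maps each class embedding to one logit.
    """

    def __init__(self, feature_dim, num_classes, num_heads=4):
        super().__init__()
        self.class_queries = nn.Parameter(torch.randn(num_classes, feature_dim) * 0.02)
        self.pool = nn.MultiheadAttention(feature_dim, num_heads, batch_first=True)
        self.dependency = nn.MultiheadAttention(feature_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feature_dim)
        self.classifier = nn.Linear(feature_dim, 1)

    def forward(self, tiles):
        # tiles: (batch, num_tiles, feature_dim)
        queries = self.class_queries.unsqueeze(0).expand(tiles.size(0), -1, -1)
        class_emb, _ = self.pool(queries, tiles, tiles)  # (batch, num_classes, feature_dim)
        mixed, _ = self.dependency(class_emb, class_emb, class_emb)
        class_emb = self.norm(class_emb + mixed)  # residual connection
        return self.classifier(class_emb).squeeze(-1)  # (batch, num_classes)


# Usage: 8 classes (as in the 8-label task), one bag of 50 tiles, 16-dim features.
head = ClassDependencyHead(feature_dim=16, num_classes=8)
logits = head(torch.randn(1, 50, 16))
print(logits.shape)  # torch.Size([1, 8])
```

The per-class logits can be fed directly to a masked multi-label loss such as the one sketched in the training section.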

Acknowledgements

George Batchkala is supported by Fergus Gleeson and the EPSRC Centre for Doctoral Training in Health Data Science (EP/S02428X/1). The work was done as part of the DART Lung Health Program (UKRI grant 40255).

The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Citation

If you find Dependency-MIL useful for your research and applications, please cite it using the following BibTeX entry (it will be updated once the paper is published in the IEEE ISBI 2024 proceedings):

@INPROCEEDINGS{batchkala2024dependency-mil,
  author={Batchkala, George and Li, Bin and Fan, Mengran and McCole, Mark and Brambilla, Cecilia and Gleeson, Fergus and Rittscher, Jens},
  booktitle={2024 IEEE 21st International Symposium on Biomedical Imaging (ISBI)},
  title={Accurate Subtyping of Lung Cancers by Modelling Class Dependencies}, 
  year={2024},
  volume={},
  number={},
  pages={...},
  keywords={lung cancer;computational pathology;multi-label classification;multiple-instance learning},
  doi={...}
}