Detecting and dissecting anomalous anatomic regions in spatial transcriptomics with STANDS

We introduce Spatial Transcriptomics ANomaly Detection and Subtyping (STANDS), an innovative computational method capable of integrating multimodal information, e.g., spatial gene expression, histology images, and single-cell gene expression, to not only delineate anomalous tissue regions but also reveal their compositional heterogeneity across multi-sample spatial transcriptomics (ST) data.


Outline of DDATD

The accurate detection of anomalous anatomic regions, followed by their dissection into biologically heterogeneous subdomains across multiple tissue slices, is of paramount importance in clinical diagnostics, targeted therapies and biomedical research. This procedure, which we refer to as Detection and Dissection of Anomalous Tissue Domains (DDATD), serves as the first and foremost step in a comprehensive analysis of tissues harvested from affected individuals, revealing population-level and individual-specific factors (e.g., pathogenic cell types) associated with disease development.


Framework of STANDS

STANDS is an innovative framework built on a suite of specialized Generative Adversarial Networks (GANs) for seamlessly integrating the three tasks of DDATD. The framework consists of three components.

Component I (C1) trains a GAN model on the reference dataset, learning to reconstruct normal spots from their multimodal representations of both spatial transcriptomics data and the associated histology images. The model is then applied to the target datasets to identify anomalous spots as those with unexpectedly large reconstruction deviances, termed anomaly scores.
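
As a minimal illustration of the anomaly-scoring idea (a generic PyTorch sketch assuming an autoencoder-style reconstruction model, not the actual STANDS implementation; `model` and `x` are hypothetical placeholders):

import torch

def anomaly_scores(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Score spots by reconstruction deviance: larger means more anomalous."""
    model.eval()
    with torch.no_grad():
        x_hat = model(x)  # reconstruct spots from their learned representations
        scores = ((x - x_hat) ** 2).mean(dim=1)  # per-spot mean squared reconstruction error
    return scores

# Spots whose score exceeds, e.g., an empirical quantile of the reference
# scores would be flagged as anomalous.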

Component II (C2) aims at diminishing the non-biological variations (e.g., batch effects) among anomalies by aligning the target datasets in a common space. It employs two cooperative GAN models to identify pairs of reference and target spots that share similar biological content, based on which the target datasets are aligned to the reference data space via “style-transfer”.
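
The pairing step can be pictured with a simplified, non-GAN sketch based on mutual nearest neighbours between reference and target embeddings (an assumption for illustration only; C2 itself learns the pairing with cooperative GANs):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def mutual_pairs(ref: np.ndarray, tgt: np.ndarray, k: int = 5):
    """Reference/target spot pairs that are mutual k-nearest neighbours."""
    r2t = NearestNeighbors(n_neighbors=k).fit(tgt).kneighbors(ref, return_distance=False)
    t2r = NearestNeighbors(n_neighbors=k).fit(ref).kneighbors(tgt, return_distance=False)
    # keep (i, j) only when each spot appears among the other's nearest neighbours
    return [(i, j) for i in range(len(ref)) for j in r2t[i] if i in t2r[j]]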

Component III (C3) fuses the embeddings and reconstruction residuals of the aligned anomalous spots as inputs to an iterative clustering algorithm that groups the anomalies into distinct subtypes.
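
A minimal sketch of the fuse-then-cluster idea, using random placeholder matrices and plain k-means in place of STANDS's iterative clustering algorithm (both are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 32))  # placeholder: latent codes of aligned anomalous spots
residuals = rng.normal(size=(500, 32))   # placeholder: their reconstruction residuals
fused = np.concatenate([embeddings, residuals], axis=1)  # fuse the two views per spot
subtypes = KMeans(n_clusters=3, n_init=10).fit_predict(fused)  # putative anomaly subtypes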


Dependencies

  • anndata>=0.10.7
  • dgl>=2.1.0
  • networkx>=3.2.1
  • numpy>=1.22.4
  • pandas>=1.5.1
  • Pillow>=9.4.0
  • PuLP>=2.7.0
  • pyemd>=1.0.0
  • rpy2>=3.5.13
  • scanpy>=1.10.1
  • scikit_learn>=1.2.0
  • scipy>=1.11.4
  • torch>=2.0.0
  • torchvision>=0.15.1
  • tqdm>=4.64.1

Installation

STANDS is developed as a Python package. You will need Python installed; Python 3.9 is recommended.

You can download the package from GitHub and install it locally:

git clone https://github.com/Catchxu/STANDS.git
cd STANDS/
python3 setup.py install
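
To confirm the installation succeeded, you can try importing the package (assuming it is importable as stands; the import name may differ):

python3 -c "import stands"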

Getting Started

STANDS offers a variety of functionalities, including but not limited to:

  • Identify cancerous domains in a single ST dataset (tutorial)
  • Identify cancerous domains across multiple ST datasets concurrently (tutorial)
  • Align multiple ST datasets sharing identical domain types (tutorial)
  • Align multiple ST datasets with non-overlapping domain types (tutorial)
  • Discern biologically distinct anomalous tissue subdomains in a single ST dataset (tutorial)
  • Discern biologically distinct anomalous tissue subdomains across multiple ST datasets (tutorial)

Before starting the tutorials, we need to make some preparations, including installing STANDS and its required Python packages, downloading the datasets used in the tutorials, and so on. These preparations are described at STANDS Preparations. Additionally, when dealing with multimodal data structures involving both images and gene expression matrices, we strongly recommend using a GPU and pretraining STANDS on large-scale public spatial transcriptomics datasets. This ensures faster execution of STANDS and improved performance in the modules related to image feature extraction and feature fusion.
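
A quick way to check whether a GPU is visible to PyTorch (already among the dependencies listed above) before running the multimodal tutorials:

import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA-capable GPU detected; expect slower execution.")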

Finally, more useful information can be found in the online documentation, and the tutorials provide a quick start.

Tested environment

Environment 1

  • CPU: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
  • Memory: 256 GB
  • System: Ubuntu 20.04.5 LTS
  • Python: 3.9.15

Environment 2

  • CPU: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
  • Memory: 256 GB
  • System: Ubuntu 22.04.3 LTS
  • Python: 3.9.18

Getting help

Please see the tutorials for more complete documentation of all the functions of STANDS. For any questions or comments, please use the GitHub issues or contact Kaichen Xu directly at [email protected].

Citation

Coming soon.