nf-imagecleaner is a Nextflow pipeline that prepares images for upload to HTAN by removing sensitive data

ncihtan/nf-imagecleaner

nf-imagecleaner

nf-imagecleaner is a Nextflow pipeline that prepares images for upload by removing sensitive data: AcquisitionDate and StructuredAnnotations from OME-TIFF files, label images and Date from SVS files, and specified metadata tags from TIFFs. Its input samplesheet can list Synapse URIs, local file paths, or a mixture of both.

Requirements

  • Nextflow
  • Docker

Usage

To run the pipeline with default parameters and docker (recommended), use:

nextflow run ncihtan/nf-imagecleaner --input <path/to/samplesheet.csv> -profile docker

Inputs

The input to the pipeline is a CSV file (specified with --input) whose image column lists one image path per row. If a path is a Synapse URI (starts with syn://), the file is downloaded from Synapse.

For example:

image
syn://syn00123
/local/path/to/image.svs
s3://my-bucket/my-image.ome.tiff
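To check a samplesheet before launching, the syn:// rule above can be sketched in a few lines of Python. This is an illustration only, not code from the pipeline: the function name `classify_samplesheet` and the labels `synapse` and `path` are hypothetical, and every entry that does not start with syn:// is simply reported as a path without checking that it exists.

```python
import csv
import io


def classify_samplesheet(csv_text):
    """Return (path, kind) pairs for every non-empty row of a samplesheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "image" not in reader.fieldnames:
        raise ValueError("samplesheet needs an 'image' column")
    rows = []
    for row in reader:
        path = (row["image"] or "").strip()
        if not path:
            continue  # ignore rows with an empty image cell
        # Hypothetical labels: "synapse" for syn:// URIs, "path" for anything else.
        kind = "synapse" if path.startswith("syn://") else "path"
        rows.append((path, kind))
    return rows


example = (
    "image\n"
    "syn://syn00123\n"
    "/local/path/to/image.svs\n"
    "s3://my-bucket/my-image.ome.tiff\n"
)
for path, kind in classify_samplesheet(example):
    print(kind, path)
```

Only the first entry of the example is classified as a Synapse URI; the other two are passed through as plain paths.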

Parameters

  • outdir: Directory for outputs (default: outputs)
  • outsuffix: Suffix for output files (default: _cleaned)
  • rm_svs_macro: Boolean indicating whether to remove the macro image in SVS files (default: false; coming soon)
  • rm_svs_label: Boolean indicating whether to remove the label image in SVS files (default: true)
  • rm_ome_sa: Boolean indicating whether to remove the StructuredAnnotations block from the OME-XML of OME-TIFF files (default: true)
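Pipeline parameters in Nextflow are passed on the command line with a double dash (as --input is above), while Nextflow's own options such as -profile take a single dash. The sketch below assembles such a command from Python; `build_command` is a hypothetical helper, the parameter values are made up for illustration, and passing booleans as the strings true and false is an assumption about how this pipeline reads them.

```python
import shlex


def build_command(samplesheet, profile="docker", **params):
    """Assemble a nextflow command line with pipeline parameter overrides."""
    cmd = ["nextflow", "run", "ncihtan/nf-imagecleaner", "--input", samplesheet]
    for name, value in params.items():
        if isinstance(value, bool):
            # Assumption: boolean parameters are given as lowercase true/false.
            value = "true" if value else "false"
        cmd += [f"--{name}", str(value)]
    cmd += ["-profile", profile]
    return cmd


# Illustrative values only; the defaults are listed above.
cmd = build_command(
    "samplesheet.csv", outdir="cleaned", outsuffix="_deid", rm_ome_sa=False
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` if the pipeline should be launched from a script.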

Outputs

The cleaned images will be placed in the directory specified by --outdir.

Metadata Redaction

The specific tags removed are:

for TIFFs:

  • DateTime
  • NDPI_ScanTime
  • NDPI_WriteTime
  • Artist
  • HostComputer
  • WangAnnotation
  • WriterSerialNumber
  • MDLabName
  • MDPrepDate
  • MDSampleInfo
  • Software

for SVSs:

  • Date
  • Time Zone
  • ScanScope ID
  • User
  • Time
  • DSR ID
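Aperio SVS files keep these fields in the TIFF ImageDescription string as pipe-separated `key = value` pairs after a free-text header. The sketch below shows one way the listed keys could be dropped from such a string. It is not the pipeline's implementation: `redact_svs_description` is a hypothetical name, the example description is invented, and the sketch deletes the fields outright rather than blanking their values.

```python
# Keys from the SVS list above.
REDACTED_KEYS = {"Date", "Time Zone", "ScanScope ID", "User", "Time", "DSR ID"}


def redact_svs_description(description):
    """Drop redacted key = value fields from an Aperio ImageDescription."""
    # The first segment is the free-text header (library version, dimensions);
    # it is kept as-is and never parsed as a key = value pair.
    header, *fields = description.split("|")
    kept = [
        field
        for field in fields
        if field.split("=", 1)[0].strip() not in REDACTED_KEYS
    ]
    return "|".join([header, *kept])


# Invented example description, for illustration only.
example = (
    "Aperio Image Library v12.0.15\r\n1000x800 (256x256) JPEG/RGB Q=70"
    "|AppMag = 20|ScanScope ID = SS1234|Date = 01/02/20|Time = 10:11:12"
    "|Time Zone = GMT-05:00|User = abc|DSR ID = host1|MPP = 0.5"
)
print(redact_svs_description(example))
```

Keys are matched exactly, so `Time` and `Time Zone` are handled as separate fields and unrelated keys such as `AppMag` and `MPP` are left in place.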

for OME-TIFFs:

  • Whole StructuredAnnotations block
  • Experimenter's e-mail, first name, and last name
  • AcquisitionDate
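The three OME-TIFF redactions map onto specific places in the OME-XML: StructuredAnnotations is a child of the root OME element, AcquisitionDate is a child of each Image, and the experimenter's e-mail and names are the Email, FirstName and LastName attributes of Experimenter. The pipeline lists ome_types for this step; the sketch below swaps in Python's standard xml.etree.ElementTree so it has no dependencies. It assumes the 2016-06 OME schema namespace, and `redact_ome_xml` and the example document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the 2016-06 OME schema; older files use other URIs.
OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"


def redact_ome_xml(xml_text):
    """Return OME-XML with the fields listed above removed."""
    ET.register_namespace("", OME_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xml_text)
    # Whole StructuredAnnotations block (direct child of the OME root).
    for block in root.findall(f"{{{OME_NS}}}StructuredAnnotations"):
        root.remove(block)
    # AcquisitionDate under every Image.
    for image in list(root.iter(f"{{{OME_NS}}}Image")):
        for date in image.findall(f"{{{OME_NS}}}AcquisitionDate"):
            image.remove(date)
    # Identifying attributes on every Experimenter.
    for person in root.iter(f"{{{OME_NS}}}Experimenter"):
        for attr in ("Email", "FirstName", "LastName"):
            person.attrib.pop(attr, None)
    return ET.tostring(root, encoding="unicode")


# Invented minimal document, for illustration only.
example = (
    f'<OME xmlns="{OME_NS}">'
    '<Experimenter ID="Experimenter:0" Email="a@b.org" FirstName="Ada" LastName="L"/>'
    '<Image ID="Image:0"><AcquisitionDate>2020-01-02T10:11:12</AcquisitionDate>'
    '<Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint8" '
    'SizeX="1" SizeY="1" SizeZ="1" SizeC="1" SizeT="1"/></Image>'
    '<StructuredAnnotations><CommentAnnotation ID="Annotation:0">'
    '<Value>note</Value></CommentAnnotation></StructuredAnnotations>'
    '</OME>'
)
cleaned = redact_ome_xml(example)
print(cleaned)
```

The Experimenter element itself and its ID are kept, so references to it elsewhere in the document stay valid. Writing the cleaned XML back into the TIFF's ImageDescription tag is a separate step not shown here.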

Tools Used

  • tifftools: for handling TIFF and OME-TIFF metadata
  • ome_types: for handling OME-XML
  • synapseclient: for downloading data from Synapse

This README.md was automatically generated by jaredcd/ai-tools and GPT-4.
