slideDupIdentify
slideDupIdentify identifies and organizes multiplicate files based on specified criteria, such as study type (--study_type) and stain name (--stain), and writes the result to the output file named with --output. It prioritizes multiplicates according to the rules listed below, and provides options to print information about the multiplicates (--verbose) and to log statistics (--log).
Images are expected to be of the form study_typestudy_number.[additional_info.]stain.[random_info.]file_extension, e.g., AE1234.T01-12345.CD34.ndpi, where AE is the study_type, 1234 is the study_number, T01-12345 is the optional additional_info, CD34 is the stain name, and ndpi is the file_extension. The random_info is optional and can be any random string of characters, e.g. 2017-12-22_23.54.03. The file_extension is expected to be ndpi or TIF for the original image files.
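The naming convention above can be illustrated with a regular expression. This is a minimal sketch, not the script's own parser: the function name parse_slide_name is hypothetical, and it assumes the study_number consists of digits only (as in the 1234 example) and that the extension is spelled exactly ndpi or TIF.

```python
import re


def parse_slide_name(filename, study_type, stain):
    """Split a slide file name into its parts; return None if it does not match.

    Hypothetical helper for illustration only. Assumes a numeric study_number
    and the exact extensions 'ndpi' or 'TIF'.
    """
    pattern = re.compile(
        # study_type immediately followed by the study_number, then a dot
        rf"^{re.escape(study_type)}(?P<study_number>\d+)\."
        # optional additional_info (matched lazily so it stops before the stain)
        rf"(?:(?P<additional_info>.+?)\.)?"
        # the stain name, then a dot
        rf"{re.escape(stain)}\."
        # optional random_info, which may itself contain dots
        rf"(?:(?P<random_info>.+)\.)?"
        # file extension of the original image
        rf"(?P<file_extension>ndpi|TIF)$"
    )
    match = pattern.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_slide_name("AE1234.T01-12345.CD34.ndpi", "AE", "CD34"))
    print(parse_slide_name("AE1234.CD34.2017-12-22_23.54.03.TIF", "AE", "CD34"))
```

Parts that are absent from a file name come back as None, so files with and without additional_info or random_info can be grouped on the same study_number and stain.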
The script will move all files with the same study_number and stain name to the duplicate folder. It will prioritize the files based on the following criteria:
- There is an ndpi > keep ndpi, keep_this_one
- Different creation date > keep latest file, different_date_kept_latest
- Same date, different type > keep ndpi, same_date_diff_type_kept_ndpi
- Same date, same type, different checksum > keep biggest, same_date_same_type_diff_checksum_biggest
- Same date, same type, same checksum > keep first one, same_date_same_type_same_checksum_keep_this_one
- When none of the above apply > cannot_assign_priority
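The criteria above can be read as a cascade that is evaluated top to bottom for each group of files sharing a study_number and stain. The sketch below is one possible reading, not the script's actual implementation: the SlideFile class and the assign_priority function are hypothetical, and the first rule is interpreted here as "exactly one file in the group is an ndpi", which the description does not state explicitly.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SlideFile:
    """Hypothetical record of the properties the criteria refer to."""
    name: str
    extension: str   # 'ndpi' or 'TIF'
    created: date    # creation date
    size: int        # file size in bytes
    checksum: str


def assign_priority(files):
    """Return (file_to_keep, label) for one group of multiplicates.

    Assumed interpretation: rules are tried in the listed order and the
    first one that applies decides which file is kept.
    """
    if not files:
        return None, "cannot_assign_priority"

    ndpi_files = [f for f in files if f.extension == "ndpi"]
    # Assumption: "there is an ndpi" means exactly one ndpi in the group.
    if len(ndpi_files) == 1:
        return ndpi_files[0], "keep_this_one"

    # Different creation dates: keep the most recent file.
    if len({f.created for f in files}) > 1:
        return max(files, key=lambda f: f.created), "different_date_kept_latest"

    # Same date, different file types: prefer an ndpi if there is one.
    if len({f.extension for f in files}) > 1:
        if ndpi_files:
            return ndpi_files[0], "same_date_diff_type_kept_ndpi"
        return None, "cannot_assign_priority"

    # Same date and type, different checksums: keep the biggest file.
    if len({f.checksum for f in files}) > 1:
        biggest = max(files, key=lambda f: f.size)
        return biggest, "same_date_same_type_diff_checksum_biggest"

    # Same date, type, and checksum: the files are identical, keep the first.
    return files[0], "same_date_same_type_same_checksum_keep_this_one"
```

Returning the label together with the kept file mirrors the labels listed above, so each decision can be written to the output file or log alongside the file names.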
Example usage:
python slideDupIdentify.py --study_type AE --stain CD34 --output duplicate_files
Argument(s):
- --image-folder, -i: Specify the folder where images are located (default: current directory). Required.
- --studytype, -t: Specify the study type prefix, e.g., AE. Required.
- --stain, -s: Specify the stain name, e.g., CD34. Required.
- --out_file, -o: Specify the output file name (without extension) to write duplicate information. Required.
Optional argument(s):
- --force, -f: Force overwrite if the output file already exists. Optional.
- --dry_run, -d: Perform a dry run (report in the terminal, no actual file operations). Optional.
- --debug, -D: Print debug information. Optional.
- --verbose, -V: Print the number of duplicate samples identified. Optional.
- --help, -h: Print this help message and exit. Optional.
- --version, -v: Print the version number and exit. Optional.
Licence. The MIT License (MIT): http://opensource.org/licenses/MIT.
Copyright (c) 2014-2023, Bas G.L. Nelissen & Sander W. van der Laan, UMC Utrecht, Utrecht, the Netherlands.