
Commit

Merge pull request #1 from TRON-Bioinformatics/parametrise-memory-and-cpus

Parametrise memory and cpus
priesgo committed Sep 29, 2021
2 parents cd1e65f + a6d6a8c commit eab507a
Showing 6 changed files with 90 additions and 35 deletions.
22 changes: 22 additions & 0 deletions .github/workflows/automated_tests.yml
@@ -0,0 +1,22 @@
name: Automated tests

on: [push]

jobs:
  test:
    runs-on: ubuntu-20.04

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v2
        with:
          distribution: 'zulu' # See 'Supported distributions' for available options
          java-version: '11'
      - uses: conda-incubator/setup-miniconda@v2
      - name: Install dependencies
        run: |
          apt-get update && apt-get --assume-yes install wget make procps software-properties-common
          wget -qO- https://get.nextflow.io | bash && cp nextflow /usr/local/bin/nextflow
      - name: Run tests
        run: |
          make
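
The workflow above boils down to installing Nextflow and calling `make`. A minimal local equivalent, assuming conda and Java 11 are already on the PATH and that copying into /usr/local/bin needs sudo (a sketch, not part of the repository):

```bash
# Install Nextflow the same way the CI job does
wget -qO- https://get.nextflow.io | bash
sudo cp nextflow /usr/local/bin/nextflow

# Run the test targets defined in the Makefile below
make        # the CI job only calls this
make check  # optionally verify that the expected VCF outputs were produced
```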
3 changes: 3 additions & 0 deletions Makefile
@@ -18,9 +18,12 @@ test:
nextflow main.nf -profile test,conda --disable_common_germline_filter --output output/test2 --input_files test_data/test_input.txt
echo "sample_name_with_replicates\t"`pwd`"/test_data/TESTX_S1_L001.bam,"`pwd`"/test_data/TESTX_S1_L001.bam\t"`pwd`"/test_data/TESTX_S1_L002.bam,"`pwd`"/test_data/TESTX_S1_L002.bam" > test_data/test_input_with_replicates.txt
nextflow main.nf -profile test,conda --input_files test_data/test_input_with_replicates.txt --output output/test3
nextflow main.nf -profile test,conda --output output/test4 --input_files test_data/test_input.txt --intervals false


check:
test -s output/test1/sample_name/sample_name.mutect2.vcf || { echo "Missing test 1 output file!"; exit 1; }
test -s output/test2/sample_name/sample_name.mutect2.vcf || { echo "Missing test 2 output file!"; exit 1; }
test -s output/test3/sample_name_with_replicates/sample_name_with_replicates.mutect2.vcf || { echo "Missing test 3 output file!"; exit 1; }
test -s output/test4/sample_name/sample_name.mutect2.vcf || { echo "Missing test 4 output file!"; exit 1; }
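
For readability: the quoted echo line in the test target writes a three-column, tab-separated input file whose second and third columns hold comma-separated replicate BAMs (tumor, then normal, as in the README). A rough bash equivalent of what it produces (a sketch; the real target uses echo with escaped tabs):

```bash
# Layout: <sample name> TAB <tumor BAM[,tumor BAM...]> TAB <normal BAM[,normal BAM...]>
printf '%s\t%s\t%s\n' \
  "sample_name_with_replicates" \
  "$PWD/test_data/TESTX_S1_L001.bam,$PWD/test_data/TESTX_S1_L001.bam" \
  "$PWD/test_data/TESTX_S1_L002.bam,$PWD/test_data/TESTX_S1_L002.bam" \
  > test_data/test_input_with_replicates.txt
```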

17 changes: 13 additions & 4 deletions README.md
@@ -1,6 +1,7 @@
# TronFlow Mutect2

![GitHub tag (latest SemVer)](https://img.shields.io/github/v/release/tron-bioinformatics/tronflow-mutect2?sort=semver)
[![Run tests](https://github.com/TRON-Bioinformatics/tronflow-mutect2/actions/workflows/automated_tests.yml/badge.svg?branch=master)](https://github.com/TRON-Bioinformatics/tronflow-mutect2/actions/workflows/automated_tests.yml)
[![DOI](https://zenodo.org/badge/355860788.svg)](https://zenodo.org/badge/latestdoi/355860788)
[![License](https://img.shields.io/badge/license-MIT-green)](https://opensource.org/licenses/MIT)
[![Powered by Nextflow](https://img.shields.io/badge/powered%20by-Nextflow-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://www.nextflow.io/)
@@ -27,13 +28,21 @@ Input:
name1 tumor_bam1 normal_bam1
name2 tumor_bam2 normal_bam2
* reference: path to the FASTA genome reference (indexes expected *.fai, *.dict)
* intervals: path to a BED file containing the regions to analyse
* gnomad: path to the gnomad VCF
* gnomad: path to the gnomad VCF or other germline resource
Optional input:
* intervals: path to a BED file containing the regions to analyse
* output: the folder where the output is published
* memory: the amount of memory used by each job (default: 16g)
* cpus: the number of CPUs used by each job (default: 2)
* memory_mutect2: the amount of memory used by mutect2 (default: 16g)
* cpus_mutect2: the number of CPUs used by mutect2 (default: 2)
* memory_read_orientation: the amount of memory used by learn read orientation (default: 16g)
* cpus_read_orientation: the number of CPUs used by learn read orientation (default: 2)
* memory_pileup: the amount of memory used by pileup (default: 32g)
* cpus_pileup: the number of CPUs used by pileup (default: 2)
* memory_contamination: the amount of memory used by contamination (default: 16g)
* cpus_contamination: the number of CPUs used by contamination (default: 2)
* memory_filter: the amount of memory used by filter (default: 16g)
* cpus_filter: the number of CPUs used by filter (default: 2)
* disable_common_germline_filter: disable the use of GnomAD to filter out common variants in the population
from the somatic calls. The GnomAD resource is still required, though, as these common SNPs are used elsewhere to
calculate the contamination (default: false)
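
For illustration, a possible invocation that overrides some of the per-process resource parameters (the conda profile is the one used by the test targets; the input, reference and gnomad paths below are placeholders):

```bash
nextflow run main.nf -profile conda \
  --input_files my_input_files.txt \
  --reference /path/to/ucsc.hg19.fasta \
  --gnomad /path/to/gnomad.vcf.gz \
  --output results \
  --memory_mutect2 32g --cpus_mutect2 4 \
  --memory_pileup 64g
```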
2 changes: 1 addition & 1 deletion environment.yml
@@ -1,6 +1,6 @@
# You can use this file to create a conda environment for this pipeline:
# conda env create -f environment.yml
name: tronflow-mutect2-0.3.1
name: tronflow-mutect2-1.2.0
channels:
- conda-forge
- bioconda
49 changes: 27 additions & 22 deletions main.nf
@@ -8,8 +8,16 @@ params.intervals = false
params.gnomad = false
params.output = 'output'
params.pon = false
params.memory = "16g"
params.cpus = 2
params.memory_mutect2 = "16g"
params.cpus_mutect2 = 2
params.memory_read_orientation = "16g"
params.cpus_read_orientation = 2
params.memory_pileup = "32g"
params.cpus_pileup = 2
params.memory_contamination = "16g"
params.cpus_contamination = 2
params.memory_filter = "16g"
params.cpus_filter = 2
params.disable_common_germline_filter = false

def helpMessage() {
@@ -24,10 +32,6 @@ if (!params.reference) {
log.error "--reference is required"
exit 1
}
if (!params.intervals) {
log.error "--intervals is required"
exit 1
}
if (!params.gnomad) {
log.error "--gnomad is required"
exit 1
@@ -51,8 +55,8 @@ if (params.input_files) {
}

process mutect2 {
cpus params.cpus
memory params.memory
cpus params.cpus_mutect2
memory params.memory_mutect2
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

@@ -68,10 +72,11 @@ process mutect2 {
germline_filter = params.disable_common_germline_filter ? "" : "--germline-resource ${params.gnomad}"
normal_inputs = normal_bam.split(",").collect({v -> "--input $v"}).join(" ")
tumor_inputs = tumor_bam.split(",").collect({v -> "--input $v"}).join(" ")
intervals_option = params.intervals ? "--intervals ${params.intervals}" : ""
"""
gatk --java-options '-Xmx${params.memory}' Mutect2 \
gatk --java-options '-Xmx${params.memory_mutect2}' Mutect2 \
--reference ${params.reference} \
--intervals ${params.intervals} \
${intervals_option} \
${germline_filter} \
${normal_panel_option} \
${normal_inputs} --normal-sample normal \
@@ -82,8 +87,8 @@
}

process learnReadOrientationModel {
cpus params.cpus
memory params.memory
cpus params.cpus_read_orientation
memory params.memory_read_orientation
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

@@ -94,15 +99,15 @@ process learnReadOrientationModel {
set name, file("${name}.read-orientation-model.tar.gz") into read_orientation_model

"""
gatk --java-options '-Xmx${params.memory}' LearnReadOrientationModel \
gatk --java-options '-Xmx${params.memory_read_orientation}' LearnReadOrientationModel \
--input ${f1r2_stats} \
--output ${name}.read-orientation-model.tar.gz
"""
}

process pileUpSummaries {
cpus params.cpus
memory params.memory
cpus params.cpus_pileup
memory params.memory_pileup
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

@@ -115,7 +120,7 @@ process pileUpSummaries {
script:
tumor_inputs = tumor_bam.split(",").collect({v -> "--input $v"}).join(" ")
"""
gatk --java-options '-Xmx${params.memory}' GetPileupSummaries \
gatk --java-options '-Xmx${params.memory_pileup}' GetPileupSummaries \
--intervals ${params.gnomad} \
--variant ${params.gnomad} \
${tumor_inputs} \
@@ -124,8 +129,8 @@
}

process calculateContamination {
cpus params.cpus
memory params.memory
cpus params.cpus_contamination
memory params.memory_contamination
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

@@ -136,16 +141,16 @@
set name, file("${name}.segments.table"), file("${name}.calculatecontamination.table") into contaminationTables

"""
gatk --java-options '-Xmx${params.memory}' CalculateContamination \
gatk --java-options '-Xmx${params.memory_contamination}' CalculateContamination \
--input ${table} \
-tumor-segmentation ${name}.segments.table \
--output ${name}.calculatecontamination.table
"""
}

process filterCalls {
cpus params.cpus
memory params.memory
cpus params.cpus_filter
memory params.memory_filter
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

@@ -158,7 +163,7 @@
file "${name}.mutect2.vcf"

"""
gatk --java-options '-Xmx${params.memory}' FilterMutectCalls \
gatk --java-options '-Xmx${params.memory_filter}' FilterMutectCalls \
-V ${unfiltered_vcf} \
--reference ${params.reference} \
--tumor-segmentation ${segments_table} \
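
Note that the change above also makes --intervals optional: when it is unset (the default is false), the Mutect2 step simply drops its --intervals flag. A hedged sketch of both invocation styles (paths are placeholders):

```bash
# Restrict Mutect2 calling to the regions in a BED file
nextflow run main.nf --input_files my_input_files.txt \
  --reference /path/to/ref.fasta --gnomad /path/to/gnomad.vcf.gz \
  --intervals /path/to/targets.bed

# Without --intervals the same command runs Mutect2 over the whole reference
nextflow run main.nf --input_files my_input_files.txt \
  --reference /path/to/ref.fasta --gnomad /path/to/gnomad.vcf.gz
```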
32 changes: 24 additions & 8 deletions nextflow.config
@@ -12,8 +12,16 @@ profiles {
params.reference = "$baseDir/test_data/ucsc.hg19.minimal.fasta"
params.intervals = "$baseDir/test_data/intervals.minimal.bed"
params.gnomad = "$baseDir/test_data/gnomad.minimal.vcf.gz"
params.cpus = 1
params.memory = "2g"
params.memory_mutect2 = "2g"
params.cpus_mutect2 = 1
params.memory_read_orientation = "2g"
params.cpus_read_orientation = 1
params.memory_pileup = "2g"
params.cpus_pileup = 1
params.memory_contamination = "2g"
params.cpus_contamination = 1
params.memory_filter = "2g"
params.cpus_filter = 1
timeline.enabled = false
report.enabled = false
trace.enabled = false
@@ -29,7 +37,7 @@ env {
// Capture exit codes from upstream processes when piping
process.shell = ['/bin/bash', '-euo', 'pipefail']

VERSION = '1.1.0'
VERSION = '1.2.0'
DOI = 'https://zenodo.org/badge/latestdoi/355860788'

manifest {
@@ -58,14 +66,22 @@ Input:
name1 tumor_bam1 normal_bam1
name2 tumor_bam2 normal_bam2
* reference: path to the FASTA genome reference (indexes expected *.fai, *.dict)
* intervals: path to a BED file containing the regions to analyse
* gnomad: path to the gnomad VCF
* gnomad: path to the gnomad VCF or other germline resource
Optional input:
* intervals: path to a BED file containing the regions to analyse
* output: the folder where the output is published
* memory: the amount of memory used by each job (default: 16g)
* cpus: the number of CPUs used by each job (default: 2)
* disable_common_germline_filter: disable the use of GnomAD to filter out common variants in the population
* memory_mutect2: the amount of memory used by mutect2 (default: 16g)
* cpus_mutect2: the number of CPUs used by mutect2 (default: 2)
* memory_read_orientation: the amount of memory used by learn read orientation (default: 16g)
* cpus_read_orientation: the number of CPUs used by learn read orientation (default: 2)
* memory_pileup: the amount of memory used by pileup (default: 32g)
* cpus_pileup: the number of CPUs used by pileup (default: 2)
* memory_contamination: the amount of memory used by contamination (default: 16g)
* cpus_contamination: the number of CPUs used by contamination (default: 2)
* memory_filter: the amount of memory used by filter (default: 16g)
* cpus_filter: the number of CPUs used by filter (default: 2)
* disable_common_germline_filter: disable the use of GnomAD to filter out common variants in the population
from the somatic calls. The GnomAD resource is still required, though, as these common SNPs are used elsewhere to
calculate the contamination (default: false)
