Trung Nghia Vu edited this page Jul 1, 2024 · 13 revisions

Welcome to the scasa wiki!

Scasa is a transcript quantification tool for single-cell RNA-sequencing (scRNA-seq) data. It covers the workflow from pseudo-alignment through quantification. This page gives detailed instructions and examples for using scasa as part of a single-cell RNA-seq workflow.

scasa format

Scasa accepts two kinds of input: raw FASTQ files, in which case it runs both alignment and quantification, or alignment output files from Salmon Alevin or Kallisto Bustools, in which case it runs quantification only.

Usage

Type scasa --help in the terminal to see a list of available commands.

> scasa
Usage: scasa [options] [arguments]

List of options:

--help,-h                   0. Help Page to Display All Options
--project,-p                1. Create a Project Name
--mapper,-m                 2. Choose an Alignment Tool
--align,-a                  3. Alignment Step
--quant,-q                  4. Quantification Step
--in,-i                     5. Provide an Input Directory
--fastq,-f                  6. Input FASTQ Files
--samplesheet,-s            7. Provide a Samplesheet
--out,-o                    8. Provide an Output Directory
--ref,-r                    9. Reference Transcriptome Fasta File
--index,-e                  10. To Index Reference Fasta File
--index_dir,-d              11. Index File Directory if Index was Prebuilt
--whitelist,-w              12. Provide a Whitelist for Barcode Correction
--tech,-t                   13. Sequencing Technology Used
--cellthreshold,-c          14. Number of Cells to Retain
--nthreads,-n               15. Set the Number of Threads to Use
--postalign_dir,-g          16. Post-alignment Directory if Alignment was Done in Prior
--createxmatrix,-b          17. To Generate Xmatrix
--xmatrix,-x                18. X-Matrix directory

0. Help Page to Display All Options

Use --help (short form -h) to view all scasa options.

> scasa
Example: scasa --help

(List of options)

1. Create a Project Name

Use --project to pass a project name to scasa <STRING, optional; if the option is not used, the default is My_Project. Spaces are not allowed; use '-', '_' or '.' instead. To rerun an existing project, provide the project folder name including its timestamp suffix (the folder name, not the full directory path), for example: My_Project_202104241111>.

Usage: scasa --project [arguments]
Example: scasa --project SCRNASEQ_PROJECT
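
Because spaces are not allowed in project names, a free-form label can be normalised before it is passed to --project. The sketch below is only an illustration: sanitize_project_name is a hypothetical shell helper written for this page, not part of scasa, and the command is printed rather than executed.

```shell
# Hypothetical helper (not part of scasa): replace each run of
# whitespace in a label with a single underscore.
sanitize_project_name() {
    printf '%s' "$1" | tr -s '[:space:]' '_'
}

project=$(sanitize_project_name "SCRNASEQ PROJECT 01")

# Print the command instead of running it.
echo "scasa --project $project"
# prints: scasa --project SCRNASEQ_PROJECT_01
```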

2. Choose an Alignment Tool

Use --mapper to select the tool used for the alignment step <STRING, optional; scasa currently supports two values for --mapper: salmon_alevin and kallisto_bus. Default is salmon_alevin>.

Usage: scasa --mapper [arguments]
Example: scasa --mapper salmon_alevin

Arguments available (Default is salmon_alevin):
--mapper salmon_alevin
--mapper kallisto_bus

3. Alignment Step

Use --align to run the pseudo-alignment step <STRING, optional; if set to YES, select the pseudo-alignment tool with the --mapper option. Scasa currently supports two alignment tools: Salmon Alevin and Kallisto Bustools. Default is YES>.

Usage: scasa --align [arguments]

Arguments available (Default is YES): 
--align YES
--align NO

4. Quantification Step

Use --quant to run the scasa quantification step, which produces transcript counts <STRING, optional; default is YES>.

Usage: scasa --quant [arguments]

Arguments available (Default is YES): 
--quant YES
--quant NO

5. Provide an Input Directory

Use --in to provide the input directory containing the input FASTQ files <STRING, optional; spaces are not allowed in the directory path. Default is the current directory>.

Usage: scasa --in [arguments]
Example: scasa --in /mnt/PROJECT/PROJECT_OUT/
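
Several options on this page (--in, --out, --index_dir, --whitelist, --postalign_dir) do not accept paths that contain spaces. A small pre-flight check can catch this before a long run starts. The sketch below assumes a POSIX shell; path_ok is a hypothetical helper, not a scasa feature, and the second path is an invented counter-example.

```shell
# Hypothetical helper (not part of scasa): succeed only if the
# path contains no space character.
path_ok() {
    case "$1" in
        *" "*) return 1 ;;
        *)     return 0 ;;
    esac
}

for p in "/mnt/PROJECT/PROJECT_OUT/" "/mnt/My Project/fastq/"; do
    if path_ok "$p"; then
        echo "ok:       $p"
    else
        echo "rejected: $p (contains a space)"
    fi
done
```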

6. Input FASTQ Files

Use --fastq to provide FASTQ file names, separated by commas. Give file names only, not paths; the directory is set with the --in option. Each pair of files must be labeled R1 and R2 and share the same file prefix <STRING, optional; this option is intended for users with only a few FASTQ files. Provide either --samplesheet or --fastq, but not both. If neither option is provided, scasa looks for FASTQ files in the input directory given by --in>.

Usage: scasa --fastq [arguments]
Example 1: scasa --fastq Sample01_R1.fastq,Sample01_R2.fastq
Example 2: scasa --fastq Sample01_R1.fastq,Sample01_R2.fastq,Sample_02_S1_L001_R1_001.fastq,Sample_02_S1_L001_R2_001.fastq
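
For more than a couple of pairs, the comma-separated --fastq value can be assembled from the input directory instead of typed by hand. The following is a minimal sketch under stated assumptions: it creates a temporary directory with empty placeholder files (hypothetical names), assumes mates differ only in the first `_R1`/`_R2` label, and prints the resulting option rather than calling scasa.

```shell
# Demo input directory with empty placeholder FASTQ files (hypothetical names).
indir=$(mktemp -d)
touch "$indir/Sample01_R1.fastq" "$indir/Sample01_R2.fastq" \
      "$indir/Sample02_R1.fastq" "$indir/Sample02_R2.fastq"

fastq_arg=""
for r1_path in "$indir"/*_R1*.fastq; do
    [ -e "$r1_path" ] || continue            # glob matched nothing
    r1=$(basename "$r1_path")
    # Derive the mate name by swapping the first _R1 label for _R2.
    r2=$(printf '%s\n' "$r1" | sed 's/_R1/_R2/')
    if [ ! -e "$indir/$r2" ]; then
        echo "skipping $r1: no matching $r2" >&2
        continue
    fi
    # Append "R1,R2", inserting a comma only between pairs.
    fastq_arg="${fastq_arg:+$fastq_arg,}$r1,$r2"
done

echo "--fastq $fastq_arg"
rm -rf "$indir"
```

The printed value can be combined with --in pointing at the real input directory.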

7. Provide a Samplesheet

Use --samplesheet to provide the path to a comma- or tab-separated samplesheet file listing the input FASTQ file names (Download an example of samplesheet file here). Each row holds one pair of FASTQ files separated by the delimiter, and there is no header line. Each pair must be labeled R1 and R2 and share the same file prefix <STRING, optional; this option is intended for users with many FASTQ files. If it is not used, supply the FASTQ names with --fastq instead. If neither --samplesheet nor --fastq is provided, scasa looks for FASTQ files in the input directory given by --in. No default>.

Usage: scasa --samplesheet [arguments]
Example: scasa --samplesheet /mnt/PROJECT/My_Samplesheet.csv
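
The layout rules for a samplesheet (two columns, no header, R1 then R2, shared prefix) can be checked before a run. The sketch below is an illustration only: check_samplesheet is a hypothetical helper, not part of scasa, and it assumes the R2 name equals the R1 name with the first `_R1` replaced by `_R2`. The demo file reuses the file names from the --fastq examples.

```shell
# Hypothetical helper (not part of scasa): report rows that do not look
# like "<prefix>_R1...,<prefix>_R2..." and return non-zero if any are found.
check_samplesheet() {
    awk -F'[,\t]' '
        {
            expected = $1
            sub(/_R1/, "_R2", expected)      # expected mate name
            if (NF != 2 || $1 !~ /_R1/ || $2 != expected) {
                printf "row %d looks wrong: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Demo samplesheet: one pair per row, no header line.
sheet=$(mktemp)
printf '%s\n' \
    'Sample01_R1.fastq,Sample01_R2.fastq' \
    'Sample_02_S1_L001_R1_001.fastq,Sample_02_S1_L001_R2_001.fastq' > "$sheet"

if check_samplesheet "$sheet"; then
    echo "samplesheet looks consistent"
fi
rm -f "$sheet"
```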

8. Provide an Output Directory

Use --out to provide an output directory for your analysis <STRING, optional; spaces are not allowed in the directory path. Output is written to a new folder inside this directory, named with the user-provided project name plus a timestamp suffix. Default is the current directory>.

Usage: scasa --out [arguments]
Example: scasa --out /mnt/PROJECT/SCASA_OUT/

9. Reference Transcriptome Fasta File

Use --ref to provide the path to the reference FASTA file <STRING, required; scasa currently supports hg38 with the prebuilt annotation. Users can download the UCSC hg38 reference FASTA file here. No default for this option. Scasa can also be run with other annotations and species by following the instructions here>.

Usage: scasa --ref [arguments]
Example: scasa --ref /mnt/PROJECT/refMrna.fa

10. To Index Reference Fasta File

Use --index to run indexing for reference fasta file <STRING, optional, provide YES or NO to the option. Default is set to YES>.

Scasa uses Alevin or Kallisto-bustools to map reads to the reference transcript sequences and to extract the equivalence classes (eqclasses). Currently, Scasa indexes transcript sequences only; the decoy sequences recommended by Salmon are not used.

Usage: scasa --index [arguments]

Arguments available (Default is YES): 
--index YES
--index NO

11. Index File Directory if Index was Prebuilt

Use --index_dir to provide the path to a prebuilt reference index when --index is set to NO <STRING, optional; scasa currently supports only Salmon or Kallisto indexes. Spaces are not allowed in the path. No default>.

Usage: scasa --index_dir [arguments]
Example: scasa --index_dir /mnt/PROJECT/refMrna.fa.idx

12. Provide a Whitelist for Barcode Correction

Use --whitelist to provide the path to a whitelist file for barcode correction <STRING, optional; note that this option is required if --createxmatrix is set to YES for Xmatrix generation. Spaces are not allowed in the path. No default>. For more information on which whitelist to use for each chemistry version, and to obtain the relevant whitelist from 10X Genomics, see: https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist

Usage: scasa --whitelist [arguments]
Example: scasa --whitelist /mnt/PROJECT/whitelist.txt

13. Sequencing Technology Used

Use --tech to specify the sequencing technology used <STRING, optional; scasa currently supports only sequencing output from the 10X 3' Chromium V1, V2 and V3 chemistries. Default is 10xv3>.

Usage: scasa --tech [arguments]

Arguments available (Default is 10xv3): 
--tech 10xv1
--tech 10xv2
--tech 10xv3

14. Number of Cells to Retain

Use --cellthreshold to provide the expected number of cells to retain <NUMERIC, optional; currently this option applies only to the alevin alignment step. By default no expected cell number is set>.

Usage: scasa --cellthreshold [arguments]
Example: scasa --cellthreshold 35000

15. Set the Number of Threads to Use

Use --nthreads to set the number of threads to be used for running scasa <NUMERIC, optional, set a higher number of threads for faster processing. Default is set to 4>.

Usage: scasa --nthreads [arguments]
Example: scasa --nthreads 16
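
Rather than hard-coding a thread count, the value can be taken from the machine. This is a sketch only: it assumes `getconf _NPROCESSORS_ONLN` is available (it is on most Linux and macOS systems), falls back to scasa's documented default of 4 otherwise, and prints the option instead of running scasa.

```shell
# Ask the system for the number of online CPUs; fall back to 4
# (the documented scasa default) if the query fails.
nthreads=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || nthreads=4

# Guard against an empty or non-numeric answer such as "undefined".
case "$nthreads" in
    ''|*[!0-9]*) nthreads=4 ;;
esac

echo "scasa --nthreads $nthreads"
```

On a shared server it may be more considerate to pass a smaller number than the full CPU count.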

16. Post-alignment Directory if Alignment was Done in Prior

Use --postalign_dir to provide the directory of post-alignment files if alignment was done previously <STRING, optional; this option is only valid if --align is set to NO. Spaces are not allowed in the directory path. No default. Currently only output from alevin or bustools is supported>.

Usage: scasa --postalign_dir [arguments]
Example: scasa --postalign_dir /mnt/PROJECT/ALIGNMENT_OUTPUT/
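
Putting the options above together, a quantification-only run on existing alevin output might be composed as in the sketch below. The combination of flags is inferred from the option descriptions on this page and has not been verified against the tool; the paths are the placeholder paths from the examples. The command is built as a plain string (safe here because scasa paths may not contain spaces) and printed, not executed.

```shell
# Placeholder paths taken from the examples on this page; replace with your own.
postalign_dir=/mnt/PROJECT/ALIGNMENT_OUTPUT/
ref=/mnt/PROJECT/refMrna.fa

# Skip alignment, point scasa at the existing alignment output,
# and use the prebuilt Xmatrix preset for alevin output.
cmd="scasa --align NO --postalign_dir $postalign_dir --quant YES --xmatrix alevin --ref $ref"

echo "$cmd"
```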

17. To Generate Xmatrix

Use --createxmatrix to generate an X-matrix reference file <STRING, optional; provide YES or NO. The X-matrix contains the starting values for the EM algorithm. The scasa package ships with prebuilt Xmatrix files, so users do not need to generate one themselves. Two prebuilt Xmatrix reference files are provided, one for Salmon Alevin alignment output and one for Kallisto Bustools alignment output; select between them with --xmatrix. Default is NO>.

Usage: scasa --createxmatrix [arguments]

Arguments available (Default is NO): 
--createxmatrix YES
--createxmatrix NO

18. X-Matrix directory

Use --xmatrix to provide an Xmatrix reference file if --createxmatrix is set to NO <STRING, optional; two presets are available: alevin and bustools. Pass alevin to use the Xmatrix for alevin alignment output, or bustools to use the Xmatrix for bustools alignment output. A path to an Xmatrix file can be given instead; spaces are not allowed in the path. If the option is not set, scasa uses its prebuilt Xmatrix for alevin alignment output>.

Usage: scasa --xmatrix [arguments]

Arguments available (Default is alevin):
--xmatrix alevin
--xmatrix bustools

Example 1: scasa --xmatrix alevin
Example 2: scasa --xmatrix /mnt/PROJECT/Self-Generated-Xmatrix-Using-Scasa.RData
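
Since the default Xmatrix targets alevin output, a run with --mapper kallisto_bus presumably needs --xmatrix bustools set explicitly. The sketch below keeps the two options in step. It is an assumption-laden illustration: xmatrix_for_mapper is a hypothetical helper, not part of scasa, and the command is printed rather than executed.

```shell
# Hypothetical helper (not part of scasa): map a --mapper value to the
# matching prebuilt --xmatrix preset described above.
xmatrix_for_mapper() {
    case "$1" in
        salmon_alevin) echo alevin ;;
        kallisto_bus)  echo bustools ;;
        *) echo "unsupported mapper: $1" >&2; return 1 ;;
    esac
}

mapper=kallisto_bus
xmatrix=$(xmatrix_for_mapper "$mapper") || exit 1

echo "scasa --mapper $mapper --xmatrix $xmatrix"
# prints: scasa --mapper kallisto_bus --xmatrix bustools
```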

An Example:

    scasa --fastq Sample_01_S1_L001_R1_001.fastq,Sample_01_S1_L001_R2_001.fastq \
          --ref <hg38_ref_file_path>  \
          --whitelist <test_dataset_whitelist_path> \
          --nthreads 4
### Now, you are ready to go!