fail in docker: Total 0 white-listed Barcodes #11

Open
LinAU opened this issue Mar 18, 2024 · 9 comments

LinAU commented Mar 18, 2024

Thanks for your work.
Everything works well with the demo test data. However, when I tested scRNA-seq data from 10x Genomics 3' v2, the first part ran well, but all reads were filtered out because no barcodes (BC) were whitelisted, as shown in the alevin.log file below:

[2024-03-18 20:40:50.427] [alevinLog] [info] Found 70629 transcripts(+0 decoys, +0 short and +0 duplicate names in the index)
[2024-03-18 20:40:50.466] [alevinLog] [info] Filled with 70629 txp to gene entries
[2024-03-18 20:40:50.472] [alevinLog] [info] Found all transcripts to gene mappings
[2024-03-18 20:40:50.479] [alevinLog] [info] Processing barcodes files (if Present)

[2024-03-18 20:54:52.178] [alevinLog] [info] Done barcode density calculation.
[2024-03-18 20:54:52.178] [alevinLog] [info] # Barcodes Used: 316403175 / 316403175.
[2024-03-18 20:54:52.178] [alevinLog] [info] Done importing white-list Barcodes
[2024-03-18 20:54:52.178] [alevinLog] [warning] Skipping 200 Barcodes as no read was mapped
[2024-03-18 20:54:52.178] [alevinLog] [info] Total 0 white-listed Barcodes
[2024-03-18 20:54:52.178] [alevinLog] [info] Sorting and dumping raw barcodes
[2024-03-18 20:54:58.507] [alevinLog] [warning] Total 100% reads will be thrown away because of noisy Cellular barcodes.
[2024-03-18 20:54:58.507] [alevinLog] [info] Done populating Z matrix
[2024-03-18 20:54:58.507] [alevinLog] [warning] 0 Whitelisted Barcodes with 0 frequency
[2024-03-18 20:54:58.507] [alevinLog] [info] Total 0 CB got sequence corrected
[2024-03-18 20:54:58.507] [alevinLog] [info] Done indexing Barcodes
[2024-03-18 20:54:58.507] [alevinLog] [info] Total Unique barcodes found: 892257
[2024-03-18 20:54:58.507] [alevinLog] [info] Used Barcodes except Whitelist: 0
I used the whitelist file from cellranger, and here are my Docker parameters:
#!/bin/bash

# main parameters

INPUT="/mnt/d/data_analysis/Analysis_2023_NAFLD_sc/9.isoform.analysis_v2/5.NAFLD.docker/1.fq/GSM4041150"
OUTPUT="/mnt/d/data_analysis/Analysis_2023_NAFLD_sc/9.isoform.analysis_v2/4.test.docker/ScasaOut_v10"
ref="/mnt/d/data_analysis/Analysis_2023_NAFLD_sc/9.isoform.analysis_v2/4.test.docker/refMrna.fa.gz"
index="YES" #when index="YES", scasa will index the reference fasta file and write it to index_dir. This index_dir can be reused for other runs
#index_dir="/path/to/PreBuilt_REF_INDEX" #when index="NO", scasa will use directly the reference indexing in index_dir
nthreads=12
tech="10xv2"
whitelist="/mnt/d/data_analysis/Analysis_2023_NAFLD_sc/9.isoform.analysis_v2/4.test.docker/Test_Dataset/737K-august-2016.txt"
cellthreshold="none"
project="My_Project"

# other parameters

samplesheet="NULL"
mapper="salmon_alevin"
xmatrix="alevin"
postalign_dir=""
createxmatrix="NO"

May I ask for your help? Thanks a lot!
KR
Lin

LinAU commented Mar 18, 2024

BTW, I have checked the top 10 barcodes in raw_cb_frequency.txt, and I could find all of them manually in my whitelist file.

KR,
Lin

LinAU commented Mar 19, 2024

Here are the top barcodes in raw_cb_frequency.txt, from the folder SCASA_My_Project_20240318203735\1ALIGN\Sample_S1_L001_alignout\alevin:

CTGTGCTTCGCCAGCA 1252648
AGTGTCAAGCAGCCTC 1155715
ACTGAGTCAATGGTCT 1036019
ATTGGACAGTTAACGA 1009156
GGCTCGACAGCCTATA 908019
GGGACCTCAATGGAAT 887484
TTCCCAGTCTTCAACT 788127
TGCCCTATCAGCCTAA 757341
CTCAGAATCATCGCTC 696810
GTAGGCCGTCCATGAT 690158
CAGGTGCAGACAAGCC 681130
CAGAATCTCGTCCAGG 675500
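
The manual check described above can be automated. The following is only a minimal sketch, assuming raw_cb_frequency.txt holds one whitespace-separated `BARCODE COUNT` pair per line (as in the listing above) and the whitelist holds one barcode per line. The function names are hypothetical helpers, not part of Scasa or Alevin.

```python
from typing import Iterable, List, Set, Tuple


def load_whitelist(lines: Iterable[str]) -> Set[str]:
    """Collect one barcode per non-empty line."""
    return {line.strip() for line in lines if line.strip()}


def parse_frequencies(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'BARCODE COUNT' lines; lines with fewer than two fields are skipped."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], int(fields[1])))
    return pairs


def whitelist_overlap(
    freqs: List[Tuple[str, int]], whitelist: Set[str], top_n: int = 10
) -> List[Tuple[str, int, bool]]:
    """Return (barcode, count, in_whitelist) for the top_n most frequent barcodes."""
    ranked = sorted(freqs, key=lambda pair: pair[1], reverse=True)[:top_n]
    return [(bc, count, bc in whitelist) for bc, count in ranked]
```

To use it on real files, pass open file handles, for example `whitelist_overlap(parse_frequencies(open("raw_cb_frequency.txt")), load_whitelist(open("737K-august-2016.txt")))`. If every top barcode is reported as present and Alevin still logs 0 white-listed barcodes, the whitelist content itself is probably not the mismatch.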

@nghiavtr
Collaborator

Hi @LinAU,

Thank you for using Scasa.
From your log file and the information you provided, it is hard to identify the exact issue.
Could you send us a sample of the data that produced the issue? Then we can investigate it in detail.

Best,
Nghia

LinAU commented Mar 21, 2024 via email

nghiavtr commented Mar 22, 2024

How big are the files? If they are not too big, OneDrive is OK for me.
Please send the link to my email.

Nghia

LinAU commented Mar 22, 2024 via email

@nghiavtr
Collaborator

Thanks, I have been able to access the files. I will get back to you as soon as I have some news.

N

@nghiavtr
Collaborator

Hi @LinAU
I have carefully investigated the issue with your input files and found that it is not caused by Scasa but comes from Alevin.

The problem is that Alevin does not work with the 737K-august-2016.txt whitelist, as described in its documentation (https://salmon.readthedocs.io/en/latest/alevin.html#whitelist):

Not 10x 737k whitelist
This flag does not use the technologically defined whitelisted cellular barcodes provided by 10x, instead it’s a per experiment level list of subsampled cellular barcodes that need to quantified for consistency with other tools for example an input would be a file generated by cellranger with the name barcodes.tsv (uncompressed).

So, following that instruction, you could run cellranger on your data first to get barcodes.tsv, then use that file as the whitelist input.
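
As a rough sketch of that step: barcodes.tsv from cellranger typically lists barcodes with a GEM-well suffix such as `-1`. Whether Alevin needs that suffix removed is an assumption here, not something confirmed in this thread, so please check the resulting file against raw_cb_frequency.txt. The helper names below are hypothetical.

```python
import gzip
import re
import sys
from typing import Iterable, List

# Assumption: a trailing "-<digits>" is a Cell Ranger GEM-well suffix to drop.
_SUFFIX = re.compile(r"-\d+$")


def strip_gem_suffix(barcode: str) -> str:
    """Remove surrounding whitespace and a trailing '-<digits>' suffix, if any."""
    return _SUFFIX.sub("", barcode.strip())


def to_whitelist(lines: Iterable[str]) -> List[str]:
    """Return unique, suffix-free barcodes in their original order."""
    seen = set()
    barcodes = []
    for line in lines:
        bc = strip_gem_suffix(line)
        if bc and bc not in seen:
            seen.add(bc)
            barcodes.append(bc)
    return barcodes


def open_text(path: str):
    """Open plain or gzip-compressed text, chosen by file extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


# Usage (paths are placeholders): python make_whitelist.py barcodes.tsv whitelist.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open_text(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write("\n".join(to_whitelist(src)) + "\n")
```

The output is an uncompressed file with one barcode per line, which can then be set as `whitelist` in the Scasa parameters.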

Best,
Nghia

LinAU commented Mar 26, 2024 via email
