Percentage unmapped: 100 #167

Open
sopenaml opened this issue Apr 5, 2022 · 18 comments

Comments

@sopenaml

sopenaml commented Apr 5, 2022

Hi, sorry for asking again, but after going through the previous answers I have not found one that solves my problem. My data contains HTOs, ADTs, 5' GEX and VDJ. I'm trying to use CITE-seq-Count to get count matrices for the HTOs and ADTs in order to demultiplex my data. To do so, I've opted for two separate runs, one for HTOs and one for ADTs, for which I provide tags.csv

ACCCACCAGTAAGAC,hashtag1
CTTGCCGCATGTCAT,hashtag3
AAAGCATTCTTCACG,hashtag4

or cellsurface.barcodes.csv

GACCCGGTGTCATTT,CD80
CACATCGTTTGTGTA,CD95
CACTCCTTGTAGTCA,PD-L2

All my antibodies are TotalSeq-C, and by grepping for the tags I can see that they start at position 11, so I've added --start-trim 10
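
Editor's note: the offset check described above can also be done programmatically instead of reading grep output by eye. The following is a minimal sketch, not part of CITE-seq-Count: `tag_offsets` and its parameters are hypothetical names, a standard four-line gzipped FASTQ is assumed, and only exact matches are counted. If the most common 0-based offset is 10, the tag starts at position 11, which corresponds to --start-trim 10.

```python
import gzip
from collections import Counter
from itertools import islice


def tag_offsets(fastq_path, tags, max_reads=100_000):
    """Count the 0-based offsets at which each tag occurs exactly in the reads.

    fastq_path: gzipped FASTQ file (R2).
    tags: dict mapping tag sequence -> tag name (same layout as tags.csv).
    Returns a dict mapping tag name -> Counter(offset -> number of reads).
    """
    offsets = {name: Counter() for name in tags.values()}
    with gzip.open(fastq_path, "rt") as handle:
        # In a four-line FASTQ the sequence is the 2nd line of every record.
        for seq in islice(handle, 1, 4 * max_reads, 4):
            seq = seq.strip()
            for tag, name in tags.items():
                pos = seq.find(tag)
                if pos != -1:
                    offsets[name][pos] += 1
    return offsets


# Hypothetical usage with a file name from this thread:
# counts = tag_offsets("BEN4535A3_R2_001.fastq.gz", {"GACCCGGTGTCATTT": "CD80"})
# print(counts["CD80"].most_common(3))
```

A single dominant offset suggests a fixed --start-trim value is appropriate; offsets spread over several positions suggest the tag position varies between reads.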

[Image: tag]

When I run the script with the tags.csv file I get 96% mapped, but when I run it with cellsurface.barcodes.csv I get 100% unmapped, even though I can grep the tags in R2. Would you know why the ADT tags are not mapped? Since the libraries should contain both cell surface barcodes and HTOs, I would expect the mapping to be split between the two, or am I wrong? Can anyone help, please? Thank you very much.
Miriam

My run commands:

CITE-seq-Count -R1 BEN4535A3_R1_001.fastq.gz -R2 BEN4535A3_R2_001.fastq.gz --tags cellsurface.barcodes.csv --cell_barcode_first_base 1 --cell_barcode_last_base 16 --umi_first_base 17 --umi_last_base 26 -cells 10000 --start-trim 10 --threads 24 -o citeseqcount/BEN4535A3.adt


CITE-seq-Count -R1 BEN4535A3_R1_001.fastq.gz -R2 BEN4535A3_R2_001.fastq.gz --tags tags.csv --cell_barcode_first_base 1 --cell_barcode_last_base 16 --umi_first_base 17 --umi_last_base 26 -cells 10000 --start-trim 10 --threads 24 -o citeseqcount/BEN4535A3.adt

@dianitasusilo

Hi, I have the same problem here.
How did you extract that sequence from the FASTQ and determine the start-trim point?
Do we use the R2 FASTQ for it?
Thank you

@sopenaml
Author

sopenaml commented Apr 12, 2022 via email

@dianitasusilo

Thanks for your quick reply but in my case they are not in the same position...

Here I looked for my hashtag sequences in the raw R2 FASTQ files, and it turned out like this:
[Image]

Or did I use the wrong FASTQ file as input?

@sopenaml
Author

sopenaml commented Apr 13, 2022 via email

@Hoohm
Owner

Hoohm commented Apr 13, 2022

Happy to see fellow users help each other.

@dianitasusilo maybe a sliding window approach might help yes: --sliding-window is the option you are looking for.
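
Editor's note: to illustrate the difference between matching a tag at one fixed offset and searching with a sliding window, here is a small sketch. It is not CITE-seq-Count's implementation: `hamming`, `match_fixed` and `match_sliding` are hypothetical helpers, and Hamming distance is assumed as the error metric.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_fixed(read, tag, start, max_error=2):
    """True if the tag matches the read at exactly `start` (0-based)."""
    window = read[start:start + len(tag)]
    return len(window) == len(tag) and hamming(window, tag) <= max_error


def match_sliding(read, tag, max_error=2):
    """Return the first 0-based offset where the tag matches, or None."""
    for start in range(len(read) - len(tag) + 1):
        if hamming(read[start:start + len(tag)], tag) <= max_error:
            return start
    return None
```

With a fixed offset, a read whose tag is shifted by a couple of bases is reported as unmapped; the sliding search still finds it, at the cost of more comparisons per read.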

@sopenaml Could you check whether your barcodes in R1 are overlapping? It might be a mapping between barcodes, similar to TotalSeq-B.

@sopenaml
Author

sopenaml commented Apr 13, 2022 via email

@Hoohm
Owner

Hoohm commented Apr 13, 2022 via email

@sopenaml
Author

Hi Patrick,

I've checked my CITE-seq antibody barcodes against R1 and I don't see any matches. If I check my cell hashing barcodes, one of them finds a few (7) matches in R1, but the rest find none. So it's not that my barcodes are overlapping with cell barcodes. Any other ideas of what the problem may be? Thanks

@drlaurenwasson

Hi,
I have the same problem, where I can grep my HTO out of read 2 but still get 100% reads unmapped. I am running 1.4.5 using Python 3.9. Do we know what the solution to this issue is?

@sopenaml
Author

sopenaml commented May 11, 2022 via email

@Hoohm
Owner

Hoohm commented May 15, 2022

Hey @sopenaml,
I need to rephrase what I mentioned earlier.
Depending on what chemistry kit you used, it's possible that, within the same cell, the R1 barcode (cell barcode) attached to one library (GEX, VDJ, ADTs) differs from the cell barcode attached to your HTOs.
This means that when you compute the overlap between the two sets of cell barcodes, it's going to be very low, because the barcodes need to be translated first.

Here is the translation matrix.
https://github.com/10XGenomics/cellranger/blob/master/lib/python/cellranger/barcodes/translation/3M-february-2018.txt.gz

Is it a bit clearer?
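
Editor's note: a minimal sketch of how such a translation file could be applied. It assumes the file is gzipped text with two whitespace-separated barcodes per line that refer to the same bead; which column belongs to which library is an assumption to verify against your own data. `load_translation` and `translate` are hypothetical helper names, not part of CITE-seq-Count or Cell Ranger.

```python
import gzip


def load_translation(path):
    """Read a two-column, whitespace-separated gzipped file into a dict.

    Assumption: column 1 is used as the key and column 2 as the value.
    Swap them if your tag-library barcodes are in the other column.
    """
    mapping = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) == 2:
                mapping[fields[0]] = fields[1]
    return mapping


def translate(barcodes, mapping):
    """Translate barcodes through the mapping; unknown barcodes are kept as-is."""
    return [mapping.get(bc, bc) for bc in barcodes]
```

After translating one side, the cell barcodes from the two libraries can be compared directly.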

@stepanovacz

Hi everyone,

I am running into an issue where about 35% of my reads are unmapped. Is there a way to bring the mapped percentage up? Grepping the R2 file shows that the start trim needs to be --start-trim 0
[Image: Grep_R2]
I used 10xv3
Attached are the tags
tags.csv

Here is what I run to get this output:

CITE-seq-Count -T ${numThreads} \
  -R1 ${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L001_R1_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L002_R1_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L003_R1_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L004_R1_001.fastq.gz \
  -R2 ${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L001_R2_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L002_R2_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L003_R2_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L004_R2_001.fastq.gz \
  -t tags.csv -cbf 1 -cbl 16 -umif 17 -umil 28 -cells 5000 --sliding-window --start-trim 0 \
  -o /project/CiteSeq7_sham

Here is the output I get:
Date: 2022-07-13
Running time: 6.0 minutes, 37.25 seconds
CITE-seq-Count Version: 1.4.5
Reads processed: 3575668
Percentage mapped: 64
Percentage unmapped: 36
Uncorrected cells: 0
Correction:
  Cell barcodes collapsing threshold: 1
  Cell barcodes corrected: 16075
  UMI collapsing threshold: 2
  UMIs corrected: 12836
Run parameters:
  Read1_paths: _S17_L004_R1_001.fastq.gz
  Read2_paths: _S17_L004_R2_001.fastq.gz
  Cell barcode:
    First position: 1
    Last position: 16
  UMI barcode:
    First position: 17
    Last position: 28
  Expected cells: 5000
  Tags max errors: 2
  Start trim: 0

Thank you in advance!

@stepanovacz

I am able to bring the percentage of mapped reads above 90 by setting --max-error 6 or higher. However, I do not think that is a good solution, as I then get plenty of doublets and negatives (Doublet: 1868, Negative: 35, Singlet: 394). Any idea what else I can do?
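
Editor's note: one possible contributor to the extra doublets is that a large error tolerance makes tag calls ambiguous. Under Hamming distance, if two tags differ at d positions, some sequence lies within e mismatches of both whenever d <= 2e. The sketch below checks that condition for a tag list. It assumes Hamming distance and equal-length tags, which may not match the tool's actual metric; `min_pairwise_distance` and `is_ambiguous` are hypothetical names. The example tags are the hashtags posted at the top of this thread, because the attached tags.csv is not shown here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two tags in the list."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def is_ambiguous(tags, max_error):
    """True if some sequence could lie within max_error of two different tags."""
    return min_pairwise_distance(tags) <= 2 * max_error


# Illustrative input only (hashtags from the first post in this thread).
example_tags = ["ACCCACCAGTAAGAC", "CTTGCCGCATGTCAT", "AAAGCATTCTTCACG"]
print(min_pairwise_distance(example_tags), is_ambiguous(example_tags, 6))
```

A high tolerance also lets unrelated sequences, such as the ones that otherwise end up in unmapped.csv, be assigned to a tag, so the mapped percentage alone is not a reliable quality measure.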
Thank you!

@Hoohm
Owner

Hoohm commented Aug 10, 2022

Would you be able to send me a sample of your data so that I can run it and have a look?

@stepanovacz

stepanovacz commented Aug 10, 2022 via email

@Hoohm
Owner

Hoohm commented Aug 13, 2022

I asked for access

@Hoohm
Owner

Hoohm commented Aug 13, 2022

results/unmapped.csv 
tag,count
AAGCAGTGGTATCAA,38893
GGGGGGGGGGGGGGG,20759
CCGTACCTCAAAAAA,17644
GCAGTGGTATCAACG,10879
TTCCTGCCAAAAAAA,5855
GTGGTATCAACGCAG,5442
AGCAGTGGTATCAAC,4087
CCGTACCCCAAAAAA,3959
CAGTGGTATCAACGC,3894

It seems pretty reasonable from what I see in the first sample. The unmapped.csv file gives you the top sequences that are not mapping. About 22% is polyG, which means there was no sequence there or it could not be read.
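
Editor's note: a small sketch for summarising an unmapped.csv file like the one above. It assumes the two-column `tag,count` layout shown in this comment; `unmapped_shares` and `is_homopolymer` are hypothetical helper names. The shares are relative to the listed rows only, not to all unmapped reads, so they need not reproduce the percentage quoted above.

```python
import csv
import io


def unmapped_shares(csv_text):
    """Return {sequence: fraction of the listed unmapped reads}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = sum(int(row["count"]) for row in rows)
    return {row["tag"]: int(row["count"]) / total for row in rows}


def is_homopolymer(seq):
    """True if the sequence consists of a single repeated base (e.g. polyG)."""
    return len(set(seq)) == 1


# Hypothetical usage:
# with open("results/unmapped.csv") as handle:
#     shares = unmapped_shares(handle.read())
# for seq, share in shares.items():
#     print(seq, round(share, 3), "homopolymer" if is_homopolymer(seq) else "")
```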

Why do you need to get higher?

I want to make sure about the translation issue. Do you have a high overlap between the cells from the RNA side and the HTO?
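
Editor's note: the overlap asked about here can be measured with a few lines of code. This is a sketch under assumptions: `barcode_overlap` is a hypothetical helper, both inputs are plain lists of cell barcode strings, and any "-1"-style suffix (as written by Cell Ranger) is stripped before comparing.

```python
def barcode_overlap(rna_barcodes, hto_barcodes):
    """Fraction of HTO cell barcodes that also appear among the RNA barcodes."""

    def strip_suffix(barcode):
        # "AAACCTGAGAAACCAT-1" -> "AAACCTGAGAAACCAT"
        return barcode.split("-")[0]

    rna = {strip_suffix(bc) for bc in rna_barcodes}
    hto = {strip_suffix(bc) for bc in hto_barcodes}
    if not hto:
        return 0.0
    return len(rna & hto) / len(hto)
```

A value close to zero despite good mapping would point towards the barcode translation issue rather than a problem with the tags themselves.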

@leeanapeters

Hi, is the translation matrix used only with v3 chemistry? I seem to have a similar problem, where grep doesn't return barcodes in my R2 FASTQ that I know exist in my data after running Cell Ranger. I used the 5' v2 chemistry with GEX, VDJ and feature barcode libraries.

Thanks

Leeana
