running with totalseqB #158

Open
naila53 opened this issue Jul 6, 2021 · 4 comments

naila53 commented Jul 6, 2021

Hi,

Thanks for developing this tool!
I'm trying to run CITE-seq-Count on the public 10x Genomics 5k PBMC data as a test. In principle it should work, since I specified a start trim of 10. I also used the script you provided in another issue to convert the barcodes and get a compatible whitelist. I do recover all the cells in the whitelist. However, the tag counts from CITE-seq-Count are low compared with the Cell Ranger output for the protein data.

According to the reference CSV for the barcodes, each read should contain 10 N bases, then the protein barcode sequence, and then another 9 N bases of arbitrary sequence.
Can you please advise on how to run CITE-seq-Count properly with this dataset?
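The read layout described above (10 arbitrary bases, the protein barcode, then 9 arbitrary bases) is what a start trim of 10 is meant to handle. A minimal Python sketch of that slicing follows. The function name `extract_tag`, the 15-base tag length, and the example read are assumptions for illustration only; this is not CITE-seq-Count's own code.

```python
def extract_tag(read2_seq: str, start_trim: int = 10, tag_length: int = 15) -> str:
    """Return the antibody barcode found after skipping `start_trim` bases.

    `tag_length=15` is an assumption; use the barcode length listed in
    your own feature reference CSV.
    """
    if len(read2_seq) < start_trim + tag_length:
        raise ValueError("read is shorter than start_trim + tag_length")
    return read2_seq[start_trim:start_trim + tag_length]


# Made-up read: 10 arbitrary bases + a 15-base barcode + 9 arbitrary bases.
example_read = "N" * 10 + "ACGTACGTACGTACG" + "N" * 9
print(extract_tag(example_read))
```

If the barcode extracted this way matches the sequences in the feature reference, the trim setting is probably not the cause of the low counts.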

Dataset FASTQs here:
https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3_nextgem

Protein barcodes reference:
https://cf.10xgenomics.com/samples/cell-exp/3.0.2/5k_pbmc_protein_v3_nextgem/5k_pbmc_protein_v3_nextgem_feature_ref.csv

My run info:
CITE-seq-Count Version: 1.4.3
Reads processed: 41007618
Percentage mapped: 95
Percentage unmapped: 5
Uncorrected cells: 1
Correction:
  Cell barcodes collapsing threshold: 1
  Cell barcodes corrected: 0
  UMI collapsing threshold: 2
  UMIs corrected: 665766
Run parameters:
  Read1_paths: /data/raw/5kPBMC/fastq/big_R1.fastq.qz
  Read2_paths: /data/raw/5kPBMC/fastq/big_R2.fastq.qz
  Cell barcode:
    First position: 1
    Last position: 16
  UMI barcode:
    First position: 17
    Last position: 28
  Expected cells: 6794880
  Tags max errors: 2
  Start trim: 10

I combined the R1 FASTQ files into a single file, and likewise for R2.
I specified a high number of expected cells to capture the empty droplets for normalization. The whitelist is derived from Cell Ranger's raw_bc_feature matrix, before filtering, to make sure I get all the raw output.

For the exact same cells, I compared the maximum count per tag. As you can see, the counts are much lower in the CITE-seq-Count UMI output!

[Two screenshots: maximum count per tag for the same cells, Cell Ranger output and CITE-seq-Count output]

Hoohm (Owner) commented Jul 9, 2021

Hello @naila53,
This comes from the fact that the cells have two barcodes: one for RNA and one for protein data.

Version 1.5.0 will deal with this on the fly for users, but as of today I would recommend running the antibody data without a whitelist and with the -n_cells argument instead.

Then use the translation to map the barcodes properly.
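As a rough illustration of that translation step, here is a minimal Python sketch. It assumes you have a two-column, tab-separated, gzipped translation table with the RNA barcode in column 1 and the protein barcode in column 2; check the column order in your own file. The function names are hypothetical and are not part of CITE-seq-Count.

```python
import gzip
from typing import Dict, Iterable, List, Optional


def load_translation(path: str) -> Dict[str, str]:
    """Read a gzipped two-column table into a protein -> RNA barcode dict.

    Assumed layout (verify against your file): column 1 is the RNA
    barcode, column 2 is the protein barcode.
    """
    protein_to_rna: Dict[str, str] = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            rna_bc, protein_bc = fields[0], fields[1]
            protein_to_rna[protein_bc] = rna_bc
    return protein_to_rna


def translate_barcodes(
    barcodes: Iterable[str], protein_to_rna: Dict[str, str]
) -> List[Optional[str]]:
    """Map each protein barcode to its RNA barcode, or None if unknown."""
    return [protein_to_rna.get(bc) for bc in barcodes]


# Tiny in-memory demo with made-up barcodes.
demo_table = {"AAAG": "AAAC", "CCCA": "CCCT"}
print(translate_barcodes(["AAAG", "GGGG"], demo_table))
```

Barcodes that come back as `None` are absent from the table and are worth inspecting before merging the protein counts with the RNA data.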

hisplan commented Aug 18, 2021

Correct me if I'm wrong, but as far as I know, one of the outputs from CITE-seq-Count, barcodes.tsv.gz, also needs to be translated at the end, especially if you want to look at GEX and HTO together, which looks like what you're doing...

Hoohm (Owner) commented Nov 21, 2021

Yes, it's been a thorn in my side for a while now.
I've worked on 1.5.0 today: https://github.com/Hoohm/CITE-seq-Count/tree/feature/cells_argument
I'm nearly done with automated translation on the fly by just selecting the chemistry!

dmiyagi commented Apr 6, 2022

Hi @Hoohm, if I am using TotalSeqB with just normal 10x v3, is -n_cells still recommended? Is 1.5.0 ready? Or does what you are saying apply only to multiomic? Do you happen to know whether the TotalSeqB nuclear pore antibodies can be used with 10x multiomic (RNA/ATAC)? Thank you!
