running with totalseqB #158

naila53 · 2021-07-06T10:33:56Z

Hi,

Thanks for developing this tool!
I'm trying to run CITE-seq-Count on the 10x Genomics 5k PBMC public dataset as a test of the tool. In principle it should work, since I have specified a start trim of 10. I also used the script you provided in another issue to convert the barcodes and get a compatible whitelist. I do recover all the cells in the whitelist; however, the tag counts are low when I compare the Cell Ranger output with the CITE-seq-Count output for the protein data!

According to the reference CSV for the barcodes, each antibody read should contain 10 arbitrary (N) bases, then the protein barcode sequence, then another 9 arbitrary bases.
Can you please advise on how to run CITE-seq-Count properly with this dataset?
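To illustrate the read layout described above (10 arbitrary bases, the antibody barcode, 9 arbitrary bases), here is a minimal Python sketch. It assumes a 15 bp barcode, the usual TotalSeq length. `extract_tag` is a hypothetical helper, not something CITE-seq-Count exposes. With a start trim of 10, the barcode begins at the 11th base of R2.

```python
# Sketch: pull the antibody barcode out of an R2 sequence laid out as
# 10 arbitrary bases + barcode + 9 arbitrary bases.
# ASSUMPTION: a 15 bp barcode; extract_tag is a hypothetical helper.


def extract_tag(read2: str, start_trim: int = 10, tag_length: int = 15) -> str:
    """Return the tag barcode that follows the first `start_trim` bases."""
    tag = read2[start_trim:start_trim + tag_length]
    if len(tag) < tag_length:
        raise ValueError("read is too short to contain a full tag barcode")
    return tag


if __name__ == "__main__":
    # 10 N + placeholder 15 bp barcode + 9 N (not a real TotalSeq-B sequence)
    read2 = "N" * 10 + "ACGTACGTACGTACG" + "N" * 9
    print(extract_tag(read2))  # ACGTACGTACGTACG
```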

Dataset FASTQs:
https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3_nextgem

Protein barcode reference:
https://cf.10xgenomics.com/samples/cell-exp/3.0.2/5k_pbmc_protein_v3_nextgem/5k_pbmc_protein_v3_nextgem_feature_ref.csv
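The feature reference could be converted into a CITE-seq-Count tags file with a short Python sketch. It assumes that the 10x reference has a header row with `name` and `sequence` columns, and that CITE-seq-Count expects a headerless `sequence,name` file. `feature_ref_to_tags` is a hypothetical helper, and the sequence in the example is a placeholder, not a real TotalSeq-B barcode.

```python
import csv
import io

# Sketch: turn a 10x feature reference CSV into 'sequence,name' lines.
# ASSUMPTIONS: the reference has a header row with 'name' and 'sequence'
# columns, and the tags file is headerless 'sequence,name'.


def feature_ref_to_tags(feature_ref_text: str) -> str:
    reader = csv.DictReader(io.StringIO(feature_ref_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Keep only antibody features when a feature_type column is present.
        if row.get("feature_type", "Antibody Capture") != "Antibody Capture":
            continue
        writer.writerow([row["sequence"].strip(), row["name"].strip()])
    return out.getvalue()


if __name__ == "__main__":
    example = (
        "id,name,read,pattern,sequence,feature_type\n"
        "CD3,CD3,R2,^NNNNNNNNNN(BC)NNNNNNNNN,ACGTACGTACGTACG,Antibody Capture\n"
    )
    print(feature_ref_to_tags(example), end="")  # ACGTACGTACGTACG,CD3
```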

My run info:

    CITE-seq-Count Version: 1.4.3
    Reads processed: 41007618
    Percentage mapped: 95
    Percentage unmapped: 5
    Uncorrected cells: 1
    Correction:
        Cell barcodes collapsing threshold: 1
        Cell barcodes corrected: 0
        UMI collapsing threshold: 2
        UMIs corrected: 665766
    Run parameters:
        Read1_paths: /data/raw/5kPBMC/fastq/big_R1.fastq.qz
        Read2_paths: /data/raw/5kPBMC/fastq/big_R2.fastq.qz
        Cell barcode:
            First position: 1
            Last position: 16
        UMI barcode:
            First position: 17
            Last position: 28
        Expected cells: 6794880
        Tags max errors: 2
        Start trim: 10
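The cell barcode and UMI positions in the report are 1-based and inclusive. On 10x v3 chemistry, Read 1 carries a 16 bp cell barcode followed by a 12 bp UMI. The following minimal Python sketch shows how those positions map onto string slices. `split_read1` is a hypothetical helper, not part of CITE-seq-Count.

```python
from typing import Tuple

# Sketch: convert the 1-based, inclusive positions from the run report into
# Python slices. split_read1 is a hypothetical helper.


def split_read1(read1: str, cb_first: int = 1, cb_last: int = 16,
                umi_first: int = 17, umi_last: int = 28) -> Tuple[str, str]:
    # A 1-based inclusive range [first, last] is the 0-based slice [first-1:last].
    cell_barcode = read1[cb_first - 1:cb_last]
    umi = read1[umi_first - 1:umi_last]
    return cell_barcode, umi


if __name__ == "__main__":
    read1 = "A" * 16 + "C" * 12  # 28 bp: 16 bp barcode + 12 bp UMI
    print(split_read1(read1))
```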

I merged all R1 FASTQ files into a single R1 file and all R2 FASTQ files into a single R2 file.
I specified a high number of expected cells so that empty droplets are retained for normalization. Also, the whitelist is derived from Cell Ranger's raw_feature_bc_matrix before filtering, to make sure I get all the raw output.
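Cell Ranger writes barcodes with a GEM-well suffix such as `-1`, which has to be trimmed to get plain 16 bp barcodes for a whitelist. Here is a minimal sketch, assuming one barcode per input line. Note that it only strips the suffix and does not convert between RNA and antibody barcode spaces. `cellranger_to_whitelist` is a hypothetical helper.

```python
from typing import Iterable, List

# Sketch: strip the GEM-well suffix (e.g. "-1") from Cell Ranger barcodes
# to build a plain whitelist. This does NOT translate RNA barcodes into
# antibody-capture barcodes. cellranger_to_whitelist is a hypothetical helper.


def cellranger_to_whitelist(barcodes: Iterable[str]) -> List[str]:
    seen = set()
    whitelist = []
    for line in barcodes:
        barcode = line.strip()
        if not barcode:
            continue
        core = barcode.split("-", 1)[0]  # drop everything after the first '-'
        if core not in seen:  # keep the first occurrence, preserve order
            seen.add(core)
            whitelist.append(core)
    return whitelist


if __name__ == "__main__":
    raw = ["AAACCCAAGAAACACT-1\n", "TTTGTTGTCTTTGCTA-1\n"]
    print("\n".join(cellranger_to_whitelist(raw)))
```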

For exactly the same cells, I compared the maximum count per tag, and as you can see, the counts in the CITE-seq-Count UMI output are very low!
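A per-tag maximum over a shared set of cells can be compared with a small Python sketch. It assumes that both count tables have already been loaded into nested dictionaries of the form `{tag: {cell: count}}`, with matching tag names. `max_per_tag_shared` is a hypothetical helper. If the two tables use different barcode spaces, the shared cell set will be small or empty.

```python
from typing import Dict, Tuple

Counts = Dict[str, Dict[str, int]]  # {tag: {cell_barcode: umi_count}}


def max_per_tag_shared(a: Counts, b: Counts) -> Dict[str, Tuple[int, int]]:
    """For tags present in both tables, return (max in a, max in b),
    using only cells that appear in both tables."""
    cells_a = set().union(*(per_cell.keys() for per_cell in a.values()))
    cells_b = set().union(*(per_cell.keys() for per_cell in b.values()))
    shared = cells_a & cells_b
    result = {}
    for tag in a.keys() & b.keys():
        # Missing cells count as 0; default=0 covers an empty shared set.
        max_a = max((a[tag].get(c, 0) for c in shared), default=0)
        max_b = max((b[tag].get(c, 0) for c in shared), default=0)
        result[tag] = (max_a, max_b)
    return result
```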

Hoohm · 2021-07-09T16:44:38Z

Hello @naila53
This comes from the fact that each cell has two barcodes: one for the RNA data and one for the protein data.

Version 1.5.0 will handle this on the fly for users, but as of today I would recommend running the antibody data without a whitelist and with the -n_cells argument instead.

Then use the barcode translation to map the protein barcodes onto their RNA counterparts.
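Here is a minimal sketch of the translation step. It assumes a two-column table with the RNA barcode in the first column and the corresponding antibody (feature) barcode in the second. This layout is an assumption for illustration, so check the actual translation file before relying on it. `load_translation` and `translate` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Optional


def load_translation(lines: Iterable[str]) -> Dict[str, str]:
    """Build {antibody_barcode: rna_barcode} from two-column lines.
    ASSUMED layout: RNA barcode first, antibody (feature) barcode second."""
    feature_to_rna = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        rna, feature = parts[0], parts[1]
        feature_to_rna[feature] = rna
    return feature_to_rna


def translate(barcodes: Iterable[str],
              feature_to_rna: Dict[str, str]) -> List[Optional[str]]:
    """Map antibody-side barcodes to RNA barcodes; None where no entry exists."""
    return [feature_to_rna.get(bc) for bc in barcodes]


if __name__ == "__main__":
    table = ["AAAA\tTTTT", "CCCC\tGGGG"]  # toy barcodes, not real 10x ones
    print(translate(["TTTT", "GGGG", "NNNN"], load_translation(table)))
    # ['AAAA', 'CCCC', None]
```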

hisplan · 2021-08-18T17:50:14Z

Correct me if I'm wrong, but as far as I know, one of the outputs of CITE-seq-Count, barcodes.tsv.gz, also needs to be translated properly at the end, especially if you want to look at GEX and HTO together, which looks like what you're doing...
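Rewriting barcodes.tsv.gz could look like the following Python sketch. It keeps the lines in their original order, because each line corresponds to one cell in the count matrix. It also leaves unmapped barcodes unchanged and counts them. `translate_barcodes_file` is hypothetical, and the `feature_to_rna` dictionary is assumed to map antibody-side barcodes to RNA barcodes.

```python
import gzip
from typing import Dict


def translate_barcodes_file(in_path: str, out_path: str,
                            feature_to_rna: Dict[str, str]) -> int:
    """Rewrite a gzipped one-barcode-per-line file with translated barcodes.
    Order is preserved; unmapped barcodes are written unchanged.
    Returns the number of unmapped barcodes."""
    unmapped = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            barcode = line.strip()
            translated = feature_to_rna.get(barcode)
            if translated is None:
                unmapped += 1
                translated = barcode
            dst.write(translated + "\n")
    return unmapped
```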

Hoohm · 2021-11-21T18:53:24Z

Yes, it's been a thorn in my side for a while now.
I've worked on 1.5.0 today: https://github.com/Hoohm/CITE-seq-Count/tree/feature/cells_argument
I'm nearly done with automated translation on the fly by just selecting the chemistry!

dmiyagi · 2022-04-06T18:54:00Z

Hi @Hoohm, if I am using TotalSeq-B with just normal 10x v3, is -n_cells still recommended? Is 1.5.0 ready? Or does what you said apply only to multiomics? Do you happen to know whether the TotalSeq-B nuclear pore antibodies can be used with 10x Multiome (RNA/ATAC)? Thank you!

naila53 mentioned this issue on Feb 25, 2023: "method keeps looking for whitelist when not provided." #176 (Closed)