Incorrect gene.annotation processing for sciPlex3 #7

Open
FarzanT opened this issue May 1, 2023 · 0 comments


FarzanT commented May 1, 2023

Using header=None causes the first line of the gene annotation file, which is a header, to be read as a data row in the gene annotation dataframe. The extra row then changes the dimensions of the whole dataset.

https://github.com/sanderlab/scPerturb/blob/fac49ee392f6873b50fad27550e82f6507158834/dataset_processing/SrivatsanTrapnell2020.py#L77C19-L78

Currently, sciPlex3's var looks like this:

srivatsan.var
Out[4]: 
                     ensembl_id   ncounts  ncells
gene_symbol                                      
nan          id gene_short_name   26582.0   23228
nan:1           ENSG00000000003      35.0      33
nan:2           ENSG00000000005  163109.0  116153

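So all the genes appear to be shifted: the header line (id gene_short_name) occupies the first row, and gene_symbol is nan in the rows shown. This can drastically affect downstream tasks, since it is no longer clear which genes are expressed.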
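A minimal sketch of the effect, assuming the annotation file is read with pandas `read_csv` and is tab-separated (both are assumptions; the in-memory file below is a hypothetical two-gene stand-in, not the real sciPlex3 annotation):

```python
import io

import pandas as pd

# Hypothetical stand-in for the gene annotation file: one header line, two genes.
raw = (
    "id\tgene_short_name\n"
    "ENSG00000000003\tTSPAN6\n"
    "ENSG00000000005\tTNMD\n"
)

# header=None: the header line is kept as data, so the frame has one row too many.
wrong = pd.read_csv(io.StringIO(raw), sep="\t", header=None)

# header=0: the first line is consumed as column names, leaving one row per gene.
right = pd.read_csv(io.StringIO(raw), sep="\t", header=0)

print(wrong.shape, right.shape)
print(wrong.iloc[0].tolist())
```

With `header=None` the frame gains a leading row holding the column names, which is the kind of off-by-one that would misalign the annotation with the count matrix. Whether `header=0` alone is the right fix for the linked script depends on how the rest of the processing indexes the columns.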

stefanpeidli self-assigned this on Dec 22, 2023
stefanpeidli added the bug and database labels on Dec 22, 2023