
WIP irlba method for sparse matrices #8

Open

flying-sheep

Needs docs and a decision on whether this is the way to proceed or whether we need to make prep sparse-friendly.

Fixes #7

@hredestig
Owner

Looks great so far! This is an old project and sadly lacks unit tests, but I will try it out over the weekend.

It's been a very long time since I worked with sparse data, but I guess those who do already have good tools for it, so I wonder if making prep sparse-friendly really adds value for anyone(?) I like your current solution.

@flying-sheep
Author

These days there’s a lot of sparse single cell transcriptomics data, since current methods both produce huge amounts of data (e.g. 20k genes × 100k cells) and suffer from a lot of dropout (0 recorded instead of small values).

Using PCA as a preprocessing step speeds things up and saves memory, if the PCA method can handle sparse data, that is.

@hredestig
Owner

hredestig commented Aug 11, 2019

After looking at this more carefully, I note that this is more complicated than it might first seem. Calling prep like you suggested isn't good, since the center and scale vectors are used later but are then not returned by prcomp_irlba. I fixed that in my irlba branch, https://github.com/hredestig/pcaMethods/tree/irlba (not entirely sure it's sparse-aware, but it's done the same way that irlba does it). But then I realized that fitted and predict must also be made sparse-aware :/

Wanna have a look at that?
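
To make the centering problem above concrete: subtracting column means from a sparse matrix fills in every zero, so sparse-aware solvers generally keep the center vector separate and apply it inside each matrix-vector product instead. The sketch below illustrates that idea in plain Python. It is not code from pcaMethods or irlba, and all names in it (`column_means`, `centered_matvec`, the dict-of-coordinates storage) are hypothetical choices made for illustration only.

```python
# Hypothetical illustration; not code from pcaMethods or irlba.
# A sparse matrix X is stored as {(row, col): value}; zeros are omitted.


def column_means(entries, n_rows, n_cols):
    """Column means of X, computed from the nonzero entries only."""
    sums = [0.0] * n_cols
    for (_, j), value in entries.items():
        sums[j] += value
    return [s / n_rows for s in sums]


def centered_matvec(entries, n_rows, center, v):
    """Compute (X - 1 * center^T) @ v without building the centered matrix.

    The centered matrix is dense, but its product with v splits into
    the sparse product X @ v minus one scalar (center . v) per row.
    """
    result = [0.0] * n_rows
    for (i, j), value in entries.items():
        result[i] += value * v[j]  # sparse part: X @ v
    shift = sum(c * x for c, x in zip(center, v))  # scalar: center . v
    return [r - shift for r in result]  # same shift for every row


# A 3 x 2 matrix with only two nonzero entries.
entries = {(0, 0): 3.0, (2, 1): 6.0}
center = column_means(entries, n_rows=3, n_cols=2)
product = centered_matvec(entries, 3, center, [1.0, 1.0])
```

Because `center` is kept as its own vector rather than baked into the data, it stays available for later steps that need to apply the same shift to new data, which is the role the center and scale vectors play in the comment above.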

Successfully merging this pull request may close these issues.

Support IRLBA method and use it as default for sparse matrices