Shift is too slow #11

Open
aaron-gu opened this issue Feb 27, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@aaron-gu
Contributor

Shift is very slow. I think it could be sped up by not modifying regions in place: add the shifted regions as new regions instead, then remove the old ones.

@aaron-gu aaron-gu reopened this Feb 28, 2021
@aaron-gu
Contributor Author

Well, switching to creating new regions and dropping the old ones didn't improve shift performance by much. I think the slow part of shift is the pandas DataFrame access, when the code has to look up the chromosome, start, and end position of a given row. With a 50,000-region BED file and a high shift rate of 0.8, the code has to access a large number of regions one at a time.
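To illustrate the access pattern described above, here is a rough sketch that contrasts per-row lookups with pulling the same rows out in one indexing call. The column names `chrom`, `start`, and `end`, the synthetic data, and all three helper functions are assumptions made for illustration; they are not taken from the project's code.

```python
import numpy as np
import pandas as pd


def make_regions(n, seed=0):
    """Build a synthetic BED-like frame (hypothetical column names)."""
    rng = np.random.default_rng(seed)
    start = rng.integers(0, 1_000_000, size=n)
    length = rng.integers(100, 1_000, size=n)
    return pd.DataFrame({"chrom": "chr1", "start": start, "end": start + length})


def rows_by_lookup(regions, row_ids):
    """Fetch each field of each row with its own .loc call (the slow pattern)."""
    return [
        (regions.loc[i, "chrom"], regions.loc[i, "start"], regions.loc[i, "end"])
        for i in row_ids
    ]


def rows_in_bulk(regions, row_ids):
    """Select all requested rows with a single .loc call, then iterate once."""
    subset = regions.loc[row_ids, ["chrom", "start", "end"]]
    return list(subset.itertuples(index=False, name=None))
```

Both functions return the same tuples; the second does one indexing operation instead of three per row, so timing the two on a large frame (for example with `timeit`) should show whether row access really is the bottleneck.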

@aaron-gu
Contributor Author

New idea:

  1. Take a subset of the DataFrame containing the rows to modify.
  2. Use an apply function on the start and end columns to get the shifted positions.
  3. Drop the old rows and append the new DataFrame.
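The steps above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the column names `start` and `end`, the `max_shift` parameter, the uniform random offset, and the function name `shift_regions` are all hypothetical, and it uses whole-column arithmetic in place of an apply function, since that avoids a Python-level call per row.

```python
import numpy as np
import pandas as pd


def shift_regions(regions, shift_rate, max_shift, seed=None):
    """Shift a random fraction of regions by a random offset (hypothetical API)."""
    rng = np.random.default_rng(seed)

    # 1. Choose the rows to modify: each row is picked with probability shift_rate.
    mask = rng.random(len(regions)) < shift_rate
    subset = regions[mask].copy()

    # 2. Compute shifted positions for the whole subset at once.
    offsets = rng.integers(-max_shift, max_shift + 1, size=len(subset))
    # Limit leftward shifts so that no start position becomes negative.
    offsets = np.maximum(offsets, -subset["start"].to_numpy())
    subset["start"] = subset["start"] + offsets
    subset["end"] = subset["end"] + offsets

    # 3. Drop the old rows and append the shifted ones.
    return pd.concat([regions[~mask], subset], ignore_index=True)
```

Because the same offset is added to `start` and `end`, each region keeps its length. The shifted rows end up at the bottom of the result, so the frame would need re-sorting if the output has to stay in coordinate order.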

@aaron-gu aaron-gu added the enhancement New feature or request label Mar 12, 2021
@aaron-gu aaron-gu closed this as completed Apr 2, 2021
@aaron-gu aaron-gu reopened this Apr 2, 2021