Bundle Rfam seeds into cms tarball? #124

Open
afg1 opened this issue Nov 9, 2023 · 2 comments · Fixed by #132 · May be fixed by #112
Labels
enhancement New feature or request

Comments

@afg1
Contributor

afg1 commented Nov 9, 2023

As part of the RNAcentral pipeline, we run R2DT in a Nextflow process that uses a Singularity container converted from the Docker container built in this repo.

We currently set up for execution by downloading and expanding cms.tar.gz and bind-mounting the resulting folder to the correct place within the R2DT container. This worked in previous versions of R2DT (<1.3).

Now that R2DT can transfer pseudoknots, it needs access to the Rfam seed files. As these are not contained in the prepared data, R2DT attempts to download them. However, the directory /rna/r2dt/data/rfam/ is not writeable, since it is within the singularity container and not bind-mounted.
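To illustrate the failure mode, here is a minimal sketch of a pre-flight check that picks a writable location before any download is attempted. The function name `writable_data_dir` and the fallback behaviour are hypothetical and are not part of R2DT; only the standard library is used.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def writable_data_dir(preferred: str, fallback: Optional[str] = None) -> Path:
    """Return `preferred` if it is a writable directory, else a fallback."""
    preferred_path = Path(preferred)
    # Inside a read-only container image this check fails unless the
    # directory has been bind-mounted from the host.
    if preferred_path.is_dir() and os.access(preferred_path, os.W_OK):
        return preferred_path

    # Assumption: a scratch directory is an acceptable place for downloads.
    fallback_path = Path(fallback) if fallback else Path(tempfile.mkdtemp(prefix="rfam_"))
    fallback_path.mkdir(parents=True, exist_ok=True)
    return fallback_path
```

Called as `writable_data_dir("/rna/r2dt/data/rfam/")`, this would return a temporary directory when the container path is read-only.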

Would it be possible to include the Rfam seeds in the precomputed library?

@afg1 afg1 added the enhancement New feature or request label Nov 9, 2023
@AntonPetrov
Member

@afg1 Thanks for raising the issue, Andrew!

Bundling Rfam seeds with the downloadable files is a good solution as it will increase the size only marginally. However, this is a potentially breaking change because some people might have already downloaded the precomputed library and would not know that they need to download a new file. 🤔

As a workaround, R2DT could first check /rna/r2dt/data/cms. If the Rfam seed files are not there, R2DT would try to download them as it does now; if you have a new precomputed library, the files would already be present and no download would be needed.

I can look into it in the next couple of days and make a new release.

@AntonPetrov
Member

@afg1 also reported another issue related to Rfam and network requests:

requests.exceptions.InvalidSchema: No connection adapters were found for 'ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/database_files/family.txt.gz'
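For context, `requests` only ships connection adapters for `http://` and `https://`, which is why an `ftp://` URL raises `InvalidSchema`. The sketch below shows one possible stopgap using the standard library, whose `urllib.request.urlopen` does handle FTP URLs. The helper names `requests_can_fetch` and `fetch_with_urllib` are hypothetical and not taken from R2DT.

```python
import shutil
import urllib.request
from urllib.parse import urlparse

# Schemes that requests can handle with its default adapters.
REQUESTS_SCHEMES = {"http", "https"}


def requests_can_fetch(url: str) -> bool:
    """Return True if the URL scheme is one requests supports by default."""
    return urlparse(url).scheme.lower() in REQUESTS_SCHEMES


def fetch_with_urllib(url: str, dest: str) -> None:
    """Stream a URL (including ftp://) to a local file using the stdlib."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

A caller could check `requests_can_fetch(url)` first and route FTP URLs to `fetch_with_urllib`, although bundling the files removes the need for either path.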

All Rfam network requests should be eliminated and Rfam files should be bundled into the precomputed library.

This was referenced Dec 8, 2023