Migrate to using Seqera Containers #5832

Open
Tracked by #5828
ewels opened this issue Jun 19, 2024 · 7 comments

Comments

@ewels
Member

ewels commented Jun 19, 2024

Seqera Containers is a new service to provide Docker + Singularity containers from any Conda / PyPI packages. Images are generated on demand and can include multiple packages.

Links for background:

We should strip out the BioContainers quay.io docker images + Galaxy server Singularity images and replace with image URIs from Seqera Containers.

We should use this change as an opportunity to rethink the optimal code structure for defining image names. This is currently under discussion. Group consensus can be posted here once achieved for broader community approval.

Milestone to track broad progress on this update: https://github.com/nf-core/modules/milestone/6

@subwaystation
Contributor

It is still unclear to me how images generated on demand would not harm reproducibility.

@subwaystation
Contributor

This will also increase the used CPU hours, right?
Also, it feels a bit like Seqera will have a monopoly of containers. I am aware that Seqera did an awesome job developing Nextflow open source. But moving all container logic to Seqera (and I don't know the details here, maybe I am uninformed) gives a weird taste.

@CharlotteAnne
Contributor

What’s the purpose of enforcing this? Biocontainers are automatically generated for every bioconda package and get automatically generated upon bioconda software version bump. I’m not seeing the reason to then manually create a seqera container?

@ewels
Member Author

ewels commented Jun 20, 2024

Hi both - thanks for your comments. You're right that this issue precedes some community discussion that we still need to have. That started with the recent two bytesize talks and resulting conversations on Slack, but we should still open it up to wider input.

To address your concerns:


It is still unclear to me how images generated on demand would not harm reproducibility.

Wave generates images on demand, but Seqera Containers is a registry that sits behind Wave. The intention here is that the images are generated on demand by the developer when a package is updated - but then they are cached in the Seqera Containers registry. The image URIs will then be hardcoded into pipelines and the exact same container images will always be fetched by all users - just the same as they are today. We're also going to introduce conda-lock files (see #5835) so reproducibility should be even better than it is today.
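
To illustrate the point about hardcoded URIs: a reference only resolves to one specific image when it carries an explicit tag or digest. Below is a minimal Python sketch of such a check. The function name `is_pinned` is hypothetical and is not part of nf-core tooling or Seqera Containers. A non-`latest` tag is treated as pinned by convention, though strictly only a digest is immutable.

```python
def is_pinned(image_uri: str) -> bool:
    """Return True if the reference names one specific image build.

    Hypothetical helper for illustration only.
    """
    # A digest reference (name@sha256:...) is content-addressed.
    if "@sha256:" in image_uri:
        return True
    # Otherwise look for a tag on the last path segment, so that a registry
    # port such as localhost:5000 is not mistaken for a tag.
    last_segment = image_uri.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means the registry default ("latest")
    tag = last_segment.split(":", 1)[1]
    return tag not in ("", "latest")
```

With this sketch, a made-up reference such as `example.org/library/samtools:1.20` counts as pinned, while `example.org/library/samtools` or `example.org/library/samtools:latest` does not.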


This will also increase the used CPU hours, right?

No - Wave / Seqera Containers handles the build server side. As mentioned above, the generated images are stored in a registry and simply downloaded. So just as today, native images will be downloaded. No increase in CPU hours.


Also, it feels a bit like Seqera will have a monopoly of containers. But moving all container logic to Seqera gives a weird taste.

This one is more subjective. We will not make it a requirement to use Seqera Containers, just as we don't make it a requirement to use BioContainers today, so for me it feels about the same. We will keep the vast majority of build logic (eg. conda env files, conda lock files) on the nf-core side and will be free to reverse the decision at any point should we wish.


Biocontainers are automatically generated for every bioconda package and get automatically generated upon bioconda software version bump. I’m not seeing the reason to then manually create a seqera container?

One of the main reasons for adopting Seqera Containers is that it'll have even more automation and less manual work than the current setup. The process will roughly be:

  • Conda environment.yml files created or edited in a PR
    • Either manually, or with automatic version bumps on new BioConda releases by Renovate
  • New images requested from Seqera Containers via CI automation and pinned to module
  • (Future goal) For automatic Renovate bumps, the PR will be merged automatically if tests pass
  • Pipeline developers pull in module updates as currently, which will come with the new containers

Note that this process will also work for multi-package containers, which is not currently the case with BioContainers (mulled images). So it should represent a significantly easier workflow.
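
As a rough illustration of the first step in the list above, a version bump in a conda `environment.yml` amounts to rewriting a single pin. The Python sketch below does this with a regular expression. The helper `bump_pin` is hypothetical, is not how Renovate or nf-core CI actually implement bumps, and only handles simple `name=version` pins without a build string.

```python
import re


def bump_pin(env_text: str, package: str, new_version: str) -> str:
    """Rewrite the version pin of `package` in environment.yml text.

    Hypothetical helper: handles only simple "name=version" pins,
    optionally prefixed with a channel ("bioconda::name=version").
    """
    # Group 1 captures everything up to and including "=", e.g.
    # "  - bioconda::samtools=", so that only the version is replaced.
    pattern = re.compile(
        rf"^([ \t]*-[ \t]*(?:[\w.-]+::)?{re.escape(package)}=)[^\s=]+[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash escapes in new_version
    # being interpreted by re.sub.
    return pattern.sub(lambda m: m.group(1) + new_version, env_text)


# Illustrative environment file (contents are an assumption).
example_env = """channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::samtools=1.19
  - conda-forge::pigz=2.8
"""

bumped = bump_pin(example_env, "samtools", "1.20")
```

In `bumped`, only the samtools line changes. Packages that are not listed leave the text untouched, so a CI job could compare input and output to decide whether a new image needs to be requested.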

Note that although Seqera Containers has a web interface (https://seqera.io/containers/) it also works programmatically via CLI, API and Nextflow (eg. nextflow inspect). Check out the recent bytesize with YouTube recording to see all this in action.

BioContainers has been brilliant for nf-core, but there are several reasons to move away:

  • The API is down a lot, which breaks a lot of our CI testing and developer tooling.
  • Docker images are hosted on quay.io, which has also had reliability problems
  • The BioContainers base image is outdated, with old Docker Image Format v1 and manifest version 2. This causes problems with general usage as well as with GitPod environments
  • Mulled containers are awkward and slow to make
  • We have limited / no control over image generation to solve any of the above issues

Wave and Seqera Containers have been built specifically for our community, based on our combined experience and needs. So hopefully we can mitigate / avoid these pitfalls.
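
For anyone wanting to check whether an image is affected by the outdated format mentioned above, one option is to look at the `schemaVersion` field of its registry manifest. This sketch assumes the problem corresponds to Docker's deprecated "manifest version 2, schema 1" format, whose manifests carry `"schemaVersion": 1`. The function `classify_manifest` is hypothetical and expects the manifest JSON to have been fetched and parsed already.

```python
def classify_manifest(manifest: dict) -> str:
    """Classify a parsed image manifest by its schemaVersion field.

    Hypothetical helper for illustration; fetching the manifest from a
    registry is out of scope here.
    """
    schema = manifest.get("schemaVersion")
    if schema == 1:
        # Docker manifest v2, schema 1: the deprecated legacy format.
        return "legacy"
    if schema == 2:
        # Docker manifest v2, schema 2 and OCI manifests both use 2.
        return "current"
    return "unknown"
```

A manifest that parses to `{"schemaVersion": 1, ...}` would be reported as `"legacy"`, which is the case where rebuilding the image is the only real fix.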


I hope these responses help clear things up! Shout if you have any questions or concerns, and I'd recommend checking out the podcast and bytesize videos in the top comment as they go through how much of this works.

@JoseEspinosa
Member

To provide a practical example of these points:

  • The BioContainers base image is outdated, with old Docker Image Format v1 and manifest version 2

and

  • We have limited / no control over image generation to solve any of the above issues

When opening a PR to chipseq I found the "docker image format v1" error, see here. To fix the error, I tried to bump to a newer version of the tool (phantompeakqualtools), but it turns out that the image's last available version was built in March 2021. Anyhow, I tried to update the image on the module and hit the same error again, see here.
In this case, the most straightforward fix would be to use the Wave version of the package, since that image will be compliant with the new Docker specifications. Otherwise, we would either have to wait for Bioconda to update the images (I am not sure whether there is a timeline for this), or resort to a dirty hack such as creating a mulled image with phantompeakqualtools and a random small package to trigger a new build.

@JoseEspinosa
Member

As shown here, using Wave images fixes the issue above.
