[Long Term Strategy and Priorities] Migration of S3 Bucket Payments to Foundation #86

Open
refroni opened this issue Jun 5, 2023 · 25 comments

Comments

@refroni
Contributor

refroni commented Jun 5, 2023

Listed below are all of the options and possibilities that have been brought up or are being explored as long-term improvements/resolutions. Please add anything that might be worth raising or discussing, including alternative options, on the topic.
Discussion: https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672
Thank you to joepie91 and raitobezarius for helping put this initial list together from the matrix/discourse discussions:

  1. Tahoe-LAFS (distributed storage, not S3-compatible out of the box, can support storage nodes of any size and complexity including low-trust, but is slow) + central gateway server(s) to bridge to Fastly
  2. Tahoe-LAFS but with narinfo stored directly on the central gateway server(s) for better performance
  3. Garage (distributed storage, S3-compatible, flexible in storage node size, but nodes must be reliable and trustworthy, and cluster configuration must stay reasonably stable)
  4. Minio (distributed storage-ish, S3 compatible, fairly rigid expectations in cluster layout, commercial so future FOSS status is questionable)
  5. Single Big Server (optionally with a replica) serving up the entire cache:
    • as owned hardware, colocated at a datacenter, with or without outsourced hardware management
    • as one or more rented dedicated servers, so hardware issues are handled by the datacenter
    • supplied by one or more sponsors
  6. Running university/ISP mirror schemes like many other distros do (eg. MirrorBrain)
  7. Hosting (historical) content at an academic/research institution
  8. Deleting old items from the cache (irrecoverable)
  9. Deleting items from staging once no longer needed
  10. Aggressively deduplicating and/or compressing our storage
  11. Ceph (distributed filesystem for petabyte scale, S3 compatible, industry standard, non-trivial to operate)
refroni changed the title to "[Long Term Resolutions and Priorities] Migration of S3 Bucket Payments to Foundation" on Jun 5, 2023
refroni changed the title to "[Long Term Strategy and Priorities] Migration of S3 Bucket Payments to Foundation" on Jun 5, 2023
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672/96

@7c6f434c
Member

7c6f434c commented Jun 5, 2023

Maybe for option 8 there is an 8.1: delete some of the old non-fixed-output paths; technically irrecoverable, but rebuildable in a «close enough» manner if desired.

@vcunat
Member

vcunat commented Jun 5, 2023

8.2 I like the notion that if we had good bit-for-bit reproducibility (and we do in most packages, I think), keeping the tiny *.narinfo files with their signed hashes could be quite beneficial: anyone could supply (or reupload) the binary at any time later, even in a third-party community/distributed fashion, for those old builds that were (wrongly) assumed to be unneeded.
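
For reference, a *.narinfo is only a small text record, roughly like the sketch below (placeholder values; the field set approximates what cache.nixos.org serves). As far as I understand, the Sig line covers the store path, NAR hash, NAR size and references, so a bit-for-bit identical rebuild uploaded later by anyone would still verify against a retained narinfo.

```
StorePath: /nix/store/<hash>-hello-2.12.1
URL: nar/<file-hash>.nar.xz
Compression: xz
FileHash: sha256:<file-hash>
FileSize: 12345
NarHash: sha256:<nar-hash>
NarSize: 67890
References: <hash>-glibc-2.37
Sig: cache.nixos.org-1:<base64 signature>
```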

@nh2

nh2 commented Jun 5, 2023

Request for addition:

  • Self-host on Hetzner-dedicated+Ceph: $2.3/TB storage, $0.15/TB traffic, run by community infra team

I suggested this as a short-term option in #82 (comment) but it is of course also a long-term possibility, and an alternative to Tahoe-LAFS or Minio.

@RaitoBezarius
Member

  • Ceph (distributed filesystem for petabyte scale, S3 compatible, industry standard, non-trivial to operate)

@jtolio

jtolio commented Jun 9, 2023

  • Self-run Storj (open source; can be run on private instances in addition to the public service offering; we have a number of folks switching to us after tearing their hair out with Ceph. Works with self-run or community-contributed storage nodes. Does not require any involvement with cryptocurrency or blockchains.)

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/s3-update-and-recap-of-community-call/28942/1

@abathur
Member

abathur commented Jun 9, 2023

I'm not sure if they belong here, in the short-term thread, or in a dedicated thread, but I imagine there's a category of things that might help reduce the egress charges (if we aren't moving somewhere that offers to cover them).

Some of them overlap with items already mentioned for general cost reduction (like deduplication, deleting items), but I imagine there are more. (#82 (comment) in the short-term thread mentions two.)

Maybe also:

  • Most sources can probably still be fetched from their original location or a mirror and used to populate a new cache directly; we'd only need to export whatever has disappeared or now has a hash conflict.
  • If there aren't many Very Large packages, it might be tractable to rebuild and directly populate the top N (plus anything that had to get built in the process), provided they are fully reproducible.
  • Start duplicating data outside of S3 early by doing the above, plus things like:
    • push an extra copy of new Hydra builds
    • slip a node between S3 and Fastly to double-dip the egress fees we'd pay anyway

If a new cache were in place early and the extra latency of using two backends at Fastly (only hitting S3 for things that aren't already in the new store) were tolerable, I guess the above could also be paired with deleting the corresponding paths from S3 to reduce storage costs there. (This would also make it progressively easier/cheaper to understand what's left, perform deduplication, etc.)

@misuzu

misuzu commented Jun 9, 2023

Running university/ISP mirror schemes like many other distros do (eg. MirrorBrain)
Deleting old items from the cache (irrecoverable)
Deleting items from staging once no longer needed
Aggressively deduplicating and/or compressing our storage

Do we even have the tools for doing stuff like this? If we do, are they documented? How easy are they to use?

@Nabile-Rahmani

Could we extend the binary cache concept to include a streamlined peer-to-peer service that any Nix machine could opt into?

Users could add services.nix-serve-p2p.{enable,max-upload-speed} to their configuration and their machine's store would become part of the cache substituters. The manual steps would be automated, and a self-hosted daemon would serve data without the need to configure an nginx proxy.

Automatic peer discovery has to be taken into account:

  • It could be implemented directly into the way Nix finds substituters, separately from the HTTP protocol.
  • The nix-serve-p2p service could signal its availability on start/stop (transmitting its address/port and trusted key) to a master server, which would redirect requests to https://p2p-cache.nixos.org to one of those substituters at random; that URL would then be added to nix.settings.substituters. It certainly doesn't sound ideal to hit a centralised server, though, especially if there's a lot of trial and error to find a mirror that contains the store paths we want.

(I apologise if I said a bunch of useless nonsense.)
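
A very rough sketch of the "master server" redirect idea above, purely hypothetical: the registration endpoint, port and peer list are made up, and trusted-key verification is omitted entirely.

```python
# Toy redirector: peers register their substituter base URL, and any other
# request is answered with a 302 to a randomly chosen registered peer.
import random
import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer

PEERS: set[str] = set()  # registered substituter base URLs, e.g. "http://peer.example:8080"

class P2PRedirector(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/register/"):
            # A peer announces itself with GET /register/<url-encoded base URL>.
            PEERS.add(urllib.parse.unquote(self.path[len("/register/"):]))
            self.send_response(204)
            self.end_headers()
        elif PEERS:
            # Redirect e.g. /<hash>.narinfo or /nar/<...> to a random peer.
            self.send_response(302)
            self.send_header("Location", random.choice(sorted(PEERS)) + self.path)
            self.end_headers()
        else:
            self.send_response(503)  # nobody is seeding right now
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), P2PRedirector).serve_forever()
```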

@RaitoBezarius
Member

Could we extend the binary cache concept to include a streamlined peer-to-peer service that any Nix machine could opt into?

Users could add services.nix-serve-p2p.{enable,max-upload-speed} to their configuration and their machine's store would become part of the cache substituters. The manual steps would be automated, and a self-hosted daemon would serve data without the need to configure an nginx proxy.

Automatic peer discovery has to be taken into account:

  • It could be implemented directly into the way Nix finds substituters, separately from the HTTP protocol.
  • The nix-serve-p2p service could signal its availability on start/stop (transmitting its address/port and trusted key) to a master server, which would redirect requests to https://p2p-cache.nixos.org to one of those substituters at random; that URL would then be added to nix.settings.substituters. It certainly doesn't sound ideal to hit a centralised server, though, especially if there's a lot of trial and error to find a mirror that contains the store paths we want.

(I apologise if I said a bunch of useless nonsense.)

IMHO, this is a distribution question, not a storage question. And anyone is free to open a long term issue / exploration on better distribution layers for the Nix store :).

@Nabile-Rahmani

Nabile-Rahmani commented Jun 10, 2023

IMHO, this is a distribution question, not a storage question. And anyone is free to open a long term issue / exploration on better distribution layers for the Nix store :).

Got it, though I guess if we were to go all in with community load sharing, storage could purge the binary caches (reproducible output, not "valuable" sources) and reduce costs on that front as a result.

This avenue would make sense if there were a large number of seeders able to mostly take over the existing hosting solution, or if its benefits outweighed the costs of S3/Fastly.

@RaitoBezarius
Member

IMHO, this is a distribution question, not a storage question. And anyone is free to open a long term issue / exploration on better distribution layers for the Nix store :).

Got it, though I guess if we were to go all in with community load sharing, storage could purge the binary caches (reproducible output, not "valuable" sources) and reduce costs on that front as a result.

This avenue would make sense if there were a large number of seeders able to mostly take over the existing hosting solution, or if its benefits outweighed the costs of S3/Fastly.

I think a lot of people would argue there's no real incentive for the community to maintain a certain QoS, and then we are back to the centralized problem anyway, so I would not focus my own time there too quickly before we get a centralized system that is sustainable.

@wmertens

Aggressively deduplicating and/or compressing our storage

Ideally, we'd store builds in large chunk stores after pre-processing them to move store paths out of the files (*). A frontend can pretend to be a NAR store.

Then, we'd make nix-store aware of this format, and instead of requesting NARs it would fetch the needed chunks and combine them. This way, less data is transferred to the client, and the frontend is no longer needed.

I believe this will reduce the amount of data stored and transferred severalfold.


(*) stripping store paths, here's how I see that happening:

Given that rolling chunks for deduplication are still quite big, and I suspect that many files change between builds only because the Nix store paths embedded in them changed, how about pre-processing all stored files as follows:

The idea is to move the Nix store paths (only up to the first path component) into a separate list and remove them from the file. You would then replace a file F with a tuple (Fx, L), where Fx is the binary contents of the file with every sequence matching /nix/store/[^/]+ removed, and L is a list of (position, path) tuples that record the removed paths.

This can be encoded in a streaming manner, and decoded in a streaming manner provided you have access to the tuples L.

L can be compressed further by making each position relative to the end of the previous match, and making each path an index into a list of the paths found. We then get Lrel, a list of (relPosition, pathIndex) tuples, and P, a list of paths, so F becomes (Fx, Lrel, P).

This result should chunk much better; I am hoping that many rebuilt files will have the same Fx and Lrel, and that only P will differ.

The /nix/store/ part should be configurable during encoding.
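
A minimal Python sketch of that encode/decode step (my own illustration, not an existing tool; the regex and tuple layout follow the description above):

```python
# Strip embedded store paths: F -> (Fx, Lrel, P), and restore them again.
import re

STORE_RE = re.compile(rb"/nix/store/[^/]+")  # only up to the first path component

def strip_store_paths(data: bytes):
    fx = bytearray()
    lrel, paths, index = [], [], {}
    last_end = 0
    for m in STORE_RE.finditer(data):
        fx += data[last_end:m.start()]
        path = m.group(0)
        if path not in index:
            index[path] = len(paths)
            paths.append(path)
        # Position is stored relative to the end of the previous match.
        lrel.append((m.start() - last_end, index[path]))
        last_end = m.end()
    fx += data[last_end:]
    return bytes(fx), lrel, paths

def restore_store_paths(fx: bytes, lrel, paths) -> bytes:
    out = bytearray()
    pos = 0
    for rel, idx in lrel:
        out += fx[pos:pos + rel] + paths[idx]
        pos += rel
    out += fx[pos:]
    return bytes(out)
```

restore_store_paths(*strip_store_paths(data)) round-trips the original bytes, and Fx and Lrel should stay identical across rebuilds whenever only the embedded store paths change, which is exactly the property that should make chunking work well.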

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-short-term-resolution/29413/1

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-short-term-resolution/29413/3

@chkno
Member

chkno commented Jun 23, 2023

I think a lot of people would argue there's no real incentive for the community to maintain a certain QoS

When clients fetch from multiple redundant sources in parallel, slow or unavailable sources have very little impact on performance; BitTorrent solved this problem twenty years ago. See also The Tail at Scale.
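
To illustrate the point (a toy sketch, not how Nix substitution works today; the mirror URLs are placeholders): racing the same request against several substituters and keeping the first success makes one slow or dead peer almost invisible to the client.

```python
# Race several redundant mirrors for the same object and return the first success.
import concurrent.futures
import urllib.request

MIRRORS = [
    "https://mirror-a.example/nix-cache",
    "https://mirror-b.example/nix-cache",
    "https://mirror-c.example/nix-cache",
]

def fetch_first(relpath: str, timeout: float = 10.0) -> bytes:
    def get(base: str) -> bytes:
        with urllib.request.urlopen(f"{base}/{relpath}", timeout=timeout) as resp:
            return resp.read()

    with concurrent.futures.ThreadPoolExecutor(max_workers=len(MIRRORS)) as pool:
        futures = [pool.submit(get, base) for base in MIRRORS]
        for fut in concurrent.futures.as_completed(futures):
            try:
                return fut.result()  # first mirror to answer wins
            except Exception:
                continue  # that mirror failed or timed out; wait for the next one
    raise RuntimeError(f"no mirror could serve {relpath}")
```

(A real client would also cancel the losing requests and verify the NAR hash from the narinfo.)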

Could we extend the binary cache concept to include a streamlined peer-to-peer service that any Nix machine could opt into?

I think a distributed storage model like this is the best path to a long-term-sustainable, no-monetary-cost solution. There is so much good will in the general user community. If it's as simple as responding to an "uncomment this line to help support this community" note in the nixos-generate-config default config, I think we'd easily get distributed storage serving capacity adequate to handle the binary cache. And if I'm wrong about this and we only get, say 50% of the needed capacity from volunteers, that's a 50% cost reduction on whatever service picks up the rest of the load.

@Nabile-Rahmani

I think a distributed storage model like this is the best path to a long-term-sustainable, no-monetary-cost solution. There is so much good will in the general user community. If it's as simple as responding to an "uncomment this line to help support this community" note in the nixos-generate-config default config, I think we'd easily get distributed storage serving capacity adequate to handle the binary cache. And if I'm wrong about this and we only get, say 50% of the needed capacity from volunteers, that's a 50% cost reduction on whatever service picks up the rest of the load.

The only risk I see in participating is potentially leaking private/secret derivation data, is it not?

Currently, it looks like servers help clients know in advance whether cached results exist by exposing a listing of store paths (curl https://releases.nixos.org/nixos/23.05/nixos-23.05.1272.ecb441f2206/store-paths.xz | xzless).

But in a peer-to-peer system, could leechers instead query seeders on demand for paths they already know about (i.e. public packages), to reduce the attack surface?

Attackers would have to brute-force the private hash plus the derivation name and version, but I don't know how risky that still is.

Additionally, the service implementation could rate-limit repeated failed queries, but only when they are not for public store paths from registered substituters, since we don't want to rate-limit legitimate queries.

@zimbatm
Member

zimbatm commented Jun 24, 2023

What I would suggest is to start a P2P Nix cache working group and discuss the implementation details there. The best way to do this is to announce it on Discourse and gather interested members. And then start implementing a prototype to demonstrate the feasibility.

What's nice is that we have multiple concurrent efforts, and all of them are complementary AFAIK. And the exact shape of the new solution will mostly depend on your participation.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/peer-to-peer-binary-cache-rfc-working-group-poll/29568/1

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/peer-to-peer-binary-cache-rfc-working-group-poll/29568/5

chkno referenced this issue in NixOS/nixpkgs Jun 24, 2023
@7c6f434c
Member

And if I'm wrong about this and we only get, say 50% of the needed capacity from volunteers, that's a 50% cost reduction on whatever service picks up the rest of the load.

The question is not so much about getting enough capacity; it is about having a good understanding of availability projections, and also of whether people will have time to perform active maintenance (like keeping the supporting software up to date) on their chunks of distributed storage.

See also: the OfBorg community-provided builders, where Graham complained that it was not very predictable whether people would have time to update the builder code (the updates themselves were smooth, BTW, and eventually it all ended up with centrally managed infrastructure).

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-short-term-resolution/29413/14

@endgame

endgame commented Jun 28, 2023

Observation: one of the big challenges with moving off of AWS is the egress cost for the S3 bucket, whether via outbound bandwidth or Snow Family products. If we had a job somewhere that downloaded only items that were already in the Fastly cache, we'd be able to accumulate a subset of built derivations without incurring egress charges. This seems like it would converge to a decent subset of the files currently in S3 and, more importantly, it would put an upper bound on the amount of "old stuff" we'd have to egress.
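
For instance, such a job could look at the X-Cache header that Fastly normally adds to responses. A hedged sketch of the selection logic (hypothetical helper; note that a MISS here still triggers one fetch from the S3 origin unless the CDN is configured to answer cache-only probes):

```python
# Keep a narinfo only if Fastly reports it was served from its own cache (a HIT),
# i.e. without going back to the S3 origin.
import urllib.request
from typing import Optional

def narinfo_if_cached(store_hash: str) -> Optional[bytes]:
    url = f"https://cache.nixos.org/{store_hash}.narinfo"
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
        # Fastly's X-Cache value is typically "HIT" or "MISS" (sometimes per POP, comma-separated).
        if "HIT" in resp.headers.get("X-Cache", ""):
            return body
    return None
```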

@rjpcasalino

rjpcasalino commented Nov 30, 2023

I'm trying to keep track of this and am dropping it here for others; the last meeting was on 11-21-2023: https://pad.lassul.us/nixos-cache-gc# (let me know if this is the wrong place).
