Where do we store lesson media? #15

Open
garrison opened this issue Jul 21, 2020 · 1 comment
Labels
lesson structures TOML data structures representing the lessons media storage Issues dealing with the storage of lesson media

Comments

@garrison
Contributor

garrison commented Jul 21, 2020

Currently, 5 gigabytes is required to store all lesson media on the site. This is a large amount of data. If we were to add it directly to this repository, it would significantly slow down cloning, and there would be no way of removing this later aside from rewriting git history. For this reason, we have decided to store lesson media outside this repository for now.

Current solution: a separate git repository

Currently, all lesson media is stored in a separate repository at https://github.com/wikiotics/wikiotics.github.io. 5 GB is still well below GitHub's limit of 100 GB per repository. However, the limit of a GitHub Pages site is 10 GB, so if we were to cross this threshold we would have to find another web host.

More importantly, having lesson media in a separate repository means that all changes to lesson media must be coordinated across two repositories. This is just fine if our goal is to archive the site, but it poses a major barrier to improving and developing additional content.

Additionally, it is important to note that this repository contains both the original media and any other files derived from it. For pictures, this includes thumbnails at smaller sizes. For audio files, this includes conversions to different formats, as well as concatenated "podcast" files.

One possible solution: git "large file storage" (LFS)

One possible solution is to use git large file storage. This would mean that the large files are hosted by an LFS provider outside the repository. GitHub allows 1 GB of storage and 1 GB per month of free bandwidth, so their free plan already does not meet our needs.

Alternatively, the LFS host could be another provider, such as Netlify. We might be able to get a free Netlify plan under their open source plan policy, but doing so would mean linking to Netlify from our main page. With this, we could also use their proprietary infrastructure for image thumbnailing, so we would not have to explicitly store thumbnails, but this comes with all the downsides of relying on proprietary web services. I don't think they have proprietary infrastructure for converting and concatenating audio files, though. Also, it is unclear to me whether outside contributors to our repository would be able to upload their "large files" directly to Netlify as part of making a pull request. Somehow I doubt this would be straightforward.

Another possible solution: IPFS

Another possible solution is to use IPFS, which is a distributed global namespace of content-addressable files. Think of it as the original vision of how we stored media in Ductus (urn:sha384:[...]) combined with a distributed network like BitTorrent. I have already "pinned" all lesson media to IPFS and created a preview that hosts all lesson media through Cloudflare's IPFS gateway (#6).

However, given acceptable performance and reliability, this could be taken a few steps further. Each lesson's TOML structure could contain a hash that resolves to an IPFS directory which contains all the media elements for a lesson. Then, a pull request could simply update this hash to a new one, thus solving the problem of coordination with a second repository.

Pinning infrastructure

Because of IPFS's architecture, content on the network can vanish at any time unless it is "pinned" by at least one computer connected to the network. So in addition to a contributor to a lesson providing a hash, we would need to make sure that the hash remains accessible.

One could imagine a virtual server running the IPFS daemon and configured to respond to events via the GitHub API. When something is pushed to any branch, or when a pull request is opened, this machine could run a simple script (e.g. grep) that outputs the full list of hashes relevant for that version of the site. The daemon could then pin the relevant hashes by adding them to IPFS's "mutable file system". Any time a pull request is updated, its pin could be updated, and when a pull request is closed, its pin could be released.
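The "simple script" part of this could look roughly like the sketch below. It is only an illustration under stated assumptions: it recognizes CIDv0 hashes only (the `Qm...` form, 46 base58 characters), it assumes lessons live in `*.toml` files, and the function names are hypothetical. It computes which hashes should be pinned or released but leaves the actual pinning to the IPFS daemon.

```python
import re
from pathlib import Path

# Assumption: hashes are CIDv0, i.e. "Qm" followed by 44 base58 characters.
# Newer CID formats would need a different pattern.
CID_V0 = re.compile(r"\bQm[1-9A-HJ-NP-Za-km-z]{44}\b")


def hashes_in_text(text: str) -> set[str]:
    """Return every CIDv0-looking hash found in a piece of text."""
    return set(CID_V0.findall(text))


def hashes_in_tree(root: str) -> set[str]:
    """Collect hashes from all *.toml files under a checked-out site version."""
    found: set[str] = set()
    for path in Path(root).rglob("*.toml"):
        found |= hashes_in_text(path.read_text(encoding="utf-8"))
    return found


def pin_changes(currently_pinned: set[str], wanted: set[str]) -> tuple[set[str], set[str]]:
    """Return (hashes to pin, hashes to release) to reach the wanted set."""
    return wanted - currently_pinned, currently_pinned - wanted
```

On each push or pull request update, the server would run `hashes_in_tree` on that version of the site and apply the result of `pin_changes`; closing a pull request corresponds to calling `pin_changes` with an empty wanted set for that pull request.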

What I describe is very general infrastructure that could be useful to many projects using IPFS. But I don't know of anything like it, at the moment. The "pinning services" in existence right now are pretty rudimentary.

Working with transformed elements

Above, I said that a lesson could contain a hash to a directory which contains all its media elements. However, we would have to decide: does this directory contain only the original media files, or does it also contain transformed elements (such as thumbnails and conversions)? If it contains only the original files, we would need infrastructure somewhere to deal with the conversions. If the directory is meant to contain everything, this puts additional burdens on the human who is editing the TOML or any potential future editing interface.
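One way to see the trade-off is to write down what "transformed elements" each original implies. The sketch below does that under purely hypothetical conventions: the thumbnail widths, the audio formats, and the `_<width>px` naming scheme are assumptions for illustration, not what the site currently uses. If the directory holds only originals, some conversion infrastructure would produce the derived files; if it is meant to contain everything, a check like `missing_derivatives` could tell an editor which files still need to be added.

```python
from pathlib import PurePosixPath

# All of these values are hypothetical choices made for this sketch.
THUMBNAIL_WIDTHS = (100, 250)
AUDIO_FORMATS = ("ogg", "m4a")
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
AUDIO_EXTS = {".wav", ".flac", ".ogg"}


def derived_names(original: str) -> list[str]:
    """List the derived files expected for one original media file."""
    path = PurePosixPath(original)
    ext = path.suffix.lower()
    if ext in IMAGE_EXTS:
        # One thumbnail per configured width, same format as the original.
        return [f"{path.stem}_{w}px{path.suffix}" for w in THUMBNAIL_WIDTHS]
    if ext in AUDIO_EXTS:
        # One conversion per configured format, skipping the original's own.
        return [f"{path.stem}.{fmt}" for fmt in AUDIO_FORMATS if f".{fmt}" != ext]
    return []


def missing_derivatives(originals: list[str], listing: set[str]) -> dict[str, list[str]]:
    """Map each original to the derived files absent from a directory listing."""
    report = {}
    for original in originals:
        missing = [name for name in derived_names(original) if name not in listing]
        if missing:
            report[original] = missing
    return report
```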

@garrison garrison added media storage Issues dealing with the storage of lesson media lesson structures TOML data structures representing the lessons labels Jul 21, 2020
@garrison
Contributor Author

Looks like GitLab's LFS option might meet our needs, even on their free plan, but it's somewhat difficult to tell for sure from their documentation. https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/1003
