mutable mount incremental uploads #197

Open
ransomw1c opened this issue Jun 5, 2019 · 0 comments
Labels
filesystem (datamon fs related issues), P1 (This is next and will move to P0 once planned.)

Comments

@ransomw1c
Contributor

current use of datamon within Argo workflows is via bundle upload, so all the bytes in a bundle have to be shuttled into datamon at once. this issue is about decreasing the time between an Argo DAG node deciding to commit a bundle and getting a response indicating that its data has been safely stored. if the Argo workflows instead use bundle mount new to create new bundles, datamon can take a more eager approach to storing data.

an implementation detail of note here is the use of a secondary, blob-store-like "staging area": files are to be hashed and stored to GCS as soon as their handles are released by the OS. the staging area is needed because the blob store is treated as immutable, while files may be appended to or otherwise modified before the final commit. uploading directly to the blob store on every file descriptor close would therefore leave behind hashes (and the corresponding storage costs) that aren't accessible via metadata: i.e. we'd wind up paying to store data that the application has no access to.

so while closing a file at the application level will always result in a new entry in the staging area, only the most recent entry for that file will be recorded in the bundle metadata.
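
a minimal sketch of the staging behavior described above, in Go. everything here is hypothetical and for illustration only: the Staging and Bundle types, the OnRelease function, the in-memory maps standing in for GCS, and the choice of SHA-256 (the issue does not say which hash is used).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Staging is a hypothetical in-memory stand-in for the staging area:
// a content-addressed store keyed by the hash of each blob.
type Staging struct {
	blobs map[string][]byte
}

// Bundle is a hypothetical stand-in for bundle metadata:
// it maps each file path to the hash of its most recent contents.
type Bundle struct {
	files map[string]string
}

func NewStaging() *Staging { return &Staging{blobs: map[string][]byte{}} }
func NewBundle() *Bundle   { return &Bundle{files: map[string]string{}} }

// OnRelease models the OS releasing a file handle: the contents are
// hashed and written to the staging area, and the bundle metadata is
// pointed at the new hash. Earlier entries for the same path stay in
// the staging area but are no longer referenced by the metadata.
func OnRelease(s *Staging, b *Bundle, path string, contents []byte) string {
	sum := sha256.Sum256(contents) // assumption: SHA-256 chosen only for the sketch
	key := hex.EncodeToString(sum[:])
	s.blobs[key] = append([]byte(nil), contents...) // copy so later writes don't alias
	b.files[path] = key
	return key
}

func main() {
	s, b := NewStaging(), NewBundle()
	OnRelease(s, b, "out.csv", []byte("a,b\n"))
	latest := OnRelease(s, b, "out.csv", []byte("a,b\n1,2\n")) // file appended to, closed again
	fmt.Println("staging entries:", len(s.blobs))
	fmt.Println("metadata points at latest:", b.files["out.csv"] == latest)
}
```

the first entry for out.csv is exactly the kind of blob that would be stranded if it had been written straight to the immutable blob store.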

of course, the final destination for all file contents is the blob store. yet to facilitate accessing data between Argo DAG nodes as quickly as possible, datamon may transfer bytes directly from the staging area when accessing a bundle whose blobs have not yet all been transferred out of the staging area: it will always look for blobs in the blob store first, then fall back to the staging area (if a staging area is specified).
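
the lookup order above could be sketched as follows. the Store interface, MapStore, ErrNotFound, and ReadBlob are hypothetical names invented for this example, not datamon's actual API; passing a nil staging store stands in for "no staging area specified".

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel error for a missing blob.
var ErrNotFound = errors.New("blob not found")

// Store is a hypothetical read-only view of a content-addressed store.
type Store interface {
	Get(key string) ([]byte, error)
}

// MapStore is an in-memory Store used only for this sketch.
type MapStore map[string][]byte

func (m MapStore) Get(key string) ([]byte, error) {
	data, ok := m[key]
	if !ok {
		return nil, ErrNotFound
	}
	return data, nil
}

// ReadBlob looks in the blob store first and falls back to the staging
// area only when the blob is missing there and a staging area was given.
func ReadBlob(blob Store, staging Store, key string) ([]byte, error) {
	data, err := blob.Get(key)
	if err == nil {
		return data, nil
	}
	if !errors.Is(err, ErrNotFound) || staging == nil {
		return nil, err
	}
	return staging.Get(key)
}

func main() {
	blob := MapStore{"k1": []byte("committed")}
	staging := MapStore{"k2": []byte("pending")}
	for _, key := range []string{"k1", "k2", "k3"} {
		data, err := ReadBlob(blob, staging, key)
		fmt.Printf("%s: %q err=%v\n", key, data, err)
	}
}
```

checking the blob store first means a blob that has already been moved out of staging is read from its final location, and the fallback only matters for bundles that are still in flight.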

given such behavior of the end-user binary (i.e. the command-line interface), the only remaining component of the overall incremental upload design is some method to garbage collect the staging area: that is, to transfer blobs associated with bundles into the blob store and to discard unused blobs.
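
one possible shape for that garbage collection pass, as a sketch only. CollectStaging and its map-based stores are hypothetical, and the sketch assumes the set of referenced hashes has already been gathered from committed bundle metadata and that no mount is writing to the staging area while it runs; a real collector would need to handle in-progress bundles.

```go
package main

import "fmt"

// CollectStaging empties the staging area. Blobs whose hashes appear in
// referenced (i.e. are recorded in bundle metadata) are copied into the
// blob store; all others are discarded. It returns how many blobs were
// promoted and how many were discarded.
func CollectStaging(staging, blob map[string][]byte, referenced map[string]bool) (promoted, discarded int) {
	for key, data := range staging {
		if referenced[key] {
			// The blob store is immutable, so never overwrite an existing entry.
			if _, exists := blob[key]; !exists {
				blob[key] = data
			}
			promoted++
		} else {
			discarded++
		}
		// Deleting during range is safe for Go maps.
		delete(staging, key)
	}
	return promoted, discarded
}

func main() {
	staging := map[string][]byte{
		"h-old": []byte("a,b\n"),      // superseded before commit
		"h-new": []byte("a,b\n1,2\n"), // recorded in bundle metadata
	}
	blob := map[string][]byte{}
	referenced := map[string]bool{"h-new": true}

	promoted, discarded := CollectStaging(staging, blob, referenced)
	fmt.Println("promoted:", promoted, "discarded:", discarded, "left in staging:", len(staging))
}
```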

ransomw1c added the P1 and filesystem labels on Jun 5, 2019