Review of the fuse mount chain of command #368
**Tracing**
* storage/gcs: added logging and debug traces
* cmd(CLI): refactored logger injection, added more commands subject to log level setting
* core: enabled fuse internal debug logger (experimental, since it is quite verbose)
* cafs: injected logger

**Tuning/user experimentation**
* cmd(CLI): added new options to tune fuse mount cache size and prefetch behavior (e.g. --cache-size 100MB, --prefetch 3)
* core: bundle conveys new options down to cafs factory

**Refactoring & readability**
* core: refactored NewXXXFS to suppress redundant staging path and logging (following initial idea from PR#354)
* core: refactored RO mount for readability
* cafs: more godoc, resolved question marks left in code
* cafs: refactoring of struct factories, chunkReader and writer, more options to control LRU cache and prefetch behavior
* cafs: refactored hasher ("helpers") for readability

**Testing**
* cmd(CLI): re-enacted fuse CLI e2e testing (runs on CI)
* cafs: added more unit tests on freelists, hasher and ReadAt with concurrency testing

**Fixes**
* core: fixed spurious warnings on fs (extended attributes, flush on RO mount)
* cafs: enabled evicted LRU buffers to be relinquished to freelist
* cafs: fixed concurrency issues with corrupted relinquished buffers
* cafs: fixed leaf hash verification for ReadAt

**Performance improvements**
* cafs: set up cache for resolved leaf keys from root
* cafs: more efficient prefetching, with much less wasted work, and prefetching of blobs with tunable look-ahead

**Memory footprint reduction**
* core: relinquished unused RadixTree after ro fs construction
* cafs: adapted freelist buffers to leaf size, not systematically max size

**CI changes**
* cafs: cafs tests are memory hungry: run them in CI separately with different settings (less parallelism, no race)
* hack/fuse_demo: adapted fuse mount detection, since we slightly changed the mount signature (see below)

**Other enhancements**
* core: set fuse mount subtype to "datamon" and "datamon-mutable", so it appears as such in the output of the "mount" UNIX command

Signed-off-by: Frederic BIDON <[email protected]>
Signed-off-by: Frederic BIDON <[email protected]>
r.fetchingLatch.Lock()
defer r.fetchingLatch.Unlock()

if _, beingFetched := r.fetching[index+1]; !beingFetched {
Why not just have a hash map of the keys in flight, and start a goroutine that waits for the key to be removed from the map and to be present in the cache? If we implement a different prefetch scheme, the logic will still hold true.
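To make the suggestion concrete, here is a minimal sketch of an in-flight key map in Go. It is not code from this PR: the `inflight` type, its `get` method, and the plain map standing in for the LRU buffer cache are all hypothetical names chosen for illustration. A caller that finds the key in flight waits for it to leave the map, then checks the cache again.

```go
package main

import (
	"fmt"
	"sync"
)

// inflight is a hypothetical helper: it tracks keys being fetched and a
// plain map standing in for the real (evicting) LRU buffer cache.
type inflight struct {
	mu      sync.Mutex
	pending map[string]chan struct{} // closed once the key leaves the map
	cache   map[string][]byte
}

func newInflight() *inflight {
	return &inflight{
		pending: make(map[string]chan struct{}),
		cache:   make(map[string][]byte),
	}
}

// get returns the blob for key. fetch runs at most once per key while the
// blob stays cached, even when several goroutines ask concurrently.
func (f *inflight) get(key string, fetch func(string) []byte) []byte {
	f.mu.Lock()
	if data, ok := f.cache[key]; ok {
		f.mu.Unlock()
		return data
	}
	if done, ok := f.pending[key]; ok {
		f.mu.Unlock()
		<-done                   // wait for the key to be removed from the map
		return f.get(key, fetch) // then let the cache decide
	}
	done := make(chan struct{})
	f.pending[key] = done
	f.mu.Unlock()

	data := fetch(key) // performed outside the lock

	f.mu.Lock()
	f.cache[key] = data
	delete(f.pending, key)
	f.mu.Unlock()
	close(done) // wake up every waiter
	return data
}

func main() {
	f := newInflight()
	var (
		wg      sync.WaitGroup
		countMu sync.Mutex
		fetches int
	)
	fetch := func(key string) []byte {
		countMu.Lock()
		fetches++
		countMu.Unlock()
		return []byte("blob:" + key)
	}
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			f.get("leaf-1", fetch)
		}()
	}
	wg.Wait()
	fmt.Println("fetches:", fetches)
}
```

With a real LRU cache the re-check after waiting may miss because the buffer was evicted in the meantime; in this sketch the waiter would then simply become the next fetcher.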
I thought about this, but ended up with recursive decision-making about prefetching, reassessing the current situation every time the process requests a new leaf.
Indeed, this is a moving target: new keys to fetch ahead are discovered as the reading proceeds. Every time a leaf is cached, I move on to the next one and assess the new fetch-ahead list.
Moreover, the cache cannot guarantee that it still holds buffers fetched ahead: sometimes (as rarely as I could manage, and only when we have many more readers than available cache entries) the prefetch work is wasted and the leaf must be read again. This is a much less common event than in the previous implementation, though.
so basically, doPrefetch is here to:
i) decide whether to prefetch or not
ii) manage the current list of leaves being prefetched
iii) use the buffer cache as the referee on whether to consider work done or declare it wasted
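The three responsibilities above can be sketched as follows. This is an illustration only, not the code under review: apart from `fetchingLatch`, `fetching` and `doPrefetch`, which appear in the diff and discussion, every field, the `newReader` constructor and the map used as a cache are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// reader is a hypothetical stand-in for the chunk reader discussed here.
type reader struct {
	fetchingLatch sync.Mutex       // guards fetching and cache
	fetching      map[int]struct{} // leaves currently being prefetched
	cache         map[int][]byte   // stand-in for the LRU buffer cache
	lookAhead     int              // how many leaves to fetch ahead
	numLeaves     int
	fetch         func(index int) []byte
	wg            sync.WaitGroup // lets the example wait for prefetches
}

func newReader(lookAhead, numLeaves int, fetch func(int) []byte) *reader {
	return &reader{
		fetching:  make(map[int]struct{}),
		cache:     make(map[int][]byte),
		lookAhead: lookAhead,
		numLeaves: numLeaves,
		fetch:     fetch,
	}
}

// doPrefetch is called each time the leaf at index is requested. It returns
// the indexes for which a prefetch was started by this call.
func (r *reader) doPrefetch(index int) []int {
	r.fetchingLatch.Lock()
	defer r.fetchingLatch.Unlock()

	var started []int
	for next := index + 1; next <= index+r.lookAhead && next < r.numLeaves; next++ {
		// iii) the cache is the referee: a cached leaf means the work is done.
		// With a real LRU, an evicted leaf would show up here as not cached
		// and be scheduled again (the earlier work was wasted).
		if _, cached := r.cache[next]; cached {
			continue
		}
		// i) decide: skip leaves that are already being fetched.
		if _, beingFetched := r.fetching[next]; beingFetched {
			continue
		}
		// ii) manage the list of leaves in flight.
		r.fetching[next] = struct{}{}
		started = append(started, next)
		r.wg.Add(1)
		go func(i int) {
			defer r.wg.Done()
			data := r.fetch(i)
			r.fetchingLatch.Lock()
			r.cache[i] = data
			delete(r.fetching, i)
			r.fetchingLatch.Unlock()
		}(next)
	}
	return started
}

func main() {
	r := newReader(2, 5, func(i int) []byte {
		return []byte(fmt.Sprintf("leaf-%d", i))
	})
	fmt.Println("started:", r.doPrefetch(0))
	r.wg.Wait()
	fmt.Println("started:", r.doPrefetch(1))
	r.wg.Wait()
}
```

Because the decision is re-evaluated on every read, the fetch-ahead window follows the reader without any goroutine having to watch the map.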
follow-up note: open discussion thread
Signed-off-by: Frederic BIDON <[email protected]>
meta-review: where, as a project, do we stand on the "one idea per commit" practice advocated by some? my first impression upon reading the description and noticing the size of the diff is that this could probably be separated out into a few merge requests? i know this is extra effort on the person preparing the requests, and we've seen that needing to rebase several commits for merge all at once can lead to errors. yet i still suppose that the benefits to the project that would result from getting this landed as a sequence of commits in the history of
I have to go over tests, but the changes overall are in the right direction. Will continue to review.
@@ -78,10 +78,14 @@ func (a AuthMock) Principal(_ string) (model.Contributor, error) {
	}, nil
}

func testContext() string {
nit: cruft?
of the physical layout conventions so far, the
i'm going to have to split this review up over a couple of days... i have given a reasonably diligent first read to everything except the cafs and fuse changes, and i have read the cafs changes in part, leaving a few comments. the
in particular is something i get the gist of – namely,
and is a desirable addition – the desirable conceptual addition distinct from fixes and logging and so forth. i need to give it one more pass so that i'm clearer on all specifics. the fuse changes are just logging and readability? overall, quite pleased to see this patch appear.
Mostly yes. There are some small changes, though:
Signed-off-by: Frederic BIDON <[email protected]>
cmd/datamon/cmd/cli_fuse_test.go
Outdated
pipeErr, err := cmd.StderrPipe()
require.NoError(t, err)

// combine stdout & stderr, tee this output to os.Stdout and return pipe reader for output scanning
Since we want to test whether the output of a CLI command is correctly routed to stdout, we might want to keep the separation of what gets logged to which channel. Since operations such as list need to be piped in a shell to other Unix tools, we have to add a test to make sure their output goes to stdout.
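As an illustration of keeping the two channels apart, the sketch below captures stdout and stderr of a child process into separate buffers using the standard `os/exec` package. The helper name `runSeparated` is hypothetical, and the `sh -c` command is only a stand-in for the datamon CLI (it assumes a POSIX shell is available); the actual e2e test in this PR is structured differently.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runSeparated (hypothetical helper) runs a command and captures stdout and
// stderr in two distinct buffers, so each can be asserted on independently.
func runSeparated(name string, args ...string) (stdout, stderr string, err error) {
	cmd := exec.Command(name, args...)
	var outBuf, errBuf bytes.Buffer
	cmd.Stdout = &outBuf
	cmd.Stderr = &errBuf
	err = cmd.Run()
	return outBuf.String(), errBuf.String(), err
}

func main() {
	// Stand-in for a CLI invocation: results go to stdout, logs to stderr.
	out, errOut, err := runSeparated("sh", "-c", "echo listing; echo log-line 1>&2")
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Printf("stdout=%q stderr=%q\n", out, errOut)
}
```

A test built this way can check that the listing is found only in the stdout buffer, which is what downstream Unix tools in a pipe would receive.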
ok will remove the combined output
@kerneltime after some more inspection, it turns out that I am asserting on both outputs...
I am deferring the more precise, separate assertions on stderr and stdout as a TODO (see #374).
Deferred.
lgtm,
although i suggest waiting on approval from Ritesh before landing
Signed-off-by: Frederic BIDON <[email protected]>
… a todo Signed-off-by: Frederic BIDON <[email protected]>
Will merge this in once tests pass.