There should be an optional header section that (potentially) holds the following information:
The version of annatto the workflow was successfully run with. If we want annatto to set this automatically, it would have to write back to the workflow file. Technically it could also be a key that is only set by users, since a successful run does not imply that the workflow produces the desired output across all versions.
The memory switch (ANNATTO_IN_MEMORY is currently an environment variable) could be placed there. There are pros and cons: on the one hand, this would certainly be more user-friendly; on the other hand, it could make workflows less portable for large corpora, i.e. another user running the same workflow on the same data might not have enough memory available to run it in memory.
I would argue that ANNATTO_IN_MEMORY needs more discoverability, but this can also be achieved by adding a command line argument in addition to the environment variable. The switch would then appear in the --help description, and we could add a good explanation of what it does. The clap command line argument parser supports an env setting that could be added to the argument.
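To make the intended behaviour concrete, here is a minimal std-only sketch of how a flag and the environment variable could be combined. It does not use clap and is not annatto's actual code: the flag name `--in-memory`, the `StorageMode` type, the function names, and the accepted spellings of the variable's value are all assumptions made for illustration. With clap, the same effect would presumably come from the argument's `env` setting (which needs clap's `env` feature) rather than from hand-written lookup code.

```rust
use std::env;

/// Where the corpus graph is kept while a workflow runs (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum StorageMode {
    OnDisk,
    InMemory,
}

/// Interpret the raw value of the switch.
/// Which spellings count as "enabled" is an assumption of this sketch.
fn parse_switch(raw: &str) -> bool {
    matches!(
        raw.trim().to_ascii_lowercase().as_str(),
        "1" | "true" | "yes" | "on"
    )
}

/// Resolve the storage mode: an explicit `--in-memory` flag (hypothetical name)
/// or an enabled environment value selects in-memory mode; otherwise the safe
/// on-disk default is kept.
fn resolve_mode(args: &[String], env_value: Option<&str>) -> StorageMode {
    let flag_given = args.iter().any(|a| a == "--in-memory");
    let env_enabled = env_value.map(parse_switch).unwrap_or(false);
    if flag_given || env_enabled {
        StorageMode::InMemory
    } else {
        StorageMode::OnDisk
    }
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    let env_value = env::var("ANNATTO_IN_MEMORY").ok();
    let mode = resolve_mode(&args, env_value.as_deref());
    println!("storage mode: {:?}", mode);
}
```

Keeping the resolution in a pure function that takes the arguments and the environment value as parameters makes the default (on disk) easy to test without touching the real process environment.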
We default to on-disk mode because, even though it can be much slower, we want to make sure the conversion does not fail. A "hint" in the header section would probably mean something like "this is a small corpus and would most likely fit in your main memory". But since we know nothing about the environment the workflow is run in (it could even be a very resource-limited CI job), we should not include this in the workflow, in my opinion.
There is, however, the downside that we could end up in a situation similar to Rust's --release compile flag, where people discover that their programs get magically faster by applying it. So we really have to work on the discoverability, e.g. with the flag and/or with a troubleshooting guide. Also, the idea of "hints" in addition to "warnings" in the console output is floating around in my head. If annatto discovers that something could work better (like running faster because the corpus is small), we could output "hints" to the user. Warnings would indicate that something is wrong (which it isn't) and would be the wrong way to approach this. But "hints" could help with discoverability as workflows and annatto become more complex.
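As a rough illustration of the "hints" idea, the sketch below separates hints from warnings as two kinds of console notice and emits a hint only when a workflow runs on disk although the corpus looks small. Everything in it is hypothetical: the `Notice` type, the `memory_hint` function, the size threshold, and the message text are assumptions for discussion, not existing annatto behaviour.

```rust
use std::fmt;

/// Two kinds of console notice (hypothetical type).
#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum Notice {
    /// Something is actually wrong.
    Warning(String),
    /// Nothing is wrong, but the run could work better.
    Hint(String),
}

impl fmt::Display for Notice {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Notice::Warning(msg) => write!(f, "warning: {}", msg),
            Notice::Hint(msg) => write!(f, "hint: {}", msg),
        }
    }
}

/// Suggest in-memory mode when running on disk with a corpus below the
/// given size threshold. The threshold is an assumed, caller-supplied value.
fn memory_hint(corpus_bytes: u64, in_memory: bool, threshold_bytes: u64) -> Option<Notice> {
    if !in_memory && corpus_bytes < threshold_bytes {
        Some(Notice::Hint(String::from(
            "the corpus looks small; setting ANNATTO_IN_MEMORY may speed up this run",
        )))
    } else {
        None
    }
}

fn main() {
    // Example: a 5 MB corpus processed on disk, with an assumed 100 MB threshold.
    if let Some(notice) = memory_hint(5_000_000, false, 100_000_000) {
        println!("{}", notice);
    }
}
```

Because the hint is only a suggestion, the on-disk default stays untouched and nothing about the environment has to be stored in the workflow file.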
As #228 is unresolved, running a workflow on disk can produce incorrect or incomplete output data. That weakens the argument for defaulting to disk mode, although I agree that it should remain the default.