
Commit

Fixing bug in CLI - updating references
John Hawkins authored and John Hawkins committed Jul 25, 2023
1 parent 6ab800f commit 61ba4df
Showing 3 changed files with 33 additions and 10 deletions.
19 changes: 12 additions & 7 deletions docs/paper/paper.tex
@@ -124,8 +124,8 @@
 methods of implementing and recording the details of scientific projects. Monolothic applications
 have the advantage of a single and consistent design, however they impede the ability of users to
 innovate and incrementally improve processes. We discuss the qualities of an ideal eScience framework
-for building multi-stage collaboration scinetific workflows and present an open source implementation
-for managing scientific processes in a decoupled fashion that permits both flexible implementation of
+for building multi-stage collaboration scientific workflows and present an open source implementation
+for managing data science processes in a decoupled fashion that permits both flexible implementation of
 any stage of processing, and greater ease of meta-data analysis.
 \end{abstract}

@@ -140,8 +140,9 @@ \section{Introduction}
 experimental approaches and results. Failure to maintain these records impedes progress
 by making it difficult to reproduce work, or imposing the costs of repeatedly testing
 failed lines of experimentation. This cost becomes particularly high in the age of the
-reproducibility crisis, as a many teams running the same experiments in parallel, without
-knowledge of each others work, has produced a factory line of un-reproducible results.
+reproducibility crisis where many teams are running the same experiments in parallel, without
+knowledge of each others work, and producing a factory line of un-reproducible results
+\cite{Ioannidis2005}.

 Many Machine Learning experimentation frameworks focus on the task of making experiments
 easier to execute and deploy into production systems\cite{Alberti:2018,MolnerDomenech:2020}.
@@ -264,14 +265,18 @@ \subsection{Implementation}
 Projit has been implemented as python package that functions as both a command line application
 and library that can be included inside other scripts and applications. The command line application
 can be used to query the project metadata in much the same way that the git application can be used.
-A user can add, modify and list the collection of data assets in the project: datasets, experiments
-and results are all accessible from the command line application.
+A user can add, modify and list the collection of analytical assets in the project: datasets, experiments
+parameters, tags and results are all accessible from the command line application.

 The python package can be included in a script so that the script can access the project metadata store.
 This allows the script to find the location of common datasets, register themselves as an experiment
 and store results once the script is complete. Programmatic interaction with the project data through
 the projit API is what permits the scripts of a project to be decoupled and contribute to the project
-without being aware of how any other element is structured or implemented.
+without being aware of how any other element is structured or implemented. Furthermore, as the metadata
+is stored locally in standard JSON files, these can be stored inside a cloud repository and then contributed
+to by collaborators. This allows distributed data science teams to define their own lines of experimentation,
+but use a synchronised data set then continually contribute to a central meta-data store of project results.
+

 \section{Case Study}

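The Implementation paragraph changed above describes scripts that look up shared datasets, register themselves as experiments and write their results to a metadata store kept in plain JSON files that collaborators can share through a repository. The sketch below illustrates that pattern using only the Python standard library. It is not projit's API: the file name `project_meta.json`, the JSON schema and the helper names (`add_dataset`, `get_dataset`, `register_experiment`, `add_result`) are hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical default location of the metadata file (not projit's real layout).
DEFAULT_STORE = Path("project_meta.json")


def load_store(path=DEFAULT_STORE):
    """Read the metadata store, or return an empty skeleton if the file is absent."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    return {"datasets": {}, "experiments": {}, "results": {}}


def save_store(store, path=DEFAULT_STORE):
    """Write sorted, indented JSON so that changes diff cleanly under version control."""
    Path(path).write_text(json.dumps(store, indent=2, sort_keys=True))


def add_dataset(name, location, path=DEFAULT_STORE):
    """Record where a shared dataset lives so any script can find it by name."""
    store = load_store(path)
    store["datasets"][name] = location
    save_store(store, path)


def get_dataset(name, path=DEFAULT_STORE):
    """Look up a dataset location by name; raises KeyError if it was never registered."""
    return load_store(path)["datasets"][name]


def register_experiment(name, script, path=DEFAULT_STORE):
    """Register a script as a named experiment and count how often it has been run."""
    store = load_store(path)
    entry = store["experiments"].setdefault(name, {"script": script, "runs": 0})
    entry["runs"] += 1
    save_store(store, path)


def add_result(experiment, dataset, metric, value, path=DEFAULT_STORE):
    """Store a metric value for an experiment evaluated on a given dataset."""
    store = load_store(path)
    store["results"].setdefault(experiment, {}).setdefault(dataset, {})[metric] = value
    save_store(store, path)


if __name__ == "__main__":
    # Demonstrate the workflow in a temporary directory so no real project files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        store_path = Path(tmp) / "project_meta.json"
        add_dataset("train", "data/train.csv", path=store_path)
        register_experiment("initial", "scripts/initial.py", path=store_path)
        add_result("initial", "test", "MSE", 0.25, path=store_path)
        print(store_path.read_text())
```

Each call rereads and rewrites the whole file, which keeps the example simple but means that two collaborators recording results at the same time through a shared repository would need to merge their changes; the sketch makes no attempt to handle that.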
12 changes: 12 additions & 0 deletions docs/paper/refs.bib
@@ -113,4 +113,16 @@ @article{Sahoo2009
 url = {https://corescholar.libraries.wright.edu/knoesis/80/}
 }

+@article{Ioannidis2005,
+doi = {10.1371/journal.pmed.0020124},
+author = {Ioannidis, John P. A.},
+journal = {PLOS Medicine},
+publisher = {Public Library of Science},
+title = {Why Most Published Research Findings Are False},
+year = {2005},
+month = {08},
+volume = {2},
+url = {https://doi.org/10.1371/journal.pmed.0020124},
+number = {8},

+}
12 changes: 9 additions & 3 deletions projit/cli.py
@@ -417,10 +417,11 @@ def print_usage(prog):
     print(" ", prog, "plot initial execution # Plot the execution times for the experiment named 'initial'")
     print(" ", prog, "plot initial hyperparam alpha # Plot the change in hyperparam 'alpha' for the experiment named 'initial'")
     print(" ", prog, "plot initial result MSE # Plot the change in result 'MSE' for the experiment named 'initial'")
-    print(" ", prog, "-m list results test # List results on 'test' data in markdown")
+    print(" ", prog, "render path_to_output.pdf # Render a PDF document summarising the project")
+    print(" ", prog, "-m list results test # List results on 'test' data in Markdown format")
     print(" ", prog, "rm experiment explore # Remove the experiment explore (requires confirmation)")
     print(" ", prog, "rm experiment . # Remove all experiments (requires confirmation)")
-    print(" ", prog, "-m list results test # List results on test data in markdown")
+    print(" ", prog, "-m list results test # List results on test data in Markdown format")
     print(" ", prog, "compare dataone,datatwo MAE # Compare results over datasets using metric MAE")
     print("")

@@ -442,6 +443,7 @@ def cli_main():
     parser = argparse.ArgumentParser()
     parser.add_argument('-v', '--version', help='Print Version', action='store_true')
     parser.add_argument('-m', '--markdown', help='Use markdown for output', action='store_true')
+    parser.add_argument('-u', '--usage', help='Print detailed usage instructions with examples', action='store_true')

     subparsers = parser.add_subparsers(dest="cmd")

@@ -490,6 +492,10 @@ def cli_main():
         print(" Version:", __version__)
         exit(1)

+    if args.usage:
+        print_usage("projit")
+        exit(1)
+
     if args.cmd == None:
         print_usage("projit")
         exit(1)
@@ -536,7 +542,7 @@ def cli_main():
         task_status(project)

     if args.cmd == 'render':
-        task_render(project)
+        task_render(project, args.path)


 ##########################################################################################
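The `cli.py` changes add a `-u/--usage` flag and pass `args.path` to `task_render`, which implies that the `render` subcommand takes an output path (the usage text shows `render path_to_output.pdf`). The subparser definition itself is not part of this diff, so the following is a hypothetical, self-contained sketch using only `argparse`: the `render` subparser, its positional `path` argument and the stub `print_usage` are assumptions, and the real `task_render` is replaced by a print statement.

```python
import argparse


def print_usage(prog):
    """Stub of the detailed usage text; the two example lines are copied from the diff."""
    print("Usage examples:")
    print(" ", prog, "render path_to_output.pdf # Render a PDF document summarising the project")
    print(" ", prog, "-m list results test # List results on 'test' data in Markdown format")


def build_parser():
    """Top-level flags as in the diff, plus an assumed 'render' subcommand."""
    parser = argparse.ArgumentParser(prog="projit")
    parser.add_argument('-v', '--version', help='Print Version', action='store_true')
    parser.add_argument('-m', '--markdown', help='Use markdown for output', action='store_true')
    parser.add_argument('-u', '--usage', help='Print detailed usage instructions with examples',
                        action='store_true')
    subparsers = parser.add_subparsers(dest="cmd")
    # Assumption: 'render' takes the output file as a positional argument named 'path'.
    render = subparsers.add_parser('render', help='Render a PDF document summarising the project')
    render.add_argument('path', help='Output path for the rendered PDF')
    return parser


def main(argv=None):
    """Dispatch on the parsed arguments and return a process exit status."""
    args = build_parser().parse_args(argv)
    if args.usage:
        print_usage("projit")
        return 0  # usage was explicitly requested, so treat it as success
    if args.cmd is None:
        print_usage("projit")
        return 1  # no command given: show usage and signal an error
    if args.cmd == 'render':
        # Stand-in for task_render(project, args.path) from the diff.
        print("Rendering project summary to", args.path)
        return 0
    return 1
```

For example, `main(['render', 'summary.pdf'])` parses `summary.pdf` into `args.path`. Returning 0 for an explicit `-u` request mirrors `argparse`'s own `-h/--help`, which exits with status 0; the commit calls `exit(1)` in both the `--usage` and no-command branches, so this is a design choice in the sketch. It also uses `is None`, the idiomatic comparison against `None`.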