Fixing references in JOSS draft
john-hawkins committed Sep 29, 2024
1 parent 3377c28 commit 000db06
Showing 1 changed file (`docs/paper/joss.md`) with 21 additions and 21 deletions.

Software approaches to managing scientific data, processes and meta-data are
typically either built as front-ends for specific
scientific domains [@Howe2008;@Pettit:2010] or designed to facilitate
interoperability between different technology stacks [@Subramanian2013].
Machine-learning-focused frameworks tend to focus on solving the problems of
model training and deployment for specific technologies
[@Alberti:2018;@MolnerDomenech:2020], and hence have limited generality.

`Projit` is a Python package for managing data science project meta-data
inside a simple local JSON store. It provides a CLI tool for
interrogating this data so that the current state of a project can easily
be assessed and understood. The API for `projit` was
designed so that it can be included in arbitrary Python scripts to
locate datasets, register experiments and store results along
with hyper-parameters.
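
As a rough illustration of this kind of API, the sketch below keeps datasets, experiments and results in a single local JSON file. The `MetaStore` class and its methods are hypothetical names invented for this example and are not the `projit` API; the file layout is likewise an assumption.

```python
# Hypothetical sketch of a local JSON meta-data store; NOT the projit API.
import json
from pathlib import Path


class MetaStore:
    """Illustrative JSON-backed store for datasets, experiments and results."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text())
        else:
            # One collection per entity type.
            self.data = {"datasets": {}, "experiments": {}, "results": {}}

    def _save(self):
        # Rewrite the whole file after every change so other scripts see it.
        self.path.write_text(json.dumps(self.data, indent=2, sort_keys=True))

    def add_dataset(self, name, location):
        self.data["datasets"][name] = location
        self._save()

    def get_dataset(self, name):
        # Look a dataset up by name instead of hard-coding its path.
        return self.data["datasets"][name]

    def add_experiment(self, name, script):
        self.data["experiments"][name] = {"script": script}
        self._save()

    def add_result(self, experiment, metric, value, params=None):
        # Results are keyed by experiment name and stored with hyper-parameters.
        if experiment not in self.data["experiments"]:
            raise KeyError(f"unknown experiment: {experiment}")
        results = self.data["results"].setdefault(experiment, {})
        results[metric] = {"value": value, "params": params or {}}
        self._save()
```

A script would open the store and call, for example, `get_dataset("train")` rather than hard-coding a file path. Rewriting the whole file on each change keeps the sketch simple at some cost in efficiency, which is usually acceptable for small meta-data files.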

The `projit` datastore is lightweight enough to be stored
with code inside a source code repository, allowing future users to
interrogate the experiment history of the project. This is useful for
project continuation, for auditing and repeatability, and for opening the possibility
of scripted meta-data analysis. The `projit` package has been
used in a number of scientific publications to manage the results of
machine learning experiments into systematic reviews for biomedical
projects [@Hawkins+Tivey:2024] and the analysis of text features derived
To facilitate loose coupling between stages of the project, the `projit` utility
imposes a simple schema for components of a data science project. These consist
of:

* Datasets
* Experiments
* Results
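
One way to picture this schema is as three collections linked only by names, as in the sketch below. The dataclasses, their fields and the `check_links` helper are assumptions made for illustration; the actual `projit` schema may differ. Because an experiment refers to a dataset by name, the dataset's location can change without editing the experiment.

```python
# Hypothetical schema sketch; field names are illustrative, not projit's format.
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    location: str  # path or URL, resolved only when a script needs it


@dataclass
class Experiment:
    name: str
    dataset: str  # refers to a Dataset by name, not by path


@dataclass
class Result:
    experiment: str  # refers to an Experiment by name
    metric: str
    value: float


@dataclass
class Project:
    datasets: List[Dataset] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)
    results: List[Result] = field(default_factory=list)

    def to_json(self):
        # asdict() recurses into the nested lists of dataclasses.
        return json.dumps(asdict(self), indent=2)


def check_links(project):
    """Return a message for every name reference that points at nothing."""
    dataset_names = {d.name for d in project.datasets}
    experiment_names = {e.name for e in project.experiments}
    problems = []
    for e in project.experiments:
        if e.dataset not in dataset_names:
            problems.append(f"experiment {e.name!r} uses unknown dataset {e.dataset!r}")
    for r in project.results:
        if r.experiment not in experiment_names:
            problems.append(f"result for unknown experiment {r.experiment!r}")
    return problems
```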

All of these entities can be added, removed or modified using either the CLI tool
or the Python package within scripts. The entities of a project are depicted
in \autoref{fig:projit}.

![Projit Application Entities.\label{fig:projit}](images/Projit_decoupled_process.drawio.png)
Secondly, we develop a sub-command structure that allows the `projit` CLI to be
a versatile tool with something close to a natural language interface.
For example, the primary command `list` can be applied to any of the `projit`
entities, as shown in the command below:

```
projit list datasets
```

The same principle applies to the remove and add commands, which naturally require
additional parameters to specify what is being added or removed. The design goal
of the CLI is to make `projit` intuitive without imposing arbitrary constraints.
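
The verb-plus-entity pattern can be sketched with the standard-library `argparse` module, using one sub-parser per verb and an entity argument restricted to the three entity types. This is a hypothetical illustration rather than the `projit` implementation; the `add` arguments (`name`, `value`) and the in-memory `store` dictionary are assumptions.

```python
# Hypothetical verb + entity CLI sketch using argparse; NOT projit's implementation.
import argparse

ENTITIES = ["datasets", "experiments", "results"]


def build_parser():
    parser = argparse.ArgumentParser(prog="projit-sketch")
    verbs = parser.add_subparsers(dest="verb", required=True)

    # `list <entity>`: the same verb applies to every entity type.
    p_list = verbs.add_parser("list")
    p_list.add_argument("entity", choices=ENTITIES)

    # `add` and `remove` need extra arguments saying what is affected.
    p_add = verbs.add_parser("add")
    p_add.add_argument("entity", choices=ENTITIES)
    p_add.add_argument("name")
    p_add.add_argument("value")

    p_rm = verbs.add_parser("remove")
    p_rm.add_argument("entity", choices=ENTITIES)
    p_rm.add_argument("name")
    return parser


def run(argv, store):
    """Dispatch one command against a dict mapping entity -> {name: value}."""
    args = build_parser().parse_args(argv)
    table = store.setdefault(args.entity, {})
    if args.verb == "add":
        table[args.name] = args.value
    elif args.verb == "remove":
        table.pop(args.name, None)
    # Every verb reports the names now registered for that entity.
    return sorted(table)
```

For example, `run(["list", "datasets"], store)` returns the sorted dataset names. Restricting `entity` with `choices` means a typo such as `list dataset` is rejected with a usage message rather than failing later.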

# Research Applications

The fundamental research application of `projit` is in managing the project lifecycle
and improving the efficiency of development. Paths to datasets are retrieved from
meta-data rather than hard-coded. Experiments are named, and their execution times
are tracked. The results of all experiments can be tracked over each iteration,
together with their hyper-parameters, and interrogated to easily produce tables of
data and analysis.
A further application comes with a focus
on open science, allowing other teams to review and audit the experiment history,
then easily repeat or extend experiments.
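
A minimal sketch of recording named runs with their execution times and hyper-parameters, and then tabulating one metric, is shown below. The `track` context manager, the record layout and `results_table` are hypothetical and use only the Python standard library; they do not reproduce `projit`'s own tracking.

```python
# Hypothetical tracking sketch using only the standard library (not projit).
import time
from contextlib import contextmanager


@contextmanager
def track(log, experiment, params):
    """Time one run of a named experiment and append its record to `log`."""
    record = {"experiment": experiment, "params": dict(params), "metrics": {}}
    start = time.perf_counter()
    try:
        yield record  # the caller fills record["metrics"] during the run
    finally:
        record["seconds"] = time.perf_counter() - start
        log.append(record)


def results_table(log, metric):
    """Markdown table of the latest value of `metric` for each experiment."""
    latest = {}
    for record in log:  # later runs of an experiment replace earlier ones
        if metric in record["metrics"]:
            latest[record["experiment"]] = record
    lines = [f"| experiment | params | {metric} |", "|---|---|---|"]
    for name in sorted(latest):
        record = latest[name]
        params = ", ".join(f"{k}={v}" for k, v in sorted(record["params"].items()))
        lines.append(f"| {name} | {params} | {record['metrics'][metric]:.3f} |")
    return "\n".join(lines)
```

Keeping every run in `log`, rather than only the latest, preserves the iteration history, while the table shows only the most recent value per experiment.
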
Finally, there is a research application in meta-analysis.
Projects in which the `projit` meta-data are stored along with open source code can
be analysed to look at the performance of certain techniques or algorithms across
multiple projects.
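
The meta-analysis idea can be sketched as a function that reads the meta-data files of several projects and averages one metric per technique. The file layout assumed here (a `results` list whose entries carry `method`, `metric` and `value` fields) is hypothetical and is not `projit`'s storage format; in practice each project's own result names would first need mapping onto common technique labels.

```python
# Hypothetical cross-project meta-analysis sketch; the JSON layout is assumed.
import json
from collections import defaultdict
from statistics import mean


def load_results(paths):
    """Yield (project, method, metric, value) from each project's JSON file."""
    for path in paths:
        with open(path) as fh:
            meta = json.load(fh)
        for r in meta.get("results", []):
            yield path, r["method"], r["metric"], r["value"]


def summarise(paths, metric):
    """Mean of `metric` per method, plus how many projects reported it."""
    values = defaultdict(list)
    projects = defaultdict(set)
    for project, method, name, value in load_results(paths):
        if name == metric:
            values[method].append(value)
            projects[method].add(project)
    return {
        method: {"mean": mean(vals), "projects": len(projects[method])}
        for method, vals in values.items()
    }
```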

# Acknowledgements
