- Clone this repository.
- Open `metadata-automation-challenge.Rproj`.
- Install packages. In the RStudio console, run:

  ```r
  renv::restore()
  ```

  This may take some time to complete - get something nice to drink :)
- Create the folders `input`, `data` and `output` in your current directory.
- Create a `.synapseConfig` file. See this vignette about Managing Synapse Credentials to learn how to store credentials so that you can log in without specifying your username and password each time.
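A `.synapseConfig` file normally lives in your home directory and uses an INI-style layout. A minimal sketch is shown below; the username and token values are placeholders to replace with your own Synapse credentials:

```ini
[authentication]
username = your-synapse-username
authtoken = your-personal-access-token
```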
You can find the baseline demo R Notebook at `baseline_demo/baseline_demo.Rmd`. After opening the notebook, you should be able to step through and execute each chunk in order.
```shell
docker build -t metadata-baseline -f Dockerfile.baseline .
docker build -t metadata-validation -f Dockerfile.validation .
docker build -t metadata-scoring -f Dockerfile.scoring .
```
Here we describe how to apply the baseline method to automatically annotate a dataset (see Data Description).
- Create the folders `input`, `data` and `output` in your current directory.
- Place the input dataset in `input`, e.g. `input/APOLLO-2-leaderboard.tsv`.
- Run the following command:
```shell
docker run \
  -v $(pwd)/input:/input:ro \
  -v $(pwd)/data:/data:ro \
  -v $(pwd)/output:/output \
  metadata-baseline APOLLO-2-leaderboard
```
where `APOLLO-2-leaderboard` is the name of the dataset in the folder `input` (without the `.tsv` extension). Here `$(pwd)` is automatically replaced by the absolute path of the current directory.
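This substitution is ordinary shell behavior, not something Docker does: the shell expands `$(pwd)` before `docker run` ever sees the arguments. You can check this directly:

```shell
# Command substitution: the shell replaces $(pwd) with the absolute
# path of the current working directory before running the command.
echo $(pwd)/input
```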
The file `output/APOLLO-2-leaderboard-Submission.json` is created upon successful completion of the above command.
The following command checks that the format of the generated submission file is valid.
```shell
$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/input.json:ro \
  metadata-validation \
  validate-submission --json_filepath /input.json
Your JSON file is valid!
```
where `$(pwd)/output/APOLLO-2-leaderboard-Submission.json` points to the location of the submission file generated in the previous section.
Alternatively, the validation script can be run directly using Python. Create a virtual environment, activate it, and install the dependencies:

```shell
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install click jsonschema
```
Here is the generic command to validate the format of a submission file.
```shell
$ python schema/validate.py validate-submission \
  --json_filepath yourjson.json \
  --schema_filepath schema/output-schema.json
```
To validate the submission file generated in the previous section, the command becomes:
```shell
$ python schema/validate.py validate-submission \
  --json_filepath output/APOLLO-2-leaderboard-Submission.json \
  --schema_filepath schema/output-schema.json
Your JSON file is valid!
```
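Conceptually, this step validates the submission against a JSON Schema. The sketch below illustrates the idea with a hand-rolled required-key check on a toy document; the real `schema/validate.py` uses the `jsonschema` package against the actual `schema/output-schema.json`, which is far more thorough, and the key name used here is purely hypothetical:

```python
import json

# Toy stand-in for schema/output-schema.json; "annotated_columns" is a
# hypothetical key name, not the real schema's structure.
REQUIRED_KEYS = {"annotated_columns"}

def validate_submission(json_text: str) -> list:
    """Return a list of error messages; an empty list means the file is valid."""
    try:
        doc = json.loads(json_text)
    except json.JSONDecodeError as exc:
        return [f"Not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["Top-level value must be an object"]
    return [f"Missing required key: {key}" for key in REQUIRED_KEYS - doc.keys()]

print(validate_submission('{"annotated_columns": []}'))  # [] -> valid
print(validate_submission('{}'))                         # missing-key error
```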
Here we evaluate the performance of the submission by comparing the content of the submission file to a gold standard (e.g. manual annotations).
```shell
$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/submission.json:ro \
  -v $(pwd)/data/Annotated-APOLLO-2-leaderboard.json:/goldstandard.json:ro \
  metadata-scoring score-submission /submission.json /goldstandard.json
1.24839015151515
```