- Clone this repository.
- Open `metadata-automation-challenge.Rproj`.
- Install packages. In the RStudio console, run:

  ```r
  renv::restore()
  ```

  This may take some time to complete - get something nice to drink :)
- Create the folders `input`, `data` and `output` in your current directory.
- Create a `.synapseConfig` file. See this vignette about Managing Synapse Credentials to learn how to store credentials so that you can log in without specifying your username and password each time.
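A `.synapseConfig` file normally lives in your home directory and uses an INI-style layout. A minimal sketch is shown below; the username and token values are placeholders to replace with your own Synapse credentials:

```ini
[authentication]
username = your-synapse-username
authtoken = your-personal-access-token
```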
You can find the baseline demo R Notebook at `baseline_demo/baseline_demo.Rmd`. After opening the notebook, you should be able to step through and execute each chunk in order.
```shell
docker build -t metadata-baseline -f Dockerfile.baseline .
docker build -t metadata-validation -f Dockerfile.validation .
docker build -t metadata-scoring -f Dockerfile.scoring .
```
Here we describe how to apply the baseline method to automatically annotate a dataset (see Data Description).
- Create the folders `input`, `data` and `output` in your current directory.
- Place the input dataset in `input`, e.g. `input/APOLLO-2-leaderboard.tsv`.
- Run the following command:
```shell
docker run \
  -v $(pwd)/input:/input:ro \
  -v $(pwd)/data:/data:ro \
  -v $(pwd)/output:/output \
  metadata-baseline APOLLO-2-leaderboard
```
where `APOLLO-2-leaderboard` is the name of the dataset in the folder `input` (without the `.tsv` extension). Here `$(pwd)` is automatically replaced by the absolute path of the current directory.
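This substitution is ordinary shell behavior, not something Docker does: the shell expands `$(pwd)` before `docker run` ever sees the arguments. You can check this directly:

```shell
# Command substitution: the shell replaces $(pwd) with the absolute
# path of the current working directory before running the command.
echo $(pwd)/input
```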
The file `output/APOLLO-2-leaderboard-Submission.json` is created upon successful completion of the above command.
The following command checks that the format of the generated submission file is valid.
```shell
$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/input.json:ro \
  metadata-validation \
  validate-submission --json_filepath /input.json
Your JSON file is valid!
```
where `$(pwd)/output/APOLLO-2-leaderboard-Submission.json` points to the location of the submission file generated in the previous section.
Alternatively, the validation script can be run directly using Python. Create a virtual environment, activate it, and install the dependencies:

```shell
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install click jsonschema
```
Here is the generic command to validate the format of a submission file.
```shell
$ python schema/validate.py validate-submission \
  --json_filepath yourjson.json \
  --schema_filepath schema/output-schema.json
```
To validate the submission file generated in the previous section, the command becomes:
```shell
$ python schema/validate.py validate-submission \
  --json_filepath output/APOLLO-2-leaderboard-Submission.json \
  --schema_filepath schema/output-schema.json
Your JSON file is valid!
```
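Conceptually, this step validates the submission against a JSON Schema. The sketch below illustrates the idea with a hand-rolled required-key check on a toy document; the real `schema/validate.py` uses the `jsonschema` package against the actual `schema/output-schema.json`, which is far more thorough, and the key name used here is purely hypothetical:

```python
import json

# Toy stand-in for schema/output-schema.json; "annotated_columns" is a
# hypothetical key name, not the real schema's structure.
REQUIRED_KEYS = {"annotated_columns"}

def validate_submission(json_text: str) -> list:
    """Return a list of error messages; an empty list means the file is valid."""
    try:
        doc = json.loads(json_text)
    except json.JSONDecodeError as exc:
        return [f"Not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["Top-level value must be an object"]
    return [f"Missing required key: {key}" for key in REQUIRED_KEYS - doc.keys()]

print(validate_submission('{"annotated_columns": []}'))  # [] -> valid
print(validate_submission('{}'))                         # missing-key error
```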
Here we evaluate the performance of the submission by comparing the content of the submission file to a gold standard (e.g. manual annotations).
```shell
$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/submission.json:ro \
  -v $(pwd)/data/Annotated-APOLLO-2-leaderboard.json:/goldstandard.json:ro \
  metadata-scoring score-submission /submission.json /goldstandard.json
1.24839015151515
```