nlpsandbox/nlpsandbox-schemas

nlpsandbox.io

NLP Sandbox OpenAPI Specifications

OpenAPI specifications of the NLP Sandbox tools and services

Introduction

NLPSandbox.io is an open platform for benchmarking modular natural language processing (NLP) tools on both public and private datasets. Academics, students, and industry professionals are invited to browse the available tasks and participate by developing and submitting an NLP Sandbox tool.

This repository contains the OpenAPI specifications of the NLP Sandbox tools and services. Visit NLPSandbox.io for more information on how to develop and benchmark NLP Sandbox tools.

Specification

  • NLP Sandbox schemas version: 1.2.0

Requirements

NLP Sandbox Tools

The OpenAPI specifications of the NLP Sandbox tools listed below are available in JSON and YAML formats. These specifications can be given as input to the OpenAPI Generator to generate tool "stubs". For more information on how to develop and benchmark an NLP Sandbox tool, visit NLPSandbox.io.
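
As a rough illustration of the stub-generation step, the sketch below assembles an OpenAPI Generator CLI command from Python. It assumes the `openapi-generator-cli` wrapper is installed and on the `PATH`; the specification file name, the `python-flask` generator choice, and the output directory are illustrative placeholders, not values prescribed by this repository.

```python
from typing import List


def build_generate_command(spec_path: str, generator: str, output_dir: str) -> List[str]:
    """Return the argv for an OpenAPI Generator CLI `generate` invocation."""
    return [
        "openapi-generator-cli",  # assumption: the CLI wrapper is installed and on PATH
        "generate",
        "-i", spec_path,   # input OpenAPI specification (JSON or YAML)
        "-g", generator,   # name of the generator that produces the stub
        "-o", output_dir,  # directory that receives the generated code
    ]


# Hypothetical local copy of a tool specification; the file name is illustrative.
command = build_generate_command("date-annotator.yaml", "python-flask", "server")
print(" ".join(command))
```

The function only builds the command. To actually run it, pass the returned list to `subprocess.run(command, check=True)`.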

PHI Annotation and Deidentification

| API Name | Type |
| --- | --- |
| Contact Annotator | nlpsandbox:contact-annotator |
| Date Annotator | nlpsandbox:date-annotator |
| ID Annotator | nlpsandbox:id-annotator |
| Location Annotator | nlpsandbox:location-annotator |
| Person Name Annotator | nlpsandbox:person-name-annotator |
| PHI Annotator | nlpsandbox:phi-annotator |
| PHI Deidentifier | nlpsandbox:phi-deidentifier |

Symptom Annotation

| API Name | Type |
| --- | --- |
| COVID Symptom Annotator | nlpsandbox:covid-symptom-annotator |

NLP Sandbox Services

| API Name | Type |
| --- | --- |
| Data Node | nlpsandbox:data-node |

Implementations

Example tools

These repositories provide example implementations of each NLP Sandbox tool. Each repository has a CI/CD workflow that automatically tests the tool, then builds and publishes a Docker image. The image can be submitted as-is to NLPSandbox.io if you wish to benchmark its performance, but don't expect a high score!

| GitHub repository | Language |
| --- | --- |
| nlpsandbox/contact-annotator-example | Python |
| nlpsandbox/covid-symptom-annotation-example | Python |
| nlpsandbox/date-annotator-example | Python |
| nlpsandbox/date-annotator-example-java | Java |
| nlpsandbox/id-annotator-example | Python |
| nlpsandbox/location-annotator-example | Python |
| nlpsandbox/person-name-annotator-example | Python |
| nlpsandbox/phi-annotator-example | Python |
| nlpsandbox/phi-deidentifier-example | Python |

Data Node

A Data Node instance stores the FHIR and annotation resources used to benchmark NLP Sandbox tools.

| GitHub repository | Language |
| --- | --- |
| nlpsandbox/data-node | Python |

Versioning

GitHub release tags

This repository uses semantic versioning to track the releases of the NLP Sandbox schemas. Its GitHub tags are "non-moving": once a tag has been created, it always points to the same git commit.
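
Because release tags follow semantic versioning, they should be compared numerically rather than as strings. The sketch below is one minimal way to do that, assuming tags have the plain `<major>.<minor>.<patch>` form with no prefix or pre-release suffix; apart from `1.2.0`, the tags in the usage example are made up for illustration.

```python
import re
from typing import Tuple

# Assumption: release tags are exactly three dot-separated non-negative integers.
SEMVER_TAG = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> Tuple[int, int, int]:
    """Split a release tag into a (major, minor, patch) tuple of integers."""
    match = SEMVER_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a <major>.<minor>.<patch> tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


# Hypothetical tag list; only 1.2.0 is mentioned in this README.
tags = ["1.0.2", "1.2.0", "1.10.0"]
newest = max(tags, key=parse_release_tag)
print(newest)  # prints 1.10.0, whereas a plain string comparison would pick 1.2.0
```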

Schemas / GitHub Pages tags

The artifacts published by this repository are OpenAPI schemas and documentation, both published to GitHub Pages. The versions of the schemas are aligned with the GitHub tags of this repository.

The table below describes the available schema tags.

| Tag name | Description | Moving |
| --- | --- | --- |
| latest | Latest stable release. | Yes |
| edge | Latest commit made to the default branch. | Yes |
| edge-<sha> | Same as edge, with a reference to the git commit. | No |
| <major>.<minor>.<patch> | Latest stable patch release <major>.<minor>.<patch>. | No |
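
The moving/non-moving distinction matters when pinning a schema version in a build. The sketch below classifies a tag name according to the table above. It assumes that `<sha>` is a lowercase hexadecimal commit identifier, which this README does not state explicitly, and the function name is hypothetical.

```python
import re

# Tags that are re-pointed as new releases or commits appear.
MOVING_TAGS = {"latest", "edge"}

# Tags that stay fixed once created: edge-<sha> or <major>.<minor>.<patch>.
# Assumption: <sha> is written as lowercase hexadecimal digits.
PINNED_TAG = re.compile(r"edge-[0-9a-f]+|\d+\.\d+\.\d+")


def is_moving_tag(tag: str) -> bool:
    """Return True for moving tags, False for pinned ones; reject anything else."""
    if tag in MOVING_TAGS:
        return True
    if PINNED_TAG.fullmatch(tag):
        return False
    raise ValueError(f"unrecognized schema tag: {tag!r}")


for tag in ["latest", "edge", "edge-abc1234", "1.2.0"]:
    print(tag, "moving" if is_moving_tag(tag) else "pinned")
```

A build that must be reproducible would reference a pinned tag, while `latest` or `edge` suits tracking ongoing changes.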

Contributing

Thinking about contributing to this project? Get started by reading our contribution guidelines.

License

Apache License 2.0