
Quick Start

Huan He edited this page Jul 21, 2022 · 13 revisions

A Minimal Annotation Task

MedTator doesn’t require installing any server or client runtime environment. Annotators can run MedTator in a recent web browser, such as Chromium, Microsoft Edge, Google Chrome, Vivaldi, or another Chromium-based browser. Because Internet Explorer has limited HTML5 support, MedTator cannot run in Internet Explorer. MedTator also uses the HTML5 File System Access API, so the “Save” function may be unavailable in browsers that do not support this API.

To start annotating quickly, open the public version of MedTator at https://ohnlp.github.io/MedTator/ . The following user interface will be displayed:

The MedTator UI

You can import your own schema file and text files for annotation. If you don’t have a schema file or text files yet, you can try our online sample by clicking the “Sample” button in the menu:

The MedTator UI

After you click the “Sample” button, a sample dataset is loaded to demonstrate the main features of MedTator. You can then explore the tabs (e.g., Annotation, Statistics, Export, and IAA) to try the functions in each one. The functions in each tab are described in more detail in the “Usage” section.

Sample data

The MedTator repository contains a sample/ folder with a minimal annotation task, MINIMAL_TASK, which demonstrates how to annotate with MedTator. In this task, we only need to annotate symptoms related to COVID-19 vaccination (e.g., headache, fever, and pain), and there are only three text files: doc_01.txt, doc_02.txt, and doc_03.txt. MedTator accepts .txt files for annotation, but working with .xml files directly can be easier.

For the sample data, we have already converted the .txt files to .xml. For your own project, you can use the Converter tab to convert a batch of .txt files to .xml files. For more information about preparing the .xml files, see Prepare Dataset.
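The Converter tab does this conversion in the browser. Purely as an offline illustration, the sketch below wraps each .txt file in an XML skeleton with no tags yet. The layout (a root element named after the task, a `TEXT` element holding the raw text in a CDATA section, and an empty `TAGS` element) is an assumption modeled on the MAE-compatible format, and `txt_to_xml` and `convert_folder` are hypothetical helpers; check Prepare Dataset for the exact layout MedTator expects.

```python
from pathlib import Path


def txt_to_xml(text: str, task_name: str = "MINIMAL_TASK") -> str:
    """Wrap raw text in an assumed MAE-style skeleton that contains no tags yet."""
    # A CDATA section cannot contain "]]>", so split any occurrence across two sections.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return (
        '<?xml version="1.0" encoding="UTF-8" ?>\n'
        f"<{task_name}>\n"
        f"<TEXT><![CDATA[{safe}]]></TEXT>\n"
        "<TAGS>\n</TAGS>\n"
        f"</{task_name}>\n"
    )


def convert_folder(src: Path, dst: Path, task_name: str = "MINIMAL_TASK") -> list[Path]:
    """Convert every .txt file in src into <name>.txt.xml in dst (hypothetical helper)."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for txt in sorted(src.glob("*.txt")):
        # Keep the original name and append ".xml", e.g. doc_03.txt -> doc_03.txt.xml.
        out = dst / (txt.name + ".xml")
        out.write_text(txt_to_xml(txt.read_text(encoding="utf-8"), task_name), encoding="utf-8")
        written.append(out)
    return written
```

For example, `convert_folder(Path("my_txt"), Path("my_xml"))` would write `my_xml/doc_01.txt.xml` for `my_txt/doc_01.txt`; both folder names are placeholders.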

Import schema and text files

As shown in the following figure, drag and drop the annotation schema file (.yaml, .json, or .dtd) onto the schema file box (see the “Annotation schema file” section for details on the schema file), and drag the three .xml files onto the annotation file box (see the “Annotation data file” section for details on the annotation .xml file). MedTator reads the files from your local disk and loads them directly into your web browser.
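To show roughly what an annotation file holds once it is loaded, the sketch below parses one with Python’s standard library. It assumes an MAE-style layout in which each tag is a child element of `TAGS` with `id`, `spans` (written as `start~end`), and `text` attributes; that layout, the `Tag` class, and `parse_annotation_xml` are illustrative assumptions, and the “Annotation data file” section remains the authoritative description.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Tag:
    name: str    # tag (concept) name, e.g. "SYMP"
    tag_id: str  # value of the assumed "id" attribute
    start: int   # start character offset into the document text
    end: int     # end character offset (exclusive)
    text: str    # covered text, from the assumed "text" attribute


def parse_annotation_xml(xml_string: str) -> tuple[str, list[Tag]]:
    """Return the document text and its tags from an assumed MAE-style file."""
    root = ET.fromstring(xml_string)
    text = root.findtext("TEXT", default="")
    tags: list[Tag] = []
    tags_el = root.find("TAGS")
    if tags_el is None:
        return text, tags
    for el in tags_el:
        spans = el.get("spans", "")
        # This sketch only handles a single "start~end" range; elements without
        # one, or with several ranges (assumed comma-separated), are skipped.
        if "~" not in spans or "," in spans:
            continue
        start, end = (int(x) for x in spans.split("~"))
        tags.append(Tag(el.tag, el.get("id", ""), start, end, el.get("text", "")))
    return text, tags
```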

drag and drop the schema and xml files

Annotate files

As you can see, doc_01 and doc_02 have already been annotated. You can check whether any annotations are missing in these two files. After that, only one file is left to annotate: doc_03.

annotated doc_01 and doc_02

Next, click “doc_03.txt.xml” in the file list to display its text in the tag editor. As shown in the following figure, although this file hasn’t been annotated yet, MedTator has already found some potential tags based on the tags annotated in doc_01 and doc_02, and displays them as hints in dotted boxes. You can click each hint box to add it as a tag.
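MedTator’s exact hint algorithm is not described on this page. As a simplified illustration of the idea only, the sketch below takes the strings already tagged in other files and proposes every case-insensitive, whole-word occurrence of them in an unannotated text as a hint, preferring longer phrases when matches overlap. The function `find_hints` and its input format are hypothetical.

```python
import re


def find_hints(annotated: dict[str, set[str]], text: str) -> list[tuple[str, int, int, str]]:
    """Propose (tag_name, start, end, matched_text) hints for an unannotated text.

    `annotated` maps a tag name (e.g. "SYMP") to the strings already tagged
    with it in other files.
    """
    hints = []
    taken = set()  # character offsets already covered by an accepted hint
    # Try longer phrases first so "injection site pain" wins over "pain".
    candidates = sorted(
        ((tag, phrase) for tag, phrases in annotated.items() for phrase in phrases),
        key=lambda tp: len(tp[1]),
        reverse=True,
    )
    for tag, phrase in candidates:
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            span = range(m.start(), m.end())
            if taken.intersection(span):
                continue  # overlaps a longer hint that was already accepted
            taken.update(span)
            hints.append((tag, m.start(), m.end(), m.group()))
    # Return hints in reading order.
    return sorted(hints, key=lambda h: h[1])
```

For example, with `{"SYMP": {"headache", "fever", "pain", "injection site pain"}}` as input, the text “Mild fever, then injection site pain and a Headache.” yields three hints: “fever”, “injection site pain”, and “Headache”; the shorter “pain” is suppressed because it overlaps a longer match.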

annotated doc_03

Alternatively, click “Accept All” in the menu bar to accept all hints at once:

accept hints

Once you update the annotations in a file (e.g., add new tags, delete tags, or update attribute values), a yellow disk icon appears to the left of the file name to indicate that the file has unsaved changes. You need to save the file; otherwise, the changes will be lost. To save the current annotation file, click the yellow disk icon or the “Save” button in the menu.

save file

Analyze the annotations

When the annotation is finished, you can use the “Statistics” tab to check the overall statistics of the annotated tags and a detailed list of all the tagged texts. For example, the left panel shows that 23 tags were annotated across 3 files. The right panel shows that 17 unique tokens or phrases were identified for the SYMP concept, along with the count of each token or phrase and the files it comes from. If you find a mistake in the annotation, click the file name to go back to that file and correct it.
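The Statistics tab computes these numbers for you. To make the two panels concrete, the sketch below derives the same kinds of figures (total tags, number of files, and, for one concept, each unique text with its count and source files) from a list of (file, tag name, text) records. The record format and `summarize` are hypothetical, texts are compared exactly as written, and the test records are invented rather than taken from the sample dataset.

```python
from collections import Counter, defaultdict


def summarize(records: list[tuple[str, str, str]], concept: str):
    """Summarize (file_name, tag_name, tag_text) records for one concept.

    Returns (total_tags, number_of_files, rows), where each row is
    (tag_text, count, sorted list of files containing it), most frequent first.
    """
    total_tags = len(records)
    files = {file_name for file_name, _, _ in records}
    counts = Counter()
    sources = defaultdict(set)
    for file_name, tag_name, tag_text in records:
        if tag_name == concept:
            # Texts are compared exactly as written (no case folding).
            counts[tag_text] += 1
            sources[tag_text].add(file_name)
    rows = [(t, n, sorted(sources[t])) for t, n in counts.most_common()]
    return total_tags, len(files), rows
```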

statistics on the annotated tags

Export the annotations

Once the analysis is finished, you can send the annotation files (i.e., the three .xml files) directly to downstream tasks. To streamline data processing, MedTator can also export the annotations to other formats used by downstream tasks. For example, MedTator can export all the annotated tags with their sentence context as a tab-separated values (.tsv) file:

export the annotations

As shown in the figure above, the exported .tsv file contains six columns, which include each tag’s span location in the document, its span location in the sentence, and the surrounding sentence.
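MedTator performs this export itself. For readers who want to build a similar table from their own data, the sketch below writes one row per tag with six columns. Only the document span, the sentence span, and the sentence come from the description above; the other three columns (file name, tag name, tag text), the column order, the header names, the `start~end` span notation, and the naive regex sentence splitter are assumptions, so MedTator’s actual output may differ.

```python
import csv
import io
import re

# Naive splitter (assumption): a sentence starts at a non-space character and
# runs to the next run of ".", "!" or "?", or to the end of the text.
SENTENCE = re.compile(r"\S[^.!?]*(?:[.!?]+|$)")


def export_tsv(file_name: str, text: str, tags: list[tuple[str, int, int]]) -> str:
    """Return TSV rows for (tag_name, start, end) tags; offsets are end-exclusive."""
    sentences = [(m.start(), m.end()) for m in SENTENCE.finditer(text)]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["file", "tag", "text", "doc_span", "sent_span", "sentence"])
    for tag_name, start, end in tags:
        # Use the sentence containing the tag's start offset (whole text as fallback).
        s_start, s_end = next(
            ((a, b) for a, b in sentences if a <= start < b), (0, len(text))
        )
        writer.writerow([
            file_name,
            tag_name,
            text[start:end],
            f"{start}~{end}",                      # span in the document
            f"{start - s_start}~{end - s_start}",  # span relative to the sentence
            text[s_start:s_end],                   # surrounding sentence
        ])
    return out.getvalue()
```

The naive splitter will break on abbreviations such as “e.g.”; a real pipeline would use a proper sentence segmenter.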