
Evaluation

Guilherme Passos edited this page May 5, 2018 · 1 revision

cl-conllu has a dedicated package, conllu.evaluate, for evaluating datasets and parser output by comparing sets of CoNLL-U sentences.

The usual use case is this: given a test set of sentences and a set of predicted analyses of those sentences produced by an automatic parser, we would like to evaluate the predictions against the test set.

To illustrate the functionality, we will use files from the repository.

(defparameter *test-list* (cl-conllu:read-conllu "./cl-conllu/test/test-evaluate.conllu"))
(defparameter *predicted-list* (cl-conllu:read-conllu "./cl-conllu/test/test-evaluate-predicted.conllu"))

General metrics

Some common metrics we would like to have are:

  • exact match score: ratio of sentences with a perfectly correct analysis
  • unlabeled attachment score (UAS): ratio of words that have the correct head
      • UAS by word (microaverage): the ratio over all words in the dataset
      • UAS by sentence (macroaverage): the mean, over sentences, of the UAS of each sentence
  • labeled attachment score (LAS): ratio of words that have both the correct head and the correct label (DEPREL)
      • LAS by word (microaverage): the ratio over all words in the dataset
      • LAS by sentence (macroaverage): the mean, over sentences, of the LAS of each sentence

(format nil "Exact match score: ~a~%Microaverage UAS: ~a~%Macroaverage UAS: ~a~%Microaverage LAS: ~a~%Macroaverage LAS: ~a~%"
        (conllu.evaluate:exact-match-score *predicted-list* *test-list*)
        (conllu.evaluate:attachment-score-by-word *predicted-list* *test-list*
                                                  :labeled nil)
        (conllu.evaluate:attachment-score-by-sentence *predicted-list* *test-list*
                                                      :labeled nil)
        (conllu.evaluate:attachment-score-by-word *predicted-list* *test-list*
                                                  :labeled t)
        (conllu.evaluate:attachment-score-by-sentence *predicted-list* *test-list*
                                                      :labeled t))
Exact match score: 0.33333334
Microaverage UAS: 0.93333334
Macroaverage UAS: 0.9444444
Microaverage LAS: 0.8666667
Macroaverage LAS: 0.8888889
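
To make the difference between the two averages concrete, here is a minimal, self-contained sketch that does not use cl-conllu at all. It assumes a toy representation in which a token is a list (HEAD DEPREL) and a sentence is a list of tokens. The names toy-score-by-word, toy-score-by-sentence, *toy-gold* and *toy-predicted* are hypothetical and are not part of conllu.evaluate.

```lisp
;; Toy representation (an assumption for illustration, not the
;; library's token class): a token is a list (HEAD DEPREL) and a
;; sentence is a list of tokens.

(defun toy-token-correct-p (predicted gold labeled)
  "True when PREDICTED has the same head as GOLD and, if LABELED,
the same DEPREL as well."
  (and (eql (first predicted) (first gold))
       (or (not labeled)
           (equal (second predicted) (second gold)))))

(defun toy-correct-count (predicted-sentence gold-sentence labeled)
  "Number of tokens in PREDICTED-SENTENCE that match GOLD-SENTENCE."
  (loop for p in predicted-sentence
        for g in gold-sentence
        count (toy-token-correct-p p g labeled)))

(defun toy-score-by-word (predicted gold &key labeled)
  "Microaverage: correct tokens divided by all tokens in the dataset."
  (/ (loop for p in predicted
           for g in gold
           sum (toy-correct-count p g labeled))
     (loop for g in gold sum (length g))))

(defun toy-score-by-sentence (predicted gold &key labeled)
  "Macroaverage: mean of the per-sentence scores."
  (/ (loop for p in predicted
           for g in gold
           sum (/ (toy-correct-count p g labeled) (length g)))
     (length gold)))

;; Two sentences: one with 2 tokens, one with 4 tokens.
(defparameter *toy-gold*
  '(((0 "root") (1 "punct"))
    ((2 "nsubj") (0 "root") (2 "obj") (2 "punct"))))

;; Third token of the second sentence has a wrong label,
;; fourth token has a wrong head.
(defparameter *toy-predicted*
  '(((0 "root") (1 "punct"))
    ((2 "nsubj") (0 "root") (2 "obl") (3 "punct"))))

(toy-score-by-word *toy-predicted* *toy-gold*)                 ; => 5/6
(toy-score-by-sentence *toy-predicted* *toy-gold*)             ; => 7/8
(toy-score-by-word *toy-predicted* *toy-gold* :labeled t)      ; => 2/3
(toy-score-by-sentence *toy-predicted* *toy-gold* :labeled t)  ; => 3/4
```

The short sentence is fully correct, so it pulls the macroaverage up more than the microaverage. This is the same pattern as in the output above, where each macroaverage is higher than the corresponding microaverage.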

It is also possible to calculate precision and recall of each DEPREL label:

(dolist (deprel '("nsubj" "obj" "obl" "punct"))
  (format t "For deprel ~a:~% - Precision: ~a~% - Recall: ~a~%~%"
          deprel
          (conllu.evaluate:precision *predicted-list* *test-list* deprel)
          (conllu.evaluate:recall *predicted-list* *test-list* deprel)))
For deprel nsubj:
 - Precision: 0.75
 - Recall: 1.0

For deprel obj:
 - Precision: 0.5
 - Recall: 0.5

For deprel obl:
 - Precision: NIL
 - Recall: 0.0

For deprel punct:
 - Precision: 1.0
 - Recall: 1.0

A warning was also emitted, because no token was predicted as obl, which leaves the precision for that label undefined (NIL):

WARNING: There are no tokens predicted as of deprel obl.
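
The NIL precision follows directly from the definitions: precision divides by the number of tokens predicted with the label, and recall divides by the number of tokens that carry the label in the test set. The sketch below illustrates this on two parallel lists of labels. It is an assumption-laden simplification: it compares labels only and ignores heads, which may differ from what conllu.evaluate:precision and conllu.evaluate:recall actually count. All toy- names are hypothetical.

```lisp
(defun toy-label-counts (predicted gold label)
  "Return three values: true positives, number of tokens predicted
as LABEL, and number of tokens whose gold label is LABEL.
PREDICTED and GOLD are parallel lists of label strings."
  (values (loop for p in predicted
                for g in gold
                count (and (equal p label) (equal g label)))
          (count label predicted :test #'equal)
          (count label gold :test #'equal)))

(defun toy-precision (predicted gold label)
  "True positives over tokens predicted as LABEL, or NIL if none."
  (multiple-value-bind (tp n-predicted)
      (toy-label-counts predicted gold label)
    (if (zerop n-predicted)
        nil
        (/ tp n-predicted))))

(defun toy-recall (predicted gold label)
  "True positives over tokens whose gold label is LABEL, or NIL if none."
  (multiple-value-bind (tp n-predicted n-gold)
      (toy-label-counts predicted gold label)
    (declare (ignore n-predicted))
    (if (zerop n-gold)
        nil
        (/ tp n-gold))))

(defparameter *toy-predicted-labels* '("nsubj" "nsubj" "obj" "obj" "punct"))
(defparameter *toy-gold-labels*      '("nsubj" "obj"   "obj" "obl" "punct"))

(toy-precision *toy-predicted-labels* *toy-gold-labels* "nsubj") ; => 1/2
(toy-recall    *toy-predicted-labels* *toy-gold-labels* "nsubj") ; => 1
(toy-precision *toy-predicted-labels* *toy-gold-labels* "obl")   ; => NIL
(toy-recall    *toy-predicted-labels* *toy-gold-labels* "obl")   ; => 0
```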

Confusion matrices

It is also possible to generate confusion matrices.

(defparameter *cm*
  (conllu.evaluate:make-confusion-matrix *predicted-list* *test-list*
                                         :corpus-id "example-corpus"
                                         :key-fn #'cl-conllu:token-deprel))

*cm*
#<CONLLU.EVALUATE:CONFUSION-MATRIX 
(advcl advcl 1)
(advmod advmod 1)
(amod amod 3)
(case case 1)
(compound compound 1)
(det det 5)
(expl expl 2)
(nsubj nsubj 3)
(nsubj obj 1)
(obj obj 1)
(obj obl 1)
(parataxis parataxis 1)
(punct punct 6)
(root root 3)
>

Below are some examples of accessing a confusion matrix:

  • Seeing labels
    (print (conllu.evaluate:confusion-matrix-labels *cm*))
        
    ("advcl" "advmod" "amod" "case" "compound" "det" "expl" "nsubj" "obj" "obl"
     "parataxis" "punct" "root") 
        
  • Accessing cells
    (format t
            "Cell (obj, obl):~% - count: ~a~% - list of instances: ~a~%~%"
            (conllu.evaluate:confusion-matrix-cell-count "obj" "obl"  *cm*)
            (conllu.evaluate:confusion-matrix-cell-tokens "obj" "obl"  *cm*))
        
    Cell (obj, obl):
     - count: 1
     - list of instances: ((test1 8))
    
        

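    This means that in the sentence test1, token 8 was predicted as obj, while its DEPREL in the test set is obl. The first label of a cell is the predicted one and the second is the one in the test set, which is consistent with the warning above that no token was predicted as obl.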
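
As a rough mental model of what such a matrix stores, the sketch below builds one as a hash table. It is not the library's implementation. It assumes tokens are lists (ID LABEL), that the first label of a cell is the predicted one, and that each cell keeps the IDs of the tokens that fall in it. All toy- names are hypothetical.

```lisp
(defun toy-confusion-matrix (predicted gold)
  "PREDICTED and GOLD are parallel lists of (ID LABEL) tokens.
Returns an EQUAL hash table that maps (PREDICTED-LABEL GOLD-LABEL)
to the list of token IDs that fall in that cell."
  (let ((cells (make-hash-table :test #'equal)))
    (loop for (id p-label) in predicted
          for (nil g-label) in gold
          do (push id (gethash (list p-label g-label) cells)))
    cells))

(defun toy-cell-tokens (p-label g-label cells)
  "Token IDs stored in the cell (P-LABEL, G-LABEL)."
  (gethash (list p-label g-label) cells))

(defun toy-cell-count (p-label g-label cells)
  "Number of tokens in the cell (P-LABEL, G-LABEL)."
  (length (toy-cell-tokens p-label g-label cells)))

(defun toy-matrix-labels (cells)
  "Sorted list of every label that occurs in some cell."
  (let ((seen '()))
    (maphash (lambda (key value)
               (declare (ignore value))
               (dolist (label key)
                 (pushnew label seen :test #'equal)))
             cells)
    (sort seen #'string<)))

(defparameter *toy-cm*
  (toy-confusion-matrix
   '((1 "nsubj") (2 "root") (3 "obj") (4 "punct"))    ; predicted
   '((1 "nsubj") (2 "root") (3 "obl") (4 "punct"))))  ; gold

(toy-cell-count "obj" "obl" *toy-cm*)   ; => 1
(toy-cell-tokens "obj" "obl" *toy-cm*)  ; => (3)
(toy-matrix-labels *toy-cm*)            ; => ("nsubj" "obj" "obl" "punct" "root")
```

Keeping the token IDs in each cell, and not just a count, is what makes it possible to go from a suspicious cell back to the individual tokens for error analysis.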

To expand a confusion matrix to all possible pairs of labels in CONFUSION-MATRIX-LABELS, including pairs with a count of zero, an expanded (“normalized”) version of the confusion matrix can be produced:

(print (conllu.evaluate:confusion-matrix-normalize *cm*))
#<CONLLU.EVALUATE:CONFUSION-MATRIX 
(advcl advcl 1)
(advcl advmod 0)
(advcl amod 0)
(advcl case 0)
(advcl compound 0)
(advcl det 0)
(advcl expl 0)
(advcl nsubj 0)
(advcl obj 0)
(advcl obl 0)
(advcl parataxis 0)
(advcl punct 0)
(advcl root 0)
(advmod advcl 0)
(advmod advmod 1)
(advmod amod 0)
(advmod case 0)
(advmod compound 0)
(advmod det 0)
(advmod expl 0)
(advmod nsubj 0)
(advmod obj 0)
(advmod obl 0)
(advmod parataxis 0)
(advmod punct 0)
(advmod root 0)
(amod advcl 0)
(amod advmod 0)
(amod amod 3)
(amod case 0)
(amod compound 0)
(amod det 0)
(amod expl 0)
(amod nsubj 0)
(amod obj 0)
(amod obl 0)
(amod parataxis 0)
(amod punct 0)
(amod root 0)
(case advcl 0)
(case advmod 0)
(case amod 0)
(case case 1)
(case compound 0)
(case det 0)
(case expl 0)
(case nsubj 0)
(case obj 0)
(case obl 0)
(case parataxis 0)
(case punct 0)
(case root 0)
(compound advcl 0)
(compound advmod 0)
(compound amod 0)
(compound case 0)
(compound compound 1)
(compound det 0)
(compound expl 0)
(compound nsubj 0)
(compound obj 0)
(compound obl 0)
(compound parataxis 0)
(compound punct 0)
(compound root 0)
(det advcl 0)
(det advmod 0)
(det amod 0)
(det case 0)
(det compound 0)
(det det 5)
(det expl 0)
(det nsubj 0)
(det obj 0)
(det obl 0)
(det parataxis 0)
(det punct 0)
(det root 0)
(expl advcl 0)
(expl advmod 0)
(expl amod 0)
(expl case 0)
(expl compound 0)
(expl det 0)
(expl expl 2)
(expl nsubj 0)
(expl obj 0)
(expl obl 0)
(expl parataxis 0)
(expl punct 0)
(expl root 0)
(nsubj advcl 0)
(nsubj advmod 0)
(nsubj amod 0)
(nsubj case 0)
(nsubj compound 0)
(nsubj det 0)
(nsubj expl 0)
(nsubj nsubj 3)
(nsubj obj 1)
(nsubj obl 0)
(nsubj parataxis 0)
(nsubj punct 0)
(nsubj root 0)
(obj advcl 0)
(obj advmod 0)
(obj amod 0)
(obj case 0)
(obj compound 0)
(obj det 0)
(obj expl 0)
(obj nsubj 0)
(obj obj 1)
(obj obl 1)
(obj parataxis 0)
(obj punct 0)
(obj root 0)
(obl advcl 0)
(obl advmod 0)
(obl amod 0)
(obl case 0)
(obl compound 0)
(obl det 0)
(obl expl 0)
(obl nsubj 0)
(obl obj 0)
(obl obl 0)
(obl parataxis 0)
(obl punct 0)
(obl root 0)
(parataxis advcl 0)
(parataxis advmod 0)
(parataxis amod 0)
(parataxis case 0)
(parataxis compound 0)
(parataxis det 0)
(parataxis expl 0)
(parataxis nsubj 0)
(parataxis obj 0)
(parataxis obl 0)
(parataxis parataxis 1)
(parataxis punct 0)
(parataxis root 0)
(punct advcl 0)
(punct advmod 0)
(punct amod 0)
(punct case 0)
(punct compound 0)
(punct det 0)
(punct expl 0)
(punct nsubj 0)
(punct obj 0)
(punct obl 0)
(punct parataxis 0)
(punct punct 6)
(punct root 0)
(root advcl 0)
(root advmod 0)
(root amod 0)
(root case 0)
(root compound 0)
(root det 0)
(root expl 0)
(root nsubj 0)
(root obj 0)
(root obl 0)
(root parataxis 0)
(root punct 0)
(root root 3)
> 
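
The expansion itself amounts to taking the Cartesian product of the label list with itself and filling in 0 for every pair that has no recorded count. Here is a minimal sketch of that idea, assuming the counts are given as an association list keyed by (ROW-LABEL COLUMN-LABEL). The function toy-normalize and the variable *toy-counts* are hypothetical and unrelated to the library's internal representation.

```lisp
(defun toy-normalize (counts labels)
  "COUNTS is an alist mapping (ROW-LABEL COLUMN-LABEL) to a count.
Returns a list of (ROW-LABEL COLUMN-LABEL COUNT) entries, one for
every pair of LABELS, using 0 for pairs absent from COUNTS."
  (loop for row in labels
        append (loop for column in labels
                     collect (list row
                                   column
                                   (or (cdr (assoc (list row column) counts
                                                   :test #'equal))
                                       0)))))

;; Sparse counts: only three cells are non-zero.
(defparameter *toy-counts*
  '((("obj" "obj") . 1)
    (("obj" "obl") . 1)
    (("punct" "punct") . 6)))

(toy-normalize *toy-counts* '("obj" "obl" "punct"))
;; => (("obj" "obj" 1) ("obj" "obl" 1) ("obj" "punct" 0)
;;     ("obl" "obj" 0) ("obl" "obl" 0) ("obl" "punct" 0)
;;     ("punct" "obj" 0) ("punct" "obl" 0) ("punct" "punct" 6))
```

With n labels the expanded form always has n × n entries, which is why the 13 labels in the example above yield 169 cells.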