The Bayesian Multinomial Mixture Model code from my 2011 paper (and thesis)

christos-c/bmmm

BMMM

The Bayesian Multinomial Mixture Model code from my 2011 paper (and thesis)

Requirements

  1. Java 1.7
  2. Maven (http://maven.apache.org/download.cgi)

Running BMMM

After cloning the project or downloading the zip, open the bmmm folder on the command line and run:

mvn package
mvn dependency:copy-dependencies

If the build is successful, run the following to see the available runtime configuration options:

java -cp target/bmmm-2.0.11.jar:target/dependency/* tagInducer.Inducer

The main requirement is a CoNLL-style input file with UPOS annotation (9 columns in total). If the input file contains dependencies (column 8), the deps feature can also be used. To use morphology (Morfessor) and PARG-based features you will need the appropriate files. You can convert a raw tokenised corpus to CoNLL format using the following command:

java -cp target/bmmm-2.0.11.jar tagInducer.utils.RawToCoNLL corpus.txt

You can also use a JSON file format with the following fields (one sentence per line):

{
    "words":[{"word":"more","pos":"qn","upos":"DET","cluster":"48"},
        {"word":"juice","pos":"n","upos":"NOUN","cluster":"48"},
        {"word":"?","pos":"?","upos":".","cluster":"-1"}]
}
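
As an illustration of the JSON layout above, here is a minimal Java sketch that serialises one sentence onto a single line with the four fields shown in the example. The class `JsonSentenceWriter` and its `Token` type are hypothetical helpers written for this README and are not part of the bmmm code base; the sketch also assumes that all four fields are plain strings, as in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of bmmm): writes one sentence per line
// using the field names shown in the example above.
class JsonSentenceWriter {

    // One token with the four string fields from the example.
    static final class Token {
        final String word, pos, upos, cluster;

        Token(String word, String pos, String upos, String cluster) {
            this.word = word;
            this.pos = pos;
            this.upos = upos;
            this.cluster = cluster;
        }
    }

    // Escapes double quotes, backslashes and control characters for JSON strings.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the single-line JSON object for one sentence.
    static String toJsonLine(List<Token> sentence) {
        StringBuilder sb = new StringBuilder("{\"words\":[");
        for (int i = 0; i < sentence.size(); i++) {
            Token t = sentence.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"word\":\"").append(escape(t.word))
              .append("\",\"pos\":\"").append(escape(t.pos))
              .append("\",\"upos\":\"").append(escape(t.upos))
              .append("\",\"cluster\":\"").append(escape(t.cluster))
              .append("\"}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // The sentence from the example above.
        List<Token> sentence = new ArrayList<>();
        sentence.add(new Token("more", "qn", "DET", "48"));
        sentence.add(new Token("juice", "n", "NOUN", "48"));
        sentence.add(new Token("?", "?", ".", "-1"));
        System.out.println(toJsonLine(sentence));
    }
}
```

The escaping is done by hand only to keep the sketch free of dependencies; a JSON library would work equally well.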

Evaluating BMMM

To evaluate the output of the Inducer use:

java -cp target/bmmm-2.0.11.jar:target/dependency/* tagInducer.Evaluator

The input can be a CoNLL-style file in which the clusters are in column 5 (0-based index 4). The same file must also contain fine-grained tags (0-based index 3), UPOS (5th column) or CCG categories (6th column).
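
To make the column layout concrete, the following Java sketch pulls the cluster (0-based index 4) and the fine-grained tag (0-based index 3) out of each token line. It is not the Evaluator's own code: the class `ConllColumns` is hypothetical, and the sketch assumes tab-separated columns and blank lines between sentences, neither of which is stated above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of bmmm): reads cluster/tag pairs
// from a CoNLL-style file using the column indices described above.
class ConllColumns {
    static final int TAG_COL = 3;     // fine-grained tag, 0-based
    static final int CLUSTER_COL = 4; // induced cluster, 0-based

    // Returns one {cluster, tag} pair per token line.
    // Assumption: columns are tab-separated and blank lines separate sentences.
    static List<String[]> clusterTagPairs(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // sentence boundary
            }
            String[] cols = line.split("\t");
            if (cols.length <= CLUSTER_COL) {
                throw new IllegalArgumentException("Too few columns: " + line);
            }
            pairs.add(new String[] {cols[CLUSTER_COL], cols[TAG_COL]});
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a CoNLL-style file produced by the Inducer.
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String[] pair : clusterTagPairs(lines)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Printing the pairs this way is a quick check that the clusters and tags sit in the columns the Evaluator expects before running it.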
