Skip to content

r0ller/hi-engmorph-foma

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The goal of this project is to create a useable English morphological analyser for the Alice project.

The whole project is based on the following resources:

University of Pennsylvania, XTAG Project: https://www.cis.upenn.edu/~xtag/

morph-1.5: https://www.cis.upenn.edu/~xtag/swrelease.html

original text file: morph-1.5\data\morph_english.flat

I just commited all intermediate artifacts (source code, db files, fst, build scripts, etc.) emerged during the conversion of the text file morph_english.txt.

Some short description:
-createdb.sql was used the create the empty db files
-upenn_morphtxt2db.cpp was used to create the converted_engmorph.db from comment_free_morph_english.txt
-adjust_upenn_morphdb.cpp was used to adjust it and get engmorph.db which was finally used onwards
-the various build*.sh scripts were used to build the programs to convert the relevant info from the engmorph.db into the corresponding *.lexc files
-english.foma was taken from the foma site and enhanced manually -finally some manual adjustments were applied to the lexc files
-createfst.sh was used to create english.fst in the end

Note: This is not a 1:1 transformation of the original and is still in development so expect many bugs/mistakes

Releases

No releases published

Packages

No packages published