Language Specific Limitations

SwampertSupport edited this page Nov 7, 2021 · 1 revision

Because the project began with only Cherokee language data, our current design has some limitations that we hope to resolve in future iterations as we expand to other languages. The following list is not exhaustive:

  • Currently, our data ingestion process is specific to the Cherokee language data that we have access to.
  • Language varieties are not indicated in the data. The project currently works with the Oklahoma variety, and audio recordings are made with citizens of the United Keetoowah Band of Cherokee Indians. However, the project receives support and feedback from all three federally recognized Cherokee tribes.
  • When querying for the shape of a morpheme, you can specify an orthography, but the choices are a static set specific to Cherokee (TAOC, CRG, Learner).
  • The source layer of a form is not tagged with its orthography type. We generally assume that it is written either in the Cherokee syllabary or in a writing system based on the Latin alphabet.
  • We currently exclude the phonemic representation from the front-end, since it is used for very specific types of linguistic analysis and is, therefore, outside of our scope.
  • Functional morpheme tags are identified from a global list, but different languages might use the same string for different meanings.
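
To make the orthography and tag limitations above concrete, here is a minimal sketch of what scoping both lists per language could look like. It is not the project's implementation: `LanguageConfig`, `REGISTRY`, `resolve_tag`, `is_valid_orthography`, the `example-lang` entry, and the tag glosses are all hypothetical placeholders. Only the orthography names TAOC, CRG, and Learner come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class LanguageConfig:
    """Hypothetical per-language settings; not part of the current design."""

    orthographies: frozenset[str]
    # Maps a tag string to its meaning *within this language only*.
    tag_glosses: dict[str, str] = field(default_factory=dict)


# Illustrative registry. The glosses are placeholders, not real analyses.
REGISTRY: dict[str, LanguageConfig] = {
    "cherokee": LanguageConfig(
        orthographies=frozenset({"TAOC", "CRG", "Learner"}),
        tag_glosses={"PL": "placeholder meaning A"},
    ),
    "example-lang": LanguageConfig(
        orthographies=frozenset({"Practical"}),
        tag_glosses={"PL": "placeholder meaning B"},
    ),
}


def is_valid_orthography(language: str, system: str) -> bool:
    """Check an orthography against the language's own list, not a global one."""
    return system in REGISTRY[language].orthographies


def resolve_tag(language: str, tag: str) -> str:
    """Look up a tag in the language's own namespace to avoid collisions."""
    glosses = REGISTRY[language].tag_glosses
    if tag not in glosses:
        raise KeyError(f"tag {tag!r} is not defined for language {language!r}")
    return glosses[tag]
```

With this shape, the same tag string can carry a different meaning in each language, and an orthography that is valid for Cherokee is rejected for a language that does not define it.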