
module__SeSMig

Robert Bossy edited this page Jul 27, 2017 · 1 revision

org.bibliome.alvisnlp.modules.segmig.SeSMig

Synopsis

Detects sentence boundaries and creates one annotation for each sentence.

This module assumes that WoSMig has already processed the same sections.

Description

org.bibliome.alvisnlp.modules.segmig.SeSMig scans the annotations in wordLayerName and detects sentence boundaries. A sentence boundary is either:

  • an annotation whose feature eosStatusFeature equals eos;
  • an annotation whose surface form contains only characters from the value of strongPunctuations and which is followed by an uppercase character;
  • an annotation whose feature eosStatusFeature equals maybe-eos and which is followed by an uppercase character.
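The three boundary rules can be sketched in Python as follows. This is a minimal illustration, not the module's actual implementation. It assumes each word is a dict with the hypothetical keys "form" and "eos", which mirror the default formFeature and eosStatusFeature names. It also reads "followed by an uppercase character" as "the next word starts with an uppercase letter".

```python
def is_boundary(words, i, strong_punctuations="?.!"):
    """Return True if words[i] ends a sentence under the three rules (sketch).

    Assumption: words is a list of dicts with hypothetical keys "form"
    (surface form) and "eos" (end-of-sentence status).
    """
    word = words[i]
    status = word.get("eos")
    # Rule 1: an explicit eos status is always a boundary.
    if status == "eos":
        return True
    # Rules 2 and 3 both require the next word to start with an uppercase
    # character; the last word of the list has no follower.
    if i + 1 >= len(words):
        return False
    if not words[i + 1]["form"][:1].isupper():
        return False
    # Rule 2: the surface form is made only of strong punctuation characters.
    form = word["form"]
    only_strong = form != "" and all(c in strong_punctuations for c in form)
    # Rule 3: the word was flagged as a possible end of sentence.
    return only_strong or status == "maybe-eos"
```

For example, "!" followed by "See" is a boundary, while "Fig" flagged as maybe-eos but followed by "2" is not, because "2" is not uppercase.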

org.bibliome.alvisnlp.modules.segmig.SeSMig creates an annotation for each sentence and adds it to the targetLayerName layer. It also assigns a new value to the eosStatusFeature of each word annotation:

  • eos: for the last word of each sentence;
  • not-eos: for all other words.
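A minimal sketch of the sentence creation and status rewriting follows. It assumes the boundary positions have already been computed as a set of word indices. The word dicts and their "eos" key are hypothetical, and sentences are returned as (first, last) word-index pairs rather than as annotations. The sketch also assumes that the last word of the section always closes a sentence, which this page does not state explicitly.

```python
def segment(words, boundaries):
    """Group words into sentences and rewrite each word's "eos" status (sketch).

    Assumption: words is a list of dicts with a hypothetical "eos" key, and
    boundaries is a set of indices of words that end a sentence.
    """
    sentences = []
    start = 0
    for i in range(len(words)):
        # Assumption: the last word of the section always closes a sentence.
        if i in boundaries or i == len(words) - 1:
            sentences.append((start, i))
            start = i + 1
    # The last word of each sentence becomes eos; every other word becomes not-eos.
    for first, last in sentences:
        for j in range(first, last + 1):
            words[j]["eos"] = "eos" if j == last else "not-eos"
    return sentences
```

With the words "It works ! See above" and a boundary after "!", this yields the sentences (0, 2) and (3, 4). Only "!" and "above" end up with the eos status.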

If noBreakLayerName is defined, then org.bibliome.alvisnlp.modules.segmig.SeSMig will prevent sentence boundaries inside annotations in this layer.
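The effect of noBreakLayerName can be illustrated by filtering candidate boundaries. This sketch assumes that no-break annotations have already been mapped to inclusive (first, last) word-index spans. The actual module works on annotations, so the function name and data layout here are hypothetical.

```python
def filter_boundaries(boundaries, no_break_spans):
    """Drop boundaries that would split a no-break annotation (sketch).

    Assumption: boundaries is a set of word indices that end a sentence, and
    no_break_spans is a list of inclusive (first, last) word-index pairs.
    """
    kept = set()
    for i in boundaries:
        # A boundary after word i separates words i and i + 1. It is forbidden
        # when both words lie inside the same no-break span.
        if not any(first <= i < last for first, last in no_break_spans):
            kept.add(i)
    return kept
```

A boundary after the last word of a no-break span is still allowed, because it does not cut the span in two.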

Parameters

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

noBreakLayerName

Optional

Type: String

Name of the layer containing annotations within which sentence boundaries are not allowed.

Default value: true

Type: Expression

Only process documents that satisfy this filter.

eosStatusFeature

Default value: eos

Type: String

Name of the word feature containing the end-of-sentence status (eos, maybe-eos, not-eos).

Default value: form

Type: String

Name of the feature containing the word surface form.

Default value: boolean:and(true, nav:layer:words())

Type: Expression

Process only sections that satisfy this filter.

strongPunctuations

Default value: ?.!

Type: String

List of strong punctuation characters.

targetLayerName

Default value: sentences

Type: String

Name of the layer in which to store sentence annotations.

Default value: wordType

Type: String

Name of the feature from which to read the word annotation type.

wordLayerName

Default value: words

Type: String

Name of the layer containing word annotations.
