module__YateaExtractor

Robert Bossy edited this page Jul 27, 2017 · 1 revision

#org.bibliome.alvisnlp.modules.yatea.YateaExtractor

Synopsis

Extract terms from the corpus using the YaTeA term extractor.

Description

org.bibliome.alvisnlp.modules.yatea.YateaExtractor hands the corpus to the YaTeA extractor. The corpus is first written to a file in the YaTeA input format. Tokens are the annotations in the layer wordLayerName; their surface form, POS tag and lemma are taken from the formFeature, posFeature and lemmaFeature features respectively. If sentenceLayerName is set, an additional SENT marker is added to reinforce the sentence boundaries given by the annotations in that layer.
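As a rough illustration of the export step described above, the sketch below serializes tokens one per line with form, POS tag and lemma, and appends a sentence marker after each sentence. The tab-separated layout, the exact shape of the SENT line, and the names `to_yatea_lines` and `SENT_MARKER` are assumptions made for this example; they are not taken from the module's source code.

```python
from typing import Iterable, List, Tuple

# A token as (surface form, POS tag, lemma), mirroring formFeature,
# posFeature and lemmaFeature.
Token = Tuple[str, str, str]

# ASSUMPTION: the extra sentence-boundary line looks like this.
# The wiki page only says that "an additional SENT marker" is added.
SENT_MARKER: Token = (".", "SENT", ".")


def to_yatea_lines(sentences: Iterable[List[Token]], reinforce: bool = True) -> List[str]:
    """Serialize sentences as one tab-separated token per line.

    When `reinforce` is true, a SENT marker line is appended after each
    non-empty sentence, imitating the effect of setting sentenceLayerName.
    """
    lines: List[str] = []
    for sentence in sentences:
        for form, pos, lemma in sentence:
            lines.append(f"{form}\t{pos}\t{lemma}")
        if reinforce and sentence:
            lines.append("\t".join(SENT_MARKER))
    return lines


if __name__ == "__main__":
    example = [[("Cells", "NNS", "cell"), ("divide", "VBP", "divide")]]
    print("\n".join(to_yatea_lines(example)))
```

Passing `reinforce=False` corresponds to leaving sentenceLayerName unset: the token lines are written without any extra boundary marker.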

YaTeA is called using the executable set in yateaExecutable. It runs as if called from the directory workingDir, and the result is written to the subdirectory named corpusName.
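The invocation described above might look roughly like the following sketch: the executable is launched with the working directory as its current directory, optionally with PERLLIB set in its environment, and the results are then expected in a subdirectory named after the corpus. The `-rcfile` argument order is an assumption about the YaTeA command line, and `build_yatea_call`, `result_dir` and `run_yatea` are hypothetical helpers, not part of AlvisNLP.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def build_yatea_call(
    executable: str,
    rc_file: str,
    corpus_file: str,
    perl_lib: Optional[str] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command line and environment for a YaTeA run.

    ASSUMPTION: YaTeA takes its configuration file through `-rcfile`
    followed by the corpus file; check this against your installation.
    """
    command = [executable, "-rcfile", rc_file, corpus_file]
    env = dict(os.environ)
    if perl_lib is not None:
        # Make the YaTeA Perl modules visible to the executable.
        env["PERLLIB"] = perl_lib
    return command, env


def result_dir(working_dir: str, corpus_name: str) -> Path:
    """Directory where results are expected: workingDir/corpusName."""
    return Path(working_dir) / corpus_name


def run_yatea(
    executable: str,
    rc_file: str,
    corpus_file: str,
    working_dir: str,
    corpus_name: str,
    perl_lib: Optional[str] = None,
) -> Path:
    """Run YaTeA as if called from `working_dir` and return the result directory."""
    command, env = build_yatea_call(executable, rc_file, corpus_file, perl_lib)
    # check=True raises CalledProcessError if YaTeA exits with an error.
    subprocess.run(command, cwd=working_dir, env=env, check=True)
    return result_dir(working_dir, corpus_name)
```

Keeping command construction separate from execution makes it possible to inspect the call without having YaTeA installed.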

Parameters

Optional

Type: SourceStream

Path to the YaTeA configuration file.

Optional

Type: WorkingDirectory

Path to the directory where YaTeA is launched.

Optional

Type: ExecutableFile

Path to the YaTeA executable file.

Optional

Type: InputDirectory

Optional

Type: String

Optional

Type: InputDirectory

Optional

Type: OutputDirectory

Optional

Type: String

Contents of the PERLLIB variable in the environment of the YaTeA binary.

Optional

Type: InputFile

BioYaTeA option: path to the post-processing file.

Optional

Type: OutputFile

BioYaTeA option: path to the result file after post-processing.

Optional

Type: String

Optional

Type: TestifiedTerminology

Default value: false

Type: Boolean

Default value: true

Type: Expression

Only process documents that satisfy this filter.

Default value: true

Type: Boolean

Whether to write DOCUMENT special tokens. Not every YaTeA version accepts them.

Default value: form

Type: String

Feature containing the word form.

Default value: lemma

Type: String

Feature containing the word lemma.

Default value: pos

Type: String

Feature containing the word POS tag.

Default value: boolean:and(true, nav:layer:words())

Type: Expression

Process only sections that satisfy this filter.

Default value: sentences

Type: String

Name of the layer containing sentence annotations; sentence boundaries are reinforced.

Default value: words

Type: String

Name of the layer containing the word annotations.

Default value: {}

Type: Mapping

Default value: {}

Type: Mapping
