
module__XMLReader

Robert Bossy edited this page Jul 27, 2017 · 1 revision

#org.bibliome.alvisnlp.modules.xml.XMLReader

Synopsis

Reads XML files and creates elements.

Description

org.bibliome.alvisnlp.modules.xml.XMLReader reads its input from sourcePath as XML and creates documents, sections, annotations, relations or tuples. The structure of the input XML is handled through the xslTransform XSLT stylesheet.

org.bibliome.alvisnlp.modules.xml.XMLReader also provides XSLT function and element extensions. The namespace for all extensions is xalan://org.bibliome.alvisnlp.modules.xml.XMLReader2.

Element extensions

  • document: creates a document in the current corpus. The identifier is either a string specified by the attribute id, or an XPath expression specified by the xpath-id attribute. The expression is evaluated as a string.
  • section: creates a section in the current document; this element should occur inside a document element. The name of the section is specified either as a string in the name attribute, or as an XPath expression in the xpath-name attribute. The contents of the section are specified either as a string in the contents attribute, or as an XPath expression in the xpath-contents attribute.
  • annotation: creates an annotation in the current section; this element should occur inside a section element. The start and end positions are specified by the start and end attributes respectively. The values of these attributes are XPath expressions evaluated as integers. The layers to which the annotation should be added are specified either as a string by the layers attribute, or as an XPath expression by the xpath-layers attribute. The layers should be a space-separated list of layer names; the annotation will be added to each named layer. If the layers list is empty, then no annotation is created at all. Moreover, an identifier for the created annotation may be specified either as a string by the ref attribute, or as an XPath expression by the xpath-ref attribute. This identifier can be referenced later to set tuple arguments.
  • relation: creates a relation in the current section; this element should occur inside a section element. The name of the relation is specified either as a string by the name attribute, or as an XPath expression by the xpath-name attribute.
  • tuple: creates a tuple inside the current relation; this element should occur inside a relation element.
  • arg: sets an argument of the current tuple; this element should occur inside a tuple element. The role of the argument is specified either as a string by the role attribute, or as an XPath expression by the xpath-role attribute. The identifier of the argument is specified either as a string by the ref attribute, or as an XPath expression by the xpath-ref attribute. This identifier must have been set previously by the ref or xpath-ref attribute of an annotation element.
  • feature: adds a feature to the current element; this element should occur inside a document, section, annotation, relation or tuple element. The name of the feature is set either as a string by the name attribute, or as an XPath expression by the xpath-name attribute. The value of the feature is set either as a string by the value attribute, or as an XPath expression by the xpath-value attribute.
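The element extensions above can be combined in one stylesheet. The following is a minimal sketch, not a tested AlvisNLP stylesheet: the input element and attribute names (article, text, mention, link, from, to, agent, target) and the layer, relation, role and feature names are all hypothetical, and the way the module resolves XPath expressions against the current node is assumed from the descriptions above. Python is used here only to check that the stylesheet is well-formed and that the extension elements are lexically nested as described; it does not run the transformation, which requires AlvisNLP itself. A real stylesheet may spread these elements across several templates, in which case a lexical check like this one would not apply.

```python
import xml.etree.ElementTree as ET

# Extension namespace, as given in the module description.
EXT_NS = "xalan://org.bibliome.alvisnlp.modules.xml.XMLReader2"

STYLESHEET = """
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="xalan://org.bibliome.alvisnlp.modules.xml.XMLReader2"
    extension-element-prefixes="a">
  <!-- Hypothetical input: <article id=".."><text>..</text><mention/>..<link/>..</article> -->
  <xsl:template match="/article">
    <a:document xpath-id="@id">
      <a:feature name="source" value="example"/>
      <a:section name="body" xpath-contents="text">
        <xsl:for-each select="mention">
          <a:annotation start="@from" end="@to" layers="mentions" xpath-ref="@id">
            <a:feature name="type" xpath-value="@type"/>
          </a:annotation>
        </xsl:for-each>
        <a:relation name="interaction">
          <xsl:for-each select="link">
            <a:tuple>
              <a:arg role="agent" xpath-ref="@agent"/>
              <a:arg role="target" xpath-ref="@target"/>
            </a:tuple>
          </xsl:for-each>
        </a:relation>
      </a:section>
    </a:document>
  </xsl:template>
</xsl:stylesheet>
"""

# Nesting rules taken from the element descriptions; document has no constraint.
ALLOWED_PARENTS = {
    "section": {"document"},
    "annotation": {"section"},
    "relation": {"section"},
    "tuple": {"relation"},
    "arg": {"tuple"},
    "feature": {"document", "section", "annotation", "relation", "tuple"},
}


def extension_elements(root):
    """Local names of all extension elements, in document order."""
    prefix = "{" + EXT_NS + "}"
    return [e.tag[len(prefix):] for e in root.iter() if e.tag.startswith(prefix)]


def check_nesting(root):
    """Return (element, enclosing extension element) pairs that break the rules."""
    prefix = "{" + EXT_NS + "}"
    problems = []

    def walk(elem, enclosing):
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            allowed = ALLOWED_PARENTS.get(name)
            if allowed is not None and enclosing not in allowed:
                problems.append((name, enclosing))
            enclosing = name  # descendants are now inside this extension element
        for child in elem:
            walk(child, enclosing)

    walk(root, None)
    return problems


root = ET.fromstring(STYLESHEET.strip())
print(extension_elements(root))
print(check_nesting(root))
```

The annotation elements register their identifiers through xpath-ref, and the arg elements later refer to the same identifiers, which is why the relation is written after the annotations in this sketch.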

Function extensions

  • inline: this function evaluates as a node set containing a copy of each element inside the current node. The elements will have two additional attributes, start and end, that indicate the character positions of the start and end tags. These attributes have the namespace http://bibliome.jouy.inra.fr/alvisnlp/bibliome-module-factory/inline. This function is useful to read in-text annotations. Note: the inline() function also processes comment and processing instruction nodes. These nodes are wrapped within an extra element named wrapper (in the namespace http://bibliome.jouy.inra.fr/alvisnlp/bibliome-module-factory/inline), which will have the two attributes start and end to indicate the character position where the comment or the processing instruction is inserted.
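To illustrate the idea behind start and end positions for in-text annotations, here is a small Python sketch that computes character offsets for elements embedded in mixed content. It is not the module's implementation: it assumes that offsets count characters of the text content with all tags removed, it ignores comments and processing instructions, and the function name inline_offsets and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def inline_offsets(node):
    """Return (text, spans) for a node with mixed content.

    text is the concatenated character data of the node; spans is a list of
    (tag, start, end) tuples, one per descendant element, where start and end
    are offsets into text.
    """
    parts = []
    spans = []
    pos = 0

    def walk(elem, is_root):
        nonlocal pos
        start = pos  # offset where the start tag sits
        if elem.text:
            parts.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child, False)
            if child.tail:  # text between this child and the next sibling
                parts.append(child.tail)
                pos += len(child.tail)
        if not is_root:
            spans.append((elem.tag, start, pos))  # pos is where the end tag sits

    walk(node, True)
    return "".join(parts), spans


sample = ET.fromstring("<p>The <gene>BRCA1</gene> gene binds <prot>RAD51</prot>.</p>")
text, spans = inline_offsets(sample)
for tag, start, end in spans:
    print(tag, start, end, repr(text[start:end]))
```

With offsets of this kind available on each copied element, a stylesheet can pass them to the start and end attributes of the annotation element.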

Parameters

sourcePath

Optional

Type: SourceStream

Path to the source directory or source file.

xslTransform

Optional

Type: SourceStream

XSLT Stylesheet to apply on the input.

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

Optional

Type: Mapping

Constant features to add to each document created by this module.

Optional

Type: Mapping

Constant features to add to each relation created by this module.

Optional

Type: Mapping

Constant features to add to each section created by this module.

Optional

Type: Mapping

Constant features to add to each tuple created by this module.

Optional

Type: Mapping

Parameters to pass to the XSLT Stylesheet specified by xslTransform.

Default value: false

Type: Boolean

Set to true if the input is HTML rather than XML.

Default value: false

Type: Boolean

If true, do not convert tag names to upper case.
