0.8.1

@jbaiter jbaiter released this 10 Jun 08:56
· 153 commits to main since this release

This is a bugfix release that mainly targets the MiniOCR and ALTO implementations.

Bugfixes:

  • ALTO: Fix handling of empty words. Previously, all words following a word element with no text were skipped entirely during indexing 😱😱.
  • MiniOCR: Fix handling of empty words. Previously, a word element with no text would make the parser crash.
  • MiniOCR: Make the wh attribute on <p> page elements actually optional. The documentation said it was optional, but the parser would crash when handling elements without the attribute.
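
To illustrate the two MiniOCR cases above, here is a minimal Python sketch of a tolerant reader. It is not the plugin's parser, only an illustration of the inputs involved. The sample document, the page identifier, and the `parse_pages` helper are all hypothetical, and the sketch assumes the MiniOCR layout of `<p>` pages containing `<w>` words, with `wh` holding a width and a height separated by whitespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical MiniOCR fragment: the <p> element has no wh attribute,
# and the second <w> element contains no text.
SAMPLE = """<ocr>
  <p xml:id="page_1">
    <b>
      <l><w x="10 10 40 12">Hello</w> <w x="55 10 5 12"></w> <w x="65 10 40 12">world</w></l>
    </b>
  </p>
</ocr>"""

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def parse_pages(xml_text):
    """Return one dict per <p> page, tolerating a missing wh and empty words."""
    pages = []
    for page in ET.fromstring(xml_text).iter("p"):
        wh = page.get("wh")  # None when the attribute is absent
        size = tuple(float(v) for v in wh.split()) if wh else None
        # An empty <w></w> has text None; keep it as "" instead of failing.
        words = [(w.text or "") for w in page.iter("w")]
        pages.append({"id": page.get(XML_ID), "size": size, "words": words})
    return pages


pages = parse_pages(SAMPLE)
print(pages[0]["size"])   # None
print(pages[0]["words"])  # ['Hello', '', 'world']
```

The empty word is kept as an empty string so that the words after it are still seen, which is the behavior the ALTO fix restores as well.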

Other Changes:

  • A warning is now logged if none of the fields requested with hl.ocr.fl exist or are defined as stored fields. Previously, highlighting would silently fail, with no indication to users of why.
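
For context, the following sketch shows the kind of request where the new warning becomes relevant. The host, core name (`mycore`), and field name (`ocr_text`) are assumptions chosen for illustration. Only the `hl` and `hl.ocr.fl` parameters come from the plugin. If `ocr_text` is missing from the schema or is not a stored field, this request would return no OCR highlighting, and the log should now explain why.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and field name; adjust to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"
OCR_FIELD = "ocr_text"

params = {
    "q": f"{OCR_FIELD}:solr",
    "hl": "true",            # enable highlighting
    "hl.ocr.fl": OCR_FIELD,  # field(s) to run OCR highlighting on
}

# Build the full request URL; nothing is sent over the network here.
request_url = f"{SOLR_SELECT_URL}?{urlencode(params)}"
print(request_url)
```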