0.8.2

@jbaiter jbaiter released this 22 Sep 16:28

Bugfix release for an edge case in hOCR parsing.

Bugfixes:

  • hOCR: Fix stack overflow when handling empty words in combination with a partially
    hyphenated word

Other Changes:

  • Improved error message for failures during highlighting. The message now includes the source pointer of the failed document or, if OCR is stored in the index, the beginning of the broken content, along with the internal Lucene document identifier. Adding the [docid] field to the returned fields of the failing query adds this internal id to every document in the result set, which should allow quick identification of the documents that cause issues during highlighting.
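
As a rough illustration of the [docid] tip above, the following sketch builds a query URL that requests the internal Lucene id alongside each result. The host, the collection name `ocr`, the field name `ocr_text`, the `id` unique key, and the helper `build_debug_query_url` are all assumptions made for this example and are not taken from the release notes; adjust them to your own setup.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url, collection, query, ocr_field):
    """Build a Solr select URL that also returns the internal Lucene doc id.

    Hypothetical helper: all names and values passed in are placeholders.
    """
    params = {
        "q": query,
        # "id" assumes the schema's unique key is called "id";
        # [docid] asks Solr to add the internal Lucene id to every hit
        "fl": "id,[docid]",
        "hl": "true",
        # assumed OCR highlighting field; replace with your own
        "hl.ocr.fl": ocr_field,
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed local Solr instance and collection, for illustration only
url = build_debug_query_url(
    "http://localhost:8983/solr", "ocr", "ocr_text:example", "ocr_text"
)
print(url)
```

Re-running the failing query with a URL like this should let you match the Lucene document identifier from the error message against the `[docid]` value of each returned document.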