
Weird behaviour in complex workflow featuring xlsx export #228

Open
MartinKl opened this issue May 3, 2024 · 3 comments
Assignees
Labels
bug: Something isn't working · graphANNIS: The issue is at least partially related to graphANNIS (bug or wrong application)

Comments


MartinKl commented May 3, 2024

When running annatto in disk mode, annotation layers seem to get lost in a setting with many manipulations (possibly the export starts before the annotation storage is ready). In memory mode, though, everything passes.

The following workflow (the details do not matter, only the complexity) failed to export the annotation norm::auto_lemma when run on disk, but succeeded in memory. Note that only the xlsx export showed this behaviour; the graphml file always contained the lemma layer.

[[import]]
format = "xlsx"
path = "./Luther/2_Exceldateien/Fabeln/"

[import.config]
column_map = { norm = [], edition = [], text = [] }

[[import]]
format = "treetagger"
path = "Luther/6_rnn-tagger-output/Fabeln-tagged/"
config = {}

[[graph_op]]
action = "split"

[graph_op.config]
delimiter = "."
anno = "default_ns::pos"
index_map = { "norm::auto_pos" = 1 }
keep = false
 
[graph_op.config.layer_map]
"norm::Gender" = ["Fem", "Neut", "Mask"]
"norm::Case" = ["Nom", "Gen", "Dat", "Akk"]
"norm::Degree" = ["Pos", "Komp", "Sup"]
"norm::Tense" = ["Präs", "Prät"]
"norm::Mood" = ["Ind", "Konj"]
"norm::Number" = ["Sg", "Pl"]
"norm::VerbClass" = ["Sw", "St", "Unr"]
"norm::Person" = ["1", "2", "3"]
 
[[graph_op]]
action = "enumerate"

[graph_op.config]
queries = ["tok @* annis:doc=/Lut_F_0Vorrede_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_10Hund_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_11Mogenhofer_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_12Esel_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_13Stadtmaus_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_14Rabe_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_1Torheit_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_2Hass_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_3Untreu_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_4Neid_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_5Geiz_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_6Frevel_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_7_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_8Dieb_tg/ @* annis:node_name=/Fabeln-tagged/", "tok @* annis:doc=/Lut_F_9Kranich_tg/ @* annis:node_name=/Fabeln-tagged/"]
target = 1
label_ns = "source"
label_name = "id"
start = 0
value = 2

[[graph_op]]
action = "enumerate"

[graph_op.config]
queries = ["norm @* annis:doc=/Lut_F_0Vorrede_tg/", "norm @* annis:doc=/Lut_F_10Hund_tg/", "norm @* annis:doc=/Lut_F_11Mogenhofer_tg/", "norm @* annis:doc=/Lut_F_12Esel_tg/", "norm @* annis:doc=/Lut_F_13Stadtmaus_tg/", "norm @* annis:doc=/Lut_F_14Rabe_tg/", "norm @* annis:doc=/Lut_F_1Torheit_tg/", "norm @* annis:doc=/Lut_F_2Hass_tg/", "norm @* annis:doc=/Lut_F_3Untreu_tg/", "norm @* annis:doc=/Lut_F_4Neid_tg/", "norm @* annis:doc=/Lut_F_5Geiz_tg/", "norm @* annis:doc=/Lut_F_6Frevel_tg/", "norm @* annis:doc=/Lut_F_7_tg/", "norm @* annis:doc=/Lut_F_8Dieb_tg/", "norm @* annis:doc=/Lut_F_9Kranich_tg/"]
target = 1
label_ns = "target"
label_name = "id"
start = 0
value = 2

[[graph_op]]
action = "check"

[[graph_op.config.tests]]
query = "source:id"
expected = [1, inf]
description = "There are source annotations"

[[graph_op.config.tests]]
query = "target:id"
expected = [1, inf]
description = "There are target annotations"

[[graph_op]]
action = "link"

[graph_op.config]
source_query = "source:id"
source_node = 1
source_value = [1]
target_query = "target:id"
target_node = 1
target_value = [1]
link_type = "Pointing"
link_name = "align"

[[graph_op]]
action = "check"

[[graph_op.config.tests]]
query = "node ->align node"
expected = "norm"
description = "There are as many alignment edges as there are norm annotations."

[[graph_op.config.tests]]
query = "node? !->align norm"
expected = 0
description = "There is no norm node without an ingoing alignment edge."

[[graph_op]]
action = "collapse"
 
[graph_op.config]
ctype = "Pointing"
layer = ""
name = "align"
disjoint = true

[[graph_op]]
action = "revise"

[graph_op.config]
remove_nodes = ["Fabeln-tagged", "Fabeln-tagged/Lut_F_0Vorrede_tg", "Fabeln-tagged/Lut_F_10Hund_tg", "Fabeln-tagged/Lut_F_11Mogenhofer_tg", "Fabeln-tagged/Lut_F_12Esel_tg", "Fabeln-tagged/Lut_F_13Stadtmaus_tg", "Fabeln-tagged/Lut_F_14Rabe_tg", "Fabeln-tagged/Lut_F_1Torheit_tg", "Fabeln-tagged/Lut_F_2Hass_tg", "Fabeln-tagged/Lut_F_3Untreu_tg", "Fabeln-tagged/Lut_F_4Neid_tg", "Fabeln-tagged/Lut_F_5Geiz_tg", "Fabeln-tagged/Lut_F_6Frevel_tg", "Fabeln-tagged/Lut_F_7_tg", "Fabeln-tagged/Lut_F_8Dieb_tg", "Fabeln-tagged/Lut_F_9Kranich_tg", "Fabeln-tagged/Lut_F_0Vorrede_tg#text", "Fabeln-tagged/Lut_F_10Hund_tg#text", "Fabeln-tagged/Lut_F_11Mogenhofer_tg#text", "Fabeln-tagged/Lut_F_12Esel_tg#text", "Fabeln-tagged/Lut_F_13Stadtmaus_tg#text", "Fabeln-tagged/Lut_F_14Rabe_tg#text", "Fabeln-tagged/Lut_F_1Torheit_tg#text", "Fabeln-tagged/Lut_F_2Hass_tg#text", "Fabeln-tagged/Lut_F_3Untreu_tg#text", "Fabeln-tagged/Lut_F_4Neid_tg#text", "Fabeln-tagged/Lut_F_5Geiz_tg#text", "Fabeln-tagged/Lut_F_6Frevel_tg#text", "Fabeln-tagged/Lut_F_7_tg#text", "Fabeln-tagged/Lut_F_8Dieb_tg#text", "Fabeln-tagged/Lut_F_9Kranich_tg#text"]

[graph_op.config.node_annos]
"source::id" = ""
"target::id" = ""
"default_ns::lemma" = "norm::auto_lemma"

[[export]]
format = "graphml"
path = "./"
config = {  }

[[export]]
format = "xlsx"
path = "xlsx-with-tags/"
config = { include_namespace = false, annotation_order = ["edition::edition", "text::text", "norm::norm", "norm::auto_pos", "default_ns::lemma", "norm::auto_lemma", "norm::Case", "norm::Degree", "norm::Gender", "norm::Mood", "norm::Number", "norm::Person", "norm::Tense", "norm::VerbClass"] }

MartinKl added the bug and graphANNIS labels on May 3, 2024
MartinKl self-assigned this on May 3, 2024

MartinKl commented May 3, 2024

When running in disk mode, this line sometimes yields true and sometimes false for the node holding the lemma annotation:

if token_helper.is_token(m.node)? {

In contrast, in memory mode it always returns false.


MartinKl commented May 3, 2024

Importing the graphml file with the annis CLI, the query tok _ident_ auto_lemma returns 0 matches.

Taken together, this points to an incompletely updated storage, probably the storage of the Coverage component, which influences the result of is_token.

MartinKl commented

We figured out that this happens because Coverage components can get unloaded in workflow steps that use a CorpusStorage. Even though AQL queries can now be executed on graphs directly, which would avoid the unloading, removing CorpusStorage is not an option right now, since many graph_ops rely on the ordered results that only CorpusStorage provides.

When run in memory, either the CorpusStorage does not unload the Coverage component or it is simply fast enough at reloading it, so the bug does not occur.
