Problem with harvester on Latvian open data portal #543

Open
vicehovskis opened this issue Nov 27, 2023 · 3 comments

@vicehovskis

Hello all,

We have some issues with the harvester on our Latvian open data portal: https://data.gov.lv/lv

We currently run CKAN 2.8.6 and can successfully harvest from two CSW resources:

http://195.244.156.233:8080/geoportal/csw
https://geometadati.viss.gov.lv/geoportal/csw

However, we have created a new OGC CSW endpoint in our Latvian Geoportal test environment. When we try to harvest from it in the TEST environment in the same way, we receive this error:

999 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
  File "/usr/lib/ckan/default/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
    for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
  File "/usr/lib/ckan/default/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 127, in getidentifiers
    csw.getrecords2(**kwa)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/owslib/csw.py", line 341, in getrecords2
    self._invoke()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/owslib/csw.py", line 582, in _invoke
    self.response = util.http_post(self.url, self.request, self.lang, self.timeout)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/owslib/util.py", line 285, in http_post
    up = urllib2.urlopen(r,timeout=timeout);
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
2023-10-23 16:19:01,009 ERROR [ckanext.harvest.queue] Gather stage failed

Also, in our test environment our developer is trying to upgrade CKAN to version 2.10.1, and with the new version the harvester doesn't work at all. We get many repetitions of the same error and have no idea how to solve them:

sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
2023-11-14 16:29:53,874 ERROR [ckanext.harvest.plugin] Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9
2023-11-14 16:29:54,463 ERROR [ckan.lib.search] This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
Traceback (most recent call last):
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/__init__.py", line 143, in dispatch_by_operation
index.insert_dict(entity)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 79, in insert_dict
return self.update_dict(data)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 106, in update_dict
self.index_package(pkg_dict, defer_commit)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 123, in index_package
validated_pkg_dict, _errors = lib_plugins.plugin_validate(
File "/usr/lib/ckan/default/src/ckan/ckan/lib/plugins.py", line 331, in plugin_validate
return validate(data_dict, schema, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 305, in validate
flat_data, errors = _validate(flattened, schema, validators_context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 356, in _validate
convert(converter, key, converted_data, errors, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 262, in convert
value = converter(*params)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/validators.py", line 195, in package_id_exists
result = session.query(model.Package).get(value)
File "<string>", line 2, in get
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 402, in warned
return fn(*args, **kwargs)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 947, in get
return self._get_impl(ident, loading.load_on_pk_identity)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 951, in _get_impl
return self.session._get_impl(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2912, in _get_impl
return db_load_fn(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 530, in load_on_pk_identity
session.execute(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
conn = self._connection_for_bind(bind)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
self._assert_active()
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
2023-11-14 16:29:54,464 ERROR [ckan.model.modification] This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
Traceback (most recent call last):
File "/usr/lib/ckan/default/src/ckan/ckan/model/modification.py", line 71, in notify
observer.notify(entity, operation)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/__init__.py", line 165, in notify
dispatch_by_operation(
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/__init__.py", line 143, in dispatch_by_operation
index.insert_dict(entity)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 79, in insert_dict
return self.update_dict(data)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 106, in update_dict
self.index_package(pkg_dict, defer_commit)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 123, in index_package
validated_pkg_dict, _errors = lib_plugins.plugin_validate(
File "/usr/lib/ckan/default/src/ckan/ckan/lib/plugins.py", line 331, in plugin_validate
return validate(data_dict, schema, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 305, in validate
flat_data, errors = _validate(flattened, schema, validators_context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 356, in _validate
convert(converter, key, converted_data, errors, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 262, in convert
value = converter(*params)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/validators.py", line 195, in package_id_exists
result = session.query(model.Package).get(value)
File "<string>", line 2, in get
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 402, in warned
return fn(*args, **kwargs)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 947, in get
return self._get_impl(ident, loading.load_on_pk_identity)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 951, in _get_impl
return self.session._get_impl(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2912, in _get_impl
return db_load_fn(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 530, in load_on_pk_identity
session.execute(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
conn = self._connection_for_bind(bind)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
self._assert_active()
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
2023-11-14 16:29:53,790 INFO [ckanext.harvest.plugin] Creating harvest source: {'__extras': {'active': True}, 'frequency': 'MANUAL', 'name': 'geolatvija-test', 'owner_org': '537dea7f-c55d-4607-8838-260bea1c7f1e', 'source_type': 'csw', 'title': 'zmni', 'type': 'harvest', 'url': 'https://geolatvija-test.vraa.gov.lv/geonetwork/opendata/eng/csw', 'extras': [{'key': 'frequency', 'value': 'MANUAL'}, {'key': 'source_type', 'value': 'csw'}], 'creator_user_id': '452a87b4-897b-4b62-8d8a-eae98964ce55', 'id': '8bbf3c49-0fac-4575-a0bb-d3e52b6996f9'}
2023-11-14 16:29:54,465 INFO [ckanext.harvest.plugin] Harvest source created: 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9
Traceback (most recent call last):
File "/usr/lib/ckan/default/bin/ckan", line 8, in <module>
sys.exit(ckan())
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/cli.py", line 51, in create
result = utils.create_harvest_source(
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/utils.py", line 139, in create_harvest_source
source = tk.get_action("harvest_source_create")(context, data_dict)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 551, in wrapped
result = _action(context, data_dict, **kw)
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/logic/action/create.py", line 72, in harvest_source_create
source = toolkit.get_action('package_create')(context, data_dict)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 551, in wrapped
result = _action(context, data_dict, **kw)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 243, in package_create
model.repo.commit()
File "<string>", line 2, in commit
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
self._transaction.commit(_to_root=self.future)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 827, in commit
self._assert_active(prepared_ok=True)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]]
@amercader
Member

Hi @vicehovskis, I did a quick test harvesting one of the CSW servers that you provided and it worked fine on CKAN 2.10.1, so let's go step by step.

The first error you mention (500 Internal Server Error) is returned by the remote CSW server, not by CKAN. The harvester fails because the remote server returned an error. There's not much we can do here, but if you share the URL we may be able to identify a problem with it.
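One way to see what the remote server actually returns is to send it a CSW GetRecords request directly, outside CKAN. The following is a minimal sketch using only the Python 3 standard library. It approximates, rather than reproduces, the request that OWSLib builds for the harvester: the `brief` element set, the `csw:Record` type name and the ISO `gmd` output schema are assumptions, and `build_getrecords` and `post_csw` are hypothetical helper names. Unlike the traceback, it keeps the body of the error response, which may contain an exception report explaining the 500.

```python
import sys
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
GMD_SCHEMA = "http://www.isotc211.org/2005/gmd"  # assumed output schema


def build_getrecords(max_records=10, start=1, output_schema=GMD_SCHEMA):
    """Build a CSW 2.0.2 GetRecords request body as bytes."""
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element(
        "{%s}GetRecords" % CSW_NS,
        {
            "service": "CSW",
            "version": "2.0.2",
            "resultType": "results",
            "startPosition": str(start),
            "maxRecords": str(max_records),
            "outputSchema": output_schema,
        },
    )
    # Assumed query: brief csw:Record entries, no filter.
    query = ET.SubElement(root, "{%s}Query" % CSW_NS, {"typeNames": "csw:Record"})
    esn = ET.SubElement(query, "{%s}ElementSetName" % CSW_NS)
    esn.text = "brief"
    return ET.tostring(root, encoding="utf-8")


def post_csw(url, body, timeout=30):
    """POST the request and return (status, text), also for HTTP errors."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/xml"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.getcode(), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # A 500 still carries a response body; keep it for inspection.
        return err.code, err.read().decode("utf-8", "replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the failing CSW endpoint URL as the first argument.
    status, text = post_csw(sys.argv[1], build_getrecords())
    print(status)
    print(text[:2000])
```

If the server answers this request with a 500 as well, the problem is reproducible without CKAN and can be taken to whoever maintains the CSW endpoint.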

Regarding the second stack trace, I see two potential problems. First, I'd try disabling the harvest logging, as it seems to be trying to add records to the harvest_log table and failing. Set ckan.harvest.log_scope = -1 or just remove all the ckan.harvest.log_* options.
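To check which of these options are currently set, something like the following sketch could be run against the CKAN ini file. It assumes the options live in the `[app:main]` section, as in a standard CKAN config file; `harvest_log_options` is a hypothetical helper name.

```python
import configparser
import sys


def harvest_log_options(ini_text, section="app:main"):
    """Return the ckan.harvest.log_* options set in a CKAN ini file."""
    # RawConfigParser leaves values such as %(here)s uninterpolated.
    parser = configparser.RawConfigParser(strict=False)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return {}
    return {
        key: value
        for key, value in parser.items(section)
        if key.startswith("ckan.harvest.log_")
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to the CKAN ini file as the first argument.
    with open(sys.argv[1], encoding="utf-8") as handle:
        for key, value in harvest_log_options(handle.read()).items():
            print(key, "=", value)
```

An empty result, or `ckan.harvest.log_scope` set to `-1`, would correspond to the two settings suggested above.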

Second, the error "Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9" suggests that the harvest_* tables might not be present in the database. Check that you have a harvest_source table with records, and that the ids of those records match datasets (table package) of type="harvest" with the same ids. Perhaps the tables were not migrated during the upgrade.
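That consistency check can be expressed as a single query. The sketch below assumes only the tables and columns named above (`harvest_source.id`, `package.id`, `package.type`) and lists datasets of type `harvest` that have no matching `harvest_source` row. `orphaned_harvest_datasets` is a hypothetical helper that accepts any DB-API connection. The demo uses an in-memory SQLite database with made-up ids purely as a stand-in for the real PostgreSQL database.

```python
import sqlite3

# Datasets of type 'harvest' with no harvest_source row of the same id.
ORPHAN_SQL = """
SELECT p.id
FROM package AS p
LEFT JOIN harvest_source AS hs ON hs.id = p.id
WHERE p.type = 'harvest' AND hs.id IS NULL
ORDER BY p.id
"""


def orphaned_harvest_datasets(conn):
    """Return ids of harvest datasets that lack a harvest_source record."""
    cur = conn.cursor()
    cur.execute(ORPHAN_SQL)
    return [row[0] for row in cur.fetchall()]


def demo():
    """Run the check against a tiny made-up database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE package (id TEXT PRIMARY KEY, type TEXT);
        CREATE TABLE harvest_source (id TEXT PRIMARY KEY);
        INSERT INTO package VALUES ('src-ok', 'harvest');
        INSERT INTO package VALUES ('src-missing', 'harvest');
        INSERT INTO package VALUES ('plain-dataset', 'dataset');
        INSERT INTO harvest_source VALUES ('src-ok');
        """
    )
    try:
        return orphaned_harvest_datasets(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Against the real database the same SQL can be run directly in psql. If the query fails because harvest_source does not exist, that points to the missing-tables case.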

Hope this helps, it's hard to debug without more details.

@vicehovskis
Author

Hello @amercader, thanks for the reply! About the first issue with the 500 error: where can we see which request the harvester is sending that returns this error?

About the second one: our developer says that after turning off logging it is possible to register a new harvest source.

@amercader
Member

About first issue with 500 code error - where we can see which request harvester is sending, which returns this error?

This should be listed as a harvest error on the harvest admin UI, e.g. https://yoursite/harvest/<csw source>/
