Feature/mx-1604 wikidata search endpoint #91
Open
mr-kamran-ali wants to merge 24 commits into main from feature/mx-1604-wikidata-search-endpoint
+517
−58
Commits (24):
- 1cea046 implement wikidata endpoint (mr-kamran-ali)
- e31a2e7 refactor model (mr-kamran-ali)
- c1a2761 test wikidata endpoint (mr-kamran-ali)
- 027c155 update changelog (mr-kamran-ali)
- b90ddc1 make wikidata endpoint read key access (mr-kamran-ali)
- 9ef61db add sanity length check for query in wikidata search (mr-kamran-ali)
- a86d1bc cache the wikidata primary source (mr-kamran-ali)
- 510c7a3 update mex-common version (mr-kamran-ali)
- c76b946 update mex-common (mr-kamran-ali)
- a3b3b1f update lock (mr-kamran-ali)
- 71a9b93 update lock (mr-kamran-ali)
- a90ebd4 get total count of wikidata organizations (mr-kamran-ali)
- 4aa31f1 update test (mr-kamran-ali)
- c1c62a6 revert dependencies downgrade (mr-kamran-ali)
- f048e1a revert dependencies downgrade (mr-kamran-ali)
- 9dcc59f implement wikidata endpoint (mr-kamran-ali)
- 6173e65 refactor model (mr-kamran-ali)
- 977663e update mex-common version (mr-kamran-ali)
- b956658 update mex-common (mr-kamran-ali)
- f1184b1 update lock (mr-kamran-ali)
- 77bdb60 update lock (mr-kamran-ali)
- 24c1283 get total count of wikidata organizations (mr-kamran-ali)
- 6b2ba9e revert dependencies downgrade (mr-kamran-ali)
- b9c3532 revert dependencies downgrade (mr-kamran-ali)
Empty file.
New file (@@ -0,0 +1,14 @@):

```python
from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class PagedResponseSchema(BaseModel, Generic[T]):
    """Response schema for any paged API."""

    total: int
    offset: int
    limit: int
    results: list[T]
```
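The paged schema is a generic pydantic model parametrized per item type. The same parametrization idea can be sketched with a stdlib dataclass (a simplified analogue without pydantic validation; names below are illustrative):

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class PagedResponse(Generic[T]):
    """Simplified stdlib analogue of the pydantic PagedResponseSchema."""

    total: int
    offset: int
    limit: int
    results: list[T] = field(default_factory=list)


# Parametrize with a concrete item type to get a typed page container.
page = PagedResponse[str](total=42, offset=0, limit=2, results=["a", "b"])
```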
New file (@@ -0,0 +1,74 @@):

```python
from functools import cache
from typing import Annotated

from fastapi import APIRouter, Query

from mex.backend.auxiliary.models import PagedResponseSchema
from mex.common.models import ExtractedOrganization, ExtractedPrimarySource
from mex.common.primary_source.extract import extract_seed_primary_sources
from mex.common.primary_source.transform import (
    get_primary_sources_by_name,
    transform_seed_primary_sources_to_extracted_primary_sources,
)
from mex.common.types import TextLanguage
from mex.common.wikidata.extract import (
    get_count_of_found_organizations_by_label,
    search_organizations_by_label,
)
from mex.common.wikidata.transform import (
    transform_wikidata_organizations_to_extracted_organizations,
)

router = APIRouter()


@router.get("/wikidata", status_code=200, tags=["wikidata"])
def search_organization_in_wikidata(
    q: Annotated[str, Query(min_length=1, max_length=1000)],
    offset: Annotated[int, Query(ge=0, le=10e10)] = 0,
    limit: Annotated[int, Query(ge=1, le=100)] = 10,
    lang: TextLanguage = TextLanguage.EN,
) -> PagedResponseSchema[ExtractedOrganization]:
    """Search an organization in wikidata.

    Args:
        q: label of the organization to be searched
        offset: number of items to skip
        limit: maximum number of items to return
        lang: language of the label. Example: en, de

    Returns:
        Paginated list of ExtractedOrganization
    """
    total_orgs = get_count_of_found_organizations_by_label(q, lang)
    organizations = search_organizations_by_label(q, offset, limit, lang)

    extracted_organizations = list(
        transform_wikidata_organizations_to_extracted_organizations(
            organizations, extracted_primary_source_wikidata()
        )
    )

    return PagedResponseSchema(
        total=total_orgs,
        offset=offset,
        limit=limit,
        results=[organization for organization in extracted_organizations],
    )


@cache
def extracted_primary_source_wikidata() -> ExtractedPrimarySource:
    """Load and return wikidata primary source."""
    seed_primary_sources = extract_seed_primary_sources()
    extracted_primary_sources = list(
        transform_seed_primary_sources_to_extracted_primary_sources(
            seed_primary_sources
        )
    )
    (extracted_primary_source_wikidata,) = get_primary_sources_by_name(
        extracted_primary_sources,
        "wikidata",
    )

    return extracted_primary_source_wikidata
```
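The `@cache` decorator memoizes the helper so the comparatively expensive seed-primary-source extraction runs only once per process; later calls return the cached value. A minimal stdlib sketch of that pattern, with a hypothetical stub standing in for the real extraction pipeline:

```python
from functools import cache

extraction_calls = []  # records how often the expensive step actually runs


@cache
def extracted_primary_source_stub() -> str:
    """Hypothetical stand-in for the seed extraction and transform steps."""
    extraction_calls.append(1)
    return "wikidata"


first = extracted_primary_source_stub()   # runs the extraction
second = extracted_primary_source_stub()  # served from the cache, no re-extraction
```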
Large diffs are not rendered by default.

Empty file.
New file (@@ -0,0 +1,91 @@):

```python
import json
from pathlib import Path
from typing import Any
from unittest.mock import MagicMock, Mock

import pytest
import requests
from pytest import MonkeyPatch
from requests import Response

from mex.common.wikidata.connector import (
    WikidataAPIConnector,
    WikidataQueryServiceConnector,
)
from mex.common.wikidata.models.organization import WikidataOrganization

TEST_DATA_DIR = Path(__file__).parent / "test_data"


@pytest.fixture
def wikidata_organization_raw() -> dict[str, Any]:
    """Return a raw wikidata organization."""
    with open(TEST_DATA_DIR / "wikidata_organization_raw.json") as fh:
        return json.load(fh)


@pytest.fixture
def wikidata_organization(
    wikidata_organization_raw: dict[str, Any],
) -> WikidataOrganization:
    """Return a wikidata organization instance."""
    return WikidataOrganization.model_validate(wikidata_organization_raw)


@pytest.fixture
def mocked_wikidata(
    monkeypatch: MonkeyPatch, wikidata_organization_raw: dict[str, Any]
) -> None:
    """Mock wikidata connector."""
    response_query = Mock(spec=Response, status_code=200)

    session = MagicMock(spec=requests.Session)
    session.get = MagicMock(side_effect=[response_query])

    def mocked_init(self: WikidataQueryServiceConnector) -> None:
        self.session = session

    monkeypatch.setattr(WikidataQueryServiceConnector, "__init__", mocked_init)
    monkeypatch.setattr(WikidataAPIConnector, "__init__", mocked_init)

    # mock search_wikidata_with_query
    def get_data_by_query(
        self: WikidataQueryServiceConnector, query: str
    ) -> list[dict[str, dict[str, str]]]:
        return [
            {
                "item": {
                    "type": "uri",
                    "value": "http://www.wikidata.org/entity/Q26678",
                },
                "itemLabel": {"xml:lang": "en", "type": "literal", "value": "BMW"},
                "itemDescription": {
                    "xml:lang": "en",
                    "type": "literal",
                    "value": "German automotive manufacturer, and conglomerate",
                },
                "count": {
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                    "type": "literal",
                    "value": "3",
                },
            },
        ]

    monkeypatch.setattr(
        WikidataQueryServiceConnector, "get_data_by_query", get_data_by_query
    )

    # mock get_wikidata_org_with_org_id
    def get_wikidata_item_details_by_id(
        self: WikidataQueryServiceConnector, item_id: str
    ) -> dict[str, str]:
        return wikidata_organization_raw

    monkeypatch.setattr(
        WikidataAPIConnector,
        "get_wikidata_item_details_by_id",
        get_wikidata_item_details_by_id,
    )
```
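The fixture swaps the connectors' `__init__` for one that injects a mocked session, so instantiating a connector in a test never opens a real HTTP connection. The same pattern can be sketched with only the standard library (`FakeConnector` is a hypothetical stand-in for the real connector class):

```python
from unittest.mock import MagicMock


class FakeConnector:
    """Hypothetical connector that would normally open a network session."""

    def __init__(self) -> None:
        raise RuntimeError("would hit the network")

    def fetch_status(self) -> int:
        return self.session.get().status_code


# Build a session whose get() yields one canned response, then bypass __init__.
session = MagicMock()
session.get = MagicMock(side_effect=[MagicMock(status_code=200)])


def mocked_init(self: FakeConnector) -> None:
    self.session = session


# Equivalent to monkeypatch.setattr, but without pytest's automatic undo.
FakeConnector.__init__ = mocked_init
status = FakeConnector().fetch_status()
```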
`total` is supposed to be the total number of search hits found on wikidata, not the number of returned items; any request sender could figure the latter out by itself. (The same is true for `offset` and `limit`: we don't need to mirror those back.) For example, if you searched for "institute" you might get a couple of hundred hits on wikidata, but we would only return `limit`, i.e. up to 100, of them, and a UI would need to know how many there are overall to display a proper pagination bar.

I know this might complicate this PR a lot, and I'm sorry I didn't spot that earlier in the mex-common PR. So at this point, I'd suggest you add a stop-gap comment on this line and create a new ticket stub to return the correct pagination total later.
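To illustrate the reviewer's point: with the true hit count in `total` (rather than the length of the returned page), a client can size its pagination bar. A small sketch with hypothetical numbers:

```python
import math

total = 437  # hypothetical total number of wikidata hits for a query like "institute"
limit = 100  # page size, capped at 100 by the endpoint

# Number of pages the UI needs to offer; only computable from the true total.
pages = math.ceil(total / limit)
```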
updated in 561e341