Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSearch changes list from review #654

Open
7 tasks done
sfisher opened this issue Jun 13, 2024 · 2 comments
Open
7 tasks done

OpenSearch changes list from review #654

sfisher opened this issue Jun 13, 2024 · 2 comments
Assignees

Comments

@sfisher
Copy link
Contributor

sfisher commented Jun 13, 2024

Jing and I have had a bunch of meetings about OpenSearch and there are requested changes.

  • Have the proc-link-checker-update.py update OpenSearch and the link_is_broken and has_issues fields. (around line 98).
    • See if I can add something to only update these fields in the OpenSearchDoc class instead of updating the full record just for those two fields.
  • Jing had concerns about updating OpenSearch in the proc-search-indexer.py using the Identifier (after SearchIdentifier in DB had been updated) but would like to use RefIdentifier instead, even though the tables should be in sync when updates happen. (around line 86).
    • The Identifier is the permanent source of truth, but there are some shenanigans going on with queuing and duplicating the same identifier information across three different tables (RefIdentifier, SearchIdentifier and Identifier).
    • She felt it would be best to use RefIdentifier since it's one of the copies of Identifier information being used there and we hope 🤞 that it maintains the contract of having all the same fields and methods that Identifier does so that we don't have to complicate or modify the indexing to include additional logic.
  • Jing noted that "datacenter" field only contains the ID (primary key info) of the DataCenter (and I never found more than that used in search or reports I needed to modify), but it thought it would be nicer to include more information such as the abbreviation and the full name in the search index.
    • OpenSearch schema needs updating in code.
    • OpenSearch indices for dev and stage then need to be completely rebuilt to include these subfields for all records.
  • There were concerns about the link checker process (see proc-link-checker-update.py:98) and generating our indices and doing index updates after release and being sure those were fully up to date. The "date update" feature of the indexing script should be able to update everything that has been modified after EITHER SearchIdentifier.updateTime OR Identifier.updateTime (the union of the two sets of update times).
@sfisher sfisher self-assigned this Jun 13, 2024
@sfisher sfisher changed the title OpenSearch changes from review OpenSearch changes list from review Jun 13, 2024
@sfisher
Copy link
Contributor Author

sfisher commented Jun 14, 2024

Hi @jsjiang

I've completed these changes that we discussed in my search-ui branch. I think you'll see the commits for them (the last 4 or 5 or so commits). As we discussed it seems ok to do indexing from the SearchIdentifiers rather than the Identifier. Trying to combine the queries with a union was difficult without dropping into pure SQL since the Django ORM models are set up oddly and there are not any real relationships between the tables as would normally be the case (1:1, 1:many or many:many) and with no real defined foreign keys, even though they all use the same ID key, I think.

They are indexing now and over the weekend and I hope they complete my Mon or Tuesday next week.

@sfisher
Copy link
Contributor Author

sfisher commented Jun 25, 2024

  • Also reverted the downloader script to use SearchIdentifier rather than removing the dependencies on it per PR comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant