Sorting in OpenSearch #627

Open
10 tasks done
sfisher opened this issue May 14, 2024 · 1 comment

@sfisher
Contributor

sfisher commented May 14, 2024

I implemented an initial version of the sort feature that the database-backed search used to provide.

I get this error:

RequestError(400, 'search_phase_execution_exception', 'Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [resource.title] in order to load field data by uninverting the inverted index. Note that this can use significant memory.')

The most efficient fix is probably to also store a length-limited "sort" version of each field we want to sort on. This field would be of type keyword rather than text, since sorting on a text field requires fielddata, which can use significant memory.
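A minimal sketch of the proposed mapping, assuming each sortable text field gets a sibling keyword field with a hypothetical `_sort` suffix. The suffix, the `resource` object layout, and the field names are illustrative assumptions, not the project's actual schema. Keyword fields have doc values enabled by default, so OpenSearch can sort on them without turning on fielddata.

```python
# Hypothetical naming convention for the sort-only copy of a text field.
SORT_SUFFIX = "_sort"


def sortable_text_mapping(field_names):
    """Build mapping properties: a full-text field plus a keyword copy used only for sorting."""
    properties = {}
    for name in field_names:
        properties[name] = {"type": "text"}  # still used for full-text search
        properties[name + SORT_SUFFIX] = {"type": "keyword"}  # used for sorting instead
    return properties


# Example index body for a hypothetical "resource" object.
mapping = {
    "mappings": {
        "properties": {
            "resource": {
                "properties": sortable_text_mapping(["title", "creator", "publisher"])
            }
        }
    }
}
```

A query would then sort on `resource.title_sort` rather than `resource.title`, which avoids the fielddata error above.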

My initial proposal is that we support sorting only on the six or so columns that the search results display as sortable column headings. (Currently the database backend allows sorting on 15 to 20 fields, which I think is overkill.)

  • Title
  • Creator
  • Identifier
  • Object Publisher
  • Publication date (already works)
  • Object type
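One way to restrict sorting to these six columns is a whitelist that maps each column heading to the field OpenSearch sorts on. This is a sketch only: the column keys and field names (including the `_sort` suffix) are hypothetical, not taken from the codebase.

```python
# Hypothetical mapping from the six search column headings to the fields sorted on.
SEARCH_SORT_FIELDS = {
    "title": "resource.title_sort",
    "creator": "resource.creator_sort",
    "identifier": "identifier_sort",
    "publisher": "resource.publisher_sort",
    "publication_date": "resource.publication_date",  # date field, already sortable
    "object_type": "resource.type_sort",
}


def build_sort(column, direction="asc", allowed=SEARCH_SORT_FIELDS):
    """Translate a UI column and direction into an OpenSearch sort clause."""
    if column not in allowed:
        # Reject anything outside the whitelist instead of sorting on arbitrary fields.
        raise ValueError(f"unsupported sort column: {column!r}")
    if direction not in ("asc", "desc"):
        raise ValueError(f"unsupported sort direction: {direction!r}")
    return [{allowed[column]: {"order": direction}}]


body = {"query": {"match_all": {}}, "sort": build_sort("creator", "desc")}
```

Keeping the whitelist small also makes it explicit which fields need a keyword sort copy in the index.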

This would mean adding special fields with limited lengths for these columns. How much precision do we really need for sorting? Perhaps 10 to 50 characters, which would also keep the index from growing too much.
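A hypothetical helper for filling the length-limited sort fields at index time might look like the following. The 50-character cut-off, the whitespace normalization, and the rule of sorting multi-valued fields on their first value are all assumptions for illustration.

```python
# Assumed cut-off at the upper end of the 10-50 character range.
SORT_KEY_LENGTH = 50


def sort_key(value, max_len=SORT_KEY_LENGTH):
    """Return a length-limited string to store in a keyword sort field, or None."""
    if isinstance(value, (list, tuple)):
        # Multi-valued fields (e.g. several creators): sort on the first value only.
        value = value[0] if value else None
    if value is None:
        return None
    text = " ".join(str(value).split())  # trim and collapse internal whitespace
    return text[:max_len] or None  # empty strings become None (no sort value)
```

Records whose values share the same first 50 characters would tie in sort order; a secondary sort, for example on the identifier, could break such ties.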

The "Manage IDs" page seems to offer a few more columns that could be sortable. I believe these are the extra four:

  • ID owner
  • Created Date
  • ID date last modified (What is this field? I'm not sure we have it in the OpenSearch data.)
  • ID Status

(I also believe the date-style fields do not need additional indexing.)
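A sketch of how the extra "Manage IDs" columns could be resolved, assuming date columns sort on their own field while string columns use a keyword sort copy. All field names and types here are hypothetical, including whether a last-modified field exists at all.

```python
# Hypothetical index field names and types for the extra "Manage IDs" columns.
MANAGE_COLUMNS = {
    "owner": ("owner", "string"),
    "created": ("created", "date"),
    "updated": ("updated", "date"),  # unclear whether this exists in the OpenSearch data
    "status": ("status", "string"),
}


def sort_field_for(column, columns=MANAGE_COLUMNS, suffix="_sort"):
    """Pick the field to sort on: dates as-is, strings via their keyword sort copy."""
    if column not in columns:
        raise ValueError(f"unsupported sort column: {column!r}")
    field, kind = columns[column]
    # Date fields have doc values by default, so they sort without extra indexing.
    return field if kind == "date" else field + suffix
```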

sfisher self-assigned this May 14, 2024
@sfisher
Contributor Author

sfisher commented May 17, 2024

OpenSearch has some hidden keyword fields that work for sorting. Sorting now seems to work on both main forms (search and manage).
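If the hidden keyword fields are the ones created by OpenSearch's default dynamic mapping (an assumption; the comment does not name them), each string field indexed without an explicit mapping becomes `text` with a `keyword` sub-field named `<field>.keyword`, limited by `ignore_above: 256`. A sort request could then target that sub-field directly. The index name in the comment below is hypothetical.

```python
def keyword_sort(field, order="asc"):
    """Sort clause on the '.keyword' sub-field created by default dynamic mapping."""
    if order not in ("asc", "desc"):
        raise ValueError(f"unsupported sort order: {order!r}")
    # Documents with no keyword value (e.g. a string over ignore_above) sort last.
    return [{f"{field}.keyword": {"order": order, "missing": "_last"}}]


body = {
    "query": {"match_all": {}},
    "sort": keyword_sort("resource.title", "desc"),
}
# With opensearch-py this body could be sent as, e.g.:
#   client.search(index="identifiers", body=body)  # hypothetical index name
```

Under the default mapping, strings longer than 256 characters are not indexed into the `.keyword` sub-field, so those documents behave as if the value were missing and the `missing` option places them at the end.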
