Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parsing logic of the Query Builder #1793

Closed

Conversation

jmazanec15
Copy link
Member

Description

I was working on some code for #1779 where I had to add a parameter to KNNQueryBuilder and noticed it has gotten pretty complex. I know @shatejas recently did some refactoring and wanted to build upon that to see if I could simplify the parsing logic.

What I found was that we can use ObjectParser on the inner structure of the query to better define the parsing structure so we do not have to deal with tokens. I have noticed a couple other queries also use this, but many still do it the old way (one example using parser: https://github.com/opensearch-project/OpenSearch/blob/main/modules/mapper-extras/src/main/java/org/opensearch/index/query/RankFeatureQueryBuilder.java)

Anyway, this is in early phases. Just wanted to get some preliminary feedback.

Issues Resolved

[List any issues this PR will resolve]

Check List

  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: John Mazanec <[email protected]>
Put important static methods at the top and helper methods at bottom to
improve readability.

Signed-off-by: John Mazanec <[email protected]>
Copy link
Contributor

@shatejas shatejas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! None of them are blockers, some minor suggestions

nit:
This is a critical path, lets make sure there are heavy unit tests around this. I know there are some, but I am not really sure if it covers all use cases.

Might be worth having KNNQueryBuilderParserTest which manages all parsing tests. I attempted it with InvalidParamsTest

* @param vector Array of floating points
* @deprecated Use {@code {@link KNNQueryBuilder.Builder}} instead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add it back? the intent was to always use the builder so it will validate whenever the object is created

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops - copy and paste error. will add back

INTERNAL_PARSER.declareObject(KNNQueryBuilder.Builder::filter, (p, v) -> parseInnerQueryBuilder(p), FILTER_FIELD);
}

public static KNNQueryBuilder fromXContent(XContentParser parser) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much simpler. Thanks for making this more maintainable

b.ignoreUnmapped(v);
}
}, IGNORE_UNMAPPED_FIELD);
INTERNAL_PARSER.declareFloat(KNNQueryBuilder.Builder::maxDistance, MAX_DISTANCE_FIELD);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding it to the builder while parsing it was exactly what I had in mind. Seems like ObjectParser is giving us a lot of boiler plate. Nice find!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Im surprised more queries do not do this. The fromXContent was becoming a mess. This greatly simplfiies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to check how this will impact to overall latency. Latency could be the reason of not using this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a good point @heemin32. I saw the ConstructingObjectParser may add some latency, but didnt see anything about ObjectParser.

fieldName = in.readString();
vector = in.readFloatArray();
k = in.readInt();
filter = in.readOptionalNamedWriteable(QueryBuilder.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you might have a conflict with backport merge. Might be worth adding back the check and tackle it on major version upgrade

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh is this the issue you ran into? I can look into it.

@@ -114,8 +225,20 @@ public KNNQueryBuilder(String fieldName, float[] vector) {
if (vector.length == 0) {
throw new IllegalArgumentException(String.format("[%s] query vector is empty", NAME));
}
if (k <= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Had marked deprecated so we don't have to add stuff. But I understand the risk, hopefully nothing other than tests are using it

NAME,
KNNQueryBuilder.Builder::new
);
static {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  1. Instead of static block you can make a static method out it which returns ObjectParser
  2. Consider pulling this and the utility methods related to parsing into its own class. I attempted that with MethodParameterParser to single out the responsibility and better testing. The interface is made in a way that forces a lot of responsibilities on this class but we can always pull it out. Ideally I would want to see just the doToQuery logic in here, rest all can be delegated to other classes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to pull out parsing into separate class. Will make that change.

@jmazanec15
Copy link
Member Author

@shatejas Yes that makes sense. I think Ill try to take it out of draft state and add some tests, etc. @heemin32 raises good point on latency. Ill see if I can do a quick analysis to show this wont add anything or much.

@jmazanec15
Copy link
Member Author

closing in favor of #1824

@jmazanec15 jmazanec15 closed this Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants