Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add min_by, max_by #74

Merged
merged 4 commits into from
Jun 30, 2023
Merged

Add min_by, max_by #74

merged 4 commits into from
Jun 30, 2023

Conversation

kklingenberg
Copy link
Contributor

This PR is a naive attempt to implement the missing min_by and max_by filters. I'm unsure about the part on run_if_ok and checking whether an error happened after reduction, but in the end I decided to imitate the sort_by and group_by functions on that, since I'm not very sure about how things work in the library, generally.

I was in doubt about sending the PR because this seems like a pair of functions that could have been implemented directly in std.jq using a map-reduce approach, but having tried that and benchmarked against gojq, I saw that it had to be provided from Rust to be comparable. In the end, and since I don't have much of a clue on how to make these things fast, I "ported" my jq implementation to Rust 😅.

Hopefully this is useful.

@kklingenberg kklingenberg changed the title Add min_by, max_by. Add min_by, max_by Apr 22, 2023
@01mf02
Copy link
Owner

01mf02 commented Jun 26, 2023

Hi @kklingenberg, thanks a lot for your PR!
I discovered a case where your implementation diverges from jq;
[{a: 1, b: 2}, {a: 1, b: 3}] | max_by(.a) | .b yields 3 in jq, but 2 in jaq.
Furthermore, it would be good to add a test where the function passed to max_by yields multiple outputs.
When you address both these issues, I will be happy to merge your PR.

@kklingenberg
Copy link
Contributor Author

Hello @01mf02 ! I've updated the PR and it should now follow your suggestions. Thank you! And thank you again for building jaq!

@01mf02
Copy link
Owner

01mf02 commented Jun 30, 2023

I made some benchmarks myself, and compared your native implementation with:

def min_by(f): reduce .[] as $x (.[0] | [., [f]]; ($x | [f]) as $y | if $y < .[1] then [$x, $y] else . end) | .[0];
def max_by(f): reduce .[] as $x (.[0] | [., [f]]; ($x | [f]) as $y | if $y < .[1] then . else [$x, $y] end) | .[0];

The filter [range(5000000)] | min_by(.) terminated in about 1.2 seconds with your native implementation, whereas the implementation by definition took about 10 seconds. This is clearly in support of having a native implementation, even if it adds quite some code.

@01mf02 01mf02 merged commit abe8117 into 01mf02:main Jun 30, 2023
1 check passed
@01mf02
Copy link
Owner

01mf02 commented Jun 30, 2023

I reworked your implementation by making it abort on the first error, instead of continuing until the end of the array and only then yielding the error.
Furthermore, because the tests are now for a native function, I moved most of them to jaq-core/tests.
Thanks again for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants