Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jq and try_jq function. #17217

Merged
merged 5 commits into from
Jul 8, 2024
Merged

Add jq and try_jq function. #17217

merged 5 commits into from
Jul 8, 2024

Conversation

fengttt
Copy link
Contributor

@fengttt fengttt commented Jun 29, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #17134

What this PR does / why we need it:

Added jq and try_jq function.


PR Type

Enhancement, Tests


Description

  • Added new jq and try_jq functions for JSON querying.
  • Implemented caching for jq queries and added an encoder for jq results.
  • Refactored Or method in Nulls to modify in place.
  • Added methods for handling null values in FunctionResult.
  • Updated null handling in opBinaryStrStrToFixedWithErrorCheck to use new FunctionResult methods.
  • Added null checks in FunctionSelectList methods.
  • Added JQ and TRY_JQ function IDs.
  • Added jq and try_jq to the list of supported functions.
  • Added gojq dependency and updated golang.org/x/sys dependency.
  • Added test cases and SQL test script for jq and try_jq functions.

Changes walkthrough 📝

Relevant files
Enhancement
7 files
nulls.go
Refactor `Or` method in `Nulls` to modify in place             

pkg/container/nulls/nulls.go

  • Refactored Or method to modify nsp in place instead of returning a new
    Nulls object.
  • +4/-11   
    functionTools.go
    Add null handling methods to `FunctionResult`                       

    pkg/container/vector/functionTools.go

    • Added methods for handling null values in FunctionResult.
    +17/-0   
    baseTemplate.go
    Update null handling in `opBinaryStrStrToFixedWithErrorCheck`

    pkg/sql/plan/function/baseTemplate.go

  • Updated null handling in opBinaryStrStrToFixedWithErrorCheck to use
    new FunctionResult methods.
  • +14/-15 
    func_builtin_jq.go
    Add `jq` and `try_jq` functions for JSON querying               

    pkg/sql/plan/function/func_builtin_jq.go

  • Added new jq and try_jq functions for JSON querying.
  • Implemented caching for jq queries.
  • Added an encoder for jq results.
  • +491/-0 
    function.go
    Add null checks in `FunctionSelectList` methods                   

    pkg/sql/plan/function/function.go

    • Added null checks in FunctionSelectList methods.
    +9/-0     
    function_id.go
    Add `JQ` and `TRY_JQ` function IDs                                             

    pkg/sql/plan/function/function_id.go

    • Added JQ and TRY_JQ function IDs.
    +4/-0     
    list_builtIn.go
    Add `jq` and `try_jq` to supported functions                         

    pkg/sql/plan/function/list_builtIn.go

    • Added jq and try_jq to the list of supported functions.
    +40/-0   
    Dependencies
    2 files
    go.mod
    Add `gojq` dependency and update `golang.org/x/sys`           

    go.mod

    • Added gojq dependency.
    • Updated golang.org/x/sys dependency.
    +4/-2     
    go.sum
    Update checksums for dependencies                                               

    go.sum

    • Updated checksums for new and updated dependencies.
    +10/-5   
    Tests
    2 files
    func_jq.result
    Add test cases for `jq` and `try_jq` functions                     

    test/distributed/cases/function/func_jq.result

    • Added test cases for jq and try_jq functions.
    +306/-0 
    func_jq.sql
    Add SQL test script for `jq` and `try_jq` functions           

    test/distributed/cases/function/func_jq.sql

    • Added SQL test script for jq and try_jq functions.
    +125/-0 

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review [1-5] 4
    🧪 Relevant tests Yes
    🔒 Security concerns No
    ⚡ Key issues to review Possible Bug:
    The implementation of the jq and try_jq functions in func_builtin_jq.go does not handle the case where both JSON and query are constants across multiple rows efficiently. This could lead to performance issues when the same query is executed multiple times on different rows.
    Refactoring Needed:
    The FunctionResult methods such as AddNullRange, AddNullAt, and AddNulls are repeatedly called in various parts of the code, suggesting that these operations could be abstracted into a more generic method to avoid redundancy and improve maintainability.
    Error Handling:
    The error handling in jqImpl method in func_builtin_jq.go could be improved by providing more specific error messages or handling specific types of JSON parsing errors more gracefully.

    @matrix-meow matrix-meow added the size/XL Denotes a PR that changes [1000, 1999] lines label Jun 29, 2024
    @mergify mergify bot added the kind/feature label Jun 29, 2024
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible bug
    Fix invalid JSON and jq expression syntax

    Correct the syntax error by using valid JSON and jq expressions, ensuring proper
    encapsulation and syntax.

    test/distributed/cases/function/func_jq.result [35]

    -jq({"a":1} [2] 3, '. as {$a} ?// [$a] ?// $a | $a')
    +jq('{"a":1, "2": 3}', '. as {$a} ?// [$a] ?// $a | $a')
     
    • Apply this suggestion
    Suggestion importance[1-10]: 10

    Why: This suggestion fixes a major syntax error that would cause the jq function to fail, making it a critical improvement.

    10
    Add error handling for insufficient parameters to prevent runtime errors

    Consider handling the case where params has fewer than 2 elements before accessing them.
    This will prevent potential runtime panics due to index out of range errors.

    pkg/sql/plan/function/func_builtin_jq.go [72-73]

    +if len(params) < 2 {
    +    return errors.New("insufficient parameters for jq function")
    +}
     p1 := vector.GenerateFunctionStrParameter(params[0])
     p2 := vector.GenerateFunctionStrParameter(params[1])
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential runtime panic by adding a check for the number of parameters, which is crucial for preventing errors and improving code robustness.

    9
    Correct JSON syntax by using double quotes around keys

    Replace the incorrect JSON syntax by using double quotes around object keys to prevent
    syntax errors in SQL queries.

    test/distributed/cases/function/func_jq.result [50]

    -jq({"foo": {bar: []} }, .)
    +jq({"foo": {"bar": []} }, '.')
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a clear syntax error in the JSON object, which is crucial for the proper functioning of the SQL queries.

    9
    Ensure valid JSON syntax in jq function arguments

    Use valid JSON syntax for the jq function by ensuring all keys and string values are
    enclosed in double quotes.

    test/distributed/cases/function/func_jq.result [110]

    -jq({aa: .a, bb: .b})
    +jq({"aa": .a, "bb": .b})
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: The suggestion corrects the JSON syntax, which is important for the jq function to parse the input correctly.

    8
    Performance
    Optimize the cache clearing strategy to improve performance

    Replace the manual cache clearing strategy with a more efficient one that removes the
    least recently used (LRU) items instead of clearing all items, which can help maintain
    useful cache entries and improve performance.

    pkg/sql/plan/function/func_builtin_jq.go [249-256]

    -for key := range op.jqCache {
    -    delete(op.jqCache, key)
    +// Implement an LRU cache mechanism
    +if len(op.jqCache) >= jqMapSizeLimit {
    +    oldestKey := findOldestKey(op.jqCache) // Implement this function based on time or usage tracking
    +    delete(op.jqCache, oldestKey)
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Implementing an LRU cache mechanism would improve performance by retaining frequently used items, which is a significant enhancement over the current approach of clearing all items.

    8
    Best practice
    Ensure JSON keys are properly quoted in jq queries

    Ensure that the JSON keys used in the jq queries are properly quoted to avoid syntax
    errors or unexpected behavior, especially when keys might be interpreted as SQL
    identifiers or functions.

    test/distributed/cases/function/func_jq.sql [25]

    -select jq('{"foo": {bar: []} }', '.');
    +select jq('{"foo": {"bar": []} }', '.');
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Properly quoting JSON keys is a best practice that can prevent syntax errors and unexpected behavior, especially in complex queries.

    8
    Use consistent quotation marks in JSON keys and values

    Ensure consistent use of single quotes for JSON keys and values to prevent syntax errors
    and maintain the consistency of the SQL queries.

    test/distributed/cases/function/func_jq.result [2]

    -jq({"foo": 128}, .foo)
    +jq('{"foo": 128}', '.foo')
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: The suggestion improves consistency and prevents potential syntax errors, but it is not critical as the existing code is technically correct.

    7
    Maintainability
    Refactor repeated null handling and error checking code into a separate method

    Refactor the repeated code for handling null values and errors in the tryJqImpl function
    into a separate method to improve code maintainability and reduce duplication.

    pkg/sql/plan/function/func_builtin_jq.go [87-104]

    -if null1 || null2 {
    -    rs.AddNullRange(0, uint64(length))
    -} else {
    -    code, err := op.getJqCode(string(v2))
    -    if err == nil {
    -        err = op.jqImpl(v1, code)
    +func (op *opBuiltInJq) handleParams(v1, v2 string, null1, null2 bool, length int, isTry bool) error {
    +    if null1 || null2 {
    +        rs.AddNullRange(0, uint64(length))
    +        return nil
         }
    -    if err != nil {
    -        if isTry {
    -            rs.AddNullRange(0, uint64(length))
    -            return nil
    -        } else {
    -            return err
    -        }
    -    }
    -    rs.AppendBytes(op.enc.bytes(), false)
    -    op.enc.done()
    +    return op.processParams(v1, v2, length, isTry)
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: This refactoring improves code maintainability and reduces duplication, making the codebase easier to manage and understand.

    7
    Possible issue
    Replace jq with try_jq to handle potential JSON parsing errors more gracefully

    Consider handling potential exceptions or errors that might arise from the jq function
    when the JSON input or the query is malformed. Using try_jq in all cases where jq is used
    can prevent runtime errors and provide a safer way to handle JSON parsing and querying.

    test/distributed/cases/function/func_jq.sql [5-15]

    -select jq('{"foo": 128}', '.foo');
    -select jq('{"a": {"b": 42}}', '.a.b');
    -select jq('{"id": "sample", "10": {"b": 42}}', '{(.id): .["10"].b}');
    +select try_jq('{"foo": 128}', '.foo');
    +select try_jq('{"a": {"b": 42}}', '.a.b');
    +select try_jq('{"id": "sample", "10": {"b": 42}}', '{(.id): .["10"].b}');
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: This suggestion improves error handling by using try_jq instead of jq, which can prevent runtime errors when parsing JSON. However, it may not be necessary in all cases, so a blanket replacement might not be ideal.

    7
    Enhancement
    Implement structured logging for better error traceability in the jqImpl function

    Use a structured logging approach instead of relying on the standard error handling for
    better traceability and debugging of errors within the jqImpl function.

    pkg/sql/plan/function/func_builtin_jq.go [207-209]

     if err != nil {
    +    log.WithFields(log.Fields{
    +        "jsonStr": jsonStr,
    +        "error": err,
    +    }).Error("Failed to unmarshal JSON in jqImpl")
         return err
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Structured logging enhances debugging and traceability, but it is a minor enhancement compared to functional or performance improvements.

    6
    Validate JSON input to ensure it is well-formed and not empty

    Validate the JSON input in jq functions to ensure it is well-formed and not empty,
    especially when the function is expected to parse complex JSON structures or queries.

    test/distributed/cases/function/func_jq.sql [33]

    -select jq('', '.');
    +select jq('{"valid": "json"}', '.');
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Ensuring that JSON input is well-formed and not empty is a good practice, but the suggested replacement JSON may not be appropriate for all test cases.

    6

    @mergify mergify bot merged commit 850b992 into matrixorigin:main Jul 8, 2024
    17 of 18 checks passed
    @fengttt fengttt deleted the fengttt-jq branch July 12, 2024 15:58
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    None yet

    6 participants