Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix caching issues in JSON::Util::URI #515

Merged
merged 16 commits into from
Aug 19, 2024

Conversation

bolshakov
Copy link
Contributor

@bolshakov bolshakov commented Jul 5, 2024

This PR fixes the issue #514

The implementation of the whole JSON::Util::URI was not thread-safe. For instance, if you look at the .parse method.

def self.parse(uri)
  if uri.is_a?(Addressable::URI) 
    uri.dup
  else
    @parse_cache ||= {}
    parsed_uri = @parse_cache[uri]
    if parsed_uri
      parsed_uri.dup
    else
      @parse_cache[uri] = Addressable::URI.parse(uri)
    end
  end
rescue Addressable::URI::InvalidURIError => e
  raise JSON::Schema::UriError, e.message
end

In line #8, the URL is retrieved from the cache, and the caller receives a duplicate of the cached URL. However, in line #10, the caller does not receive a duplicate, and in a few instances, it changes the returned reference, effectively modifying it in the cache.

def self.normalize_ref(ref, base)
  ref_uri = parse(ref)
  base_uri = parse(base)

  ref_uri.defer_validation do
    if ref_uri.relative?
      ref_uri.merge!(base_uri)

      # Check for absolute path
      path, fragment = ref.to_s.split('#')
      if path.nil? || path == ''
        ref_uri.path = base_uri.path
      elsif path[0, 1] == '/'
        ref_uri.path = Pathname.new(path).cleanpath.to_s
      else
        ref_uri.join!(path)
      end

      ref_uri.fragment = fragment
    end

    ref_uri.fragment = '' if ref_uri.fragment.nil? || ref_uri.fragment.empty?
  end

  ref_uri
end

As you can see, if the referred URI is not in the cache, it is added to the cache and then modified inside the .normalize_ref method. Basically, we pass an object by reference between methods and modify it. This code is not thread-safe because it accesses and modifies a shared resource, @parse_cache, without any form of synchronization.

Other methods that call .parse method to resolve references also implement a similar non-synchronized caching approach. In a multi-threaded environment, this could lead to a race condition and potential incorrect resolution of references in JSON Schemas.

Initially, I wanted to add Mutex synchronization or use thread-safe data structures (such as Corcurrent::Map for cache), but it appeared that the code is deeply intertwined, with many methods from the module calling each other, making synchronization prone to deadlocks.

I have removed caching from this module entirely because it is not thread-safe and causes more problems than it solves. The improvements it offers are minimal since we don't cache expensive network or disk calls but only in-memory calculations.

I also fixed a couple of bugs in method implementations along the way:

I tried to keep, changes in the tests suite at minimum, but ended up changing this single test which behaviour was not consistent with other test examples..

Copy link

codecov bot commented Jul 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.03%. Comparing base (4aed27d) to head (df39511).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #515      +/-   ##
==========================================
- Coverage   90.08%   90.03%   -0.06%     
==========================================
  Files          76       76              
  Lines        1584     1575       -9     
==========================================
- Hits         1427     1418       -9     
  Misses        157      157              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bolshakov bolshakov changed the title Reproduce caching issue with the JSON::Util::URI.normalize_ref method [WIP] Reproduce caching issue with the JSON::Util::URI.normalize_ref method Jul 6, 2024
@bolshakov bolshakov marked this pull request as ready for review July 6, 2024 17:26
@bolshakov bolshakov changed the title [WIP] Reproduce caching issue with the JSON::Util::URI.normalize_ref method Refactor JSON::Util::URI Jul 6, 2024
@bolshakov bolshakov changed the title Refactor JSON::Util::URI Fix cashing issues in JSON::Util::URI Jul 6, 2024
@bolshakov bolshakov changed the title Fix cashing issues in JSON::Util::URI Fix caching issues in JSON::Util::URI Jul 6, 2024
@bolshakov
Copy link
Contributor Author

Hey, @bastelfreak 👋 The PR Is ready for review. I would appreciate your feedback

@bolshakov
Copy link
Contributor Author

@ekohl, @TheMeier, @evgeni folks, may you also help with the review?

@evgeni
Copy link
Member

evgeni commented Jul 9, 2024

I have no idea about this gem, sorry.

@bolshakov
Copy link
Contributor Author

My apologies. I pinged you because you reviewed one of the recent PRs #509

bastelfreak added a commit to bastelfreak/json-schema that referenced this pull request Jul 12, 2024
Both ruby version are EoL.

We've another breaking change that I need to release,
voxpupuli#515. 2.7 is EoL as well,
but still widely used in the Puppet Ecosystem, so I think switching to
2.7 for now is fine.
bastelfreak added a commit to bastelfreak/json-schema that referenced this pull request Jul 12, 2024
Both ruby version are EoL.

We've another breaking change that I need to release,
voxpupuli#515. 2.7 is EoL as well,
but still widely used in the Puppet Ecosystem, so I think switching to
2.7 for now is fine.
@bastelfreak
Copy link
Member

@bolshakov I think we're ready for a merge. Can you please rebase this and if the updated rubocop config is happy, I can merge it.

@bolshakov
Copy link
Contributor Author

@bastelfreak ready

@mjknight50
Copy link

Is there a timeline on when this will be merged?

@bastelfreak
Copy link
Member

Hi, I am sorry for the huge delay. Thanks for all the work and thanks for pinging me again!

I decided not to re-raise the `Addressable::URI::InvalidURIError`, which was originally re-raised
as `JSON::Schema::UriError`. This decision was made because the subsequent call
to `Addressable::URI.unescape(uri)` does not handle these errors, requiring the user to handle
both types of errors.

Given that any call to the Addressable gem could potentially lead to an error, and
the JSON Schema gem does not consistently wrap them as `JSON::Schema::UriError`,

I decided to simplify the code for now. Error handling can be added later if necessary.
@bastelfreak bastelfreak merged commit 96d8bd5 into voxpupuli:master Aug 19, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants