Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

neo4j.core.bolt.request activesupport instrumentation is incorrect #1647

Open
yourpalal opened this issue Mar 10, 2021 · 2 comments
Open

neo4j.core.bolt.request activesupport instrumentation is incorrect #1647

yourpalal opened this issue Mar 10, 2021 · 2 comments

Comments

@yourpalal
Copy link

The neo4j.core.bolt.request event duration seems to generally be very short, often 0ms. This is even the case when running queries that take multiple seconds to run.

Unfortunately, this makes it very hard to get an idea of how neo4j queries are performing, what queries are slow, etc.

My guess at what's wrong here is that it's measuring the time to send the query and get some response, but not to get the full response data. I could be wrong though. I tried to dig into this but I'm not familiar at all with neo4j-ruby-driver and seabolt, so I wasn't able to make much progress.

I was however able to confirm:

example

Below is the timing information for a query being run in the neo4j web interface. This query is shown in the log as taking 0ms. In both cases, the neo4j server is being run on my machine. The query itself is limited to 500 returned entities, but does basically a full scan of my neo4j db.

image

Runtime information:

versions:

    activegraph (10.1.0)
    neo4j-ruby-driver (1.7.4)
    activesupport (6.1.3)

    ruby 3.0.0 (mri)

neo4j: 4.0.0

@yourpalal
Copy link
Author

@klobuczek do you have any ideas on where to start looking in order to fix this? I have a feeling that improving the instrumentation will probably involve getting a better idea of how the driver works?

If you can provide some pointers on where to look I will put together a PR.

@yourpalal
Copy link
Author

I noticed that the run handlers in the neo4j-ruby-driver classes have access to a summary object with nice run-time info that appears to come straight from the driver, or maybe from the server.

The run handlers also have a lifecycle that allows for recognizing when a query has actually finished pulling all the related data.

Using this, I did a little monkey patching in a rails initializer to hack together a proof-of-concept!

class Neo4j::Driver::Internal::Handlers::TransactionPullAllResponseHandler
  def after_success(_)
    # instrument new event - finished queries
    ActiveSupport::Notifications.instrument('neo4j.core.bolt.result', summary)
  end
end

module ActiveGraph
  module Base
    def self.subscribe_to_request(&logger_proc)
      super(&logger_proc)

      ActiveSupport::Notifications.subscribe('neo4j.core.bolt.result') do |_, start, finish, _id, summary|
        query = summary.statement.text
        available = summary.result_available_after
        consumed = summary.result_consumed_after

        yield "#{ANSI::RED}#{query}#{ANSI::CLEAR} #{ANSI::BLUE}BOLT:#{ANSI::CLEAR} #{ANSI::YELLOW}#{available}ms + #{consumed}ms#{ANSI::CLEAR}"
      end
    end
  end
end

Obviously this isn't perfect, but I think it shows that there's good information to be had that can be used for instrumentation!

This also confirmed that the current instrumentation numbers don't tell the whole story. In any substantial query, the durations logged with my new instrumentation are longer than those given by the standard neo4j.core.bolt.request instrumentation.

@klobuczek if you can provide a pointer on what the ideal form of this would be I can clean it up and turn it into a nice PR.

  • presumably all instrumentation should be in the activegraph gem. Would monkey-patching TransactionPullAllResponseHandler (or another handler) be acceptable in that gem?
  • the order of the neo4j.core.bolt.query and neo4j.core.bolt.result (my addition) instrumentations is no longer be guaranteed. Would it be okay to merge them into a single instrumentation that could thereby ensure the query text + times are correctly associated in the log file?
  • do you foresee any problems with this approach? One issue I can see is that the time it takes to pull all results is dependent on the time the user application takes to iterate through each on the query. That said, I think this is still very helpful information. Quite often, the results are pulled into an array, which should not add much overhead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant