
Optimize fetching individual records on each cache type #622

Merged

Conversation

Th3-M4jor
Contributor

It was reported that, for bots in a large number of guilds, fetching a user from the cache can sometimes take half a second under the current :qlc-based method, as it performs a full table scan.

As a workaround, functions which only return a single record by its primary key are optimized to avoid :qlc.
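The idea can be sketched roughly as follows (an illustrative sketch, not the PR's actual code; it assumes the Mnesia table stores rows as `{table, guild_id, guild}` tuples): instead of building a QLC handle that scans the whole table, a single-record fetch reads the row directly by primary key.

    # Hypothetical sketch: fetch a single guild by primary key without QLC.
    # Assumes rows are stored as {table, guild_id, guild} tuples.
    def get(guild_id) do
      case :mnesia.dirty_read(:nostrum_guilds, guild_id) do
        [{_table, _id, guild}] -> {:ok, guild}
        [] -> {:error, :not_found}
      end
    end

`:mnesia.dirty_read/2` is a key lookup on the underlying storage, so it avoids the full-table traversal that the QLC path incurs.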

@Th3-M4jor Th3-M4jor marked this pull request as ready for review August 10, 2024 21:49
@Th3-M4jor Th3-M4jor force-pushed the optimize-basic-get-by-id-all-caches branch from db8474f to cf4adc4 Compare August 10, 2024 21:51
@jchristgit jchristgit self-assigned this Aug 11, 2024
Collaborator

@jchristgit jchristgit left a comment


Thanks for the PR, now we've gone full circle!

It's a bit sad, but inspecting the QLC output I can see where it's coming from: whilst QLC provides lookup functions for primary keys, it seems the optimizer doesn't work for us :-(

Can you remove the functions that are no longer called from the *_qlc.erl modules as well?

lib/nostrum/cache/guild_cache/mnesia.ex (outdated, resolved)
lib/nostrum/cache/user_cache.ex (outdated, resolved)
@Bentheburrito

Hello, I'm curious: couldn't you give QLC a more efficient match spec via the {select, MatchSpec} option to :mnesia.table/2?
(disclaimer: I'm not familiar with the codebase here, just learning mnesia/QLC and wondering why that option wouldn't work if y'all already evaluated it)

@jchristgit
Collaborator

Well, what confuses me about this is that I checked the queries in question
locally and QLC was already optimizing them to do exactly that.

@Th3-M4jor do you maybe have the :qlc.info output of the queries from the user
in question? Maybe they are doing something that causes the queries to be
unoptimized? Which cache backend is in use?

@Th3-M4jor
Contributor Author

I had the user execute

    handle = :nostrum_guild_cache_qlc.get(<put a real guild id here>, Nostrum.Cache.GuildCache.Mnesia)

    :io.format(~c"~s~n", [:qlc.info(handle)])

And this was the result:

qlc:q([ 
       Guild ||
           {GuildId, Guild} <-
               mnesia:table(nostrum_guilds,
                            [{n_objects, 100},
                             {lock, read} |
                             {traverse,
                              {select,
                               [{{'_', '$1', '$2'},
                                 [],
                                 [{{'$1', '$2'}}]}]}}]),
           GuildId =:= RequestedGuildId
      ])

@Th3-M4jor
Contributor Author

Th3-M4jor commented Aug 15, 2024

Okay this is even more confusing.

I checked against the ETS guild cache implementation and got this:

    ets:match_spec_run(ets:lookup(nostrum_guilds, 377642134112567296),
                       ets:match_spec_compile([{{'$1', '$2'}, [], ['$2']}]))

That's entirely as expected, plenty fast.

But when using Mnesia it does:

qlc:q([ 
       Guild ||
           {GuildId, Guild} <-
               mnesia:table(nostrum_guilds,
                            [{n_objects, 100},
                             {lock, read} |
                             {traverse,
                              {select,
                               [{{'_', '$1', '$2'},
                                 [],
                                 [{{'$1', '$2'}}]}]}}]),
           GuildId =:= RequestedGuildId
      ])

@jchristgit
Collaborator

jchristgit commented Aug 15, 2024 via email

@atlas-oc

@jchristgit Hi! User in question here.

The bug is relatively simple to work around (just use Mnesia directly instead of the GuildCache module), but I can see it being a pretty big "gotcha" that's hard to debug and only shows up in production. Either way, this definitely appears to be unintended behavior and can cause simple lookups to take upwards of 3x as long as always running Api.get_guild!; thus (in my opinion) the Mnesia cache adapters are unusable in their current state without either documentation or a patch.

jchristgit added a commit that referenced this pull request Aug 15, 2024
QLC queries via the mnesia-based caches that used the `{traverse,
{select, MatchSpec}}` option in any shape or form would cause the QLC
query to be executed in two parts: the `mnesia:table` call running the
entire table through the selected match specification, and then a plain
Erlang list comprehension filtering the results of that. Per the
discussion in #622, this is how such a query looks under `qlc:info/1`:

    qlc:q([
           Guild ||
               {GuildId, Guild} <-
                   mnesia:table(nostrum_guilds,
                                [{n_objects, 100},
                                 {lock, read} |
                                 {traverse,
                                  {select,
                                   [{{'_', '$1', '$2'},
                                     [],
                                     [{{'$1', '$2'}}]}]}}]),
               GuildId =:= RequestedGuildId
          ])

The issue is that neither QLC nor mnesia can cleanly optimize this:
mnesia does not know about the condition specified in QLC, and QLC's
optimization to use a `lookup_fun` is knocked out by the fact that it
can't reach into the traverse call to detect the reordered columns. It
might be possible to implement this in QLC itself, given it is smart
enough to figure out when to use the lookup function based on the
indices, together with some cooperation from mnesia itself. As it
stands, this behaviour leads to unacceptable performance.

This commit introduces an optimization that allows guild, member and
presence cache implementations to export a `query_handle/1` function
that accepts a match specification guard, that is, the "middle part" of
a match specification. The guard determines which rows shall be
filtered. Do note, however, that this is still unable to perform a
complete optimization of lookups of single records - it will still
traverse the table, but in the native ETS code.
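The `query_handle/1` shape described in the commit message could look roughly like this (an illustrative sketch using ETS directly; the actual code in #625 may differ, and the table name and row shape are assumptions): the caller's guard is spliced into the match specification so filtering runs inside ETS rather than in a plain Erlang comprehension.

    # Illustrative sketch: build a QLC table handle whose filtering runs
    # inside ETS via the match specification instead of in plain Erlang.
    # `guards` is the "middle part" of a match spec over {id, guild} rows,
    # e.g. [{:"=:=", :"$1", guild_id}] to select a single guild.
    def query_handle(guards) do
      ms = [{{:"$1", :"$2"}, guards, [{{:"$1", :"$2"}}]}]
      :ets.table(:nostrum_guilds, traverse: {:select, ms})
    end

As the commit message notes, this still traverses the table for a single-record lookup, but the traversal happens in native ETS code rather than in an Erlang list comprehension.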
@jchristgit
Collaborator

@atlas-oc thanks for chiming in. Yup, I agree, this is absolutely not intended behaviour.

I've dug pretty deep into the issue and opened #625, which hits an optimization point somewhere in between "loads the entire list into memory" and "O(1) query". More specifically, it runs the match specification in ETS directly. More information in that PR.

That being said, while the PR would help with this case, single-instance lookups would still be slow. So all in all I believe we should do both: I will review this PR tomorrow and merge it, then update and merge the other PR, and then on the weekend I can hopefully come around to documenting it & cutting a release with both.

@atlas-oc
Copy link

I admittedly have no familiarity with QLC, but is there any particular reason why we're using it as an abstraction here instead of expanding the "base" Cache modules to delegate more functionality to each implementation, such as fold and get? I feel letting operations be entirely implementation-dependent (so that Mnesia could, for instance, directly execute an O(1) lookup) would allow for more flexible custom cache implementations, especially if they don't natively speak QLC like :ets/:dets/Mnesia.
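The suggestion above could be expressed as behaviour callbacks (a hypothetical sketch; module and callback names here are illustrative and not nostrum's actual cache behaviours):

    # Hypothetical sketch: let each backend own its lookup strategy, so a
    # Mnesia or ETS implementation can do an O(1) primary-key read while a
    # backend without QLC support can still satisfy the contract.
    defmodule MyApp.Cache do
      @callback get(id :: term()) :: {:ok, struct()} | {:error, :not_found}
      @callback fold(acc, (struct(), acc -> acc)) :: acc when acc: var
    end

With such a contract, the shared cache module delegates to the configured implementation rather than routing every read through a common QLC layer.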

Collaborator

@jchristgit jchristgit left a comment


Thanks!

@jchristgit jchristgit merged commit 034f68c into Kraigie:master Aug 16, 2024
10 checks passed

@jchristgit
Collaborator

jchristgit commented Aug 17, 2024 via email
