Cache the results of match, validate, and compare #91

Open
wants to merge 1 commit into
base: master

@nevsan nevsan commented Feb 3, 2020

I've found that calling semantic_version.match(...) quickly becomes the performance bottleneck in tight loops, forcing me to wrap it in another function that I can put behind an lru_cache. I propose working this into the package permanently.

@rbarrois
Owner

rbarrois commented Feb 4, 2020

Thanks for the feedback! Do you have some performance measurement or a typical test case?
Ideally, I'd rather fix the performance issue at its core ;)

@nevsan
Author

nevsan commented Feb 11, 2020

Running 10k matches (same string) on the current version takes about 0.759s. The top contributors are listed below (re.match is one of the biggest and probably not easy to work around). With the version proposed here, those same 10k matches run in about 0.002s. I don't think the existing code can be made roughly 380x faster, so I believe caching at the API boundary is a good "free" way to gain a significant performance improvement.

         820004 function calls (800004 primitive calls) in 0.759 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50000    0.078    0.000    0.245    0.000 base.py:84(__init__)
    40000    0.077    0.000    0.077    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
    10000    0.072    0.000    0.293    0.000 base.py:1056(parse_block)
    40000    0.061    0.000    0.097    0.000 base.py:363(_validate_kwargs)
    10000    0.033    0.000    0.033    0.000 {built-in method _warnings.warn}
    20000    0.031    0.000    0.182    0.000 base.py:929(match)
    10000    0.031    0.000    0.069    0.000 base.py:291(parse)
   100000    0.030    0.000    0.030    0.000 base.py:121(_coerce)
    20000    0.027    0.000    0.106    0.000 base.py:175(truncate)
    20000    0.023    0.000    0.023    0.000 base.py:918(__init__)
    10000    0.022    0.000    0.751    0.000 base.py:570(match)
    10000    0.021    0.000    0.425    0.000 base.py:1182(__init__)
    10000    0.020    0.000    0.348    0.000 base.py:1028(parse)
    10000    0.018    0.000    0.050    0.000 base.py:770(__init__)
    40000    0.017    0.000    0.017    0.000 base.py:410(precedence_key)
    30000    0.015    0.000    0.197    0.000 base.py:775(<genexpr>)
    80000    0.015    0.000    0.015    0.000 base.py:351(_validate_identifiers)
40000/20000    0.013    0.000    0.023    0.000 {built-in method builtins.hash}
    10000    0.013    0.000    0.367    0.000 base.py:615(__init__)
    10000    0.013    0.000    0.050    0.000 base.py:127(next_major)
    10000    0.012    0.000    0.024    0.000 base.py:472(__ge__)
    10000    0.011    0.000    0.067    0.000 base.py:835(__and__)
    10000    0.010    0.000    0.021    0.000 base.py:457(__lt__)
    40000    0.010    0.000    0.010    0.000 {built-in method builtins.isinstance}
    20000    0.010    0.000    0.013    0.000 base.py:405(__hash__)
    10000    0.010    0.000    0.216    0.000 base.py:774(match)
    10000    0.009    0.000    0.206    0.000 {built-in method builtins.all}
    20000    0.009    0.000    0.031    0.000 base.py:978(__hash__)
    30000    0.008    0.000    0.008    0.000 base.py:10(_has_leading_zero)
    20000    0.007    0.000    0.007    0.000 {method 'groups' of '_sre.SRE_Match' objects}
        1    0.007    0.007    0.759    0.759 <ipython-input-2-f9cce3d7d5c5>:1(match_test)
    10000    0.006    0.000    0.222    0.000 base.py:636(match)
    10000    0.005    0.000    0.354    0.000 base.py:1012(_parse_to_clause)
    10000    0.005    0.000    0.005    0.000 {method 'split' of 'str' objects}
    10000    0.003    0.000    0.003    0.000 {method 'get' of 'dict' objects}
    10000    0.002    0.000    0.002    0.000 {built-in method builtins.len}
    10000    0.002    0.000    0.002    0.000 {method 'join' of 'str' objects}
    10000    0.002    0.000    0.002    0.000 base.py:886(__and__)
        1    0.000    0.000    0.759    0.759 {built-in method builtins.exec}
        1    0.000    0.000    0.759    0.759 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
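
A measurement like the one above could be reproduced along these lines. This is only a sketch: the spec and version strings are made up (the PR does not say which string was used), and `fake_match` is a hypothetical placeholder passed in where `semantic_version.match` would go, so the snippet runs on the standard library alone.

```python
import cProfile
import io
import pstats


def match_test(match, n=10000):
    """Call `match` n times with the same arguments, as in a tight loop."""
    for _ in range(n):
        match(">=1.0.0", "1.2.3")  # made-up inputs, not the ones used in the PR


def profile(match, n=10000):
    """Profile match_test and return a report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    match_test(match, n)
    profiler.disable()
    out = io.StringIO()
    # "tottime" is the sort key that pstats labels "internal time".
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(10)
    return out.getvalue()


def fake_match(spec, version):
    """Hypothetical placeholder; pass semantic_version.match here instead."""
    return True


report = profile(fake_match)
print(report)
```

Running `profile` once with the plain function and once with an `lru_cache`-wrapped one should show the difference in total time directly.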

@nevsan
Author

nevsan commented May 20, 2020

Ran into another case where semantic_version.match was taking 73% of the time in an inner-loop function. Do we want to revisit this?
