Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: large snapshot diff recursion error #776

Merged
merged 12 commits into from
Jul 20, 2023

Conversation

iamogbz
Copy link
Collaborator

@iamogbz iamogbz commented Jul 20, 2023

Description

Avoids the recursion error limit in difflib.ndiff by truncating the checked lines to a subsection of the diff.

  • Documentation update and formatting
  • Poetry lock update via inv install

Related Issues

Checklist

  • This PR has sufficient documentation.
  • This PR has sufficient test coverage.
  • This PR title satisfies semantic convention.

Additional Comments

Went with the simple approach of just checking from the first mismatch and giving a large enough buffer to catch meaningful differences.

Can be validated by setting the DIFF_LINE_COUNT_LIMIT to a high value e.g. 100000 and running tests, maybe a follow up can make this configurable;

(syrupy-py3.11) ➜  syrupy git:(fix-large-snapshot-recursion-error) ✗ inv test -t test_diff_large
====================================== test session starts ======================================
platform darwin -- Python 3.11.4, pytest-7.4.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/Emmanuel/Sources/github/tophat/syrupy
configfile: pyproject.toml
plugins: syrupy-4.0.6, xdist-3.3.1, benchmark-4.0.0
collected 256 items / 254 deselected / 2 selected                                               

tests/syrupy/extensions/test_base.py FF                                                   [100%]

=========================================== FAILURES ============================================
____________________ TestSnapshotReporter.test_diff_large[SnapshotReporter] _____________________

self = <tests.syrupy.extensions.test_base.TestSnapshotReporter object at 0x1067e0c10>
Reporter = <class 'syrupy.extensions.base.SnapshotReporter'>
osenv = <function env_context at 0x105db7a60>

    def test_diff_large(self, Reporter, osenv):
        n_count = 3000
        obj_a = [str(x) + ("a" * 20) for x in range(n_count)]
        obj_b = [line_a + "b" for line_a in obj_a]
        str_a = "\n".join(obj_a)
        str_b = "\n".join(obj_b)
        with osenv(NO_COLOR="true"):
>           assert (
                len(list(Reporter().diff_lines(str_a, str_b)))
                <= DIFF_LINE_COUNT_LIMIT * 2
            )

tests/syrupy/extensions/test_base.py:44: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/syrupy/extensions/base.py:271: in diff_lines
    for line in self.__diff_lines(str(snapshot_data), str(serialized_data)):
src/syrupy/extensions/base.py:295: in __diff_lines
    for line in self.__diffed_lines(a, b):
src/syrupy/extensions/base.py:314: in __diffed_lines
    for line in qdiff(a.splitlines(keepends=True), b.splitlines(keepends=True)):
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:872: in compare
    yield from g
...
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:997: in _fancy_helper
    yield from g
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:985: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:997: in _fancy_helper
    yield from g
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:915: in _fancy_replace
    cruncher = SequenceMatcher(self.charjunk)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:182: in __init__
    self.set_seqs(a, b)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:194: in set_seqs
    self.set_seq2(b)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:248: in set_seq2
    self.__chain_b()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <difflib.SequenceMatcher object at 0x107a71350>

    def __chain_b(self):
        # Because isjunk is a user-defined (not C) function, and we test
        # for junk a LOT, it's important to minimize the number of calls.
        # Before the tricks described here, __chain_b was by far the most
        # time-consuming routine in the whole module!  If anyone sees
        # Jim Roskind, thank him again for profile.py -- I never would
        # have guessed that.
        # The first trick is to build b2j ignoring the possibility
        # of junk.  I.e., we don't call isjunk at all yet.  Throwing
        # out the junk later is much cheaper than building b2j "right"
        # from the start.
        b = self.b
        self.b2j = b2j = {}
    
        for i, elt in enumerate(b):
            indices = b2j.setdefault(elt, [])
            indices.append(i)
    
        # Purge junk elements
        self.bjunk = junk = set()
        isjunk = self.isjunk
        if isjunk:
>           for elt in b2j.keys():
E           RecursionError: maximum recursion depth exceeded while calling a Python object

../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:288: RecursionError
==================================== short test summary info ====================================
FAILED tests/syrupy/extensions/test_base.py::TestSnapshotReporter::test_diff_large[SnapshotReporter] - RecursionError: maximum recursion depth exceeded while calling a Python object
FAILED tests/syrupy/extensions/test_base.py::TestSnapshotReporter::test_diff_large[SnapshotReporterNoContext] - RecursionError: maximum recursion depth exceeded while calling a Python object
========================= 2 failed, 254 deselected in 136.72s (0:02:16) =========================

@iamogbz iamogbz requested a review from noahnu July 20, 2023 03:54
"python.linting.pylintEnabled": true,
"python.linting.pylintEnabled": false,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint not used in project

@iamogbz iamogbz marked this pull request as ready for review July 20, 2023 04:03
Comment on lines -116 to +117
snapshot_collection, key=lambda s: cls._snapshot_sort_key(s) # type: ignore # noqa: E501
snapshot_collection,
key=cls._snapshot_sort_key, # noqa: E501
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linter flagged as redundant

"field_7": {False, "no", 0, None}, # noqa: disable=B033
"field_7": {False, "no", None},
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was adding no value just confusion

@noahnu noahnu merged commit 24260b1 into main Jul 20, 2023
14 checks passed
@noahnu noahnu deleted the fix-large-snapshot-recursion-error branch July 20, 2023 11:58
tophat-opensource-bot pushed a commit that referenced this pull request Jul 20, 2023
## [4.0.7](v4.0.6...v4.0.7) (2023-07-20)

### Bug Fixes

* large snapshot diff recursion error ([#776](#776)) ([24260b1](24260b1))
@tophat-opensource-bot
Copy link
Collaborator

🎉 This PR is included in version 4.0.7 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RecursionError raised by _fancy_replace for a large snapshot
3 participants