Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster filtering with fragmented annotations #526

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

wjt
Copy link
Contributor

@wjt wjt commented May 2, 2015

Let me know if you'd prefer monolithic pull requests! The test data I'm looking at for this PR is here – you'll need to open it from tools/serve. The key commit message here:

If there are many spans of text for an annotation (many table cells,
say), previously the filter would be applied once for each span, and
matching annotations would appear n times in filter.annotations. In and
of itself this is relatively harmless, but filterHighlights then maps
back from annotations to highlights, and:

    for (var i = 0, len = filtered.length; i < len; i++) {
         highlights = highlights.not(filtered[i]._local.highlights);
    }

is ruinous. This change dramatically improves filtering performance on
my pet test case of big, overlapping highlights on a 5000×3 table.

I have mixed feelings about using a new identifier in _local for this,
but we don't have any guarantee that the id provided by the server is
unique, or even present.

I also have mixed feelings about setting the id in highlighter.js and
using it in filter.js, but the same coupling exists for
.data('annotation') already...

@wjt
Copy link
Contributor Author

wjt commented May 2, 2015

A nicer approach would be for Filter to track the set of annotations directly, rather than reconstructing it from the highlights. This would have other benefits as welll – for example, it wouldn't have to refilter all annotations just because a single annotation was modified – but it would be a more invasive change.

@tilgovi
Copy link
Member

tilgovi commented May 3, 2015

By the way, I think my latest commit on master will eliminate this suprious travis failures on pull requests. See what happens if you rebase.

wjt added 3 commits May 3, 2015 10:21
Before, it depended on pretty specific knowledge of how updateFilter is
implemented, which seems quite questionable!
If there are many spans of text for an annotation (many table cells,
say), previously the filter would be applied once for each span, and
matching annotations would appear n times in filter.annotations. In and
of itself this is relatively harmless, but filterHighlights then maps
back from annotations to highlights, and:

    for (var i = 0, len = filtered.length; i < len; i++) {
         highlights = highlights.not(filtered[i]._local.highlights);
    }

is ruinous. This change dramatically improves filtering performance on
my pet test case of big, overlapping highlights on a 5000×3 table.

I have mixed feelings about using a new identifier in _local for this,
but we don't have any guarantee that the id provided by the server is
unique, or even present.

I also have mixed feelings about setting the id in highlighter.js and
using it in filter.js, but the same coupling exists for
.data('annotation') already...
@wjt wjt force-pushed the faster-filtering-with-fragmented-annotations branch from 49f6dec to b097905 Compare May 3, 2015 09:22
@wjt
Copy link
Contributor Author

wjt commented May 3, 2015

Nope, the rebase didn't seem to help.

@tilgovi
Copy link
Member

tilgovi commented Sep 25, 2015

It would be nice to do this without coupling the highlighter and the filter with a private property. Any other ideas?

@tilgovi tilgovi modified the milestone: v2.0 Apr 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants