Priority Manager: Scores each source based on target and popular entities. Seemingly irrelevant sources are discarded. #40
Merged
Further attempts to optimise source aggregator. Added new functions to entity processor for popular info finder. Created a unit test for the popular info finder. Took 14 hours 16 minutes
Added unit tests for the priority manager and target info scorer. Fixed file_handler.get_keywords_from_target_info, which did not remove irrelevant words from target entities. This fix will be superseded by the changes in c087bb8. Took 1 hour 39 minutes
Closed
Pylint results
Took 4 minutes
Took 9 minutes
Took 2 minutes
Took 51 minutes
Took 3 minutes
Popular info finder
In its current state, the manager takes ~15 minutes to assign scores based on the appearances of popular or target entities.
This has to be improved.
The popular information finder can still get stuck on large bodies of text.
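To make the scoring idea concrete, here is a minimal sketch of scoring sources by how often target and popular entities appear, then discarding sources that score too low. It is not the PriorityManager code from this PR: the function names, the weights, the `min_score` threshold and the single-word entity matching are all assumptions made for illustration.

```python
import re
from collections import Counter


def score_source(text, target_entities, popular_entities,
                 target_weight=2.0, popular_weight=1.0):
    """Score one source by entity appearances (hypothetical weights).

    Assumes every entity is a single lowercase-comparable word.
    """
    words = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    target_hits = sum(words[entity.lower()] for entity in target_entities)
    popular_hits = sum(words[entity.lower()] for entity in popular_entities)
    return target_weight * target_hits + popular_weight * popular_hits


def keep_relevant(sources, target_entities, popular_entities, min_score=1.0):
    """Return (name, score) pairs at or above min_score, highest first."""
    scored = [
        (name, score_source(text, target_entities, popular_entities))
        for name, text in sources.items()
    ]
    relevant = [pair for pair in scored if pair[1] >= min_score]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sources = {"a": "Alice and Bob", "b": "nothing here"}
    print(keep_relevant(sources, ["alice"], ["bob"]))
```

Counting tokens once per source with `Counter` keeps the per-entity lookup cheap, which matters when the same entity lists are checked against many large texts.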
Further attempts to optimise using multiprocessing and map. Took 2 hours 27 minutes
Took 14 minutes
Having problems with getting it running within the PriorityManager. Took 5 hours 29 minutes
Changed from imap to imap_unordered, so the tqdm progress bar advances as each task completes instead of waiting on results in submission order. Took 5 hours 31 minutes
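The imap_unordered change can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `count_words` is a hypothetical stand-in for the real per-source work, and the fallback `tqdm` shim exists only so the sketch runs where tqdm is not installed.

```python
from multiprocessing import Pool

try:
    from tqdm import tqdm  # third-party progress bar mentioned in the PR
except ImportError:
    def tqdm(iterable, total=None):
        """Fallback shim (assumption): no progress bar, just pass through."""
        return iterable


def count_words(item):
    """Hypothetical stand-in for the per-source scoring work."""
    index, text = item
    return index, len(text.split())


def process_all(texts, processes=2):
    """Run count_words over texts in parallel, restoring input order."""
    results = [None] * len(texts)
    with Pool(processes=processes) as pool:
        # imap_unordered yields each result as soon as its task finishes,
        # so the progress bar ticks once per completed task.
        completed = pool.imap_unordered(count_words, enumerate(texts))
        for index, value in tqdm(completed, total=len(texts)):
            results[index] = value
    return results


if __name__ == "__main__":
    print(process_all(["one two", "three", "four five six"]))
```

Because results arrive in completion order, each task carries its input index so the caller can put the values back in their original positions.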
Took 22 minutes
This has had the side effect of parsing imgur sitemaps. Imgur and flickr are showing as relevant sources but the content is almost entirely irrelevant. Took 47 minutes
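One possible mitigation, which this PR does not claim to implement, is to exclude known image hosts by domain before scoring. The sketch below is an assumption: `BLOCKED_DOMAINS` and `is_blocked` are hypothetical names, and the list contains only the two hosts mentioned above.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: only the hosts named in the commit message.
BLOCKED_DOMAINS = {"imgur.com", "flickr.com"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in blocked)


if __name__ == "__main__":
    print(is_blocked("https://i.imgur.com/sitemap.xml"))
    print(is_blocked("https://example.com/imgur.com"))
```

Matching on the parsed hostname rather than on the raw URL string avoids rejecting unrelated pages that merely mention a blocked domain in their path.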
This is so the tool can be run on Google Colab rather than locally. It could potentially improve performance (better hardware) and makes the tool easier to use. Took 2 minutes
Took 10 minutes
Took 13 minutes
Took 6 minutes
Took 9 minutes
Took 9 minutes
Took 2 minutes
Took 2 minutes
Took 4 minutes
Took 10 minutes
Took 11 minutes
Took 33 minutes
Took 1 hour 45 minutes