Priority Manager: Scores each source based on target and popular entities. Seemingly irrelevant sources are discarded. #40

Merged: 37 commits, May 9, 2023

Conversation

UP2040499
Owner

Added unit test for the priority manager, target info scorer. Fixed file_handler.get_keywords_from_target_info, which did not remove irrelevant words from target entities. This fix will be superseded by the changes in c087bb8.

Took 1 hour 39 minutes

Further attempts to optimise source aggregator.
Added new functions to entity processor for popular info finder.
Created a unit test for the popular info finder.

Took 14 hours 16 minutes
@UP2040499 linked an issue on Apr 28, 2023 that may be closed by this pull request
@UP2040499
Owner Author

Pylint results

************* Module auto_osint_v.priority_manager
auto_osint_v/priority_manager.py:42:19: W3101: Missing timeout argument for method 'requests.get' can cause your program to hang indefinitely (missing-timeout)
auto_osint_v/priority_manager.py:80:12: W0612: Unused variable 'i' (unused-variable)
************* Module auto_osint_v.sentiment_analyser
auto_osint_v/sentiment_analyser.py:19:0: C0301: Line too long (102/100) (line-too-long)
************* Module auto_osint_v.source_aggregator
auto_osint_v/source_aggregator.py:96:14: E1101: Instance of 'Resource' has no 'cse' member (no-member)
auto_osint_v/source_aggregator.py:325:4: W0105: String statement has no effect (pointless-string-statement)
************* Module auto_osint_v.specific_entity_processor
auto_osint_v/specific_entity_processor.py:15:0: R0903: Too few public methods (1/2) (too-few-public-methods)
************* Module unit_tests.test_priority_manager
unit_tests/test_priority_manager.py:10:0: C0115: Missing class docstring (missing-class-docstring)
unit_tests/test_priority_manager.py:11:4: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.67/10
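The two priority_manager warnings above (W3101 and W0612) have fairly standard fixes. A minimal sketch, assuming a hypothetical helper `fetch_with_retries` that is not part of this PR: it always passes `timeout=` to the HTTP call and uses `_` for a loop counter that is never read. The `http_get` parameter is also an assumption, added only so the sketch can be exercised without network access; by default it falls back to `requests.get`.

```python
def fetch_with_retries(url, attempts=3, timeout=10, http_get=None):
    """Fetch a page, always with an explicit timeout (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if http_get is None:
        # Third-party dependency, imported lazily so the sketch loads without it.
        import requests
        http_get = requests.get

    last_error = None
    # `_` signals that the loop counter is intentionally unused (avoids W0612).
    for _ in range(attempts):
        try:
            # An explicit timeout stops the call hanging forever (avoids W3101).
            return http_get(url, timeout=timeout).text
        except OSError as exc:  # TimeoutError and connection errors are OSErrors
            last_error = exc
    raise last_error
```

The timeout value of 10 seconds is arbitrary; a slow source may need more, and a timed-out source could simply be skipped rather than retried.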

@UP2040499 changed the title from "Added priority manager, successfully scores based on target info" to "Priority Manager: Scores each source based on target and popular entities. Seemingly irrelevant sources are discarded." on May 2, 2023
@UP2040499
Owner Author

UP2040499 commented May 3, 2023

In its current state, the manager takes ~15 minutes to assign scores based on the appearances of popular or target entities.

  • For 176 sources, it takes around 5 minutes to count target entity appearances and assign a score to each source.
  • This typically reduces the source count to around 80.
  • Finding popular entities can take around 10 minutes.

This has to be improved somehow.
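For readers following along, the scoring step described above can be sketched roughly as below. This is not the code from priority_manager.py: the names `score_source` and `prioritise`, the `min_score` threshold and the whole-word matching rule are all assumptions made for illustration.

```python
import re


def score_source(text, entities):
    """Count case-insensitive, whole-word appearances of each entity in text."""
    score = 0
    for entity in entities:
        # re.escape keeps punctuation in entity names from acting as regex syntax.
        pattern = r"\b" + re.escape(entity) + r"\b"
        score += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return score


def prioritise(sources, target_entities, popular_entities=(), min_score=1):
    """Score each source and discard those scoring below min_score.

    `sources` maps a URL to its extracted text (an assumed data shape).
    Returns (url, score) pairs, highest score first.
    """
    scored = []
    for url, text in sources.items():
        score = score_source(text, target_entities) + score_source(text, popular_entities)
        if score >= min_score:
            scored.append((url, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Each entity triggers its own pass over the text here, so the cost grows with sources × entities. Compiling a single alternation pattern per entity list would be one thing to try when chasing the runtime above.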

@UP2040499
Owner Author

The popular information finder can still get stuck on large bodies of text.

Further attempts to optimise using multiprocessing and map.

Took 2 hours 27 minutes
Having problems with getting it running within the PriorityManager.

Took 5 hours 29 minutes
Changed from imap to imap_unordered, which lets tqdm report progress more accurately as tasks complete.

Took 5 hours 31 minutes
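Why the switch helps progress reporting: `imap` yields results in submission order, so one slow early task holds back every later result, while `imap_unordered` yields each result as soon as it finishes. A minimal sketch of the pattern, assuming a stand-in worker `count_tokens` rather than the real entity processing, and a plain `on_progress` callback where the PR uses tqdm. The `use_threads` flag is also an assumption, there to allow a quick run with the thread-backed pool.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def count_tokens(text):
    """Stand-in for the real per-source work (assumption)."""
    return len(text.split())


def _indexed_count(item):
    # Carry the index with the result so the original order can be restored.
    index, text = item
    return index, count_tokens(text)


def count_all(texts, workers=2, use_threads=False, on_progress=None):
    """Process texts in parallel, reporting progress as each task completes."""
    pool_factory = ThreadPool if use_threads else Pool
    results = [None] * len(texts)
    with pool_factory(workers) as pool:
        completed = pool.imap_unordered(_indexed_count, enumerate(texts))
        for done, (index, count) in enumerate(completed, start=1):
            results[index] = count  # results arrive in completion order
            if on_progress is not None:
                on_progress(done, len(texts))
    return results
```

With the process-backed pool, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Wrapping `completed` in `tqdm(..., total=len(texts))` would give the progress bar mentioned in the commit message.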
This has had the side effect of parsing Imgur sitemaps. Imgur and Flickr are showing as relevant sources, but their content is almost entirely irrelevant.

Took 47 minutes
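One possible way to keep such hosts out of the source list is a small domain blocklist applied before scoring. This is only a sketch and not something this PR implements: `BLOCKED_DOMAINS`, `is_blocked` and `drop_blocked` are hypothetical names, and the two domains are taken from the commit message above.

```python
from urllib.parse import urlparse

# Assumed blocklist: image hosts whose pages rarely carry relevant text.
BLOCKED_DOMAINS = frozenset({"imgur.com", "flickr.com"})


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


def drop_blocked(urls, blocked=BLOCKED_DOMAINS):
    """Keep only URLs whose host is not on the blocklist, preserving order."""
    return [url for url in urls if not is_blocked(url, blocked)]
```

Matching on the host suffix with a leading dot catches subdomains such as `i.imgur.com` without also discarding unrelated hosts that merely end in the same letters.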
This is so the tool can be run on Google Colab rather than locally. This could potentially increase performance (better hardware) and makes the tool easier to use.

Took 2 minutes
@UP2040499 UP2040499 merged commit 15e043b into main May 9, 2023
1 check passed
Successfully merging this pull request may close these issues.

Priority Manager