Long Scan Times from Additional HTTP Requests #538

ja53n · 2024-05-27T13:36:31Z

Describe the bug
I noticed that TumblThree app scan times are much higher than expected for blogs with duplicates and decided to look into this.

The TumblThree app seems to send an HTTP request to ".media.tumblr.com/" for each duplicate found, creating a large number of additional HTTP requests. The initial JSON response ("/api/read/json?debug=1&num=...") seems to contain a unique file reference ID that could be pulled from "regular-body", which would greatly reduce the number of requests needed to complete the scan and reduce the server load. You can replicate this by enabling "force rescan" and using any HTTP logger of your choice. This issue impacts rescans, reblogs, duplicates, etc., and I think a fix would be useful for a lot of users. Sadly, I don't have the coding background to fix this myself, which is why I am raising this issue.

To Reproduce
Steps to reproduce the behavior:

1. Set up HTTP monitoring or a debug trace for TumblThree.
2. Start TumblThree with the deduplication setting enabled and rescan an existing site that was already processed.
3. See the additional ".media.tumblr.com/" requests for files already in the index cache.

Expected behavior
Fast scan times that need only the JSON response when the content consists of duplicates.
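A minimal sketch of the expected behavior, not TumblThree's actual code: it pulls the media URLs out of a "regular-body" HTML fragment and skips any whose file name is already known, so no request is made for those. The function names, the example URLs, and the use of the file name as the unique reference are all assumptions made for illustration.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches URLs hosted on any "*.media.tumblr.com" subdomain inside an HTML fragment.
MEDIA_URL = re.compile(r"""https?://[^\s"'<>]*\.media\.tumblr\.com/[^\s"'<>]+""")


def media_urls(regular_body: str) -> list:
    """Return every media URL found in the post's "regular-body" HTML."""
    return MEDIA_URL.findall(regular_body)


def file_key(url: str) -> str:
    """Hypothetical duplicate key: the last path segment (the file name)."""
    return posixpath.basename(urlparse(url).path)


def split_known(regular_body: str, known_files: set) -> tuple:
    """Split media URLs into (to_fetch, skipped) without any network access."""
    to_fetch, skipped = [], []
    for url in media_urls(regular_body):
        if file_key(url) in known_files:
            skipped.append(url)   # already in the index: no HTTP request needed
        else:
            to_fetch.append(url)  # unknown file: still has to be downloaded
    return to_fetch, skipped


# Made-up example data, only to show the call shape.
body = (
    '<p><img src="https://64.media.tumblr.com/abc123/s1280x1920/tumblr_one.jpg"/>'
    '<img src="https://64.media.tumblr.com/def456/s1280x1920/tumblr_two.png"/></p>'
)
to_fetch, skipped = split_known(body, known_files={"tumblr_one.jpg"})
```

With this data, only the second image would be requested; the first is recognized from the JSON response alone.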

Desktop (please complete the following information):

TumblThree version: v2.13
OS: Windows 10 Home
Browser: Chrome, version 125

thomas694 · 2024-05-31T01:01:55Z

Are you downloading normal or 'hidden' blogs? What are your settings? Any other relevant information?

thomas694 · 2024-07-01T21:34:19Z

Well, the missing information was that the already downloaded files were downloaded for another blog and not for the scanned one.
And the affected posts are those with embedded images, so the JSON structure isn't that helpful.

We'll change it to check not only the current blog but also all other blogs for duplicates in this case.

- For embedded photos it only checked the current index file for duplicates.
- Now all index files and archives are checked, if enabled.
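The change can be pictured roughly as follows. This is a sketch under stated assumptions, not the code from the commit: `BlogIndex`, `find_duplicate`, and the `check_all_blogs` flag are hypothetical names standing in for the index files and the "if enabled" setting.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BlogIndex:
    """Hypothetical stand-in for one blog's index file of downloaded files."""
    name: str
    files: set = field(default_factory=set)


def find_duplicate(file_name: str, current: BlogIndex, all_indexes: list,
                   check_all_blogs: bool) -> Optional[str]:
    """Return the name of the blog that already holds file_name, or None."""
    # Old behavior: only the current blog's index is consulted.
    candidates = [current]
    if check_all_blogs:
        # New behavior: every other index is consulted as well.
        candidates += [idx for idx in all_indexes if idx is not current]
    for index in candidates:
        if file_name in index.files:
            return index.name
    return None


# Made-up data: the file was downloaded for another blog, as in this issue.
scanned = BlogIndex("scanned-blog")
other = BlogIndex("other-blog", {"tumblr_one.jpg"})
indexes = [scanned, other]

before = find_duplicate("tumblr_one.jpg", scanned, indexes, check_all_blogs=False)
after = find_duplicate("tumblr_one.jpg", scanned, indexes, check_all_blogs=True)
```

Here `before` is `None`, so the file is not recognized and gets requested again, while `after` is `"other-blog"`, so it can be skipped.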

thomas694 added a commit that referenced this issue Jul 3, 2024

Fix issue #538 Long scan times from additional HTTP requests

265cd04

- For embedded photos it only checked the current index file for duplicates.
- Now all index files and archives are checked, if enabled.
