Refactor batch generation logic to allow large reports to be generated #1444

Open
aequitas opened this issue Jun 26, 2024 · 0 comments

Related to #1395

During batch result report generation the result is stored in a variable before being written to a file:

results = gather_batch_results(user, batch_request, site_url)
save_batch_results_to_file(user, batch_request, results)
del results
results = gather_batch_results_technical(user, batch_request, site_url)
save_batch_results_to_file(user, batch_request, results, technical=True)

For batch requests with 5000 domains this results in a memory usage of 1 GB, for 10k domains almost 2 GB, and so on. The worker performing this task therefore needs this much memory available for the short time it takes to generate the reports. Furthermore, this memory is retained by the worker until the next report generation runs, at which point it is reused but not freed.

I suggest refactoring the generation logic so that gather_batch_results writes the report file to disk in a streaming fashion. This would eliminate the dom_results variable (initialised as dom_results = {}), which holds the bulk of the memory used.

Because dom_results (the domains field in the report, assigned via data["domains"] = dom_results) is a dictionary/object, existing JSON encoders might not be able to handle it in a streaming manner. But because the report structure is simple enough, it might be best to just write a custom encoder or write the JSON directly without any encoder/library.
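As a rough illustration of the "write the JSON directly" idea, here is a minimal sketch. It is not the project's code: write_report_streaming, fake_domain_results, and the header argument are hypothetical names, and the real report structure may differ. It assumes each per-domain result is small and JSON-serialisable, so only one domain is held in memory at a time while the outer object is written by hand.

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO, Tuple


def write_report_streaming(
    out: TextIO,
    header: Dict[str, Any],
    domain_results: Iterable[Tuple[str, Any]],
) -> None:
    """Write {<header fields>, "domains": {...}} to `out`, one domain at a time.

    Hypothetical helper: `header` holds the small top-level report fields and
    must not itself contain a "domains" key. `domain_results` can be a
    generator, so the full mapping is never built in memory.
    """
    out.write("{")
    for key, value in header.items():
        out.write(f"{json.dumps(key)}: {json.dumps(value)}, ")
    out.write('"domains": {')
    first = True
    for domain, result in domain_results:
        if not first:
            out.write(", ")
        first = False
        # json.dumps handles escaping of the key and encodes the small value.
        out.write(f"{json.dumps(domain)}: {json.dumps(result)}")
    out.write("}}")


def fake_domain_results(n: int) -> Iterable[Tuple[str, Any]]:
    """Stand-in for the per-domain queries; yields results lazily."""
    for i in range(n):
        yield f"example{i}.nl", {"status": "ok", "score": i}


if __name__ == "__main__":
    buf = io.StringIO()  # a real run would use an opened file instead
    write_report_streaming(buf, {"name": "demo"}, fake_domain_results(3))
    print(buf.getvalue())
```

With this shape the peak memory is bounded by the largest single domain result rather than by the number of domains, at the cost of hand-writing the outer braces and separators.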
