Upload speed to S3 #9
Comments
Hi Marco! I ran some tests in the past with big files and they seemed to be fine. You said you've tested the upload using the WSGI app and also using boto3 directly. Do you think you could also run a quick test with the code you wrote before to upload the file, but without involving the WSGI server? Or maybe something similar to https://github.com/inveniosoftware/invenio-s3/blob/master/tests/test_storage.py#L113? I just want to see where the time is spent when using only Invenio-S3. I do remember having some trouble with gunicorn and big files in the past, but I can't recall what it was; we eventually switched to uWSGI 😂
Hello Esteban. Thank you for your help, and sorry for my very late reply. I repeated the test I did last time. For some reason the first part of the upload (the transfer from the browser to our server) has now gone from 1 minute to ~30 seconds. I suspect that our cloud provider has given us more powerful vCPUs, since this is the CPU-bound part.
As for the second part of the process (the transfer from the server to the object store), I found that I can speed it up by setting a larger
Finally, I repeated all this without gunicorn, using instead the built-in Flask server (via
This is already a nice improvement for us (I can now upload a 1 GB file in 1 minute). It's still not as fast as Zenodo's upload, but Zenodo seems to use the deposit API directly, while we go through a form, which is probably not ideal. Also, I am not sure whether Zenodo immediately pushes deposits to an object store or uses local storage instead.
We saw much the same behavior and have already added a few changes and configuration variables. Check #15; I think it's what you are looking for. It should get merged and released in the near future.
Hello.
Is there anything that we can do to increase the upload speed to an S3 service via invenio-s3?
I compared the upload speed obtained in our app versus a direct upload to S3 with boto3 (from the same machine that serves our app), and I am getting different results. For a 1 GB file, when uploading through our app we see first 150-200 Mbps data transfer from the browser for about 1 minute, with gunicorn sitting at 99% CPU; then for about 2 minutes we see no upload from the browser, while gunicorn sits at 10-15% CPU, until the browser finally receives a 200 response (total 3 minutes). With a direct upload to S3 via boto3, instead, it takes about 13 seconds in total.
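A comparison like this is easier to reproduce with a small timing helper. The following is only a sketch under stated assumptions: `timed_upload`, `boto3_uploader` and `drain` are hypothetical names, the bucket and key are placeholders, and the only real library call is boto3's `upload_fileobj`, which is imported lazily so the helper also runs on a machine without boto3.

```python
import io
import time


def timed_upload(upload, fileobj, size_bytes):
    """Run upload(fileobj) and return (seconds, megabits_per_second)."""
    start = time.perf_counter()
    upload(fileobj)
    seconds = time.perf_counter() - start
    mbps = (size_bytes * 8 / 1e6) / seconds if seconds > 0 else float("inf")
    return seconds, mbps


def boto3_uploader(bucket, key):
    """Return a callable that uploads a file object straight to S3 with boto3."""
    def upload(fileobj):
        # Third-party dependency; assumed installed and configured with credentials.
        import boto3
        boto3.client("s3").upload_fileobj(fileobj, bucket, key)
    return upload


def drain(fileobj, chunk_size=1024 * 1024):
    """Stand-in uploader for dry runs: read the stream to the end and discard it."""
    while fileobj.read(chunk_size):
        pass


if __name__ == "__main__":
    payload = b"\0" * (8 * 1024 * 1024)  # 8 MiB of dummy data
    seconds, mbps = timed_upload(drain, io.BytesIO(payload), len(payload))
    print(f"{seconds:.3f} s, {mbps:.0f} Mbps")
    # Against a real bucket (placeholder names), pass an open binary file and
    # boto3_uploader("my-bucket", "big.bin") instead of drain.
```

Taking 1 GB as 10^9 bytes, 13 seconds corresponds to roughly 615 Mbps, while 3 minutes corresponds to about 44 Mbps.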
To simplify testing, I'm using a simple Flask view, in which I have the following lines that do the job:
In the real app, we actually create a record with invenio_deposit.api.Deposit.create(), then attach the file to the record, but we see the same speed as in this simple test.
Our setup is: Apache2 acting as the front-line server, with a reverse proxy to gunicorn on the same machine. Setting or not setting DEBUG=True in config.py does not seem to make a difference.
We are actually using our own fork of invenio-s3, with some changes that we needed to make it work (I opened PR #8 in case you find them useful), but I don't think they are relevant to this issue.
I also found some code to profile requests to gunicorn: I'll paste below the result, but I'm not quite sure how to interpret it.
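For readers who want to try something similar, per-request profiling can be sketched as a plain WSGI middleware built on the standard library's `cProfile` and `pstats`. This is not the code referred to above; `RequestProfiler` and `hello_app` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


class RequestProfiler:
    """WSGI middleware that profiles each request with cProfile."""

    def __init__(self, app, top=20, stream=None):
        self.app = app
        self.top = top        # number of rows to print per request
        self.stream = stream  # where the report goes (None means stdout)

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()

        def run():
            # Consume the body inside the profiler so work done while
            # iterating the response is measured too. Note that this
            # buffers the whole response in memory.
            result = self.app(environ, start_response)
            try:
                return list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()

        body = profiler.runcall(run)
        stats = pstats.Stats(profiler, stream=self.stream)
        stats.sort_stats("cumulative").print_stats(self.top)
        return body


def hello_app(environ, start_response):
    """Tiny WSGI app used only to demonstrate the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


if __name__ == "__main__":
    report = io.StringIO()
    wrapped = RequestProfiler(hello_app, top=5, stream=report)
    wrapped({}, lambda status, headers: None)
    print(report.getvalue())
```

With Flask, a middleware like this can be attached with `app.wsgi_app = RequestProfiler(app.wsgi_app)`. In the report, `cumtime` is the time spent in a function including everything it calls, so the rows with the largest `cumtime` are usually the first place to look when trying to see where a long request spends its time.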
Thanks a lot in advance for the help!