Thanks for your great work!
Could you please share the influence of the batch size and the number of GPUs?
Also, how should one choose a suitable learning rate and batch size when not enough GPUs are available?
Thank you!
Recently I have been using distributed training more often. Make sure each single GPU uses the same per-GPU batch size as mine; you should then get the same result, though training may take more time if you have fewer GPUs.
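The point above can be sketched with some simple arithmetic: if the per-GPU batch size is held fixed at 128 (as in this thread), the effective (global) batch size shrinks with the number of GPUs, so each epoch takes proportionally more optimizer steps and more wall-clock time. The dataset size below is an assumption for illustration only.

```python
# Hedged sketch: effective batch size vs. number of GPUs, assuming the
# per-GPU batch size of 128 mentioned in this thread is held constant.
per_gpu_batch = 128          # per-device batch size from the thread
dataset_size = 1_281_167     # assumed for illustration (ImageNet-1k train set)

for num_gpus in (8, 4, 2, 1):
    global_batch = per_gpu_batch * num_gpus      # samples per optimizer step
    steps_per_epoch = -(-dataset_size // global_batch)  # ceiling division
    print(f"{num_gpus} GPU(s): global batch {global_batch}, "
          f"{steps_per_epoch} steps per epoch")
```

Because the per-GPU batch size (and hence the gradient noise per device) is unchanged, this is the scenario where the thread suggests the learning rate can stay the same.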
Thanks for your reply.
I understand that each single GPU should use the same batch size (128) as yours.
I have a question about the learning rate. Does the learning rate need to be changed?
I don't think it needs to change. The batch size in the paper should mean the batch size on one device; normally the batch size reported in a paper is what a single device holds. Take care of the difference between DistributedDataParallel and DataParallel.
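The DistributedDataParallel vs. DataParallel distinction above can be illustrated with how each feeds its DataLoader. This is a minimal sketch, assuming 4 GPUs and a per-device batch size of 128; the toy dataset is invented for illustration, and no process group or actual GPUs are needed for the data-loading part shown here.

```python
# Sketch of the per-device batch size difference between DataParallel
# and DistributedDataParallel (assumptions: 4 GPUs, per-device batch 128).
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(2048, 8))  # toy dataset, assumed
world_size = 4                                 # number of GPUs, assumed

# DataParallel: one process, one DataLoader; the *global* batch is split
# across GPUs, so a per-device batch of 128 needs batch_size=128 * world_size.
dp_loader = DataLoader(dataset, batch_size=128 * world_size)
dp_global_batch = next(iter(dp_loader))[0]
print(dp_global_batch.shape[0])   # 512 samples, split into 128 per GPU

# DistributedDataParallel: one process per GPU, each with its own DataLoader
# and a DistributedSampler, so batch_size is already the per-device batch.
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=0,
                             shuffle=False)
ddp_loader = DataLoader(dataset, batch_size=128, sampler=sampler)
ddp_device_batch = next(iter(ddp_loader))[0]
print(ddp_device_batch.shape[0])  # 128 samples on this one process
```

So if a paper reports batch size 128 per device, a DataParallel script must multiply by the GPU count, while a DistributedDataParallel script passes 128 directly; confusing the two silently changes the effective batch size.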