
Investigate effects of learning rate, learning rate schedules #426

Open
isaac091 opened this issue Jun 25, 2024 · 3 comments
@isaac091 (Collaborator) commented Jun 25, 2024
I've been observing that for models that take a large number of steps to reach the early stopping criterion (~20k+ steps), increasing the learning rate significantly (5e-5 → 2e-4) often cuts the number of steps needed in half, which in turn cuts the training time in half. For models that take fewer steps to begin with, an increased learning rate can also reduce the step count, but less consistently. The score metrics do not seem to be significantly affected by the learning rate.
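
For reference, a minimal sketch of what the two settings might look like, assuming NLLB is fine-tuned through the Hugging Face Seq2SeqTrainer. Only the two learning rates come from the observations above; every other value is an illustrative placeholder, not the project's actual configuration.

```python
# Sketch only: assumes the Hugging Face Seq2SeqTrainer is used for NLLB
# fine-tuning. Only the learning rates come from the observations above;
# the remaining values are illustrative placeholders.
from transformers import Seq2SeqTrainingArguments


def make_args(run_name: str, learning_rate: float) -> Seq2SeqTrainingArguments:
    return Seq2SeqTrainingArguments(
        output_dir=f"out/{run_name}",
        learning_rate=learning_rate,
        lr_scheduler_type="linear",      # transformers' default schedule
        warmup_steps=1000,               # placeholder
        max_steps=100_000,               # upper bound; early stopping ends runs sooner
        evaluation_strategy="steps",
        eval_steps=1000,
        save_strategy="steps",
        save_steps=1000,                 # must line up with eval_steps
        load_best_model_at_end=True,     # needed for early-stopping callbacks
        metric_for_best_model="eval_loss",
    )


baseline_args = make_args("baseline_5e-5", 5e-5)  # often ~20k+ steps before stopping
high_lr_args = make_args("high_lr_2e-4", 2e-4)    # often roughly half the steps
```

Whether a peak rate like 2e-4 stays stable typically depends on the warmup and schedule, which is part of why the schedule experiments below are worth running.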

To do:

  • Use some hyperparameter optimization tool (ClearML, Weights and Biases?) to see if there is a learning rate that consistently reduces the training time for Scripture projects using NLLB (a sweep sketch follows this list)
  • Experiment with different learning rate schedules
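
As a starting point for the first item, here is a minimal sweep sketch using Weights & Biases. The wandb calls (wandb.sweep, wandb.agent, wandb.init) are the standard sweep API, but run_training is a hypothetical stand-in for the project's existing NLLB training entry point, and the parameter ranges and metric names are assumptions rather than settled choices.

```python
# Sketch only: a Weights & Biases sweep over learning rate and schedule.
# `run_training` is a hypothetical placeholder for the project's existing
# NLLB fine-tuning entry point; metric names and ranges are assumptions.
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "steps_to_early_stop", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 5e-5,
            "max": 5e-4,
        },
        "lr_scheduler_type": {
            "values": ["linear", "cosine", "constant_with_warmup"]
        },
        "warmup_steps": {"values": [0, 1000, 4000]},
    },
}


def run_training(learning_rate, lr_scheduler_type, warmup_steps):
    """Hypothetical placeholder: wire this to the existing training script and
    return (eval_score, steps_at_early_stop)."""
    raise NotImplementedError


def sweep_trial():
    with wandb.init() as run:
        cfg = wandb.config
        score, steps = run_training(
            learning_rate=cfg.learning_rate,
            lr_scheduler_type=cfg.lr_scheduler_type,
            warmup_steps=cfg.warmup_steps,
        )
        run.log({"eval_score": score, "steps_to_early_stop": steps})


if __name__ == "__main__":
    sweep_id = wandb.sweep(sweep_config, project="nllb-lr-sweep")
    wandb.agent(sweep_id, function=sweep_trial, count=20)
```

If ClearML ends up being the preferred tool instead, its HyperParameterOptimizer could presumably be wired up in a similar way.
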
@isaac091 isaac091 added the optimization Model training/inferencing optimization label Jun 25, 2024
@isaac091 isaac091 self-assigned this Jun 25, 2024
@ddaspit (Collaborator) commented Jul 1, 2024

Is this true of fully fine-tuned models or just LoRA models?

@isaac091 (Collaborator, Author) commented Jul 1, 2024

I've noticed it for both, but I've run many more experiments with LoRA and other model reduction methods than without, so I'll need more data points before I'm confident about which scenarios benefit from a higher learning rate. This issue is meant to focus on fully fine-tuned models, since the default learning rate for LoRA models has already been raised.

@ddaspit (Collaborator) commented Jul 1, 2024

Sounds good. This could be an easy way to speed up training.

Labels: optimization (Model training/inferencing optimization)
Project status: 🏗 In progress