
Investigate effects of learning rate, learning rate schedules #426

Open
isaac091 opened this issue Jun 25, 2024 · 3 comments
@isaac091 (Collaborator) commented Jun 25, 2024
I've been observing that for models that take a large number of steps to reach the early stopping criterion (~20k+ steps), increasing the learning rate significantly (5e-5 → 2e-4) often cuts the number of steps needed in half, which in turn cuts the training time in half. For models that take fewer steps to begin with, an increased learning rate can also reduce the step count, but less consistently. The score metrics do not seem to be significantly affected by the learning rate.
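
For reference, a minimal sketch of what the two settings might look like, assuming NLLB is fine-tuned through the Hugging Face Seq2SeqTrainer. Only the two learning rates come from the observations above; every other value is an illustrative placeholder, not the project's actual configuration.

```python
# Sketch only: assumes the Hugging Face Seq2SeqTrainer is used for NLLB
# fine-tuning. Only the learning rates come from the observations above;
# the remaining values are illustrative placeholders.
from transformers import Seq2SeqTrainingArguments


def make_args(run_name: str, learning_rate: float) -> Seq2SeqTrainingArguments:
    return Seq2SeqTrainingArguments(
        output_dir=f"out/{run_name}",
        learning_rate=learning_rate,
        lr_scheduler_type="linear",      # transformers' default schedule
        warmup_steps=1000,               # placeholder
        max_steps=100_000,               # upper bound; early stopping ends runs sooner
        evaluation_strategy="steps",
        eval_steps=1000,
        save_strategy="steps",
        save_steps=1000,                 # must line up with eval_steps
        load_best_model_at_end=True,     # needed for early-stopping callbacks
        metric_for_best_model="eval_loss",
    )


baseline_args = make_args("baseline_5e-5", 5e-5)  # often ~20k+ steps before stopping
high_lr_args = make_args("high_lr_2e-4", 2e-4)    # often roughly half the steps
```

Whether a peak rate like 2e-4 stays stable typically depends on the warmup and schedule, which is part of why the schedule experiments below are worth running.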

To do:

  • Use some hyperparameter optimization tool (ClearML, Weights and Biases?) to see if there is a learning rate that consistently reduces the training time for Scripture projects using NLLB (a sweep sketch follows this list)
  • Experiment with different learning rate schedules
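
As a starting point for the first item, here is a minimal sweep sketch using Weights & Biases. The wandb calls (wandb.sweep, wandb.agent, wandb.init) are the standard sweep API, but run_training is a hypothetical stand-in for the project's existing NLLB training entry point, and the parameter ranges and metric names are assumptions rather than settled choices.

```python
# Sketch only: a Weights & Biases sweep over learning rate and schedule.
# `run_training` is a hypothetical placeholder for the project's existing
# NLLB fine-tuning entry point; metric names and ranges are assumptions.
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "steps_to_early_stop", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 5e-5,
            "max": 5e-4,
        },
        "lr_scheduler_type": {
            "values": ["linear", "cosine", "constant_with_warmup"]
        },
        "warmup_steps": {"values": [0, 1000, 4000]},
    },
}


def run_training(learning_rate, lr_scheduler_type, warmup_steps):
    """Hypothetical placeholder: wire this to the existing training script and
    return (eval_score, steps_at_early_stop)."""
    raise NotImplementedError


def sweep_trial():
    with wandb.init() as run:
        cfg = wandb.config
        score, steps = run_training(
            learning_rate=cfg.learning_rate,
            lr_scheduler_type=cfg.lr_scheduler_type,
            warmup_steps=cfg.warmup_steps,
        )
        run.log({"eval_score": score, "steps_to_early_stop": steps})


if __name__ == "__main__":
    sweep_id = wandb.sweep(sweep_config, project="nllb-lr-sweep")
    wandb.agent(sweep_id, function=sweep_trial, count=20)
```

If ClearML ends up being the preferred tool instead, its HyperParameterOptimizer could presumably be wired up in a similar way.
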
@isaac091 isaac091 added the optimization Model training/inferencing optimization label Jun 25, 2024
@isaac091 isaac091 self-assigned this Jun 25, 2024
@ddaspit (Collaborator) commented Jul 1, 2024

Is this true of fully fine-tuned models or just LoRA models?

@isaac091 (Collaborator, Author) commented Jul 1, 2024

I've noticed it for both, but I've run many more experiments with LoRA and other model reduction methods than without, so I'll need more data points before I'm confident about which scenarios benefit from a higher learning rate. This issue is meant to focus on fully fine-tuned models, since the default learning rate for LoRA models has already been raised.

@ddaspit (Collaborator) commented Jul 1, 2024

Sounds good. This could be an easy way to speed up training.

Labels: optimization (Model training/inferencing optimization)
Project status: 🏗 In progress