Training with multi gpus #263
Open
NtaylorOX opened this issue Feb 7, 2023 · 6 comments

NtaylorOX commented Feb 7, 2023

Great work.

I saw in another issue that there had been plans to migrate to later versions of allennlp?

I have a version of this DeCLUTR code working with allennlp v2.10, but I cannot get the multi-GPU setup to work: the config arguments seem to have changed and I cannot work out how.

For instance, using overrides with "distributed.cuda_devices"

leads to: ValueError: overrides dict contains unused keys: ['distributed.cuda_devices']

I imagine this project may have become a bit too old to keep working on, but any help with multi-GPU training with allennlp v2.10 in relation to DeCLUTR would be great.

Best,

Niall

@NtaylorOX (Author)

Fixed it ... after all that, I stumbled upon the subtle change required in the config.

I honestly thought I had tried this several times, so perhaps it was fatigue. The following works for multi-GPU training with allennlp v2.10; it is a subtle change from v1.1:

"distributed": {
"cuda_devices": [8,9],
},
"trainer": {
// Set use_amp to true to use automatic mixed-precision during training (if your GPU supports it)
"use_amp": true,
"optimizer": {
"type": "huggingface_adamw",
"lr": 5e-5,
"eps": 1e-06,
"correct_bias": false,
"weight_decay": 0.1,
"parameter_groups": [
// Apply weight decay to pre-trained params, excluding LayerNorm params and biases
[["bias", "LayerNorm\.weight", "layer_norm\.weight"], {"weight_decay": 0}],
],
},
"callbacks":[{"type":'tensorboard'}],
"num_epochs": 10,
"checkpointer": {
// A value of null or -1 will save the weights of the model at the end of every epoch
"keep_most_recent_by_count": 2,
},
"grad_norm": 1.0,
"learning_rate_scheduler": {
"type": "slanted_triangular",
},
},
}

@JohnGiorgi (Owner)

Hi @NtaylorOX, does this work without any changes to this codebase? I started migrating this to allennlp>2.0.0 a while back but ended up giving up because every breaking change I fixed seemed to be followed by another.

@NtaylorOX (Author)

Hi @JohnGiorgi,

So I did have to make a few changes - in line with the guidance found here: allenai/allennlp#4933.

While I seem to have been successful in modifying the DeCLUTR codebase to work with allennlp v2.10, it involved a couple of crude, less-than-ideal changes on my part. I was trying to get it to work on both Windows and Linux, which was a bit of a pain; I think I ended up commenting out an assertion somewhere to get it working... At least allennlp itself is no longer changing out from under the codebase.

I have been meaning to take the time to make it much cleaner and more robust before submitting a pull request.

If it would be helpful for you, I can submit one anyway, or just share the code with you directly.

Let me know how you want to proceed.

JohnGiorgi reopened this Feb 17, 2023
@JohnGiorgi (Owner)

Hi @NtaylorOX, yeah, I would definitely be interested in an update that works on AllenNLP > 2.0. I think the big thing for me to merge it would be a demonstration that models train to the same loss and downstream performance.

@NtaylorOX (Author)

Hi @JohnGiorgi. Sorry I ended up so quiet on this; I got swamped with other things...

I am still planning to find a day to act on this. I am also beginning to migrate the functionality of DeCLUTR to the transformers library directly, to hopefully make using your awesome architecture/algorithm more straightforward with what has become the library of choice for NLP work.
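
As a rough illustration of what that migration is aiming at: the pretrained DeCLUTR checkpoints can already be loaded with plain transformers, along the lines of the example in the project README. The sketch below is a minimal, non-authoritative version of that pattern; it assumes the johngiorgi/declutr-small checkpoint published on the Hugging Face Hub and uses mean pooling over token embeddings.

import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint name assumed from the Hub; the project also publishes declutr-base.
tokenizer = AutoTokenizer.from_pretrained("johngiorgi/declutr-small")
model = AutoModel.from_pretrained("johngiorgi/declutr-small")

texts = [
    "A smiling costumed woman is holding an umbrella.",
    "A happy woman in a fairy costume holds an umbrella.",
]

# Tokenize, embed, then mean-pool the token embeddings while ignoring padding.
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    sequence_output = model(**inputs)[0]
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (sequence_output * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Cosine similarity between the two sentence embeddings.
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(score.item())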

Will try to keep you posted on both fronts.

Thanks

@JohnGiorgi (Owner)

Wow, that sounds great! Yeah, keep me updated and let me know if you have any questions or if there's anything I can help with.
