[Draft][PyTorch] Add context parallel support for packed dataset in THD format #9540

tomlifu · 2024-06-25T22:56:43Z

What does this PR do?

This PR adds context parallel (CP) support for packed datasets in THD format in NeMo, in response to this TE PR: NVIDIA/TransformerEngine#641. Currently, the TE PR requires each individual sequence length to be divisible by 2 * context_parallel_size.
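To make the divisibility requirement concrete, here is a minimal, framework-free sketch of the padding arithmetic. It assumes each sequence in a packed sample is padded at its end and that the packed sample is described by cumulative sequence lengths (`cu_seqlens`), as is usual for THD inputs. The helper names `pad_to_multiple` and `padded_cu_seqlens` are hypothetical and are not the PR's or TE's API.

```python
from itertools import accumulate
from typing import List


def pad_to_multiple(seq_len: int, cp_size: int) -> int:
    """Round seq_len up to the nearest multiple of 2 * cp_size (hypothetical helper)."""
    divisor = 2 * cp_size
    return -(-seq_len // divisor) * divisor  # ceiling division, then scale back up


def padded_cu_seqlens(seq_lens: List[int], cp_size: int) -> List[int]:
    """Cumulative boundaries of a packed sample after padding every sequence.

    Assumption: padding is appended to the end of each individual sequence,
    so every [cu[i], cu[i+1]) span has a length divisible by 2 * cp_size.
    """
    padded = [pad_to_multiple(n, cp_size) for n in seq_lens]
    return [0] + list(accumulate(padded))


if __name__ == "__main__":
    # With cp_size=2 the divisor is 4: lengths 5, 8, 3 become 8, 8, 4.
    print(padded_cu_seqlens([5, 8, 3], cp_size=2))
```

Padding each sequence individually, rather than only the total packed length, is what the per-sequence requirement implies: a packed sample whose total length is divisible by 2 * cp_size can still contain sequences that are not.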

Changes

Add support to split the packed dataset across different CP ranks in a load-balanced way
Add the necessary padding to the dataset during the packing stage so that each individual sequence length is a multiple of 2*cp_size
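The load-balanced split can be sketched as follows. This assumes the standard causal-mask balancing layout in which each sequence is cut into 2 * cp_size equal chunks and rank r keeps chunks r and 2 * cp_size - 1 - r, applied per sequence using `cu_seqlens`. The function names are hypothetical illustrations, not the PR's implementation or a TE API.

```python
from typing import List


def cp_partition_indices(cu_seqlens: List[int], cp_size: int, cp_rank: int) -> List[int]:
    """Token indices of the packed THD buffer kept by cp_rank (hypothetical helper).

    Each sequence [cu_seqlens[i], cu_seqlens[i+1]) is cut into 2 * cp_size equal
    chunks. Rank r keeps chunk r and chunk (2 * cp_size - 1 - r), so every rank
    holds one early and one late chunk of every sequence, which evens out the
    work under a causal mask.
    """
    num_chunks = 2 * cp_size
    indices: List[int] = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        seq_len = end - start
        if seq_len % num_chunks != 0:
            raise ValueError(f"sequence length {seq_len} is not divisible by {num_chunks}")
        chunk = seq_len // num_chunks
        for c in (cp_rank, num_chunks - 1 - cp_rank):
            first = start + c * chunk
            indices.extend(range(first, first + chunk))
    return indices


def local_cu_seqlens(cu_seqlens: List[int], cp_size: int) -> List[int]:
    """Sequence boundaries on each rank: every sequence shrinks to len / cp_size."""
    return [x // cp_size for x in cu_seqlens]


if __name__ == "__main__":
    cu = [0, 8, 12]  # two packed sequences of lengths 8 and 4, cp_size = 2
    for rank in range(2):
        print(rank, cp_partition_indices(cu, cp_size=2, cp_rank=rank))
    print(local_cu_seqlens(cu, cp_size=2))
```

Gathering `tokens[i]` for the returned indices gives a rank its local slice. Because every sequence contributes exactly two chunks per rank, the local boundaries are the global `cu_seqlens` divided by cp_size.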

PR Type:

New Feature
Bugfix
Documentation

Add context parallel support for packed dataset

c938bdd

github-actions bot added the NLP label Jun 25, 2024

tomlifu changed the title ~~[PyTorch] Add context parallel support for packed dataset in THD format~~ [Draft][PyTorch] Add context parallel support for packed dataset in THD format Jun 26, 2024

xrennvidia self-requested a review June 29, 2024 02:25
