forked from NVIDIA/Megatron-LM
Pull requests: microsoft/Megatron-DeepSpeed
Pull requests list
fix --use-cpu-initialization error when expert is not tensor-parallel
#413 opened Jul 3, 2024 by taozhiwei
add kill switch support to gracefully exit training
#412 opened Jul 3, 2024 by polisettyvarma
improve performance by keeping attention_mask on device and run ops further on device
#411 opened Jul 3, 2024 by polisettyvarma
Improve RoPE perf by using cached sin/cos tensors
#410 opened Jul 2, 2024 by polisettyvarma
use split/squeeze instead of slice for performance
#409 opened Jul 2, 2024 by polisettyvarma
fixing the bug of flash_attn import and the wrong gather index when using flash_attn_cuda in sequence parallel
#406 opened Jun 27, 2024 by YJHMITWEB
Fix ConstantGradScaler and loss-scale argument not match
#376 opened Apr 12, 2024 by BeingGod
Simplify SP - Opportunity to improve SP scalability
#301 opened Nov 28, 2023 by RezaYazdaniAminabadi