microsoft / Megatron-DeepSpeed Public

forked from NVIDIA/Megatron-LM

Fork 332
Star 1.7k

Pull requests: microsoft/Megatron-DeepSpeed

Labels 9 Milestones 0

27 Open 218 Closed

Pull requests list

support split qkv linear and sp overlap comm

#415 opened Jul 5, 2024 by inkcherry

add PyTorch profiler support

#414 opened Jul 3, 2024 by polisettyvarma

fix --use-cpu-initialization error when expert is not tensor-parallel

#413 opened Jul 3, 2024 by taozhiwei

add kill switch support to gracefully exit training

#412 opened Jul 3, 2024 by polisettyvarma

improve performance by keeping attention_mask on device and run ops further on device

#411 opened Jul 3, 2024 by polisettyvarma

Improve RoPE perf by using cached sin/cos tensors

#410 opened Jul 2, 2024 by polisettyvarma

use split/squeeze instead of slice for performance

#409 opened Jul 2, 2024 by polisettyvarma

fixing the bug of flash_attn import and the wrong gather index when using flash_attn_cuda in sequence parallel

#406 opened Jun 27, 2024 by YJHMITWEB

fix NAN loss of rope long context training

#399 opened Jun 5, 2024 by inkcherry

1 comment

update universal_checkpointing/README.md

#395 opened Jun 3, 2024 by inkcherry

2 comments

convert mds checkpoint to Hf Llama model

#394 opened May 31, 2024 by vksastry

1 comment

ds-sequence-parallel(ulysses) for rope.

#392 opened May 30, 2024 by inkcherry

1 comment

Update/add GPT/Llama universal checkpointing scripts

#391 opened May 22, 2024 by lekurile • Draft

add HFTokenizer option for preprocess_data

#388 opened May 17, 2024 by Jianhong-Zhang

Add layer norm weight plus 1

#378 opened Apr 18, 2024 by Yejing-Lai

4 comments

Fix ConstantGradScaler and loss-scale argument not match

#376 opened Apr 12, 2024 by BeingGod

1 comment

Support Llama2Tokenizer

#375 opened Apr 11, 2024 by jinyouzhi

fix TFLOPs calculation

#371 opened Mar 22, 2024 by polisettyvarma

2 comments

collect grad_norm for non pipeline path

#370 opened Mar 21, 2024 by inkcherry

optimize the generation of attention mask

#331 opened Jan 13, 2024 by imh966

Enable torch.compile

#322 opened Dec 28, 2023 by tohtana • Draft

Simplify SP - Opportunity to improve SP scalability

#301 opened Nov 28, 2023 by RezaYazdaniAminabadi

support transfer llama hf weight to megatron weight

#246 opened Sep 12, 2023 by uygnef

add vit training with TP/PP

#146 opened Jun 9, 2023 by etoilestar

3 comments

integrate ort

#79 opened Aug 19, 2022 by prathikr • Draft

1 comment

Page 1 of 2

© 2024 GitHub, Inc.