Pull requests: NVIDIA/TransformerEngine

Pull requests list

Parallel build with limited resource
#981 opened Jul 2, 2024 by phu0ngng
5 of 13 tasks
[pre-commit.ci] pre-commit suggestions
#979 opened Jul 2, 2024 by pre-commit-ci bot
[WIP] [PyTorch] Support dtype casting in fused adam
#977 opened Jul 1, 2024 by Wong4j
6 of 13 tasks
Add test for building without support for any DL frameworks (labels: testing)
#974 opened Jun 27, 2024 by timmoon10
6 of 14 tasks
[PyTorch] Runtime lookup for CUDA Driver API calls in Userbuffers (labels: 1.8.0, bug)
#970 opened Jun 26, 2024 by denera
8 of 13 tasks
[C/PyTorch] Add support for bottom-right-diagonal causal mask
#960 opened Jun 25, 2024 by cyanguwa
5 tasks
[Paddle] Add deterministic option in DotProductAttention
#956 opened Jun 23, 2024 by Wong4j
8 of 13 tasks
Lower memory usage during AttnFuncWithCP.forward
#951 opened Jun 21, 2024 by i4never
8 of 13 tasks
[TE/JAX] Prototype for New XLA Custom Calls with FFI (labels: enhancement, jax)
#946 opened Jun 19, 2024 by phu0ngng
3 of 13 tasks
[PyTorch] Add option to pass kwargs to CUDA graph module (labels: enhancement)
#945 opened Jun 19, 2024 by timmoon10
9 of 13 tasks
Expose rotary_base as an arg instead of hardcoding
#944 opened Jun 18, 2024 by sudhakarsingh27
1 of 6 tasks
Update required CMake version to 3.25 (labels: build)
#943 opened Jun 18, 2024 by timmoon10 Draft
5 of 13 tasks
[MoE][Common/PyTorch] Add permutation (labels: enhancement)
#936 opened Jun 17, 2024 by StudyingShao
13 tasks
[Draft] Zero fwd and bwd results for THD+CP
#920 opened Jun 13, 2024 by xrennvidia Draft
11 tasks
Fp8 model init factory
#880 opened May 30, 2024 by sudhakarsingh27 Draft
Avoid framework specific import from top level (labels: enhancement)
#862 opened May 22, 2024 by ksivaman Draft
6 of 11 tasks
Generation tutorial for Gemma model
#829 opened May 1, 2024 by pggPL
8 of 11 tasks
[UB] Adding support for multinode nvlink
#815 opened Apr 26, 2024 by shamisp
Bug fix in DGRAD->RS overlap
#802 opened Apr 23, 2024 by vasunvidia Draft
[PyTorch] Fix minor bug in computing num_gqa_groups_per_partition (labels: bug)
#777 opened Apr 13, 2024 by knowlsie
[C/PyTorch] Refactor and move userbuffers into TE/common
#760 opened Apr 8, 2024 by denera
12 of 13 tasks
[PyTorch] Prototype for operation-based API (labels: enhancement)
#707 opened Mar 9, 2024 by timmoon10
2 of 6 tasks