Pull requests: NVIDIA/TransformerEngine

Pull requests list

[pyTorch] Fix wrong results for noncontiguous input
#1017 opened Jul 16, 2024 by ptrendx
8 of 13 tasks
[PyTorch] Custom kernel to compute reciprocal of a single float (label: enhancement)
#1016 opened Jul 15, 2024 by timmoon10 Draft
5 of 13 tasks
Flash attention support softcap.
#1013 opened Jul 14, 2024 by Lzhang-hub
7 tasks
Fix context parallelism implementation with THD format
#1012 opened Jul 13, 2024 by xrennvidia
6 of 13 tasks
Initialize output tensors to 0 for THD (temporary)
#1009 opened Jul 12, 2024 by cyanguwa
8 of 13 tasks
DGRAD_RS UB overlap Bug fixes
#1004 opened Jul 10, 2024 by vasunvidia
13 tasks
[JAX] Sharding Utils
#1003 opened Jul 9, 2024 by mingxu1067 Draft
8 of 13 tasks
Optimize multi-tensor cast-transpose kernel (label: enhancement)
#998 opened Jul 8, 2024 by timmoon10 Draft
6 of 13 tasks
Add efficient cross entropy by cuda kernel.
#995 opened Jul 8, 2024 by cb521
13 tasks
[PyTorch] Fixing hang in initialize_ub() for multi-node runs after PR901 removal of MPI-dependence (label: bug)
#986 opened Jul 3, 2024 by denera
8 of 13 tasks
[pre-commit.ci] pre-commit suggestions (label: wontfix)
#979 opened Jul 2, 2024 by pre-commit-ci bot Draft
[PyTorch] Support dtype casting in fused adam
#977 opened Jul 1, 2024 by Wong4j
8 of 13 tasks
[Paddle] Add deterministic option in DotProductAttention
#956 opened Jun 23, 2024 by Wong4j
8 of 13 tasks
Lower memory usage during AttnFuncWithCP.forward
#951 opened Jun 21, 2024 by i4never
8 of 13 tasks
[TE/JAX] Prototype for New XLA Custom Calls with FFI (labels: enhancement, jax)
#946 opened Jun 19, 2024 by phu0ngng
3 of 13 tasks
[PyTorch] Add option to pass kwargs to CUDA graph module (label: enhancement)
#945 opened Jun 19, 2024 by timmoon10
9 of 13 tasks
Expose rotary_base as an arg instead of hardcoding
#944 opened Jun 18, 2024 by sudhakarsingh27
1 of 6 tasks
[MoE][Common/PyTorch] Add permutation (label: enhancement)
#936 opened Jun 17, 2024 by StudyingShao
13 tasks
Fp8 model init factory
#880 opened May 30, 2024 by sudhakarsingh27 Draft
Avoid framework specific import from top level (label: enhancement)
#862 opened May 22, 2024 by ksivaman Draft
6 of 11 tasks
Generation tutorial for Gemma model
#829 opened May 1, 2024 by pggPL
8 of 11 tasks
[UB] Adding support for multinode nvlink
#815 opened Apr 26, 2024 by shamisp
Bug fix in DGRAD->RS overlap
#802 opened Apr 23, 2024 by vasunvidia Draft