Issues: NVIDIA/TransformerEngine
FP8 execution requires 2D input matrices with height divisible by 8 and width divisible by 16
#1422 opened Jan 25, 2025 by Liufeiran123
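Issue #1422 refers to a real constraint of TE's FP8 GEMM path: the (flattened) input's first dimension must be divisible by 8 and its last by 16. A minimal sketch of zero-padding an input to satisfy this, assuming `te.Linear` and `te.fp8_autocast` from `transformer_engine.pytorch`; the `pad_for_fp8` helper is hypothetical:

```python
import torch
import torch.nn.functional as F
import transformer_engine.pytorch as te

def pad_for_fp8(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: zero-pad a 2D tensor so that
    height % 8 == 0 and width % 16 == 0, as FP8 GEMMs require."""
    h_pad = (-x.shape[0]) % 8
    w_pad = (-x.shape[1]) % 16
    return F.pad(x, (0, w_pad, 0, h_pad))

linear = te.Linear(768, 768).cuda()       # 768 % 16 == 0, width needs no padding
x = torch.randn(100, 768, device="cuda")  # 100 rows -> padded to 104
with te.fp8_autocast(enabled=True):       # default FP8 recipe
    out = linear(pad_for_fp8(x))
out = out[:100]                           # drop the padded rows afterwards
```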
Deadline or schedule for a new update supporting Blackwell and FP4?
#1421 opened Jan 24, 2025 by johnnynunez
Problem when installing transformers_engine with nvcc 11.8 and nvcc 12.0
#1420 opened Jan 23, 2025 by chwenjun225
Questions about accuracy alignment between BF16 and FP8 [question]
#1419 opened Jan 22, 2025 by zigzagcai
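Issue #1419 asks how closely FP8 execution tracks BF16. A quick sketch for measuring the gap on a single layer, assuming `te.Linear` accepts a `params_dtype` argument (shapes are chosen to satisfy the FP8 divisibility rules above):

```python
import torch
import transformer_engine.pytorch as te

torch.manual_seed(0)
linear = te.Linear(1024, 1024, params_dtype=torch.bfloat16).cuda()
x = torch.randn(128, 1024, device="cuda", dtype=torch.bfloat16)

ref = linear(x)                      # plain BF16 execution
with te.fp8_autocast(enabled=True):  # same layer, FP8 GEMM
    out = linear(x)

# Max elementwise deviation introduced by FP8 quantization.
print((out.float() - ref.float()).abs().max().item())
```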
Questions on DotProductAttention API usage in Flash Attention thd mode
#1409 opened Jan 14, 2025 by pipSu
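Issue #1409 concerns calling `DotProductAttention` with packed (thd) inputs. A sketch under stated assumptions: `te.DotProductAttention` accepts `qkv_format="thd"` plus `cu_seqlens_*`/`max_seqlen_*` arguments, with q/k/v packed as `(total_tokens, heads, head_dim)`; exact argument placement may differ across TE versions:

```python
import torch
import transformer_engine.pytorch as te

heads, head_dim = 16, 64
# Three sequences of lengths 5, 7, and 8, packed along dim 0 with no padding.
cu_seqlens = torch.tensor([0, 5, 12, 20], dtype=torch.int32, device="cuda")
total_tokens = int(cu_seqlens[-1])

q = torch.randn(total_tokens, heads, head_dim, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

attn = te.DotProductAttention(heads, head_dim, qkv_format="thd",
                              attn_mask_type="padding_causal")
out = attn(q, k, v,
           cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens,
           max_seqlen_q=8, max_seqlen_kv=8)
```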
Import fails when working from a TE directory [good first issue]
#1400 opened Jan 10, 2025 by ksivaman
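Issue #1400 is the classic Python shadowing problem: starting Python inside the TE checkout puts the source tree's `transformer_engine/` directory (which lacks the compiled extensions) ahead of the installed package on `sys.path`. A quick diagnostic sketch:

```python
import os
import transformer_engine

# If this prints a path inside your TransformerEngine checkout rather than
# site-packages, Python resolved the source tree instead of the installed
# package; launch Python from outside the repository root instead.
print(os.path.dirname(transformer_engine.__file__))
```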
_NoopCatFunc in transformer layer [bug]
#1384 opened Dec 22, 2024 by robot-transformer
AttributeError: module 'transformer_engine' has no attribute 'pytorch'
#1379 opened Dec 17, 2024 by carrot0117
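The AttributeError in #1379 typically means only the top-level package was imported; Python does not load subpackages implicitly, so the framework submodule has to be imported explicitly (some TE versions auto-import frameworks, so behavior can vary). A minimal sketch:

```python
import transformer_engine                 # top-level import alone...
# transformer_engine.pytorch.Linear       # ...may raise AttributeError here

import transformer_engine.pytorch as te   # explicit submodule import
layer = te.Linear(1024, 1024)             # now resolves fine
```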
Support more than one shape/attention_params entry in the DotProductAttention decision cache
#1349 opened Nov 29, 2024 by parthmannan
The max error of moe_permute/unpermute.grad can reach 3.6e+00
#1336 opened Nov 15, 2024 by NiuMa-1234