Issues: NVIDIA/Megatron-LM
How to train multiple binary files at the same time, or merge them? #927 (opened Jul 11, 2024 by Liangyz2019)
[QUESTION] How Do NCCL_ALGO and Flash Attention Affect Deterministic Training in Megatron? #925 (opened Jul 11, 2024 by jinzhuer)
[BUG] Getting distributed rank in save_checkpoint when torch.distributed is not initialized. #920 (opened Jul 10, 2024 by haolin-nju)
[ENHANCEMENT] Enable non-gelu activations for BERT LM Head #918 (opened Jul 9, 2024 by skothenhill-nv)
[BUG] Missing init_process_group call when converting model to HF format. #911 (opened Jul 8, 2024 by benoriol)
[QUESTION] Does Megatron-LM support Flash Attention for BERT and T5 Pretraining? #899 (opened Jul 2, 2024 by Leo-T-Zang)
Batch_input and elapsed time per iteration slow down during model training #897 (opened Jun 29, 2024 by Yuhanleeee)
[REGRESSION] MoEs are obtaining higher loss than they should during training #894 (opened Jun 27, 2024 by kiddyboots216)
[QUESTION] Getting tools/preprocess_data.py to work is painful #892 (opened Jun 26, 2024 by sambar1729)