
Add attention_bias argument in transformer block and transformer layer modules, addressing change in MCore #11289

Merged
merged 7 commits into main from yuya/update_attention_bias_api
Nov 18, 2024

Conversation

yaoyu-33
Collaborator

What does this PR do ?

https://gitlab-master.nvidia.com/ADLR/megatron-lm/-/merge_requests/2293
We need to fix all occurrences in NeMo where TransformerBlock uses a non-MCore TransformerLayer. Otherwise, the forward interface between the block and the layer will no longer match and NeMo will break.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
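A minimal, illustrative sketch (not the exact diff in this PR): a NeMo layer that subclasses MCore's TransformerLayer accepts the new attention_bias argument so its forward signature stays in sync with TransformerBlock. The class name is hypothetical, and the remaining keyword arguments are assumed to follow the existing call sites.

# Illustrative sketch only; MyNeMoDecoderLayer is a hypothetical name.
from megatron.core.transformer.transformer_layer import TransformerLayer

class MyNeMoDecoderLayer(TransformerLayer):
    def forward(
        self,
        hidden_states,
        attention_mask=None,
        context=None,
        context_mask=None,
        rotary_pos_emb=None,
        rotary_pos_cos=None,
        rotary_pos_sin=None,
        attention_bias=None,  # new argument introduced by the MCore change
        inference_params=None,
        packed_seq_params=None,
    ):
        # attention_bias is accepted for interface compatibility with the updated
        # TransformerBlock; it is only forwarded to super().forward() once the
        # MCore version pinned in CI accepts it (see the review discussion below).
        return super().forward(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            context=context,
            context_mask=context_mask,
            rotary_pos_emb=rotary_pos_emb,
            rotary_pos_cos=rotary_pos_cos,
            rotary_pos_sin=rotary_pos_sin,
            inference_params=inference_params,
            packed_seq_params=packed_seq_params,
        )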

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI runs automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Copy link
Contributor

[🤖]: Hi @yaoyu-33 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you to decide what to do next.

//cc @pablo-garay @ko3n1g

yaoyu-33 and others added 3 commits November 15, 2024 11:12
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Contributor

beep boop 🤖: 🙏 The following files have warnings. If you are familiar with these, please consider helping us improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.nlp.models.language_modeling.megatron.bert.bert_model
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:82:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:70:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:128:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:144:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:177:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:184:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:202:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:296:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:306:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:335:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:570:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.nlp.models.language_modeling.megatron.falcon.falcon_decoder_layer
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:70:0: C0301: Line too long (149/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:102:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:18:4: W0611: Unused parallel_state imported from megatron.core (unused-import)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:19:4: W0611: Unused ShardedObject imported from megatron.core.dist_checkpointing.mapping (unused-import)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:19:4: W0611: Unused ShardedTensor imported from megatron.core.dist_checkpointing.mapping (unused-import)
************* Module nemo.collections.nlp.models.language_modeling.megatron.gpt_full_te_layer_autocast_spec
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:51:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:147:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:179:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:303:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:336:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.nlp.modules.common.megatron.adapters.mcore_mixins
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:140:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:142:0: C0301: Line too long (147/119) (line-too-long)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:60:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:69:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:76:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:108:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:226:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:326:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:352:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:436:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:443:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:467:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:474:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:492:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:499:4: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.42/10

Thank you for improving NeMo's documentation!
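For reference, the C0115/C0116 warnings above are cleared by adding short class and function docstrings, and C0301 by keeping lines within 119 characters. A hypothetical illustration (the names below are not taken from the listed files):

class DecoderLayerWrapper:  # illustrative only
    """Thin wrapper around a Megatron decoder layer."""  # resolves C0115

    def forward(self, hidden_states, attention_mask=None):
        """Run the wrapped layer and return the output hidden states."""  # resolves C0116
        return hidden_states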

Contributor

beep boop 🤖: 🚨 The following files must be fixed before merge!


Your code was analyzed with PyLint. The following annotations have been identified:


------------------------------------
Your code has been rated at 10.00/10

Thank you for improving NeMo's documentation!

1 similar comment

Collaborator

@cuichenx left a comment


Please also bump the mcore version in Dockerfile.ci once the mcore commit is merged

rotary_pos_cos=rotary_pos_cos,
rotary_pos_sin=rotary_pos_sin,
inference_params=inference_params,
packed_seq_params=packed_seq_params,
Collaborator


Does attention_bias need to be passed to super().forward()?

Collaborator Author


The mcore version hasn't been bumped in CI yet, so this can't be added now.
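In other words, once the MCore pin is bumped to a version whose TransformerLayer.forward accepts attention_bias, the follow-up could simply thread the argument through. A hedged sketch with a hypothetical class name:

from megatron.core.transformer.transformer_layer import TransformerLayer

class MyNeMoDecoderLayer(TransformerLayer):  # hypothetical, for illustration
    def forward(self, hidden_states, attention_mask=None, attention_bias=None, **kwargs):
        # Pass the bias on to the parent implementation once it supports it.
        return super().forward(
            hidden_states,
            attention_mask=attention_mask,
            attention_bias=attention_bias,
            **kwargs,
        )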

Contributor

[🤖]: Hi @yaoyu-33 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you to decide what to do next.

//cc @pablo-garay @ko3n1g

@yaoyu-33 merged commit 168c3e5 into main Nov 18, 2024
319 of 321 checks passed
@yaoyu-33 deleted the yuya/update_attention_bias_api branch November 18, 2024 20:45
ShriyaPalsamudram added a commit that referenced this pull request Dec 2, 2024
Signed-off-by: Shriya Palsamudram <[email protected]>

Fix FaultTolerancePlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add StragglerDetection callback to all NeMo2.0 recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add missing and remove unused imports

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add ft launcher test

Signed-off-by: Shriya Palsamudram <[email protected]>

fix typo

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix more typos

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

add ft launcher using nemo-run for llama3 test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix serialization errors

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

create separate ft test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

change github actions test

Signed-off-by: Shriya Palsamudram <[email protected]>

draft crash simulation

Signed-off-by: Shriya Balaji Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Simulate a crash using step, disable checkpointing

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add a straggler detection test as well

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Revert enabling straggler_detection by default in all recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove unused imports

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove extra check in ConfigValidationPlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Address pylinter issues

Signed-off-by: Shriya Palsamudram <[email protected]>

Improve straggler detection testing and add doc string

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix paths

Signed-off-by: Shriya Palsamudram <[email protected]>

Add assert for crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Append run logs to a file after a crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH

Signed-off-by: Shriya Palsamudram <[email protected]>

Add openai-gelu in gated activation (#11293)

Fixes per comments (#11280)

* Fixes per comments

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

* Update README

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

---------

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

Add T5TTS (#11193)

* added training and inference recipes for T5-TTS.
* fix some attention errors
* add copyright headers.
* added TODO and detail error log info.
* fixed missing a corner case.
* added classes to __all__
* fixed to return either self-attention scores or cross-attention scores in ParallelTransformerLayer_ class.

Signed-off-by: XuesongYang <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: XuesongYang <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: XuesongYang <[email protected]>

ci: Exclude CPU machines from scan (#11300)

Signed-off-by: Oliver Koenig <[email protected]>

Revert "fix(export): GPT models w/ bias=False convert properly (#11255)" (#11301)

This reverts commit 2d4f4953881b9e2d118d3ffeba7e64625d827d11.

remove redundant docs (#11302)

Create phi3mini.py (#11281)

* Create phi3mini.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* Create phi3_mini_4k_instruct.py for adding to recipe

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removed Performant recipe

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removing performant condition

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py with docstring changes

Signed-off-by: mayani-nv <[email protected]>

* Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

* fixing pylint warnings

* Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* correcting typos and adding working recipe files

---------

Signed-off-by: mayani-nv <[email protected]>
Signed-off-by: mayani-nv <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: mayani-nv <[email protected]>

Integrate lm-eval-harness for evaluations in NeMo (#10621)

* Add evaluate method and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Add inference params to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Add wait_for_rest_service fn to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add logprobs to be returned by Pytriton for trtllm models

Signed-off-by: Abhishree <[email protected]>

* Increase max_retries in wait_for_rest_service method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add unset slurm vars and use env vars for Triton args

Signed-off-by: Abhishree <[email protected]>

* Add logic to get logProbs from logits

Signed-off-by: Abhishree <[email protected]>

* Refactor, clean and organize the code

1) Refactors the code and creates an evaluation folder where all util methods live
2) Add docstrings, comments
3) Expose gather_context_logits, gather_generation_logits in trtllm, add an output_generation_logits flag to return generation logits, and remove output_log_probs as it's not used anymore

Signed-off-by: Abhishree <[email protected]>

* Add copyright and initialize special_tokens_kwargs in eval_utils.py

Signed-off-by: Abhishree <[email protected]>

* Add the following changes

1) Move get_trtllm_deployable and unset_environment_variables to deploy base.py
2) Rename eval_utils.py to base.py
3) Restore scripts/export/convert_nemo2_for_export.py

Signed-off-by: Abhishree <[email protected]>

* Fix a minor typo

Signed-off-by: Abhishree <[email protected]>

* Revert output_log_probs and all_probs arg in tensorrt_llm_run.py

Signed-off-by: Abhishree <[email protected]>

* Fix docstrings formatting

Signed-off-by: Abhishree <[email protected]>

* Pylint and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Fix pylint and typos

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Avoid multiple calls for tokenizer_type

Co-authored-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>

* Replace print statements with logging statements

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: athitten <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: athitten <[email protected]>
Co-authored-by: Ananth Subramaniam <[email protected]>

ci: Fix release workflow (#11286)

* ci: Fix release workflow

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* Update .github/workflows/release.yml

Signed-off-by: oliver könig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: oliver könig <[email protected]>

Update import 'pytorch_lightning' -> 'lightning.pytorch' (#11252)

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* update import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/asr

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/tts

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update requirements

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

* update import in tests

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/common

Signed-off-by: Maanu Grover <[email protected]>

* update import in core

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update fabric import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in asr examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in llm examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in tts examples

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in deploy

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in slu examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in speaker_tasks examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/audio

Signed-off-by: Maanu Grover <[email protected]>

* update import in audio examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/vlm

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/diffusion

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/vision

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/multimodal

Signed-off-by: Maanu Grover <[email protected]>

* update import in multimodal examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in vision examples

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in scripts

Signed-off-by: Maanu Grover <[email protected]>

* Update baseline

Signed-off-by: maanug-nv <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* revert bad change

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: maanug-nv <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: maanug-nv <[email protected]>
Co-authored-by: artbataev <[email protected]>

fix perf plugin CUDA_DEVICE_MAX_CONNECTIONS setting (#11299)

* fix

Signed-off-by: Jimmy Zhang <[email protected]>

* Docstrings

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

PTQ via NeMo-Run CLI (#10984)

* PTQ support in nemo CLI

Signed-off-by: Jan Lasek <[email protected]>

* Naming engine vs checkpoint

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

PTQ memory optimization (#11257)

* Initial commit

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Add sample generate

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Nemotron quantization, reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* code review suggestions

Signed-off-by: Piotr Kaminski <[email protected]>

* Bug fixes

Signed-off-by: Piotr Kaminski <[email protected]>

* remove not needed import

Signed-off-by: Piotr Kaminski <[email protected]>

* fix model type and allow ddp/optim setup

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

---------

Signed-off-by: Piotr Kaminski <[email protected]>
Signed-off-by: Laplasjan107 <[email protected]>
Signed-off-by: Piotr Kamiński <[email protected]>
Co-authored-by: Piotr Kaminski <[email protected]>
Co-authored-by: Laplasjan107 <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>

update README.md (#11223)

Signed-off-by: yaoyu-33 <[email protected]>

Add `attention_bias` argument in transformer block and transformer layer modules, addressing change in MCore (#11289)

* fix api

Signed-off-by: yaoyu-33 <[email protected]>

* fix ci

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix docstring2

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix line too long

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>

Remove pytorch-lightning (#11306)

* update import in docs

Signed-off-by: Maanu Grover <[email protected]>

* update import in tutorials

Signed-off-by: Maanu Grover <[email protected]>

* remove pl requirement

Signed-off-by: Maanu Grover <[email protected]>

* missed import updates

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

Adding multimodal examples (#11279)

* Adding multimodal examples

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>

Update T5 attention-mask shapes to be compatible with all attention-backend in new TE versions (#11059)

* initial commits

* updating cicd test

* commit for FlashFused T5 from Mcore

* testing CICD

* update code for data/mock, update mcore commit for dockerfile

* fix error

* fix error

* fix error in nemo/collections/llm/inference/base.py

* update t5/data/mock.py

* fix cicd error

* remove unused libs

* address Yu Yao's comments

* Apply isort and black reformatting

Signed-off-by: huvunvidia <[email protected]>

---------

Signed-off-by: huvunvidia <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: huvunvidia <[email protected]>

Add HF untrusted code toggle (#11313)

* add trust_remote_code toggle

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

P2p chunk size setting in nemo 2.0 (#11312)

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

---------

Signed-off-by: Sangkug Lym <[email protected]>

Nemo2 batcheval (#11158)

* initial draft for eval api

Signed-off-by: HuiyingLi <[email protected]>

* add dp to generate

Signed-off-by: HuiyingLi <[email protected]>

* Apply isort and black reformatting

Signed-off-by: HuiyingLi <[email protected]>

* add top_k=1 to default inference param to get deterministic output

Signed-off-by: HuiyingLi <[email protected]>

* change name

Signed-off-by: HuiyingLi <[email protected]>

* add eval ds and write to file to llm.generate

Signed-off-by: HuiyingLi <[email protected]>

* support standalone input jsonl

Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: HuiyingLi <[email protected]>

DoRA (#11104)

* initial commit for DoRA

Signed-off-by: Chen Cui <[email protected]>

* clean up code

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* clean up

Signed-off-by: Chen Cui <[email protected]>

* fix TP

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add dropout correction term

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add copyright and doc strings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* typo

Signed-off-by: Chen Cui <[email protected]>

* remove unused code

Signed-off-by: Chen Cui <[email protected]>

* remove commented out code

Signed-off-by: Chen Cui <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* bug

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: cuichenx <[email protected]>

Profiling - support Chakra & Kineto trace dumping (#11115)

* Support chakra trace dumping by cfg

Signed-off-by: Lily Wang <[email protected]>

remove the manual recording of process::init

Signed-off-by: Lily Wang <[email protected]>

1. Remove unnecessary kineto config  2. Fix typo

Signed-off-by: Lily Wang <[email protected]>

Change warning to exception when nsys is enabled with chakra profiling

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: pablo-garay <[email protected]>

* fix bug in identifying profiling start step

Signed-off-by: Lily Wang <[email protected]>

* Update baseline

Signed-off-by: lilyw97 <[email protected]>

* [1]remove unused import [2]switch to use isinstance instead of type() [3]move torch.profiling to function

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilyw97 <[email protected]>

---------

Signed-off-by: Lily Wang <[email protected]>
Signed-off-by: pablo-garay <[email protected]>
Signed-off-by: lilyw97 <[email protected]>
Signed-off-by: Maanu Grover <[email protected]>
Co-authored-by: Lily Wang <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: lilyw97 <[email protected]>
Co-authored-by: Maanu Grover <[email protected]>

NeMo 2.0 SFT PEFT notebooks (#10874)

* nemo2-sft notebook initial draft

Signed-off-by: HuiyingLi <[email protected]>

* remove mixtral info

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* add import_ckpt script and minor changes

Signed-off-by: HuiyingLi <[email protected]>

* Random read for tarred files in lhotse dataloaders (#10536)

* Random read for tarred files in lhotse dataloaders

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Solve failed tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Adding a testcase

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Some changes in tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* removing import

Signed-off-by: Nune <[email protected]>

---------

Signed-off-by: Nune <[email protected]>
Signed-off-by: nune-tadevosyan <[email protected]>
Co-authored-by: nune-tadevosyan <[email protected]>

* training code for hybrid-autoregressive inference model (#10841)

* training code for hybrid-autoregressive inference model

Signed-off-by: Hainan Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hainan-xv <[email protected]>

---------

Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: hainan-xv <[email protected]>
Co-authored-by: Hainan Xu <[email protected]>
Co-authored-by: hainan-xv <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* Use trainer.local_rank/global_rank (#10860)

* fix global_rank calculation

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use trainer's global/local rank

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove stacking operation from batched functions (#10524)

* remove stacking operations

Signed-off-by: lilithgrigoryan <[email protected]>

* fixes in base class

Signed-off-by: lilithgrigoryan <[email protected]>

* clean up

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* remove potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* restore batch_initialize_states funcname

Signed-off-by: lilithgrigoryan <[email protected]>

* fix typo

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable
in stateless transducer

Signed-off-by: lilithgrigoryan <[email protected]>

* fix test

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstring, rm comment

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstrings

Signed-off-by: lilithgrigoryan <[email protected]>

---------

Signed-off-by: lilithgrigoryan <[email protected]>
Signed-off-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>

* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)

* Add llm.generate

Signed-off-by: Hemil Desai <[email protected]>

* Remove comment

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix launching with python

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add assert cp

Signed-off-by: Hemil Desai <[email protected]>

* Add example script

Signed-off-by: Hemil Desai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Adding support for LightningDataModule inside Fabric-API (#10879)

* Make FabricMegatronMixedPrecision match MegatronMixedPrecision

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Supporting DataModule in fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Adding support for LightningDataModule inside Fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Remove import in mock.py

Signed-off-by: Marc Romeijn <[email protected]>

---------

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* initial draft

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Save yaml config for model in nemo.lightning.io (#10765)

* Save yaml config for model in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Add explicit yaml comparison

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* relax test

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Move collections.nlp imports inline for t5 (#10877)

* Move collections.nlp imports inline for t5

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* add world_size/pp_size runtime check (#10842)

* add world_size/pp_size runtime check

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix msg precision

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix test_init_parallel_ranks ws=3 pp=3

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix peft resume (#10887)

Signed-off-by: Chen Cui <[email protected]>

* Update engine build step for TRT-LLM 0.13.0 (#10880)

* Setting use_fused_mlp for TRT-LLM >= 0.13.0

Signed-off-by: Jan Lasek <[email protected]>

* Unused import removal

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

* Akoumparouli/nemo ux moe loss logging (#10128)

* Move across pipeline loss reduction to a separate function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add support for MoE loss logging

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* enable vboost and set LM SM margin (#10853)

* enable vboost

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* env vars

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* add perf plugin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* revert default executor

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* fix typo

Signed-off-by: Jimmy Zhang <[email protected]>

* fix more typo

Signed-off-by: Jimmy Zhang <[email protected]>

* ln margin knob

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* specify lm margin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: malay-nagda <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Use torch sdpa implementation in ASR mha (#9590)

* use pytorch sdpa

Signed-off-by: WoodieDudy <[email protected]>

* sdpa work

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: titu1994 <[email protected]>

* sdpa flag to false & sdpa_backend arg

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* change arg name

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* fix config args

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* add condition on version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* update condition on version

Signed-off-by: WoodieDudy <[email protected]>

* remove condition on torch version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* move code to init

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

---------

Signed-off-by: WoodieDudy <[email protected]>
Signed-off-by: titu1994 <[email protected]>
Signed-off-by: WoodieDudy <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: titu1994 <[email protected]>
Co-authored-by: WoodieDudy <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Remove cyclic import

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: artbataev <[email protected]>

* call __post_init__ after altering config values (#10885)

* call __post_init__ after altering config values

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* test fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* turn off SP

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Nemo 2.0 ckpt support in TRT-LLM export (#10891)

* fix minor import bug

Signed-off-by: Onur Yilmaz <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* nemo 2.0 support in export to trt-llm

Signed-off-by: Onur Yilmaz <[email protected]>

* get mixing from main

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* fix style

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>

* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)

* various simple docs source fixes

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix docstrings and typing with forward reference

Signed-off-by: Elena Rastorgueva <[email protected]>

* Apply isort and black reformatting

Signed-off-by: erastorgueva-nv <[email protected]>

* fix typing forward reference for PromptedAudioToTextLhotseDataset

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix feature warnings

Signed-off-by: yaoyu-33 <[email protected]>

* Try fix some model part errors

Signed-off-by: yaoyu-33 <[email protected]>

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix indent in docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* fix imagen cite

* fix ratio issues

Signed-off-by: yaoyu-33 <[email protected]>

* fix Dreambooth

Signed-off-by: yaoyu-33 <[email protected]>

* Fix activation recomputation

Signed-off-by: yaoyu-33 <[email protected]>

* fix sequence packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix asr_language_modeling_and_customization

Signed-off-by: yaoyu-33 <[email protected]>

* fixes wip

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: erastorgueva-nv <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: erastorgueva-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Co-authored-by: Huiying Li <[email protected]>

* calculate step time batch end-batch end (#10202)

* log step time at end

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* use nemo logging

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* cleanup

Signed-off-by: Malay Nagda <[email protected]>

* check remove

Signed-off-by: Malay Nagda <[email protected]>

* delta timing callback

Signed-off-by: Malay Nagda <[email protected]>

* comment and name change

Signed-off-by: Malay Nagda <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Co-authored-by: malay-nagda <[email protected]>

* late import prettytable (#10912)

Signed-off-by: Maanu Grover <[email protected]>
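
A hedged sketch of the lazy-import pattern this commit applies: `prettytable` is imported inside the function that needs it, so the package is only required when that code path runs. The function name and call site here are illustrative, not the actual NeMo code.

```python
def render_results_table(rows, columns):
    """Render rows as an ASCII table; prettytable is imported lazily so it
    stays an optional dependency for callers that never hit this path."""
    from prettytable import PrettyTable  # late import, only when needed

    table = PrettyTable(field_names=columns)
    for row in rows:
        table.add_row(row)
    return table.get_string()
```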

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)

Signed-off-by: Jan Lasek <[email protected]>

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)

* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: nithinraok <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Fix ASR tests (#10794)

* Make tests required

Signed-off-by: Vladimir Bataev <[email protected]>

* Debug torch.load issue

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Run only necessary tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Try fix loading

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid caching fixture

Signed-off-by: Vladimir Bataev <[email protected]>

* Try restore model several times

Signed-off-by: Vladimir Bataev <[email protected]>

* Try customize temporary directory

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Reorder tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable one test

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid xxlarge model

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable test

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Magic fix

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert unnecessary changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable all jobs except L0

Signed-off-by: Vladimir Bataev <[email protected]>

* RNNT alignments - merge with unit tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix config

Signed-off-by: Vladimir Bataev <[email protected]>

* Log test results

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Use less audio files for tests

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Integrating mcore export (#10238)

* Integrating mcore export

* Integrating mcore export

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Move trt imports in nemo.collections.llm inside respective functions (#10234)

Signed-off-by: Hemil Desai <[email protected]>

* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)

* Add tests for LazyNeMoIterator and fix case with manifest_only=True and offsets in manifest

Signed-off-by: Piotr Żelasko <[email protected]>

* Address code review

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)

* perform serialization using relative paths so that users can move checkpoints after they're saved

Signed-off-by: ashors1 <[email protected]>
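
A minimal sketch of the relative-path idea (not the actual NeMo-UX serialization code): record artifact paths relative to the checkpoint directory at save time, then resolve them against wherever the checkpoint lives at load time, so moving the whole directory keeps the references valid.

```python
from pathlib import Path

def to_relative(artifact_path: str, ckpt_dir: str) -> str:
    """At save time, store the artifact path relative to the checkpoint dir.
    Assumes the artifact lives under the checkpoint directory."""
    return str(Path(artifact_path).resolve().relative_to(Path(ckpt_dir).resolve()))

def to_absolute(stored_path: str, ckpt_dir: str) -> str:
    """At load time, resolve the stored relative path against the checkpoint
    directory's current location (it may have been moved or copied)."""
    return str((Path(ckpt_dir) / stored_path).resolve())
```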

* Apply isort and black reformatting

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

* fix artifact load

Signed-off-by: ashors1 <[email protected]>

* fix path artifact

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Co-authored-by: ashors1 <[email protected]>

* Add MemoryProfileCallback (#10166)

* Add MemoryProfileCallback

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Remove reference cycles, save snapshot on specific ranks

Signed-off-by: Shriya Palsamudram <[email protected]>
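
A sketch of the snapshot idea using PyTorch's (underscore-prefixed, semi-private) CUDA memory-history hooks, gated to specific ranks via the `RANK` environment variable; the real callback's interface and options may differ.

```python
import os
import torch

def start_memory_history(ranks=(0,), max_entries=100_000):
    """Begin recording CUDA allocation history, but only on selected ranks."""
    if int(os.environ.get("RANK", "0")) in ranks and torch.cuda.is_available():
        torch.cuda.memory._record_memory_history(max_entries=max_entries)

def dump_memory_snapshot(out_dir, ranks=(0,)):
    """Write a memory snapshot (viewable at pytorch.org/memory_viz) on selected ranks."""
    rank = int(os.environ.get("RANK", "0"))
    if rank in ranks and torch.cuda.is_available():
        path = os.path.join(out_dir, f"memory_snapshot_rank{rank}.pickle")
        torch.cuda.memory._dump_snapshot(path)
```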

* Remove unnecessary imports

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Update docstring

Signed-off-by: Shriya Palsamudram <[email protected]>

---------

Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>

* Lower bound transformers to support nemotron (#10240)

Signed-off-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>

* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)

Flow matching generative model with SSL pretraining framework

Signed-off-by: Pin-Jui Ku <[email protected]>
Co-authored-by: Kuray107 <[email protected]>

* Revert torchrun fix for model import (#10251)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [NeMo-UX] Move nemotron imports inline (#10255)

* Move nemotron transformers + tokenizer imports inline to reduce number of required deps

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* Wrap CPU model init with megatron_lazy_init_context (#10219)

* Wrap CPU model init with megatron_lazy_init_context

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Cleanup checkpoint-dir if saving fails

Signed-off-by: Alexandros Koumparoulis <[email protected]>
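
For the "cleanup checkpoint-dir if saving fails" part, a generic sketch (function names are illustrative, not the NeMo implementation): if writing the checkpoint raises, remove the partially written directory so later runs don't try to resume from a corrupt checkpoint.

```python
import shutil
from pathlib import Path

def save_checkpoint_safely(save_fn, ckpt_dir: str) -> None:
    """Call `save_fn(ckpt_dir)`; on failure, delete the partial directory and re-raise."""
    path = Path(ckpt_dir)
    try:
        save_fn(path)
    except Exception:
        if path.exists():
            shutil.rmtree(path, ignore_errors=True)  # drop the partial checkpoint
        raise
```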

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Bump `Dockerfile.ci` (2024-08-22) (#10227)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix bert flags

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* salm export trtllm (#10245)

Signed-off-by: slyne deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* Load model in the target export precision by default in PTQ (#10267)

* Load model in the target export precision by default

Signed-off-by: Jan Lasek <[email protected]>

* Enable megatron_amp_O2=true to actually use half-precision

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Remove duplicate

Signed-off-by: Hemil Desai <[email protected]>

* Add entity to wandb logger

Signed-off-by: Hemil Desai <[email protected]>

* Add documentation

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add warning

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)

* handle absolute and relative logger directories

Signed-off-by: Anna Shors <[email protected]>
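
A small sketch of the directory handling described above, under the assumption (not verified against the NeMo logger code) that absolute paths are used as-is while relative ones are nested under the experiment base directory:

```python
from pathlib import Path

def resolve_log_dir(log_dir: str, base_dir: str) -> Path:
    """Respect absolute logger directories; join relative ones onto the base dir."""
    path = Path(log_dir)
    return path if path.is_absolute() else Path(base_dir) / path
```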

* merge lines

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: ashors1 <[email protected]>

* Add sdxl notebook (#10139)

* Add sdxl notebook

Signed-off-by: mingyuanm <[email protected]>

* Rename

Signed-off-by: mingyuanm <[email protected]>

* final Update SDXL notebook

Signed-off-by: mingyuanm <[email protected]>

---------

Signed-off-by: mingyuanm <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Small change

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Add support for layernorm1p

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Signed-off-by: Dong Hyuk Chang <[email protected]>
Signed-off-by: Pin-Jui Ku <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: slyne deng <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: mingyuanm <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Anna Shors <[email protected]>
Co-authored-by: ashors1 <[email protected]>
Co-authored-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: Slyne Deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>

* Fix artifact saving (#10914)

Signed-off-by: Hemil Desai <[email protected]>

* Lora improvement (#10918)

* pull out freeze model

Signed-off-by: Chen Cui <[email protected]>

* add wildcard match to lora target modules

Signed-off-by: Chen Cui <[email protected]>
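
A hedged sketch of wildcard matching for LoRA target modules using `fnmatch` on module names; the pattern syntax and module names below are illustrative, and the actual NeMo implementation may differ.

```python
from fnmatch import fnmatch

def match_target_modules(module_names, target_patterns):
    """Return module names matching any wildcard pattern, e.g. '*.linear_qkv'."""
    return [
        name
        for name in module_names
        if any(fnmatch(name, pattern) for pattern in target_patterns)
    ]

# Illustrative usage:
# match_target_modules(
#     ["decoder.layers.0.self_attention.linear_qkv", "decoder.layers.0.mlp.linear_fc1"],
#     ["*.linear_qkv"],
# )  # -> ["decoder.layers.0.self_attention.linear_qkv"]
```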

---------

Signed-off-by: Chen Cui <[email protected]>

* Huvu/t5 nemo2.0 peft (#10916)

* adding peft test and cicd

* set the MCore model to train mode in peft.py

* adding test for T5 lora

* fix following Chen's fix

* restore cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>

* Add tie_word_embeddings=True (#10710)

Signed-off-by: Yoshi Suhara <[email protected]>
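
Weight tying shares one parameter tensor between the input embedding and the output projection; a minimal PyTorch sketch of the idea (not the Megatron/NeMo implementation, which configures this through the model config):

```python
import torch.nn as nn

class TiedLM(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # tie_word_embeddings: the output projection reuses the embedding matrix,
        # so both layers train a single [vocab_size, hidden_size] tensor.
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids):
        hidden = self.embed(token_ids)  # real models run transformer layers here
        return self.lm_head(hidden)
```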

* Use a context-manager when opening files (#10895)

* Use a context-manager when opening files

Signed-off-by: Alexandros Koumparoulis <[email protected]>
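
The change replaces bare `open()` calls with context managers so file handles are closed even when an exception is raised; a generic before/after sketch (file and function names are illustrative):

```python
import json

# Before: the handle leaks if json.load raises.
# f = open(config_path)
# config = json.load(f)
# f.close()

# After: the `with` block closes the file on both success and error.
def load_config(config_path: str) -> dict:
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)
```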

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: artbataev <[email protected]>

* long context performance numbers in doc (#10784)

* long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* update the long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* Akoumparouli/mcore microbatch calculator fix (#10780)

* move tests/lightning/{,_}io

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>
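
A sketch of the context-manager pattern used here, with placeholder names standing in for the real Megatron-Core microbatch-calculator globals: reconfigure the calculator on entry and restore the previous state on exit, so tests cannot leak configuration into each other.

```python
from contextlib import contextmanager

# `_GLOBAL_CALCULATOR` and `make_calculator` are placeholders for the real
# Megatron-Core globals; only the save/restore pattern is the point here.
_GLOBAL_CALCULATOR = None

def make_calculator(micro_batch_size, global_batch_size):
    return {"mbs": micro_batch_size, "gbs": global_batch_size}

@contextmanager
def reconfigure_microbatch_calculator(micro_batch_size, global_batch_size):
    global _GLOBAL_CALCULATOR
    previous = _GLOBAL_CALCULATOR
    _GLOBAL_CALCULATOR = make_calculator(micro_batch_size, global_batch_size)
    try:
        yield _GLOBAL_CALCULATOR
    finally:
        _GLOBAL_CALCULATOR = previous  # always restore, even if the body raises

# Usage: with reconfigure_microbatch_calculator(1, 8) as calc: ...
```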

* use microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused var

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* remove 8x3b recipes (#10764)

* remove 8x3b recipes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove 8x3b from test_nemo_run

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rm from __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* change the figure file name

Signed-off-by: Youngeun Kwon <[email protected]>

* Accommodating the reviewer's comment

Signed-off-by: Youngeun Kwon <[email protected]>

* update the y-axis title

Signed-off-by: Youngeun Kwon <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)

* Add ModelOpt transformer model pruning example for Llama3 model

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* example code is in the wrong dir, move it

Signed-off-by: Shengliang Xu <[email protected]>

* changes as suggested in comment

remove some logging and unused config code, update example model to
llama3.1

Signed-off-by: Shengliang Xu <[email protected]>

* Add pruning of hidden_size into example

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml

Signed-off-by: Keval Morabia <[email protected]>

* Add pruning test to cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <2891698…
XuesongYang pushed a commit to paarthneekhara/NeMo that referenced this pull request Jan 18, 2025
…yer modules, addressing change in MCore (NVIDIA#11289)

* fix api

Signed-off-by: yaoyu-33 <[email protected]>

* fix ci

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix docstring2

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix line too long

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>