
Add attention_bias argument in transformer block and transformer layer modules, addressing change in MCore #11289

Merged
merged 7 commits into main from yuya/update_attention_bias_api
Nov 18, 2024

Conversation

yaoyu-33
Collaborator

What does this PR do ?

https://gitlab-master.nvidia.com/ADLR/megatron-lm/-/merge_requests/2293
We need to fix all occurrences in NeMo where TransformerBlock uses a non-MCore TransformerLayer. Otherwise, the forward interface between the block and the layer will no longer match and NeMo will break.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
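A minimal, illustrative sketch (not the exact diff in this PR): a NeMo layer that subclasses MCore's TransformerLayer accepts the new attention_bias argument so its forward signature stays in sync with TransformerBlock. The class name is hypothetical, and the remaining keyword arguments are assumed to follow the existing call sites.

# Illustrative sketch only; MyNeMoDecoderLayer is a hypothetical name.
from megatron.core.transformer.transformer_layer import TransformerLayer

class MyNeMoDecoderLayer(TransformerLayer):
    def forward(
        self,
        hidden_states,
        attention_mask=None,
        context=None,
        context_mask=None,
        rotary_pos_emb=None,
        rotary_pos_cos=None,
        rotary_pos_sin=None,
        attention_bias=None,  # new argument introduced by the MCore change
        inference_params=None,
        packed_seq_params=None,
    ):
        # attention_bias is accepted for interface compatibility with the updated
        # TransformerBlock; it is only forwarded to super().forward() once the
        # MCore version pinned in CI accepts it (see the review discussion below).
        return super().forward(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            context=context,
            context_mask=context_mask,
            rotary_pos_emb=rotary_pos_emb,
            rotary_pos_cos=rotary_pos_cos,
            rotary_pos_sin=rotary_pos_sin,
            inference_params=inference_params,
            packed_seq_params=packed_seq_params,
        )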

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI runs automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Copy link
Contributor

[🤖]: Hi @yaoyu-33 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you to decide what to do next.

//cc @pablo-garay @ko3n1g

yaoyu-33 and others added 3 commits November 15, 2024 11:12
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Contributor

beep boop 🤖: 🙏 The following files have warnings. If you are familiar with these, please consider helping us improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.nlp.models.language_modeling.megatron.bert.bert_model
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:82:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:70:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:128:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:144:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:177:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:184:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:202:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:296:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:306:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:335:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/bert/bert_model.py:570:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.nlp.models.language_modeling.megatron.falcon.falcon_decoder_layer
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:70:0: C0301: Line too long (149/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:102:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:18:4: W0611: Unused parallel_state imported from megatron.core (unused-import)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:19:4: W0611: Unused ShardedObject imported from megatron.core.dist_checkpointing.mapping (unused-import)
nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py:19:4: W0611: Unused ShardedTensor imported from megatron.core.dist_checkpointing.mapping (unused-import)
************* Module nemo.collections.nlp.models.language_modeling.megatron.gpt_full_te_layer_autocast_spec
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:51:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:147:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:179:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:303:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py:336:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.nlp.modules.common.megatron.adapters.mcore_mixins
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:140:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:142:0: C0301: Line too long (147/119) (line-too-long)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:60:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:69:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:76:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:108:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:226:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:326:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:352:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:436:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:443:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:467:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:474:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:492:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:499:4: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.42/10

Thank you for improving NeMo's documentation!
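For reference, the C0115/C0116 warnings above are cleared by adding short class and function docstrings, and C0301 by keeping lines within 119 characters. A hypothetical illustration (the names below are not taken from the listed files):

class DecoderLayerWrapper:  # illustrative only
    """Thin wrapper around a Megatron decoder layer."""  # resolves C0115

    def forward(self, hidden_states, attention_mask=None):
        """Run the wrapped layer and return the output hidden states."""  # resolves C0116
        return hidden_states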

Contributor

beep boop 🤖: 🚨 The following files must be fixed before merge!


Your code was analyzed with PyLint. The following annotations have been identified:


------------------------------------
Your code has been rated at 10.00/10

Thank you for improving NeMo's documentation!

1 similar comment

Collaborator

@cuichenx left a comment


Please also bump the mcore version in Dockerfile.ci once the mcore commit is merged

rotary_pos_cos=rotary_pos_cos,
rotary_pos_sin=rotary_pos_sin,
inference_params=inference_params,
packed_seq_params=packed_seq_params,
Collaborator


Does attention_bias need to be passed to super().forward()?

Collaborator Author


The mcore version hasn't been bumped in CI yet, so this can't be added now.
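In other words, once the MCore pin is bumped to a version whose TransformerLayer.forward accepts attention_bias, the follow-up could simply thread the argument through. A hedged sketch with a hypothetical class name:

from megatron.core.transformer.transformer_layer import TransformerLayer

class MyNeMoDecoderLayer(TransformerLayer):  # hypothetical, for illustration
    def forward(self, hidden_states, attention_mask=None, attention_bias=None, **kwargs):
        # Pass the bias on to the parent implementation once it supports it.
        return super().forward(
            hidden_states,
            attention_mask=attention_mask,
            attention_bias=attention_bias,
            **kwargs,
        )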

Contributor

[🤖]: Hi @yaoyu-33 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you to decide what to do next.

//cc @pablo-garay @ko3n1g

@yaoyu-33 merged commit 168c3e5 into main Nov 18, 2024
319 of 321 checks passed
@yaoyu-33 deleted the yuya/update_attention_bias_api branch November 18, 2024 20:45
ShriyaPalsamudram added a commit that referenced this pull request Dec 2, 2024
Signed-off-by: Shriya Palsamudram <[email protected]>

Fix FaultTolerancePlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add StragglerDetection callback to all NeMo2.0 recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add missing and remove unused imports

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add ft launcher test

Signed-off-by: Shriya Palsamudram <[email protected]>

fix typo

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix more typos

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

add ft launcher using nemo-run for llama3 test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix serialization errors

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

create separate ft test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

change github actions test

Signed-off-by: Shriya Palsamudram <[email protected]>

draft crash simulation

Signed-off-by: Shriya Balaji Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Simulate a crash using step, disable checkpointing

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add a straggler detection test as well

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Revert enabling straggler_detection by default in all recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove unused imports

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove extra check in ConfigValidationPlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Address pylinter issues

Signed-off-by: Shriya Palsamudram <[email protected]>

Improve straggler detection testing and add doc string

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix paths

Signed-off-by: Shriya Palsamudram <[email protected]>

Add assert for crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Append run logs to a file after a crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH

Signed-off-by: Shriya Palsamudram <[email protected]>

Add openai-gelu in gated activation (#11293)

Fixes per comments (#11280)

* Fixes per comments

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

* Update README

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

---------

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

Add T5TTS (#11193)

* added training and inference recipes for T5-TTS.
* fix some attention errors
* add copyright headers.
* added TODO and detail error log info.
* fixed missing a corner case.
* added classes to __all__
* fixed to return either self-attention scores or cross-attention scores in ParallelTransformerLayer_ class.

Signed-off-by: XuesongYang <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: XuesongYang <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: XuesongYang <[email protected]>

ci: Exclude CPU machines from scan (#11300)

Signed-off-by: Oliver Koenig <[email protected]>

Revert "fix(export): GPT models w/ bias=False convert properly (#11255)" (#11301)

This reverts commit 2d4f4953881b9e2d118d3ffeba7e64625d827d11.

remove redundant docs (#11302)

Create phi3mini.py (#11281)

* Create phi3mini.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* Create phi3_mini_4k_instruct.py for adding to recipe

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removed Performant recipe

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removing performant condition

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py with docstring changes

Signed-off-by: mayani-nv <[email protected]>

* Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

* fixing pylint warnings

* Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* correcting typos and adding working recipe files

---------

Signed-off-by: mayani-nv <[email protected]>
Signed-off-by: mayani-nv <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: mayani-nv <[email protected]>

Integrate lm-eval-harness for evaluations in NeMo (#10621)

* Add evaluate method and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Add inference params to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Add wait_for_rest_service fn to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add logprobs to be returned by Pytriton for trtllm models

Signed-off-by: Abhishree <[email protected]>

* Increase max_retries in wait_for_rest_service method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add unset slurm vars and use env vars for Triton args

Signed-off-by: Abhishree <[email protected]>

* Add logic to get logProbs from logits

Signed-off-by: Abhishree <[email protected]>

* Refactor, clean and organize the code

1) Refactors the code and creates an evaluation folder where all util methods live
2) Add docstrings, comments
3) Expose gather_context_logits, gather_generation_logits in trtllm, add an output_generation_logits flag to return generation logits, and remove output_log_probs as it's not used anymore

Signed-off-by: Abhishree <[email protected]>

* Add copyright and initialize special_tokens_kwargs in eval_utils.py

Signed-off-by: Abhishree <[email protected]>

* Add the following changes

1) Move get_trtllm_deployable and unset_environment_variables to deploy base.py
2) Rename eval_utils.py to base.py
3) Restore scripts/export/convert_nemo2_for_export.py

Signed-off-by: Abhishree <[email protected]>

* Fix a minor typo

Signed-off-by: Abhishree <[email protected]>

* Revert output_log_probs and all_probs arg in tensorrt_llm_run.py

Signed-off-by: Abhishree <[email protected]>

* Fix docstrings formatting

Signed-off-by: Abhishree <[email protected]>

* Pylint and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Fix pylint and typos

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Avoid multiple calls for tokenizer_type

Co-authored-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>

* Replace print statements with logging statements

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: athitten <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: athitten <[email protected]>
Co-authored-by: Ananth Subramaniam <[email protected]>

ci: Fix release workflow (#11286)

* ci: Fix release workflow

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* Update .github/workflows/release.yml

Signed-off-by: oliver könig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: oliver könig <[email protected]>

Update import 'pytorch_lightning' -> 'lightning.pytorch' (#11252)

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* update import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/asr

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/tts

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update requirements

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

* update import in tests

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/common

Signed-off-by: Maanu Grover <[email protected]>

* update import in core

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update fabric import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in asr examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in llm examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in tts examples

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in deploy

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in slu examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in speaker_tasks examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/audio

Signed-off-by: Maanu Grover <[email protected]>

* update import in audio examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/vlm

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/diffusion

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/vision

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/multimodal

Signed-off-by: Maanu Grover <[email protected]>

* update import in multimodal examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in vision examples

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in scripts

Signed-off-by: Maanu Grover <[email protected]>

* Update baseline

Signed-off-by: maanug-nv <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* revert bad change

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: maanug-nv <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: maanug-nv <[email protected]>
Co-authored-by: artbataev <[email protected]>

fix perf plugin CUDA_DEVICE_MAX_CONNECTIONS setting (#11299)

* fix

Signed-off-by: Jimmy Zhang <[email protected]>

* Docstrings

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

PTQ via NeMo-Run CLI (#10984)

* PTQ support in nemo CLI

Signed-off-by: Jan Lasek <[email protected]>

* Naming engine vs checkpoint

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

PTQ memory optimization (#11257)

* Initial commit

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Add sample generate

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Nemotron quantization, reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* code review suggestions

Signed-off-by: Piotr Kaminski <[email protected]>

* Bug fixes

Signed-off-by: Piotr Kaminski <[email protected]>

* remove not needed import

Signed-off-by: Piotr Kaminski <[email protected]>

* fix model type and allow ddp/optim setup

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

---------

Signed-off-by: Piotr Kaminski <[email protected]>
Signed-off-by: Laplasjan107 <[email protected]>
Signed-off-by: Piotr Kamiński <[email protected]>
Co-authored-by: Piotr Kaminski <[email protected]>
Co-authored-by: Laplasjan107 <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>

update README.md (#11223)

Signed-off-by: yaoyu-33 <[email protected]>

Add `attention_bias` argument in transformer block and transformer layer modules, addressing change in MCore (#11289)

* fix api

Signed-off-by: yaoyu-33 <[email protected]>

* fix ci

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix docstring2

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix line too long

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>

Remove pytorch-lightning (#11306)

* update import in docs

Signed-off-by: Maanu Grover <[email protected]>

* update import in tutorials

Signed-off-by: Maanu Grover <[email protected]>

* remove pl requirement

Signed-off-by: Maanu Grover <[email protected]>

* missed import updates

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

Adding multimodal examples (#11279)

* Adding multimodal examples

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>

Update T5 attention-mask shapes to be compatible with all attention-backend in new TE versions (#11059)

* initial commits

* updating cicd test

* commit for FlashFused T5 from Mcore

* testing CICD

* update code for data/mock, update mcore commit for dockerfile

* fix error

* fix error

* fix error in nemo/collections/llm/inference/base.py

* update t5/data/mock.py

* fix cicd error

* remove unused libs

* address Yu Yao's comments

* Apply isort and black reformatting

Signed-off-by: huvunvidia <[email protected]>

---------

Signed-off-by: huvunvidia <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: huvunvidia <[email protected]>

Add HF untrusted code toggle (#11313)

* add trust_remote_code toggle

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

P2p chunk size setting in nemo 2.0 (#11312)

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

---------

Signed-off-by: Sangkug Lym <[email protected]>

Nemo2 batcheval (#11158)

* initial draft for eval api

Signed-off-by: HuiyingLi <[email protected]>

* add dp to generate

Signed-off-by: HuiyingLi <[email protected]>

* Apply isort and black reformatting

Signed-off-by: HuiyingLi <[email protected]>

* add top_k=1 to default inference param to get deterministic output

Signed-off-by: HuiyingLi <[email protected]>

* change name

Signed-off-by: HuiyingLi <[email protected]>

* add eval ds and write to file to llm.generate

Signed-off-by: HuiyingLi <[email protected]>

* support standalone input jsonl

Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: HuiyingLi <[email protected]>

DoRA (#11104)

* initial commit for DoRA

Signed-off-by: Chen Cui <[email protected]>

* clean up code

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* clean up

Signed-off-by: Chen Cui <[email protected]>

* fix TP

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add dropout correction term

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add copyright and doc strings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* typo

Signed-off-by: Chen Cui <[email protected]>

* remove unused code

Signed-off-by: Chen Cui <[email protected]>

* remove commented out code

Signed-off-by: Chen Cui <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* bug

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: cuichenx <[email protected]>

Profiling - support Chakra & Kineto trace dumping (#11115)

* Support chakra trace dumping by cfg

Signed-off-by: Lily Wang <[email protected]>

remove the manual recording of process::init

Signed-off-by: Lily Wang <[email protected]>

1. Remove unnecessary kineto config  2. Fix typo

Signed-off-by: Lily Wang <[email protected]>

Change warning to exception when nsys is enabled with chakra profiling

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: pablo-garay <[email protected]>

* fix bug in identifying profiling start step

Signed-off-by: Lily Wang <[email protected]>

* Update baseline

Signed-off-by: lilyw97 <[email protected]>

* [1]remove unused import [2]switch to use isinstance instead of type() [3]move torch.profiling to function

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilyw97 <[email protected]>

---------

Signed-off-by: Lily Wang <[email protected]>
Signed-off-by: pablo-garay <[email protected]>
Signed-off-by: lilyw97 <[email protected]>
Signed-off-by: Maanu Grover <[email protected]>
Co-authored-by: Lily Wang <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: lilyw97 <[email protected]>
Co-authored-by: Maanu Grover <[email protected]>

NeMo 2.0 SFT PEFT notebooks (#10874)

* nemo2-sft notebook initial draft

Signed-off-by: HuiyingLi <[email protected]>

* remove mixtral info

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* add import_ckpt script and minor changes

Signed-off-by: HuiyingLi <[email protected]>

* Random read for tarred files in lhotse dataloaders (#10536)

* Random read for tarred files in lhotse dataloaders

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Solve failed tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Adding a testcase

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Some changes in tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* removing import

Signed-off-by: Nune <[email protected]>

---------

Signed-off-by: Nune <[email protected]>
Signed-off-by: nune-tadevosyan <[email protected]>
Co-authored-by: nune-tadevosyan <[email protected]>

* training code for hybrid-autoregressive inference model (#10841)

* training code for hybrid-autoregressive inference model

Signed-off-by: Hainan Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hainan-xv <[email protected]>

---------

Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: hainan-xv <[email protected]>
Co-authored-by: Hainan Xu <[email protected]>
Co-authored-by: hainan-xv <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* Use trainer.local_rank/global_rank (#10860)

* fix global_rank calculation

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use trainer's global/local rank

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove stacking operation from batched functions (#10524)

* remove stacking operations

Signed-off-by: lilithgrigoryan <[email protected]>

* fixes in base class

Signed-off-by: lilithgrigoryan <[email protected]>

* clean up

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* remove potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* restore batch_initialize_states funcname

Signed-off-by: lilithgrigoryan <[email protected]>

* fix typo

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable
in stateless transducer

Signed-off-by: lilithgrigoryan <[email protected]>

* fix test

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstring, rm comment

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstrings

Signed-off-by: lilithgrigoryan <[email protected]>

---------

Signed-off-by: lilithgrigoryan <[email protected]>
Signed-off-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>

* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)

* Add llm.generate

Signed-off-by: Hemil Desai <[email protected]>

* Remove comment

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix launching with python

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add assert cp

Signed-off-by: Hemil Desai <[email protected]>

* Add example script

Signed-off-by: Hemil Desai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Adding support for LightningDataModule inside Fabric-API (#10879)

* Make FabricMegatronMixedPrecision match MegatronMixedPrecision

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Supporting DataModule in fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Adding support for LightningDataModule inside Fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Remove import in mock.py

Signed-off-by: Marc Romeijn <[email protected]>

---------

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* initial draft

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Save yaml config for model in nemo.lightning.io (#10765)

* Save yaml config for model in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Add explicit yaml comparison

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* relax test

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Move collections.nlp imports inline for t5 (#10877)

* Move collections.nlp imports inline for t5

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* add world_size/pp_size runtime check (#10842)

* add world_size/pp_size runtime check

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix msg precision

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix test_init_parallel_ranks ws=3 pp=3

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix peft resume (#10887)

Signed-off-by: Chen Cui <[email protected]>

* Update engine build step for TRT-LLM 0.13.0 (#10880)

* Setting use_fused_mlp for TRT-LLM >= 0.13.0

Signed-off-by: Jan Lasek <[email protected]>

* Unused import removal

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

* Akoumparouli/nemo ux moe loss logging (#10128)

* Move across pipeline loss reduction to a separate function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add support for MoE loss logging

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* enable vboost and set LM SM margin (#10853)

* enable vboost

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* env vars

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* add perf plugin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* revert default executor

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* fix typo

Signed-off-by: Jimmy Zhang <[email protected]>

* fix more typo

Signed-off-by: Jimmy Zhang <[email protected]>

* ln margin knob

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* specify lm margin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: malay-nagda <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Use torch sdpa implementation in ASR mha (#9590)

* use pytorch sdpa

Signed-off-by: WoodieDudy <[email protected]>

* sdpa work

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: titu1994 <[email protected]>

* sdpa flag to false & sdpa_backend arg

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* change arg name

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* fix config args

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* add condition on version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* update condition on version

Signed-off-by: WoodieDudy <[email protected]>

* remove condition on torch version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* move code to init

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

---------

Signed-off-by: WoodieDudy <[email protected]>
Signed-off-by: titu1994 <[email protected]>
Signed-off-by: WoodieDudy <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: titu1994 <[email protected]>
Co-authored-by: WoodieDudy <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Remove cyclic import

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: artbataev <[email protected]>

* call __post_init__ after altering config values (#10885)

* call __post_init__ after altering config values

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* test fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* turn off SP

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Nemo 2.0 ckpt support in TRT-LLM export (#10891)

* fix minor import bug

Signed-off-by: Onur Yilmaz <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* nemo 2.0 support in export to trt-llm

Signed-off-by: Onur Yilmaz <[email protected]>

* get mixing from main

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* fix style

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>

* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)

* various simple docs source fixes

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix docstrings and typing with forward reference

Signed-off-by: Elena Rastorgueva <[email protected]>

* Apply isort and black reformatting

Signed-off-by: erastorgueva-nv <[email protected]>

* fix typing forward reference for PromptedAudioToTextLhotseDataset

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix feature warnings

Signed-off-by: yaoyu-33 <[email protected]>

* Try fix some model part errors

Signed-off-by: yaoyu-33 <[email protected]>

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix indent in docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* fix imagen cite

* fix ratio issues

Signed-off-by: yaoyu-33 <[email protected]>

* fix Dreambooth

Signed-off-by: yaoyu-33 <[email protected]>

* Fix activation recomputation

Signed-off-by: yaoyu-33 <[email protected]>

* fix sequence packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix asr_language_modeling_and_customization

Signed-off-by: yaoyu-33 <[email protected]>

* fixes wip

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: erastorgueva-nv <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: erastorgueva-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Co-authored-by: Huiying Li <[email protected]>

* calculate step time batch end-batch end (#10202)

* log step time at end

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* use nemo logging

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* cleanup

Signed-off-by: Malay Nagda <[email protected]>

* check remove

Signed-off-by: Malay Nagda <[email protected]>

* delta timing callback

Signed-off-by: Malay Nagda <[email protected]>

* comment and name change

Signed-off-by: Malay Nagda <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Co-authored-by: malay-nagda <[email protected]>

* late import prettytable (#10912)

Signed-off-by: Maanu Grover <[email protected]>
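
A hedged sketch of the lazy-import pattern this commit applies: `prettytable` is imported inside the function that needs it, so the package is only required when that code path runs. The function name and call site here are illustrative, not the actual NeMo code.

```python
def render_results_table(rows, columns):
    """Render rows as an ASCII table; prettytable is imported lazily so it
    stays an optional dependency for callers that never hit this path."""
    from prettytable import PrettyTable  # late import, only when needed

    table = PrettyTable(field_names=columns)
    for row in rows:
        table.add_row(row)
    return table.get_string()
```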

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)

Signed-off-by: Jan Lasek <[email protected]>

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)

* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: nithinraok <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Fix ASR tests (#10794)

* Make tests required

Signed-off-by: Vladimir Bataev <[email protected]>

* Debug torch.load issue

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Run only necessary tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Try fix loading

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid caching fixture

Signed-off-by: Vladimir Bataev <[email protected]>

* Try restore model several times

Signed-off-by: Vladimir Bataev <[email protected]>

* Try customize temporary directory

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Reorder tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable one test

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid xxlarge model

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable test

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Magic fix

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert unnecessary changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable all jobs except L0

Signed-off-by: Vladimir Bataev <[email protected]>

* RNNT alignments - merge with unit tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix config

Signed-off-by: Vladimir Bataev <[email protected]>

* Log test results

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Use less audio files for tests

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Integrating mcore export (#10238)

* Integrating mcore export

* Integrating mcore export

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Move trt imports in nemo.collections.llm inside respective functions (#10234)

Signed-off-by: Hemil Desai <[email protected]>

* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)

* Add tests for LazyNeMoIterator and fix case with manifest_only=True and offsets in manifest

Signed-off-by: Piotr Żelasko <[email protected]>

* Address code review

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)

* perform serialization using relative paths so that users can move checkpoints after they're saved

Signed-off-by: ashors1 <[email protected]>
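
A minimal sketch of the relative-path idea (not the actual NeMo-UX serialization code): record artifact paths relative to the checkpoint directory at save time, then resolve them against wherever the checkpoint lives at load time, so moving the whole directory keeps the references valid.

```python
from pathlib import Path

def to_relative(artifact_path: str, ckpt_dir: str) -> str:
    """At save time, store the artifact path relative to the checkpoint dir.
    Assumes the artifact lives under the checkpoint directory."""
    return str(Path(artifact_path).resolve().relative_to(Path(ckpt_dir).resolve()))

def to_absolute(stored_path: str, ckpt_dir: str) -> str:
    """At load time, resolve the stored relative path against the checkpoint
    directory's current location (it may have been moved or copied)."""
    return str((Path(ckpt_dir) / stored_path).resolve())
```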

* Apply isort and black reformatting

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

* fix artifact load

Signed-off-by: ashors1 <[email protected]>

* fix path artifact

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Co-authored-by: ashors1 <[email protected]>

* Add MemoryProfileCallback (#10166)

* Add MemoryProfileCallback

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Remove reference cycles, save snapshot on specific ranks

Signed-off-by: Shriya Palsamudram <[email protected]>
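
A sketch of the snapshot idea using PyTorch's (underscore-prefixed, semi-private) CUDA memory-history hooks, gated to specific ranks via the `RANK` environment variable; the real callback's interface and options may differ.

```python
import os
import torch

def start_memory_history(ranks=(0,), max_entries=100_000):
    """Begin recording CUDA allocation history, but only on selected ranks."""
    if int(os.environ.get("RANK", "0")) in ranks and torch.cuda.is_available():
        torch.cuda.memory._record_memory_history(max_entries=max_entries)

def dump_memory_snapshot(out_dir, ranks=(0,)):
    """Write a memory snapshot (viewable at pytorch.org/memory_viz) on selected ranks."""
    rank = int(os.environ.get("RANK", "0"))
    if rank in ranks and torch.cuda.is_available():
        path = os.path.join(out_dir, f"memory_snapshot_rank{rank}.pickle")
        torch.cuda.memory._dump_snapshot(path)
```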

* Remove unnecessary imports

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Update docstring

Signed-off-by: Shriya Palsamudram <[email protected]>

---------

Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>

* Lower bound transformers to support nemotron (#10240)

Signed-off-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>

* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)

Flow matching generative model with SSL pretraining framework

Signed-off-by: Pin-Jui Ku <[email protected]>
Co-authored-by: Kuray107 <[email protected]>

* Revert torchrun fix for model import (#10251)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [NeMo-UX] Move nemotron imports inline (#10255)

* Move nemotron transformers + tokenizer imports inline to reduce number of required deps

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* Wrap CPU model init with megatron_lazy_init_context (#10219)

* Wrap CPU model init with megatron_lazy_init_context

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Cleanup checkpoint-dir if saving fails

Signed-off-by: Alexandros Koumparoulis <[email protected]>
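
For the "cleanup checkpoint-dir if saving fails" part, a generic sketch (function names are illustrative, not the NeMo implementation): if writing the checkpoint raises, remove the partially written directory so later runs don't try to resume from a corrupt checkpoint.

```python
import shutil
from pathlib import Path

def save_checkpoint_safely(save_fn, ckpt_dir: str) -> None:
    """Call `save_fn(ckpt_dir)`; on failure, delete the partial directory and re-raise."""
    path = Path(ckpt_dir)
    try:
        save_fn(path)
    except Exception:
        if path.exists():
            shutil.rmtree(path, ignore_errors=True)  # drop the partial checkpoint
        raise
```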

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Bump `Dockerfile.ci` (2024-08-22) (#10227)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix bert flags

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* salm export trtllm (#10245)

Signed-off-by: slyne deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* Load model in the target export precision by default in PTQ (#10267)

* Load model in the target export precision by default

Signed-off-by: Jan Lasek <[email protected]>

* Enable megatron_amp_O2=true to actually use half-precision

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Remove duplicate

Signed-off-by: Hemil Desai <[email protected]>

* Add entity to wandb logger

Signed-off-by: Hemil Desai <[email protected]>

* Add documentation

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add warning

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)

* handle absolute and relative logger directories

Signed-off-by: Anna Shors <[email protected]>
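
A small sketch of the directory handling described above, under the assumption (not verified against the NeMo logger code) that absolute paths are used as-is while relative ones are nested under the experiment base directory:

```python
from pathlib import Path

def resolve_log_dir(log_dir: str, base_dir: str) -> Path:
    """Respect absolute logger directories; join relative ones onto the base dir."""
    path = Path(log_dir)
    return path if path.is_absolute() else Path(base_dir) / path
```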

* merge lines

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: ashors1 <[email protected]>

* Add sdxl notebook (#10139)

* Add sdxl notebook

Signed-off-by: mingyuanm <[email protected]>

* Rename

Signed-off-by: mingyuanm <[email protected]>

* final Update SDXL notebook

Signed-off-by: mingyuanm <[email protected]>

---------

Signed-off-by: mingyuanm <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Small change

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Add support for layernorm1p

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Signed-off-by: Dong Hyuk Chang <[email protected]>
Signed-off-by: Pin-Jui Ku <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: slyne deng <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: mingyuanm <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Anna Shors <[email protected]>
Co-authored-by: ashors1 <[email protected]>
Co-authored-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: Slyne Deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>

* Fix artifact saving (#10914)

Signed-off-by: Hemil Desai <[email protected]>

* Lora improvement (#10918)

* pull out freeze model

Signed-off-by: Chen Cui <[email protected]>

* add wildcard match to lora target modules

Signed-off-by: Chen Cui <[email protected]>
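
A hedged sketch of wildcard matching for LoRA target modules using `fnmatch` on module names; the pattern syntax and module names below are illustrative, and the actual NeMo implementation may differ.

```python
from fnmatch import fnmatch

def match_target_modules(module_names, target_patterns):
    """Return module names matching any wildcard pattern, e.g. '*.linear_qkv'."""
    return [
        name
        for name in module_names
        if any(fnmatch(name, pattern) for pattern in target_patterns)
    ]

# Illustrative usage:
# match_target_modules(
#     ["decoder.layers.0.self_attention.linear_qkv", "decoder.layers.0.mlp.linear_fc1"],
#     ["*.linear_qkv"],
# )  # -> ["decoder.layers.0.self_attention.linear_qkv"]
```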

---------

Signed-off-by: Chen Cui <[email protected]>

* Huvu/t5 nemo2.0 peft (#10916)

* adding peft test and cicd

* set the MCore model to train mode in peft.py

* adding test for T5 lora

* fix following Chen's fix

* restore cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>

* Add tie_word_embeddings=True (#10710)

Signed-off-by: Yoshi Suhara <[email protected]>
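
Weight tying shares one parameter tensor between the input embedding and the output projection; a minimal PyTorch sketch of the idea (not the Megatron/NeMo implementation, which configures this through the model config):

```python
import torch.nn as nn

class TiedLM(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # tie_word_embeddings: the output projection reuses the embedding matrix,
        # so both layers train a single [vocab_size, hidden_size] tensor.
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids):
        hidden = self.embed(token_ids)  # real models run transformer layers here
        return self.lm_head(hidden)
```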

* Use a context-manager when opening files (#10895)

* Use a context-manager when opening files

Signed-off-by: Alexandros Koumparoulis <[email protected]>
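
The change replaces bare `open()` calls with context managers so file handles are closed even when an exception is raised; a generic before/after sketch (file and function names are illustrative):

```python
import json

# Before: the handle leaks if json.load raises.
# f = open(config_path)
# config = json.load(f)
# f.close()

# After: the `with` block closes the file on both success and error.
def load_config(config_path: str) -> dict:
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)
```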

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: artbataev <[email protected]>

* long context performance numbers in doc (#10784)

* long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* update the long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* Akoumparouli/mcore microbatch calculator fix (#10780)

* move tests/lightning/{,_}io

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>
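
A sketch of the context-manager pattern used here, with placeholder names standing in for the real Megatron-Core microbatch-calculator globals: reconfigure the calculator on entry and restore the previous state on exit, so tests cannot leak configuration into each other.

```python
from contextlib import contextmanager

# `_GLOBAL_CALCULATOR` and `make_calculator` are placeholders for the real
# Megatron-Core globals; only the save/restore pattern is the point here.
_GLOBAL_CALCULATOR = None

def make_calculator(micro_batch_size, global_batch_size):
    return {"mbs": micro_batch_size, "gbs": global_batch_size}

@contextmanager
def reconfigure_microbatch_calculator(micro_batch_size, global_batch_size):
    global _GLOBAL_CALCULATOR
    previous = _GLOBAL_CALCULATOR
    _GLOBAL_CALCULATOR = make_calculator(micro_batch_size, global_batch_size)
    try:
        yield _GLOBAL_CALCULATOR
    finally:
        _GLOBAL_CALCULATOR = previous  # always restore, even if the body raises

# Usage: with reconfigure_microbatch_calculator(1, 8) as calc: ...
```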

* use microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused var

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* remove 8x3b recipes (#10764)

* remove 8x3b recipes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove 8x3b from test_nemo_run

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rm from __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* change the figure file name

Signed-off-by: Youngeun Kwon <[email protected]>

* Accommodating the reviewer's comment

Signed-off-by: Youngeun Kwon <[email protected]>

* update the y-axis title

Signed-off-by: Youngeun Kwon <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)

* Add ModelOpt transformer model pruning example for Llama3 model

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* example code is in the wrong dir, move it

Signed-off-by: Shengliang Xu <[email protected]>

* changes as suggested in comment

remove some logging and unused config code, update example model to
llama3.1

Signed-off-by: Shengliang Xu <[email protected]>

* Add pruning of hidden_size into example

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml

Signed-off-by: Keval Morabia <[email protected]>

* Add pruning test to cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <2891698…
XuesongYang pushed a commit to paarthneekhara/NeMo that referenced this pull request Jan 18, 2025
…yer modules, addressing change in MCore (NVIDIA#11289)

* fix api

Signed-off-by: yaoyu-33 <[email protected]>

* fix ci

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix docstring2

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix line too long

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>