Bump Dockerfile.ci (2024-10-17) #10919

Merged: 1 commit into main from bump-ci-container-2024-10-17 on Oct 17, 2024

Conversation

ko3n1g
Collaborator

@ko3n1g ko3n1g commented Oct 17, 2024

🚀 PR to Bump Dockerfile.ci.

📝 Please remember the following to-do's before merge:

  • Verify the presubmit CI

🙏 Please merge this PR only if the CI workflow completed successfully.

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Contributor

[🤖]: Hi @ko3n1g 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it up to you what to do next.

//cc @pablo-garay @ko3n1g

@ko3n1g ko3n1g merged commit 6b0e04d into main Oct 17, 2024
167 of 171 checks passed
@ko3n1g ko3n1g deleted the bump-ci-container-2024-10-17 branch October 17, 2024 14:31
yashaswikarnati pushed a commit that referenced this pull request Oct 20, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
artbataev pushed a commit to artbataev/NeMo that referenced this pull request Oct 22, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
akoumpa pushed a commit that referenced this pull request Oct 24, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Hainan Xu <[email protected]>
HuiyingLi pushed a commit to HuiyingLi/NeMo that referenced this pull request Nov 15, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
ericharper added a commit that referenced this pull request Nov 19, 2024
* nemo2-sft notebook initial draft

Signed-off-by: HuiyingLi <[email protected]>

* remove mixtral info

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* add import_ckpt script and minor changes

Signed-off-by: HuiyingLi <[email protected]>

* Random read for tar files in lhotse dataloaders (#10536)

* Random read for tar files in lhotse dataloaders

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Solve failed tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Adding a testcase

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Some changes in tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* removing import

Signed-off-by: Nune <[email protected]>

---------

Signed-off-by: Nune <[email protected]>
Signed-off-by: nune-tadevosyan <[email protected]>
Co-authored-by: nune-tadevosyan <[email protected]>

* training code for hybrid-autoregressive inference model (#10841)

* training code for hybrid-autoregressive inference model

Signed-off-by: Hainan Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hainan-xv <[email protected]>

---------

Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: hainan-xv <[email protected]>
Co-authored-by: Hainan Xu <[email protected]>
Co-authored-by: hainan-xv <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* Use trainer.local_rank/global_rank (#10860)

* fix global_rank calculation

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use trainer's global/local rank

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove stacking operation from batched functions (#10524)

* remove stacking operations

Signed-off-by: lilithgrigoryan <[email protected]>

* fixes in base class

Signed-off-by: lilithgrigoryan <[email protected]>

* clean up

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* remove potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* restore batch_intilize states funcname

Signed-off-by: lilithgrigoryan <[email protected]>

* fix typo

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable
in stateless transducer

Signed-off-by: lilithgrigoryan <[email protected]>

* fix test

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstring, rm comment

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstrings

Signed-off-by: lilithgrigoryan <[email protected]>

---------

Signed-off-by: lilithgrigoryan <[email protected]>
Signed-off-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>

* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)

* Add llm.generate

Signed-off-by: Hemil Desai <[email protected]>

* Remove comment

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix launching with python

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add assert cp

Signed-off-by: Hemil Desai <[email protected]>

* Add example script

Signed-off-by: Hemil Desai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Adding support for LightningDataModule inside Fabric-API (#10879)

* Make FabricMegatronMixedPrecision match MegatronMixedPrecision

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Supporting DataModule in fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Adding support for LightningDataModule inside Fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Remove import in mock.py

Signed-off-by: Marc Romeijn <[email protected]>

---------

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* initial draft

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Save yaml config for model in nemo.lightning.io (#10765)

* Save yaml config for model in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Add explicit yaml comparison

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* relax test

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Move collections.nlp imports inline for t5 (#10877)

* Move collections.nlp imports inline for t5

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* add world_size/pp_size runtime check (#10842)

* add world_size/pp_size runtime check

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix msg precision

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix test_init_parallel_ranks ws=3 pp=3

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix peft resume (#10887)

Signed-off-by: Chen Cui <[email protected]>

* Update engine build step for TRT-LLM 0.13.0 (#10880)

* Setting use_fused_mlp for TRT-LLM >= 0.13.0

Signed-off-by: Jan Lasek <[email protected]>

* Unused import removal

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

* Akoumparouli/nemo ux moe loss logging (#10128)

* Move across pipeline loss reduction to a separate function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add support for MoE loss logging

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* enable vboost and set LM SM margin (#10853)

* enable vboost

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* env vars

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* add perf plugin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* revert default executor

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* fix typo

Signed-off-by: Jimmy Zhang <[email protected]>

* fix more typo

Signed-off-by: Jimmy Zhang <[email protected]>

* ln margin knob

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* specify lm margin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: malay-nagda <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Use torch sdpa implementation in ASR mha (#9590)

* use pytorch sdpa

Signed-off-by: WoodieDudy <[email protected]>

* sdpa work

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: titu1994 <[email protected]>

* sdpa flag to false & sdpa_backend arg

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* change arg name

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* fix config args

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* add condition on version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* update condition on version

Signed-off-by: WoodieDudy <[email protected]>

* remove condition on torch version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* move code to init

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

---------

Signed-off-by: WoodieDudy <[email protected]>
Signed-off-by: titu1994 <[email protected]>
Signed-off-by: WoodieDudy <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: titu1994 <[email protected]>
Co-authored-by: WoodieDudy <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Remove cyclic import

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: artbataev <[email protected]>

* call __post_init__ after altering config values (#10885)

* call __post_init__ after altering config values

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* test fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* turn off SP

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Nemo 2.0 ckpt support in TRT-LLM export (#10891)

* fix minor import bug

Signed-off-by: Onur Yilmaz <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* nemo 2.0 support in export to trt-llm

Signed-off-by: Onur Yilmaz <[email protected]>

* get mixing from main

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* fix style

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>

* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)

* various simple docs source fixes

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix docstrings and typing with forward reference

Signed-off-by: Elena Rastorgueva <[email protected]>

* Apply isort and black reformatting

Signed-off-by: erastorgueva-nv <[email protected]>

* fix typing forward reference for PromptedAudioToTextLhotseDataset

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix feature warnings

Signed-off-by: yaoyu-33 <[email protected]>

* Try fix some model part errors

Signed-off-by: yaoyu-33 <[email protected]>

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix indent in docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* fix imagen cite

* fix ratio issues

Signed-off-by: yaoyu-33 <[email protected]>

* fix Dreambooth

Signed-off-by: yaoyu-33 <[email protected]>

* Fix activation recomputation

Signed-off-by: yaoyu-33 <[email protected]>

* fix sequence packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix asr_language_modeling_and_customization

Signed-off-by: yaoyu-33 <[email protected]>

* fixes wip

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: erastorgueva-nv <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: erastorgueva-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Co-authored-by: Huiying Li <[email protected]>

* calculate step time batch end-batch end (#10202)

* log step time at end

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* use nemo logging

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* cleanup

Signed-off-by: Malay Nagda <[email protected]>

* check remove

Signed-off-by: Malay Nagda <[email protected]>

* delta timing callback

Signed-off-by: Malay Nagda <[email protected]>

* comment and name change

Signed-off-by: Malay Nagda <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Co-authored-by: malay-nagda <[email protected]>

* late import prettytable (#10912)

Signed-off-by: Maanu Grover <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)

Signed-off-by: Jan Lasek <[email protected]>

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)

* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: nithinraok <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Fix ASR tests (#10794)

* Make tests required

Signed-off-by: Vladimir Bataev <[email protected]>

* Debug torch.load issue

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Run only necessary tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Try fix loading

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid caching fixture

Signed-off-by: Vladimir Bataev <[email protected]>

* Try restore model several times

Signed-off-by: Vladimir Bataev <[email protected]>

* Try customize temporary directory

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Reorder tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable one test

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid xxlarge model

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable test

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Magic fix

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert unnecessary changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable all jobs except L0

Signed-off-by: Vladimir Bataev <[email protected]>

* RNNT alignments - merge with unit tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix config

Signed-off-by: Vladimir Bataev <[email protected]>

* Log test results

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Use less audio files for tests

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Integrating mcore export (#10238)

* Integrating mcore export

* Integrating mcore export

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Move trt imports in nemo.collections.llm inside respective functions (#10234)

Signed-off-by: Hemil Desai <[email protected]>

* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)

* Add tests for LazyNeMoIterator and fix case with manifest_only=True and offsets in manifest

Signed-off-by: Piotr Żelasko <[email protected]>

* Address code review

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)

* perform serialization using relative paths to allow users to move checkpoints after they're saved

Signed-off-by: ashors1 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

* fix artifact load

Signed-off-by: ashors1 <[email protected]>

* fix path artifact

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Co-authored-by: ashors1 <[email protected]>

* Add MemoryProfileCallback (#10166)

* Add MemoryProfileCallback

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Remove reference cycles, save snapshot on specific ranks

Signed-off-by: Shriya Palsamudram <[email protected]>

* Remove unnecessary imports

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Update docstring

Signed-off-by: Shriya Palsamudram <[email protected]>

---------

Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>

* Lower bound transformers to support nemotron (#10240)

Signed-off-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>

* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)

Flow matching generative model with SSL pretraining framework

Signed-off-by: Pin-Jui Ku <[email protected]>
Co-authored-by: Kuray107 <[email protected]>

* Revert torchrun fix for model import (#10251)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [NeMo-UX] Move nemotron imports inline (#10255)

* Move nemotron transformers + tokenizer imports inline to reduce number of required deps

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* Wrap CPU model init with megatron_lazy_init_context (#10219)

* Wrap CPU model init with megatron_lazy_init_context

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Cleanup checkpoint-dir if saving fails

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Bump `Dockerfile.ci` (2024-08-22) (#10227)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix bert flags

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* salm export trtllm (#10245)

Signed-off-by: slyne deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* Load model in the target export precision by default in PTQ (#10267)

* Load model in the target export precision by default

Signed-off-by: Jan Lasek <[email protected]>

* Enable megatron_amp_O2=true to actually use half-precision

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Remove duplicate

Signed-off-by: Hemil Desai <[email protected]>

* Add entity to wandb logger

Signed-off-by: Hemil Desai <[email protected]>

* Add documentation

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add warning

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)

* handle absolute and relative logger directories

Signed-off-by: Anna Shors <[email protected]>

* merge lines

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: ashors1 <[email protected]>

* Add sdxl notebook (#10139)

* Add sdxl notebook

Signed-off-by: mingyuanm <[email protected]>

* Rename

Signed-off-by: mingyuanm <[email protected]>

* final Update SDXL notebook

Signed-off-by: mingyuanm <[email protected]>

---------

Signed-off-by: mingyuanm <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Small change

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* ADD support for layernorm1p

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Signed-off-by: Dong Hyuk Chang <[email protected]>
Signed-off-by: Pin-Jui Ku <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: slyne deng <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: mingyuanm <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Anna Shors <[email protected]>
Co-authored-by: ashors1 <[email protected]>
Co-authored-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: Slyne Deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>

* Fix artifact saving (#10914)

Signed-off-by: Hemil Desai <[email protected]>

* Lora improvement (#10918)

* pull out freeze model

Signed-off-by: Chen Cui <[email protected]>

* add wildcard match to lora target modules

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
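
The wildcard matching for LoRA target modules mentioned above might look roughly like the following. This is a minimal sketch under stated assumptions, not NeMo's implementation: the helper name `match_target_modules` and the module names are hypothetical, and the matching is delegated to `fnmatchcase` from Python's standard library.

```python
from fnmatch import fnmatchcase


def match_target_modules(module_names, target_patterns):
    """Return the module names that match at least one target pattern.

    Hypothetical helper: a pattern is either an exact name or a
    shell-style wildcard expression such as "*.linear_qkv".
    """
    return [
        name
        for name in module_names
        if any(fnmatchcase(name, pattern) for pattern in target_patterns)
    ]


# Illustrative module names only; real model layouts may differ.
modules = [
    "decoder.layers.0.self_attention.linear_qkv",
    "decoder.layers.0.mlp.linear_fc1",
    "decoder.layers.1.self_attention.linear_qkv",
]

# Select the QKV projection of layer 0 only.
selected = match_target_modules(modules, ["*.layers.0.*.linear_qkv"])
```

Because `*` in `fnmatchcase` also matches dots, a single pattern can span several levels of a dotted module path.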

* Huvu/t5 nemo2.0 peft (#10916)

* adding peft test and cicd

* add setting mcore model to train in peft.py

* adding test for T5 lora

* fix follow Chen's fix

* restore cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>

* Add tie_word_embeddings=True (#10710)

Signed-off-by: Yoshi Suhara <[email protected]>

* Use a context-manager when opening files (#10895)

* Use a context-manager when opening files

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: artbataev <[email protected]>
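
For readers unfamiliar with the pattern named in #10895, here is a generic sketch of opening files through a context manager. It is not the code changed in that PR; the function name `read_json` and the file contents are made up for illustration.

```python
import json
import os
import tempfile


def read_json(path):
    """Load JSON from `path`; the handle is closed even if parsing fails."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Write a small file and read it back, using `with` for every handle.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"seq_length": 4096}, f)
    loaded = read_json(config_path)
```

The `with` block closes the file when it exits, including on exceptions, which avoids leaked file descriptors from a bare `open(...)` call.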

* long context performance numbers in doc (#10784)

* long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* update the long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* Akoumparouli/mcore microbatch calculator fix (#10780)

* move tests/lightning/{,_}io

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused var

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* remove 8x3b recipes (#10764)

* remove 8x3b recipes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove 8x3b from test_nemo_run

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rm from __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* change the figure file name

Signed-off-by: Youngeun Kwon <[email protected]>

* Accommodating the reviewer's comment

Signed-off-by: Youngeun Kwon <[email protected]>

* update the y-axis title

Signed-off-by: Youngeun Kwon <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)

* Add ModelOpt transformer model pruning example for Llama3 model

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* example code is in the wrong dir, move it

Signed-off-by: Shengliang Xu <[email protected]>

* changes as suggested in comment

remove some logging and unused config code, update example model to
llama3.1

Signed-off-by: Shengliang Xu <[email protected]>

* Add pruning of hidden_size into example

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml

Signed-off-by: Keval Morabia <[email protected]>

* Add pruning test to cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

---------

Signed-off-by: Shengliang Xu <[email protected]>
Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Co-authored-by: shengliangxu <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Update mamba.rst after dist ckpt addition (#10800)

Signed-off-by: Ali Taghibakhshi <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* fix chunked infer (#10581)

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* fix state transform (#10728)

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* use ckpt_to_weights_subdir in restore (#10786)

* use ckpt_to_weights_subdir in restore

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* make ckpt_to_{weight,context}_subdir idempotent

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
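
As an illustration of what "idempotent" means for a path helper like the one named in #10786, consider the sketch below. It is an assumption-laden example, not NeMo's code: the function name `to_weights_subdir` and the directory name `weights` are hypothetical.

```python
from pathlib import Path


def to_weights_subdir(path):
    """Return the `weights` subdirectory of a checkpoint path.

    Hypothetical helper: if `path` already points at the subdirectory,
    it is returned unchanged, so calling the function twice is safe.
    """
    path = Path(path)
    if path.name == "weights":
        return path
    return path / "weights"


once = to_weights_subdir("ckpt")
twice = to_weights_subdir(once)
```

Applying the helper to its own output yields the same path, so callers do not need to track whether a path has already been converted.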

* Mixtral set seq_length=4k (#10704)

* enable SP & set seq_length=4k

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* update test expected values

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* 8x22b 4k

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Fix for crashes with tensorboard_logger=false and VP + LoRA (#10792)

* Fix for crashes with tensorboard_logger=false and virtual pipeline parallel + LoRA

Signed-off-by: Valerie Sarge <[email protected]>

* Apply isort and black reformatting

Signed-off-by: vysarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: vysarge <[email protected]>
Co-authored-by: vysarge <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Disable checkpoint conversion inside AutoResume (#10645)

* Disable checkpoint conversion inside AutoResume

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Update resume docstrings

Signed-off-by: Hemil Desai <[email protected]>

* fix

Signed-off-by: Hemil Desai <[email protected]>

* add default finetuning recipe and refactor llama3 8b recipe

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* address comment

Signed-off-by: Chen Cui <[email protected]>

* refactor other recipes

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* remove 8x3b finetuning recipe for now because HF version not available

Signed-off-by: Chen Cui <[email protected]>

* add copyright header

Signed-off-by: Chen Cui <[email protected]>

* adjust unit tests based on recipe fixes

Signed-off-by: Chen Cui <[email protected]>

* fix failed unit test

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* replace png file with github assets

Signed-off-by: Youngeun Kwon <[email protected]>

* change image url to github release

Signed-off-by: Youngeun Kwon <[email protected]>

---------

Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Shengliang Xu <[email protected]>
Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Signed-off-by: Ali Taghibakhshi <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: vysarge <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: Shengliang Xu <[email protected]>
Co-authored-by: shengliangxu <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: vysarge <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: cuichenx <[email protected]>

* perf recipes and Mcore DistOpt params (#10883)

* 175b gpt3 recipe

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* dist opt params

Signed-off-by: Malay Nagda <[email protected]>

* 405b dist opt params

Signed-off-by: Malay Nagda <[email protected]>

* perf recipes and dist opt params

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* MoE dist opt params

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* gpt bias fusion params

Signed-off-by: Malay Nagda <[email protected]>

* 175b recipe

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* perf params comments

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* MoE perf params comments

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* perf recipes suffix

Signed-off-by: Malay Nagda <[email protected]>

* specific models fusion params

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Co-authored-by: malay-nagda <[email protected]>

* ci: Fix cherry pick team (#10945)

Signed-off-by: Oliver Koenig <[email protected]>

* Packed sequence bug fixes (#10898)

* save prepared dataset to different folders according to tokenizer name

Signed-off-by: Chen Cui <[email protected]>

* fix hang

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* fix hang

Signed-off-by: Chen Cui <[email protected]>

* raise mbs>1 error and provide suggestion to user instead of automatically changing config

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add ci for packed seq

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* fix bug

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Fix requirements for MacOS (#10930)

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix nemo 2.0 recipes  (#10915)

* Fix recipe num_nodes and long context docstring

* Fix typo

* Fix PP issue

* Fix unit test

* Change recipes

* fix test

* Fix unit tests

* Fix recipes

* Add general legal test on parallelization settings

* Rename test

* Apply isort and black reformatting

Signed-off-by: BoxiangW <[email protected]>

---------

Signed-off-by: BoxiangW <[email protected]>
Co-authored-by: BoxiangW <[email protected]>

* Akoumparouli/nemo ux fix dir or string artifact (#10936)

* Add __repr__ to Artifact

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* nemo.lightning.io.artifact: represent strings as fdl.Config to avoid path adjustment during restoration

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* t5 test minification

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* ckpt convert bug fixes (#10878)

* Mistral-NeMo-12B recipe

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rename mistral to mistral_7b

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* include mistral_nemo_12b in __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* add to __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Remove stale imports

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* TP=2

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove finetune_recipe

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* update config names in tests

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* mistral-nemo-12b from llama_8b

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* TP=2; SP=True

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix overlap value

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* update mistral-nemo-base-12b finetune recipe

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* bug fix

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* remove extra file

Signed-off-by: dimapihtar <[email protected]>

* remove extra changes

Signed-off-by: dimapihtar <[email protected]>

* revert changes

Signed-off-by: dimapihtar <[email protected]>

* add ckpt_format configurable

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* revert changes

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: artbataev <[email protected]>

* fix typo in docstring (#10955)

Signed-off-by: ashors1 <[email protected]>

* remove deprecated ci tests (#10922)

* remove deprecated tutorial

Signed-off-by: dimapihtar <[email protected]>

* remove deprecated ci tests

Signed-off-by: dimapihtar <[email protected]>

* add deprecation note

Signed-off-by: dimapihtar <[email protected]>

* add deprecation note

Signed-off-by: dimapihtar <[email protected]>

* remove bart tests

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: dimapihtar <[email protected]>

* [Nemo CICD] Remove deprecated tests (#10960)

* remove deprecated tutorial

Signed-off-by: dimapihtar <[email protected]>

* remove deprecated ci tests

Signed-off-by: dimapihtar <[email protected]>

* add deprecation note

Signed-off-by: dimapihtar <[email protected]>

* add deprecation note

Signed-off-by: dimapihtar <[email protected]>

* remove bart tests

Signed-off-by: dimapihtar <[email protected]>

* Remove deleted CI tests

---------

Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
Co-authored-by: dimapihtar <[email protected]>

* Adithyare/oai chat completion (#10785)

* updates

Signed-off-by: adithyare <[email protected]>

* open ai chat completion wip

Signed-off-by: adithyare <[email protected]>

* responding with model responses

Signed-off-by: adithyare <[email protected]>

* Apply isort and black reformatting

Signed-off-by: arendu <[email protected]>

* also support general completion

Signed-off-by: adithyare <[email protected]>

* Apply isort and black reformatting

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: adithyare <[email protected]>
Signed-off-by: arendu <[email protected]>
Co-authored-by: arendu <[email protected]>

* Update megatron_t5_pretraining.py (#10952)

Signed-off-by: Huy Vu <[email protected]>

* Convert perf plugin env vars to strings (#10947)

Signed-off-by: Hemil Desai <[email protected]>
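
Some background on why such a conversion matters: in Python, `os.environ` only accepts string values, so integers or booleans must be converted before they are exported. The sketch below shows the general idea; the helper `stringify_env` and the variable names are hypothetical and are not taken from the perf plugin.

```python
import os


def stringify_env(env_vars):
    """Return a copy of `env_vars` with every key and value cast to str."""
    return {str(key): str(value) for key, value in env_vars.items()}


# Hypothetical variable names; note that a bool becomes "True"/"False".
perf_env = stringify_env({"EXAMPLE_NUM_STREAMS": 1, "EXAMPLE_ENABLE_OVERLAP": True})

# Assigning the raw int or bool to os.environ would raise a TypeError.
os.environ.update(perf_env)
```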

* disable dynamo for ddp checker (#10961)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* Mistral-NeMo-12B recipe (#10607)

* Mistral-NeMo-12B recipe

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rename mistral to mistral_7b

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* include mistral_nemo_12b in __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* add to __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Remove stale imports

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* TP=2

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove finetune_recipe

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* update config names in tests

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* mistral-nemo-12b from llama_8b

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* TP=2; SP=True

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix overlap value

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* update mistral-nemo-base-12b finetune recipe

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Make nemo text processing optional in TTS (#10584)

* move TN guard to better location; make guard print error message rather than throwing error

Signed-off-by: Jason <[email protected]>

* Apply isort and black reformatting

Signed-off-by: blisc <[email protected]>

* Forgot to add the actual normalizer

Signed-off-by: Jason <[email protected]>

* Apply isort and black reformatting

Signed-off-by: blisc <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Co-authored-by: blisc <[email protected]>

* respect warnings' filters (#10953)

* respect warnings' filters

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Update T5 tokenizer (adding additional tokens to tokenizer config) (#10972)

* initial commit

* restore t5_pretraining

* Apply isort and black reformatting

Signed-off-by: huvunvidia <[email protected]>

---------

Signed-off-by: huvunvidia <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: huvunvidia <[email protected]>

* Alit/mamba recipe (#10935)

* add some mamba recipe

* add 130m

* add the rest of the recipes

* add tokenizer

* add tokenizer

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* add fixes to ssm for nemorun recipes

* add hybrid tokenizer

* updating some recipes

* Apply isort and black reformatting

Signed-off-by: JRD971000 <[email protected]>

* remove comments

* update gbs

* fix ckpt resume

* fix ckpt resume

* fix ckpt resume

* update recipes final

* Apply isort and black reformatting

Signed-off-by: JRD971000 <[email protected]>

* remove redundant imports

* ckpt convertor dtype fix

* Apply isort and black reformatting

Signed-off-by: JRD971000 <[email protected]>

---------

Signed-off-by: JRD971000 <[email protected]>
Signed-off-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: JRD971000 <[email protected]>

* Long context performance doc hot fix (#10946)

* long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* update the long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* Akoumparouli/mcore microbatch calculator fix (#10780)

* move tests/lightning/{,_}io

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused var

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* remove 8x3b recipes (#10764)

* remove 8x3b recipes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove 8x3b from test_nemo_run

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rm fr…
ShriyaPalsamudram added a commit that referenced this pull request Dec 2, 2024
Signed-off-by: Shriya Palsamudram <[email protected]>

Fix FaultTolerancePlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add StragglerDetection callback to all NeMo2.0 recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add missing and remove unused imports

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add ft launcher test

Signed-off-by: Shriya Palsamudram <[email protected]>

fix typo

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix more typos

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

add ft launcher using nemo-run for llama3 test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix serialization errors

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

create separate ft test

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

change github actions test

Signed-off-by: Shriya Palsamudram <[email protected]>

draft crash simulation

Signed-off-by: Shriya Balaji Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Simulate a crash using step, disable checkpointing

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Add a straggler detection test as well

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Revert enabling straggler_detection by default in all recipes

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove unused imports

Signed-off-by: Shriya Palsamudram <[email protected]>

Remove extra check in ConfigValidationPlugin

Signed-off-by: Shriya Palsamudram <[email protected]>

Address pylinter issues

Signed-off-by: Shriya Palsamudram <[email protected]>

Improve straggler detection testing and add doc string

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

fix paths

Signed-off-by: Shriya Palsamudram <[email protected]>

Add assert for crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

Append run logs to a file after a crash

Signed-off-by: Shriya Palsamudram <[email protected]>

Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH

Signed-off-by: Shriya Palsamudram <[email protected]>

Add openai-gelu in gated activation (#11293)

Fixes per comments (#11280)

* Fixes per comments

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

* Update README

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

---------

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

Add T5TTS (#11193)

* added training and inference recipes for T5-TTS.
* fix some attention errors
* add copyright headers.
* added TODO and detail error log info.
* fixed missing a corner case.
* added classes to __all__
* fixed to return either self-attention scores or cross-attention scores in ParallelTransformerLayer_ class.

Signed-off-by: XuesongYang <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: XuesongYang <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: XuesongYang <[email protected]>

ci: Exclude CPU machines from scan (#11300)

Signed-off-by: Oliver Koenig <[email protected]>

Revert "fix(export): GPT models w/ bias=False convert properly (#11255)" (#11301)

This reverts commit 2d4f4953881b9e2d118d3ffeba7e64625d827d11.

remove redundant docs (#11302)

Create phi3mini.py (#11281)

* Create phi3mini.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* Create phi3_mini_4k_instruct.py for adding to recipe

Signed-off-by: mayani-nv <[email protected]>

Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removed Performant recipe

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py and removing performant condition

Signed-off-by: mayani-nv <[email protected]>

Update phi3_mini_4k_instruct.py with docstring changes

Signed-off-by: mayani-nv <[email protected]>

* Update __init__.py

Signed-off-by: mayani-nv <[email protected]>

* fixing pylint warnings

* Apply isort and black reformatting

Signed-off-by: mayani-nv <[email protected]>

* correcting typos and adding working recipe files

---------

Signed-off-by: mayani-nv <[email protected]>
Signed-off-by: mayani-nv <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: mayani-nv <[email protected]>

Integrate lm-eval-harness for evaluations in NeMo (#10621)

* Add evaluate method and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Add inference params to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Add wait_for_rest_service fn to evaluate method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add logprobs to be returned by Pytriton for trtllm models

Signed-off-by: Abhishree <[email protected]>

* Increase max_retries in wait_for_rest_service method

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Add unset slurm vars and use env vars for Triton args

Signed-off-by: Abhishree <[email protected]>

* Add logic to get logProbs from logits

Signed-off-by: Abhishree <[email protected]>

* Refactor, clean and organize the code

1) Refactors the code and creates an evaluation folder where all util methods live
2) Add docstrings, comments
3) Expose gather_context_logits, gather_generation_logits in trtllm and add output_generation_logits flag to return generation logits and remove output_log_probs as it's not getting used anymore

Signed-off-by: Abhishree <[email protected]>

* Add copyright and initialize special_tokens_kwargs in eval_utils.py

Signed-off-by: Abhishree <[email protected]>

* Add the following changes

1) Move get_trtllm_deployable and unset_environment_variables to deploy base.py
2) Rename eval_utils.py to base.py
3) Restore scripts/export/convert_nemo2_for_export.py

Signed-off-by: Abhishree <[email protected]>

* Fix a minor typo

Signed-off-by: Abhishree <[email protected]>

* Revert output_log_probs and all_probs arg in tensorrt_llm_run.py

Signed-off-by: Abhishree <[email protected]>

* Fix docstrings formatting

Signed-off-by: Abhishree <[email protected]>

* Pylint and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Fix pylint and typos

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

* Avoid multiple calls for tokenizer_type

Co-authored-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>

* Replace print statements with logging statements

Signed-off-by: Abhishree <[email protected]>

* Apply isort and black reformatting

Signed-off-by: athitten <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: athitten <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: athitten <[email protected]>
Co-authored-by: Ananth Subramaniam <[email protected]>
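
The "get logProbs from logits" step mentioned in the commit above can be pictured as a log-softmax over each step's logits followed by picking out the chosen token. This is a minimal sketch under that assumption, not the code added in #10621; `log_softmax` and `sequence_logprob` are hypothetical helper names.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def sequence_logprob(step_logits, token_ids):
    """Sum the log-probabilities of the chosen token at every step."""
    return sum(log_softmax(row)[tok] for row, tok in zip(step_logits, token_ids))
```

Evaluation harnesses typically compare such summed log-probabilities across candidate continuations, which is why raw logits alone are not enough.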

ci: Fix release workflow (#11286)

* ci: Fix release workflow

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* Update .github/workflows/release.yml

Signed-off-by: oliver könig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: oliver könig <[email protected]>

Update import 'pytorch_lightning' -> 'lightning.pytorch' (#11252)

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* update import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in lightning

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/asr

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/tts

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update requirements

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

* update import in tests

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/common

Signed-off-by: Maanu Grover <[email protected]>

* update import in core

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in collections/nlp

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update fabric import in utils

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in asr examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in llm examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in tts examples

Signed-off-by: Maanu Grover <[email protected]>

* update fabric import in nlp examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in deploy

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in slu examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in speaker_tasks examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/audio

Signed-off-by: Maanu Grover <[email protected]>

* update import in audio examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/llm

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in collections/vlm

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/diffusion

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/vision

Signed-off-by: Maanu Grover <[email protected]>

* update import in collections/multimodal

Signed-off-by: Maanu Grover <[email protected]>

* update import in multimodal examples

Signed-off-by: Maanu Grover <[email protected]>

* update import in vision examples

Signed-off-by: Maanu Grover <[email protected]>

* Apply isort and black reformatting

Signed-off-by: maanug-nv <[email protected]>

* update import in scripts

Signed-off-by: Maanu Grover <[email protected]>

* Update baseline

Signed-off-by: maanug-nv <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* revert bad change

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: maanug-nv <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: maanug-nv <[email protected]>
Co-authored-by: artbataev <[email protected]>
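
The import change in #11252 is mechanical: `pytorch_lightning` becomes `lightning.pytorch`. As a rough illustration for downstream code that needs the same migration, the sketch below rewrites import statements in a source string. `rewrite_lightning_imports` is a hypothetical helper, not a tool shipped with NeMo, and it only touches lines that start with `import` or `from`.

```python
import re

# Matches "import pytorch_lightning..." and "from pytorch_lightning... import ..."
# at the start of a (possibly indented) line.
_PL_IMPORT = re.compile(r"^(\s*(?:from|import)\s+)pytorch_lightning\b", re.MULTILINE)


def rewrite_lightning_imports(source: str) -> str:
    """Replace the legacy package name in import statements only.

    Note: a bare "import pytorch_lightning" without an alias still needs manual
    follow-up, because later references to the old name are not rewritten.
    """
    return _PL_IMPORT.sub(r"\1lightning.pytorch", source)
```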

fix perf plugin CUDA_DEVICE_MAX_CONNECTIONS setting (#11299)

* fix

Signed-off-by: Jimmy Zhang <[email protected]>

* Docstrings

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

PTQ via NeMo-Run CLI (#10984)

* PTQ support in nemo CLI

Signed-off-by: Jan Lasek <[email protected]>

* Naming engine vs checkpoint

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

PTQ memory optimization (#11257)

* Initial commit

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Add sample generate

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Nemotron quantization, reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Reduce diff

Signed-off-by: Piotr Kaminski <[email protected]>

* code review suggestions

Signed-off-by: Piotr Kaminski <[email protected]>

* Bug fixes

Signed-off-by: Piotr Kaminski <[email protected]>

* remove not needed import

Signed-off-by: Piotr Kaminski <[email protected]>

* fix model type and allow ddp/optim setup

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

---------

Signed-off-by: Piotr Kaminski <[email protected]>
Signed-off-by: Laplasjan107 <[email protected]>
Signed-off-by: Piotr Kamiński <[email protected]>
Co-authored-by: Piotr Kaminski <[email protected]>
Co-authored-by: Laplasjan107 <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>

update README.md (#11223)

Signed-off-by: yaoyu-33 <[email protected]>

Add `attention_bias` argument in transformer block and transformer layer modules, addressing change in MCore (#11289)

* fix api

Signed-off-by: yaoyu-33 <[email protected]>

* fix ci

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix docstring2

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix line too long

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>

Remove pytorch-lightning (#11306)

* update import in docs

Signed-off-by: Maanu Grover <[email protected]>

* update import in tutorials

Signed-off-by: Maanu Grover <[email protected]>

* remove pl requirement

Signed-off-by: Maanu Grover <[email protected]>

* missed import updates

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

Adding multimodal examples (#11279)

* Adding multimodal examples

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>

Update T5 attention-mask shapes to be compatible with all attention-backend in new TE versions (#11059)

* initial commits

* updating cicd test

* commit for FlashFused T5 from Mcore

* testing CICD

* update code for data/mock, update mcore commit for dockerfile

* fix error

* fix error

* fix error in nemo/collections/llm/inference/base.py

* update t5/data/mock.py

* fix cicd error

* remove unused libs

* address Yu Yao's comments

* Apply isort and black reformatting

Signed-off-by: huvunvidia <[email protected]>

---------

Signed-off-by: huvunvidia <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: huvunvidia <[email protected]>

Add HF untrusted code toggle (#11313)

* add trust_remote_code toggle

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

P2p chunk size setting in nemo 2.0 (#11312)

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

* NCCL P2P communication chunk size

Signed-off-by: Sangkug Lym <[email protected]>

---------

Signed-off-by: Sangkug Lym <[email protected]>

Nemo2 batcheval (#11158)

* initial draft for eval api

Signed-off-by: HuiyingLi <[email protected]>

* add dp to generate

Signed-off-by: HuiyingLi <[email protected]>

* Apply isort and black reformatting

Signed-off-by: HuiyingLi <[email protected]>

* add top_k=1 to default inf param to get deterministic output

Signed-off-by: HuiyingLi <[email protected]>

* change name

Signed-off-by: HuiyingLi <[email protected]>

* add eval ds and write to file to llm.generate

Signed-off-by: HuiyingLi <[email protected]>

* support standalone input jsonl

Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: HuiyingLi <[email protected]>
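
The `top_k=1` default mentioned above makes generation deterministic because sampling from a single candidate is the same as taking the argmax. The sketch below shows this with a generic top-k sampler; `sample_top_k` is an illustrative function, not the NeMo inference API.

```python
import math
import random


def sample_top_k(logits, top_k, rng):
    """Sample a token id from the top_k highest-scoring logits."""
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    # Indices of the top_k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

With `top_k=1` the candidate list has a single entry, so the random generator's state no longer influences the result.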

DoRA (#11104)

* initial commit for DoRA

Signed-off-by: Chen Cui <[email protected]>

* clean up code

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* clean up

Signed-off-by: Chen Cui <[email protected]>

* fix TP

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add dropout correction term

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* add copyright and doc strings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* docstrings

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* add ci test

Signed-off-by: Chen Cui <[email protected]>

* typo

Signed-off-by: Chen Cui <[email protected]>

* remove unused code

Signed-off-by: Chen Cui <[email protected]>

* remove commented out code

Signed-off-by: Chen Cui <[email protected]>

* fix

Signed-off-by: Chen Cui <[email protected]>

* bug

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: cuichenx <[email protected]>
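
As background for the DoRA commit above: DoRA reparameterizes a weight as a learned magnitude times a normalized direction, where the direction is the frozen weight plus a low-rank update. The sketch below shows that decomposition on nested lists. It assumes one magnitude per output row, which is a layout assumption, and it omits the dropout correction term mentioned in the log. All names are illustrative and do not come from #11104.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def row_norms(w):
    """L2 norm of each row of w."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def dora_weight(w0, lora_b, lora_a, magnitude, scale=1.0):
    """Return magnitude * (w0 + scale * B @ A) / ||w0 + scale * B @ A|| per row."""
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in)
    adapted = [[w + scale * d for w, d in zip(w_row, d_row)]
               for w_row, d_row in zip(w0, delta)]
    norms = row_norms(adapted)
    return [[m * v / n for v in row] for row, m, n in zip(adapted, magnitude, norms)]
```

When `lora_b` is all zeros and `magnitude` is initialized to the row norms of `w0`, the result equals `w0`, so training starts from the pretrained weights.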

Profiling - support Chakra & Kineto trace dumping (#11115)

* Support chakra trace dumping by cfg

Signed-off-by: Lily Wang <[email protected]>

remove the manual recording of process::init

Signed-off-by: Lily Wang <[email protected]>

1. Remove unnecessary kineto config  2. Fix typo

Signed-off-by: Lily Wang <[email protected]>

Change warning to exception when nsys is enabled with chakra profiling

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: pablo-garay <[email protected]>

* fix bug in identifying profiling start step

Signed-off-by: Lily Wang <[email protected]>

* Update baseline

Signed-off-by: lilyw97 <[email protected]>

* [1] remove unused import [2] switch to use isinstance instead of type() [3] move torch.profiling to function

Signed-off-by: Lily Wang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilyw97 <[email protected]>

---------

Signed-off-by: Lily Wang <[email protected]>
Signed-off-by: pablo-garay <[email protected]>
Signed-off-by: lilyw97 <[email protected]>
Signed-off-by: Maanu Grover <[email protected]>
Co-authored-by: Lily Wang <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: lilyw97 <[email protected]>
Co-authored-by: Maanu Grover <[email protected]>

NeMo 2.0 SFT PEFT notebooks (#10874)

* nemo2-sft notebook initial draft

Signed-off-by: HuiyingLi <[email protected]>

* remove mixtral info

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

* add import_ckpt script and minor changes

Signed-off-by: HuiyingLi <[email protected]>

* Random read for tar files in lhotse dataloaders (#10536)

* Random read for tar files in lhotse dataloaders

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Solve failed tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Adding a testcase

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* Some changes in tests

Signed-off-by: Nune <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <[email protected]>

* removing import

Signed-off-by: Nune <[email protected]>

---------

Signed-off-by: Nune <[email protected]>
Signed-off-by: nune-tadevosyan <[email protected]>
Co-authored-by: nune-tadevosyan <[email protected]>

* training code for hybrid-autoregressive inference model (#10841)

* training code for hybrid-autoregressive inference model

Signed-off-by: Hainan Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hainan-xv <[email protected]>

---------

Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: hainan-xv <[email protected]>
Co-authored-by: Hainan Xu <[email protected]>
Co-authored-by: hainan-xv <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* Use trainer.local_rank/global_rank (#10860)

* fix global_rank calculation

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use trainer's global/local rank

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove stacking operation from batched functions (#10524)

* remove stacking operations

Signed-off-by: lilithgrigoryan <[email protected]>

* fixes im base class

Signed-off-by: lilithgrigoryan <[email protected]>

* clean up

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* remove potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* restore batch_intilize states funcname

Signed-off-by: lilithgrigoryan <[email protected]>

* fix typo

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable

Signed-off-by: lilithgrigoryan <[email protected]>

* fix potentially uninitialized local variable in stateless transducer

Signed-off-by: lilithgrigoryan <[email protected]>

* fix test

Signed-off-by: lilithgrigoryan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstring, rm comment

Signed-off-by: lilithgrigoryan <[email protected]>

* fix docstrings

Signed-off-by: lilithgrigoryan <[email protected]>

---------

Signed-off-by: lilithgrigoryan <[email protected]>
Signed-off-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>
Co-authored-by: lilithgrigoryan <[email protected]>

* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)

* Add llm.generate

Signed-off-by: Hemil Desai <[email protected]>

* Remove comment

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix launching with python

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add assert cp

Signed-off-by: Hemil Desai <[email protected]>

* Add example script

Signed-off-by: Hemil Desai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Adding support for LightningDataModule inside Fabric-API (#10879)

* Make FabricMegatronMixedPrecision match MegatronMixedPrecision

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Supporting DataModule in fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Adding support for LightningDataModule inside Fabric-API

Signed-off-by: Marc Romeijn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

* Remove import in mock.py

Signed-off-by: Marc Romeijn <[email protected]>

---------

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* initial draft

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Initial local run

Signed-off-by: smajumdar <[email protected]>

* Save yaml config for model in nemo.lightning.io (#10765)

* Save yaml config for model in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Fix bug

Signed-off-by: Hemil Desai <[email protected]>

* fix bug

Signed-off-by: Hemil Desai <[email protected]>

* Add explicit yaml comparison

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* relax test

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* Move collections.nlp imports inline for t5 (#10877)

* Move collections.nlp imports inline for t5

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* add world_size/pp_size runtime check (#10842)

* add world_size/pp_size runtime check

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix msg precision

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix test_init_parallel_ranks ws=3 pp=3

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
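
A runtime check of world size against pipeline-parallel size, as named in the commit above, plausibly amounts to a divisibility test. The sketch below is written under that assumption and is not the check added in #10842; the function and parameter names are hypothetical.

```python
def check_parallel_sizes(world_size, pipeline_model_parallel_size,
                         tensor_model_parallel_size=1):
    """Validate the layout and return the implied data-parallel size."""
    model_parallel = pipeline_model_parallel_size * tensor_model_parallel_size
    if model_parallel < 1 or world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"pp*tp={model_parallel}"
        )
    return world_size // model_parallel
```

Failing early with a precise message is cheaper than letting process-group creation hang or fail later with a less readable error.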

* fix peft resume (#10887)

Signed-off-by: Chen Cui <[email protected]>

* Update engine build step for TRT-LLM 0.13.0 (#10880)

* Setting use_fused_mlp for TRT-LLM >= 0.13.0

Signed-off-by: Jan Lasek <[email protected]>

* Unused import removal

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

* Akoumparouli/nemo ux moe loss logging (#10128)

* Move across pipeline loss reduction to a separate function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add support for MoE loss logging

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused function

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* enable vboost and set LM SM margin (#10853)

* enable vboost

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* env vars

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* add perf plugin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* revert default executor

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* fix typo

Signed-off-by: Jimmy Zhang <[email protected]>

* fix more typo

Signed-off-by: Jimmy Zhang <[email protected]>

* ln margin knob

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

* specify lm margin

Signed-off-by: Jimmy Zhang <[email protected]>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: malay-nagda <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)

* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Use torch sdpa implementation in ASR mha (#9590)

* use pytorch sdpa

Signed-off-by: WoodieDudy <[email protected]>

* sdpa work

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: titu1994 <[email protected]>

* sdpa flag to false & sdpa_backend arg

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* change arg name

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* fix config args

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* add condition on version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* update condition on version

Signed-off-by: WoodieDudy <[email protected]>

* remove condition on torch version

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* move code to init

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

* Apply isort and black reformatting

Signed-off-by: WoodieDudy <[email protected]>

* refactor

Signed-off-by: WoodieDudy <[email protected]>

---------

Signed-off-by: WoodieDudy <[email protected]>
Signed-off-by: titu1994 <[email protected]>
Signed-off-by: WoodieDudy <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: titu1994 <[email protected]>
Co-authored-by: WoodieDudy <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Remove cyclic import

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: artbataev <[email protected]>

* call __post_init__ after altering config values (#10885)

* call __post_init__ after altering config values

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* test fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* turn off SP

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
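
The reason for calling `__post_init__` again after altering config values is that dataclass fields derived in `__post_init__` are computed once at construction and go stale when their inputs change. The toy example below demonstrates this; `ToyConfig` and its fields are made up for illustration and are not the NeMo or Megatron config classes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    hidden_size: int = 8
    num_attention_heads: int = 2
    # Derived in __post_init__, not passed by the caller.
    kv_channels: int = field(init=False, default=0)

    def __post_init__(self):
        self.kv_channels = self.hidden_size // self.num_attention_heads


cfg = ToyConfig()
cfg.hidden_size = 16          # mutate an input after construction
stale = cfg.kv_channels       # derived value was computed from the old input
cfg.__post_init__()           # recompute derived fields
fresh = cfg.kv_channels
```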

* Nemo 2.0 ckpt support in TRT-LLM export (#10891)

* fix minor import bug

Signed-off-by: Onur Yilmaz <[email protected]>

* Add registry to register all needed classes with artifacts in nemo.lightning.io

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fixes

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Fix

Signed-off-by: Hemil Desai <[email protected]>

* nemo 2.0 support in export to trt-llm

Signed-off-by: Onur Yilmaz <[email protected]>

* get mixing from main

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* fix style

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>

* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)

* various simple docs source fixes

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix docstrings and typing with forward reference

Signed-off-by: Elena Rastorgueva <[email protected]>

* Apply isort and black reformatting

Signed-off-by: erastorgueva-nv <[email protected]>

* fix typing forward reference for PromptedAudioToTextLhotseDataset

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix feature warnings

Signed-off-by: yaoyu-33 <[email protected]>

* Try fix some model part errors

Signed-off-by: yaoyu-33 <[email protected]>

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try add requirements

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix indent in docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* handle duplicate issue

Signed-off-by: yaoyu-33 <[email protected]>

* fix imagen cite

* fix ratio issues

Signed-off-by: yaoyu-33 <[email protected]>

* fix Dreambooth

Signed-off-by: yaoyu-33 <[email protected]>

* Fix activation recomputation

Signed-off-by: yaoyu-33 <[email protected]>

* fix sequence packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix asr_language_modeling_and_customization

Signed-off-by: yaoyu-33 <[email protected]>

* fixes wip

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: erastorgueva-nv <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: erastorgueva-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Co-authored-by: Huiying Li <[email protected]>

* calculate step time batch end-batch end (#10202)

* log step time at end

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* use nemo logging

Signed-off-by: Malay Nagda <[email protected]>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <[email protected]>

* cleanup

Signed-off-by: Malay Nagda <[email protected]>

* check remove

Signed-off-by: Malay Nagda <[email protected]>

* delta timing callback

Signed-off-by: Malay Nagda <[email protected]>

* comment and name change

Signed-off-by: Malay Nagda <[email protected]>

---------

Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: malay-nagda <[email protected]>
Co-authored-by: malay-nagda <[email protected]>

* late import prettytable (#10912)

Signed-off-by: Maanu Grover <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)

Signed-off-by: Jan Lasek <[email protected]>

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)

* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)

* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: nithinraok <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Fix ASR tests (#10794)

* Make tests required

Signed-off-by: Vladimir Bataev <[email protected]>

* Debug torch.load issue

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Run only necessary tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Try fix loading

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid caching fixture

Signed-off-by: Vladimir Bataev <[email protected]>

* Try restore model several times

Signed-off-by: Vladimir Bataev <[email protected]>

* Try customize temporary directory

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Reorder tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable one test

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Avoid xxlarge model

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable test

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Magic fix

Signed-off-by: Vladimir Bataev <[email protected]>

* Revert unnecessary changes

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Disable all jobs except L0

Signed-off-by: Vladimir Bataev <[email protected]>

* RNNT alignments - merge with unit tests

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix config

Signed-off-by: Vladimir Bataev <[email protected]>

* Log test results

Signed-off-by: Vladimir Bataev <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Use less audio files for tests

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: artbataev <[email protected]>

* Integrating mcore export (#10238)

* Integrating mcore export

* Integrating mcore export

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Move trt imports in nemo.collections.llm inside respective functions (#10234)

Signed-off-by: Hemil Desai <[email protected]>

* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)

* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest

Signed-off-by: Piotr Żelasko <[email protected]>

* Address code review

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)

* perform serialization using relative paths to allow users to move checkpoints after they're saved

Signed-off-by: ashors1 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

* fix artifact load

Signed-off-by: ashors1 <[email protected]>

* fix path artifact

Signed-off-by: ashors1 <[email protected]>

* remove unused import

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Co-authored-by: ashors1 <[email protected]>

* Add MemoryProfileCallback (#10166)

* Add MemoryProfileCallback

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Remove reference cycles, save snapshot on specific ranks

Signed-off-by: Shriya Palsamudram <[email protected]>

* Remove unnecessary imports

Signed-off-by: Shriya Palsamudram <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ShriyaPalsamudram <[email protected]>

* Update docstring

Signed-off-by: Shriya Palsamudram <[email protected]>

---------

Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>

* Lower bound transformers to support nemotron (#10240)

Signed-off-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>

* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)

Flow matching generative model with SSL pretraining framework

Signed-off-by: Pin-Jui Ku <[email protected]>
Co-authored-by: Kuray107 <[email protected]>

* Revert torchrun fix for model import (#10251)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [NeMo-UX] Move nemotron imports inline (#10255)

* Move nemotron transformers + tokenizer imports inline to reduce number of required deps

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* Wrap CPU model init with megatron_lazy_init_context (#10219)

* Wrap CPU model init with megatron_lazy_init_context

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Cleanup checkpoint-dir if saving fails

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Bump `Dockerfile.ci` (2024-08-22) (#10227)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix bert flags

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* salm export trtllm (#10245)

Signed-off-by: slyne deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>

* Load model in the target export precision by default in PTQ (#10267)

* Load model in the target export precision by default

Signed-off-by: Jan Lasek <[email protected]>

* Enable megatron_amp_O2=true to actually use half-precision

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)

* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Remove duplicate

Signed-off-by: Hemil Desai <[email protected]>

* Add entity to wandb logger

Signed-off-by: Hemil Desai <[email protected]>

* Add documentation

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add warning

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* PR feedback

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

* Add comments

Signed-off-by: Hemil Desai <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hemildesai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)

* handle absolute and relative logger directories

Signed-off-by: Anna Shors <[email protected]>

* merge lines

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: ashors1 <[email protected]>

* Add sdxl notebook (#10139)

* Add sdxl notebook

Signed-off-by: mingyuanm <[email protected]>

* Rename

Signed-off-by: mingyuanm <[email protected]>

* final Update SDXL notebook

Signed-off-by: mingyuanm <[email protected]>

---------

Signed-off-by: mingyuanm <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Updating some comments

* Small change

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* ADD support for layernorm1p

* Apply isort and black reformatting

Signed-off-by: shanmugamr1992 <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Dockerfile.ci

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: shanmugamr1992 <[email protected]>
Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ShriyaPalsamudram <[email protected]>
Signed-off-by: Shriya Rishab <[email protected]>
Signed-off-by: Dong Hyuk Chang <[email protected]>
Signed-off-by: Pin-Jui Ku <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: slyne deng <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Signed-off-by: Anna Shors <[email protected]>
Signed-off-by: mingyuanm <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: shanmugamr1992 <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Anna Shors <[email protected]>
Co-authored-by: ashors1 <[email protected]>
Co-authored-by: Shriya Rishab <[email protected]>
Co-authored-by: ShriyaPalsamudram <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Kuray107 <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: Slyne Deng <[email protected]>
Co-authored-by: slyne deng <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: hemildesai <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>

* Fix artifact saving (#10914)

Signed-off-by: Hemil Desai <[email protected]>

* Lora improvement (#10918)

* pull out freeze model

Signed-off-by: Chen Cui <[email protected]>

* add wildcard match to lora target modules

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Huvu/t5 nemo2.0 peft (#10916)

* adding peft test and cicd

* add setting mcore model to train in peft.py

* adding test for T5 lora

* fix follow Chen's fix

* restore cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>

* Add tie_word_embeddings=True (#10710)

Signed-off-by: Yoshi Suhara <[email protected]>

* Use a context-manager when opening files (#10895)

* Use a context-manager when opening files

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: artbataev <[email protected]>

* long context performance numbers in doc (#10784)

* long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* update the long context perf

Signed-off-by: Youngeun Kwon <[email protected]>

* Akoumparouli/mcore microbatch calculator fix (#10780)

* move tests/lightning/{,_}io

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* use microbatch calculator context manager

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused var

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* remove 8x3b recipes (#10764)

* remove 8x3b recipes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove 8x3b from test_nemo_run

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* rm from __init__

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* change the figure file name

Signed-off-by: Youngeun Kwon <[email protected]>

* Accommodating the reviewer's comment

Signed-off-by: Youngeun Kwon <[email protected]>

* update the y-axis title

Signed-off-by: Youngeun Kwon <[email protected]>

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>

* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)

* Add ModelOpt transformer model pruning example for Llama3 model

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* examples code is at wrong dir, move them

Signed-off-by: Shengliang Xu <[email protected]>

* changes as suggested in comment

remove some logging and unused config code, update example model to
llama3.1

Signed-off-by: Shengliang Xu <[email protected]>

* Add pruning of hidden_size into example

Signed-off-by: Shengliang Xu <[email protected]>

* Apply isort and black reformatting

Signed-off-by: shengliangxu <[email protected]>
Signed-off-by: Shengliang Xu <[email protected]>

* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml

Signed-off-by: Keval Morabia <[email protected]>

* Add pruning test to cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <[email protected]>

* Update cicd-main.yml

Signed-off-by: Keval Morabia <2891698…