Bump Dockerfile.ci (2024-10-17) #10919
Merged
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
[🤖]: Hi @ko3n1g 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
yashaswikarnati pushed a commit that referenced this pull request on Oct 20, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
artbataev pushed a commit to artbataev/NeMo that referenced this pull request on Oct 22, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
akoumpa pushed a commit that referenced this pull request on Oct 24, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request on Nov 5, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Hainan Xu <[email protected]>
HuiyingLi pushed a commit to HuiyingLi/NeMo that referenced this pull request on Nov 15, 2024
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
ericharper added a commit that referenced this pull request on Nov 19, 2024
* nemo2-sft notebook initial draft Signed-off-by: HuiyingLi <[email protected]> * remove mixtral info Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * add import_ckpt script and minor changes Signed-off-by: HuiyingLi <[email protected]> * Random read for tarr files in lhotse dataloaders (#10536) * Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Solve failled tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Adding a testcase Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Some changs in tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * removing import Signed-off-by: Nune <[email protected]> --------- Signed-off-by: Nune <[email protected]> Signed-off-by: nune-tadevosyan <[email protected]> Co-authored-by: nune-tadevosyan <[email protected]> * training code for hybrid-autoregressive inference model (#10841) * training code for hybrid-autoregressive inference model Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> --------- Signed-off-by: Hainan Xu <[email protected]> Signed-off-by: hainan-xv <[email protected]> Co-authored-by: Hainan Xu <[email protected]> Co-authored-by: hainan-xv <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! 
(#10871) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Use trainer.local_rank/global_rank (#10860) * fix global_rank calculation Signed-off-by: Alexandros Koumparoulis <[email protected]> * use trainer's global/local rank Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove stacking operation from batched functions (#10524) * remove stacking operations Signed-off-by: lilithgrigoryan <[email protected]> * fixes im base class Signed-off-by: lilithgrigoryan <[email protected]> * clean up Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * remove potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * restore batch_intilize states funcname Signed-off-by: lilithgrigoryan <[email protected]> * fix typo Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable in stateless transduser Signed-off-by: lilithgrigoryan <[email protected]> * fix test Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * fix docstring, rm comment Signed-off-by: lilithgrigoryan <[email protected]> * fix dosctrings Signed-off-by: lilithgrigoryan <[email protected]> --------- Signed-off-by: lilithgrigoryan <[email protected]> Signed-off-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> * [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471) * Add llm.generate Signed-off-by: Hemil Desai <[email protected]> * Remove comment Signed-off-by: Hemil Desai <[email protected]> * Apply 
isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix launching with python Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add assert cp Signed-off-by: Hemil Desai <[email protected]> * Add example script Signed-off-by: Hemil Desai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Adding support for LightningDataModule inside Fabric-API (#10879) * Make FabricMegatronMixedPrecision match MegatronMixedPrecision Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Supporting DataModule in fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Adding support for LightningDataModule inside Fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Remove import in mock.py Signed-off-by: Marc Romeijn <[email protected]> --------- Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * initial draft Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Save yaml config for model in nemo.lightning.io (#10765) * Save yaml config for model in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * fix bug Signed-off-by: Hemil Desai <[email protected]> * Add explicit yaml comparison Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * relax test Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Move collectiob.nlp imports inline for t5 (#10877) * Move collectiob.nlp imports inline for t5 Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * add world_size/pp_size runtime check (#10842) * add world_size/pp_size runtime check Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix msg precision Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix test_init_parallel_ranks ws=3 pp=3 Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix peft resume (#10887) Signed-off-by: Chen Cui <[email protected]> * Update engine build step for TRT-LLM 0.13.0 (#10880) * Setting use_fused_mlp for TRT-LLM >= 0.13.0 Signed-off-by: Jan Lasek <[email protected]> * Unused import removal Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * Akoumparouli/nemo ux moe loss logging (#10128) * Move across pipeline loss reduction to a separate function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add support for MoE loss logging Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email 
protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * enable vboost and set LM SM margin (#10853) * enable vboost Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * env vars Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * add perf plugin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * revert default executor Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * fix typo Signed-off-by: Jimmy Zhang <[email protected]> * fix more typo Signed-off-by: Jimmy Zhang <[email protected]> * ln margin knob Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * specify lm margin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: JimmyZhang12 <[email protected]> Co-authored-by: malay-nagda <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> * use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608) * use _get_extra_te_kwargs_meta in fabric (call mcore's 
_get_extra_te_kwargs & overwrite device) Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Use torch sdpa implementation in ASR mha (#9590) * use pytorch sdpa Signed-off-by: WoodieDudy <[email protected]> * sdpa work Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: titu1994 <[email protected]> * sdpa flag to false & sdpa_backend arg Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * change arg name Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * fix config args Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * add condition on version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * update condition on version Signed-off-by: WoodieDudy <[email protected]> * remove condition on torch version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * move code to init Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> --------- Signed-off-by: WoodieDudy <[email protected]> Signed-off-by: titu1994 <[email protected]> Signed-off-by: WoodieDudy <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: titu1994 <[email protected]> Co-authored-by: WoodieDudy <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861) * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Remove cyclic import Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: artbataev <[email protected]> * call __post_init__ after altering config values (#10885) * call __post_init__ after altering config values Signed-off-by: Alexandros Koumparoulis <[email protected]> * test fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * turn off SP Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * Nemo 2.0 ckpt support in TRT-LLM export (#10891) * fix minor import bug Signed-off-by: Onur Yilmaz <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * nemo 2.0 support in export to trt-llm Signed-off-by: Onur Yilmaz <[email protected]> * get mixing from main Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * fix style Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> * [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171) * various simple docs source fixes Signed-off-by: Elena Rastorgueva <[email protected]> * fix docstrings and typing with forward reference Signed-off-by: Elena Rastorgueva <[email protected]> * Apply isort and black reformatting Signed-off-by: erastorgueva-nv <[email protected]> * fix typing forward reference for PromptedAudioToTextLhotseDataset Signed-off-by: Elena Rastorgueva <[email protected]> * fix feature warnings Signed-off-by: yaoyu-33 <[email protected]> * Try fix some model part errors Signed-off-by: yaoyu-33 <[email protected]> * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix indent in docstring Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * update Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: 
yaoyu-33 <[email protected]> * fix imagen cite * fix ratio issues Signed-off-by: yaoyu-33 <[email protected]> * fix Dreambooth Signed-off-by: yaoyu-33 <[email protected]> * Fix activation recomputation Signed-off-by: yaoyu-33 <[email protected]> * fix sequence packing Signed-off-by: yaoyu-33 <[email protected]> * fix asr_language_modeling_and_customization Signed-off-by: yaoyu-33 <[email protected]> * fixes wip Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: erastorgueva-nv <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Yu Yao <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: erastorgueva-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Ao Tang <[email protected]> Co-authored-by: Huiying Li <[email protected]> * calculate step time batch end-batch end (#10202) * log step time at end Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * use nemo logging Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * cleanup Signed-off-by: Malay Nagda <[email protected]> * check remove Signed-off-by: Malay Nagda <[email protected]> * delta timing callback Signed-off-by: Malay Nagda <[email protected]> * comment and name change Signed-off-by: Malay Nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * late import prettytable (#10912) Signed-off-by: Maanu Grover <[email protected]> * [🤠]: 
Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Warning for missing FP8 checkpoint support for vLLM deployment (#10906) Signed-off-by: Jan Lasek <[email protected]> * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821) * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787) * Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching Signed-off-by: Nithin Rao Koluguri <nithinraok> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: nithinraok <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix ASR tests (#10794) * Make tests required Signed-off-by: Vladimir Bataev <[email protected]> * Debug torch.load issue Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Run only necessary tests Signed-off-by: Vladimir Bataev <[email protected]> * Try fix loading Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid caching fixture Signed-off-by: Vladimir Bataev <[email protected]> * Try restore model several times Signed-off-by: Vladimir Bataev <[email protected]> * Try customize temporary directory Signed-off-by: Vladimir Bataev <[email protected]> * Apply 
isort and black reformatting Signed-off-by: artbataev <[email protected]> * Reorder tests Signed-off-by: Vladimir Bataev <[email protected]> * Disable one test Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid xxlarge model Signed-off-by: Vladimir Bataev <[email protected]> * Disable test Signed-off-by: Vladimir Bataev <[email protected]> * Revert changes Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Magic fix Signed-off-by: Vladimir Bataev <[email protected]> * Revert unnecessary changes Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Disable all jobs except L0 Signed-off-by: Vladimir Bataev <[email protected]> * RNNT alignments - merge with unit tests Signed-off-by: Vladimir Bataev <[email protected]> * Fix CUDA graph frame-looping decoder to handle non-CUDA inputs Signed-off-by: Vladimir Bataev <[email protected]> * Fix config Signed-off-by: Vladimir Bataev <[email protected]> * Log test results Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Use less audio files for tests Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: artbataev <[email protected]> * Integrating mcore export (#10238) * Integrating mcore export * Integrating mcore export * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Move trt imports in nemo.collections.llm inside respective functions (#10234) Signed-off-by: Hemil Desai <[email protected]> * Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198) * Add tests for LazyNeMoIterator and fix case with 
manifest_only=True and offsets in manifest Signed-off-by: Piotr Żelasko <[email protected]> * Address code review Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939) * perfor serialization using relative paths to allow users to move checkpoints after they're saved Signed-off-by: ashors1 <[email protected]> * Apply isort and black reformatting Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> * fix artifact load Signed-off-by: ashors1 <[email protected]> * fix path artifact Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Co-authored-by: ashors1 <[email protected]> * Add MemoryProfileCallback (#10166) * Add MemoryProfileCallback Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Remove reference cycles, save snapshot on specific ranks Signed-off-by: Shriya Palsamudram <[email protected]> * Remove unnecessary imports Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Update docstring Signed-off-by: Shriya Palsamudram <[email protected]> --------- Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> * Lower bound transformers to support nemotron (#10240) Signed-off-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> 
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052) Flow matching generative model with SSL pretraining framework Signed-off-by: Pin-Jui Ku <[email protected]> Co-authored-by: Kuray107 <[email protected]> * Revert torchrun fix for model import (#10251) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [NeMo-UX[ Move nemotron imports inline (#10255) * Move nemotron transformers + tokenizer imports inline to reduce number of required deps Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * Wrap CPU model init with megatron_lazy_init_context (#10219) * Wrap CPU model init with megatron_lazy_init_context Signed-off-by: Alexandros Koumparoulis <[email protected]> * Cleanup checkpoint-dir if saving fails Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Bump `Dockerfile.ci` (2024-08-22) (#10227) * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff ! Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix bert flags Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * salm export trtllm (#10245) Signed-off-by: slyne deng <[email protected]> Co-authored-by: slyne deng <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! 
(#10250) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * Load model in the target export precision by default in PTQ (#10267) * Load model in the target export precision by default Signed-off-by: Jan Lasek <[email protected]> * Enable megatron_amp_O2=true to actually use half-precision Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223) * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Remove duplicate Signed-off-by: Hemil Desai <[email protected]> * Add entity to wandb logger Signed-off-by: Hemil Desai <[email protected]> * Add documentation Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add warning Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * [NeMo-UX] Handle absolute logger 
directories in nemo_logger (#10259) * handle absolute and relative logger directories Signed-off-by: Anna Shors <[email protected]> * merge lines Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: Anna Shors <[email protected]> Signed-off-by: ashors1 <[email protected]> * Add sdxl notebook (#10139) * Add sdxl notebook Signed-off-by: mingyuanm <[email protected]> * Rename Signed-off-by: mingyuanm <[email protected]> * final Update SDXL notebook Signed-off-by: mingyuanm <[email protected]> --------- Signed-off-by: mingyuanm <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Small change * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * ADD support for layernorm1p * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> --------- Signed-off-by: shanmugamr1992 <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Signed-off-by: Dong Hyuk Chang <[email protected]> Signed-off-by: Pin-Jui Ku <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> 
Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: slyne deng <[email protected]> Signed-off-by: oliver könig <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Anna Shors <[email protected]> Signed-off-by: mingyuanm <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: shanmugamr1992 <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Anna Shors <[email protected]> Co-authored-by: ashors1 <[email protected]> Co-authored-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Slyne Deng <[email protected]> Co-authored-by: slyne deng <[email protected]> Co-authored-by: Jan Lasek <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Ming <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> * Fix artifact saving (#10914) Signed-off-by: Hemil Desai <[email protected]> * Lora improvement (#10918) * pull out freeze model Signed-off-by: Chen Cui <[email protected]> * add wildcard match to lora target modules Signed-off-by: Chen Cui <[email protected]> 
--------- Signed-off-by: Chen Cui <[email protected]> * Huvu/t5 nemo2.0 peft (#10916) * adding peft test and cicd * add setting mcore model to train in peft.py * adding test for T5 lora * fix follow Chen's fix * restore cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> * Add tie_word_embeddings=True (#10710) Signed-off-by: Yoshi Suhara <[email protected]> * Use a context-manager when opening files (#10895) * Use a context-manager when opening files Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: artbataev <[email protected]> * long context performance numbers in doc (#10784) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email 
protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm from __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * change the figure file name Signed-off-by: Youngeun Kwon <[email protected]> * Accommodating the reviewer's comment Signed-off-by: Youngeun Kwon <[email protected]> * update the y-axis title Signed-off-by: Youngeun Kwon <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294) * Add ModelOpt transformer model pruning example for Llama3 model Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * examples code is at wrong dir, move them Signed-off-by: Shengliang Xu <[email protected]> * changes as suggested in comment remove some logging and unused config code, update example model to llama3.1 Signed-off-by: Shengliang Xu <[email protected]> * Add pruning of hidden_size into example Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml Signed-off-by: Keval Morabia <[email protected]> * 
Add pruning test to cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> --------- Signed-off-by: Shengliang Xu <[email protected]> Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Keval Morabia <[email protected]> Co-authored-by: shengliangxu <[email protected]> Co-authored-by: Keval Morabia <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Update mamba.rst after dist ckpt addition (#10800) Signed-off-by: Ali Taghibakhshi <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * fix chunked infer (#10581) Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * fix state transform (#10728) Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * use ckpt_to_weights_subdir in restore (#10786) * use ckpt_to_weights_subdir in restore Signed-off-by: Alexandros Koumparoulis <[email protected]> * make ckpt_to_{weight,context}_subdir idempotent Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Mixtral set seq_length=4k (#10704) * enable SP & set seq_lenght=4k Signed-off-by: Alexandros Koumparoulis <[email protected]> * update test expected values Signed-off-by: Alexandros Koumparoulis <[email protected]> * 8x22b 4k Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- 
Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Fix for crashes with tensorboard_logger=false and VP + LoRA (#10792) * Fix for crashes with tensorboard_logger=false and virtual pipeline parallel + LoRA Signed-off-by: Valerie Sarge <[email protected]> * Apply isort and black reformatting Signed-off-by: vysarge <[email protected]> --------- Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: vysarge <[email protected]> Co-authored-by: vysarge <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Disable checkpoint conversion inside AutoResume (#10645) * Disable checkpoint conversion inside AutoResume Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Update resume docstrings Signed-off-by: Hemil Desai <[email protected]> * fix Signed-off-by: Hemil Desai <[email protected]> * add default finetuning recipe and refactor llama3 8b recipe Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * address comment Signed-off-by: Chen Cui <[email protected]> * refactor other recipes Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * remove 8x3b finetuning recipe for now because HF version not available Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * adjust unit tests based on recipe fixes Signed-off-by: Chen Cui <[email protected]> * fix failed unit test Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: cuichenx 
<[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * replace png file to github assets Signed-off-by: Youngeun Kwon <[email protected]> * change image url to github release Signed-off-by: Youngeun Kwon <[email protected]> --------- Signed-off-by: Youngeun Kwon <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Shengliang Xu <[email protected]> Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Keval Morabia <[email protected]> Signed-off-by: Ali Taghibakhshi <[email protected]> Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: vysarge <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Shengliang Xu <[email protected]> Co-authored-by: shengliangxu <[email protected]> Co-authored-by: Keval Morabia <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: vysarge <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: cuichenx <[email protected]> * perf recipes and Mcore DistOpt params (#10883) * 175b gpt3 recipe Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * dist opt params Signed-off-by: Malay Nagda <[email 
protected]> * 405b dist opt params Signed-off-by: Malay Nagda <[email protected]> * perf recipes and dist opt params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * MoE dist opt params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * gpt bias fusion params Signed-off-by: Malay Nagda <[email protected]> * 175b recipe Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * perf params comments Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * MoE perf params comments Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * perf recipes suffix Signed-off-by: Malay Nagda <[email protected]> * specific models fusion params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * ci: Fix cherry pick team (#10945) Signed-off-by: Oliver Koenig <[email protected]> * Packed sequence bug fixes (#10898) * save prepared dataset to different folders according to tokenizer name Signed-off-by: Chen Cui <[email protected]> * fix hang Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * fix hang Signed-off-by: Chen Cui <[email protected]> * raise mbs>1 error and provide suggestion to user instead of automatically changing config Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting 
Signed-off-by: cuichenx <[email protected]> * add ci for packed seq Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * fix bug Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: cuichenx <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix requirements for MacOS (#10930) Signed-off-by: Vladimir Bataev <[email protected]> * Fix nemo 2.0 recipes (#10915) * Fix recipe num_nodes and long context docstring * Fix typo * Fix PP issue * Fix unit test * Change recipes * fix test * Fix unit tests * Fix recipes * Add general legal test on parallelization settings * Rename test * Apply isort and black reformatting Signed-off-by: BoxiangW <[email protected]> --------- Signed-off-by: BoxiangW <[email protected]> Co-authored-by: BoxiangW <[email protected]> * Akoumparouli/nemo ux fix dir or string artifact (#10936) * Add __repr__ to Artifact Signed-off-by: Alexandros Koumparoulis <[email protected]> * nemo.lightning.io.artifact: represent strings as fdl.Config to avoid path adjustment during restoration Signed-off-by: Alexandros Koumparoulis <[email protected]> * t5 test minification Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * ckpt convert bug fixes (#10878) * Mistral-NeMo-12B recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * rename mistral to mistral_7b Signed-off-by: Alexandros Koumparoulis <[email protected]> * include mistral_nemo_12b in __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa 
<[email protected]> * add to __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Remove stale imports Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2 Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove finetune_reci[e Signed-off-by: Alexandros Koumparoulis <[email protected]> * Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion Signed-off-by: Alexandros Koumparoulis <[email protected]> * update config names in tests Signed-off-by: Alexandros Koumparoulis <[email protected]> * mistral-nemo-12b from llama_8b Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2; SP=True Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix overlap value Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * update mistral-nemo-base-12b finetune recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * bug fix Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> * remove extra file Signed-off-by: dimapihtar <[email protected]> * remove extra changes Signed-off-by: dimapihtar <[email protected]> * revert changes Signed-off-by: dimapihtar <[email protected]> * add ckpt_format configurable Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * revert changes Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> 
Signed-off-by: dimapihtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: artbataev <[email protected]> * fix typo in docstring (#10955) Signed-off-by: ashors1 <[email protected]> * remove deprecated ci tests (#10922) * remove deprecated tutorial Signed-off-by: dimapihtar <[email protected]> * remove deprecated ci tests Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * remove bart tests Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: dimapihtar <[email protected]> * [Nemo CICD] Remove deprecated tests (#10960) * remove deprecated tutorial Signed-off-by: dimapihtar <[email protected]> * remove deprecated ci tests Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * remove bart tests Signed-off-by: dimapihtar <[email protected]> * Remove deleted CI tests --------- Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Pablo Garay <[email protected]> Co-authored-by: dimapihtar <[email protected]> * Adithyare/oai chat completion (#10785) * updates Signed-off-by: adithyare <[email protected]> * open ai chat completion wip Signed-off-by: adithyare <[email protected]> * responding with model responses Signed-off-by: adithyare <[email protected]> * Apply isort and black reformatting Signed-off-by: arendu <[email protected]> * also support general completion Signed-off-by: adithyare <[email protected]> * Apply isort and black reformatting Signed-off-by: arendu <[email protected]> --------- Signed-off-by: adithyare <[email protected]> Signed-off-by: arendu 
<[email protected]> Co-authored-by: arendu <[email protected]> * Update megatron_t5_pretraining.py (#10952) Signed-off-by: Huy Vu <[email protected]> * Convert perf plugin env vars to strings (#10947) Signed-off-by: Hemil Desai <[email protected]> * disable dynamo for ddp checker (#10961) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Mistral-NeMo-12B recipe (#10607) * Mistral-NeMo-12B recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * rename mistral to mistral_7b Signed-off-by: Alexandros Koumparoulis <[email protected]> * include mistral_nemo_12b in __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * add to __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Remove stale imports Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2 Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove finetune_reci[e Signed-off-by: Alexandros Koumparoulis <[email protected]> * Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion Signed-off-by: Alexandros Koumparoulis <[email protected]> * update config names in tests Signed-off-by: Alexandros Koumparoulis <[email protected]> * mistral-nemo-12b from llama_8b Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2; SP=True Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix overlap value Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * update mistral-nemo-base-12b finetune recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * 
Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Make nemo text processing optional in TTS (#10584) * move TN guard to better location; make guard print error message rather than throwing error Signed-off-by: Jason <[email protected]> * Apply isort and black reformatting Signed-off-by: blisc <[email protected]> * Forgot to add the actual normalizer Signed-off-by: Jason <[email protected]> * Apply isort and black reformatting Signed-off-by: blisc <[email protected]> --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Co-authored-by: blisc <[email protected]> * respect warnings' filters (#10953) * respect warnings' filters Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Update T5 tokenizer (adding additional tokens to tokenizer config) (#10972) * initial commit * restore t5_pretraining * Apply isort and black reformatting Signed-off-by: huvunvidia <[email protected]> --------- Signed-off-by: huvunvidia <[email protected]> Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: huvunvidia <[email protected]> * Alit/mamba recipe (#10935) * add some mamba recipe * add 130m * add the rest of the recipes * add tokenizer * add tokenizer * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * add fixes to ssm for nemorun recipes * add hybrid tokenizer * updating some recipes * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * remove comments * update gbs * fix ckpt resume * fix ckpt resume * fix 
ckpt resume * update recipes final * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * remove redundant imports * ckpt convertor dtype fix * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> --------- Signed-off-by: JRD971000 <[email protected]> Signed-off-by: Ali Taghibakhshi <[email protected]> Co-authored-by: JRD971000 <[email protected]> * Long context performance doc hot fix (#10946) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm fr…
ShriyaPalsamudram
added a commit
that referenced
this pull request
Dec 2, 2024
Signed-off-by: Shriya Palsamudram <[email protected]> Fix FaultTolerencePlugin Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Add StragglerDetection callback to all NeMo2.0 recipes Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Add missing and remove unsued imports Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Add ft launcher test Signed-off-by: Shriya Palsamudram <[email protected]> fix typo Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> fix more typos Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> add ft launcher using nemo-run for llama3 test Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> fix serialization errors Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> create seperate ft test Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> change github actions test Signed-off-by: Shriya Palsamudram <[email protected]> draft crash simulation Signed-off-by: Shriya Balaji Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Simulate a crash using step, disable checkpointing Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Add a straggler detection test as well 
Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Revert enabling straggler_detection by default in all recipes Signed-off-by: Shriya Palsamudram <[email protected]> Remove unused imports Signed-off-by: Shriya Palsamudram <[email protected]> Remove extra check in ConfigValidationPlugin Signed-off-by: Shriya Palsamudram <[email protected]> Address pylinter issues Signed-off-by: Shriya Palsamudram <[email protected]> Improve straggler detection testing and add doc string Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> fix paths Signed-off-by: Shriya Palsamudram <[email protected]> Add assert for crash Signed-off-by: Shriya Palsamudram <[email protected]> Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> Append run logs to a file after a crash Signed-off-by: Shriya Palsamudram <[email protected]> Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH Signed-off-by: Shriya Palsamudram <[email protected]> Add openai-gelu in gated activation (#11293) Fixes per comments (#11280) * Fixes per comments Signed-off-by: Gomathy Venkata Krishnan <[email protected]> * Update README Signed-off-by: Gomathy Venkata Krishnan <[email protected]> --------- Signed-off-by: Gomathy Venkata Krishnan <[email protected]> Add T5TTS (#11193) * added training and inference recipes for T5-TTS. * fix some attention errors * add copyright headers. * added TODO and detail error log info. * fixed missing a corner case. * added classes to __all__ * fixed to return either self-attention scores or cross-attention scores in ParallelTransformerLayer_ class. 
Signed-off-by: XuesongYang <[email protected]> --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: XuesongYang <[email protected]> Co-authored-by: blisc <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: XuesongYang <[email protected]> ci: Exclude CPU machines from scan (#11300) Signed-off-by: Oliver Koenig <[email protected]> Revert "fix(export): GPT models w/ bias=False convert properly (#11255)" (#11301) This reverts commit 2d4f4953881b9e2d118d3ffeba7e64625d827d11. remove redundant docs (#11302) Create phi3mini.py (#11281) * Create phi3mini.py Signed-off-by: mayani-nv <[email protected]> Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> Update __init__.py Signed-off-by: mayani-nv <[email protected]> Update __init__.py Signed-off-by: mayani-nv <[email protected]> Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> * Create phi3_mini_4k_instruct.py for adding to recipe Signed-off-by: mayani-nv <[email protected]> Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> Update phi3_mini_4k_instruct.py and removed Performant recipe Signed-off-by: mayani-nv <[email protected]> Update phi3_mini_4k_instruct.py and removing performant condition Signed-off-by: mayani-nv <[email protected]> Update phi3_mini_4k_instruct.py with docstring changes Signed-off-by: mayani-nv <[email protected]> * Update __init__.py Signed-off-by: mayani-nv <[email protected]> * fixing pylint warnings * Apply isort and black reformatting Signed-off-by: mayani-nv <[email protected]> * correcting typos and adding working recipe files --------- Signed-off-by: mayani-nv <[email protected]> Signed-off-by: mayani-nv <[email protected]> Co-authored-by: Chen Cui <[email protected]> 
Co-authored-by: mayani-nv <[email protected]> Integrate lm-eval-harness for evaluations in NeMo (#10621) * Add evaluate method and other minor fixes Signed-off-by: Abhishree <[email protected]> * Add inference params to evaluate method Signed-off-by: Abhishree <[email protected]> * Add wait_for_rest_service fn to evaluate method Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> * Add logprobs to be returned by Pytriton for trtllm models Signed-off-by: Abhishree <[email protected]> * Increase max_retries in wait_for_rest_service method Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> * Add unset slurm vars and use env vars for Triton args Signed-off-by: Abhishree <[email protected]> * Add logic to get logProbs from logits Signed-off-by: Abhishree <[email protected]> * Refactor, clean and organize the code 1) Refactors the code and creates an evaluation folder where all util methods live 2) Add doctsrings, comments 3) Expose gather_context_logits, gather_generation_logits in trtllm and add output_generation_logits flag to return generation logits and remove output_logporbs as its not getting used anymore Signed-off-by: Abhishree <[email protected]> * Add copyright and initialize special_tokens_kwargs in eval_utils.py Signed-off-by: Abhishree <[email protected]> * Add the following chanes 1) Move get_trtllm_deployable and unset_environment_variables to deploy base.py 2) Rename eval_utils.py to base.py 3) REstore scripts/export/convert_nemo2_for_export.py Signed-off-by: Abhishree <[email protected]> * Fix a minor typo Signed-off-by: Abhishree <[email protected]> * Revert output_log_probs and all_probs arg in tensorrt_llm_run.py Signed-off-by: Abhishree <[email protected]> * Fix docstrings formatting Signed-off-by: Abhishree <[email protected]> * Pylint and other minor fixes Signed-off-by: Abhishree <[email protected]> 
* Fix pylint and typos Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> * Avoid multiple calls for tokenizer_type Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> * Replace print statements with logging statements Signed-off-by: Abhishree <[email protected]> * Apply isort and black reformatting Signed-off-by: athitten <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: athitten <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: athitten <[email protected]> Co-authored-by: Ananth Subramaniam <[email protected]> ci: Fix release workflow (#11286) * ci: Fix release workflow Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * Update .github/workflows/release.yml Signed-off-by: oliver könig <[email protected]> --------- Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: oliver könig <[email protected]> Update import 'pytorch_lightning' -> 'lightning.pytorch' (#11252) * update import in collections/llm Signed-off-by: Maanu Grover <[email protected]> * update import in lightning Signed-off-by: Maanu Grover <[email protected]> * update fabric import in lightning Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/asr Signed-off-by: Maanu Grover <[email protected]> * update import in collections/tts Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update requirements Signed-off-by: Maanu Grover <[email protected]> * unused imports 
Signed-off-by: Maanu Grover <[email protected]> * update import in tests Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/common Signed-off-by: Maanu Grover <[email protected]> * update import in core Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in utils Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/nlp Signed-off-by: Maanu Grover <[email protected]> * update fabric import in collections/nlp Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update fabric import in utils Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in nlp examples Signed-off-by: Maanu Grover <[email protected]> * update import in asr examples Signed-off-by: Maanu Grover <[email protected]> * update import in llm examples Signed-off-by: Maanu Grover <[email protected]> * update import in tts examples Signed-off-by: Maanu Grover <[email protected]> * update fabric import in nlp examples Signed-off-by: Maanu Grover <[email protected]> * update import in deploy Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in slu examples Signed-off-by: Maanu Grover <[email protected]> * update import in speaker_tasks examples Signed-off-by: Maanu Grover <[email protected]> * update import in collections/audio Signed-off-by: Maanu Grover <[email protected]> * update import in audio examples Signed-off-by: Maanu Grover <[email protected]> * update import in collections/llm Signed-off-by: Maanu Grover <[email protected]> * 
Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in collections/vlm Signed-off-by: Maanu Grover <[email protected]> * update import in collections/diffusion Signed-off-by: Maanu Grover <[email protected]> * update import in collections/vision Signed-off-by: Maanu Grover <[email protected]> * update import in collections/multimodal Signed-off-by: Maanu Grover <[email protected]> * update import in multimodal examples Signed-off-by: Maanu Grover <[email protected]> * update import in vision examples Signed-off-by: Maanu Grover <[email protected]> * Apply isort and black reformatting Signed-off-by: maanug-nv <[email protected]> * update import in scripts Signed-off-by: Maanu Grover <[email protected]> * Update baseline Signed-off-by: maanug-nv <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * revert bad change Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: maanug-nv <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: maanug-nv <[email protected]> Co-authored-by: artbataev <[email protected]> fix perf plugin CUDA_DEVICE_MAX_CONNECTIONS setting (#11299) * fix Signed-off-by: Jimmy Zhang <[email protected]> * Docstrings Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> --------- Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> PTQ via NeMo-Run CLI (#10984) * PTQ support in nemo CLI Signed-off-by: Jan Lasek <[email protected]> * Naming engine vs checkpoint Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> PTQ memory optimization (#11257) * Initial commit Signed-off-by: Piotr Kaminski <[email protected]> 
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Add sample generate Signed-off-by: Piotr Kaminski <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Nemotron quantization, reduce diff Signed-off-by: Piotr Kaminski <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> * Reduce diff Signed-off-by: Piotr Kaminski <[email protected]> * code review suggestions Signed-off-by: Piotr Kaminski <[email protected]> * Bug fixes Signed-off-by: Piotr Kaminski <[email protected]> * remove not needed import Signed-off-by: Piotr Kaminski <[email protected]> * fix model type and allow ddp/optim setup Signed-off-by: Piotr Kaminski <[email protected]> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]> --------- Signed-off-by: Piotr Kaminski <[email protected]> Signed-off-by: Laplasjan107 <[email protected]> Signed-off-by: Piotr Kamiński <[email protected]> Co-authored-by: Piotr Kaminski <[email protected]> Co-authored-by: Laplasjan107 <[email protected]> Co-authored-by: Jan Lasek <[email protected]> update README.md (#11223) Signed-off-by: yaoyu-33 <[email protected]> Add `attention_bias` argument in transformer block and transformer layer modules, addressing change in MCore (#11289) * fix api Signed-off-by: yaoyu-33 <[email protected]> * fix ci Signed-off-by: yaoyu-33 <[email protected]> * add docstring Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix docstring2 Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix line too long Signed-off-by: yaoyu-33 <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> 
Co-authored-by: yaoyu-33 <[email protected]> Remove pytorch-lightning (#11306) * update import in docs Signed-off-by: Maanu Grover <[email protected]> * update import in tutorials Signed-off-by: Maanu Grover <[email protected]> * remove pl requirement Signed-off-by: Maanu Grover <[email protected]> * missed import updates Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Adding multimodal examples (#11279) * Adding multimodal examples * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> --------- Signed-off-by: shanmugamr1992 <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: shanmugamr1992 <[email protected]> Update T5 attention-mask shapes to be compatible with all attention-backend in new TE versions (#11059) * initial commits * updating cicd test * commit for FlashFused T5 from Mcore * testing CICD * update code for data/mock, update mcore commit for dockerfile * fix error * fix error * fix error in nemo/collections/llm/inference/base.py * update t5/data/mock.py * fix cicd error * remove unused libs * address Yu Yao's comments * Apply isort and black reformatting Signed-off-by: huvunvidia <[email protected]> --------- Signed-off-by: huvunvidia <[email protected]> Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: huvunvidia <[email protected]> Add HF untrusted code toggle (#11313) * add trust_remote_code toggle Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> P2p chunk size setting in nemo 2.0 (#11312) * NCCL P2P communication chunk size Signed-off-by: Sangkug Lym <[email protected]> * NCCL P2P communication chunk size Signed-off-by: Sangkug Lym <[email protected]> --------- 
Signed-off-by: Sangkug Lym <[email protected]> Nemo2 batcheval (#11158) * initial draft for eval api Signed-off-by: HuiyingLi <[email protected]> * add dp to generate Signed-off-by: HuiyingLi <[email protected]> * Apply isort and black reformatting Signed-off-by: HuiyingLi <[email protected]> * add top_k=1 to default inf param to get deterministic output Signed-off-by: HuiyingLi <[email protected]> * change name Signed-off-by: HuiyingLi <[email protected]> * add eval ds and write to file to llm.generate Signed-off-by: HuiyingLi <[email protected]> * support standalone input jsonl Signed-off-by: HuiyingLi <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: HuiyingLi <[email protected]> DoRA (#11104) * initial commit for DoRA Signed-off-by: Chen Cui <[email protected]> * clean up code Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * clean up Signed-off-by: Chen Cui <[email protected]> * fix TP Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * add dropout correction term Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * add copyright and doc strings Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * fix Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * docstrings Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * docstrings Signed-off-by: Chen Cui <[email protected]> * add ci test Signed-off-by: Chen Cui <[email protected]> * add ci test Signed-off-by: Chen Cui <[email protected]> * typo Signed-off-by: Chen Cui <[email protected]> * 
remove unused code Signed-off-by: Chen Cui <[email protected]> * remove commented out code Signed-off-by: Chen Cui <[email protected]> * fix Signed-off-by: Chen Cui <[email protected]> * bug Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: cuichenx <[email protected]> Profiling - support Chakra & Kineto trace dumping (#11115) * Support chakra trace dumping by cfg Signed-off-by: Lily Wang <[email protected]> remove the manual recording of process::init Signed-off-by: Lily Wang <[email protected]> 1. Remove unnecessary kineto config 2. Fix typo Signed-off-by: Lily Wang <[email protected]> Change warning to exception when nsys is enabled with chakra profiling Signed-off-by: Lily Wang <[email protected]> * Apply isort and black reformatting Signed-off-by: pablo-garay <[email protected]> * fix bug in identifying profiling start step Signed-off-by: Lily Wang <[email protected]> * Update baseline Signed-off-by: lilyw97 <[email protected]> * [1]remove unused import [2]switch to use isinstance instead of type() [3]move torch.profiling to function Signed-off-by: Lily Wang <[email protected]> * Apply isort and black reformatting Signed-off-by: lilyw97 <[email protected]> --------- Signed-off-by: Lily Wang <[email protected]> Signed-off-by: pablo-garay <[email protected]> Signed-off-by: lilyw97 <[email protected]> Signed-off-by: Maanu Grover <[email protected]> Co-authored-by: Lily Wang <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: lilyw97 <[email protected]> Co-authored-by: Maanu Grover <[email protected]> NeMo 2.0 SFT PEFT notebooks (#10874) * nemo2-sft notebook initial draft Signed-off-by: HuiyingLi <[email protected]> * remove mixtral info Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi 
<[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * add import_ckpt script and minor changes Signed-off-by: HuiyingLi <[email protected]> * Random read for tarr files in lhotse dataloaders (#10536) * Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Solve failled tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Adding a testcase Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Some changs in tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * removing import Signed-off-by: Nune <[email protected]> --------- Signed-off-by: Nune <[email protected]> Signed-off-by: nune-tadevosyan <[email protected]> Co-authored-by: nune-tadevosyan <[email protected]> * training code for hybrid-autoregressive inference model (#10841) * training code for hybrid-autoregressive inference model Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> --------- Signed-off-by: Hainan Xu <[email protected]> Signed-off-by: hainan-xv <[email protected]> Co-authored-by: Hainan Xu <[email protected]> Co-authored-by: hainan-xv <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! 
(#10871) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Use trainer.local_rank/global_rank (#10860) * fix global_rank calculation Signed-off-by: Alexandros Koumparoulis <[email protected]> * use trainer's global/local rank Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove stacking operation from batched functions (#10524) * remove stacking operations Signed-off-by: lilithgrigoryan <[email protected]> * fixes in base class Signed-off-by: lilithgrigoryan <[email protected]> * clean up Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * remove potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * restore batch_intilize states funcname Signed-off-by: lilithgrigoryan <[email protected]> * fix typo Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable in stateless transducer Signed-off-by: lilithgrigoryan <[email protected]> * fix test Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * fix docstring, rm comment Signed-off-by: lilithgrigoryan <[email protected]> * fix docstrings Signed-off-by: lilithgrigoryan <[email protected]> --------- Signed-off-by: lilithgrigoryan <[email protected]> Signed-off-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> * [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471) * Add llm.generate Signed-off-by: Hemil Desai <[email protected]> * Remove comment Signed-off-by: Hemil Desai <[email protected]> * Apply 
isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix launching with python Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add assert cp Signed-off-by: Hemil Desai <[email protected]> * Add example script Signed-off-by: Hemil Desai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Adding support for LightningDataModule inside Fabric-API (#10879) * Make FabricMegatronMixedPrecision match MegatronMixedPrecision Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Supporting DataModule in fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Adding support for LightningDataModule inside Fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Remove import in mock.py Signed-off-by: Marc Romeijn <[email protected]> --------- Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * initial draft Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Save yaml config for model in nemo.lightning.io (#10765) * Save yaml config for model in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * fix bug Signed-off-by: Hemil Desai <[email protected]> * Add explicit yaml comparison Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * relax test Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Move collectiob.nlp imports inline for t5 (#10877) * Move collectiob.nlp imports inline for t5 Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * add world_size/pp_size runtime check (#10842) * add world_size/pp_size runtime check Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix msg precision Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix test_init_parallel_ranks ws=3 pp=3 Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix peft resume (#10887) Signed-off-by: Chen Cui <[email protected]> * Update engine build step for TRT-LLM 0.13.0 (#10880) * Setting use_fused_mlp for TRT-LLM >= 0.13.0 Signed-off-by: Jan Lasek <[email protected]> * Unused import removal Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * Akoumparouli/nemo ux moe loss logging (#10128) * Move across pipeline loss reduction to a separate function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add support for MoE loss logging Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email 
protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * enable vboost and set LM SM margin (#10853) * enable vboost Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * env vars Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * add perf plugin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * revert default executor Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * fix typo Signed-off-by: Jimmy Zhang <[email protected]> * fix more typo Signed-off-by: Jimmy Zhang <[email protected]> * ln margin knob Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * specify lm margin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: JimmyZhang12 <[email protected]> Co-authored-by: malay-nagda <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> * use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608) * use _get_extra_te_kwargs_meta in fabric (call mcore's 
_get_extra_te_kwargs & overwrite device) Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Use torch sdpa implementation in ASR mha (#9590) * use pytorch sdpa Signed-off-by: WoodieDudy <[email protected]> * sdpa work Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: titu1994 <[email protected]> * sdpa flag to false & sdpa_backend arg Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * change arg name Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * fix config args Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * add condition on version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * update condition on version Signed-off-by: WoodieDudy <[email protected]> * remove condition on torch version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * move code to init Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> --------- Signed-off-by: WoodieDudy <[email protected]> Signed-off-by: titu1994 <[email protected]> Signed-off-by: WoodieDudy <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: titu1994 <[email protected]> Co-authored-by: WoodieDudy <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861) * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Remove cyclic import Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: artbataev <[email protected]> * call __post_init__ after altering config values (#10885) * call __post_init__ after altering config values Signed-off-by: Alexandros Koumparoulis <[email protected]> * test fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * turn off SP Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * Nemo 2.0 ckpt support in TRT-LLM export (#10891) * fix minor import bug Signed-off-by: Onur Yilmaz <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * nemo 2.0 support in export to trt-llm Signed-off-by: Onur Yilmaz <[email protected]> * get mixing from main Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * fix style Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> * [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171) * various simple docs source fixes Signed-off-by: Elena Rastorgueva <[email protected]> * fix docstrings and typing with forward reference Signed-off-by: Elena Rastorgueva <[email protected]> * Apply isort and black reformatting Signed-off-by: erastorgueva-nv <[email protected]> * fix typing forward reference for PromptedAudioToTextLhotseDataset Signed-off-by: Elena Rastorgueva <[email protected]> * fix feature warnings Signed-off-by: yaoyu-33 <[email protected]> * Try fix some model part errors Signed-off-by: yaoyu-33 <[email protected]> * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix indent in docstring Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * update Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: 
yaoyu-33 <[email protected]> * fix imagen cite * fix ratio issues Signed-off-by: yaoyu-33 <[email protected]> * fix Dreambooth Signed-off-by: yaoyu-33 <[email protected]> * Fix activation recomputation Signed-off-by: yaoyu-33 <[email protected]> * fix sequence packing Signed-off-by: yaoyu-33 <[email protected]> * fix asr_language_modeling_and_customization Signed-off-by: yaoyu-33 <[email protected]> * fixes wip Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: erastorgueva-nv <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Yu Yao <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: erastorgueva-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Ao Tang <[email protected]> Co-authored-by: Huiying Li <[email protected]> * calculate step time batch end-batch end (#10202) * log step time at end Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * use nemo logging Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * cleanup Signed-off-by: Malay Nagda <[email protected]> * check remove Signed-off-by: Malay Nagda <[email protected]> * delta timing callback Signed-off-by: Malay Nagda <[email protected]> * comment and name change Signed-off-by: Malay Nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * late import prettytable (#10912) Signed-off-by: Maanu Grover <[email protected]> * [🤠]: 
Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Warning for missing FP8 checkpoint support for vLLM deployment (#10906) Signed-off-by: Jan Lasek <[email protected]> * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821) * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787) * Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching Signed-off-by: Nithin Rao Koluguri <nithinraok> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: nithinraok <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix ASR tests (#10794) * Make tests required Signed-off-by: Vladimir Bataev <[email protected]> * Debug torch.load issue Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Run only necessary tests Signed-off-by: Vladimir Bataev <[email protected]> * Try fix loading Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid caching fixture Signed-off-by: Vladimir Bataev <[email protected]> * Try restore model several times Signed-off-by: Vladimir Bataev <[email protected]> * Try customize temporary directory Signed-off-by: Vladimir Bataev <[email protected]> * Apply 
isort and black reformatting Signed-off-by: artbataev <[email protected]> * Reorder tests Signed-off-by: Vladimir Bataev <[email protected]> * Disable one test Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid xxlarge model Signed-off-by: Vladimir Bataev <[email protected]> * Disable test Signed-off-by: Vladimir Bataev <[email protected]> * Revert changes Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Magic fix Signed-off-by: Vladimir Bataev <[email protected]> * Revert unnecessary changes Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Disable all jobs except L0 Signed-off-by: Vladimir Bataev <[email protected]> * RNNT alignments - merge with unit tests Signed-off-by: Vladimir Bataev <[email protected]> * Fix CUDA graph frame-looping decoder to handle non-CUDA inputs Signed-off-by: Vladimir Bataev <[email protected]> * Fix config Signed-off-by: Vladimir Bataev <[email protected]> * Log test results Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Use less audio files for tests Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: artbataev <[email protected]> * Integrating mcore export (#10238) * Integrating mcore export * Integrating mcore export * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Move trt imports in nemo.collections.llm inside respective functions (#10234) Signed-off-by: Hemil Desai <[email protected]> * Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198) * Add tests for LazyNeMoIterator and fix case with 
manifest_only=True and offsets in manifest Signed-off-by: Piotr Żelasko <[email protected]> * Address code review Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939) * perfor serialization using relative paths to allow users to move checkpoints after they're saved Signed-off-by: ashors1 <[email protected]> * Apply isort and black reformatting Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> * fix artifact load Signed-off-by: ashors1 <[email protected]> * fix path artifact Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Co-authored-by: ashors1 <[email protected]> * Add MemoryProfileCallback (#10166) * Add MemoryProfileCallback Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Remove reference cycles, save snapshot on specific ranks Signed-off-by: Shriya Palsamudram <[email protected]> * Remove unnecessary imports Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Update docstring Signed-off-by: Shriya Palsamudram <[email protected]> --------- Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> * Lower bound transformers to support nemotron (#10240) Signed-off-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> 
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052) Flow matching generative model with SSL pretraining framework Signed-off-by: Pin-Jui Ku <[email protected]> Co-authored-by: Kuray107 <[email protected]> * Revert torchrun fix for model import (#10251) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [NeMo-UX[ Move nemotron imports inline (#10255) * Move nemotron transformers + tokenizer imports inline to reduce number of required deps Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * Wrap CPU model init with megatron_lazy_init_context (#10219) * Wrap CPU model init with megatron_lazy_init_context Signed-off-by: Alexandros Koumparoulis <[email protected]> * Cleanup checkpoint-dir if saving fails Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Bump `Dockerfile.ci` (2024-08-22) (#10227) * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff ! Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix bert flags Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * salm export trtllm (#10245) Signed-off-by: slyne deng <[email protected]> Co-authored-by: slyne deng <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! 
(#10250) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * Load model in the target export precision by default in PTQ (#10267) * Load model in the target export precision by default Signed-off-by: Jan Lasek <[email protected]> * Enable megatron_amp_O2=true to actually use half-precision Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223) * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Remove duplicate Signed-off-by: Hemil Desai <[email protected]> * Add entity to wandb logger Signed-off-by: Hemil Desai <[email protected]> * Add documentation Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add warning Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * [NeMo-UX] Handle absolute logger 
directories in nemo_logger (#10259) * handle absolute and relative logger directories Signed-off-by: Anna Shors <[email protected]> * merge lines Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: Anna Shors <[email protected]> Signed-off-by: ashors1 <[email protected]> * Add sdxl notebook (#10139) * Add sdxl notebook Signed-off-by: mingyuanm <[email protected]> * Rename Signed-off-by: mingyuanm <[email protected]> * final Update SDXL notebook Signed-off-by: mingyuanm <[email protected]> --------- Signed-off-by: mingyuanm <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Small change * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * ADD support for layernorm1p * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> --------- Signed-off-by: shanmugamr1992 <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Signed-off-by: Dong Hyuk Chang <[email protected]> Signed-off-by: Pin-Jui Ku <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> 
Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: slyne deng <[email protected]> Signed-off-by: oliver könig <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Anna Shors <[email protected]> Signed-off-by: mingyuanm <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: shanmugamr1992 <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Anna Shors <[email protected]> Co-authored-by: ashors1 <[email protected]> Co-authored-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Slyne Deng <[email protected]> Co-authored-by: slyne deng <[email protected]> Co-authored-by: Jan Lasek <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Ming <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> * Fix artifact saving (#10914) Signed-off-by: Hemil Desai <[email protected]> * Lora improvement (#10918) * pull out freeze model Signed-off-by: Chen Cui <[email protected]> * add wildcard match to lora target modules Signed-off-by: Chen Cui <[email protected]> 
--------- Signed-off-by: Chen Cui <[email protected]> * Huvu/t5 nemo2.0 peft (#10916) * adding peft test and cicd * add setting mcore model to train in peft.py * adding test for T5 lora * fix follow Chen's fix * restore cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> * Add tie_word_embeddings=True (#10710) Signed-off-by: Yoshi Suhara <[email protected]> * Use a context-manager when opening files (#10895) * Use a context-manager when opening files Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: artbataev <[email protected]> * long context performance numbers in doc (#10784) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email 
protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm from __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * change the figure file name Signed-off-by: Youngeun Kwon <[email protected]> * Accommodating the reviewer's comment Signed-off-by: Youngeun Kwon <[email protected]> * update the y-axis title Signed-off-by: Youngeun Kwon <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294) * Add ModelOpt transformer model pruning example for Llama3 model Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * examples code is at wrong dir, move them Signed-off-by: Shengliang Xu <[email protected]> * changes as suggested in comment remove some logging and unused config code, update example model to llama3.1 Signed-off-by: Shengliang Xu <[email protected]> * Add pruning of hidden_size into example Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml Signed-off-by: Keval Morabia <[email protected]> * 
Add pruning test to cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <2891698…
🚀 PR to bump `Dockerfile.ci`.
📝 Please remember the following to-do's before merge:
🙏 Please merge this PR only if the CI workflow completed successfully.