docs: fix linking of reinforce from index and add it to support table #489

Merged: 3 commits, Jan 27, 2025
12 changes: 12 additions & 0 deletions docs/user-guide/index.rst
@@ -7,6 +7,7 @@

sft.rst
knowledge-distillation.rst
reinforce.rst
dpo.rst
rlhf.rst
steerlm.rst
@@ -25,6 +26,9 @@
:ref:`Supervised Fine-Tuning (SFT) with Knowledge Distillation <nemo-aligner-knowledge-distillation>`
In this section, we walk through a variation of SFT using Knowledge Distillation where we train a smaller "student" model using a larger "teacher" model.

:ref:`Model Alignment by REINFORCE <nemo-aligner-reinforce>`
In this tutorial, we will guide you through the process of aligning a NeMo Framework model using REINFORCE. This method can be applied to various models, including LLaMa2 and Mistral, with our scripts functioning consistently across different models.

:ref:`Model Alignment by DPO, RPO and IPO <nemo-aligner-dpo>`
DPO, RPO, and IPO are simpler alignment methods compared to RLHF. DPO introduces a novel parameterization of the reward model in RLHF, which allows us to extract the corresponding optimal policy. Similarly, RPO and IPO provide alternative parameterizations or optimization strategies, each contributing unique approaches to refining model alignment.

@@ -75,6 +79,14 @@
- Yes
- Yes
-
* - :ref:`REINFORCE <nemo-aligner-reinforce>`
- Yes
- Yes
- Yes
- Yes (✓)
- Yes
- Yes
-
* - :ref:`DPO <nemo-aligner-dpo>`
-
- Yes (✓)
2 changes: 1 addition & 1 deletion docs/user-guide/reinforce.rst
@@ -1,6 +1,6 @@
.. include:: /content/nemo.rsts

-.. _model-aligner-reinforce:
+.. _nemo-aligner-reinforce:

Model Alignment by REINFORCE
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
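
The label rename in this hunk is what lets the ``:ref:`` entries added to ``index.rst`` resolve, since both reference ``nemo-aligner-reinforce``. As an illustration only (not part of this PR), here is a minimal sketch of a check that flags ``:ref:`` targets with no matching label. ``find_unresolved_refs`` is a hypothetical helper. It assumes labels are written as ``.. _name:`` on their own line and that references use the titled form ``:ref:`Title <name>```. It does not reproduce Sphinx's full resolution rules, such as label normalization or untitled references.

```python
import re

# Explicit RST target on its own line, e.g. ".. _nemo-aligner-reinforce:"
LABEL_RE = re.compile(r"^\.\. _([^:\n]+):\s*$", re.MULTILINE)
# Titled :ref: role, e.g. ":ref:`REINFORCE <nemo-aligner-reinforce>`"
REF_RE = re.compile(r":ref:`[^`<]*<([^>`]+)>`")


def find_unresolved_refs(sources):
    """Return {filename: [targets]} for :ref: targets that have no label.

    `sources` maps a filename to its RST text. Labels are collected
    across all files first, because :ref: can point into any document.
    """
    labels = set()
    for text in sources.values():
        labels.update(LABEL_RE.findall(text))

    unresolved = {}
    for name, text in sources.items():
        missing = sorted({t for t in REF_RE.findall(text) if t not in labels})
        if missing:
            unresolved[name] = missing
    return unresolved


# Simplified stand-ins for the two files touched by this PR.
index = ":ref:`Model Alignment by REINFORCE <nemo-aligner-reinforce>`\n"
before = {"index.rst": index, "reinforce.rst": ".. _model-aligner-reinforce:\n"}
after = {"index.rst": index, "reinforce.rst": ".. _nemo-aligner-reinforce:\n"}

print(find_unresolved_refs(before))  # {'index.rst': ['nemo-aligner-reinforce']}
print(find_unresolved_refs(after))   # {}
```

With the old label the reference from ``index.rst`` has nothing to point at. With the renamed label the check comes back empty.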