NVIDIA · ko3n1g · Jan 27, 2025 · Jan 23, 2025 · Jan 23, 2025 · Jan 27, 2025
diff --git a/docs/user-guide/index.rst b/docs/user-guide/index.rst
@@ -7,6 +7,7 @@
 
    sft.rst
    knowledge-distillation.rst
+   reinforce.rst
    dpo.rst
    rlhf.rst
    steerlm.rst
@@ -25,6 +26,9 @@
 :ref:`Supervised Fine-Tuning (SFT) with Knowledge Distillation <nemo-aligner-knowledge-distillation>`
    In this section, we walk through a variation of SFT using Knowledge Distillation where we train a smaller "student" model using a larger "teacher" model.
 
+:ref:`Model Alignment by REINFORCE <nemo-aligner-reinforce>`
+   In this tutorial, we guide you through aligning a NeMo Framework model using REINFORCE. The method applies to various models, including Llama 2 and Mistral, and our scripts work the same way across models.
+
 :ref:`Model Alignment by DPO, RPO and IPO <nemo-aligner-dpo>`
    DPO, RPO, and IPO are simpler alignment methods compared to RLHF. DPO introduces a novel parameterization of the reward model in RLHF, which allows us to extract the corresponding optimal policy. Similarly, RPO and IPO provide alternative parameterizations or optimization strategies, each contributing unique approaches to refining model alignment.
 
@@ -75,6 +79,14 @@
      - Yes
      - Yes
      - 
+   * - :ref:`REINFORCE <nemo-aligner-reinforce>`
+     - Yes
+     - Yes
+     - Yes
+     - Yes (✓)
+     - Yes
+     - Yes
+     - 
    * - :ref:`DPO <nemo-aligner-dpo>`
      - 
      - Yes (✓)

diff --git a/docs/user-guide/reinforce.rst b/docs/user-guide/reinforce.rst
@@ -1,6 +1,6 @@
 .. include:: /content/nemo.rsts
 
-.. _model-aligner-reinforce:
+.. _nemo-aligner-reinforce:
 
 Model Alignment by REINFORCE
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@