diff --git a/index.html b/index.html
index 2bcc9e8..bb98d18 100644
--- a/index.html
+++ b/index.html
@@ -225,25 +225,31 @@
+
AlpacaEval 2.0 evaluation of models in terms of both normal and length-controlled (LC) win rates in percentage (%). SPPO demonstrates steady performance gains across iterations and outperforms other baselines, which show a tendency to produce longer responses. Additionally, re-ranking with the PairRM reward model (best-of-16) at test time consistently enhances performance across all models. SPPO achieves a high win rate without strong external supervision such as GPT-4.
+
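To give a rough picture of the best-of-16 re-ranking step, the sketch below samples a set of candidate responses and keeps the one preferred by a pairwise reward model. The `pairwise_preference` function is a hypothetical stand-in for PairRM's pairwise scoring, and the linear-scan tournament is just one simple way to pick a winner with a pairwise ranker; the actual evaluation pipeline may aggregate comparisons differently.
<pre><code>
from typing import Callable, List

def best_of_n(
    prompt: str,
    candidates: List[str],
    pairwise_preference: Callable[[str, str, str], float],
) -> str:
    """Pick one response out of n candidates using a pairwise reward model.

    `pairwise_preference(prompt, a, b)` is assumed to return the probability
    that response `a` is preferred over response `b` (a stand-in for PairRM).
    A linear-scan tournament keeps the current winner and challenges it with
    each remaining candidate.
    """
    best = candidates[0]
    for challenger in candidates[1:]:
        if pairwise_preference(prompt, challenger, best) > 0.5:
            best = challenger
    return best
</code></pre>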
+
Open LLM Leaderboard Evaluation. SPPO fine-tuning improves the base model's performance across different tasks, reaching state-of-the-art average scores of 66.75 for Mistral-7B and 70.29 for Llama-3-8B. For Mistral-7B, however, subsequent iterations of DPO, IPO, and SPPO show a decline in performance. It is possible that aligning with human preferences (simulated by the PairRM preference model in our study) may not always enhance, and can even detract from, overall performance.
+
+
A pairwise loss like DPO can only enlarge the relative probability gap between the winner and the loser. Pal et al. (2024) observed that DPO only drives the loser's likelihood down, while the winner's likelihood barely changes.
We observe the same phenomenon; SPPO, in contrast, can boost the probability density of the winner, as sketched below.
+
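To make this concrete, the following minimal sketch contrasts the two objectives for a single (winner, loser) pair. The DPO term depends only on the margin between the two log-probability ratios, so shifting both responses' probabilities down by the same amount leaves it unchanged. The squared-error objective shown for SPPO is a simplified sketch that assumes hard preference labels (the winner preferred with probability 1, the loser with probability 0) and folds the normalizing constant into η/2; the actual training code may use soft PairRM probabilities and a different normalization.
<pre><code>
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO only sees the margin between the winner's and loser's log-ratios,
    # so pushing the loser down is enough to shrink the loss; the winner's
    # probability does not have to go up.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin)

def sppo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, eta=1000.0):
    # SPPO-style squared-error objective with hard labels (P_hat = 1 for the
    # winner, 0 for the loser). Each log-ratio is regressed toward its own
    # signed target, so the winner's log-probability is explicitly pulled up
    # (toward +eta/2) rather than only widening the winner-loser gap.
    ratio_w = logp_w - ref_logp_w
    ratio_l = logp_l - ref_logp_l
    return (ratio_w - eta * (1.0 - 0.5)) ** 2 + (ratio_l - eta * (0.0 - 0.5)) ** 2

# Lowering both responses' log-probabilities by the same amount leaves the
# DPO loss unchanged but changes the SPPO loss, which notices that the
# winner's probability has dropped.
w, l, rw, rl = map(torch.tensor, (-10.0, -12.0, -11.0, -11.0))
print(torch.allclose(dpo_loss(w, l, rw, rl), dpo_loss(w - 5, l - 5, rw, rl)))   # True
print(torch.allclose(sppo_loss(w, l, rw, rl), sppo_loss(w - 5, l - 5, rw, rl))) # False
</code></pre>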