Add end_strings to SamplingParams (#6986)
* Add end_strings to SamplingParams

Signed-off-by: Gerald Shen <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Gerald Shen <[email protected]>

* Add end_strings to megatron_gpt_inference.yaml

Signed-off-by: Gerald Shen <[email protected]>

* Add end_strings to sampling params

Signed-off-by: Gerald Shen <[email protected]>

* Remove extra_id_1 from default end_strings

Signed-off-by: Gerald Shen <[email protected]>

* Fix require_grad typos (#6930)

Signed-off-by: Sergii Dymchenko <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* fix syntax error

Signed-off-by: Gerald Shen <[email protected]>

* fix the mpt chatbot (#6957) (#6968)

Signed-off-by: Yi Dong <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* add support for max_total_length=4096 for 43b (#6763)

* add support for max_total_length=4096 for 43b

Signed-off-by: Zhilin Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Gerald Shen <[email protected]>

* rnnt_greedy_decoding.py: typos? auto-repressively -> auto-regressively (#6989)

Signed-off-by: Vadim Kantorov <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* Cache handling without input tensors mutation (#6980) (#6996)

* Cache handling without input tensors mutation



* Cleanup



* Cleanup#2



* Cleanup#3



---------

Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* Hybrid conformer export (#6983) (#6995)

* Implemented generic kv-pair setting of export_config from args



* Hybrid conformer export



* Hybrid decoder export



* Cleanup



* Changed from **kwargs



* Docstring



* Docs added



* Stringify args



* Added docs for ASR export configs



* lowercase ctc



---------

Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* Fixing an issue with confidence ensembles (#6987) (#7004)

* Bug fix for the confidence ensembles



* Relax constraints for the test



---------

Signed-off-by: Igor Gitman <[email protected]>
Co-authored-by: Igor Gitman <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* [TTS] Add cosine distance option to TTS aligner (#6806)

* [TTS] Add cosine distance option to TTS aligner

Signed-off-by: Ryan <[email protected]>

* [TTS] Update aligner comments

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* Minor MPT-7B fixes and creation script update (#6982)

* Initial commit of minor MPT-7B fixes

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Gerald Shen <[email protected]>

* Change Jenkins timeout (#6997)

* change timeout

Signed-off-by: ericharper <[email protected]>

* change to 8 hours

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* remove hard coded input and output fields (#7008)

* remove hard coded input and output fields

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Gerald Shen <[email protected]>

* RoPE length extrapolation with interpolation (#7005)

* Push changes

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* add continue training script

Signed-off-by: MaximumEntropy <[email protected]>

* [WIP] nonlinear interp

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* override encoder_seq_len

Signed-off-by: MaximumEntropy <[email protected]>

* Remove nonlinear

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* sft with pi (#7006)

* sft with pi

Signed-off-by: Evelina <[email protected]>

* update values only if not None

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>

* Address comments

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add info

Signed-off-by: MaximumEntropy <[email protected]>

* Empty

Signed-off-by: MaximumEntropy <[email protected]>

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Evelina <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* use proper config

Signed-off-by: Gerald Shen <[email protected]>

* Add end_strings to SamplingParams

Signed-off-by: Gerald Shen <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Gerald Shen <[email protected]>

* Add end_strings to megatron_gpt_inference.yaml

Signed-off-by: Gerald Shen <[email protected]>

* Add end_strings to sampling params

Signed-off-by: Gerald Shen <[email protected]>

* Remove extra_id_1 from default end_strings

Signed-off-by: Gerald Shen <[email protected]>

* fix syntax error

Signed-off-by: Gerald Shen <[email protected]>

* use proper config

Signed-off-by: Gerald Shen <[email protected]>

---------

Signed-off-by: Gerald Shen <[email protected]>
Signed-off-by: Sergii Dymchenko <[email protected]>
Signed-off-by: Yi Dong <[email protected]>
Signed-off-by: Zhilin Wang <[email protected]>
Signed-off-by: Vadim Kantorov <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Igor Gitman <[email protected]>
Signed-off-by: Ryan <[email protected]>
Signed-off-by: Daniel Egert <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sergii Dymchenko <[email protected]>
Co-authored-by: Gerald Shen <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: Vadim Kantorov <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Igor Gitman <[email protected]>
Co-authored-by: Ryan Langman <[email protected]>
Co-authored-by: trias702 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Evelina <[email protected]>
17 people authored Jul 13, 2023
1 parent d44127e commit f7e33fc
Showing 8 changed files with 23 additions and 13 deletions.
megatron_gpt_inference.yaml
@@ -9,7 +9,7 @@ inference:
repetition_penalty: 1.2 # The parameter for repetition penalty. 1.0 means no penalty.
min_tokens_to_generate: 0 # The minimum length of the sequence to be generated.
compute_logprob: False # a flag used to compute logprob of all the input text, a very special case of running inference, default False

end_strings: ["<|endoftext|>"] # generation will stop when one of these tokens is generated

trainer:
devices: 1
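The new key gives the inference config an explicit stop list: generation halts as soon as one of the listed strings appears in the decoded output. The list can be extended for checkpoints that use a different delimiter, e.g. (a hypothetical override; "<extra_id_1>" is the chat-turn marker mentioned elsewhere in this commit):

end_strings: ["<|endoftext|>", "<extra_id_1>"]  # stop on either string
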
1 change: 1 addition & 0 deletions examples/nlp/language_modeling/megatron_gpt_eval.py
@@ -267,6 +267,7 @@ def main(cfg) -> None:
"add_BOS": cfg.inference.add_BOS,
"all_probs": cfg.inference.all_probs,
"compute_logprob": cfg.inference.compute_logprob,
"end_strings": cfg.inference.end_strings,
}

fp8_enabled = hasattr(model.cfg, "fp8") and (model.cfg.fp8 == True)
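megatron_gpt_eval.py simply forwards cfg.inference.end_strings into the sampling-parameter dict, so the stop list can be set per run from the command line. A hypothetical invocation, assuming the script's usual Hydra-style overrides (the model path is a placeholder):

python examples/nlp/language_modeling/megatron_gpt_eval.py \
    gpt_model_file=/path/to/model.nemo \
    inference.end_strings='["<|endoftext|>"]'
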
@@ -217,6 +217,7 @@ def init_model(self, cfg: DictConfig, trainer: Trainer):
"add_BOS": True,
"all_probs": False,
"compute_logprob": False,
"end_strings": self.cfg.inference.get('end_strings', ["<|endoftext|>"]),
}
elif self.cfg.get("report_validation_metric", False) and not hasattr(self.cfg, 'inference'):
raise ValueError("Must provide inference parameters for reporting validation metric!")
@@ -754,6 +755,7 @@ def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: Optional[int]
"all_probs": inference_config["all_probs"],
"compute_logprob": inference_config["compute_logprob"],
"compute_attention_mask": inference_config.get("compute_attention_mask", True),
"end_strings": inference_config.get('end_strings', ["<|endoftext|>"]),
}

task_ids, processed_inputs = batch
@@ -390,6 +390,7 @@ def inference_step(self, dataloader_iter, batch_idx, mode, dataloader_idx=0):
"add_BOS": False,
"all_probs": False,
"compute_logprob": False,
"end_strings": ["<|endoftext|>"],
}
result = megatron_gpt_generate(
model=self,
18 changes: 9 additions & 9 deletions nemo/collections/nlp/modules/common/text_generation_server.py
@@ -141,6 +141,14 @@ def put(self):
if not (1.0 <= repetition_penalty):
return "repetition_penalty must be a positive number no less than 1.0"

end_strings = ['<|endoftext|>']
if 'end_strings' in request.get_json():
end_strings = request.get_json()['end_strings']
if not isinstance(end_strings, list):
return "expect end_strings to be a list of strings"
if not all([isinstance(s, str) for s in end_strings]):
return "expect end_strings to be a list of strings"

min_tokens_to_generate = 0
if "min_tokens_to_generate" in request.get_json():
min_tokens_to_generate = request.get_json()["min_tokens_to_generate"]
@@ -157,14 +165,6 @@ def put(self):
if neighbors < 0:
return "num of neighbors must be an integer no less than 0"

end_strings = ['<|endoftext|>']
if 'end_strings' in request.get_json():
end_strings = request.get_json()['end_strings']
if not isinstance(end_strings, list):
return "expect end_strings to be a list of strings"
if not all([isinstance(s, str) for s in end_strings]):
return "expect end_strings to be a list of strings"

with lock: # Need to get lock to keep multiple threads from hitting code
MegatronGenerate.send_do_generate() # Tell other ranks we're doing generate
extra = {}
@@ -190,8 +190,8 @@
top_p,
greedy,
repetition_penalty,
min_tokens_to_generate,
end_strings=end_strings,
min_tokens_to_generate=min_tokens_to_generate,
**extra,
)
for k in output:
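With the validation above, a client may supply its own stop strings in the request body; if the field is omitted, the server falls back to ['<|endoftext|>']. A minimal client sketch, assuming the server is reachable locally and exposes its usual PUT /generate route (the port and route are assumptions, not shown in this diff):

import requests

# Hypothetical client for the text generation server; port and route are assumptions.
payload = {
    "sentences": ["Deep learning is"],
    "tokens_to_generate": 32,
    "end_strings": ["<|endoftext|>", "\n\n"],  # must be a list of strings or the request is rejected
}
resp = requests.put("http://localhost:5555/generate", json=payload)
print(resp.json()["sentences"])
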
9 changes: 6 additions & 3 deletions nemo/collections/nlp/modules/common/text_generation_utils.py
@@ -69,6 +69,7 @@ def get_default_sampling_params():
"add_BOS": True,
"all_probs": False,
"compute_logprob": False,
"end_strings": ["<|endoftext|>", "<extra_id_1>"],
}

return sampling_params
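
Note that this library-level default keeps both the end-of-text token and the "<extra_id_1>" delimiter. Callers that want plain GPT behaviour can start from these defaults and override just that key (a small sketch using the function edited in this hunk):

from nemo.collections.nlp.modules.common.text_generation_utils import get_default_sampling_params

params = get_default_sampling_params()
params["end_strings"] = ["<|endoftext|>"]  # keep only the end-of-text marker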
@@ -104,6 +105,7 @@ def megatron_gpt_generate(model, inputs, tokenizer, length_params, sampling_para
top_p=sampling_params['top_p'],
greedy=sampling_params['use_greedy'],
repetition_penalty=sampling_params['repetition_penalty'],
end_strings=sampling_params['end_strings'],
min_tokens_to_generate=length_params['min_length'],
compute_attention_mask=sampling_params.get("compute_attention_mask", True),
**strategy_args,
@@ -125,6 +127,7 @@ def megatron_gpt_generate(model, inputs, tokenizer, length_params, sampling_para
top_p=sampling_params['top_p'],
greedy=sampling_params['use_greedy'],
repetition_penalty=sampling_params['repetition_penalty'],
end_strings=sampling_params['end_strings'],
min_tokens_to_generate=length_params['min_length'],
**strategy_args,
)
@@ -380,8 +383,8 @@ def synced_generate(
compute_attention_mask=True,
compute_logprob=False,
repetition_penalty=1.2,
min_tokens_to_generate=0,
end_strings=[],
min_tokens_to_generate=0,
):
context_length = context_length_tensor.min().item()
tokenizer = model.tokenizer
@@ -475,8 +478,8 @@ def generate(
compute_attention_mask=True,
compute_logprob=False,
repetition_penalty=1.0,
min_tokens_to_generate=0,
end_strings=['<|endoftext|>'],
min_tokens_to_generate=0,
**strategy_args,
) -> OutputType:
"""
@@ -560,8 +563,8 @@
top_p=top_p,
greedy=greedy,
repetition_penalty=repetition_penalty,
min_tokens_to_generate=min_tokens_to_generate,
end_strings=end_strings,
min_tokens_to_generate=min_tokens_to_generate,
)
special_tokens = set()
if hasattr(tokenizer, 'pad_token') and tokenizer.pad_token is not None:
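The hunks above only thread end_strings through the call chain; the check that actually terminates a sequence lives in the generation strategy, which is not part of this diff. The idea is roughly the following (an illustrative sketch, not code from this commit):

def hit_end_string(generated_text: str, end_strings: list) -> bool:
    # A sequence is considered finished once its decoded continuation
    # ends with any of the configured stop strings.
    return any(generated_text.endswith(s) for s in end_strings)

assert hit_end_string("the capital of France.<|endoftext|>", ["<|endoftext|>"])
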
@@ -37,6 +37,7 @@ class SamplingParam(TypedDict):
add_BOS: bool # add the bos token at the begining of the prompt
all_probs: bool # whether return the log prob for all the tokens in vocab
compute_logprob: bool # a flag used to compute logprob of all the input text, a very special case of running inference, default False
end_strings: List[str] # generation will stop when one of these tokens is generated


class OutputType(TypedDict):
@@ -88,6 +89,7 @@ def generate(
add_BOS: bool, Whether add the bos token at the begining of the prompt
all_probs: bool # whether return the log prob for all the tokens in vocab
compute_logprob: bool # a flag used to compute logprob of all the input text, a very special case of running inference, default False
end_strings: List[str] # generation will stop when one of these tokens is generated
Default None, If it is None, use_greedy will be "True".
Returns:
OutputType: It generates the output in a dictionary type. It has the following keys:
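With the new key in place, a complete sampling configuration can be expressed as a type-checked dict and passed to the model's generate API. A sketch, assuming the TypedDicts live in NeMo's text_generation module and that model is a loaded GPT model:

from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam

sampling_params: SamplingParam = {
    "use_greedy": True,
    "temperature": 1.0,
    "top_k": 0,
    "top_p": 0.9,
    "repetition_penalty": 1.2,
    "add_BOS": True,
    "all_probs": False,
    "compute_logprob": False,
    "end_strings": ["<|endoftext|>"],
}
length_params: LengthParam = {"min_length": 0, "max_length": 30}
# response = model.generate(["Deep learning is"], length_params, sampling_params)
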
1 change: 1 addition & 0 deletions tests/collections/nlp/test_gpt_eval.py
@@ -78,6 +78,7 @@ def test_gpt_eval(self):
"add_BOS": True,
"all_probs": False,
"compute_logprob": False,
"end_strings": ["<|endoftext|>"],
}

# test logprob
