trtllm0.9 changes #149
Conversation
Signed-off-by: jiemingz <=>
nemo_aligner/utils/trt_llm.py (outdated diff)
```python
if parallel_state.get_pipeline_model_parallel_world_size() > 1 and not self.reshard_model:
    group = parallel_state.get_pipeline_model_parallel_group()
    src = parallel_state.get_pipeline_model_parallel_first_rank()
    output_ids = broadcast_2d_tensor(output_ids, src, group, dtype=output_ids.dtype)
```
I think you can use this function instead. The groups are unchanged when we don't have resharding, and when there is resharding this should be fairly trivial (though let me know if you think there's a perf hit).
Wouldn't this also broadcast to GPUs in the same TP group? TRTLLM already has the outputs equal across TP.
Yeah, do you think that's too much of a perf hit, or does it not matter?
I don't think it would affect perf, but it does make the code less clear. This broadcast only happens without resharding; do we need to guard it?
Okay, let's guard on no reshard and then use that function?
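For readers following along, here is a rough sketch of the resolution agreed on above: keep the broadcast guarded on the no-reshard case, but route it through a within-PP broadcast helper instead of building the group and source rank by hand. The helper name `broadcast_2d_tensor_within_pp` and its signature are assumptions for illustration; the actual function used in the final diff may differ.

```python
# Sketch only. Assumes a helper along the lines of
# broadcast_2d_tensor_within_pp(tensor, dtype), which broadcasts from the first
# pipeline-parallel rank to the rest of the PP group; the real helper may differ.
if parallel_state.get_pipeline_model_parallel_world_size() > 1 and not self.reshard_model:
    # With resharding the model is TP-only for inference, so no PP broadcast is
    # needed; TP ranks already hold identical TRT-LLM outputs.
    output_ids = broadcast_2d_tensor_within_pp(output_ids, dtype=output_ids.dtype)
```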
```diff
@@ -38,6 +38,12 @@ trainer:
  trt_llm:
    enable: False
    reshard: False # if True then reshard the model into TP only for inf
    max_context_len: ${int_div:${model.encoder_seq_length}, 2}
    model_type: "LLaMAForCausalLM"
```
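As an aside, the `${int_div:...}` expression is an OmegaConf resolver interpolation that performs integer division when the config is resolved, so `max_context_len` defaults to half the encoder sequence length. A minimal, self-contained sketch of the mechanics (the resolver registration here is illustrative; NeMo/NeMo-Aligner register their own resolvers elsewhere):

```python
from omegaconf import OmegaConf

# Illustrative only: register an "int_div" resolver like the one the config relies on.
OmegaConf.register_new_resolver("int_div", lambda a, b: int(a) // int(b), replace=True)

cfg = OmegaConf.create(
    {
        "model": {"encoder_seq_length": 4096},
        "trt_llm": {"max_context_len": "${int_div:${model.encoder_seq_length}, 2}"},
    }
)
print(cfg.trt_llm.max_context_len)  # 2048: half of encoder_seq_length
```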
Oh... forgot about this. Do we need to change this for Mixtral etc.?
Yeah, with the TRTLLM unified builder we can only build models that TRTLLM supports, and we have to specify that model.
NeMo export had a way to automatically detect the model type, but it was prone to failure, so I think it's best if the user explicitly defines the TRTLLM model type.
Okay, it would be good to list which ones people can specify.
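Following up on that request, a hypothetical way to document the accepted values directly in the config. The class names other than `LLaMAForCausalLM` are common TRT-LLM model classes given purely for illustration; the authoritative list is whatever model types the installed TRT-LLM version's unified builder supports.

```yaml
trt_llm:
  enable: True
  reshard: False # if True then reshard the model into TP only for inference
  max_context_len: ${int_div:${model.encoder_seq_length}, 2}
  # Must name a model class supported by the TRT-LLM unified builder,
  # e.g. "LLaMAForCausalLM", "GPTForCausalLM", "FalconForCausalLM" (illustrative, not exhaustive).
  model_type: "LLaMAForCausalLM"
```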
Signed-off-by: jiemingz <=>
LGTM once you resolve the 2 small comments
Signed-off-by: jiemingz <=>
What does this PR do?
Add a one-line overview of what this PR aims to accomplish.
Changelog
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Checklist when contributing a new algorithm
Does the trainer support max_steps=-1 and validation?
Additional Information