trtllm0.9 changes #149
Conversation
Signed-off-by: jiemingz <=>
nemo_aligner/utils/trt_llm.py (outdated diff)
```python
if parallel_state.get_pipeline_model_parallel_world_size() > 1 and not self.reshard_model:
    group = parallel_state.get_pipeline_model_parallel_group()
    src = parallel_state.get_pipeline_model_parallel_first_rank()
    output_ids = broadcast_2d_tensor(output_ids, src, group, dtype=output_ids.dtype)
```
I think you can use this function instead. The groups are unchanged when we don't have resharding, and when there is resharding this should be fairly trivial (though let me know if you think there's a perf hit).
Wouldn't this also broadcast to GPUs in the same TP group? TRTLLM already has the outputs equal across TP.
Yeah, do you think that's too much of a perf hit, or does it not matter?
I don't think it would affect perf, but it does make the code less clear. This broadcast only happens without resharding; do we need to guard it?
Okay, let's guard on no reshard and then use that function?
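For readers following along, here is a rough sketch of the resolution agreed on above: keep the broadcast guarded on the no-reshard case, but route it through a within-PP broadcast helper instead of building the group and source rank by hand. The helper name `broadcast_2d_tensor_within_pp` and its signature are assumptions for illustration; the actual function used in the final diff may differ.

```python
# Sketch only. Assumes a helper along the lines of
# broadcast_2d_tensor_within_pp(tensor, dtype), which broadcasts from the first
# pipeline-parallel rank to the rest of the PP group; the real helper may differ.
if parallel_state.get_pipeline_model_parallel_world_size() > 1 and not self.reshard_model:
    # With resharding the model is TP-only for inference, so no PP broadcast is
    # needed; TP ranks already hold identical TRT-LLM outputs.
    output_ids = broadcast_2d_tensor_within_pp(output_ids, dtype=output_ids.dtype)
```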
```diff
@@ -38,6 +38,12 @@ trainer:
  trt_llm:
    enable: False
    reshard: False # if True then reshard the model into TP only for inf
    max_context_len: ${int_div:${model.encoder_seq_length}, 2}
    model_type: "LLaMAForCausalLM"
```
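As an aside, the `${int_div:...}` expression is an OmegaConf resolver interpolation that performs integer division when the config is resolved, so `max_context_len` defaults to half the encoder sequence length. A minimal, self-contained sketch of the mechanics (the resolver registration here is illustrative; NeMo/NeMo-Aligner register their own resolvers elsewhere):

```python
from omegaconf import OmegaConf

# Illustrative only: register an "int_div" resolver like the one the config relies on.
OmegaConf.register_new_resolver("int_div", lambda a, b: int(a) // int(b), replace=True)

cfg = OmegaConf.create(
    {
        "model": {"encoder_seq_length": 4096},
        "trt_llm": {"max_context_len": "${int_div:${model.encoder_seq_length}, 2}"},
    }
)
print(cfg.trt_llm.max_context_len)  # 2048: half of encoder_seq_length
```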
Oh... forgot about this. Do we need to change this for Mixtral etc.?
Yeah, with the TRTLLM unified builder we can only build models that TRTLLM supports, and we have to specify that model.
NeMo export had a way to automatically detect the model type, but it was prone to failure, so I think it's best if the user explicitly defines the TRTLLM model type.
Okay, it would be good to list which ones people can specify.
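Following up on that request, a hypothetical way to document the accepted values directly in the config. The class names other than `LLaMAForCausalLM` are common TRT-LLM model classes given purely for illustration; the authoritative list is whatever model types the installed TRT-LLM version's unified builder supports.

```yaml
trt_llm:
  enable: True
  reshard: False # if True then reshard the model into TP only for inference
  max_context_len: ${int_div:${model.encoder_seq_length}, 2}
  # Must name a model class supported by the TRT-LLM unified builder,
  # e.g. "LLaMAForCausalLM", "GPTForCausalLM", "FalconForCausalLM" (illustrative, not exhaustive).
  model_type: "LLaMAForCausalLM"
```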
Signed-off-by: jiemingz <=>
LGTM once you resolve the 2 small comments
Signed-off-by: jiemingz <=>
What does this PR do?
Add a one-line overview of what this PR aims to accomplish.
Changelog
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Checklist when contributing a new algorithm
Does the trainer support max_steps=-1 and validation?
Additional Information