Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Support host_eval for target teams loop #228

Merged
merged 1 commit into from
Jan 13, 2025

Conversation

ergawy
Copy link

@ergawy ergawy commented Dec 11, 2024

Extends host_eval support for the currently supported form of the generic loop directive.

@ergawy ergawy requested review from skatrak and agozillon December 11, 2024 09:24
Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Kareem for working on this! I've got a couple of small comments, but I think it's almost there.

flang/lib/Lower/OpenMP/OpenMP.cpp Outdated Show resolved Hide resolved
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp Outdated Show resolved Hide resolved
@ergawy ergawy force-pushed the loop_host_eval branch 2 times, most recently from 6e5cba3 to bd14f92 Compare December 16, 2024 08:43
@ergawy
Copy link
Author

ergawy commented Dec 16, 2024

Thanks for the review @skatrak. I handled your comments (hopefully) and added a small offloading test. But I wanted to ask: where in the spec do we extract the info that a certain config is generic, SPMD, or generic-SPMD?

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Kareem, and sorry for the delay getting back to this! LGTM, just minor nits.

@@ -1908,16 +1908,19 @@ llvm::omp::OMPTgtExecModeFlags TargetOp::getKernelExecFlags() {
long numWrappers = std::distance(innermostWrapper, wrappers.end());

// Detect Generic-SPMD: target-teams-distribute[-simd].
// Detect SPMD: target-teams-loop[-simd].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Detect SPMD: target-teams-loop[-simd].
// Detect SPMD: target-teams-loop.

if (numWrappers == 1) {
if (!isa<DistributeOp>(innermostWrapper))
if (!isa<DistributeOp>(innermostWrapper) && !isa<LoopOp>(innermostWrapper))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!isa<DistributeOp>(innermostWrapper) && !isa<LoopOp>(innermostWrapper))
if (!isa<DistributeOp, LoopOp>(innermostWrapper))

@skatrak
Copy link
Member

skatrak commented Jan 7, 2025

Thanks for the review @skatrak. I handled your comments (hopefully) and added a small offloading test. But I wanted to ask: where in the spec do we extract the info that a certain config is generic, SPMD, or generic-SPMD?

That differentiation is not really in the spec, but it's instead part of the device runtime. It's an implementation detail that allows us to provide more efficient specializations for specific kinds of kernels. SPMD, for example, is used to identify kernels for which the number of teams and threads and the trip count of the single loop being run across all threads and teams remains constant during execution and is known in advance.

A generic kernel might have multiple loops, none at all, or violate any of the conditions above (which are not mandated by the spec), and for that case the generated code implements a state machine that reduces performance on GPUs. Generic-SPMD seems to fall somewhere in the middle, and I don't know much about it apart from the fact that it seems to represent target teams distribute, so something inside of a loop that is distributed across teams. I got that understanding some time ago looking at uses of the OMPTgtExecModeFlags, but I wouldn't be surprised if I didn't understand Generic-SPMD fully.

@ergawy
Copy link
Author

ergawy commented Jan 13, 2025

That differentiation is not really in the spec, but it's instead part of the device runtime. It's an implementation detail that allows us to provide more efficient specializations for specific kinds of kernels. SPMD, for example, is used to identify kernels for which the number of teams and threads and the trip count of the single loop being run across all threads and teams remains constant during execution and is known in advance.

A generic kernel might have multiple loops, none at all, or violate any of the conditions above (which are not mandated by the spec), and for that case the generated code implements a state machine that reduces performance on GPUs. Generic-SPMD seems to fall somewhere in the middle, and I don't know much about it apart from the fact that it seems to represent target teams distribute, so something inside of a loop that is distributed across teams. I got that understanding some time ago looking at uses of the OMPTgtExecModeFlags, but I wouldn't be surprised if I didn't understand Generic-SPMD fully.

That clarifies things quite a bit. Thanks!

Extends `host_eval` support for the currently supported form of the
generic `loop` directive.
@ergawy ergawy merged commit a1eead6 into ROCm:amd-trunk-dev Jan 13, 2025
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants