Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Fusion] JIT compiler kernel fusion passes #7661

Merged
merged 1 commit into from
Dec 8, 2022

Conversation

sommerlukas
Copy link
Contributor

@sommerlukas sommerlukas commented Dec 6, 2022

This is the fourth patch in a series of patches to add an implementation of the kernel fusion extension. We have split the implementation into multiple patches to make them more easy to review.

This patch adds the LLVM passes that perform the kernel fusion and related optimizations:

  • A pass creating the function definition for the fused kernel from the input kernel definitions.
  • A pass performing internalization of dataflow internal to the fused kernel into either private or local memory.
    The type of memory to use is currently specified by the user in the runtime.
  • A pass propagating values for scalars and by-val aggregates from the SYCL runtime to the fused kernel as constants.

The information is propagated from the SYCL runtime to the passes via LLVM metadata inserted by the JIT compiler frontend.

After and between the fusion passes, some standard LLVM optimization and transformation passes are executed to enable passes and optimize the fused kernel.

Co-authored-by: Lukas Sommer [email protected]
Co-authored-by: Victor Perez [email protected]
Signed-off-by: Lukas Sommer [email protected]

@sommerlukas sommerlukas self-assigned this Dec 6, 2022
@sommerlukas
Copy link
Contributor Author

This patch can be reviewed and merged independently of #7531.

The tests are currently not yet run together with check-sycl, this will follow in a later patch/PR. They can be executed from the build folder by running check-sycl-fusion.

@sommerlukas sommerlukas force-pushed the kernel-fusion/fourth-patch branch from 5e1613a to 5451db0 Compare December 6, 2022 18:09
Copy link
Contributor

@Naghasan Naghasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just the nitpick

sycl-fusion/passes/kernel-fusion/SYCLKernelFusion.cpp Outdated Show resolved Hide resolved
@sommerlukas sommerlukas force-pushed the kernel-fusion/fourth-patch branch from 5451db0 to fae8208 Compare December 7, 2022 09:56
Co-authored-by: Lukas Sommer <[email protected]>
Co-authored-by: Victor Perez <[email protected]>
Signed-off-by: Lukas Sommer <[email protected]>
@sommerlukas sommerlukas force-pushed the kernel-fusion/fourth-patch branch from fae8208 to 2021ae2 Compare December 7, 2022 17:33
@Naghasan
Copy link
Contributor

Naghasan commented Dec 8, 2022

@intel/llvm-gatekeepers the PR is ready to be merged, AMD failure is unrelated to this patch AFAIK

@steffenlarsen
Copy link
Contributor

HIP failures reported in #7634.

@steffenlarsen steffenlarsen merged commit e1e6df5 into intel:sycl Dec 8, 2022
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some post-commit review

NF->copyAttributesFrom(F);
// Drop masked-out attributes.
SmallVector<AttributeSet> Attributes;
const llvm::AttributeList PAL = NF->getAttributes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: llvm:: namespace is used inconsistently: there is using namespace llvm;, but in some places llvm:: is still used. For example, below on line 44 AttributeList is referenced without llvm::. This comment also applies to other files


{
// Copy metadata.
SmallVector<std::pair<unsigned, MDNode *>> MDs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised that SmallVector doesn't require a second argument, am I missing something? What is the point of using SmallVector if we are not pre-allocating some storage on stack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually recommended if there is no well motivated choice for N:

In the absence of a well-motivated choice for the number of inlined elements N, it is recommended to use SmallVector (that is, omitting the N). This will choose a default number of inlined elements reasonable for allocation on the stack (for example, trying to keep sizeof(SmallVector) around 64 bytes).

From https://llvm.org/doxygen/classllvm_1_1SmallVector.html#details

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know there is a default value for N, thanks!

// Copy metadata.
SmallVector<std::pair<unsigned, MDNode *>> MDs;
F->getAllMetadata(MDs);
for (auto MD : MDs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary copy due to missing &?


using namespace llvm;

static FunctionType *createMaskedFunctionType(const BitVector &Mask,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not anonymous namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding so far was that there doesn't seem to be a big difference between static and anonymous namespaces, also based on this.

Do you have a specific reason to prefer one over the other?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong reason, just making it more C++-y

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using static in this case allows us to keep the static functions closer to their non-static users/callers without declaring multiple anonymous namespaces, so we stick with that for now.

auto *NewIndex = [&]() -> Value * {
if (LocalSize == 1) {
return Builder.getInt64(0);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM Coding Standard: Don't use else after return

Comment on lines +188 to +189
auto *FK = dyn_cast<MDString>(MDOp.get());
assert(FK && "Kernel should be given by its name as MDString");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use dyn_cast if you don't check the result. cast already contains an assert statement for you

if (StubFunction.hasMetadata(ParameterMDKind)) {
llvm::MDNode *ParamMD = StubFunction.getMetadata(ParameterMDKind);
for (const auto &Op : ParamMD->operands()) {
auto *Tuple = dyn_cast<MDNode>(Op.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, dyn_cast -> cast

Comment on lines +575 to +576
auto *ConstantMD = dyn_cast<ConstantAsMetadata>(MD);
assert(ConstantMD && "Metadata not constant");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dyn_cast -> cast

Comment on lines +153 to +154
SmallVector<Value *> Indices;
Indices.push_back(Builder.getInt32(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayRef is constructible from a single element: we should use that instead of performing an allocation on heap for a single pointer.

Comment on lines +157 to +159
if (RetCode) {
return std::move(RetCode);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Braces around single-line ifs can be omitted

@sommerlukas
Copy link
Contributor Author

@AlexeySachkov: Thank you for your feedback, I addressed your post-commit comments in #7719.

steffenlarsen pushed a commit that referenced this pull request Dec 9, 2022
Fix tests for kernel fusion passes in static builds by using correct
library, loadable by `opt`, for tests.

Also fixes post-commit feedback from @AlexeySachkov in #7661.

Signed-off-by: Lukas Sommer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants