[MLIR][ROCDL] Implement math.ipowi #4

lialan · 2025-01-13T14:07:22Z

No description provided.

…vm#122029) Move the common case of FieldDecl::getFieldIndex() inline to mitigate the cost of removing the extra `FieldNo` induction variable. Also rename isNoUniqueAddress parameter to isNonVirtualBaseType, which appears to be more accurate. I think the current name is just a consequence of autocomplete gone wrong.

Need to sync the mask between cost and actual emission to avoid bugs in mask calculation. Fixes llvm#122324.

I’m seeing a series of errors when trying to run the cmake configure step on macOS when the cmake generator is set to Xcode. All is well if I use the Ninja or Unix Makefile generators. Messages are all of the form:

~~~
CMake Error at …llvm-project/clang/cmake/modules/AddClang.cmake:120 (target_compile_definitions):
  Cannot specify compile definitions for target "obj.clangBasic" which is not built by this project.
Call Stack (most recent call first):
  …llvm-project/clang/lib/Basic/CMakeLists.txt:57 (add_clang_library)
~~~

The remaining errors are similar but mention targets obj.clangAPINotes, obj.clangLex, obj.clangParse, and so on.

The regression appears to have been introduced by commit 09fa2f0 (Oct 14 2024) which added the code in this area.

My proposed solution is simply to add a test to ensure that the obj.x target exists before setting its compile definitions. There is precedent for doing just this in both clang/cmake/modules/AddClang.cmake and clang/lib/support/CMakeLists.txt as well as in the “MSVC AND NOT CLANG_LINK_CLANG_DYLIB” path immediately above the offending line.

I’ve also made a couple of grammatical tweaks in the comments surrounding this code.

In case it's relevant, the cmake settings and definitions I've used to trigger these errors are:

~~~bash
GENERATOR="Xcode"
OUTDIR=build_macos
cmake \
  -S "$SCRIPT_DIR/llvm" \
  -B "$SCRIPT_DIR/$OUTDIR" \
  -G "$GENERATOR" \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_OSX_ARCHITECTURES=arm64 \
  -D LLVM_PARALLEL_LINK_JOBS=1 \
  -D LLVM_ENABLE_PROJECTS="clang;lld" \
  -D LLVM_TARGETS_TO_BUILD=RISCV \
  -D LLVM_DEFAULT_TARGET_TRIPLE=riscv32-unknown-elf \
  -D LLVM_OPTIMIZED_TABLEGEN=Yes
~~~

(cmake v3.31.1, Xcode 16.1. I know that not all of these variables are useful for the Xcode generator!)

Co-authored-by: Paul Bowen-Huggett <[email protected]>

…llvm#122332) The SEW operand for these instructions should have a value of 0. This matches what was done for vcpop/vfirst.

…2286) Don't suggest to comment-out the parameter name if the parameter has an attribute that's spelled after the parameter name. This prevents the parameter's attributes from being wrongly applied to the parameter's type. This fixes llvm#122191.

…lvm#122190) The GPU ID operations already implement InferIntRangeInterface, which gives constant lower and upper bounds on those IDs when appropriate metadata is present on the operations or in the surrounding context. This commit uses that existing code to implement the ValueBoundsOpInterface, which is used when analyzing affine operations (unlike the integer range interface, which is used for arithmetic optimization). It also implements the interface for gpu.launch, where we can use it to express the constraint that block/grid sizes are equal to their value from outside the launch op and that the corresponding IDs are bounded above by that size. As a consequence, the test pass for this inference is updated to work on a FunctionOpInterface and not a func.func, creating minor churn in other tests.

) With this patch we switch from the temporary dummy seeds to actual seeds provided by the seed collector. The seeds get sliced and each slice is used as the starting point for vectorization.

The test runs asynchronous kernels and depending on the timing the output is slightly different. We now only check for the common parts of the output.

Summary: Previously we had some indirection here; this patch updates these utilities to just be normal template functions. We use SFINAE to manage the special case handling for floats. Also this strips address spaces so it can be used more generally.

Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4

…llvm#122354) This disables the support added in PR121985 by default while we investigate a compile time crash.

This adds a workflow for running HLSL tests on PRs that modify HLSL and DirectX code. The tests enabled here are the LLVM & Clang tests and the Offload execution tests: https://github.com/llvm-beanz/offload-test-suite/

Pre-commit some tests in preparation to teach ValueTracking's implied-cond about samesign.

…lvm#120327) The `sycl_kernel_entry_point` attribute is used to declare a function that defines a pattern for an offload kernel entry point. The attribute requires a single type argument that specifies a class type that meets the requirements for a SYCL kernel name as described in section 5.2, "Naming of kernels", of the SYCL 2020 specification. A unique kernel name type is required for each function declared with the attribute. The attribute may not first appear on a declaration that follows a definition of the function. The function is required to have a non-deduced `void` return type. The function must not be a non-static member function, be deleted or defaulted, be declared with the `constexpr` or `consteval` specifiers, be declared with the `[[noreturn]]` attribute, be a coroutine, or accept variadic arguments.

Diagnostics are not yet provided for the following:
- Use of a type as a kernel name that does not satisfy the forward declarability requirements specified in section 5.2, "Naming of kernels", of the SYCL 2020 specification.
- Use of a type as a parameter of the attributed function that does not satisfy the kernel parameter requirements specified in section 4.12.4, "Rules for parameter passing to kernels", of the SYCL 2020 specification (each such function parameter constitutes a kernel parameter).
- Use of language features that are not permitted in device functions as specified in section 5.4, "Language restrictions for device functions", of the SYCL 2020 specification.

There are several issues noted by various FIXME comments.
- The diagnostic generated for kernel name conflicts needs additional work to better detail the relevant source locations, such as the location of each declaration as well as the original source of each kernel name.
- A number of the tests illustrate spurious errors being produced due to attributes that appertain to function templates being instantiated too early (during overload resolution as opposed to after an overload is selected).

Included changes allow the `SYCLKernelEntryPointAttr` attribute to be marked as invalid if a `sycl_kernel_entry_point` attribute is used incorrectly. This is intended to prevent trying to emit an offload kernel entry point without having to mark the associated function as invalid, since doing so would affect overload resolution, which this attribute should not do. Unfortunately, Clang eagerly instantiates attributes that appertain to functions, with the result that errors might be issued for function declarations that are never selected by overload resolution. Tests have been added to demonstrate this. Further work will be needed to address these issues (for this and other attributes).

Summary: This isn't used anymore, I moved the GPU extensions into `offload/`.

…ARM64X (llvm#121500)

After llvm#120563 malloc_size also needs intercepting on Apple platforms, otherwise all type-sanitized binaries crash on startup with an objc error: realized class 0x12345 has corrupt data pointer: malloc_size(0x567) = 0 PR: llvm#122133

llvm#122371) …llvm#121991)" This reverts commit f8f8598. This breaks ARMv7 and s390x buildbot with the following message:

```
llvm-exegesis error: No available targets are compatible with triple "armv8l-unknown-linux-gnueabihf"
FileCheck error: '<stdin>' is empty.
FileCheck command line: /home/tcwg-buildbot/worker/clang-armv7-2stage/stage2/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-2stage/llvm/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
```

…m#120662) The Clang tablegen built-in function prototype parser has the `__bf16` type missing. This patch adds the missing type to the parser.

…pace declaration of a negative test. (llvm#122375) Commit 1a73654 added a missing diagnostic for incorrect placement of an attribute in a namespace declaration. This change corrects a SYCL test that inadvertently exercised the `sycl_kernel_entry_point` attribute in the wrong declaration location.

These make cross compiling the test suite more difficult, as you need the sysroot to contain these headers and libraries cross compiled for your target. It's straightforward to stick with the corresponding C headers.

If the pointer to be checked is statically known to be zero, the tag check will always pass since: 1) the tag is zero 2) shadow memory for address 0 is initialized to 0 and never updated. We can therefore elide the tag check. We perform the elision in two places: 1) the HWASan pass 2) when lowering the CHECK_MEMACCESS intrinsic. Conceivably, the HWASan pass may encounter a "cannot currently statically prove to be null" pointer (and is therefore unable to omit the intrinsic) that later optimization passes convert into a statically known-null pointer. As a last line of defense, we perform elision here too. This also updates the tests from llvm#122186
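The reasoning above can be illustrated with a toy model of a tag check. This is only a sketch under stated assumptions: `ToyShadow` and its members are hypothetical names, the tag is assumed to live in the top byte of a 64-bit pointer, and shadow memory is assumed to hold one tag per 16-byte granule. It is not the HWASan implementation and ignores details such as short granules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (hypothetical, not HWASan's real data structures).
struct ToyShadow {
  // One tag per 16-byte granule; every entry starts out as 0.
  std::vector<uint8_t> Tags;

  explicit ToyShadow(std::size_t Granules) : Tags(Granules, 0) {}

  // Assumption: the pointer tag is stored in bits 56..63.
  static uint8_t pointerTag(uint64_t Ptr) {
    return static_cast<uint8_t>(Ptr >> 56);
  }

  // Address with the tag byte cleared.
  static uint64_t untagged(uint64_t Ptr) { return Ptr & ((1ULL << 56) - 1); }

  // The check passes when the pointer tag equals the shadow tag of the
  // granule that the untagged address falls into.
  bool checkPasses(uint64_t Ptr) const {
    return pointerTag(Ptr) == Tags.at(untagged(Ptr) >> 4);
  }
};
```

In this model a null pointer has tag 0 and maps to granule 0, whose shadow entry is never written, so `checkPasses(0)` is true regardless of what happens to other granules. That is the property that makes the check removable when the pointer is statically known to be zero.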

This patch improves the linker’s ability to estimate stub reachability in the `TextOutputSection::estimateStubsInRangeVA` function. It does so by including thunks that have already been placed ahead of the current call site address when calculating the threshold for direct stub calls.

Before this fix, the estimation process overlooked existing forward thunks. This could result in some thunks not being inserted where needed. In rare situations, particularly with large and specially arranged codebases, this might lead to branch instructions being out of range, causing linking errors.

Although this patch successfully addresses the problem, it is not feasible to create a test for this issue. The specific layout and order of thunk creation required to reproduce the corner case are too complex, making test creation impractical.

Example error messages the issue could generate:

```
ld64.lld: error: banana.o:(symbol OUTLINED_FUNCTION_24949_3875): relocation BRANCH26 is out of range: 134547892 is not in [-134217728, 134217727]; references objc_autoreleaseReturnValue
ld64.lld: error: main.o:(symbol _main+0xc): relocation BRANCH26 is out of range: 134544132 is not in [-134217728, 134217727]; references objc_release
```

) This adds a test line and updates a comment.

We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.

Providing the character that we failed on is helpful for figuring out what's going wrong in the tzdb.

The body of the loop only applies to wide induction recipes, so skip any other header phi recipes up-front.

This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]

…m#122552)
- **[InstSimpify] Add tests for simplifying `(xor (sub C_Mask, X), C_Mask)`; NFC**
- **[InstSimpify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X`**

Helps address regressions with folding `clz(Pow2)`.

Proof: https://alive2.llvm.org/ce/z/zGwUBp
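The identity behind this fold can be checked exhaustively on a small bit width. The message does not spell out the exact preconditions the patch tests, so the sketch below assumes `C_Mask` is a low-bit mask (`2^k - 1`) and `X <= C_Mask`; the function names are hypothetical. Under those assumptions the subtraction never borrows, so `C_Mask - X` equals `C_Mask ^ X`, and xor-ing with `C_Mask` again yields `X`.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if (Mask - X) ^ Mask == X for every X in [0, Mask].
bool foldHoldsForMask(uint8_t Mask) {
  for (unsigned X = 0; X <= Mask; ++X) {
    uint8_t Sub = static_cast<uint8_t>(Mask - X);
    if (static_cast<uint8_t>(Sub ^ Mask) != X)
      return false;
  }
  return true;
}

// Checks every 8-bit low-bit mask 2^K - 1 for K = 0..8.
bool foldHoldsForAllLowMasks() {
  for (unsigned K = 0; K <= 8; ++K) {
    uint8_t Mask = static_cast<uint8_t>((1u << K) - 1);
    if (!foldHoldsForMask(Mask))
      return false;
  }
  return true;
}
```

The mask requirement matters: for a constant such as `0b101`, `X = 2` gives `(5 - 2) ^ 5 = 6`, so `foldHoldsForMask(0x05)` is false.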

…ng CR for `ct{t,l}z` (llvm#122548)

Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.

This patch makes the metrics job also detect failures in individual steps. This is necessary to detect which jobs actually fail, now that we are setting continue-on-error in the premerge jobs to prevent sending out unnecessary email.

…release note (llvm#122594)

This adds a test that consists of compiling `#include <...>`, pretty much alone, for each public header file in each different language mode (`-std=...` compiler switch) with -Werror and many warnings enabled. There are several headers that have bugs when used alone, and many more headers that have bugs in certain language modes. So for now, compiling the new tests is gated on the cmake switch -DLLVM_LIBC_BUILD_HEADER_TESTS=ON. When all the bugs are fixed, the switch will be removed so future regressions don't land.

…o MSVC /d2ImportCallOptimization) (llvm#121516) This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).

Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same [Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement `/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).

The change to the obj file is to add a new `.impcall` section with the following layout:

```cpp
// Per section that contains calls to imported functions:
//  uint32_t SectionSize: Size in bytes for information in this section.
//  uint32_t Section Number
//  Per call to imported function in section:
//    uint32_t Kind: the kind of imported function.
//    uint32_t BranchOffset: the offset of the branch instruction in its
//                            parent section.
//    uint32_t TargetSymbolId: the symbol id of the called function.
```

NOTE: If the import call optimization feature is enabled, then the `.impcall` section must be emitted, even if there are no calls to imported functions.

The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature is enabled:
  - A (new) `.impcall` directive is emitted for each call to an imported function.
  - The `.impcall` section is emitted with its magic header (but is not filled in).
* During COFF object writing, the `.impcall` section is filled in based on each `.impcall` directive that was encountered.

The `.impcall` section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a `MachineFunctionPass` (as suggested [on the forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3)) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as `ADRP` + `LDR` (to load the symbol) then a `BR` (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.
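As an illustration of the per-section record in the `.impcall` layout described above, here is a minimal sketch that serializes one such record into 32-bit words. `ImportCall` and `encodeSection` are hypothetical names, not LLVM APIs. The sketch also assumes that `SectionSize` counts the two header words plus all call records, which the layout comment does not state explicitly, and it leaves out the magic header because its contents are not given here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One call to an imported function: three uint32_t fields, per the layout.
struct ImportCall {
  uint32_t Kind;           // kind of imported function (values not given here)
  uint32_t BranchOffset;   // offset of the branch in its parent section
  uint32_t TargetSymbolId; // symbol id of the called function
};

// Serializes the record for a single section.
// Assumption: SectionSize covers the two header words and every call record.
std::vector<uint32_t> encodeSection(uint32_t SectionNumber,
                                    const std::vector<ImportCall> &Calls) {
  std::vector<uint32_t> Words;
  const uint32_t SizeInBytes =
      static_cast<uint32_t>((2 + 3 * Calls.size()) * sizeof(uint32_t));
  Words.push_back(SizeInBytes);
  Words.push_back(SectionNumber);
  for (const ImportCall &C : Calls) {
    Words.push_back(C.Kind);
    Words.push_back(C.BranchOffset);
    Words.push_back(C.TargetSymbolId);
  }
  return Words;
}
```

Because the section number is one of the fields, a record like this can only be produced once section numbers exist, which matches the point made above about filling the section in during COFF object writing rather than asm printing.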

) Adds support for delayed privatization for `simd` directives. This PR includes PFT down to LLVM IR lowering.

…atic functions (llvm#119974) Static member functions can be considered the same way as free functions are, so do that.

Msan is not supported on Android as mentioned in google/sanitizers#1381. We proactively give the warning saying it is unsupported to fix android/ndk#1958.

…lvm#102299) This checks that classes/structs inheriting from ``std::enable_shared_from_this`` do so with public inheritance, so it prevents crashes due to ``std::make_shared`` and ``shared_from_this()`` getting called when the internal weak pointer was not initialized (e.g. due to private inheritance).
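The failure mode this check targets can be reproduced in a few lines. In this sketch the type names `Good` and `Bad` and the helper `selfThrows` are hypothetical. `Bad` uses the `class` keyword, so its base is private by default, the `shared_ptr` constructor cannot see the `enable_shared_from_this` base, and the internal weak pointer stays empty. Since C++17 the resulting `shared_from_this()` call is specified to throw `std::bad_weak_ptr`; before that it was undefined behavior.

```cpp
#include <cassert>
#include <memory>

// Public inheritance: shared_ptr can see the base and initializes the
// internal weak pointer when the object is created via make_shared.
struct Good : public std::enable_shared_from_this<Good> {
  std::shared_ptr<Good> self() { return shared_from_this(); }
};

// `class` defaults to private inheritance, so the base is inaccessible to
// shared_ptr and the internal weak pointer is never set.
class Bad : std::enable_shared_from_this<Bad> {
public:
  std::shared_ptr<Bad> self() { return shared_from_this(); }
};

// Returns true if calling self() on the object throws std::bad_weak_ptr.
template <typename T> bool selfThrows(const std::shared_ptr<T> &P) {
  try {
    (void)P->self();
    return false;
  } catch (const std::bad_weak_ptr &) {
    return true;
  }
}
```

Both types compile without any diagnostic, which is why a static check is useful: the problem only shows up at run time, on the first `shared_from_this()` call.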

…2634) Certain non-standard float types were directly passed through in the LLVM type converter, resulting in invalid IR or failed assertions:

```
mlir-opt: mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp:638: FailureOr<Type> mlir::LLVMTypeConverter::convertVectorType(VectorType) const: Assertion `LLVM::isCompatibleVectorType(vectorType) && "expected vector type compatible with the LLVM dialect"' failed.
```

The LLVM type converter should not define invalid type conversion rules for such types. If there is no type conversion rule, conversion patterns will not apply to ops with such operand types.

…cl) appears in the trailing return type of the lambda (llvm#122611) The (function) type of the lambda function is null while parsing the trailing return type. The type is filled in when the lambda body is entered. So, resolving `__PRETTY_FUNCTION__` before the lambda body is entered causes the crash. Fixes llvm#121274.

Option allows using full LTO when linking bitcode files compiled with unified LTO pipeline.

There is a narrow special-case in isImpliedCondICmps that can benefit from being taught about samesign. Since it costs us nothing to implement it, teach it about samesign, for completeness. This patch marks the completion of the effort to teach ValueTracking about samesign.

rnk and others added 30 commits January 9, 2025 11:21

[SLP]Fix mask processing for reused gathered scalars

5ff3674

Need to sync the mask between cost and actual emission to avoid bugs in mask calculation. Fixes llvm#122324.

[libc++][NFC] Remove trailing whitespace from release notes

2c6ed5f

[RISCV] Return MILog2SEW for mask instructions getOperandLog2EEW. NFC (…

b16777a

…llvm#122332) The SEW operand for these instructions should have a value of 0. This matches what was done for vcpop/vfirst.

[WebAssembly] Format WebAssembly ReleaseNote entries (llvm#122203)

876841b

[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (llvm#120826

6312bee

) With this patch we switch from the temporary dummy seeds to actual seeds provided by the seed collector. The seeds get sliced and each slice is used as the starting point for vectorization.

[OpenMP][FIX] Adjust test to be non-flaky (llvm#122331)

1739ba9

The test runs asynchronous kernels and depending on the timing the output is slightly different. We now only check for the common parts of the output.

[OpenMP] Use __builtin_bit_cast instead of UB type punning (llvm#122325)

f53cb84

Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4
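For readers unfamiliar with the distinction in the commit title, the sketch below contrasts a well-defined bit cast with pointer-cast punning such as `*(uint32_t *)&f`, which violates strict aliasing. `bitCast` and the `SKETCH_HAS_BUILTIN_BIT_CAST` macro are hypothetical names, not the OpenMP runtime's utilities. The helper uses `__builtin_bit_cast` when the compiler reports it and otherwise falls back to `std::memcpy`, since, as the message notes, older GCC releases lack the builtin.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Detect the builtin where the compiler supports feature queries; compilers
// without __has_builtin take the memcpy path below.
#if defined(__has_builtin)
#if __has_builtin(__builtin_bit_cast)
#define SKETCH_HAS_BUILTIN_BIT_CAST 1
#endif
#endif

// Hypothetical helper: reinterprets the bytes of Value as a To of equal size
// without going through an aliasing pointer cast.
template <typename To, typename From> To bitCast(const From &Value) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
#ifdef SKETCH_HAS_BUILTIN_BIT_CAST
  return __builtin_bit_cast(To, Value);
#else
  To Result;
  std::memcpy(&Result, &Value, sizeof(To));
  return Result;
#endif
}
```

On a platform with IEEE-754 single-precision floats, `bitCast<uint32_t>(1.0f)` yields `0x3f800000`.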

[MemProf] Disable cloning of callsites in recursive cycles by default (…

3055e86

…llvm#122354) This disables the support added in PR121985 by default while we investigate a compile time crash.

Add pre-merge workflow for HLSL testing (llvm#122184)

218f15c

This adds a workflow for running HLSL tests on PRs that modify HLSL and DirectX code. The tests enabled here are the LLVM & Clang tests and the Offload execution tests: https://github.com/llvm-beanz/offload-test-suite/

VT/test: pre-commit tests to enable samesign optz (llvm#120257)

9d5299e

Pre-commit some tests in preparation to teach ValueTracking's implied-cond about samesign.

[bazel] Add missing dependency for cbcb7ad

f791a4f

[bazel] Port 0aa831e

d797d94

[libc] Remove leftover 'gpu/' source directory (llvm#122368)

0acdba8

Summary: This isn't used anymore, I moved the GPU extensions into `offload/`.

[LLD][COFF] Emit base relocation for native CHPE metadata pointer on …

8408722

…ARM64X (llvm#121500)

[Clang][TableGen] Add missing __bf16 type to the builtins parser (llv…

f764e71

…m#120662) The Clang tablegen built-in function prototype parser has the `__bf16` type missing. This patch adds the missing type to the parser.

[libc++] Add missing _LIBCPP_NODEBUG on internal aliases

c492a22

[RISCV][VLOPT] Add vmerge to isSupportedInstr (llvm#122340)

328c3a8

[libc][test] remove C++ stdlib includes (llvm#122369)

0efb376

These make cross compiling the test suite more difficult, as you need the sysroot to contain these headers and libraries cross compiled for your target. It's straightforward to stick with the corresponding C headers.

[VPlan] Remove dead ToRemove (NFC).

7ffb691

bernhardu and others added 28 commits January 11, 2025 18:54

[win/asan] GetInstructionSize: Add test for 8D A4 24 .... (llvm#119794

9a9e41c

) This adds a test line and updates a comment.

[libc++] Improve diagnostic when failing to parse the tzdb (llvm#122125)

2914ba1

Providing the character that we failed on is helpful for figuring out what's going wrong in the tzdb.

[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions.

7f59b4e

The body of the loop only applies to wide induction recipes, so skip any other header phi recipes up-front.

[AMDGPU] Fix a warning

bfe93ae

This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]

[ValueTracking] Take into account whether zero is poison when computi…

17ef436

…ng CR for `ct{t,l}z` (llvm#122548)

[TableGen] Avoid repeated hash lookups (NFC) (llvm#122586)

07ff786

[Sema] Avoid repeated hash lookups (NFC) (llvm#122588)

a56eb7c

[CI] Detect step failures in metrics job (llvm#122564)

eabf931

This patch makes the metrics job also detect failures in individual steps. This is necessary to detect which jobs actually fail, now that we are setting continue-on-error in the premerge jobs to prevent sending out unnecessary email.

[clang-tidy][doc] combine the clang-tidy itself's change together in …

2c7829e

…release note (llvm#122594)

[Driver] Avoid repeated map lookups (NFC) (llvm#122625)

4f6fabd

Fix build break in MIRPrinter (llvm#122630)

d997a72

[flang][OpenMP] Extend delayed privatization for omp.simd (llvm#122156

42da120

) Adds support for delayed privatization for `simd` directives. This PR includes PFT down to LLVM IR lowering.

[clang-tidy] performance-unnecessary-copy-initialization: Consider st…

a536444

…atic functions (llvm#119974) Static member functions can be considered the same way as free functions are, so do that.

[Driver] Error when using msan on Android (llvm#122540)

fdfe7e7

Msan is not supported on Android as mentioned in google/sanitizers#1381. We proactively give the warning saying it is unsupported to fix android/ndk#1958.

[gn build] Port 8ebc35f

7532958

Add 'unifiedlto' option to gold plugin (llvm#121336)

26b4a0a

Option allows using full LTO when linking bitcode files compiled with unified LTO pipeline.

[MLIR][ROCDL] Convert math::fpowi to ROCDL call

9c95137

Remove static_assert and use a runtime assert.

aab089e

remove the comment

8c180db

lialan closed this Jan 13, 2025

lialan deleted the lialan/rocdl_lib branch January 14, 2025 17:36
