Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MLIR][ROCDL] Implement math.ipowi #4

Closed
wants to merge 10,000 commits into from
Closed

[MLIR][ROCDL] Implement math.ipowi #4

wants to merge 10,000 commits into from

Conversation

lialan
Copy link

@lialan lialan commented Jan 13, 2025

No description provided.

rnk and others added 30 commits January 9, 2025 11:21
…vm#122029)

Move the common case of FieldDecl::getFieldIndex() inline to mitigate
the cost of removing the extra `FieldNo` induction variable.

Also rename isNoUniqueAddress parameter to isNonVirtualBaseType, which
appears to be more accurate. I think the current name is just a
consequence of autocomplete gone wrong.
Need to sync the mask between cost and actual emission to avoid bugs in
mask calculation

Fixes llvm#122324
I’m seeing a series of errors when trying to run the cmake configure
step on macOS when the cmake generator is set to Xcode. All is well if I
use the Ninja or Unix Makefile generators. Messages are all of the form:
~~~
CMake Error at …llvm-project/clang/cmake/modules/AddClang.cmake:120
(target_compile_definitions):
  Cannot specify compile definitions for target "obj.clangBasic" which
  is not built by this project.
Call Stack (most recent call first):
  …llvm-project/clang/lib/Basic/CMakeLists.txt:57 (add_clang_library)
~~~
The remaining errors are similar but mention targets obj.clangAPINotes,
obj.clangLex, obj.clangParse, and so on.

The regression appears to have been introduced by commit 09fa2f0
(Oct 14 2024) which added the code in this area.

My proposed solution is simply to add a test to ensure that the obj.x
target exists before setting its compile definitions. There is precedent
doing just this in both clang/cmake/modules/AddClang.cmake and
clang/lib/support/CMakeLists.txt as well as in the “MSVC AND NOT
CLANG_LINK_CLANG_DYLIB” path immediately above the offending line.

I’ve also made a couple of grammatical tweaks in the comments
surrounding this code.

In case it's relevant, the cmake settings and definitions I've used to
trigger these errors is:
~~~bash
GENERATOR="Xcode"
OUTDIR=build_macos
cmake \
-S "$SCRIPT_DIR/llvm" \
-B "$SCRIPT_DIR/$OUTDIR" \
-G "$GENERATOR" \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_OSX_ARCHITECTURES=arm64 \
-D LLVM_PARALLEL_LINK_JOBS=1 \
-D LLVM_ENABLE_PROJECTS="clang;lld" \
-D LLVM_TARGETS_TO_BUILD=RISCV \
-D LLVM_DEFAULT_TARGET_TRIPLE=riscv32-unknown-elf \
-D LLVM_OPTIMIZED_TABLEGEN=Yes
~~~
(cmake v3.31.1, Xcode 16.1. I know that not all of these variables are
useful for the Xcode generator!)

Co-authored-by: Paul Bowen-Huggett <[email protected]>
…llvm#122332)

The SEW operand for these instructions should have a value of 0. This
matches what was done for vcpop/vfirst.
…2286)

Don't suggest to comment-out the parameter name if the parameter has an
attribute that's spelled after the parameter name.

This prevents the parameter's attributes from being wrongly applied to
the parameter's type.

This fixes llvm#122191.
…lvm#122190)

The GPU ID operations already implement InferIntRangeInterface, which
gives constant lower and upper bounds on those IDs when appropriate
metadata is prentent on the operations or in the surrounding context.

This commit uses that existing code to implement the
ValueBoundsOpInterface, which is used when analyzing affine operations
(unlike the integer range interface, which is used for arithmetic
optimization).

It also implements the interface for gpu.launch, where we can use it to
express the constraint that block/grid sizes are equal to their value
from outside the launch op and that the corresponding IDs are bounded
above by that size.

As a consequence, the test pass for this inference is updated to work on
a FunctionOpInterface and not a func.func, creating minor churn in other
tests.
)

With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
The test runs asynchronous kernels and depending on the timing the
output is slightly different. We now only check for the common parts of
the output.
Summary:
Previously we had some indirection here, this patch updates these
utilities to just be normal template functions. We use SFINAE to manage
the special case handling for floats. Also this strips address spaces so
it can be used more generally.
Summary:
Use a normal bitcast, remove from the shared utils since it's not
available in
GCC 7.4
…llvm#122354)

This disables the support added in PR121985 by default while we
investigate a compile time crash.
This adds a workflow for running HLSL tests on PRs that modify HLSL and
DirectX code.

The tests enabled here are the LLVM & Clang tests and the Offload
execution tests: https://github.com/llvm-beanz/offload-test-suite/
Pre-commit some tests in preparation to teach ValueTracking's
implied-cond about samesign.
…lvm#120327)

The `sycl_kernel_entry_point` attribute is used to declare a function that
defines a pattern for an offload kernel entry point. The attribute requires
a single type argument that specifies a class type that meets the requirements
for a SYCL kernel name as described in section 5.2, "Naming of kernels", of
the SYCL 2020 specification. A unique kernel name type is required for each
function declared with the attribute. The attribute may not first appear on a
declaration that follows a definition of the function. The function is
required to have a non-deduced `void` return type. The function must not be
a non-static member function, be deleted or defaulted, be declared with the
`constexpr` or `consteval` specifiers, be declared with the `[[noreturn]]`
attribute, be a coroutine, or accept variadic arguments.

Diagnostics are not yet provided for the following:
- Use of a type as a kernel name that does not satisfy the forward
  declarability requirements specified in section 5.2, "Naming of kernels",
  of the SYCL 2020 specification.
- Use of a type as a parameter of the attributed function that does not
  satisfy the kernel parameter requirements specified in section 4.12.4,
  "Rules for parameter passing to kernels", of the SYCL 2020 specification
  (each such function parameter constitutes a kernel parameter).
- Use of language features that are not permitted in device functions as
  specified in section 5.4, "Language restrictions for device functions",
  of the SYCL 2020 specification.

There are several issues noted by various FIXME comments.
- The diagnostic generated for kernel name conflicts needs additional work
  to better detail the relevant source locations; such as the location of
  each declaration as well as the original source of each kernel name.
- A number of the tests illustrate spurious errors being produced due to
  attributes that appertain to function templates being instantiated too
  early (during overload resolution as opposed to after an overload is
  selected).

Included changes allow the `SYCLKernelEntryPointAttr` attribute to be
marked as invalid if a `sycl_kernel_entry_point` attribute is used incorrectly.
This is intended to prevent trying to emit an offload kernel entry point
without having to mark the associated function as invalid since doing so
would affect overload resolution; which this attribute should not do.
Unfortunately, Clang eagerly instantiates attributes that appertain to
functions with the result that errors might be issued for function
declarations that are never selected by overload resolution. Tests have
been added to demonstrate this. Further work will be needed to address
these issues (for this and other attributes).
Summary:
This isn't used anymore, I moved the GPU extensions into `offload/`.
After llvm#120563 malloc_size also
needs intercepting on Apple platforms, otherwise all type-sanitized
binaries crash on startup with an objc error:
realized class 0x12345 has corrupt data pointer: malloc_size(0x567) = 0

PR: llvm#122133
llvm#122371)

…llvm#121991)"

This reverts commit f8f8598.

This breaks ARMv7 and s390x buildbot with the following message:
```
llvm-exegesis error: No available targets are compatible with triple "armv8l-unknown-linux-gnueabihf"
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/tcwg-buildbot/worker/clang-armv7-2stage/stage2/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-2stage/llvm/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
```
…m#120662)

The Clang tablegen built-in function prototype parser has the `__bf16`
type missing. This patch adds the missing type to the parser.
…pace declaration of a negative test. (llvm#122375)

Commit 1a73654 added a missing
diagnostic for incorrect placement of an attribute in a namespace
declaration. This change corrects a SYCL test that inadvertently
exercised the `sycl_kernel_entry_point` attribute in the wrong
declaration location.
These make cross compiling the test suite more difficult, as you need
the
sysroot to contain these headers and libraries cross compiled for your
target.
It's straightforward to stick with the corresponding C headers.
If the pointer to be checked is statically known to be zero, the tag
check will always pass since:
1) the tag is zero
2) shadow memory for address 0 is initialized to 0 and never updated.
We can therefore elide the tag check.

We perform the elision in two places:
1) the HWASan pass
2) when lowering the CHECK_MEMACCESS intrinsic. Conceivably, the HWASan
pass may encounter a "cannot currently statically prove to be null"
pointer (and is therefore unable to omit the intrinsic) that later
optimization passes convert into a statically known-null pointer. As a
last line of defense, we perform elision here too.

This also updates the tests from
llvm#122186
This patch improves the linker’s ability to estimate stub reachability
in the `TextOutputSection::estimateStubsInRangeVA` function. It does so
by including thunks that have already been placed ahead of the current
call site address when calculating the threshold for direct stub calls.

Before this fix, the estimation process overlooked existing forward
thunks. This could result in some thunks not being inserted where
needed. In rare situations, particularly with large and specially
arranged codebases, this might lead to branch instructions being out of
range, causing linking errors.

Although this patch successfully addresses the problem, it is not
feasible to create a test for this issue. The specific layout and order
of thunk creation required to reproduce the corner case are too complex,
making test creation impractical.

Example error messages the issue could generate:
```
ld64.lld: error: banana.o:(symbol OUTLINED_FUNCTION_24949_3875): relocation BRANCH26 is out of range: 134547892 is not in [-134217728, 134217727]; references objc_autoreleaseReturnValue
ld64.lld: error: main.o:(symbol _main+0xc): relocation BRANCH26 is out of range: 134544132 is not in [-134217728, 134217727]; references objc_release
```
bernhardu and others added 28 commits January 11, 2025 18:54
)

This adds a test line and updates a comment.
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
Providing the character that we failed on is helpful for figuring out
what's going wrong in the tzdb.
The body of the loop only applies to wide induction recipes, skip any other
header phi recipes up-frond
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private
  field 'DAG' is not used [-Werror,-Wunused-private-field]
…m#122552)

- **[InstSimpify] Add tests for simplifying `(xor (sub C_Mask, X),
C_Mask)`; NFC**
- **[InstSimpify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X`**

Helps address regressions with folding `clz(Pow2)`.

Proof: https://alive2.llvm.org/ce/z/zGwUBp
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
This patch makes the metrics job also detect failures in individual
steps. This is necessary now that we are setting continue-on-error in
the premerge jobs to prevent sending out unnecessary email to detect
what jobs actually fail.
This adds a test that consists of compiling `#include <...>`,
pretty much alone, for each public header file in each different
language mode (`-std=...` compiler switch) with -Werror and many
warnings enabled.

There are several headers that have bugs when used alone, and
many more headers that have bugs in certain language modes.  So
for now, compiling the new tests is gated on the cmake switch
-DLLVM_LIBC_BUILD_HEADER_TESTS=ON.  When all the bugs are fixed,
the switch will be removed so future regressions don't land.
…o MSVC /d2ImportCallOptimization) (llvm#121516)

This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).

Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).

The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.
```

NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.

The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.

The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
)

Adds support for delayed privatization for `simd` directives. This PR
includes PFT down to LLVM IR lowering.
…atic functions (llvm#119974)

Static member functions can be considered the same way as free functions
are, so do that.
Msan is not supported on Android as mentioned in google/sanitizers#1381.
We proactively give the warning saying it is unsupported to fix
android/ndk#1958.
…lvm#102299)

This checks that classes/structs inheriting from
``std::enable_shared_from_this`` does so with public inheritance, so it
prevents crashes due to ``std::make_shared`` and ``shared_from_this()``
getting called when the internal weak pointer was not initialized (e.g.
due to private inheritance).
…2634)

Certain non-standard float types were directly passed through in the
LLVM type converter, resulting in invalid IR or failed assertions:

```
mlir-opt: mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp:638: FailureOr<Type> mlir::LLVMTypeConverter::convertVectorType(VectorType) const: Assertion `LLVM::isCompatibleVectorType(vectorType) && "expected vector type compatible with the LLVM dialect"' failed.
```

The LLVM type converter should not define invalid type conversion rules
for such types. If there is no type conversion rule, conversion patterns
will not apply to ops with such operand types.
…cl) appears in the trailing return type of the lambda (llvm#122611)

The (function) type of the lambda function is null while parsing
trailing return type. The type is filled-in when the lambda body is
entered. So, resolving `__PRETTY_FUNCTION__` before the lambda body is
entered causes the crash.

Fixes llvm#121274.
Option allows using full LTO when linking bitcode files compiled with
unified LTO pipeline.
There is a narrow special-case in isImpliedCondICmps that can benefit
from being taught about samesign. Since it costs us nothing to implement
it, teach it about samesign, for completeness. This patch marks the
completion of the effort to teach ValueTracking about samesign.
@lialan lialan closed this Jan 13, 2025
@lialan lialan deleted the lialan/rocdl_lib branch January 14, 2025 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment