Inaccuracies in the znver1 scheduling model: vpmov*, vtestp*, vps*v*, vcmp* #54889
@Fabian-R Something you could try is running llvm-exegesis (https://llvm.org/docs/CommandGuide/llvm-exegesis.html) over all supported instructions, with just one run for each mode (latency, inverse_throughput and uops):
llvm-exegesis -mode={latency,inverse_throughput,uops} -opcode-index=-1 > benchmarks.yaml
You can then create an HTML dump of all discrepancies with the model:
llvm-exegesis -mode=analysis -benchmarks-file=benchmarks.yaml -analysis-clusters-output-file=clusters.csv -analysis-inconsistencies-output-file=inconsistencies.html
Then post the YAML and inconsistencies files here.
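A minimal sketch of the one-run-per-mode workflow above, driven from Python. Assumptions not taken from the thread: each mode writes to its own benchmarks-<mode>.yaml file and gets its own analysis pass (so the shell brace shorthand is spelled out explicitly), and the llvm-exegesis binary is on PATH. Only the flags quoted in the comment are used. By default the script just prints the commands; pass --run to execute them.

```python
import shlex
import subprocess
import sys

# The three measurement modes named in the comment above.
MODES = ("latency", "inverse_throughput", "uops")
EXE = "llvm-exegesis"  # assumed to be on PATH


def benchmark_command(mode: str) -> list[str]:
    # -opcode-index=-1 benchmarks every supported opcode.
    return [EXE, f"-mode={mode}", "-opcode-index=-1"]


def analysis_command(mode: str) -> list[str]:
    # Per-mode output file names are an assumption of this sketch.
    return [
        EXE,
        "-mode=analysis",
        f"-benchmarks-file=benchmarks-{mode}.yaml",
        f"-analysis-clusters-output-file=clusters-{mode}.csv",
        f"-analysis-inconsistencies-output-file=inconsistencies-{mode}.html",
    ]


def run_all() -> None:
    for mode in MODES:
        with open(f"benchmarks-{mode}.yaml", "w") as out:
            # check=False: a crashing run still leaves its partial output behind.
            subprocess.run(benchmark_command(mode), stdout=out, check=False)
        subprocess.run(analysis_command(mode), check=False)


def print_plan() -> None:
    for mode in MODES:
        print(shlex.join(benchmark_command(mode)), ">", f"benchmarks-{mode}.yaml")
        print(shlex.join(analysis_command(mode)))


if __name__ == "__main__":
    # Dry run by default; only "--run" actually invokes llvm-exegesis.
    if "--run" in sys.argv[1:]:
        run_all()
    else:
        print_plan()
```

Keeping one YAML file per mode avoids relying on how the analysis step treats a file that mixes modes. If the benchmark output contains non-YAML lines, the analysis step can still fail, as reported in the thread.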
The first command eventually terminates with a segfault, but it still produces 500k lines of output. I had to remove the "Check generated assembly with: ..." lines from that output before the second command would run successfully. The results are in the attached tar.gz archive.
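The manual clean-up described above can be scripted. A minimal sketch, assuming every stray line begins with the quoted prefix "Check generated assembly with:" and that everything else in the file is benchmark YAML; the script name in the usage comment is hypothetical. It does not try to repair a final benchmark entry that the segfault may have cut short.

```python
import sys
from typing import Iterable, Iterator

# Prefix of the stray, non-YAML lines described in the comment above.
STRAY_PREFIX = "Check generated assembly with:"


def strip_stray_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except those that start with STRAY_PREFIX."""
    for line in lines:
        # lstrip() also catches the prefix after leading whitespace.
        if not line.lstrip().startswith(STRAY_PREFIX):
            yield line


def clean_file(src: str, dst: str) -> None:
    # Stream line by line so a 500k-line file is never held in memory at once.
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(strip_stray_lines(fin))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python strip_stray.py benchmarks.yaml benchmarks.clean.yaml
    clean_file(sys.argv[1], sys.argv[2])
```

Writing to a separate output file keeps the raw crash output intact in case the filter turns out to be too aggressive.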
The znver1/2 models were incorrectly modelling these as 3-cycle-latency instructions on the wrong pipe, and the znver1 ymm variants also require double pumping. The models now match the AMD SoG (Software Optimization Guide), Agner and instlatx64 numbers. Thanks to @Fabian-R for the report.
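To make "double pumping" concrete, here is a toy resource model, not the LLVM scheduling model itself. Assumptions: a 256-bit (ymm) instruction on znver1 runs as two 128-bit uops on the same eligible pipes, each uop occupies a pipe for one cycle, and the pipe names and latency below are made up for illustration. Under those assumptions the ymm form needs twice the pipe cycles, so its ideal reciprocal throughput doubles even when its latency does not change.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ToyOp:
    name: str
    latency: int            # cycles until the result is available
    pipes: tuple[str, ...]  # pipes able to execute each uop (hypothetical names)
    uops: int               # uops issued per instruction


def reciprocal_throughput(op: ToyOp) -> Fraction:
    # Ideal steady state: uops spread evenly over the eligible pipes, each uop
    # holding its pipe for one cycle, so cycles per instruction = uops / pipes.
    return Fraction(op.uops, len(op.pipes))


def double_pumped(op: ToyOp) -> ToyOp:
    # A 256-bit op executed as two 128-bit halves: twice the uops on the same
    # pipes. Latency is left unchanged, which is a simplification of this toy.
    return ToyOp(f"{op.name} (ymm)", op.latency, op.pipes, op.uops * 2)


xmm = ToyOp("hypothetical_cmp", latency=1, pipes=("FP_A", "FP_B"), uops=1)
ymm = double_pumped(xmm)
for op in (xmm, ymm):
    print(op.name, "uops:", op.uops, "rthroughput:", reciprocal_throughput(op))
```

This prints a reciprocal throughput of 1/2 for the xmm form and 1 for the ymm form, which is the kind of difference a model misses if it treats both widths identically.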
A later merge of 1001 commits into a patch/main branch referenced this issue; the related commit in it is "[X86] Adjust fsetcc/fmin/fmax costs to match SoG (Issue #54889)".
[OpenMP] Do not use the default pipeline without optimizations [Support] Remove unused/uncompilable !HAVE_PTHREAD_GETSPECIFIC code path [HWASan] allow symbolizer script to index binaries by build id. [test][clang] Use -clear-ast-before-backend instead of -flegacy-pass-manager in CommandLineTest [test] Remove various legacy pass manager tests [docs] Remove outdated -fexperimental-new-pass-manager for profile data remapping support [test] Remove references to -fno-legacy-pass-manager in tests Value::isTransitiveUsedByMetadataOnly: Don't repeatedly add an element to the worklist. NFC [test] Remove references to -fexperimental-new-pass-manager in tests [clang-tidy] Support parenthesized literals in modernize-macro-to-enum [lldb] Don't report progress in the REPL AArch64 adding more tests to show the simple scenarios for or/and combine [InstCombine] guard against splat-mul corner case [MLIR][Presburger][Simplex] symbolic lexmin: add some normalization heuristics [lld-macho][nfc] Use includeInSymtab for all symtab-skipping logic [MLIR][Presburger] subtract: fix bug in the non-recursive implementation [Driver] Simplify hasFlag pattern with addOptInFlag/addOptOutFlag helpers AMDGPU/SDAG: Custom SETCC (i.e. ballot) is always uniform [mlir][ods] ODS-level Attribute Optimizations [LoopUnroll] Always respect user unroll pragma [clang][extract-api] Emit "functionSignature" in SGF for ObjC methods. [libcxx] locale_bionic.h: skip ndk-version.h on Android platform [TableGen][NFC] Reflow Record accessor comments [TableGen][NFC] Fix copy/paste error in comment [llvm-lib] Add /WX, warn by default on empty inputs, add opt-out [RISCV] Remove riscv-v-fixed-length-vector-elen-max command line option. [RISCV] Remove ExtZvl enum from RISCVSubtarget. NFC [lldb] Silence warnings about unused static variables in RegisterInfos_arm64.h [TargetLowering][RISCV] Allow truncation when checking if the arguments of a setcc are splats. 
[libcxx] [test] Fix back-to-back use of get_temp_file_name() on Windows [libc++][NFC] Use noexcept instead of _NOEXCEPT for code compiled into the library [libc] Add a definition of pthread_attr_t and its getters and setters. [Dexter] Collate penalties of the same type into a single line for each [lld][macho]Fix test to sort symbol table before dumping [InstCombine] try to fold low-mask of ashr to lshr [InstCombine] add tests for low-mask of ashr; NFC Revert "[LICM] Only create load in pre-header when promoting load." [clangd] Performance improvements and cleanup [gn build] Port c292b6066cca [AMDGPU] Regenerate insert_vector_dynelt.ll [SimplifyLibCalls] Remove unnecessary inbounds check [InstCombine] Add strlen of gep test without inbounds (NFC) [libc++] Implement P1007R3: std::assume_aligned [LICM] Only create load in pre-header when promoting load. [libc++] Make .version.pass.cpp tests be compile-only tests [MLIR][Presburger] Make PWMAFunction inheritence from space private [mlir][tensor] Add pattern to fold ExtractSliceOp, PadOp chains. [dllexport] odr-use constexpr default args for constructor closures [compiler-rt][SystemZ] Skip fuzzer/coverage.test [Clang] Avoid legacy PM in some tests (NFC) [libc++] Remove the usage of __init in operator+ [llvm][AArch64] Generate getExtensionFeatures from the list of extensions [gn build] Port b4ad28da196d [Clang] Override method ModuleImportRead in MultiplexASTDeserializationListener [CodeGen] Async unwind - add a pass to fix CFI information Remove deprecated `parseSourceFile/String()` overloads. [mlir][emitc][nfc] Replace !emitc.opaque pointers [SDAG] try to reduce compare of funnel shift equal 0 [LICM] Add additional test for load hoisting, simplify existing one. Revert "AArch64: take compact unwind frame size from last CFI instruction." AArch64: add nvcast patterns for v1f64 AArch64: take compact unwind frame size from last CFI instruction. Tail calls: look through AssertZExt to find register copy. 
[Clang] Add -no-opaque-pointers to native powerpc test (NFC) [InstCombine] Fold sub(add(x,y),min/max(x,y)) -> max/min(x,y) (PR38280) [C++20][Modules] Add testcases from section 10.2 dependent on header units. [mlir][vector] Swap ExtractSliceOp(TransferWriteOp). [OpenCL] Add device enqueue guards for DSE builtins [X86] Account for high uop/resource usage in BSF/BSR instructions [CGCall] Check store type in findDominatingStoreToReturnValue() [mlir][vector] Update transfer read/write doc (NFC). [flang] D123388 fix - remove unused variable from test [AST] Remove a duplicated getDecl method in TemplateName, NFC. [flang][runtime] Prefer process time over thread time in CPU_TIME Revert rG88ff6f70c45f2767576c64dde28cbfe7a90916ca "[X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains" [ThinLTOCodeGenerator] Remove support for legacy PM [Flang][OpenMP] Add implementation of privatisation [Clang] Enable opaque pointers by default [Clang] Add -no-opaque-pointers to recently added test (NFC) [C++20][Modules] Remove an empty statement [NFC]. [X86] Add shuffle combine tests where we fail to fold a mask into a or(pshufb,pshufb) chain [llvm-lto] Remove support for legacy pass manager [flang] Lower optionals in GET_COMMAND_ARGUMENT and GET_ENVIRONMENT_VARIABLE [flang] add a static assert in CheckUnitNumberInRangeImpl [AArch64][NFC] Update comment in AArch64.td [AArch64] Split fuse-literals feature [AVR] Merge AVRRelaxMemOperations into AVRExpandPseudoInsts [RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic [CUDA][HIP] Externalize kernels in anonymous name space [llvm-objcopy] Update comments with capitalization change from 6b575395d47b8 Fix a misuse of `cast` [LAA] Add test with simpler load of pointer select. [LICM] Add test for PR51248. [LICM] Trim unneeded functions from test, add promote-able load. 
[X86] Remove dead code from test case [libc++] Rename the template arguments of the algorithm result types [X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains [ORC] add lazy jit support for riscv64 [X86] combineExtractSubvector - fold extract_subvector(insert_subvector(V,X,C1),C1) [Driver] Prepend - to option name in err_drv_unsupported_option_argument diagnostic [VPlan] Place VPExpandSCEVRecipe in pre-header. [Driver] Simplify OPT_fcolor_diagnostics claim [Driver] Simplify -f[no-]diagnostics-color handling. NFC [Frontend] Simplify -finline* handling. NFC [Driver] Fix -f[no-]inline to override -f[no-]inline-functions/-finline-hint-functions [X86][AMX] Fix infinite loop of getShape. [RISCV] Remove unnecessary cast to i8* when converting gather/scatter to strided load/store. [ObjCopy][NFC] Refactor handling of linkedit_data_command in MachOWriter [ObjCopy][NFC] Add missing const in MachOLayoutBuilder.h [ObjCopy][NFC] Refactor handling of linkedit_data_command Giving a lot more functions prototypes; NFC [randstruct] NFC change to use static [gn build] Port 7aa8c38a9e19 [randstruct] Add randomize structure layout support [IRBuilder] Remove commented out include. [X86] Remove cfi noise from splat-for-size.ll tests Add some prototypes to fix -Wstrict-prototypes. NFC [flang] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build [RISCV] Only try LUI+SH*ADD+ADDI for int materialization if LUI+ADDI+SH*ADD failed. [X86] Add original test coverage for Issue #54819 [X86] Fold concat(pshufb(x,y),pshufb(z,w)) -> pshufb(concat(x,z),concat(y,w)) [clang-format] Add execute permission to dump_format_help.py Add some prototypes to these functions; NFC [gn build] Port a96443eddedc [libc++] Implement P0401R6 (allocate_at_least) [X86] lowerV64I8Shuffle - attempt to fold to SHUFFLE(ALIGNR(X,Y)) and OR(PSHUFB(X),PSHUFB(Y)) Add some prototypes to these checks; NFC [VPlan] Model pre-header explicitly. 
[X86][SSE] combineSelect - more aggressively create zero elements in the or(pshufb(x), pshufb(y)) fold [CUDA/HIP] Remove argument from module ctor/dtor signatures [X86] Add v64i8 shuffle test coverage [X86] Reduce some superfluous diffs between znver1/znver2 models. NFC [LoopVectorize] Regenerate first-order-recurrence.ll [AArch64] validateTargetOperandClass - early out from MCK_MPR case. NFCI. [PowerPC] Generate tests for 16-byte atomic load/store. NFC. [sanitizer] Disable new test on Android to fix a bot [gn build] Port 889302292bf6 [libc++][format][4/6] Improve formatted_size. [libc++][format][3/6] Adds a __container_buffer. Reland "[Driver] Default CLANG_DEFAULT_PIE_ON_LINUX to ON"" [scudo][test] Link with -no-pie to be agnostic of CLANG_DEFAULT_PIE_ON_LINUX [flang] Support export/import OpenMP Threadprivate Flag [clang][OpenMP5.1] Initial parsing/sema for has_device_addr [BOLT] Check MCContext errors [lld-macho] Use fewer indirections in UnwindInfo implementation Revert D121556 "[randstruct] Add randomize structure layout support" [gn build] Port 46b2a463bdef [randstruct] Use llvm::shuffle to avoid STL impl difference after D121556 [gn build] Port 2a2149c754f9 [AMDGPU] Fix regression with vectorization limiting Adapt the ObjC stepping algorithm to deal with "selector-stubs" in clang. [randstruct] Remove RandstructTest.cpp from list [randstruct] temporarily remove test that's failing [PowerPC] Adjust `MaxAtomicSizeInBitsSupported` on PPC64 Add some function prototypes; NFC unbreak Modules/cxx20-export-import.cpp with LLVM_APPEND_VC_REV after fa34951fbc9bde75 Fix bazel rule for __support_fputil_fma when using header modules. [PowerPC] Support 16-byte lock free atomics on pwr8 and up Transforms: Fix code duplication between LowerAtomic and AtomicExpand [randstruct] disable test for Windows for now. 
No reason for these not to have prototypes; NFC Skip test on earlier clang versions [MSAN] add __b64_pton and __b64_ntop intercepts [RGT] Use GTEST_SKIP() in more places where we skip a test [clang-tidy] Deal with keyword tokens in preprocessor conditions [lldb] XFAIL tests that aren't passing remotely [lldb] Skip more tests that don't make sense to run remotely [randstruct] add expected output for WIN64 Reland "[MTE] Add -fsanitize=memtag* and friends." [libc][NFC] implement printf parser [libc++] Add missing 'return 0;' to main() in test [libcxx][NFC] Format sort.h [libc++] Rename PS() macro to avoid clashing with Xtensa register name [gn build] Port 3f0587d0c668 [gn build] Port 2aa575fd7f4b [C89/C2x] Improve diagnostics around strict prototypes in C Revert "[MTE] Add -fsanitize=memtag* and friends." [libc++] Avoid using anonymous struct with base classes (fixes GCC 12) [AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions AMDGPU: Add codegen test for ctpop(ballot(x)) [randstruct] Add randomize structure layout support Revert D120327 "compiler-rt: Add udivmodei5 to builtins and add bitint library" [RGT] Use GTEST_SKIP instead of just returning [flang] Do not fold fir.box_addr when it has a slice [MTE] Add -fsanitize=memtag* and friends. [mlir][sparse] Moving <P,I,V>-invariant parts of SparseTensorStorage to base [LV] Set debug loc after setting insert point. [LV] Add test case for wrong debug location with replicate recipe. lld/AMDGPU: Fix asserts if no object files are involved in link [libc] Add support for x86-64 targets that do not have FMA instructions. [libc++][test] Use the Japanese locale. Use writable temporary file for test compiler output instead of hardcoded name. NFCI. [lldb] Skip a bunch of tests that shouldn't run remotely [lldb] Fix TestQuoting when run remotely [lldb] Import Foundation in TestConflictingDefinition.py Use portable formatting specified in test. NFCI. 
[Clang] [Docs] Add HLSLSupport page [clang-offload-bundler] fix "no output file" issue with -outputs [CaptureTracking] Ignore ephemeral values in EarliestEscapeInfo [MC][ELF] Improve st_size propagation rule [MC][test] Improve offset.s Add one more definition for symbols in prctl unit test. [clang][extract-api] Emit "navigator" property of "name" in SymbolGraph [flang] Fix semantic analysis for "forall" targeted by "label" [RISCV] Select unmasked FP setcc insts via ISel post-process [AMDGPU] Fix inline asm causing assert during PreRARematerialize stage in scheduler pass [memprof] Deduplicate and outline frame storage in the memprof profile. NFC: Avoid unused variable warning in UnwindLevel1.c [RISCV] Always select (and (srl X, C), Mask) as (srli (slli X, C2), C3). [InstCombine] Add sub(add(x,y),minmax(x,y)) -> maxmin(x,y) tests Add definitions for symbols in unit test for prctl. [Loads] Check type size in bits during store to load forwarding [VPlan] Preserve debug location when creating branch. [LV] Add test for missing debug info on branch in vector loop. [LSR] Optimize unused IVs to final values in the exit block [libc++] Adds back_insert_iterator::__get_container. [NFC][libc++][format] Prepare unit tests. [Support][unittests] Silence warning when building with Clang 13 on Windows. [OpenMP] Fix linker error when building info tool [ConstantFold] Add test for load of i8 from i1 (NFC) [flang][OpenMP] Added allocate clause translation for OpenMP block constructs Clarify language option default value behavior; NFC [OpenMP] Remove help and documentation for old flag [AMDGPU][SIMachineFunctionInfo] Code cleanup (NFC). [X86][FastISel] Fix with.overflow + select eflags clobber (PR54369) [llvm-pdbutil] Move global state (Filters) inside LinePrinter class. Fix another g++ incompatibility. Same issue as 932f27dc1f03. 
[flang] Handle dynamically optional argument in EXIT [Sanitizer] Add -no-opaque-pointers to IR test (NFC) [Profile] Add -no-opaque-pointers to IR tests (NFC) [CGCall] Make findDominatingStoreToReturnValue() more robust [clang-tidy] Make performance-inefficient-vector-operation work on members [mlir][Linalg] Add pooling_nchw_sum op. [flang][NFC] rename isAbsent to isStaticallyAbsent in IntrinsicCall.cpp [VP] Explicitly map from VP intrinsic to ISD opcode [gn build] Port 08920cc04343 [AArch64] Remove always true Perfect cost check. NFC Fix Sphinx build [OpenCL] Add generic addrspace guards for get_fence [gn build] (manually) port bf2dc4b37623 [AMDGPU] Use GCNPat in the buffer atomic pattern multiclasses Disambiguate conversion cast for GCC [AMDGPU] Increase detection range for s_mov, v_cmpx transformation. [libc++] Add __is_callable type trait and begin granularizing type_traits [libc++] Add tests for std::string default constructor and destructor compiler-rt/lib/builtins/udivmodei5.c: Fix missing macro argument [InstCombine] Add various other modulo-by-constant tests for Issue #22303 [mlir][tensor] Fix verifier and bufferization of collapse_shape [mlir][bufferize] Do not insert useless casts for newly allocated buffers [mlir][arith][bufferize] Fix tensors with different layouts after bufferization [X86] Fix SLM scheduler model for PMULLD (PR37059) [spirv] Make header self-contained. NFC. [X86] Add additional test for PR54369 (NFC) [gold] Remove support for legacy pass manager Revert "Reland "[RISCV][NFC] Moving RVV intrinsic type related util to llvm/Support"" [analyzer] Don't track function calls as control dependencies [MemoryBuiltins] Remove unnecessary lambda capture (NFC) [SafeStack] Move test to X86 directory [LICM] Pass MemorySSAUpdater by referene (NFC) [C++20][Modules] Adjust handling of exports of namespaces and using-decls. [mlir][Vector] Fold extractelement splat. 
[LoopSink] Require MemorySSA [SafeStack] Don't create SCEV min between pointer and integer (PR54784) [mlir][Arithmetic] Add constant folder for negf. [Clang][Fortify] drop inline decls when redeclared [builtin_object_size] Basic support for posix_memalign [clang][deps] Ensure deterministic filename case Reland "[RISCV][NFC] Moving RVV intrinsic type related util to llvm/Support" Bump minimum toolchain version Introduce branchless sorting functions for sort3, sort4 and sort5. compiler-rt: Add udivmodei5 to builtins and add bitint library [mlir][NFC] Drop a few unnecessary includes from Pass.h [CSKY] Correct the alignment of FPR register [mlir] Add support for operation-produced successor arguments in BranchOpInterface [asan] Always skip first object from dl_iterate_phdr [llvm-profgen] Filter out invalid LBR ranges. [CSKY] support select instruction in floating type [demangler] Support C23 _BitInt type NFC: Silence unused function 'scaleAndAdd' in release build. [RISCV][NFC] Add missing lit.local.cfg in test/CodeGen/MIR/RISCV/ [gn build] Port 690085c9b715 [libomptarget] Implement pointer lookup as 5.1 spec. [RISCV] Fixing stack offset for RVV object with vararg in stack. [RISCV] Pre-commit for fixing stack offset for RVV object [RISCV] Store/restore RISCVMachineFunctionInfo into MIR YAML file [NFC] Remove unused variable in CodeGenModules Add support for atomic memory copy lowering [mlir][LLVMIR] Add more vector predication intrinsic ops. [InferAddressSpaces] Fix assert on invalid bitcast placement [RISCV][NFC] Use defvar to simplify pattern definations. [InstCombine] fold more constant divisor to select-of-constants divisor [mlir] Width parameterization of BitEnum attributes NFC: Eliminate warning for unused type alias FnTraitsT in release builds. [ORC] Fix handling of casts in llvm.global_ctors. 
DebugInfo: Consider the type of NTTP when simplifying template names [MSAN] extend prctl interceptor to support PR_SCHED_CORE [trace][intel pt] Create a common accessor for live and postmortem data [trace][intel pt] Create a class for the libipt decoder wrapper [test][DSE] Precommit more assume tests Fix format specifier. NFCI. [llvm-symbolizer] Fix line offset for inline site. [lld-macho][nfc] Give non-text ConcatOutputSections order-independent finalization [AMDGPU] Fix handling of gfx10 LDS misaligned access bug [compiler-rt][builtins] Move DMB definition to syn-ops.h Revert "[PowerPC] Fix EmitPPCBuiltinExpr to emit arguments once" [ELF] Fix non-relocatable-non-emit-relocs --gc-sections to discard .L symbols [AMDGPU] Split unaligned LDS access instead of scalarizing [ELF][test] Improve discard-locals.s [LV] Add test case for PR54427. [PowerPC] Fix EmitPPCBuiltinExpr to emit arguments once [lldb] Use getMainExecutable in SBDebugger::PrintStackTraceOnError Revert "[libc++][format] Use a helper constant." [lldb][gui] remove the "expand" diamond for variables where expanding fails [lldb][gui] handle Ctrl+C to stop a running process [ARM] Add missing return to ARMTTIImpl::isLoweredToCall. [lld/mac] Add some comments and asserts [Driver][NFC] Simplify handling of flags in Options.td Reland [GreedPatternRewriter] Preprocess constants while building worklist when not processing top down [lld-macho][nfc] Remove indirection when looking up common section members [AArch64] Insert subvector costs [OpenMP] Add dynamic memory function to omp.h and add documentation [OpenMP] Change target memory tests to use allocators [mlir][ods] Fix builder gen for VariadicRegion with inferred types [lldb] Add Python bindings to print stack traces on crashes. 
[clang] Use -triple, not -target for %clang_cc1 [clang] Fix macos build broken after D120989 [clang-tidy] Fix invalid fix-it for cppcoreguidelines-prefer-member-initializer [clang][extract-api][NFC] Use dedicated API to check for macro equality [tosa][mlir] Add dynamic width/height support for depthwise convolution in tosa-to-linalg InstCombineCalls: fix annotateAnyAllocCallSite to report changes [X86] Add PR35202 test case for commuted cmp merging [clang][NFC] Extract EmitAssemblyHelper::shouldEmitRegularLTOSummary [libc++][format] Use a helper constant. [clang][ExtractAPI] Fix declaration fragments for ObjC methods [CaptureTracking] Ignore ephemeral values when determining pointer escapeness [X86] Add PR19752 test case [AArch64] Update tests with the `update_llc_test_checks.py` script (NFC) [mlir][vector] Fold extract(broadcast) of same rank [clang][extract-api] Process only APIs declared in inputs MemoryBuiltins: only claim an allocator family on builtin functions BuildLibCalls: also set allocsize() attributes InstCombineCalls: when adding an align attribute, never reduce it MemoryBuiltins: also check function definition for allocalign InstCombineCalls: infer return alignment from allocalign attributes [crt][test] Fix dso_handle.cpp for Linux systems which default to PIE [x86] Replace getNodeIfExists to doesNodeExist when only check node exist [RISCV] Add more .vx patterns for VLMax integer setccs. [clang][ExtractAPI] Fix appendSpace in DeclarationFragments [RISCV] Add swapped patterns to VPatIntegerSetCCVL_VIPlus1. [SVE] Add more gather/scatter tests to highlight bugs in their generated code. [libc] Add a linux Thread class in __support/threads. [libcxx] Add flag to disable __builtin_assume in _LIBCPP_ASSERT AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5 [X86] Enable fast variable per-lane shuffle tuning on all Ryzen targets (PR44795) Remove a few effectively-unused FileEntry APIs. 
NFC [mlir] specify dialect names in doc generation [Sink] Don't sink non-willreturn calls (PR51188) [Sink] Add willreturn test [InstCombine] SimplifyDemandedUseBits - allow and(srem(X,Pow2),C) -> and(X,C) to work on vector types [MLIR][Presburger] refactor subtraction to be non-recursive [libc++] Add back-deployment testing on arm64 macs Add missing template keywords Revert "Reland "[Driver] Default CLANG_DEFAULT_PIE_ON_LINUX to ON""" [AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand [InstCombine] Regenerate and(srem(X,Pow2),C) test and add vector coverage remove dead code in parseRegisterList checking for ARM::RA_AUTH_CODE [InstCombine] SimplifyDemandedUseBits - add TODO to remove shl node if we only demand known sign bits of the shift source [InstCombine] SimplifyDemandedUseBits - remove lshr node if we only demand known sign bit [gn build] Port 1306b1025c50 [libc++][ranges] Implement ranges::count{, _if} [lld-macho][nfc] Factor out findSymbolAtOffset Fix MSVC "not all control paths return a value" warning [X86] Ensure ZN3Tuning inherits from ZN2Tuning instead of ZNTuning [X86] Add test case for PR44795 [clang] Verify internal entity module mangling Fix warnings when `-Wdeprecated-enum-enum-conversion` is enabled [gn build] (manually) port 3031fa88f01e [gn build] (manually) port 5390606aa963 [lld/mac] Don't emit stabs entries for functions folded during ICF [libc++] Remove redundant __invoke_constexpr functions [Clang] Remove redundant -no-opaque-pointers flag in test (NFC) [clang][DebugInfo] Support debug info for alias variable [LoongArch] Split asmstr to opcstr and opnstr in LAInst class definition. NFC [bugpoint] ReduceCrashingFunctions::TestFuncs - fix dereference of null point static analyzer warning Fix grammar and punctuation across several docs; NFC [AMDGPU] Regenerate xor3-i1-const.ll test(NFC) [RISCV] Fix crash for section alignment with .option norvc [DebugInfo] Use DW_ATE_signed encoding when creating a Fortran array index type. 
[clangd] NFC: Fix doc typos [Clang] Add -no-opaque-pointers to more tests (NFC) [CSKY] Support bitcast operation from/to double to/from two GPRs [X86] Add Issue #50412 fcmp-logic test case Fix "result of 32-bit shift implicitly converted to 64 bits" MSVC warning. NFC. [bazel] Port 3031fa88f0 [OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC) [libc++][ranges] Add implicit conversion to bool test for ranges::find{, if, if_not} [lldb] Fix building standalone LLDB on Windows. [MLIR] Standalone: Fix copy-and-paste typo (NFC) [lldb] [CMake] Disable GCC's -Wstringop-truncation warning. NFC. [clang][ASTImporter] Not using consumeError at failed import of in-class initializer. [clang-tidy] Silence unused variable warning in release builds. NFCI. [LoongArch] Improve td files indentation a little bit. NFC [RISCV] Select unmasked integer setcc insts via ISel post-process [bazel] Port 5390606aa963 Transforms: Remove unused include [VPlan] Use vector.body as header name in VPlan native path. [RISCV][VP] Add basic RVV codegen for vp.fcmp [lld] Remove support for legacy pass manager [OpaquePtr][Clang] Add CLANG_ENABLE_OPAQUE_POINTERS cmake option [mlir][CSE] Remove duplicated operations with MemRead side-effect [x86] Improve select lowering for smin(x, 0) & smax(x, 0) [LoopSink] Use MemorySSA with legacy pass manager [clang-tidy] bugprone-signal-handler: Message improvement and code refactoring. ... Signed-off-by: Edwiin Kusuma Jaya <[email protected]>
znver1/2 models were incorrectly modelling the FPU pipe (it should be pipe2 for shift-by-scalar-amount and pipe1 for shift-by-element-amount), and znver1 ymm variants also require double pumping. Now matches AMD SoG, Agner and instlatx64 numbers. Thanks to @Fabian-R for the report.
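A minimal Python sketch of the pipe assignment this commit describes. The regular expressions, the `classify_shift` helper and the one-uop-per-128-bit-half assumption are hypothetical illustrations, not LLVM code; the real model is written in TableGen (X86ScheduleZnver1.td / X86ScheduleZnver2.td).

```python
import re

# Hypothetical illustration of the fix above: vector shifts with a uniform
# (scalar) count go to FPU pipe 2, per-element variable shifts go to FPU pipe 1.
# On znver1 (128-bit datapath) a ymm op is double pumped into two uops.
# Assumption: exactly one uop per 128-bit half.

# Per-element variable shifts: vpsllvd/q, vpsrlvd/q, vpsravd.
PER_ELEMENT = re.compile(r"^vps(ll|rl|ra)v[dq]$")
# Shifts by a single scalar count: (v)psllw/d/q, (v)psrlw/d/q, (v)psraw/d.
BY_SCALAR = re.compile(r"^v?ps(ll|rl|ra)[wdq]$")


def classify_shift(mnemonic: str, ymm: bool, cpu: str = "znver1"):
    """Return (fpu_pipe, uop_count) for a vector shift mnemonic."""
    m = mnemonic.lower()
    if PER_ELEMENT.match(m):
        pipe = 1
    elif BY_SCALAR.match(m):
        pipe = 2
    else:
        raise ValueError(f"not a vector shift: {mnemonic}")
    # Only znver1 splits 256-bit operations into two 128-bit uops.
    uops = 2 if (ymm and cpu == "znver1") else 1
    return pipe, uops


print(classify_shift("vpsllvd", ymm=True))               # (1, 2)
print(classify_shift("vpslld", ymm=True, cpu="znver2"))  # (2, 1)
```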
znver1 ymm variants of VPMOVSX**/VPMOVZX** instructions require double pumping. Now matches AMD SoG, Agner and instlatx64 numbers. Thanks to @Fabian-R for the report.
znver1/2 models were missing the vtestps/pd overrides to match the vptest integer equivalents. Noticed while investigating Issue #54889.
znver1/2 models were incorrectly modelling the latency/throughput/uops, and znver1 ymm variants also require double pumping. Now matches what I can decipher from the AMD SoG, Agner and instlatx64 numbers vs the llvm-exegesis report provided by @Fabian-R.
Resolving as I think we've covered all the reported mismatches now.
We encountered several more inaccuracies in the znver1 scheduling model:
vpmov(s|z)x(b|w|q)
instructions that write to ymm registers are predicted by llvm-mca to run faster than they actually do, e.g. (numbers are inverse throughput):
It seems that the model uses the data for the xmm versions, which are faster according to uops.info.
AMD's table doesn't include these versions of the instructions.
vtestp(s|d)
instructions with ymm operands are predicted by llvm-mca to run faster than they actually do, e.g.:
For the xmm version, llvm-mca predicts the same value, whereas llvm-exegesis measures an inverse throughput of 1.0.
The AMD table claims a throughput of 2, i.e. an inverse throughput of 0.5, which agrees with neither of those.
uops.info agrees with the llvm-exegesis measurements.
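AMD's table lists throughput (instructions per cycle), while llvm-mca and llvm-exegesis report inverse throughput (cycles per instruction), so comparing them takes a reciprocal. A small Python sketch of that arithmetic using the numbers quoted above; the helper name is made up for illustration.

```python
def inverse_throughput(instructions_per_cycle: float) -> float:
    """Convert a throughput (instructions/cycle) into cycles/instruction."""
    return 1.0 / instructions_per_cycle


amd_table = inverse_throughput(2)  # AMD claims a throughput of 2 per cycle
measured = 1.0                     # llvm-exegesis inverse throughput (xmm form)

print(amd_table)             # 0.5
print(measured / amd_table)  # 2.0, i.e. the measurement is twice as slow as AMD's figure
```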
vps(llvd|llvq|ravd|ravq|rlvd|rlvq)
with three register operands, or two register operands and a memory operand, are predicted by llvm-mca to run faster than they do, e.g.:
and
The AMD table does not mention these instructions; the uops.info measurements agree with llvm-exegesis on the throughput, but not on the port usage.
(V)CMPcc(SS|PS|PD|SD)
have the wrong (inverse) throughput/resource usage and latency.
For throughput, e.g.:
and for latency:
AMD's table reports them, consistently with llvm-exegesis, as having a latency of 1 and a throughput of 2, since they use only one FPU0/1 uop. That applies to the xmm version; the ymm version has a throughput of 1, with two such uops.
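The AMD numbers are consistent with a simple estimate, assumed here rather than taken from llvm-mca's actual algorithm: if each uop can issue on any of k pipes and each pipe accepts one uop per cycle, the inverse throughput is about uops / k. Applied to VCMPcc on znver1 (one FPU0/1 uop for xmm, two for the double-pumped ymm form), as a Python sketch with illustrative names; latency is a separate property and is not modelled here.

```python
from fractions import Fraction


def inverse_throughput(uops: int, eligible_pipes: int) -> Fraction:
    """Estimate cycles/instruction, assuming each uop may issue on any of the
    eligible pipes and each pipe accepts one uop per cycle (fully pipelined)."""
    return Fraction(uops, eligible_pipes)


FPU01 = 2  # VCMPcc uops can go to either FPU0 or FPU1

xmm = inverse_throughput(uops=1, eligible_pipes=FPU01)  # one 128-bit uop
ymm = inverse_throughput(uops=2, eligible_pipes=FPU01)  # double pumped on znver1

print(float(xmm), float(1 / xmm))  # 0.5 2.0 (throughput 2, as in AMD's table)
print(float(ymm), float(1 / ymm))  # 1.0 1.0 (throughput 1)
```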
Sorry for the long issue; sadly, there seems to be a lot to find.
Please do tell me if I should separate these into multiple issues!