fastmath fminnum.f16 could be better on non avx512_fp16 x86 #61271

Open
gbaraldi opened this issue Mar 8, 2023 · 1 comment

gbaraldi commented Mar 8, 2023

This was found in JuliaLang/julia#48848, where LLVM started folding the compare-and-select in this function into a fast fminnum.f16, and that led to a regression.

original IR

; ModuleID = 'min_fast'
source_filename = "min_fast"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

;  @ fastmath.jl:244 within `min_fast`
define half @julia_min_fast_681(half %0, half %1) #0 {
top:
; ┌ @ essentials.jl:575 within `ifelse`
   %2 = fpext half %0 to float
   %3 = fpext half %1 to float
   %4 = fcmp fast olt  float %2, %3
   %5 = select i1 %4, half %0, half %1
; └
  ret half %5
}

attributes #0 = { "frame-pointer"="all" "probe-stack"="inline-asm" }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}

new IR

; ModuleID = 'min_fast'
source_filename = "min_fast"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

;  @ fastmath.jl:252 within `min_fast`
define half @julia_min_fast_678(half %0, half %1) #0 {
top:
; ┌ @ essentials.jl:586 within `ifelse`
   %2 = call fast half @llvm.minnum.f16(half %0, half %1)
; └
  ret half %2
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare half @llvm.minnum.f16(half, half) #2

attributes #0 = { "frame-pointer"="all" "probe-stack"="inline-asm" }
attributes #2 = { nofree nosync nounwind readnone speculatable willreturn }
!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}

Old assembly

	.text
	.file	"min_fast"
	.globl	julia_min_fast_681              # -- Begin function julia_min_fast_681
	.p2align	4, 0x90
	.type	julia_min_fast_681,@function
julia_min_fast_681:                     # @julia_min_fast_681
	.cfi_startproc
# %bb.0:                                # %top
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	vpextrw	$0, %xmm1, %eax
	vpextrw	$0, %xmm0, %ecx
	movzwl	%cx, %ecx
	vmovd	%ecx, %xmm0
	vcvtph2ps	%xmm0, %xmm0
	movzwl	%ax, %eax
	vmovd	%eax, %xmm1
	vcvtph2ps	%xmm1, %xmm1
	vucomiss	%xmm1, %xmm0
	cmovbl	%ecx, %eax
	vpinsrw	$0, %eax, %xmm0, %xmm0
	popq	%rbp
	.cfi_def_cfa %rsp, 8
	retq
.Lfunc_end0:
	.size	julia_min_fast_681, .Lfunc_end0-julia_min_fast_681
	.cfi_endproc
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

New assembly

	.text
	.file	"min_fast"
	.globl	julia_min_fast_678              # -- Begin function julia_min_fast_678
	.p2align	4, 0x90
	.type	julia_min_fast_678,@function
julia_min_fast_678:                     # @julia_min_fast_678
	.cfi_startproc
# %bb.0:                                # %top
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	vpextrw	$0, %xmm0, %eax
	vpextrw	$0, %xmm1, %ecx
	movzwl	%cx, %ecx
	vmovd	%ecx, %xmm0
	vcvtph2ps	%xmm0, %xmm0
	movzwl	%ax, %eax
	vmovd	%eax, %xmm1
	vcvtph2ps	%xmm1, %xmm1
	vcmpltss	%xmm0, %xmm1, %xmm2
	vblendvps	%xmm2, %xmm1, %xmm0, %xmm0
	vcvtps2ph	$4, %xmm0, %xmm0
	vmovd	%xmm0, %eax
	vpinsrw	$0, %eax, %xmm0, %xmm0
	popq	%rbp
	.cfi_def_cfa %rsp, 8
	retq
.Lfunc_end0:
	.size	julia_min_fast_678, .Lfunc_end0-julia_min_fast_678
	.cfi_endproc
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

Both assembly listings were generated with llc-15 -mcpu alderlake
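
To illustrate why the extra vcvtps2ph in the new assembly looks avoidable, here is a minimal Python sketch. It assumes finite, non-NaN inputs (which the fast flags permit), and the helper names (half_bits, bits_to_float, min_select, min_widen_round) are hypothetical, not anything from LLVM or Julia. It compares the old shape (compare the widened values, select the original half bits) against the new shape (pick the smaller widened value, then round back to half) and checks that they agree bit for bit on a sample of inputs.

```python
import struct

def half_bits(x: float) -> int:
    """Round a Python float to IEEE half and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_to_float(b: int) -> float:
    """Widen a 16-bit half pattern to a Python float (always exact)."""
    return struct.unpack("<e", struct.pack("<H", b))[0]

def min_select(a_bits: int, b_bits: int) -> int:
    """Old IR shape: compare widened values, select the original half bits."""
    return a_bits if bits_to_float(a_bits) < bits_to_float(b_bits) else b_bits

def min_widen_round(a_bits: int, b_bits: int) -> int:
    """New codegen shape: select on widened values, then round back to half."""
    fa, fb = bits_to_float(a_bits), bits_to_float(b_bits)
    smaller = fa if fa < fb else fb
    return half_bits(smaller)

def is_finite_half(b: int) -> bool:
    """True unless the exponent field is all ones (inf or NaN)."""
    return (b >> 10) & 0x1F != 0x1F

# A strided sample of finite half bit patterns, both signs included.
SAMPLE = [b for b in range(0, 0x10000, 257) if is_finite_half(b)]

if __name__ == "__main__":
    same = all(min_select(a, b) == min_widen_round(a, b)
               for a in SAMPLE for b in SAMPLE)
    print("select and widen/round agree on the sample:", same)
```

Since widening a half to float is exact, rounding the selected value back just reproduces one of the input bit patterns, so on these inputs the conversion back to half does no useful work.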

llvmbot commented Mar 8, 2023

@llvm/issue-subscribers-backend-x86

gbaraldi changed the title from "fastmath fminnum.f16 could be better on emulated x86" to "fastmath fminnum.f16 could be better on non avx512_fp16 x86" on Mar 8, 2023
phoebewang added a commit that referenced this issue Mar 13, 2023
eymay pushed a commit to eymay/llvm-project that referenced this issue Mar 21, 2023