Fix bfloat16 detection on aarch64 #55417

Closed

Conversation

@giordano (Contributor) commented Aug 8, 2024

As discussed offline with @gbaraldi, I adapted the test used in the have_fp16 function. This partially fixes #54389 for me, in the sense that have_bf16 now returns true on A64FX and Nvidia Grace (neoverse-v2), but BFloat16 code generation still doesn't look right to me; on both systems I get:

julia> using BFloat16s

julia> code_llvm(+, NTuple{2,BFloat16}; debuginfo=:none)
; Function Signature: +(Core.BFloat16, Core.BFloat16)
define bfloat @"julia_+_1885"(bfloat %"x::BFloat16", bfloat %"y::BFloat16") #0 {
top:
  %0 = insertelement <2 x bfloat> poison, bfloat %"x::BFloat16", i64 0
  %1 = insertelement <2 x bfloat> %0, bfloat %"y::BFloat16", i64 1
  %2 = bitcast <2 x bfloat> %1 to <2 x i16>
  %3 = zext <2 x i16> %2 to <2 x i32>
  %4 = shl nuw <2 x i32> %3, <i32 16, i32 16>
  %5 = bitcast <2 x i32> %4 to <2 x float>
  %shift = shufflevector <2 x float> %5, <2 x float> poison, <2 x i32> <i32 1, i32 poison>
  %6 = fadd <2 x float> %shift, %5
  %7 = extractelement <2 x float> %6, i64 0
  %8 = fcmp ord float %7, 0.000000e+00
  br i1 %8, label %L13, label %L30

L13:                                              ; preds = %top
  %bitcast_coercion11 = bitcast float %7 to i32
  %9 = lshr i32 %bitcast_coercion11, 16
  %10 = and i32 %9, 1
  %narrow = add nuw nsw i32 %10, 32767
  %11 = zext nneg i32 %narrow to i64
  %12 = zext i32 %bitcast_coercion11 to i64
  %13 = add nuw nsw i64 %11, %12
  %14 = lshr i64 %13, 16
  %15 = trunc i64 %14 to i16
  %bitcast_coercion20 = bitcast i16 %15 to bfloat
  br label %L30

L30:                                              ; preds = %L13, %top
  %value_phi = phi bfloat [ %bitcast_coercion20, %L13 ], [ 0xR7FC0, %top ]
  ret bfloat %value_phi
}

This still demotes bfloat to float, which isn't much different from #54025 (comment).
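For reference, the `L13` block in the IR above is a float32-to-bfloat16 truncation with round-to-nearest-even: add 0x7FFF plus the kept low bit, then shift right by 16. A minimal Python sketch of the same bit trick (helper names are illustrative, not Julia internals):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 bits with round-to-nearest-even,
    mirroring the add-then-shift trick in the IR's L13 block."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    lsb = (bits >> 16) & 1          # lowest bit that survives truncation
    rounded = bits + 0x7FFF + lsb   # bias so that exact ties round to even
    return (rounded >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen bfloat16 bits back to float32 (the zext + shl in the IR)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

The `fcmp ord` branch in the IR exists because this add trick doesn't preserve NaN payloads, so the unordered path returns the canonical quiet NaN 0xR7FC0 instead.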

@giordano giordano added the system:arm ARMv7 and AArch64 label Aug 8, 2024
@giordano giordano requested review from maleadt and gbaraldi August 8, 2024 13:39
@giordano giordano added the compiler:llvm For issues that relate to LLVM label Aug 8, 2024
@maleadt (Member) commented Aug 8, 2024

If I'm interpreting llvm/llvm-project#97975 (comment) correctly, LLVM's problematic excess-precision handling might not be in use for bfloat. If we can verify that, maybe we can get rid of both the BFloat16 demote pass and have_bf16.

The other condition is that all our targets that do not support bfloat (i.e., most of them) properly legalize bfloat instructions to something that they do support. IIRC I encountered some instruction selection failures when first working on this, but those might have been fixed now that we're on a more recent version of LLVM.

@giordano (Contributor, Author) commented Aug 8, 2024

Removing the entire bf16 demotion business would probably be the better option. @gbaraldi pointed out that the bf16 extension on aarch64 doesn't support much, probably only conversions to other float types and dot-product operations: a `fadd bfloat` in LLVM IR is turned into an fadd of floats in native code anyway (https://godbolt.org/z/93YaG19oc). So while this PR correctly detects the bf16 feature on aarch64, what that feature provides isn't quite enough for our demotion pass.

Probably relevant: https://community.arm.com/arm-community-blogs/b/ai-and-ml-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
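Since this PR is about detecting the bf16 feature on aarch64: on Linux the kernel advertises the extension through the auxiliary vector, as `HWCAP2_BF16` in `getauxval(AT_HWCAP2)`. A hedged userspace sketch of such a check (the function name `cpu_has_bf16` is hypothetical and not Julia's actual have_bf16, which tests codegen rather than hwcaps):

```python
import ctypes
import platform

AT_HWCAP2 = 26          # from <sys/auxv.h>
HWCAP2_BF16 = 1 << 14   # from <asm/hwcap.h>, aarch64-specific bit

def cpu_has_bf16() -> bool:
    """Return True if the Linux kernel reports the aarch64 BF16 extension.

    Conservatively returns False on any other OS/architecture, where the
    HWCAP2 bit assignments differ or getauxval is unavailable.
    """
    if platform.system() != "Linux" or platform.machine() != "aarch64":
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    libc.getauxval.restype = ctypes.c_ulong
    libc.getauxval.argtypes = [ctypes.c_ulong]
    return bool(libc.getauxval(AT_HWCAP2) & HWCAP2_BF16)
```

As the comment above notes, though, the hardware bit being set only guarantees conversions and dot-product instructions, not full bfloat arithmetic.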

@giordano (Contributor, Author) commented
Closing in favour of #55479

Successfully merging this pull request may close these issues.

have_bf16 always returns false on Nvidia Grace CPU (neoverse-v2) on Linux