Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FindIntrinsics creates incorrect IR for casts from bool to arithmetic type #6601

Closed
shoaibkamil opened this issue Feb 1, 2022 · 5 comments · Fixed by #6622
Closed

FindIntrinsics creates incorrect IR for casts from bool to arithmetic type #6601

shoaibkamil opened this issue Feb 1, 2022 · 5 comments · Fixed by #6622

Comments

@shoaibkamil
Copy link
Contributor

Here's a small repro:

#include "Halide.h"

using namespace Halide;

class MyGen : public Generator<MyGen> {
public:
  Input<Buffer<float, 1>> in{"input"};
  Output<Buffer<float, 1>> out{"output"};
  Var x, tx;

  void generate() {
    out(x) = cast<float>(in(x) >= 0) - cast<float>(in(x) < 0);
    out.gpu_tile(x, tx, 16)
      .vectorize(tx, 4);
  }

};

HALIDE_REGISTER_GENERATOR(MyGen, mygen)

Building for a GPU target results in:

Unhandled exception: Error: Can't represent an integer with this many bits in Metal C: int4x4

Drilling down a bit, what I'm seeing is that the IR after FindIntrinsics includes expressions like:

output[ramp(((.__thread_id_x*4) + output.extent.0) + -16, 1, 4)] = 
   float32x4((int4x4)widening_sub(int2x4((x4(0.000000f) <= t19)), int2x4((t19 < x4(0.000000f)))))

@rootjalex
Copy link
Member

rootjalex commented Feb 2, 2022

This issue is coming from here: https://github.com/halide/Halide/blob/1e7040c9ab44c31b4754aa0cdd3ca0555c8f8142/src/FindIntrinsics.cpp#L217-#L218

Which allows for this rewrite:

cast<float>(a)  - cast<float>(b) -> cast<float>(widening_sub(a, b))

As long as a and b can be losslessly cast to uints with < 32 bits. (likewise for < 64 bits if it was a cast<double>). Is this a valid rewrite that just shouldn't be applied to booleans, or should it not be a valid rewrite at all? I'm unsure of the semantics here (maybe @dsharlet or @abadams know?).

The same problem also appears exists for add, but I imagine this is rather difficult to trigger: https://github.com/halide/Halide/blob/1e7040c9ab44c31b4754aa0cdd3ca0555c8f8142/src/FindIntrinsics.cpp#L187-#L188

@pranavb-ca
Copy link
Contributor

Minor nit: I think it allows for the rewrite

cast<float>(a)  - cast<float>(b) -> (widening_sub(a, b)

FWIW, this used to be in the Hexagon backend in Halide for the widening_add & sub operations which Dillon pulled into FindIntrinsics. We didn't however apply this to booleans. May be the check for types needs to be more restrictive.

@rootjalex
Copy link
Member

@pranavb-ca I'm not sure what the difference is, could you elaborate? The cast back to float comes from here:

result = Cast::make(op->type, result);

I agree that this should probably be more strict on types, but I'm also wondering if this rewrite should be valid for, e.g.:

cast<float>((uint8)a)  - cast<float>((uint8)b) -> cast<float>((uint16)widening_sub((uint8)a, (uint8)b))

@pranavb-ca
Copy link
Contributor

@pranavb-ca I'm not sure what the difference is, could you elaborate? The cast back to float comes from here:

result = Cast::make(op->type, result);

I agree that this should probably be more strict on types, but I'm also wondering if this rewrite should be valid for, e.g.:

cast<float>((uint8)a)  - cast<float>((uint8)b) -> cast<float>((uint16)widening_sub((uint8)a, (uint8)b))

I apologize, you are right @rootjalex. Since in the Hexagon backend we used to restrict it to the same type (different width) I assumed that was the case here as well.

@rootjalex
Copy link
Member

Oh no worries @pranavb-ca ! Thanks for the clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants