[DISCUSS] Support kDLBool type #75
bool is encoded as `(kDLUInt, bits=1)`.
As I understand, the size of a bool element is usually one byte, so representing it with `bits=1` seems problematic.
Thanks @alonre24. On compiler projects like LLVM, bool is typically modeled as `i1`, a 1-bit integer.
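For concreteness, here is a minimal sketch (assuming the standard `dlpack/dlpack.h` header and its `DLDataType` layout) of the two encodings under discussion, the 1-bit view and the one-byte view; note that neither carries any marker saying the payload is semantically boolean:

```c
#include <stdio.h>
#include <dlpack/dlpack.h>  /* assumed include path */

int main(void) {
  /* The two ways a boolean tensor can be described today, before any kDLBool
   * code exists; neither marks the payload as semantically boolean. */
  DLDataType bool_as_bit  = {kDLUInt, 1, 1};  /* bits=1, like LLVM's i1 view      */
  DLDataType bool_as_byte = {kDLUInt, 8, 1};  /* bits=8, the usual 1-byte storage */
  printf("bit view:  code=%d bits=%d lanes=%d\n",
         bool_as_bit.code, bool_as_bit.bits, bool_as_bit.lanes);
  printf("byte view: code=%d bits=%d lanes=%d\n",
         bool_as_byte.code, bool_as_byte.bits, bool_as_byte.lanes);
  return 0;
}
```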
Thank you @tqchen, there seem to be different ways in which the different frameworks decide to implement bool. It is not guaranteed to be the same size everywhere. I'm of the opinion that having a separate `kDLBool` type would be the clearest option.
Indeed, this issue would happen in MPI programs too.
What I originally meant was to specify the sub-byte storage layout, which means `bits=1`. When the bool is indeed stored sub-byte in a bitmask setting, it then becomes the natural description of that layout.
I just realized that in CuPy we do not support exchanging bool arrays because of the lack of `kDLBool`. @tqchen I think this is a nontrivial interpretation that not everybody expects (nor was it documented). For example, the ongoing NumPy PR did not support bool either (numpy/numpy#19083). AFAIK only PyTorch supports it, but it's unclear to me where they learned this. I am +1 for adding `kDLBool`.
My view is that adding a new type for bool seems best (and clearest). Does PyTorch even really support it, or does it just export bool as a uint8 (losing the boolean semantics in the process)?
Thanks everyone for the great discussion so far. Trying to summarize the thread, two options (A0 and A1) have emerged: roughly, adding a dedicated `kDLBool` type code versus continuing to reuse the existing integer codes (with `bits`/`lanes` describing the storage).
I think of this as storing exactly 8-tuples of bool, under the specification? So a single "element" would really be a packed group of eight booleans.
Also, someone might get the idea to store booleans bit-packed in this way.

It seems to me that if there is a possibility of bit-sized stride support (e.g. in the form of bit-arrays), creating the explicit bool dtype is a lot cleaner. If that will never happen, I still think it is nicer, but I doubt there are serious practical issues with using a plain unsigned integer representation.

I had the impression that the standard already explicitly supports bit-strides, since the header includes the line (in a comment): `size *= (t->dtype.bits * t->dtype.lanes + 7) / 8;` although I guess there is a fair chance that nobody would do that, and you can consider it undefined right now.
Yes, but I don't see how this is a reason for or against A0? The storage needs to be clearly defined as 1 or 0 (or possibly as "anything non-zero is True") in either case.
Thanks @seberg. To follow up on the conversation about the bitmasking: the particular comment of `size *= (t->dtype.bits * t->dtype.lanes + 7) / 8;` rounds the per-element size up to whole bytes, so sub-byte types are padded rather than bit-packed across elements.
To follow up on the multi-dimensional bit-array: we would indeed require some form of logical-to-physical mapping in this case. Explicitly specifying a vector type (e.g. `bits=1, lanes=8`) makes the packing explicit. In the case of A1, that logical-to-physical mapping is something the spec would have to pin down.
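As an illustration of the vector-type idea (a sketch only; the helper and its LSB-first bit order below are assumptions, since the current spec defines no logical-to-physical mapping for packed booleans):

```c
#include <stdint.h>
#include <dlpack/dlpack.h>  /* assumed include path */

/* Sketch: one "element" of this dtype would be eight booleans packed in a byte. */
static const DLDataType kPackedBool8 = {kDLUInt, 1, 8};  /* bits=1, lanes=8 */

/* Hypothetical helper (not part of DLPack): read logical boolean i from a
 * bit-packed buffer. The LSB-first bit order is a choice made here; the
 * current spec does not define it. */
static int get_packed_bool(const uint8_t *data, int64_t i) {
  return (data[i / 8] >> (i % 8)) & 1;
}
```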
Yeah, you are right, I managed to mis-parse the code as rounding up the size rather than the itemsize :(. Although I guess many here likely didn't realize that sub-byte datatypes are standardized to be stored padded to full bytes :).
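For reference, here is a paraphrased sketch of the size example from the header comment (not a verbatim copy); the rounding happens on the per-element `bits * lanes`, which is why a `bits=1, lanes=1` element still occupies a whole byte:

```c
#include <stdint.h>
#include <dlpack/dlpack.h>  /* assumed include path */

/* Paraphrased from the size example in the dlpack.h comment: total tensor
 * size in bytes for a compact (non-strided) layout. */
static uint64_t get_data_size(const DLTensor *t) {
  uint64_t size = 1;
  for (int i = 0; i < t->ndim; ++i) {
    size *= (uint64_t)t->shape[i];
  }
  /* Per-element bits are rounded up to whole bytes here, so bits=1, lanes=1
   * still yields one byte per element rather than a bit-packed array. */
  size *= (t->dtype.bits * t->dtype.lanes + 7) / 8;
  return size;
}
```

Under this reading, packing can only ever happen within a single element via `lanes`, never across elements.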
You are right. Do not use sub-byte `bits` expecting bit-packed storage; under the current spec each element is padded to full bytes.
(For what it's worth, NumPy can't always guarantee that a bool byte is exactly 0 or 1.)

What about sub-byte strides? I do agree that there is an inconsistency if you would use `bits=1` while each element actually occupies a full byte.

I have heard the bit-mask requests for NumPy often enough to think that it may be good to anticipate it (not necessarily act on that right now, though). The future might for example see a flag (e.g. on the dtype field) to indicate true bit-sized storage (non-byte-padded). That flag would allow both a boolean type and the existing integer types to be stored truly bit-packed.
@tqchen The discussion seems to have reached an impasse at the question of whether to add a dedicated `kDLBool` type or keep reusing the integer types.
I would vote for an explicit `kDLBool` type.
@tqchen @rgommers @seberg @kmaehashi @tirthasheshpatel @jakirkham @kkraus14 I'd like us to revisit this topic, and specifically I'd like to request adding `kDLBool`.
Thanks for rekindling this topic. After thinking a bit more about it, I also now agree that having `kDLBool` is going to be helpful. I would assume the current state would call for something like `(code = kDLBool, bits = 8, lanes = 1)` to represent the current bool type in NumPy/PyTorch?
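A minimal sketch of what that descriptor would look like, assuming `kDLBool` is added to `DLDataTypeCode` (names and values here follow the proposal in this thread, not a released header):

```c
#include <dlpack/dlpack.h>  /* assumes a header revision that defines kDLBool */

/* Sketch: how a NumPy bool_ / PyTorch torch.bool tensor's dtype would be
 * described under this proposal: one byte of storage per logical boolean. */
static const DLDataType kFrameworkBool = {kDLBool, 8, 1};  /* code=kDLBool, bits=8, lanes=1 */
```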
Sounds good. It might be nice to review whether anyone expects True to be only the lowest bit set, rather than anything non-zero.
Seems there is general consensus on this. We can consider a PR to add such a spec; perhaps @leofang can send a PR? I think it makes sense to require True to be represented with only the lowest bit set (i.e. exactly 1).
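As one concrete reading of that requirement (a sketch; `is_valid_bool_buffer` is a hypothetical helper, not part of DLPack), a consumer could reject any byte other than 0 or 1:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of DLPack): enforce the proposed rule that
 * False is exactly 0 and True is exactly 1; any other bit pattern in a
 * kDLBool buffer would be invalid under this reading of the spec. */
static int is_valid_bool_buffer(const uint8_t *data, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    if (data[i] > 1) {
      return 0;  /* byte is neither 0 nor 1 */
    }
  }
  return 1;
}
```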
Shoooot, my apologies @tqchen, I completely missed the traffic here for some reason. Will do later today!
To represent tensors of type bool: is it possible to add another type (`kDLBool`) to the `DLDataTypeCode` enum? That could be very useful, as some well-known machine learning platforms (TF, torch, Onnxruntime) support models whose inputs/outputs are tensors of type bool. Thanks!
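For completeness, a sketch of what the requested addition to the enum might look like; the existing entries are abridged and the numeric value given to `kDLBool` is illustrative, not authoritative:

```c
/* Abridged sketch of DLDataTypeCode with the requested addition; the value
 * assigned to kDLBool is illustrative only. */
typedef enum {
  kDLInt = 0U,    /* signed integers     */
  kDLUInt = 1U,   /* unsigned integers   */
  kDLFloat = 2U,  /* IEEE floating point */
  /* ... existing codes elided ... */
  kDLBool = 6U    /* proposed: boolean, one byte of storage per element */
} DLDataTypeCode;
```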