-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: extend to_layout
options
#2763
Conversation
b17a58d
to
bba3276
Compare
2b2b5ae
to
3ce35d4
Compare
Codecov Report
Additional details and impacted files
|
Annoyingly, I've run into the problem of specifying scalar-promotion rules again. We kicked the bucket down the road somewhat, but I think it's time to robustly solve this. At this stage, I think we can just replicate NumPy's rules for Python scalars, because Python scalars are expected to be non-kernel outputs (i.e. static). However, if we have places where this doesn't hold, we might need to think about implementing NEP-50 (https://numpy.org/neps/nep-0050-scalar-promotion.html#motivation-and-scope) which proposes to remove the value-dependence that we're worried about in Awkward |
9fec71e
to
2cf737b
Compare
This might be needed for some commonsense broadcasting cases. For instance, some_array["new_field"] = "a single string" may need to be presented as |
6bdd5eb
to
75c70dc
Compare
@jpivarski the failing test in 3.12 on windows is due to the change in scalar promotion; Awkward promotes scalars to the |
We should be interpreting Python If the scalars are really untyped (i.e. Python |
Fine! Although, now that's also a list that includes Emscripten whose default is 32 bit.
Yes, the |
I ran into JavaScript being limited to 32-bit in the Kaitai web tool. That was quite a surprise. Okay, there are two exceptions: NumPy in Windows and JavaScript/Emscripten. |
75c70dc
to
5b8ed8c
Compare
I just changed This allows finer control of strings, and brings some symmetry to the API |
f74cb28
to
271e679
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More granularity in the options is good.
Although a Python string is now by default turned into an array of characters, which is not good for broadcasting, the functions that use it set the argument to turn the Python string into a length-1 array of strings, so that nothing breaks.
We talked about how Python numbers are not always promoted to int64
and float64
, but are promoted to whatever they need to be to not change the array they're combined with, which is what NumPy does, and since NumPy does it, we'll do it, too.
This allow_other
has changed its meaning, so it should also change its name: it will become allow_unknown
(Angus and I are on Zoom now). Then, if anybody is using allow_other
, they'll be alerted to the change by a TypeError. When we scanned for usage in the Awkward-dependent repos, none of them are using ak.to_layout
, so we don't need a deprecation cycle.
So one change is needed before merging: rename allow_other
→ allow_unknown
. Other than that, it's done.
…cikit-hep/awkward into agoose77/feat-extend-to-layout-options
At present,
to_layout
eagerly converts anything it can recognise into an array. Yet, there are several places where we want to be stricter about what is supported and to be able to identify scalar inputs.This PR extends the configurability of
to_layout
, to permit finer control of scalar input handling. It adds new parameters toak.to_layout:
allow_none
— whetherNone
values are permitted as inputsuse_from_iter
— whether iterable values are permitted as inputsprimitive_policy
— the policy governing handling of scalars, e.g. promotion, pass-through, errorstring_policy
— the policy governing handling of strings, e.g. promotion, pass-through, as-characters, errorI also removed the
__cast__
default that castsstr
to length-1 arrays of strings.