feat: extend `to_layout` options #2763

agoose77 · 2023-10-20T17:01:00Z

At present, to_layout eagerly converts anything it can recognise into an array. Yet, there are several places where we want to be stricter about what is supported and to be able to identify scalar inputs.

This PR extends the configurability of to_layout, to permit finer control of scalar input handling. It adds new parameters to ak.to_layout:

allow_none — whether None values are permitted as inputs
use_from_iter — whether iterable values are permitted as inputs
primitive_policy — the policy governing handling of scalars, e.g. promotion, pass-through, error
string_policy — the policy governing handling of strings, e.g. promotion, pass-through, as-characters, error
I also removed the __cast__ default that casts str to length-1 arrays of strings.

codecov · 2023-10-22T18:22:38Z

Codecov Report

Merging #2763 (175c52f) into main (b333fe4) will decrease coverage by 0.04%.
Report is 2 commits behind head on main.
The diff coverage is 91.66%.

Additional details and impacted files

Files	Coverage Δ
src/awkward/_connect/numba/arrayview.py	`93.37% <100.00%> (ø)`
src/awkward/_connect/numexpr.py	`90.66% <ø> (ø)`
src/awkward/_nplikes/typetracer.py	`76.58% <100.00%> (+0.12%)`	⬆️
src/awkward/_slicing.py	`86.86% <100.00%> (+0.26%)`	⬆️
src/awkward/contents/content.py	`76.25% <100.00%> (+0.09%)`	⬆️
src/awkward/highlevel.py	`76.53% <ø> (ø)`
src/awkward/operations/ak_all.py	`96.00% <100.00%> (ø)`
src/awkward/operations/ak_any.py	`96.00% <100.00%> (ø)`
src/awkward/operations/ak_argcartesian.py	`88.57% <ø> (ø)`
src/awkward/operations/ak_argcombinations.py	`86.95% <ø> (ø)`
... and 67 more

agoose77 · 2023-10-22T22:26:00Z

Annoyingly, I've run into the problem of specifying scalar-promotion rules again. We kicked the bucket down the road somewhat, but I think it's time to robustly solve this. At this stage, I think we can just replicate NumPy's rules for Python scalars, because Python scalars are expected to be non-kernel outputs (i.e. static).

However, if we have places where this doesn't hold, we might need to think about implementing NEP-50 (https://numpy.org/neps/nep-0050-scalar-promotion.html#motivation-and-scope) which proposes to remove the value-dependence that we're worried about in Awkward

src/awkward/_nplikes/typetracer.py

jpivarski · 2023-10-23T22:22:09Z

I also removed the __cast__ default that casts str to length-1 arrays of strings.

This might be needed for some commonsense broadcasting cases. For instance,

some_array["new_field"] = "a single string"

may need to be presented as ak.Array(["a single string"]) so that broadcasting sees this as one atom to duplicate len(some_array) times.

agoose77 · 2023-10-24T00:38:03Z

@jpivarski the failing test in 3.12 on windows is due to the change in scalar promotion; Awkward promotes scalars to the ArrayBuilder default type (int64) rather than the NumPy default for the platform. I don't think that is necessarily a problem, though I wonder whether ArrayBuilder should default to the NumPy default dtype for the current platform rather than hard-coding int64. Any strong feelings here? The real question, though, is are you OK with changing this behavior such that untyped scalars go via ArrayBuilder rather than using asarray?

jpivarski · 2023-10-24T15:25:57Z

We should be interpreting Python int as int64. That's a widespread convention; only NumPy on Windows doesn't do it and I think that's a mistake.

If the scalars are really untyped (i.e. Python int or float), they can go via ArrayBuilder, but NumPy-typed scalars (i.e. np.int32 and np.float64) would lose their types if they went through ArrayBuilder. ArrayBuilder takes any integer as int64 and any floating point number as float64; it just doesn't have other input types compiled into it (which would dramatically increase its surface area, since its number of methods has to be equal to the number of classes, so handling $n$ input types means $n^2$ method-class pairs).

agoose77 · 2023-10-24T15:33:47Z

We should be interpreting Python int as int64. That's a widespread convention; only NumPy on Windows doesn't do it and I think that's a mistake.

Fine! Although, now that's also a list that includes Emscripten whose default is 32 bit.

If the scalars are really untyped (i.e. Python int or float), they can go via ArrayBuilder, but NumPy-typed scalars (i.e. np.int32 and np.float64) would lose their types if they went through ArrayBuilder.

Yes, the np.generic scalars are handled via np.asarray :)

jpivarski · 2023-10-24T15:37:59Z

Emscripten whose default is 32 bit

I ran into JavaScript being limited to 32-bit in the Kaitai web tool. That was quite a surprise.

Okay, there are two exceptions: NumPy in Windows and JavaScript/Emscripten.

agoose77 · 2023-10-24T17:17:23Z

I just changed strings_as_characters to a new literal string_policy, and renamed scalar_policy to primitive_policy.

This allows finer control of strings, and brings some symmetry to the API

jpivarski

More granularity in the options is good.

Although a Python string is now by default turned into an array of characters, which is not good for broadcasting, the functions that use it set the argument to turn the Python string into a length-1 array of strings, so that nothing breaks.

We talked about how Python numbers are not always promoted to int64 and float64, but are promoted to whatever they need to be to not change the array they're combined with, which is what NumPy does, and since NumPy does it, we'll do it, too.

This allow_other has changed its meaning, so it should also change its name: it will become allow_unknown (Angus and I are on Zoom now). Then, if anybody is using allow_other, they'll be alerted to the change by a TypeError. When we scanned for usage in the Awkward-dependent repos, none of them are using ak.to_layout, so we don't need a deprecation cycle.

So one change is needed before merging: rename allow_other → allow_unknown. Other than that, it's done.

…cikit-hep/awkward into agoose77/feat-extend-to-layout-options

agoose77 temporarily deployed to docs October 20, 2023 17:11 — with GitHub Actions Inactive

agoose77 force-pushed the agoose77/feat-extend-to-layout-options branch from b17a58d to bba3276 Compare October 22, 2023 17:58

agoose77 temporarily deployed to docs October 22, 2023 18:08 — with GitHub Actions Inactive

agoose77 force-pushed the agoose77/feat-extend-to-layout-options branch from 2b2b5ae to 3ce35d4 Compare October 22, 2023 18:15

agoose77 temporarily deployed to docs October 22, 2023 18:26 — with GitHub Actions Inactive

agoose77 temporarily deployed to docs October 22, 2023 19:20 — with GitHub Actions Inactive

agoose77 mentioned this pull request Oct 22, 2023

feat: add .attrs to highlevel objects #2757

Merged

agoose77 force-pushed the agoose77/feat-extend-to-layout-options branch from 9fec71e to 2cf737b Compare October 23, 2023 15:03

agoose77 temporarily deployed to docs October 23, 2023 15:20 — with GitHub Actions Inactive

agoose77 commented Oct 23, 2023

View reviewed changes

src/awkward/_nplikes/typetracer.py Show resolved Hide resolved

agoose77 requested a review from jpivarski October 23, 2023 17:29

agoose77 marked this pull request as draft October 23, 2023 17:33

agoose77 temporarily deployed to docs October 23, 2023 17:35 — with GitHub Actions Inactive

agoose77 force-pushed the agoose77/feat-extend-to-layout-options branch from 6bdd5eb to 75c70dc Compare October 24, 2023 00:26

agoose77 changed the base branch from main to agoose77/refactor-nplike-ufunc-type-promotion October 24, 2023 00:26

agoose77 marked this pull request as ready for review October 24, 2023 00:26

agoose77 temporarily deployed to docs October 24, 2023 00:46 — with GitHub Actions Inactive

Base automatically changed from agoose77/refactor-nplike-ufunc-type-promotion to main October 24, 2023 17:00

agoose77 added 5 commits October 24, 2023 18:02

refactor: make to_layout support finer granularity

eee539c

feat: extend customisation of to_layout

86ef643

fix: pass allow_record to from_iter

a36da6d

fix: add allow_none

3d66840

fix: type promotion

5062fd4

agoose77 added 3 commits October 24, 2023 18:02

fix: always return input for non-promoted scalars

9af7a6f

refactor: remove useless to_layout

42a6256

refactor: add string_policy

5b8ed8c

agoose77 force-pushed the agoose77/feat-extend-to-layout-options branch from 75c70dc to 5b8ed8c Compare October 24, 2023 17:16

agoose77 temporarily deployed to docs October 24, 2023 17:30 — with GitHub Actions Inactive

agoose77 added 2 commits October 25, 2023 14:05

fix: promote strings

62570f0

test: basic strings

271e679

agoose77 force-pushed the agoose77/feat-extend-to-layout-options branch from f74cb28 to 271e679 Compare October 25, 2023 13:05

agoose77 temporarily deployed to docs October 25, 2023 13:19 — with GitHub Actions Inactive

agoose77 added 2 commits October 25, 2023 15:05

test: more tests

bf7381c

test: check against default ArrayBuilder type

40033f2

agoose77 temporarily deployed to docs October 25, 2023 14:26 — with GitHub Actions Inactive

jpivarski approved these changes Oct 25, 2023

View reviewed changes

agoose77 added 2 commits October 25, 2023 16:36

refactor: rename allow_other → allow_unknown

081c32f

Merge branch 'main' into agoose77/feat-extend-to-layout-options

837befc

agoose77 enabled auto-merge (squash) October 25, 2023 15:36

agoose77 added 2 commits October 25, 2023 18:10

fix: more rename fixes

d7a33fe

Merge branch 'agoose77/feat-extend-to-layout-options' of github.com:s…

96d9183

…cikit-hep/awkward into agoose77/feat-extend-to-layout-options

agoose77 disabled auto-merge October 25, 2023 17:15

agoose77 temporarily deployed to docs October 25, 2023 17:22 — with GitHub Actions Inactive

test: explicitly check string coercion for with_field

175c52f

agoose77 enabled auto-merge (squash) October 26, 2023 09:40

agoose77 deployed to docs October 26, 2023 09:45 — with GitHub Actions View deployment

agoose77 merged commit 26cfc12 into main Oct 26, 2023
36 checks passed

agoose77 deleted the agoose77/feat-extend-to-layout-options branch October 26, 2023 09:50

agoose77 mentioned this pull request Nov 1, 2023

fix: support scalar strings in fill_none #2793

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: extend `to_layout` options #2763

feat: extend `to_layout` options #2763

agoose77 commented Oct 20, 2023 •

edited

Loading

codecov bot commented Oct 22, 2023 •

edited

Loading

agoose77 commented Oct 22, 2023

jpivarski commented Oct 23, 2023

agoose77 commented Oct 24, 2023 •

edited

Loading

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski left a comment

feat: extend to_layout options #2763

feat: extend to_layout options #2763

Conversation

agoose77 commented Oct 20, 2023 • edited Loading

codecov bot commented Oct 22, 2023 • edited Loading

Codecov Report

agoose77 commented Oct 22, 2023

jpivarski commented Oct 23, 2023

agoose77 commented Oct 24, 2023 • edited Loading

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski left a comment

Choose a reason for hiding this comment

feat: extend `to_layout` options #2763

feat: extend `to_layout` options #2763

agoose77 commented Oct 20, 2023 •

edited

Loading

codecov bot commented Oct 22, 2023 •

edited

Loading

agoose77 commented Oct 24, 2023 •

edited

Loading