argchoose and choose #79

jpivarski · 2020-01-14T17:28:27Z

Like #78, it is sufficient to define only argchoose in C++; we can add choose in Python.
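As a rough illustration of that split, the sketch below uses plain Python lists in place of awkward arrays. The names argchoose_sketch and choose_sketch are hypothetical, and the convention that indices are local to each inner list is an assumption, not the C++ implementation. It only shows that once the index tuples exist, the values follow from a simple gather.

```python
from itertools import combinations

def argchoose_sketch(jagged, n):
    """Hypothetical stand-in: for each inner list, the n-tuples of positions
    (local to that inner list) chosen without replacement."""
    return [list(combinations(range(len(inner)), n)) for inner in jagged]

def choose_sketch(jagged, n):
    """Derive the values from the index tuples by gathering per inner list."""
    return [
        [tuple(inner[i] for i in idx) for idx in idxs]
        for inner, idxs in zip(jagged, argchoose_sketch(jagged, n))
    ]

first = [[1, 2, 3], [], [4, 5]]
print(argchoose_sketch(first, 2))
print(choose_sketch(first, 2))
```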

This takes only one array and computes the per-element upper triangle of combinations of that array with itself. It is "choosing without replacement." For example,

>>> import awkward
>>> first = awkward.fromiter([[1, 2, 3], [], [4, 5]])
>>> second = awkward.fromiter([["a", "b"], ["c"], ["d", "e", "f"]])
>>> first.choose(2)
<JaggedArray [[(1, 2) (1, 3) (2, 3)] [] [(4, 5)]] at 0x7fc91ccce850>
>>> second.choose(2)
<JaggedArray [[(a, b)] [] [(d, e) (d, f) (e, f)]] at 0x7fc91d58b110>

The parameter, n >= 2, is the number of fields the output tuples should have. Choosing with n=3 is different from choosing with n=2 and then applying cross to the output: choose selects the strictly upper-triangular part of an n-dimensional array of possibilities. For example,

>>> second.choose(3)
<JaggedArray [[] [] [(d, e, f)]] at 0x7fc91ccce590>

is not the same as

>>> second.choose(2).cross(second)
<JaggedArray [[(a, b, a) (a, b, b)] [] [(d, e, d) (d, e, e) (d, e, f) ... (e, f, d) (e, f, e) (e, f, f)]] at 0x7fc91cc6add0>

because the latter doesn't eliminate duplicates like (a, b, a). The only way to pick three letters without duplicates is if the original array had three elements, like ["d", "e", "f"] in second[2] (which is why it's the only non-empty result of second.choose(3)).
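The difference can be reproduced with plain Python lists and itertools. This is a minimal sketch of the semantics described above, not the awkward implementation; choose_ref and cross_ref are hypothetical helper names, and cross_ref flattens each pair-plus-single into a 3-tuple to mirror the output shown.

```python
from itertools import combinations, product

def choose_ref(jagged, n):
    """Per inner list: all n-element combinations without replacement."""
    return [list(combinations(inner, n)) for inner in jagged]

def cross_ref(pairs, singles):
    """Per inner list: every (pair, single) pairing, flattened to a 3-tuple."""
    return [
        [p + (s,) for p, s in product(ps, ss)]
        for ps, ss in zip(pairs, singles)
    ]

second = [["a", "b"], ["c"], ["d", "e", "f"]]

triples = choose_ref(second, 3)
crossed = cross_ref(choose_ref(second, 2), second)

print(triples)     # only the three-element inner list yields a result
print(crossed[0])  # includes tuples that repeat a letter, such as ('a', 'b', 'a')
```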

This and cross (issue #78) are the two basic generators of particle combinatorics in HEP analyses.


nsmith- · 2020-01-17T15:33:31Z

I'd like to propose an additional axis= argument to all combinatorics functions, defaulting as usual to -1. For example,

a = ak.fromiter([[['000', '001'], ['010']], [['100', '101'], ['110', '111']]])
assert a.choose(2).tolist() == [[[('000', '001')], []], [[('100', '101')], [('110', '111')]]]
assert a.choose(2, axis=1).tolist() == [[(['000', '001'], ['010'])], [(['100', '101'], ['110', '111'])]]
# axis 0 would typically be a bad idea, but valid
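To make the proposed axis semantics concrete, here is a sketch over nested Python lists. The functions list_depth and choose_axis are hypothetical names for illustration only; negative axes are resolved against the maximum nesting depth, which is an assumption about how axis=-1 would be interpreted.

```python
from itertools import combinations

def list_depth(data):
    """Maximum nesting depth of a (possibly ragged) nested list."""
    if not isinstance(data, list):
        return 0
    return 1 + max((list_depth(x) for x in data), default=0)

def choose_axis(data, n, axis=-1):
    """Combine the elements of every list found at depth `axis`."""
    if axis < 0:
        axis += list_depth(data)  # e.g. -1 means the innermost lists
    if axis == 0:
        return list(combinations(data, n))
    return [choose_axis(x, n, axis - 1) for x in data]

a = [[["000", "001"], ["010"]], [["100", "101"], ["110", "111"]]]
print(choose_axis(a, 2))          # combinations within the innermost lists
print(choose_axis(a, 2, axis=1))  # combinations of the inner lists themselves
```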

nsmith- · 2020-01-17T15:36:38Z

Actually, I just realized that right now axis=1 is the default in awkward0. The axis=-1 here would be a.copy(content=a.content.choose(2)) in awkward0.

jpivarski · 2020-01-17T16:03:29Z

I don't know whether we need/want the same axis default for all functions. For something like flatten, the usual case is axis=0, but for a reducer, the usual case is axis=-1. The most important thing is that axis works the same way and means the same thing for all functions. Maybe they should also have the same default. Maybe not. I don't know how confusing it would be to users if they have different defaults. (Or no defaults? No, that's too extreme.)

jpivarski assigned ianna Jan 14, 2020

jpivarski added the feature New feature or request label Jan 14, 2020

jpivarski added this to the Minimum viable product for analysis milestone Jan 14, 2020

jpivarski mentioned this issue Jan 17, 2020

argcross and cross #78

Closed

jpivarski linked a pull request Mar 12, 2020 that will close this issue

argchoose and choose. #160

Merged

11 tasks

jpivarski closed this as completed in #160 Mar 13, 2020
