argchoose and choose #79

Closed
jpivarski opened this issue Jan 14, 2020 · 3 comments · Fixed by #160

@jpivarski
Member

Like #78, it is sufficient to define only argchoose in C++; we can add choose in Python.

This takes only one array and computes the per-element upper triangle of combinations of that array with itself. It is "choosing without replacement." For example,

>>> import awkward
>>> first = awkward.fromiter([[1, 2, 3], [], [4, 5]])
>>> second = awkward.fromiter([["a", "b"], ["c"], ["d", "e", "f"]])
>>> first.choose(2)
<JaggedArray [[(1, 2) (1, 3) (2, 3)] [] [(4, 5)]] at 0x7fc91ccce850>
>>> second.choose(2)
<JaggedArray [[(a, b)] [] [(d, e) (d, f) (e, f)]] at 0x7fc91d58b110>
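
To make the intended split between argchoose and choose concrete, here is a minimal pure-Python sketch on nested lists. It is an illustration under assumptions, not the library's implementation: the function names mirror the ones in this issue, but the signatures and the use of `itertools.combinations` are choices made for this example only.

```python
from itertools import combinations

def argchoose(jagged, n):
    """For each inner list, return the index tuples of its n-element combinations."""
    return [list(combinations(range(len(inner)), n)) for inner in jagged]

def choose(jagged, n):
    """Build value tuples by indexing each inner list with the output of argchoose."""
    return [
        [tuple(inner[i] for i in idx) for idx in indices]
        for inner, indices in zip(jagged, argchoose(jagged, n))
    ]

# Plain nested lists standing in for the jagged arrays above.
first = [[1, 2, 3], [], [4, 5]]
second = [["a", "b"], ["c"], ["d", "e", "f"]]

print(choose(first, 2))
print(choose(second, 2))
```

In this sketch choose is just an indexing step layered on argchoose, which is the reason only the index-producing function would need a compiled implementation.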

The parameter, n >= 2, is the number of fields the output tuples should have. Choosing with n=3 is different from choosing with n=2 and then applying cross to the output: choose selects the upper triangle of an n-dimensional matrix of possibilities. For example,

>>> second.choose(3)
<JaggedArray [[] [] [(d, e, f)]] at 0x7fc91ccce590>

is not the same as

>>> second.choose(2).cross(second)
<JaggedArray [[(a, b, a) (a, b, b)] [] [(d, e, d) (d, e, e) (d, e, f) ... (e, f, d) (e, f, e) (e, f, f)]] at 0x7fc91cc6add0>

because the latter doesn't eliminate duplicates like (a, b, a). The only way to pick three letters without duplicates is if the original array had three elements, like ["d", "e", "f"] in second[2] (which is why it's the only non-empty result of second.choose(3)).
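
The difference can be checked on a single inner list with a short sketch. The helper `upper_triangle` is a hypothetical name introduced here; it spells out the "n-dimensional matrix of possibilities" as a Cartesian product of indices and keeps only the strictly increasing ones, which is assumed to be what choose means by the upper triangle.

```python
from itertools import combinations, product

def upper_triangle(items, n):
    """Keep only index tuples with i0 < i1 < ... from the full n-dimensional grid."""
    return [
        tuple(items[i] for i in idx)
        for idx in product(range(len(items)), repeat=n)
        if all(a < b for a, b in zip(idx, idx[1:]))
    ]

letters = ["d", "e", "f"]

# n=3 directly: a single triple with no repeated element.
direct = upper_triangle(letters, 3)

# n=2 followed by a cross with the original list: every pair times every element.
pairs_then_cross = [
    pair + (x,) for pair, x in product(combinations(letters, 2), letters)
]

print(direct)
print(len(pairs_then_cross))
```

The second construction yields nine tuples for a three-element list, including ones such as (d, e, d) that reuse an element, whereas the first yields only (d, e, f).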

This and cross (issue #78) are the two basic generators of particle combinatorics in HEP analyses.

@jpivarski jpivarski added the feature New feature or request label Jan 14, 2020
@nsmith-
Member

nsmith- commented Jan 17, 2020

I'd like to propose an additional axis= argument to all combinatorics functions, defaulting as usual to -1. For example,

a = ak.fromiter([[['000', '001'], ['010']], [['100', '101'], ['110', '111']]])
assert a.choose(2).tolist() == [[[('000', '001')], []], [[('100', '101')], [('110', '111')]]]
assert a.choose(2, axis=1).tolist() == [[(['000', '001'], ['010'])], [(['100', '101'], ['110', '111'])]]
# axis 0 would typically be a bad idea, but valid
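
As a rough illustration of the proposed semantics, the following sketch applies combinations at a chosen nesting depth of plain Python lists. `choose_at` is a hypothetical helper, not part of awkward; it only accepts non-negative axis values, so the proposed default axis=-1 corresponds to axis=2 for the depth-3 example above.

```python
from itertools import combinations

def choose_at(data, n, axis):
    """Form n-element combinations of the lists found at nesting depth `axis`."""
    if axis == 0:
        return list(combinations(data, n))
    # Descend one level and apply the same operation to each sublist.
    return [choose_at(inner, n, axis - 1) for inner in data]

a = [[["000", "001"], ["010"]], [["100", "101"], ["110", "111"]]]

print(choose_at(a, 2, axis=2))  # innermost level, i.e. axis=-1 for this depth
print(choose_at(a, 2, axis=1))
print(choose_at(a, 2, axis=0))
```

With axis=0 the whole outer list is treated as one collection, so the result pairs up the two top-level entries, which is why that case is valid but rarely useful.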

@nsmith-
Member

nsmith- commented Jan 17, 2020

Actually, I just realized that axis=1 is currently the default in awkward0. The axis=-1 case here would be a.copy(content=a.content.choose(2)) in awkward0.

@jpivarski
Member Author

I don't know whether we need or want the same axis default for all functions. For something like flatten, the usual case is axis=0, but for a reducer, the usual case is axis=-1. The most important thing is that axis works the same way and means the same thing for all functions. Maybe they should also have the same default. Maybe not. I don't know how confusing it would be to users if they have different defaults. (Or no defaults? No, that's too extreme.)

@jpivarski jpivarski linked a pull request Mar 12, 2020 that will close this issue