-
Notifications
You must be signed in to change notification settings - Fork 39
Conversation
Added preliminary version of combinations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Put the
import awkward.array.table
(fully qualified) inside the function so that there isn't a dependency order among array types. Also, use the fully qualified name throughout (awkward.array.table.Table
rather thanTable
). - Remove
vectorized_parents
. User-facing methods won't say how they're implemented: we would either replace regularparents
with this or not. I've already done performance tests: regularparents
is faster for Numpy, so we keep that. This vectorized algorithm would be the right thing to do on a GPU (a futureawkward.gpu
acceleration package), but not for regular Numpy. - Remove
return_indices
as an argument; this method should be namedaproduct
and always return indexes (likeamin
andamax
). There should be another method namedproduct
that calls this method and applies the indexes to get values. Moreover, I've established a style in which we spell it "indexes" (English allows both, so we have to pick one for better search-and-replace). - Remove the
writeable
argument; it should inheritself._writeable
. - The first jagged array is
self
, so that this can be used as a method. It should be a red flag when a method never usesself
. The second jagged array should be namedother
, a convention I've established. You only need to check the type ofother
(which would exclude theNone
case, no need for an explicit check). - Minimize the introduction of unnecessary names. For instance
NUMEVENTS
(which, as a name, would be confusing outside of a physics context) is justlen(self)
, which is short enough to use as an idiom. Moreover, it's only used once. Introducing names adds cognitive burden, making code harder to understand, especially when they're not meaningful names. Reducing the number of names would also help you avoidunderscored_names
. - The dtype should be
self.INDEXTYPE
. It's centrally managed in the base class. - Use spaces around arithmetic operations and after commas for clarity. (I use spaces around
+
and-
but sometimes not*
and/
as a reminder of order of operations.) - Inside a JaggedArray method, use the private member names:
_content
rather thancontent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Hide
parents_from_offsets
inside one of these methods (so that the user doesn't see it) or just inline the loop because it's only used in one place. - Don't copy-paste huge blocks of code!
product
uses the output ofargproduct
, so have it callargproduct
to extract what it needs, rather than reimplement it. Even the sanity checks (ValueErrors
) don't have to be re-checked: ifproduct
callsargproduct
,argproduct
does all the necessary checks. - Don't convert arrays with
astype
if you don't have to.numpy.arange
takes adtype
argument, so use that.
starts2 = other.starts | ||
stops2 = other.stops | ||
counts2 = stops2 - starts2 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check that counts1.shape == counts2.shape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't be necessary, as we are doing shape checking for starts array. The shape of counts and starts are same.
@@ -406,6 +407,56 @@ def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): | |||
return None | |||
else: | |||
return JaggedArray(starts, stops, result) | |||
|
|||
|
|||
def argproduct(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a small bit of documentation about what this function does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check it out. I have added simple docstrings describing the usage
awkward/array/jagged.py
Outdated
pairs_counts = numpy.zeros(len(starts1)+1, dtype=self.INDEXTYPE) | ||
pairs_counts[1:] = numpy.cumsum(counts1*counts2, dtype=self.INDEXTYPE) | ||
|
||
def parents_from_offsets(offsets, content): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
content is needed in this function only to get the length of the output array. This is almost encoded into offsets. I wonder if a simpler solution could be used? Eg, to either pass in len(content) or to append the N+1 offset of the offsets array (which may be useful for other reasons)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(actually my second suggestion looks to be already the case - eg, can I not do
out=numpy.full(offsets[-1],-1, dtype=selfINDEXTYPE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you are right! We need the length only. I will do it by tomorrow morning ( travelling now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
- Added parents without using content - Added simple docstrings regarding usage.
I had to make some corrections to the conventions (e.g. "offsets" we being named "counts") and fixed a bug ( However, now that I've pushed to what I thought was this pull request, I don't see my commit showing up here. Is it on your fork? |
Superseded by pull request #2 (which has all of these commits, plus one from me). |
@jpivarski Thanks! That was a sneaky little bug.. I don't see the commit on my fork. Maybe it was closed? |
Added preliminary version of combinations. It has been tested with different
JaggedArray
inputs, and was working as expected. The tests will be given shortly.Things that are to be done:
JaggedArray
)