# deprecate scalar conversions for rank>0 arrays #10404
Personally agree with you (there is a reason we got rid of the
Not even -
Me too. The deprecation warning could stick around for a very long time, and making noise downstream seems pretty reasonable if that noise identifies their bugs.
I agree with @seberg -- this change would indeed make the API cleaner, but I'm dubious about whether we can actually pull it off, or whether the pain would be worth the benefit. The next step would be to try making the change and see how much stuff breaks (e.g. by running the test suites for big downstream projects like scipy, scikit-learn, astropy, ...). That'd at least give us some data to discuss.
Thanks for the comments. I've adapted the original post to reflect the fact that arrays of larger rank can be converted to floats as well. I understand that it can be painful to deprecate stuff, particularly if it's not clear how much a piece of functionality is actually used downstream. I would suggest that I make the change locally and compile scipy, sympy, pandas, scikit-learn, and astropy (any other suggestions?) against it and report back with the experience.
Alright, I've now dug into scipy and this is what I found:
I'll leave it up to you to comment and/or close this issue.
@nschloe I don't follow why that second case becomes illegal. It seems like that should be well defined, since the array
@shoyer I'll try to explain why I think it failed.

```python
a = numpy.random.rand(10, 1)
a[0] = numpy.array([1.0])
```

The array element is a scalar (i.e., empty shape
@nschloe: That might be considered a sort of reverse broadcasting, which currently succeeds in some cases but not others:

```python
a = np.zeros((1, 3))
b = np.empty((3,))
b[...] = a   # leading 1 dimensions are removed
b[...] += a  # but this fails!
```

Thinking again, allowing a
@nschloe: in your branch, does
Yeah… well the LHS broadcasting for singleton assignments was what I was thinking of. Not sure if there is more, but if it is the main point, it should not be all that bad….

Eric, hehe, that should succeed, but without checking I am not actually sure anymore if it sidesteps and just does item assignment currently, though I doubt it.

There may be a bit of annoyance with object arrays, where you never do this for item assignment, since you assume that the array should be the object. But that wouldn't even have to change.
I think that collectively, we've hit the nail on the head there - reverse broadcasting has no place in scalar assignments, because it just doesn't generalize to object arrays. I think that assignments like `b[0] = numpy.array([1.0])` ought to be forbidden, since their semantics are dtype-dependent - code written this way for numeric arrays will already fail in unexpected ways for object arrays. If the reverse broadcasting is desired by the user, then they should be forced to write `b[0, ...] = numpy.array([1.0])`, which does the same thing for all dtypes. Making them write it this way eliminates a class of bug, which is exactly what we should be aiming to do with this deprecation. Whether reverse broadcasting is actually desirable is also perhaps questionable, but you've made it clear to me that it's an orthogonal issue.
Yeah, it's orthogonal. If you attempt this, you basically have to put into place the machinery that allows deprecating the other, too. But that is probably all there is to it. EDIT: OK, it is not quite orthogonal, since for a bit the behaviour between some array-likes would differ.
I think @shoyer's point was that if
@njsmith: I mistakenly read that as shape
I've finally gotten around to setting up a branch for this, master...nschloe:bug-10404. Here are the results:

```python
import numpy

a = numpy.random.rand(10, 1)
a[0] = numpy.array([1.0])  # okay
a[0] = numpy.array(1.0)    # okay
a[0] = 1.0                 # okay

b = numpy.random.rand(10)
b[0] = numpy.array([1.0])  # ValueError: setting an array element with a sequence.
b[0, ...] = numpy.array([1.0])  # okay
b[0] = numpy.array(1.0)    # okay
b[0] = 1.0                 # okay
```

This all makes sense, and the error message is accurate too. Instead of recommending to insert

Yeah, so this is a breaking change that would need a long deprecation period first. Should I make the ValueError a deprecation and submit a PR where we can discuss further, or are there other suggestions?
Nothing wrong with PRs in any case, we can always close them ;). It might be a very long and possibly painful deprecation, unfortunately. About the assignment thing: currently that type of broadcasting is also supported for non-scalar LHS, which may or may not be something to look into/synchronize. The problem with these things is that you often only find out how bad this kind of change is for downstream once they start seriously using it.
If you make a PR we can at least identify where we rely on this internally.
➡️ PR #10615.
As another pain point, this behavior is a tad confusing when combined with

I ran into this when trying to write overloads for a function, say

I originally ran into this when writing a custom dtype, but found this was the behavior for floats too. I will file a
I think the scalar conversion of size-1 arrays is a rather central part of Numpy semantics, and I believe it's not a small amount of code that assumes it (as scipy/scipy#11147 also seems to suggest). IMO, this would be more like Numpy 2.0 stuff.
Perhaps there's a compromise where we can still deprecate it in 1.x, but with a far longer deprecation period than normal - perhaps
@eric-wieser how are we going to deprecate/futurewarn, though? Return new scalars, which warn when in-place operations happen, or when someone tries to hash them? I suppose that is actually possible, but I guess it might be very noisy. OTOH, considering how much I dislike this behaviour, maybe there is a point in trying it out. Even if we hide it behind some env variable for now.
Isn't it as simple as replacing the ValueError in the linked PR with a warning?
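For concreteness, a Python-level sketch of that swap (a hypothetical helper; the real change would live in NumPy's C conversion code):

```python
import warnings

def scalar_from_array(arr):
    # Hypothetical sketch: where the branch currently raises for
    # size-1, ndim > 0 arrays, emit a DeprecationWarning instead and
    # keep the old unwrapping behavior during the transition period.
    if arr.ndim > 0:
        if arr.size != 1:
            raise TypeError("only size-1 arrays can be converted to Python scalars")
        warnings.warn(
            "conversion of an ndim > 0 array to a scalar is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return arr.item()
```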
I don't think the length of the deprecation period matters here --- either the API is kept stable or not. The arguments for what is gained by this don't seem so clearly laid out above (or at least I didn't get them by reading it twice :). Unlike with strict casting and float indices, it's not as clear that the presence of this coercion in code would usually be a bug (only the sympy and pybind11 corner cases are mentioned above; not sure if the object-array assignment is the same).
The way I see it is:
Perhaps then this should be a
👍 I think there's no need for hard-breaking people's code, so you're right, a deprecation is perhaps a bit harsh. However, when using float conversion of rank>0 arrays, you're almost always making a mistake in how you treat your arrays. A typical case would be returning

More typical anti-patterns caught by this change:

```python
import numpy

a = numpy.array([1.0e-1], dtype=numpy.float16)
# ....
val = float(a)  # you want a[0]
```

```python
import numpy

p = numpy.array([3.14])
a = numpy.array([1.0, 2*p, 1/p])  # whoops: a is of type obj
#
b = numpy.zeros((10, 3))
b[0] = a  # !
```
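For contrast, a sketch of the explicit forms this change would nudge people toward (my reading of the proposal):

```python
import numpy

a = numpy.array([1.0e-1], dtype=numpy.float16)
val = float(a[0])  # index first, then convert - no rank>0 conversion needed

p = numpy.array([3.14])
a = numpy.array([1.0, 2 * p[0], 1 / p[0]])  # stays a float array, not object
b = numpy.zeros((10, 3))
b[0] = a  # a plain shape-(3,) float assignment
```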
Indeed. And until now, in many cases where the user should have used
@nschloe, I think your scipy PR might be fixing the symptom and not the cause - for anyone else reading along, I've been trying a different approach in scipy/scipy#11158.
In scipy/scipy#11147, the majority of the changes do not appear to be actual bugs --- most of them are extracting a Python scalar of a given type from an array known to have size 1 (with an exception if not), i.e.
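(That contract is exactly what `ndarray.item()` provides: a Python scalar from any size-1 array, an exception otherwise.)

```python
import numpy as np

np.array([[2.5]]).item()     # 2.5 - works for any size-1 array, any rank
np.array([1.0, 2.0]).item()  # ValueError: can only convert an array of
                             # size 1 to a Python scalar
```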
In principle
This PR reflects some of the progress achieved in issue #10404 and is used to assess the impact of the changes. With the changes in this PR, `float(numpy.array([1.0]))` now gives a warning; likewise some other things:

```python
import numpy

a = numpy.random.rand(10, 1)
a[0] = numpy.array([1.0])  # okay
a[0] = numpy.array(1.0)    # okay
a[0] = 1.0                 # okay

b = numpy.random.rand(10)
b[0] = numpy.array([1.0])  # ValueError: setting an array element with a sequence.
b[0, ...] = numpy.array([1.0])  # okay
b[0] = numpy.array(1.0)    # okay
b[0] = 1.0                 # okay
```

This aligns the behavior of numpy arrays with that of lists:

```python
float([3.14])
```

```
TypeError: float() argument must be a string or a number, not 'list'
```

```python
import numpy as np

a = np.random.rand(5)
a[0] = [3.14]
```

```
ValueError: setting an array element with a sequence.
```

Fixes #10404.
I have decided to give the deprecation a shot. I think the situation is a bit better now (SciPy, pandas, matplotlib should be pretty much clean). Also, the changes in SciPy were by far the largest, which gives me hope that downstream libraries are not as badly affected. That said, I do suspect we should take this slow, and I am happy to revert if we notice that it is painful for downstream/end-users. (Or if anyone here has concerns.)
Thanks! I like this approach; hopefully it will not be too bumpy of a ride.
This looks pretty horrible for statsmodels; we need to add a million `squeeze` calls to 1-d, one-element arrays. Does the same occur when assigning e.g. a 2-dim row to a 1-dim row slice of a 2-dim array? (Aside: the current test failures with this are only in 2 unit tests that check that no warning is raised.)
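As far as I can tell, that 2-dim-row case goes through regular assignment broadcasting rather than scalar conversion, so it should be unaffected - a quick sketch to check (worth verifying against the actual branch):

```python
import numpy as np

A = np.zeros((3, 4))
row = np.ones((1, 4))

# Assigning a (1, 4) array into the (4,)-shaped slice A[0] strips the
# leading singleton dimension; no size-1 array is converted to a scalar,
# so the deprecation machinery is not triggered here.
A[0] = row
```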
I tried the patch on our numpy-heavy codebase at my company, by entering
@kalvdans Thanks for sharing! Having to explicitly convert the expression to a scalar makes sense to me. After all, the result of the operation could theoretically have any size.

```python
idx = elements == Z
assert np.sum(idx) == 1
idx = idx[0]
res = linestrengths[idx]
```
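A more direct equivalent of that pattern, for anyone hitting the same warning (sample data invented for illustration):

```python
import numpy as np

elements = np.array([1, 8, 26])
linestrengths = np.array([0.1, 0.7, 0.2])
Z = 8

matches = np.flatnonzero(elements == Z)  # integer positions of matches
assert matches.size == 1
res = linestrengths[matches[0]]          # index with a plain integer
```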
* Fix for NumPy 1.25
* Revert #6973
* Revert "Don't test numpy prerelease on azure (#6996)" - this reverts commit e57c23d.
* Skip warning check on older, noisier versions of NumPy
* Remove upper pin
* Use stable sort in _raveled_offsets_and_distances: NumPy enabled optimizations for quicksort on AVX-512 enabled processors [1]. We sort based on neighbor distances, which can be equal. The default quicksort is not stable and could have different behavior on AVX-512. Test this assumption. [1] https://numpy.org/devdocs/release/1.25.0-notes.html#faster-np-argsort-on-avx-512-enabled-processors
* Fix forgotten doctest in _offsets_to_raveled_neighbors after updating to stable sort internally.
* Handle rank>0 scalar arrays (numpy/numpy#10404)
* Use stable sort in max_tree (debug): see if it does anything.
* Use stable sort in _get_high_intensity_peaks (debug): see if it does anything.
* Upper pin numpy<1.26 for tests

Co-authored-by: Stefan van der Walt <[email protected]>
Co-authored-by: Lars Grüter <[email protected]>
Numpy allows for conversion of arrays into scalars if they are size-1, i.e., `float(numpy.array([1.0]))` works.
I'm fine with rank-0 conversions, but I found that discriminating conversions based on array size can easily lead to inconsistencies downstream. See, for example, sympy/sympy#13924, where multiplication suddenly behaves differently depending on the length of the input array.
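To make the size dependence concrete:

```python
import numpy as np

float(np.array([2.0]))       # 2.0 - a size-1 array converts happily
float(np.array([2.0, 3.0]))  # TypeError: only size-1 arrays can be
                             # converted to Python scalars
```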
Moreover, this feature is actually redundant for rank>0 arrays, since the value can simply be retrieved from the array via `x[0]`.

When grepping for "converted to Python scalar" in the numpy sources, one finds that `numpy/lib/user_array.py` has a stricter variant that only permits the conversion for rank-0 arrays, which seems to be the more consistent solution.
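A sketch of that stricter rule (paraphrased; not the verbatim `user_array.py` source):

```python
def _scalarfunc(a, func):
    # Only rank-0 (0-d) arrays may be converted to Python scalars;
    # anything else must be indexed explicitly first.
    if a.ndim == 0:
        return func(a[()])
    raise TypeError("only rank-0 arrays can be converted to Python scalars.")
```

Under this rule, `float(np.array(2.0))` would still work, while `float(np.array([2.0]))` would raise.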
I would hence like to suggest deprecating scalar conversion for rank>0 size-1 arrays.