linalg.solve broadcasting behavior is ambiguous #285
linalg.solve is presently ambiguous in its definition. See data-apis/array-api#285. So for now we are not implementing the test, as it is not possible to make a test that works.
I support the decision to allow broadcasting only for matrices and not for vectors. I don't think we need to remove the vector case completely (we only need to remove stacks of 1-D vectors). Let's think of
Sorry, I didn't have time to think about this until now. Interpreting it this way, I don't see any ambiguity:
Am I missing something?
I think what you are missing is that the stack parts of the shapes should broadcast with one another (only the stack parts; the "matrix" parts should never broadcast, even if they are size 1). A single (N, M) matrix can be thought of as an array with "stack shape" (). The whole idea with stacks of matrices is that we think of the last two dimensions of the array as single entries of matrices in a larger array. So a (2, 2, 3, 3) array is like a (2, 2) array of 3x3 matrices. If we had a (2, 2) array of numbers and a 0-dimensional () array (scalar), they would broadcast together. Similarly, a (2, 2, 3, 3) stack of matrices should combine with a single (3, 4) matrix (the only additional restriction being that the matrix shapes need to align). This is how the matmul operator works, as @IvanYashchuk pointed out. For example, these all work:
In each case, the matrix part of the shape is the last two dimensions, which align for the matrix product (3x4 and 4x5). The stack parts are broadcast compatible. The resulting shape for the first two is (2, 3, 5) and for the last one (2, 2, 3, 5).
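The shape examples elided from this comment can be reconstructed as a sketch (the exact arrays used originally are not preserved, so these shapes are illustrative, chosen to match the stated result shapes):

```python
import numpy as np

# Only the "stack" dimensions broadcast; the trailing matrix
# dimensions (3x4 and 4x5) must align for the matrix product.
r1 = np.ones((2, 3, 4)) @ np.ones((4, 5))           # stack (2,) with ()
r2 = np.ones((2, 3, 4)) @ np.ones((2, 4, 5))        # stack (2,) with (2,)
r3 = np.ones((1, 2, 3, 4)) @ np.ones((2, 1, 4, 5))  # stack (1, 2) with (2, 1)

print(r1.shape)  # (2, 3, 5)
print(r2.shape)  # (2, 3, 5)
print(r3.shape)  # (2, 2, 3, 5)
```

The first two stack parts broadcast like a (2,) array with a scalar; the last broadcasts (1, 2) against (2, 1) to give a (2, 2) stack.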
Thank you, Aaron! Indeed, I missed the broadcasting behavior, and had been calculating the batch shape without it. My preference would also be to "disable stacks of 1-D vectors", but to do it the other way around: disable the possibility of `x2` having shape `(..., M)`, allowing only `(M,)` or `(..., M, K)`.
@leofang My sense is that the more general solution here is allowing broadcasting of the stack dimensions. Accordingly, my preference is allowing the following, in line with @asmeurer's and @IvanYashchuk's proposal. Only in (2) would stacking in `x2` come into play.
I am on the same page with @leofang: the matrix case has an additional dimension. We see that both indices … we see that the index …
If I understand your point correctly, @DerWeh, your point is that, if we support stacks of vectors, we don't need to support providing a matrix `x2`. The above would be in contrast to providing a single matrix. I am not convinced requiring a stack of vectors is any cleaner conceptually than providing a matrix of stacked column vectors.
I'd argue that supporting My argument is
If we had more time, I'd prefer us to get other library developers involved and/or do a downstream survey... but considering the approaching deadline, how about we discuss and decide in today's call, @kgryte?
@leofang Already on today's agenda. 😅
`solve(A, B)` is the same as `A\B` using MATLAB notation, or, if you prefer, `A.inv() @ B`. When `B` is a 1-D array it is interpreted as a column vector, but when `B` is (M, K) it is interpreted as K length-M column vectors, i.e., an MxK matrix. So `B` being 2-dimensional does mean it is a matrix.

This also shows why supporting stacks of 1-D vectors isn't really necessary. A matrix is already a "stack" of column vectors, stacked in the last dimension, since matrix inversion is linear and works column-by-column. It's of course confusing that the stacking happens in the final dimension, but that's a consequence of the convention of solve working on a column vector. If your 1-D vectors are "stacked" in the last dimension, that means they are row vectors, so you need to transpose them first.

Is there a way we can analyze the usage data to get an idea of how often 2-D inputs are passed as the second argument to solve?
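The column-by-column property described above can be checked with a small sketch (the matrix `A` and right-hand sides `B` here are arbitrary illustrative data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # well-conditioned 3x3
B = rng.standard_normal((3, 4))                  # 4 length-3 column vectors

# Solve all 4 systems at once: B as a 3x4 matrix.
X = np.linalg.solve(A, B)

# A 2-D B is already a "stack" of column vectors, stacked in the LAST
# dimension: solving column-by-column gives the same answer.
X_cols = np.stack([np.linalg.solve(A, B[:, k]) for k in range(4)], axis=-1)

assert np.allclose(X, X_cols)
```

This is why a stack of column vectors in the last dimension needs no extra API: it is just the (M, K) matrix case.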
Minor fix (note that it's not xⱼ but xⱼₖ):
Based on today's call, we decided to move forward with supporting stacks of matrices, but not stacks of vectors, as proposed in #305. Doing so maintains consistency with the ecosystem (NumPy, PyTorch), which already supports providing an `(..., M, K)` `x2`.

Furthermore, the current proposal aligns with LAPACK, as mentioned by @IvanYashchuk. While one of the design principles for linear algebra APIs is implementation agnosticism, in this case ignoring the LAPACK performance consideration was not considered advisable. And while effectively "stacking" column vectors by appending a dimension, rather than prepending, is unique among the APIs in the specification, given existing practice and historical implementation considerations, supporting it here seems justified.
Please also adjust the docstring of linalg.solve accordingly.
Should be replaced by something like: Talking about a system of linear equations is highly misleading; as people stated previously, one should think of `solve` as solving a linear matrix equation. Can we also add some guidance on how to solve a system of linear equations? Is

```python
x = linalg.solve(A, b[..., None])[..., 0]
```

sufficient, or do I need some fancy broadcasting and moving of axes, like

```python
shape = b.shape
A, b = np.broadcast_arrays(A, b[..., None])
b = b[..., 0].reshape(-1, shape[-1]).T
x = linalg.solve(A, b).T.reshape(shape)
```

to benefit from the batched LAPACK calls? Sorry for all the ruckus, but solving systems of linear equations is one of the most common tasks for me, and of course I want my functions to behave like gufuncs.
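The `b[..., None]` approach can be checked against per-matrix solves in a small sketch (assuming NumPy; the batch size and shapes are illustrative). No axis-moving gymnastics are needed:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3, 3)) + 3 * np.eye(3)  # stack of 5 systems
b = rng.standard_normal((5, 3))                     # one RHS vector per system

# Append an axis so each vector becomes a (3, 1) column matrix,
# solve the whole stack at once, then drop the axis again.
x = np.linalg.solve(A, b[..., None])[..., 0]

# Reference: solve each system separately.
x_ref = np.stack([np.linalg.solve(A[i], b[i]) for i in range(5)])

assert x.shape == (5, 3)
assert np.allclose(x, x_ref)
```

Because `(M,)` vectors and `(M, 1)` matrices are equivalent for `solve`, this one-liner gives gufunc-like batching without any reshaping of `A`.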
Why is it "highly misleading"? For a full-rank matrix
The only thing disallowed is a stack of 1-D vectors. In fact, a single 1-D `x2` is still supported.
I think it'd depend on which of the cases above you have in mind.
@IvanYashchuk As I mentioned in yesterday's call, this is not the case. There are ways to make only one call, which is an implementation detail, so this alone is not enough to justify the API design decision we agreed upon.
@leofang But the discussion here was that this won't be what is done. So solving a system of linear equations is (in NumPy):

```python
def solve(A, b):
    return np.sum(np.linalg.inv(A) * b[..., np.newaxis, :], axis=-1)
```

while solving a linear matrix equation is:

```python
def solve(A, B):
    return np.linalg.inv(A) @ B
```

This is exactly the point: functions using the two conventions give different results. Solving a system of linear equations is not the same as solving a linear matrix equation.
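To make the contrast concrete, here is a sketch of how the two readings diverge on the ambiguous shapes from this thread (`A` of shape (2, 2, 2), `b` of shape (2, 2); the data is arbitrary, and `solve_vec`/`solve_mat` are hypothetical names for the two conventions):

```python
import numpy as np

def solve_vec(A, b):
    # b read as a (...,) stack of length-M vectors:
    # x_i = sum_j inv(A)_{ij} * b_j, batched over leading dims
    return np.sum(np.linalg.inv(A) * b[..., np.newaxis, :], axis=-1)

def solve_mat(A, B):
    # B read as a (single or stacked) M x K matrix
    return np.linalg.inv(A) @ B

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2, 2)) + 3 * np.eye(2)
b = rng.standard_normal((2, 2))

# Same inputs, different shapes out: this is the ambiguity.
assert solve_vec(A, b).shape == (2, 2)
assert solve_mat(A, b).shape == (2, 2, 2)
```

The vector reading treats `b` as two length-2 vectors matched to the stack of `A`; the matrix reading treats `b` as one 2x2 matrix broadcast against the stack.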
NumPy's solve() does not handle the ambiguity around x2 being 1-D vector vs. an n-D stack of matrices in the way that the standard specifies. Namely, x2 should be treated as a 1-D vector iff it is 1-dimensional, and a stack of matrices in all other cases. This workaround is borrowed from array-api-strict. See numpy/numpy#15349 and data-apis/array-api#285. Note that this workaround only works for NumPy. CuPy currently does not support stacked vectors for solve() (see https://github.com/cupy/cupy/blob/main/cupy/cublas.py#L43), and the workaround in cupy.array_api.linalg does not seem to actually function.
Previously the `np.linalg.solve` documentation stated:

```
a : (..., M, M) array_like
    Coefficient matrix.
b : {(..., M,), (..., M, K)}, array_like
```

However, this is inherently ambiguous. For example, if `a` has shape (2, 2, 2) and `b` has shape (2, 2), `b` could be treated as a (2,) stack of (2,) column vectors, in which case the result should have shape (2, 2), or as a single 2x2 matrix, in which case the result should have shape (2, 2, 2).

NumPy resolved this ambiguity in a confusing way, which was to treat `b` as (..., M) whenever `b.ndim == a.ndim - 1`, and as (..., M, K) otherwise. A much more consistent way to handle this ambiguity is to treat `b` as a single vector if and only if it is 1-dimensional, i.e., use

```
b : {(M,), (..., M, K)}, array_like
```

This is consistent with, for instance, the `matmul` operator, which only uses the special 1-D vector logic if an operand is exactly 1-dimensional, and treats the operands as (stacks of) 2-D matrices otherwise. This updates `np.linalg.solve()` to use this behavior.

This is a backwards compatibility break, as any instance where the `b` array has more than one dimension and exactly one fewer dimension than the `a` array will now use the matrix logic, potentially returning a different result with a different shape. The previous behavior can be manually emulated with something like `np.linalg.solve(a, b[..., None])[..., 0]`, since `b` as an (M,) vector is equivalent to `b` as an (M, 1) matrix (or the user could manually import and use the internal `solve1()` gufunc, which implements the b-as-vector logic).

This change aligns the `solve()` function with the array API, which resolves this broadcasting ambiguity in this way. See https://data-apis.org/array-api/latest/extensions/generated/array_api.linalg.solve.html#array_api.linalg.solve and data-apis/array-api#285.

Fixes numpy#15349
Fixes numpy#25583
The spec for the `linalg.solve` function seems ambiguous. In `solve(x1, x2)`, `x1` has shape `(..., M, M)` and `x2` either has shape `(..., M)` or `(..., M, K)`. In either case, the `...` parts should be broadcast compatible.

This is ambiguous. For example, if `x1` is shape `(2, 2, 2)` and `x2` is shape `(2, 2)`, should this be interpreted as `x2` being a `(2,)` stack of a `(2,)` vector, i.e., the result would be `(2, 2, 2, 1)` after broadcasting, or as a single stack of a 2x2 matrix, i.e., resulting in `(2, 2, 2, 2)`?

`torch.linalg.solve` docs: https://pytorch.org/docs/stable/generated/torch.linalg.solve.html
`numpy.linalg.solve` docs: https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve

Regarding NumPy, it seems to sometimes pick one over the other, even when only the other one makes sense. For example, here it wants to treat `x2` as a single 2x1 matrix, which is shape incompatible with the 1x1 `x1`, but it could also treat it as a `(2,)` stack of length-1 vectors.

I think there are also some issues with the way the spec describes broadcasting. It says "`shape(x2)[:-1]` must be compatible with `shape(x1)[:-1]`" but I think this should be `shape(x2)[:-2]` and so on, since matrix dimensions should never broadcast with each other. It also says that the output should always have the same shape as `x2`, which contradicts that the inputs should broadcast together.

If I am reading the pytorch docs correctly, it resolves this by only allowing broadcasting in the case where `x2` is exactly 1- or 2-dimensional. Otherwise, when `x2` is a stack of matrices, the stack part of the shape has to match the stack part of `shape(x1)` exactly.

However, I think this is still ambiguous in the case I noted above where `x1` is `(2, 2, 2)` and `x2` is `(2, 2)`: `x2` could be a matrix, which would broadcast, or a stack of a `(2,)` vector, which has a stack shape matching `x1`'s.

So I think more is required to disambiguate, e.g., only allow broadcasting for matrices and not for vectors. One could also remove the vector case completely, or only allow it in the simple case of `x2` being 1-D (i.e., no stacks of 1-D vectors).
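The NumPy example elided above was presumably along these lines (reconstructed from the surrounding text, so the exact shapes are an assumption): a 1x1 `x1` with a (2, 1) `x2` errors out, because NumPy commits to the single-2x1-matrix reading even though only the stack-of-length-1-vectors reading is shape compatible:

```python
import numpy as np

x1 = np.ones((1, 1))  # a single 1x1 matrix
x2 = np.ones((2, 1))  # a 2x1 matrix, OR a (2,) stack of length-1 vectors

try:
    # NumPy picks the 2x1-matrix reading: core dimensions 1 and 2 mismatch.
    np.linalg.solve(x1, x2)
    raised = False
except (np.linalg.LinAlgError, ValueError):
    raised = True

assert raised
```

Under the stack-of-vectors reading, the (1,) stack of `x1` would broadcast against the (2,) stack of `x2` and the call would succeed, which is exactly the ambiguity being debated.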