
linalg.solve broadcasting behavior is ambiguous #285

Closed
asmeurer opened this issue Oct 15, 2021 · 16 comments · Fixed by #305
Labels: topic: Linear Algebra

@asmeurer
Member

asmeurer commented Oct 15, 2021

The spec for the linalg.solve function seems ambiguous. In solve(x1, x2), x1 has shape (..., M, M) and x2 either has shape (..., M) or (..., M, K). In either case, the ... parts should be broadcast compatible.

This is ambiguous. For example, if x1 has shape (2, 2, 2) and x2 has shape (2, 2), should x2 be interpreted as a (2,) stack of length-2 vectors, in which case the result would have shape (2, 2) (equivalently (2, 2, 1) as stacked column matrices), or as a single 2x2 matrix broadcast against the stack, resulting in shape (2, 2, 2)?
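To make the two readings concrete, here is a sketch (the matrices and values are hypothetical, chosen so every matrix in the stack is invertible) that spells out each interpretation explicitly instead of relying on solve's disputed dispatch:

```python
import numpy as np

x1 = np.stack([np.eye(2), 2 * np.eye(2)])    # shape (2, 2, 2): stack of invertible matrices
x2 = np.asarray([[1.0, 2.0], [3.0, 4.0]])    # shape (2, 2): the ambiguous operand

# Interpretation 1: x2 is a (2,) stack of length-2 vectors. A trailing
# size-1 axis makes that reading explicit as a stack of 2x1 column matrices.
as_vectors = np.linalg.solve(x1, x2[..., None])                  # shape (2, 2, 1)

# Interpretation 2: x2 is a single 2x2 matrix, broadcast against the stack.
as_matrix = np.linalg.solve(x1, np.broadcast_to(x2, (2, 2, 2)))  # shape (2, 2, 2)
```

The two readings produce results of different shapes (and different values), which is exactly the ambiguity at issue.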

As for NumPy, it sometimes picks one interpretation even when only the other one makes sense. For example:

>>> x1 = np.eye(1)
>>> x2 = np.asarray([[0.], [0.]])
>>> x1.shape
(1, 1)
>>> x2.shape
(2, 1)
>>> np.linalg.solve(x1, x2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in solve
  File "/Users/aaronmeurer/anaconda3/envs/array-apis/lib/python3.9/site-packages/numpy/linalg/linalg.py", line 393, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
ValueError: solve: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 2 is different from 1)

Here it wants to treat x2 as a single 2x1 matrix, which is shape-incompatible with the 1x1 x1, but it could also treat it as a (2,) stack of length-1 vectors.

I think there are also some issues with the way the spec describes broadcasting. It says "shape(x2)[:-1] must be compatible with shape(x1)[:-1]", but I think this should be shape(x2)[:-2] and so on, since matrix dimensions should never broadcast with each other. It also says that the output should always have the same shape as x2, which contradicts the requirement that the inputs broadcast together.

If I am reading the pytorch docs correctly, it resolves this by only allowing broadcasting in the case where x2 is exactly 1- or 2-dimensional. Otherwise when x2 is a stack of matrices, the stack part of the shape has to match the stack part of shape(x1) exactly.

However, I think this is still ambiguous in the case I noted above where x1 is (2, 2, 2) and x2 is (2, 2). x2 could be a single matrix, which would broadcast, or a (2,) stack of length-2 vectors, whose stack shape matches x1's exactly.

So I think more is required to disambiguate, e.g., only allow broadcasting for matrices and not for vectors. One could also remove the vector case completely, or only allow it in the simple case of x2 being 1-D (i.e., no stacks of 1-D vectors).

asmeurer added a commit to data-apis/array-api-tests that referenced this issue Oct 15, 2021
linalg.solve is presently ambiguous in its definition. See
data-apis/array-api#285. So for now we are not
implementing the test, as it is not possible to make a test that works.
@IvanYashchuk

I support the decision to allow broadcasting only for matrices and not for vectors. I don't think we need to remove the vector case completely; we only need to remove stacks of 1-D vectors.

Let's think of linalg.solve(x1, x2) function as an efficient shortcut for linalg.inv(x1) @ x2. linalg.inv preserves the shape of the input, so the answer to the question of what shapes and broadcasting should be allowed for the solve function could be "do the same as @/__matmul__ allows".
Also I think the following comparison should run without errors: x1 @ np.linalg.solve(x1, x2) == x2. Currently, it would fail for x1 with shape (2, 3, 3) and x2 with shape (2, 3) even though np.linalg.solve(x1, x2) runs fine. It leads to the suggestion that a stack of 1-D vectors should not be allowed.
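This round-trip property can be checked directly for an unambiguous stacked case (shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal((2, 3, 3))   # (2,) stack of 3x3 matrices
x2 = rng.standard_normal((2, 3, 4))   # (2,) stack of 3x4 matrices: unambiguous

x = np.linalg.solve(x1, x2)           # shape (2, 3, 4)
assert np.allclose(x1 @ x, x2)        # the round trip x1 @ solve(x1, x2) == x2 holds
```

With x2 of shape (2, 3) instead, NumPy's legacy behavior returns a (2, 3) result, but x1 @ result then fails, which is the inconsistency described above.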

@kgryte kgryte changed the title linalg.solve is ambiguous linalg.solve broadcasting behavior is ambiguous Oct 21, 2021
@leofang
Contributor

leofang commented Oct 22, 2021

Sorry, I didn't have time to think about this until now. Interpreting it the following way, I don't see any ambiguity:

  1. When x1 is (2, 2, 2) and x2 is (2, 2): Since x1 must be either one matrix or a stack of matrices, the only way to have a meaningful stack shape is to interpret it as (2,). Applying that to x2, we get a (2,) stack of 1-D vectors (each of shape (2,)).
  2. When x1 is (1, 1) and x2 is (2, 1): Since x1.ndim is 2, it's a single matrix --- no stacks. Then, x2 should be interpreted as a single 2*1 matrix (or M=2 and K=1, which makes me a bit more comfortable than calling it a matrix).
  3. When x1 is (2, 3, 3) and x2 is (2, 3): The same logic leads to x1 is a (2,) stack of 3*3 matrices, and x2 is a (2,) stack of 1D vectors of shape (3,).

Am I missing something?

@asmeurer
Member Author

I think what you are missing is that the stack part of the shapes should broadcast with one another (only the stack part, the "matrix" parts should never broadcast, even if they are size 1). A single (N, M) matrix can be thought of as an array with "stack shape" (), that is a single 0-dimensional stack. This () stack shape is broadcast compatible with any other stack shape.

The whole idea with stacks of matrices is that we think of the last two dimensions of the array as single entries of matrices in a larger array. So a (2, 2, 3, 3) array is like a (2, 2) array of 3x3 matrices. If we had a (2, 2) array of numbers and a 0-dimensional () array (scalar), they would broadcast together. Similarly, a (2, 2, 3, 3) stack of matrices should combine with a single (3, 4) matrix (the only additional restriction being that the matrix shapes need to align).

This is how the matmul operator works, as @IvanYashchuk pointed out. For example, these all work:

np.ones((2, 3, 4)) @ np.ones((4, 5))
np.ones((2, 3, 4)) @ np.ones((1, 4, 5))
np.ones((2, 3, 4)) @ np.ones((2, 2, 4, 5))

In each case, the matrix part of the shape is the last two dimensions, which align for the matrix product (3x4 and 4x5). The stack parts are

(2,), ()
(2,), (1,)
(2,), (2, 2)

which are broadcast compatible. The resulting shape for the first two is (2, 3, 5) and for the last one (2, 2, 3, 5).
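A quick check of those shapes, using the same arrays as above:

```python
import numpy as np

# The matrix parts (3, 4) @ (4, 5) align; only the stack parts broadcast.
a = np.ones((2, 3, 4))
assert (a @ np.ones((4, 5))).shape == (2, 3, 5)        # stacks (2,) and ()
assert (a @ np.ones((1, 4, 5))).shape == (2, 3, 5)     # stacks (2,) and (1,)
assert (a @ np.ones((2, 2, 4, 5))).shape == (2, 2, 3, 5)  # stacks (2,) and (2, 2)
```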

@leofang
Contributor

leofang commented Oct 27, 2021

Thank you, Aaron! Indeed, I missed the broadcasting behavior, and always computed the batch shape from x1.shape[:-2] to determine how to interpret x2. But from my mistake we can see a possibility to remove the ambiguity.

My preference would be also to "disable a stack of 1D vectors", but to do it the other way around: disable the possibility of x2 having shape of (..., M, K). The factor K there is never intuitive to users (not to me, at least), especially when we also accept prepended batch dimensions "..." for x2. If we want batches, we should always make it prepended. This also allows a unique broadcasting behavior for "a stack of 1D vectors".

@kgryte kgryte added this to the v2021 milestone Nov 4, 2021
@kgryte kgryte added the topic: Linear Algebra Linear algebra. label Nov 4, 2021
@kgryte
Contributor

kgryte commented Nov 4, 2021

@leofang My sense is that the more general solution here is allowing x2 to have shape (..., M, K), which satisfies both the use case of stacked Mx1 matrices (equivalent to M length vectors) and MxK matrices (multiple columns of ordinate values).

Accordingly, my preference is allowing the following, in line with @asmeurer's and @IvanYashchuk's proposal:

  1. x1 with shape (..., M, M) and x2 with shape (M,).
  2. x1 with shape (..., M, M) and x2 with shape (..., M, K).

Only in (2) would stacking in x2 be supported.

@DerWeh

DerWeh commented Nov 4, 2021

I am on the same page with @leofang: the matrix case has an additional dimension K, which is not involved in the actual solve.
This becomes clear using Einstein notation for the matrix equations. The vector version is:

Aᵢⱼ xⱼ = bᵢ ⇒ xⱼ = (A⁻¹)ⱼᵢ bᵢ.

We see that both indices i and j are relevant to the problem. The matrix version, however, is

Aᵢⱼ xⱼ = Bᵢₖ ⇒ xⱼ = (A⁻¹)ⱼᵢ Bᵢₖ.

We see that the index k is not involved in the actual calculation. We can solve the problem independently of k, and just stack the solutions to obtain a matrix.
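As an illustration of this column independence (the data is hypothetical), each column of a multi-RHS solve equals the single-vector solve of the corresponding column:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 4))

X = np.linalg.solve(A, B)             # one call solves all K = 4 columns of B
for k in range(4):
    # column k of X depends only on column k of B
    assert np.allclose(X[:, k], np.linalg.solve(A, B[:, k]))
```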

@kgryte
Contributor

kgryte commented Nov 4, 2021

If I understand correctly, @DerWeh, your point is that, if we support stacks of vectors, we don't need to support providing x2 as a matrix. And as a user, if you want to solve for multiple ordinate vectors, you could stack an MxM matrix x1 for each x2 ordinate vector.

The above would be in contrast to providing a single MxM matrix and K column vectors.

I am not convinced requiring a stack of vectors is any cleaner conceptually than providing K column vectors. It would still seem that allowing K right-hand sides affords a more general functionality than only allowing stacks of vectors.

@leofang
Contributor

leofang commented Nov 4, 2021

I'd argue that supporting (..., M, K) has been one of the poor, legacy design decisions made in NumPy. Since we've been trying to avoid some of such decisions in the spec, we might as well follow the same route here and not support it.

My argument is

  1. I don't see why we can't place K into the stack dimensions and let the solver internally broadcast it for us; this should be possible (with well-defined semantics):
    • x1 with shape (..., M, M) and x2 with shape (M,)
    • x1 with shape (..., M, M) and x2 with shape (..., M)
    • x1 with shape (..., M, M) and x2 with shape (..., K, M) for K columns: Internally the solver can reshape x1 into (..., 1, M, M) and then it'd be broadcastable
  2. Mathematically it's unintuitive as for what's going on in terms of tensor contraction, like @DerWeh rightfully pointed out

If we had more time I'd prefer us to get other library developers involved and/or do a downstream survey... but considering the approaching deadline how about we discuss and decide in today's call @kgryte?

@kgryte
Contributor

kgryte commented Nov 4, 2021

@leofang Already on today's agenda. 😅

@asmeurer
Member Author

asmeurer commented Nov 4, 2021

solve(A, B) is the same as A\B using MATLAB notation, or, if you prefer, A.inv() @ B. When B is a 1-D array it is interpreted as a column vector, but when B is (M, K) it is interpreted as K length-M column vectors, i.e., a MxK matrix. So B being 2-dimensional does mean it is a matrix.

This also shows why supporting stacks of 1-D vectors isn't really necessary. A matrix is already a "stack" of column vectors, stacked in the last dimension, since matrix inversion is linear and works column-by-column. It's of course confusing that the stacking happens in the final dimension, but that's a consequence of the convention of solve working on a column vector. If your 1-D vectors are "stacked" in the last dimension, that means they are row vectors, so you need to use mT to convert them to column vectors before passing to solve. Only supporting stacks of vectors as you suggest would effectively mean solve inverts a row vector instead of a column vector.
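A sketch of that transposition trick (names and data are hypothetical; for these 2-D arrays, plain .T is the same as the spec's .mT, which transposes the last two axes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
rows = rng.standard_normal((5, 3))    # five length-3 row vectors, stacked along axis 0

# Transpose so each vector becomes a column, solve, then transpose back.
x = np.linalg.solve(A, rows.T).T      # shape (5, 3)
assert np.allclose((A @ x.T).T, rows) # each x[i] solves A @ x[i] = rows[i]
```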

Is there a way we can analyze the usage data to get an idea of how often 2-D inputs are passed as the second argument to solve?

@IvanYashchuk

IvanYashchuk commented Nov 4, 2021

I don't see why we can't place K into the stack dimensions and let the solver internally broadcast it for us

linalg.solve uses LAPACK (cuSOLVER in the case of CuPy) as a backend, and these libraries do not have efficient batched variants, so it's more efficient to pass K columns in x2 than a stack of vectors.
Consider two cases:

  1. x1 with shape (M, M) and x2 a stack of vectors with shape (1000, M): this takes 1000 calls to the LAPACK API if x2 is not reshaped to (M, 1000).
  2. x1 with shape (..., M, M) and x2 a matrix with 1000 columns, shape (M, 1000): then it's only one call to the LAPACK API per matrix in the (possibly batched) x1.
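The reshaping alluded to in case 1 can be sketched as follows (shapes are illustrative): pack the stack of vectors into a single multi-column right-hand side so one factorization serves all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))           # a single coefficient matrix
b_stack = rng.standard_normal((1000, 4))  # 1000 right-hand-side vectors

# Pack the vectors as K = 1000 columns: one multi-RHS solve instead of 1000 calls.
x_stack = np.linalg.solve(A, b_stack.T).T  # shape (1000, 4)

# Spot-check against a per-vector solve.
assert np.allclose(x_stack[7], np.linalg.solve(A, b_stack[7]))
```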

We see, that both indices i and j are relevant to the problem. The matrix version, however, is

Aᵢⱼ xⱼ = Bᵢₖ ⇒ xⱼ = (A⁻¹)ⱼᵢ Bᵢₖ.

We see, that the index k is not involved in the actual calculation. We can solve the problem independently of k, and just stack the solutions to obtain a matrix.

Minor fix (note that it's not xⱼ but xⱼₖ):
Aᵢⱼ xⱼₖ = Bᵢₖ ⇒ xⱼₖ = (A⁻¹)ⱼᵢ Bᵢₖ
Inversion of the matrix is independent of k, yes, but the multiplication of A⁻¹ by B is not independent of k.

@kgryte
Contributor

kgryte commented Nov 4, 2021

Based on today's call, we decided to move forward with supporting stacks of matrices, but not stacks of vectors, as proposed in #305. Doing so maintains consistency with the ecosystem (NumPy, PyTorch), which already supports providing an (..., M, K) matrix of ordinate vectors.

Furthermore, the current proposal aligns with LAPACK, as mentioned by @IvanYashchuk. While one of the design principles for linear algebra APIs is implementation agnosticism, in this case ignoring the LAPACK performance consideration was not considered advisable. And while effectively "stacking" column vectors by appending a dimension, rather than prepending one, is unique among the APIs in the specification, supporting K column vectors seems reasonable given existing practice and historical implementation considerations.

@DerWeh

DerWeh commented Nov 4, 2021

Please also adjust the docstring of linalg.solve accordingly.

Returns the solution to the system of linear equations represented by the well-determined (i.e., full rank) linear matrix equation AX = B .

Should be replaced by something like:
Return the solution to the linear matrix equation `AX = B` with full rank `A`.

Talking about a system of linear equations is highly misleading; as people stated previously, one should think of linalg.inv(x1) @ x2 instead.
Or maybe write that explicitly?

Can we also add some guidance on how to solve a system of linear equations?
Is the right way to use:

x = linalg.solve(A, b[..., None])[..., 0]

or do I need some fancy broadcasting and moving of axes, like

shape = b.shape
A, b = np.broadcast_arrays(A, b[..., None])
b = b[...,0].reshape(-1, shape[-1]).T
x = linalg.solve(A, b).T.reshape(shape)

to benefit from the batched LAPACK calls?

Sorry for all the ruckus, but solving systems of linear equations is one of the most common tasks for me, and of course I want my functions to behave like gu-functions.

@leofang
Contributor

leofang commented Nov 5, 2021

Talking about a system of linear equations is highly misleading, as people stated previously, one should think of linalg.inv(x1) @ x2 instead.
Or maybe write that explicitly?

Why is it "highly misleading"? For a full-rank matrix A and a vector b, solving A x = b is equivalent to x = A^{-1} b. I don't see any issue. Here the discussion is essentially around the behavior when you have a stack of A and/or b:

  • 1 A, 1 b: A.shape = (M, M), b.shape = (M,)
  • 1 A, K b: A.shape = (M, M), b.shape = (M, K)
  • a stack of A for the same b: A.shape = (..., M, M), b.shape = (M,)
  • a stack of A with a stack of K b: A.shape = (..., M, M), b.shape = (..., M, K) (with the same stack shape "...")

The only thing disallowed is A.shape = (..., M, M), b.shape = (..., M).

In fact, solve() is meant to be the API for "solving a system of linear equations", which means to return the solution x for A x = b. The rest is implementation detail (including API design, docstring, etc).
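The allowed cases above can be sketched in NumPy (shapes are illustrative; the "same b for a stack of A" case is omitted from the assertions because its handling varies across NumPy versions):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 3, 4
A = rng.standard_normal((M, M))
A_stack = rng.standard_normal((2, M, M))
b = rng.standard_normal(M)
B = rng.standard_normal((M, K))
B_stack = rng.standard_normal((2, M, K))

assert np.linalg.solve(A, b).shape == (M,)                   # 1 A, 1 b
assert np.linalg.solve(A, B).shape == (M, K)                 # 1 A, K b
assert np.linalg.solve(A_stack, B_stack).shape == (2, M, K)  # stacks of A and B
# The "stack of A for the same b" case (A (..., M, M), b (M,)) is the one
# whose treatment differs between NumPy releases, so it is not asserted here.
```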

Can we also put some help, how to solve a system of linear equations?

I think it'd depend on which of the cases above you have in mind.

@leofang
Contributor

leofang commented Nov 5, 2021

  1. x1 with shape (M, M) and x2 is a stack of vectors with shape (1000, M); it is 1000 calls to LAPACK API if no reshaping of x2 to (M, 1000) is done.

@IvanYashchuk As I mentioned in yesterday's call this is not the case. There are ways to make only 1 call, which is an implementation detail, so this alone is not enough to justify the API design decision we agreed upon.

@DerWeh

DerWeh commented Nov 6, 2021

@leofang But the discussion here concluded that this won't be what is done. x = A^{-1} b would be a matrix-vector product (sum over the last dimension of A against b); however, the discussion says we should think of it as x = A^{-1} @ b, a matrix-matrix product (sum over the last dimension of A and the second-to-last dimension of b).

So solving a system of linear equations is (using numpy)

def solve(A, b):
    return np.sum(np.linalg.inv(A) * b[..., np.newaxis, :], axis=-1)

while solving a linear matrix equation is

def solve(A, B):
    return np.linalg.inv(A) @ B
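For concreteness, a quick check (with hypothetical stacked data) that the first definition agrees with per-matrix solves; it uses an explicit inverse, so it is a conceptual sketch rather than a numerically recommended implementation:

```python
import numpy as np

def solve_system(A, b):
    # batched matrix-vector solve: contract the last axis of inv(A) with b
    return np.sum(np.linalg.inv(A) * b[..., np.newaxis, :], axis=-1)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 3))
b = rng.standard_normal((2, 3))

x = solve_system(A, b)                      # shape (2, 3)
for i in range(2):
    assert np.allclose(x[i], np.linalg.solve(A[i], b[i]))
```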

Here the discussion is essentially around the behavior when you have a stack of A and/or b

This is exactly the point, functions using solve should always give the right results, so they have to treat stacks correctly.

I think it'd depend on which of the cases above you have in mind.

Solving a system of linear equations is A.shape = (..., M, M), b.shape = (..., M), where ... are arbitrary leading dimensions that are broadcastable.
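One way to get that gufunc-like behavior today is a small wrapper (a sketch, not part of any spec; solve_vec is a hypothetical name) that forces the unambiguous matrix reading with a trailing axis:

```python
import numpy as np

def solve_vec(A, b):
    """Gufunc-style solve for stacks of vectors: A (..., M, M), b (..., M)."""
    # A trailing size-1 axis turns b into an unambiguous (..., M, 1) stack of
    # column matrices, so the call behaves the same across NumPy versions.
    return np.linalg.solve(A, b[..., None])[..., 0]

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 3))
b = rng.standard_normal((2, 3))
x = solve_vec(A, b)                         # shape (2, 3)
assert np.allclose(np.einsum('...ij,...j->...i', A, x), b)
```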

asmeurer added a commit to asmeurer/array-api-compat that referenced this issue Feb 22, 2024
NumPy's solve() does not handle the ambiguity around x2 being 1-D vector vs.
an n-D stack of matrices in the way that the standard specifies. Namely, x2
should be treated as a 1-D vector iff it is 1-dimensional, and a stack of
matrices in all other cases. This workaround is borrowed from
array-api-strict.

See numpy/numpy#15349 and
data-apis/array-api#285.

Note that this workaround only works for NumPy. CuPy currently does not
support stacked vectors for solve() (see
https://github.com/cupy/cupy/blob/main/cupy/cublas.py#L43), and the workaround
in cupy.array_api.linalg does not seem to actually function.
asmeurer added a commit to asmeurer/numpy that referenced this issue Mar 1, 2024
Previously the np.linalg.solve documentation stated:

a : (..., M, M) array_like
    Coefficient matrix.
b : {(..., M,), (..., M, K)}, array_like

however, this is inherently ambiguous. For example, if a has shape (2, 2, 2)
and b has shape (2, 2), b could be treated as a (2,) stack of (2,) column
vectors, in which case the result should have shape (2, 2), or as a single 2x2
matrix, in which case, the result should have shape (2, 2, 2).

NumPy resolved this ambiguity in a confusing way, which was to treat b as
(..., M) whenever b.ndim == a.ndim - 1, and as (..., M, K) otherwise.

A much more consistent way to handle this ambiguity is to treat b as a single
vector if and only if it is 1-dimensional, i.e., use

b : {(M,), (..., M, K)}, array_like

This is consistent with, for instance, the matmul operator, which only uses
the special 1-D vector logic if an operand is exactly 1-dimensional, and
treats the operands as (stacks of) 2-D matrices otherwise.

This updates np.linalg.solve() to use this behavior.

This is a backwards compatibility break, as any instance where the b array has
more than one dimension and exactly one fewer dimension than the a array will
now use the matrix logic, potentially returning a different result with a
different shape. The previous behavior can be manually emulated with something
like

np.linalg.solve(a, b[..., None])[..., 0]

since b as a (M,) vector is equivalent to b as a (M, 1) matrix (or the user
could manually import and use the internal solve1() gufunc which implements
the b-as-vector logic).

This change aligns the solve() function with the array API, which resolves
this broadcasting ambiguity in this way. See
https://data-apis.org/array-api/latest/extensions/generated/array_api.linalg.solve.html#array_api.linalg.solve
and data-apis/array-api#285.

Fixes numpy#15349
Fixes numpy#25583