
linalg.solve broadcasting behavior is ambiguous #285

Closed
asmeurer opened this issue Oct 15, 2021 · 16 comments · Fixed by #305
Labels: topic: Linear Algebra

@asmeurer
Member

asmeurer commented Oct 15, 2021

The spec for the linalg.solve function seems ambiguous. In solve(x1, x2), x1 has shape (..., M, M) and x2 either has shape (..., M) or (..., M, K). In either case, the ... parts should be broadcast compatible.

This is ambiguous. For example, if x1 has shape (2, 2, 2) and x2 has shape (2, 2), should x2 be interpreted as a (2,) stack of length-2 vectors, in which case the result would have shape (2, 2) (equivalently (2, 2, 1) as stacked column matrices), or as a single 2x2 matrix broadcast against the stack, resulting in shape (2, 2, 2)?
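To make the two readings concrete, here is a sketch (the matrices and values are hypothetical, chosen so every matrix in the stack is invertible) that spells out each interpretation explicitly instead of relying on solve's disputed dispatch:

```python
import numpy as np

x1 = np.stack([np.eye(2), 2 * np.eye(2)])    # shape (2, 2, 2): stack of invertible matrices
x2 = np.asarray([[1.0, 2.0], [3.0, 4.0]])    # shape (2, 2): the ambiguous operand

# Interpretation 1: x2 is a (2,) stack of length-2 vectors. A trailing
# size-1 axis makes that reading explicit as a stack of 2x1 column matrices.
as_vectors = np.linalg.solve(x1, x2[..., None])                  # shape (2, 2, 1)

# Interpretation 2: x2 is a single 2x2 matrix, broadcast against the stack.
as_matrix = np.linalg.solve(x1, np.broadcast_to(x2, (2, 2, 2)))  # shape (2, 2, 2)
```

The two readings produce results of different shapes (and different values), which is exactly the ambiguity at issue.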

As for NumPy, it sometimes picks one interpretation even when only the other one makes sense. For example:

>>> x1 = np.eye(1)
>>> x2 = np.asarray([[0.], [0.]])
>>> x1.shape
(1, 1)
>>> x2.shape
(2, 1)
>>> np.linalg.solve(x1, x2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in solve
  File "/Users/aaronmeurer/anaconda3/envs/array-apis/lib/python3.9/site-packages/numpy/linalg/linalg.py", line 393, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
ValueError: solve: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 2 is different from 1)

Here it wants to treat x2 as a single 2x1 matrix, which is shape-incompatible with the 1x1 x1, but it could also treat it as a (2,) stack of length-1 vectors.

I think there are also some issues with the way the spec describes broadcasting. It says "shape(x2)[:-1] must be compatible with shape(x1)[:-1]", but I think this should be shape(x2)[:-2] and so on, since matrix dimensions should never broadcast with each other. It also says that the output should always have the same shape as x2, which contradicts the requirement that the inputs broadcast together.

If I am reading the pytorch docs correctly, it resolves this by only allowing broadcasting in the case where x2 is exactly 1- or 2-dimensional. Otherwise when x2 is a stack of matrices, the stack part of the shape has to match the stack part of shape(x1) exactly.

However, I think this is still ambiguous in the case I noted above where x1 is (2, 2, 2) and x2 is (2, 2). x2 could be a single matrix, which would broadcast, or a (2,) stack of length-2 vectors, whose stack shape matches x1's exactly.

So I think more is required to disambiguate, e.g., only allow broadcasting for matrices and not for vectors. One could also remove the vector case completely, or only allow it in the simple case of x2 being 1-D (i.e., no stacks of 1-D vectors).

asmeurer added a commit to data-apis/array-api-tests that referenced this issue Oct 15, 2021
linalg.solve is presently ambiguous in its definition. See
data-apis/array-api#285. So for now we are not
implementing the test, as it is not possible to make a test that works.
@IvanYashchuk

I support the decision to allow broadcasting only for matrices and not for vectors. I don't think we need to remove the vector case completely; we only need to remove stacks of 1-D vectors.

Let's think of linalg.solve(x1, x2) function as an efficient shortcut for linalg.inv(x1) @ x2. linalg.inv preserves the shape of the input, so the answer to the question of what shapes and broadcasting should be allowed for the solve function could be "do the same as @/__matmul__ allows".
Also I think the following comparison should run without errors: x1 @ np.linalg.solve(x1, x2) == x2. Currently, it would fail for x1 with shape (2, 3, 3) and x2 with shape (2, 3) even though np.linalg.solve(x1, x2) runs fine. It leads to the suggestion that a stack of 1-D vectors should not be allowed.
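This round-trip property can be checked directly for an unambiguous stacked case (shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal((2, 3, 3))   # (2,) stack of 3x3 matrices
x2 = rng.standard_normal((2, 3, 4))   # (2,) stack of 3x4 matrices: unambiguous

x = np.linalg.solve(x1, x2)           # shape (2, 3, 4)
assert np.allclose(x1 @ x, x2)        # the round trip x1 @ solve(x1, x2) == x2 holds
```

With x2 of shape (2, 3) instead, NumPy's legacy behavior returns a (2, 3) result, but x1 @ result then fails, which is the inconsistency described above.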

@kgryte kgryte changed the title linalg.solve is ambiguous linalg.solve broadcasting behavior is ambiguous Oct 21, 2021
@leofang
Contributor

leofang commented Oct 22, 2021

Sorry, I didn't have time to think about this until now. Interpreting it the following way, I don't see any ambiguity:

  1. When x1 is (2, 2, 2) and x2 is (2, 2): Since x1 must be either one matrix or a stack of matrices, the only way to have a meaningful stack shape is to interpret it as (2,). Applying that to x2, we get a (2,) stack of 1-D vectors (each of shape (2,)).
  2. When x1 is (1, 1) and x2 is (2, 1): Since x1.ndim is 2, it's a single matrix --- no stacks. Then, x2 should be interpreted as a single 2*1 matrix (or M=2 and K=1, which makes me a bit more comfortable than calling it a matrix).
  3. When x1 is (2, 3, 3) and x2 is (2, 3): The same logic leads to x1 is a (2,) stack of 3*3 matrices, and x2 is a (2,) stack of 1D vectors of shape (3,).

Am I missing something?

@asmeurer
Member Author

I think what you are missing is that the stack part of the shapes should broadcast with one another (only the stack part, the "matrix" parts should never broadcast, even if they are size 1). A single (N, M) matrix can be thought of as an array with "stack shape" (), that is a single 0-dimensional stack. This () stack shape is broadcast compatible with any other stack shape.

The whole idea with stacks of matrices is that we think of the last two dimensions of the array as single entries of matrices in a larger array. So a (2, 2, 3, 3) array is like a (2, 2) array of 3x3 matrices. If we had a (2, 2) array of numbers and a 0-dimensional () array (scalar), they would broadcast together. Similarly, a (2, 2, 3, 3) stack of matrices should combine with a single (3, 4) matrix (the only additional restriction being that the matrix shapes need to align).

This is how the matmul operator works, as @IvanYashchuk pointed out. For example, these all work:

np.ones((2, 3, 4)) @ np.ones((4, 5))
np.ones((2, 3, 4)) @ np.ones((1, 4, 5))
np.ones((2, 3, 4)) @ np.ones((2, 2, 4, 5))

In each case, the matrix part of the shape is the last two dimensions, which align for the matrix product (3x4 and 4x5). The stack parts are

(2,), ()
(2,), (1,)
(2,), (2, 2)

which are broadcast compatible. The resulting shape for the first two is (2, 3, 5) and for the last one (2, 2, 3, 5).
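A quick check of those shapes, using the same arrays as above:

```python
import numpy as np

# The matrix parts (3, 4) @ (4, 5) align; only the stack parts broadcast.
a = np.ones((2, 3, 4))
assert (a @ np.ones((4, 5))).shape == (2, 3, 5)        # stacks (2,) and ()
assert (a @ np.ones((1, 4, 5))).shape == (2, 3, 5)     # stacks (2,) and (1,)
assert (a @ np.ones((2, 2, 4, 5))).shape == (2, 2, 3, 5)  # stacks (2,) and (2, 2)
```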

@leofang
Contributor

leofang commented Oct 27, 2021

Thank you, Aaron! Indeed, I missed the broadcasting behavior, and always computed the batch shape from x1.shape[:-2] to determine how to interpret x2. But from my mistake we can see a possibility to remove the ambiguity.

My preference would be also to "disable a stack of 1D vectors", but to do it the other way around: disable the possibility of x2 having shape of (..., M, K). The factor K there is never intuitive to users (not to me, at least), especially when we also accept prepended batch dimensions "..." for x2. If we want batches, we should always make it prepended. This also allows a unique broadcasting behavior for "a stack of 1D vectors".

@kgryte kgryte added this to the v2021 milestone Nov 4, 2021
@kgryte kgryte added the topic: Linear Algebra Linear algebra. label Nov 4, 2021
@kgryte
Contributor

kgryte commented Nov 4, 2021

@leofang My sense is that the more general solution here is allowing x2 to have shape (..., M, K), which satisfies both the use case of stacked Mx1 matrices (equivalent to M length vectors) and MxK matrices (multiple columns of ordinate values).

Accordingly, my preference is allowing the following, in line with @asmeurer's and @IvanYashchuk's proposal:

  1. x1 with shape (..., M, M) and x2 with shape (M,).
  2. x1 with shape (..., M, M) and x2 with shape (..., M, K).

Only in (2) would stacking in x2 be supported.

@DerWeh

DerWeh commented Nov 4, 2021

I am on the same page with @leofang: the matrix case has an additional dimension K, which is not involved in the actual solve.
This becomes clear using Einstein notation for the matrix equations. The vector version is:

Aᵢⱼ xⱼ = bᵢ ⇒ xⱼ = (A⁻¹)ⱼᵢ bᵢ.

We see that both indices i and j are relevant to the problem. The matrix version, however, is

Aᵢⱼ xⱼ = Bᵢₖ ⇒ xⱼ = (A⁻¹)ⱼᵢ Bᵢₖ.

We see that the index k is not involved in the actual calculation. We can solve the problem independently of k, and just stack the solutions to obtain a matrix.
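As an illustration of this column independence (the data is hypothetical), each column of a multi-RHS solve equals the single-vector solve of the corresponding column:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 4))

X = np.linalg.solve(A, B)             # one call solves all K = 4 columns of B
for k in range(4):
    # column k of X depends only on column k of B
    assert np.allclose(X[:, k], np.linalg.solve(A, B[:, k]))
```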

@kgryte
Contributor

kgryte commented Nov 4, 2021

If I understand correctly, @DerWeh, your point is that, if we support stacks of vectors, we don't need to support providing x2 as a matrix. And as a user, if you want to solve for multiple ordinate vectors, you could stack an MxM matrix x1 for each x2 ordinate vector.

The above would be in contrast to providing a single MxM matrix and K column vectors.

I am not convinced requiring a stack of vectors is any cleaner conceptually than providing K column vectors. It would still seem that allowing K right-hand sides affords a more general functionality than only allowing stacks of vectors.

@leofang
Contributor

leofang commented Nov 4, 2021

I'd argue that supporting (..., M, K) has been one of the poor, legacy design decisions made in NumPy. Since we've been trying to avoid some of such decisions in the spec, we might as well follow the same route here and not support it.

My argument is

  1. I don't see why we can't place K into the stack dimensions and let the solver internally broadcast it for us; this should be possible (with well-defined semantics):
    • x1 with shape (..., M, M) and x2 with shape (M,)
    • x1 with shape (..., M, M) and x2 with shape (..., M)
    • x1 with shape (..., M, M) and x2 with shape (..., K, M) for K columns: Internally the solver can reshape x1 into (..., 1, M, M) and then it'd be broadcastable
  2. Mathematically it's unintuitive as for what's going on in terms of tensor contraction, like @DerWeh rightfully pointed out

If we had more time I'd prefer us to get other library developers involved and/or do a downstream survey... but considering the approaching deadline how about we discuss and decide in today's call @kgryte?

@kgryte
Contributor

kgryte commented Nov 4, 2021

@leofang Already on today's agenda. 😅

@asmeurer
Member Author

asmeurer commented Nov 4, 2021

solve(A, B) is the same as A\B using MATLAB notation, or, if you prefer, A.inv() @ B. When B is a 1-D array it is interpreted as a column vector, but when B is (M, K) it is interpreted as K length-M column vectors, i.e., a MxK matrix. So B being 2-dimensional does mean it is a matrix.

This also shows why supporting stacks of 1-D vectors isn't really necessary. A matrix is already a "stack" of column vectors, stacked in the last dimension, since matrix inversion is linear and works column-by-column. It's of course confusing that the stacking happens in the final dimension, but that's a consequence of the convention of solve working on a column vector. If your 1-D vectors are "stacked" in the last dimension, that means they are row vectors, so you need to use mT to convert them to column vectors before passing to solve. Only supporting stacks of vectors as you suggest would effectively mean solve inverts a row vector instead of a column vector.
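A sketch of that transposition trick (names and data are hypothetical; for these 2-D arrays, plain .T is the same as the spec's .mT, which transposes the last two axes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
rows = rng.standard_normal((5, 3))    # five length-3 row vectors, stacked along axis 0

# Transpose so each vector becomes a column, solve, then transpose back.
x = np.linalg.solve(A, rows.T).T      # shape (5, 3)
assert np.allclose((A @ x.T).T, rows) # each x[i] solves A @ x[i] = rows[i]
```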

Is there a way we can analyze the usage data to get an idea of how often 2-D inputs are passed as the second argument to solve?

@IvanYashchuk

IvanYashchuk commented Nov 4, 2021

I don't see why we can't place K into the stack dimensions and let the solver internally broadcast it for us

linalg.solve uses LAPACK (cuSOLVER in the case of CuPy) as a backend, and these libraries do not have efficient batched variants, so it's more efficient to pass K columns in x2 than a stack of vectors.
Consider two cases:

  1. x1 with shape (M, M) and x2 a stack of vectors with shape (1000, M): this takes 1000 calls to the LAPACK API if x2 is not reshaped to (M, 1000).
  2. x1 with shape (..., M, M) and x2 a matrix with 1000 columns, shape (M, 1000): then it's only one call to the LAPACK API per matrix in the (possibly batched) x1.
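The reshaping alluded to in case 1 can be sketched as follows (shapes are illustrative): pack the stack of vectors into a single multi-column right-hand side so one factorization serves all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))           # a single coefficient matrix
b_stack = rng.standard_normal((1000, 4))  # 1000 right-hand-side vectors

# Pack the vectors as K = 1000 columns: one multi-RHS solve instead of 1000 calls.
x_stack = np.linalg.solve(A, b_stack.T).T  # shape (1000, 4)

# Spot-check against a per-vector solve.
assert np.allclose(x_stack[7], np.linalg.solve(A, b_stack[7]))
```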

We see, that both indices i and j are relevant to the problem. The matrix version, however, is

Aᵢⱼ xⱼ = Bᵢₖ ⇒ xⱼ = (A⁻¹)ⱼᵢ Bᵢₖ.

We see, that the index k is not involved in the actual calculation. We can solve the problem independently of k, and just stack the solutions to obtain a matrix.

Minor fix (note that it's not xⱼ but xⱼₖ):
Aᵢⱼ xⱼₖ = Bᵢₖ ⇒ xⱼₖ = (A⁻¹)ⱼᵢ Bᵢₖ
Inversion of the matrix is independent of k, yes, but the multiplication of A⁻¹ by B is not independent of k.

@kgryte
Contributor

kgryte commented Nov 4, 2021

Based on today's call, we decided to move forward with supporting stacks of matrices, but not stacks of vectors, as proposed in #305. Doing so maintains consistency with the ecosystem (NumPy, PyTorch), which already supports providing an (..., M, K) matrix of ordinate vectors.

Furthermore, the current proposal aligns with LAPACK, as mentioned by @IvanYashchuk. While one of the design principles for linear algebra APIs is implementation agnosticism, in this case ignoring the LAPACK performance consideration was not considered advisable. And while effectively "stacking" column vectors by appending a dimension, rather than prepending one, is unique among the APIs in the specification, supporting K column vectors seems reasonable given existing practice and historical implementation considerations.

@DerWeh

DerWeh commented Nov 4, 2021

Please also adjust the docstring of linalg.solve accordingly.

Returns the solution to the system of linear equations represented by the well-determined (i.e., full rank) linear matrix equation AX = B .

Should be replaced by something like:
Return the solution to the linear matrix equation `AX = B` with full rank `A`.

Talking about a system of linear equations is highly misleading; as people stated previously, one should think of linalg.inv(x1) @ x2 instead.
Or maybe write that explicitly?

Can we also add some guidance on how to solve a system of linear equations?
Is the right way to use:

x = linalg.solve(A, b[..., None])[..., 0]

or do I need some fancy broadcasting and moving of axes, like

shape = b.shape
A, b = np.broadcast_arrays(A, b[..., None])
b = b[...,0].reshape(-1, shape[-1]).T
x = linalg.solve(A, b).T.reshape(shape)

to benefit from the batched LAPACK calls?

Sorry for all the ruckus, but solving systems of linear equations is one of the most common tasks for me, and of course I want my functions to behave like gu-functions.

@leofang
Contributor

leofang commented Nov 5, 2021

Talking about a system of linear equations is highly misleading, as people stated previously, one should think of linalg.inv(x1) @ x2 instead.
Or maybe write that explicitly?

Why is it "highly misleading"? For a full-rank matrix A and a vector b, solving A x = b is equivalent to x = A^{-1} b. I don't see any issue. Here the discussion is essentially around the behavior when you have a stack of A and/or b:

  • 1 A, 1 b: A.shape = (M, M), b.shape = (M,)
  • 1 A, K b: A.shape = (M, M), b.shape = (M, K)
  • a stack of A for the same b: A.shape = (..., M, M), b.shape = (M,)
  • a stack of A with a stack of K b: A.shape = (..., M, M), b.shape = (..., M, K) (with the same stack shape "...")

The only thing disallowed is A.shape = (..., M, M), b.shape = (..., M).

In fact, solve() is meant to be the API for "solving a system of linear equations", which means to return the solution x for A x = b. The rest is implementation detail (including API design, docstring, etc).
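The allowed cases above can be sketched in NumPy (shapes are illustrative; the "same b for a stack of A" case is omitted from the assertions because its handling varies across NumPy versions):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 3, 4
A = rng.standard_normal((M, M))
A_stack = rng.standard_normal((2, M, M))
b = rng.standard_normal(M)
B = rng.standard_normal((M, K))
B_stack = rng.standard_normal((2, M, K))

assert np.linalg.solve(A, b).shape == (M,)                   # 1 A, 1 b
assert np.linalg.solve(A, B).shape == (M, K)                 # 1 A, K b
assert np.linalg.solve(A_stack, B_stack).shape == (2, M, K)  # stacks of A and B
# The "stack of A for the same b" case (A (..., M, M), b (M,)) is the one
# whose treatment differs between NumPy releases, so it is not asserted here.
```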

Can we also put some help, how to solve a system of linear equations?

I think it'd depend on which of the cases above you have in mind.

@leofang
Contributor

leofang commented Nov 5, 2021

  1. x1 with shape (M, M) and x2 is a stack of vectors with shape (1000, M); it is 1000 calls to LAPACK API if no reshaping of x2 to (M, 1000) is done.

@IvanYashchuk As I mentioned in yesterday's call this is not the case. There are ways to make only 1 call, which is an implementation detail, so this alone is not enough to justify the API design decision we agreed upon.

@DerWeh

DerWeh commented Nov 6, 2021

@leofang But the discussion here concluded that this won't be what is done. x = A^{-1} b would be a matrix-vector product (sum over the last dimension of A against b); however, the discussion says we should think of it as x = A^{-1} @ b, a matrix-matrix product (sum over the last dimension of A and the second-to-last dimension of b).

So solving a system of linear equations is (using numpy)

def solve(A, b):
    return np.sum(np.linalg.inv(A) * b[..., np.newaxis, :], axis=-1)

while solving a linear matrix equation is

def solve(A, B):
    return np.linalg.inv(A) @ B
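For concreteness, a quick check (with hypothetical stacked data) that the first definition agrees with per-matrix solves; it uses an explicit inverse, so it is a conceptual sketch rather than a numerically recommended implementation:

```python
import numpy as np

def solve_system(A, b):
    # batched matrix-vector solve: contract the last axis of inv(A) with b
    return np.sum(np.linalg.inv(A) * b[..., np.newaxis, :], axis=-1)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 3))
b = rng.standard_normal((2, 3))

x = solve_system(A, b)                      # shape (2, 3)
for i in range(2):
    assert np.allclose(x[i], np.linalg.solve(A[i], b[i]))
```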

Here the discussion is essentially around the behavior when you have a stack of A and/or b

This is exactly the point, functions using solve should always give the right results, so they have to treat stacks correctly.

I think it'd depend on which of the cases above you have in mind.

Solving a system of linear equations is A.shape = (..., M, M), b.shape = (..., M), where ... are arbitrary leading dimensions that are broadcastable.
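One way to get that gufunc-like behavior today is a small wrapper (a sketch, not part of any spec; solve_vec is a hypothetical name) that forces the unambiguous matrix reading with a trailing axis:

```python
import numpy as np

def solve_vec(A, b):
    """Gufunc-style solve for stacks of vectors: A (..., M, M), b (..., M)."""
    # A trailing size-1 axis turns b into an unambiguous (..., M, 1) stack of
    # column matrices, so the call behaves the same across NumPy versions.
    return np.linalg.solve(A, b[..., None])[..., 0]

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 3))
b = rng.standard_normal((2, 3))
x = solve_vec(A, b)                         # shape (2, 3)
assert np.allclose(np.einsum('...ij,...j->...i', A, x), b)
```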

asmeurer added a commit to asmeurer/array-api-compat that referenced this issue Feb 22, 2024
NumPy's solve() does not handle the ambiguity around x2 being 1-D vector vs.
an n-D stack of matrices in the way that the standard specifies. Namely, x2
should be treated as a 1-D vector iff it is 1-dimensional, and a stack of
matrices in all other cases. This workaround is borrowed from
array-api-strict.

See numpy/numpy#15349 and
data-apis/array-api#285.

Note that this workaround only works for NumPy. CuPy currently does not
support stacked vectors for solve() (see
https://github.com/cupy/cupy/blob/main/cupy/cublas.py#L43), and the workaround
in cupy.array_api.linalg does not seem to actually function.
asmeurer added a commit to asmeurer/numpy that referenced this issue Mar 1, 2024
Previously the np.linalg.solve documentation stated:

a : (..., M, M) array_like
    Coefficient matrix.
b : {(..., M,), (..., M, K)}, array_like

however, this is inherently ambiguous. For example, if a has shape (2, 2, 2)
and b has shape (2, 2), b could be treated as a (2,) stack of (2,) column
vectors, in which case the result should have shape (2, 2), or as a single 2x2
matrix, in which case, the result should have shape (2, 2, 2).

NumPy resolved this ambiguity in a confusing way, which was to treat b as
(..., M) whenever b.ndim == a.ndim - 1, and as (..., M, K) otherwise.

A much more consistent way to handle this ambiguity is to treat b as a single
vector if and only if it is 1-dimensional, i.e., use

b : {(M,), (..., M, K)}, array_like

This is consistent with, for instance, the matmul operator, which only uses
the special 1-D vector logic if an operand is exactly 1-dimensional, and
treats the operands as (stacks of) 2-D matrices otherwise.

This updates np.linalg.solve() to use this behavior.

This is a backwards compatibility break, as any instance where the b array has
more than one dimension and exactly one fewer dimension than the a array will
now use the matrix logic, potentially returning a different result with a
different shape. The previous behavior can be manually emulated with something
like

np.linalg.solve(a, b[..., None])[..., 0]

since b as a (M,) vector is equivalent to b as a (M, 1) matrix (or the user
could manually import and use the internal solve1() gufunc which implements
the b-as-vector logic).

This change aligns the solve() function with the array API, which resolves
this broadcasting ambiguity in this way. See
https://data-apis.org/array-api/latest/extensions/generated/array_api.linalg.solve.html#array_api.linalg.solve
and data-apis/array-api#285.

Fixes numpy#15349
Fixes numpy#25583