Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor LinearExpression and Constraint using composition #55

Merged
merged 26 commits into from
Dec 9, 2022

Conversation

coroa
Copy link
Member

@coroa coroa commented Nov 3, 2022

Since there are unsafe methods in xr.Dataset that do not correctly pass on the LinearExpression or Constraint behaviour, we switch the two classes to a composition instead of the current inheritance pattern.

This is meant to address @lumbric's comment #43 (comment) that another design pattern would be saner.

Note that several methods that are probably safe are not forwarded through to the new LinearExpression.

Variable is still inheritance based and should also be converted similarly.

Several docstrings are still missing.

Since there are unsafe methods in xr.Dataset that do not correctly
pass on the LinearExpression or Constraint behaviour, we switch
the two classes to a composition instead of inheritance scheme.
@coroa coroa requested a review from FabianHofmann November 3, 2022 15:14
@coroa coroa marked this pull request as draft November 3, 2022 15:17
@coroa coroa force-pushed the composition-design branch from f48ce23 to ef9d2ee Compare November 3, 2022 15:22
@codecov
Copy link

codecov bot commented Nov 3, 2022

Codecov Report

Base: 86.79% // Head: 87.03% // Increases project coverage by +0.24% 🎉

Coverage data is based on head (749b673) compared to base (90bc355).
Patch coverage: 93.57% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   86.79%   87.03%   +0.24%     
==========================================
  Files          10       13       +3     
  Lines        2067     2214     +147     
  Branches      329      348      +19     
==========================================
+ Hits         1794     1927     +133     
- Misses        202      209       +7     
- Partials       71       78       +7     
Impacted Files Coverage Δ
linopy/monkey_patch_xarray.py 83.33% <83.33%> (ø)
linopy/variables.py 90.93% <87.50%> (-0.43%) ⬇️
linopy/constraints.py 85.90% <95.83%> (+0.64%) ⬆️
linopy/expressions.py 87.15% <96.18%> (+0.70%) ⬆️
linopy/__init__.py 100.00% <100.00%> (ø)
linopy/common.py 89.09% <100.00%> (ø)
linopy/model.py 90.32% <100.00%> (ø)
linopy/testing.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@FabianHofmann
Copy link
Collaborator

@coroa this is great stuff! Stable and backwards-compatible (with a tiny fix, the pypsa optimization already runs through). If you want I'd propose the implementation for the Variable class to properly understand everything. Otherwise I'd wait till you say this is ready for review. What do you think?

def __init__(self, *args, **kwargs):

# workaround until https://github.com/pydata/xarray/pull/5984 is merged
if isinstance(args[0], DataArray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to suggest that the class supports initialization from a DataArray object passed as first positional parameter args[0]. The new interface is not backwards compatible in this way. But I found only one place where the constructor Constraint() is used directly, so I guess this should be fine. I don't really understand the comment and how it is related to the linked pull request.

Copy link
Member Author

@coroa coroa Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR 5984 was merged for xarray 0.20.2 , ie below what we currently support as oldest possible version so not relevant anymore.

Indeed, I changed the Constraint semantics to only ever possibly refer to a full named Constraint that is already represented within a Constraints container (i considered renaming it to BoundConstraint to make this more obvious).

The AnonymousConstraint seemed to fulfill the need for anything that is not attached to a model yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also that this means that you cannot use Constraint to represent a slice of a constraint, ie. there is no way to sel into a constraint and still use a Constraint object to hold it. (i only understood that feature through converting Variable where this is needed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm the workaround for PR 5984 is obsolete, but I assume there was a reason why linopy.Constraint had to support initialization from a DataArray object, right? I can't see such a place changed in this PR here nor could I find such a place in the linopy repository, so I'd guess: this is either outdated or used in some downstream repository (PyPSA?).

@lumbric
Copy link
Contributor

lumbric commented Nov 4, 2022

Great, thank you! I am not familiar with the code-base, but LGTM and seems to fully resolve the issues addressed in #43.

Note that to_dataset() is deprecated, but still used.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

I continued a bit further and tried to convert the Variable class as well, but hit a road-block with xarray eagerly left multiplying into Variables, ie.

c * x

if c is an xarray of coefficients and x is the non-xarray inherited Variable class, then it does not evaluate the __rmul__ but falls into xarray trying to __mul__ a Variable that has been converted to an ndim=0 numpy array, like
np.array(Variable(...)). I don't see a way around this.

Stepping through xarray's left multiplication, I see that they had a special case for inherited xarrays, which was hit before. Not sure, whether this can be solved.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

To give you a head-start to dig into the problem, consider the following test case (which is the setup of test_sum_drop_zeros at

def test_sum_drop_zeros():
):

import pandas as pd
import xarray as xr
from linopy import Model

m = Model()
z = m.add_variables(0, pd.DataFrame([[1, 2], [3, 4], [5, 6]]).T, name="z")

coeff = xr.zeros_like(z.labels)
coeff[1, 0] = 3
coeff[0, 2] = 5
expr = coeff * z

will fail with

Traceback (most recent call last):
  Input In [9] in <module>
    expr = coeff * z
  File ~/.local/conda/envs/pypsa/lib/python3.10/site-packages/xarray/core/_typed_ops.py:212 in __mul__
    return self._binary_op(other, operator.mul)
  File ~/.local/conda/envs/pypsa/lib/python3.10/site-packages/xarray/core/dataarray.py:3098 in _binary_op
    f(self.variable, other_variable)
  File ~/.local/conda/envs/pypsa/lib/python3.10/site-packages/xarray/core/_typed_ops.py:402 in __mul__
    return self._binary_op(other, operator.mul)
  File ~/.local/conda/envs/pypsa/lib/python3.10/site-packages/xarray/core/variable.py:2469 in _binary_op
    result = Variable(dims, new_data, attrs=attrs)
  File ~/.local/conda/envs/pypsa/lib/python3.10/site-packages/xarray/core/variable.py:305 in __init__
    self._dims = self._parse_dimensions(dims)
  File ~/.local/conda/envs/pypsa/lib/python3.10/site-packages/xarray/core/variable.py:573 in _parse_dimensions
    raise ValueError(
ValueError: dimensions ('dim_0', 'dim_1') must have the same length as the number of data dimensions, ndim=0

@FabianHofmann
Copy link
Collaborator

Indeed, that is a bit tricky. I'm thinking about overloading the xarray Dataset and DataArray mul function. But still not sure how secure this would turn out and where to actually do that.
xarray uses a _binary_op function to assign arithmetic operations with type distinction, resulting in differentiation for the __mul__ operator. I have to do some more research on what is going on there.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

overloading the xarray Dataset and DataArray

nah, that is not a valid option, since you don't want to limit the types of coeff.

I'm happy if you find another option, but from going through __mul__, it looks to me like Variable needs to inherit from DataArray since you then trigger the note from https://docs.python.org/3/reference/datamodel.html#object.__rmul__
image

which i would translate to __rmul__ gets precedence, if the right-hand argument is of a more specific type than the left one (which is why it worked before).

@FabianHofmann
Copy link
Collaborator

Indeed, this makes sense. However, I am still lingering with the idea of supplementing the mul function. Something like:

import linopy 
import pandas as pd
import xarray as xr

def __mul__(da, other):
    if isinstance(other, linopy.Variable):
        return other * da
    else: 
        return da * other
    
xr.DataArray.__mul__ = __mul__


m = linopy.Model()

coords = [pd.RangeIndex(10, name='a')]
x = m.add_variables(coords=coords, name='x')

c = xr.DataArray(range(10), coords=coords)

c * x

which works fine.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

Indeed, this makes sense. However, I am still lingering with the idea of supplementing the mul function. Something like:

NO! Really, no. That is what you would typically call type piracy.

That can have a plethora of unforeseeable consequences, that would happen just when you import linopy.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

We could ask in an issue on xarray's tracker if they can add some sort of hooking mechanism? Or what they would suggest for our use case.

@FabianHofmann
Copy link
Collaborator

NO! Really, no. That is what you would typically call type piracy.

Convinced, I already had the feeling that this is not the correct approach.

@FabianHofmann
Copy link
Collaborator

We could ask in an issue on xarray's tracker if they can add some sort of hooking mechanism? Or what they would suggest for our use case.

Yes, let's do that. In general, I think it is important to further support the coefficient-variable (in that order) multiplication pattern.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

Note also, that all we would need is that:

xr.DataArray.__mul__ returns NotImplemented for our case.

ie. the least-invasive monkey patch would be:

orig_multiplication = xr.DataArray.__mul__
def new_multiplication(da, other):
    if isinstance(other, linopy.Variable):
        return NotImplemented
    return orig_multiplication(da, other)
xr.DataArray.__mul__ = new_multiplication

BUT: i don't think that is a valid strategy.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

Hmm, i thought about this once more and I think it is an acceptable approach if we limit it to the above and
maybe warn about it in the documentation.

Maybe adding some syntactic sugar around it:

@monkey_patch(xr.DataArray, pass_unpatched_method=True)
def __mul__(da, other, unpatched_method):
    if isinstance(other, linopy.Variable):
        return NotImplemented
    return unpatched_method(da, other)

where we define monkey_patch like:

from functools import update_wrapper, partialmethod
def monkey_patch(cls, pass_unpatched_method=False):
    def deco(func):
        wrapped = getattr(cls, func.__name__)
        if pass_unpatched_method:
            func = partialmethod(func, unpatched_method=wrapped)
        update_wrapper(func, wrapped)
        setattr(cls, func.__name__, func)
        return func
    return deco

I fail to see how this could possibly go wrong :) (famous last words)! Maybe we should implement this and then ask around at xarray.

@coroa
Copy link
Member Author

coroa commented Nov 7, 2022

After some clean-up including several doc-strings. This might be getting somewhere.

It still hinges on whether we want to allow the monkey patching? (which now has utility in
common.py and is triggered from variables.py, maybe this should move into a dedicated
monkey_patch_xarray.py module to make it super obvious).

I think we can ask whether xarray could adhere to the scheme set out by numpy that
if __array_ufunc__ is set to None and other implements __rmul__ then it returns
NotImplemented (see at https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array_ufunc__ ).

ie. np.array([1, 2]) * variable works thanks to that rule, xr.DataArray([1, 2], ...) * variable does not, unless
we monkey-patch.

@lumbric
Copy link
Contributor

lumbric commented Nov 8, 2022

No idea what the proper solution would be and what to change in xarray. I am slightly confused how __rmul__ is supposed to be used and don't have better suggestions than the approach suggested by @coroa.

Wouldn't be the only sane approach for the implementation for __mul__ in xarray (and numpy and everywhere else) to raise NotImplemented in case other is of unknown type? In that case there would be absolutely no problem or need for any kind of other hooking mechanism. However, I just checked a couple of types (numpy arrays, str, int, dict, ...) and all of them raise a TypeError for types they can't handle. This makes sense semantically because it is a type error in most cases and not a missing implementation, but it prevents the usage of __rmul__ if types are not inherited as we see here.

@coroa
Copy link
Member Author

coroa commented Nov 9, 2022

However, I just checked a couple of types (numpy arrays, str, int, dict, ...) and all of them raise a TypeError for types they can't handle.

What do you mean by checked? You tried multiplying them? The logic for a * b is. Python tries in order:

  1. check if issubclass(type(b), type(a))
  2. if it is:
    a. call b.__rmul__(a)
    b. if it (returned NotImplemented or it is not defined) call a.__mul__(b)
  3. if it isn't
    a. call a.__mul__(b)
    b. if it (returned NotImplemented or it is not defined), call b.__rmul__(a)
  4. if we get NotImplemented again or it is not defined, then a TypeError is raised.

Consider:

In [1]: class A:
   ...:     def __rmul__(self, other):
   ...:         print("Called __rmul__")
   ...:         return NotImplemented
   ...:

In [2]: 7 * A()
Called __rmul__
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [2], line 1
----> 1 7 * A()

TypeError: unsupported operand type(s) for *: 'int' and 'A'

@lumbric
Copy link
Contributor

lumbric commented Nov 9, 2022

Uh sry, seems as if I got some things wrong when writing my previous comment. Let me correct:

I have the impression that the only sane way to implement a.__mul__(b) is to return a reasonable value for a * b or to return NotImplemented if the class type(a) does not know how to handle a multiplication with b. Standard data types like int, str, float and dict seem to follow this pattern, but numpy, xarray and pandas do not. I don't understand why (my guess is that it makes implementation of __mul__ easier).

This example demonstrates what I mean:

class MyType:
    def __mul__(self, other):
        return 42

    def __rmul__(self, other):
        return 43

MyType() * other is always 42 as one can expect. other * MyType() has some surprising results. In my opinion, there is no reasonable result other than 43 for other * MyType(), which means the implementation of __mul__ in numpy, pandas and xarray should be changed.

The good part:

>>> 3 * MyType()
43
>>> MyType() * 3
42
>>> 'string' * MyType()
43
>>> MyType() * 'string'
42
>>> 3. * MyType()
43
>>> MyType() * 3.
42
>>> object() * MyType()
43
>>> MyType() * object()
42

Numpy, xarray and pandas do some really surprising things, but all of them follow the same pattern:

>>> np.array(3.) * MyType()
43
>>> MyType() * np.array(3.)
42
>>> np.array([3., 4.]) * MyType()
array([43, 43], dtype=object)
>>> MyType() * np.array([3., 4.])
42
>>> pd.Series([1, 2, 3]) * MyType()                                                                            
0    43
1    43
2    43
>>> MyType() * pd.Series([1, 2, 3])                                                                            
42
>>> xr.DataArray(3) * MyType()                                                                                 
array(43)
>>> MyType() * xr.DataArray(3)                                                                                 
42

@FabianHofmann FabianHofmann marked this pull request as ready for review November 16, 2022 16:37
@FabianHofmann
Copy link
Collaborator

Thanks @coroa for this nice change :)

@FabianHofmann FabianHofmann merged commit 62ead75 into master Dec 9, 2022
@coroa coroa deleted the composition-design branch December 14, 2022 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants