Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MrSQMClassifier non-compliance with sktime serialization interface #7

Open
fkiraly opened this issue Mar 21, 2023 · 12 comments
Open

MrSQMClassifier non-compliance with sktime serialization interface #7

fkiraly opened this issue Mar 21, 2023 · 12 comments

Comments

@fkiraly
Copy link

fkiraly commented Mar 21, 2023

@lnthach, @heerme, I've tried integrating MrSQM to distribute it with sktime (by interface to this package), also as a proof-of-concept of how we could back-integrate MrSEQL.
Issue for MrSQM: sktime/sktime#4338
Issue for MrSEQL: sktime/sktime#4296

The PR sktime/sktime#4337 works in that it proves that we can integrate cython based estimators with sktime tests and the up-to-date framework, if those estimators are in an external package such as mrsql (this one).

Now, with the tests working, it would appear that MrSQM is failing two contract tests:

  • _get_fitted_params, fitted parameter inspection
  • serialization, i.e., pickling does not work

I assume the same tests would fail if you run check_estimator, i.e.,

from sktime.utils.estimator_checks import check_estimator

check_estimator(MrSQMClassifier, raise_exceptions=True)

How do we proceed from here?

Problem 1: serialization

I'm not too familiar with cython, the exception when serializing (saving to binary) is

TypeError: no default __reduce__ due to non-trivial __cinit__

To make it work, I suspect one would have to implement serialization via the higher-level interface (save and load) or via the lower-level interface, __reduce__. @achieveordie has done that previously for classes of estimators, maybe he has more insight.

To provoke the error without the test suite, do pickle(my_estimator) on a fitted instance of MrSQMClassifier (that should work without complaining for an easy solution via the lower-level interface).

Problem 2: fitted parameters

The other issue is get_fitted_params, this should be implemented either directly or via inheritance from BaseClassifier - that can be solved on the sktime side though.

@lnthach
Copy link
Collaborator

lnthach commented Mar 24, 2023

Thanks @fkiraly,

Re problem 1, I was actually able to pickle MrSQMClassifier, however, only if SFA is not used.
For example, this works for me.

clf = MrSQMClassifier(nsax=1,nsfa=0).fit(X_train, y_train)
pickle.dumps(clf)

But this provoked the same TypeError

clf = MrSQMClassifier(nsax=0,nsfa=1).fit(X_train, y_train)
pickle.dumps(clf)

There is not much difference between the SAX and SFA Cython wrappers code so I suspect the problem is in the C++ code.
A quick and easy solution would be replacing the SFA C++ implementation with the new Python implementation (https://github.com/patrickzib/dictionary). This will slow down MrSQM significantly though.

@fkiraly
Copy link
Author

fkiraly commented Mar 24, 2023

Thanks for identifying the root cause.

I patched over the two issues in sktime/sktime#4337 now, as a consequence you cannot retrieve fitted parameters or use nsfa greater than 0 with the full contract.

It can be released that way and is scheduled for 0.17.0.

We should leave this open until this is fixed in this package, and the two patches are removed in sktime.

Patch revert needs to be applied in three places in sktime (the MrSQM classifier):

  • _get_fitted_params override
  • get_test_params need to include a test case with nsfa strictly greater 0
  • warning about nsfa and persistence needs to be removed in docstring

@fkiraly
Copy link
Author

fkiraly commented Mar 24, 2023

May I hand to you the reponsibility to maintain the sktime interface from 0.17.0 on?

(e.g., once you fix the interface non-compliance in mrsqm and have that released, revert the patches too)

@fkiraly
Copy link
Author

fkiraly commented Mar 24, 2023

A quick and easy solution would be replacing the SFA C++ implementation with the new Python implementation (https://github.com/patrickzib/dictionary). This will slow down MrSQM significantly though.

I think speed is still sth you'd want to maintain.
I think you just need to produce a suitable __reduce__ method, that might just be a few lines.
Not sure how to do that in C code though.

@fkiraly
Copy link
Author

fkiraly commented Jun 6, 2023

may I kindly ask if there are plans (if yes timeline) on resolving this? Need any help?

@fkiraly
Copy link
Author

fkiraly commented Jul 5, 2023

?

@lnthach
Copy link
Collaborator

lnthach commented Aug 14, 2023

Hi @fkiraly ,

I just revisited this issue and found a solution. The problem is that MrSQMClassifier stores Cython objects (PySFA) that contain pointers. I made several attempts with the reduce function as well as getstate and setstate but couldn't make it work. The solution I found is that to only store the necessary information in a Python array and initiate the objects on the fly. This should work with pickle now. I also updated the Pypi package. On a different note, I also have put MrSEQL on Pypi (https://pypi.org/project/mrseql/).

@fkiraly
Copy link
Author

fkiraly commented Aug 28, 2023

Great!

Apologies for the late reply.

The solution I found is that to only store the necessary information in a Python array and initiate the objects on the fly. This should work with pickle now. I

Ok, that should work - I'll try to remove the patch, lower bound the required mrsqm version, and see what the tests say.

Reference issue: sktime/sktime#4396
Downstream PR that enables tests: sktime/sktime#5171

On a different note, I also have put MrSEQL on Pypi (https://pypi.org/project/mrseql/).

Excellent!

The sktime issue is here: sktime/sktime#4296

How do we proceed with this? I think there are still users who'd like this back.

It should be easy to whip up an interface to make it searchable in sktime, just copy-pasting mrsqm. Would you like to do that, or would you prefer someone else does it?

Update: naive replication of MrSQM recipe here sktime/sktime#5178

@fkiraly
Copy link
Author

fkiraly commented Aug 28, 2023

@lnthach, quick question: have you addressed both problem 1 (pickle) and 2 (parameters), or only problem 1? By intention, I mean.

@fkiraly
Copy link
Author

fkiraly commented Aug 28, 2023

Looks like the pickling bug is fixed!

This unlocks further tests, which seem to indicate there is an issue with the random_state contract, I've opened a new issue: #11

@lnthach
Copy link
Collaborator

lnthach commented Aug 30, 2023

@lnthach, quick question: have you addressed both problem 1 (pickle) and 2 (parameters), or only problem 1? By intention, I mean

I only addressed problem 1. Re problem 2 sorry if it's a silly question, what need to be done to fix this problem ?

@fkiraly
Copy link
Author

fkiraly commented Sep 9, 2023

Re problem 2 sorry if it's a silly question, what need to be done to fix this problem ?

Your estimator has to implement a method get_fitted_params which returns a str keyed dict of parameters once the estimator is fitted. The return can be the empty dict if there are no fitted parameters to show to the user.

The "quickest" fix would be to add this method to the class:

def get_fitted_params(self):
    return {}

though of course you may like to return fitted parameters to show to the user if there are any of interest.

fkiraly added a commit to sktime/sktime that referenced this issue Dec 23, 2023
…pstream bugfix (#5171)

Upstream issues re persistence of `MrSQM` have been fixed in version
`0.0.5` of `mrsqm`, see #4396 and
mlgig/mrsqm#7 (comment).

This PR adds back the `nsfa>0` test case and removes the warning from
the docstring.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants