FEA add secure persistence #128
Conversation
What do you think about storing a metainfo object in the schema? So basically something like:

{
    "metainfo": {...},
    "obj": <actual obj>
}

We could put stuff like the protocol version, sklearn/skops versions, etc. into metainfo. We could also add a hash/fingerprint of the object there to verify it.
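For illustration, a minimal sketch of what such a fingerprint could look like (purely hypothetical, not the skops implementation; the field names, version values, and hashing scheme are all assumptions):

```python
import hashlib
import json

def build_metainfo(obj_bytes: bytes) -> dict:
    # Hypothetical metainfo block: version info plus a fingerprint of the payload.
    return {
        "protocol": 0,                 # assumed protocol version field
        "skops_version": "0.1.0",      # assumed; would come from skops.__version__
        "sha256": hashlib.sha256(obj_bytes).hexdigest(),
    }

def verify(metainfo: dict, obj_bytes: bytes) -> bool:
    # Recompute the hash on load and compare it with the stored fingerprint.
    return hashlib.sha256(obj_bytes).hexdigest() == metainfo["sha256"]

obj_bytes = json.dumps({"example": "state"}).encode("utf-8")
metainfo = build_metainfo(obj_bytes)
assert verify(metainfo, obj_bytes)
```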
I added the common tests here, and here's the summary:
It kind of makes sense, but there isn't a clear distinction between the two. For estimators it makes total sense, but for a numpy array, is the file name metainfo or part of the object itself? Or for a numpy function, is there anything in the object then?
Of course we fail on Python 3.7 🤦🏼
- Add more test cases: nested pipeline, FunctionTransformer.
- Make the list of failing estimators more fine-grained: instead of ignoring a class completely, ignore a specific instance of the class, because some instances of the same class may fail while others don't.
- Mark estimators that fail as xfail with strict=True; this way we can quickly discover if a change makes an estimator pass (see the sketch below).
- Fix a bug in the testing function that did not correctly compare values if they were nan (because nan != nan).
- Check the predict_log_proba method.
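To make the fine-grained, per-instance xfail idea concrete, here is a rough sketch of how such a list could be expressed with pytest (the concrete instances, the list name, and the reasons are illustrative, not necessarily what the PR uses; the parametrized round-trip test itself is elided):

```python
import numpy as np
import pytest
from scipy.special import expit
from sklearn.preprocessing import FunctionTransformer

# Hypothetical per-instance list: specific instantiations (not whole classes)
# are marked xfail, with strict=True so an unexpected pass is flagged as soon
# as a change fixes them.
ESTIMATORS_TO_TEST = [
    FunctionTransformer(np.sqrt),  # numpy function: expected to round-trip
    pytest.param(
        FunctionTransformer(expit),  # scipy function: expected to fail for now
        marks=pytest.mark.xfail(strict=True, reason="scipy functions not yet supported"),
    ),
]

# This list would then feed a parametrized round-trip test, e.g.
# @pytest.mark.parametrize("estimator", ESTIMATORS_TO_TEST).
```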
@adrinjalali Hey, I accidentally pushed directly on your remote instead of using my own branch, sorry for that. Please let me know if my recent changes make sense to you or if I should fix anything. Here is the description:
I also wanted to add
skops/tests/test_persist.py (Outdated)

    ESTIMATORS_EXPECTED_TO_FAIL_NON_FITTED = {
The idea is to have one list where we have the estimators, and when we remove an estimator, all tests for that estimator should pass. So I'll revert to the previous list.
Hmm, I don't quite understand. What you describe should work with the current approach, no?
One problem with the previous approach is that it does not differentiate between different instantiations of the same estimator. E.g. FunctionTransformer with numpy functions works, but FunctionTransformer with scipy functions doesn't. If only the class is checked, we can't differentiate.
Or is the problem that we have two lists, one for non-fitted and one for fitted?
This is useful e.g. for clustering models that don't have a predict method. While adding these tests, a few estimators started to fail. Most of them failed because of the discussed issue of numpy scalars being loaded as 0-dim arrays; that was fixed via explicit type casting. However, 3 estimators are still failing and had to be added back to the list of failing estimators. I added a comment for each of them explaining why they fail and how the issue could potentially be addressed.
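To illustrate the scalar issue and the cast that fixes it (a self-contained sketch, not the actual skops code):

```python
import io
import numpy as np

scalar = np.float64(3.14)

# np.save wraps its input in an array, so loading gives back a 0-dim ndarray
# instead of the original numpy scalar.
buf = io.BytesIO()
np.save(buf, scalar)
buf.seek(0)
loaded = np.load(buf)
assert isinstance(loaded, np.ndarray) and loaded.shape == ()

# An explicit type cast restores the expected scalar type and value.
restored = np.float64(loaded)
assert restored == scalar and isinstance(restored, np.float64)
```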
Works locally but not on CI...
Co-authored-by: Adrin Jalali <[email protected]>
On some systems, during conversion, there is a loss of precision that makes the tests fail. This change loosens the tolerance, making the tests pass.
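Roughly, the loosened comparison corresponds to something like the following (the concrete tolerance values are illustrative, not the ones used in the PR):

```python
import numpy as np

expected = np.array([0.123456789, 1.0, 2.5])
actual = expected + 1e-8  # small precision loss during conversion on some systems

# A strict equality check fails on the affected systems:
# np.testing.assert_array_equal(actual, expected)  # would raise AssertionError

# Comparing with a (slightly loosened) tolerance passes everywhere:
np.testing.assert_allclose(actual, expected, rtol=1e-6, atol=1e-7)
```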
Windows complains that files don't have permissions, even though each test gets its own file. Maybe this helps.
From my understanding, pytest will (eventually) clean up the temporary files it creates. Therefore, use the pytest tmp_path fixture instead of the built-in tempfile module and don't clean up explicitly. What's strange is that this wasn't necessary for other tests; I'm not sure why it is here. It may have to do with how we save and load here, or with the use of zip files. Ideally, someone with access to Windows could test it.
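A minimal sketch of the switch described above (a generic example, not the actual test from the PR):

```python
# Before: manual temporary-file handling with the tempfile module, including
# explicit cleanup, which ran into permission errors on Windows:
#
#   with tempfile.NamedTemporaryFile(suffix=".skops", delete=False) as f:
#       path = f.name
#   ... save/load using `path`, then os.remove(path) ...

# After: let pytest manage the file's lifetime via the tmp_path fixture.
def test_roundtrip(tmp_path):
    path = tmp_path / "estimator.skops"  # pathlib.Path provided by pytest
    path.write_bytes(b"stand-in for the persisted estimator")
    assert path.read_bytes()  # no explicit cleanup needed; pytest prunes tmp_path
```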
Not pretty but let's see if it works.
Great job getting this merged! 🤗
Very exciting 🚀 Looking forward to trying it out!
This PR adds secure persistence for sklearn models. You can test it with:
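The exact snippet that accompanied this sentence isn't preserved here; as a rough stand-in, usage along these lines should work with the public skops.io API (the function names and signatures may have differed at the time of this PR):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import skops.io as sio

X, y = load_iris(return_X_y=True)
est = LogisticRegression().fit(X, y)

sio.dump(est, "estimator.skops")      # writes a zip archive
loaded = sio.load("estimator.skops")  # newer versions may require a `trusted` argument
print(loaded.score(X, y))
```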
The file it creates is a zip file which you can investigate. It creates a schema.json inside that zip file which includes all the info needed to reconstruct the object.

Things to add:
Things to do in a separate PR:
We basically go through the attributes of the object, and we persist them, very similar to what https://github.com/pytorch/torchsnapshot does; except that the objects we deal with, unlike pytorch objects, don't expose state_dict and load_state_dict. Therefore we implement the equivalent of those methods here ourselves. For third party libraries, they would need to implement the equivalent methods and we'll have a way for them to register those methods for their objects with us.

This is a very early prototype, and very open to discussions regarding the format and the design.
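To see the format for yourself, the archive can be opened with the standard library (assuming a file produced as in the example above, e.g. estimator.skops):

```python
import json
import zipfile

# The persisted file is a regular zip archive containing a schema.json that
# describes how to reconstruct the object, plus any binary payloads.
with zipfile.ZipFile("estimator.skops") as zf:
    print(zf.namelist())
    schema = json.loads(zf.read("schema.json"))

print(json.dumps(schema, indent=2)[:500])  # peek at the beginning of the schema
```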
cc @skops-dev/maintainers @osanseviero @LysandreJik @julien-c