
DOC Add intelex inference example #303

Merged — 1 commit merged into skops-dev:main on Mar 3, 2023

Conversation

@ahuber21 (Contributor) commented on Feb 20, 2023:

I want to give a self-contained example of how scikit-learn-intelex speeds up model inference times. This effort was kicked off in #251, and the PR in this form is basically just meant to align the requirements.

I'm new to the discussion, so I'm including just the most basic things. Please let me know if this example is a step in the right direction.

Output from running on my machine:

$ python examples/use_intelex.py
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
[skl] Inference took t_stock = 3.7e+00s and achieved 21.7% accuracy
[skl-ex] Inference took t_opt = 2.3e-01s and achieved 21.7% accuracy
t_stock / t_opt = 16.1
# ... more output about pushing and downloading ...
[skl] Inference took t_stock = 3.8e+00s and achieved 21.7% accuracy
[skl-ex] Inference took t_opt = 2.5e-01s and achieved 21.7% accuracy
t_stock / t_opt = 15.3
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
[skl] Inference took t_stock = 3.8e+00s and achieved 21.7% accuracy
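The `t_stock / t_opt` ratio in the output above comes from a simple wall-clock timing pattern. A minimal, self-contained sketch of that pattern, using stand-in workloads rather than the actual `predict` calls (which are not reproduced here):

```python
from time import perf_counter


def timed(fn, *args):
    """Return (elapsed seconds, result) for a single call to fn."""
    start = perf_counter()
    result = fn(*args)
    return perf_counter() - start, result


# Stand-in workloads; in the example these would be clf.predict(X_test)
# for the stock model and clf_opt.predict(X_test) for the patched one.
t_stock, _ = timed(sum, range(2_000_000))
t_opt, _ = timed(sum, range(200_000))

print(f"t_stock / t_opt = {t_stock / t_opt:.1f}")
```

Single-run wall-clock ratios like this are noisy; the example's numbers should be read as rough orders of magnitude, not benchmarks.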

@BenjaminBossan (Collaborator) left a comment:

Thanks for adding this example. This is not an in depth review yet, just what I found on first look:

The example is not actually being run but just included as is into the docs, as can be seen here:

https://skops--303.org.readthedocs.build/en/303/auto_examples/use_intelex.html#sphx-glr-auto-examples-use-intelex-py

I'm not completely sure why that happens here but not with the other examples. Possibly this example needs to be linked explicitly somewhere in the docs? A good place would be:

https://github.com/skops-dev/skops/blob/3e1f138c5a7c12863ca263beb2804ceede07d35a/docs/examples.rst

(which would be a good addition regardless of whether that is the root of the issue or not)

In general, I think this guide would be more useful if it were made a bit more accessible, e.g. by explaining in a few words what intelex does, linking to its docs, and avoiding too many abbreviations like "skl" and "sklex".
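For context on what intelex does: the sklearnex package speeds up supported estimators by monkey-patching scikit-learn's module namespaces, so that names imported after `patch_sklearn()` resolve to optimized classes. A rough, stdlib-only illustration of that mechanism, using toy classes rather than the real sklearnex code:

```python
import types

# A fake module standing in for a sklearn submodule.
sklearn_like = types.ModuleType("sklearn_like")


class StockKNN:
    backend = "stock"


class OptimizedKNN:
    backend = "optimized"


sklearn_like.KNeighborsClassifier = StockKNN


def patch(module):
    """Stand-in for patch_sklearn(): rebind the name to the optimized class."""
    module.KNeighborsClassifier = OptimizedKNN


before = sklearn_like.KNeighborsClassifier().backend
patch(sklearn_like)
after = sklearn_like.KNeighborsClassifier().backend
print(before, "->", after)
```

This is also why the order of operations matters in the example: objects constructed before the patch keep the stock class.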

@ahuber21 (Contributor, Author) commented on Feb 22, 2023:

Hi @BenjaminBossan, thanks for the feedback! I'm happy to hear that the original suggestion was already a step in the right direction.
I made the changes you requested and also added some content to the introduction, in fixup commits that we can squash later.
Let's see if the code actually runs this time.

Let me know what else you want to see!

@BenjaminBossan (Collaborator) commented:

The example is still not building correctly, but I believe I have figured out the issue: it seems the file name needs to start with plot_. Could you please check whether that fixes it?

By the way, you can build the docs locally by going into docs/ and running make html, then opening the HTML files in _build/html.

@ahuber21 (Contributor, Author) commented:

Hi @BenjaminBossan, thanks for the tip. The latest commit worked locally, so I hope this time the example runs.

I had an interesting thought. At the bottom of my example I show that loading a pickled, non-optimized model gets no speedup, even when sklearnex is loaded. I'm curious whether that is also true for the skops.io methods. Essentially, if a new object is created, i.e. the constructor is called, the Intel optimizations should apply. It would be interesting to see if there are any compatibility issues.
I think @adrinjalali already looked into that, maybe he can comment.
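One way to see why a pickled model keeps the stock implementation: pickle records the concrete class of the object, so unpickling restores that exact class regardless of any namespace patching applied afterwards. A small stdlib-only demonstration with a toy class (not an actual estimator):

```python
import pickle


class StockModel:
    """Toy stand-in for an estimator class instantiated before patching."""

    backend = "stock"


payload = pickle.dumps(StockModel())

# Even if the defining module's namespace were patched at this point,
# the pickle stream already names StockModel, so the restored object
# comes back with the stock class.
restored = pickle.loads(payload)
print(type(restored).__name__, restored.backend)
```

Whether skops.io behaves differently would depend on whether it re-resolves class names through the (possibly patched) module at load time, which is exactly the question raised above.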

@BenjaminBossan (Collaborator) commented:

The latest commit worked locally so I hope this time the example runs.

The doc build is failing, but it's still progress :) The trouble now is that building the docs has a dependency on sklearnex, which is not installed. Could you please add it to the dependencies as a "doc" dependency?

I'm curious if that is also true for skops.io methods. Essentially, if a new object is created, i.e. the constructor is called, Intel optimizations should apply.

I think this will depend on how exactly the patching works, which I don't know. Ideally, there would be an easy way to check for each estimator whether it was patched, something like hasattr(estimator, "_uses_intelex") or sklearnex.check_is_patched(estimator). I believe this would be really useful.
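Neither `hasattr(estimator, "_uses_intelex")` nor `sklearnex.check_is_patched` existed at the time of this comment; both names are hypothetical. One heuristic that needs only plain introspection is to look at the module that defines the instance's concrete class, on the assumption that patched classes live under sklearnex/daal4py module paths. A sketch with fake classes, not real estimators:

```python
def looks_patched(estimator) -> bool:
    """Heuristic: does the instance's concrete class come from sklearnex?

    Assumes patched classes are defined in sklearnex or daal4py modules,
    which may not hold for every release.
    """
    module = type(estimator).__module__
    return module.startswith(("sklearnex", "daal4py"))


# Fake classes whose __module__ mimics stock vs. patched estimators.
class FakeStock:
    pass


FakeStock.__module__ = "sklearn.neighbors"


class FakePatched:
    pass


FakePatched.__module__ = "sklearnex.neighbors"

print(looks_patched(FakeStock()), looks_patched(FakePatched()))
```

As noted later in the thread, sklearnex eventually shipped a proper API for this (`is_patched_instance`), which should be preferred over any such heuristic.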

@ahuber21 (Contributor, Author) commented on Feb 23, 2023:

Progress indeed. The sklearnex.check_is_patched(estimator) suggestion is very good. I'm wondering if we already have something like this. Let me ping @napetrov, maybe he knows more.

Edit: There is an unreleased function that allows you to see the global patching status in the latest main branch, but nothing to investigate model instances. I've created a feature request.

@ahuber21 (Contributor, Author) commented:

By the way, the pipeline now complains about the missing HF token. I followed the approach that I found in plot_hf_hub.py, so I think the error is nothing new. How is this expected to work?

@BenjaminBossan (Collaborator) left a comment:
Thanks Andreas for adding this example. I have now given it a more thorough review. Overall it looks good, but I think a few things could be improved further. Please take a look at my comments.

By the way, the pipeline now complains about the missing HF token. I followed the approach that I found in plot_hf_hub.py, so I think the error is nothing new. How is this expected to work?

Yes, that's unfortunately expected, see #47. As soon as this is merged, it should work, as it would use the HF token set up for this repo. When I tested this locally with my token, it worked.

@ahuber21 (Contributor, Author) commented:

Hi Benjamin. Thanks! I hope all your comments are addressed with the latest commit.

@adrinjalali (Member) left a comment:

Thanks for the PR @ahuber21 , it's in a pretty good shape.

I left a few minor comments. I haven't checked that the line lengths are correct. You don't need to "accept suggestion" here; you can take the texts and apply them the way you like.

Comment on lines 84 to 103
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(3)
clf.fit(X_train, y_train)

# %%
# Training the optimized model
# ============================
# We apply ``patch_sklearn()`` and reimport the model to load the patched
# version. A message is shown, telling us that Intel(R) Extension for
# Scikit-learn* has been enabled.
patch_sklearn()
from sklearn.neighbors import KNeighborsClassifier

clf_opt = KNeighborsClassifier(3)
clf_opt.fit(X_train, y_train)
@adrinjalali (Member) commented:
I think in a script this is very confusing. It'd be nicer to explicitly import the class from sklearnex so it's clear which one is being used, and to leave a comment that users don't have to change their imports: they only need to add the patch call at the top of the file, before the imports run.

We can also here show training times and compare them.

@ahuber21 (Contributor, Author) replied:

I changed the imports. Training for kNN takes <1 s, and in many cases sklearnex comes with a few percent of overhead. We're working on that, but for now I don't think adding training times would help.

@adrinjalali (Member) replied:

In that case it does help to add it, since right now we're only selectively showing data, not the whole truth, and that's not what we wanna do here.

@ahuber21 (Contributor, Author) replied on Mar 1, 2023:

I have added a perf_counter for the fit stage for completeness. On my machine both fit times were ~0.05 s. The numbers are printed, but doing anything else with them (like calculating ratios) won't help unless we do multiple runs. I hope you're fine with how it is now. Let me know :)

@ahuber21 ahuber21 force-pushed the feat-sklearnex-example branch from 2dce17e to 0dd8640 Compare February 28, 2023 12:38
@adrinjalali (Member) commented:

The build is failing; once you're done with the edits, please ping me for a second review.

@ahuber21 (Contributor, Author) commented on Mar 1, 2023:

I don't quite understand what happened with the Ubuntu tests; to me the failure looked unrelated. Anyway, I just pushed an update without the "Download model and re-evaluate" section, so there will be a new run.
From my side, I've added everything. Let me know what you think @adrinjalali

@ahuber21 ahuber21 force-pushed the feat-sklearnex-example branch from 9471ca6 to 1eb5d5f Compare March 1, 2023 08:50
@adrinjalali (Member) left a comment:

Other than the nits, LGTM. WDYT @BenjaminBossan

@ahuber21 ahuber21 force-pushed the feat-sklearnex-example branch from 8fc0c67 to bad02d2 Compare March 1, 2023 17:08
@adrinjalali changed the title from "FEAT: Add intelex inference example" to "DOC Add intelex inference example" on Mar 2, 2023
@ahuber21 ahuber21 force-pushed the feat-sklearnex-example branch from bad02d2 to e73cfb9 Compare March 2, 2023 11:47
@ahuber21 ahuber21 force-pushed the feat-sklearnex-example branch from e73cfb9 to e047a1d Compare March 2, 2023 17:29
@ahuber21 (Contributor, Author) commented on Mar 2, 2023:

Force-pushed again because the intro still said that we download the models again. Nothing else changed.

@adrinjalali (Member) commented:

@ahuber21 we squash and merge anyway, please avoid force pushing since it makes reviewing changes harder :)

@BenjaminBossan (Collaborator) left a comment:

Yes, looks good, thanks Andreas

@BenjaminBossan BenjaminBossan merged commit ffe3ea7 into skops-dev:main Mar 3, 2023
@ahuber21 ahuber21 deleted the feat-sklearnex-example branch March 3, 2023 09:48
@ahuber21 (Contributor, Author) commented:

I think this will depend on how exactly the patching works, which I don't know. Ideally, there would be an easy way to check on each estimator if it was patched, something like hasattr(estimator, "_uses_intelex") or sklearnex.check_is_patched(estimator), I believe this would be really useful.

@BenjaminBossan this feature is now added in the latest release.

import pickle

from sklearnex import is_patched_instance

my_instance = pickle.load(open("some_file.pkl", "rb"))
is_patched_instance(my_instance)  # True if it was trained using sklearnex, False otherwise

@BenjaminBossan (Collaborator) commented:

Nice addition, thx for letting us know.
