Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ks_abc when run with plot=False still plots the graph #147

Closed
RNarayan73 opened this issue Feb 10, 2023 · 13 comments · Fixed by #152
Closed

ks_abc when run with plot=False still plots the graph #147

RNarayan73 opened this issue Feb 10, 2023 · 13 comments · Fixed by #152
Labels
enhancement New feature or request

Comments

@RNarayan73
Copy link

RNarayan73 commented Feb 10, 2023

Version check:

Run and copy the output:

import sys, dython
print(sys.version_info)
print(dython.__version__)
sys.version_info(major=3, minor=10, micro=9, releaselevel='final', serial=0)
0.6.7

this is strange, as I checked conda list to confirm that I have 0.7.3 installed

conda list dython
# packages in environment at C:\Users\naray\Miniconda3\envs\skl_310:
#
# Name                    Version                   Build  Channel
dython                    0.7.3              pyhd8ed1ab_0    conda-forge

Describe the bug:

I would like to use the ks_abc function as a metric via make_scorer(). All I want is a scalar output for the abc value.

However, when called as a standalone function with plot=False as shown below, it still shows the plot (in addition to the abc value) in jupyter.

When called as a standalone function without plot=False argument, it shows the plot, but doesn't show the requested value from the dict.

Code to reproduce:

import dython

ks_abc(y_true, y_pred[:, CLASS_POSITIVE], plot=False)['abc']

Error message:

There is no error message. It simply draws the plot in addition to the scalar value from the output dict even when plot=False.

Input data:

@RNarayan73 RNarayan73 added the bug Something isn't working label Feb 10, 2023
@shakedzy
Copy link
Owner

I think I understand why this happens (and I assume you run this on a Jupyter notebook too). I'll work on a fix for this soon.

@shakedzy
Copy link
Owner

Ok, so I'm taking back what I wrote, this isn't a bug. This behavior occurs when you run this on Jupyter notebook, and is the result of Jupyter's behavior of displaying ax when they're provided as the final output of a block (which is the case here, as the ax is part of the output dict).

If you wish to force Jupyter not to display it, simply add a None as the last line of your cell.

If this happens outside of Jupyter notebook, let me know.

@shakedzy shakedzy added nothing to do There is nothing to do and removed bug Something isn't working labels Feb 28, 2023
@RNarayan73
Copy link
Author

@shakedzy, Thanks for your reply.

As I mentioned in my original post, I am trying to use it as a function to return a scalar for ks_abc which in turn I intend to package as a metric with make_scorer(). I will then use the metric to score and evaluate models and in hp tuning.

I can't imagine where I should add a None to prevent the ax from rendering on Jupyter.

Would it not make sense to not output ax as part of the dict at all within the function at all if plot=False? Taking it a step further, would it not be possible for the function to not even generate ax at all if it doesn't need to plot? That would make it more efficient when used to simply return the ks_abc value, wouldn't it?

I hope I have clarified my need.
In summary, while you had developed the function primarily as a plotting function, I find value in using it as a metric as per this blog https://towardsdatascience.com/the-metric-system-how-to-correctly-measure-your-model-17d3feaed6ab

I trust you will agree and advise how I can achieve this.

Regards
Narayan

@shakedzy
Copy link
Owner

shakedzy commented Mar 4, 2023

Hey Narayan, it will be great if you can post the function you developed here, so I can try to figure out what the issue is

@RNarayan73
Copy link
Author

Hi @shakedzy, here you go!

import dython

def ks_abc_score(y_true, y_prob):
    return ks_abc(y_true, y_prob, plot=False)['abc']

Later, I wrap it as below:

make_scorer(ks_abc_score, needs_proba=True)

and use it as a metric in HP tuning.

Hope this helps.

Regards
Narayan

@shakedzy
Copy link
Owner

Please provide a full script I can run

@Socvest
Copy link

Socvest commented Mar 18, 2023

This seemed to work for me:

import matplotlib.pyplot as plt
dict_Results = nominal.associations(dataframe, plot=False)
plt.close()

When plot=True the last line does not prevent the chart from displaying

Original source for solution.

Great package btw! Thanks!

@RNarayan73
Copy link
Author

Thanks for the suggestion, @Socvest. Your workaround does seem to arrest the problem and reduce run time significantly. I've included it below in the code for comparison but commented it out to highlight the initial problem.

Here you go @shakedzy !

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from dython.model_utils import ks_abc

def ks_abc_score(y_true, y_prob):
    return ks_abc(y_true, y_prob, plot=False)['abc']
#     res = ks_abc(y_true, y_prob, plot=False)
#     plt.close()
#     return res['abc']


X, y = make_classification(n_samples=100, n_features=4)

grid_search_pipe = GridSearchCV(Pipeline([('clf', SGDClassifier(loss='log_loss'))]), 
                                param_grid={'clf__alpha': [10e-02, 10e-01]}, 
                                scoring=make_scorer(ks_abc_score, needs_proba=True), 
                                verbose=1
                               )
grid_search_pipe.fit(X, y)

y_prb = grid_search_pipe.predict_proba(X)
ks_abc_score(y, y_prb[:, 1])

Note that for each of the 10 fits in the example above, it generates a ks_abc chart in jupyter, which takes up significant memory and processing time. The call to ks_abc_score function adds another chart!
@Socvest's workaround does cut down on the run time and presumably memory, but makes the function to calculate a score look very awkward.

Hope this helps in getting to a fix.

Regards
Narayan

@shakedzy
Copy link
Owner

shakedzy commented May 6, 2023

hey @RNarayan73 - thanks for the script. I just ran it on Jupyter Lab, and indeed got several plots. I ran the exact same thing from a Python console and a as Python script, and got no plots at all. This again concludes that the issue here is not ks_abc or dython, but how Jupyter works with matplotlib plots under the hood. So unfortunately, I will not make a fix for this, as there's nothing broken, and I too recommend using plt.close() to fix this in Jupyter.

@RNarayan73
Copy link
Author

Hi @shakedzy thanks for looking into this.
My suggestion is,
Would it not make sense to only generate and output ax if plot=True?
Would it not be possible for the function to not even generate ax at all if it doesn't need to plot? That would make it more efficient when used to simply return the ks_abc value, wouldn't it?

@shakedzy
Copy link
Owner

shakedzy commented May 8, 2023

hey @RNarayan73 - the point of plot=False is to not plot as a side-effect, but to still create the ax and return it, either to manipulate it or to plot it later. So plot=False does what it should.

The point here is that the issue is due to the framework used, not he library. BUT. - I did thought of a way to fix this. Keep following!

@shakedzy shakedzy reopened this May 8, 2023
@shakedzy shakedzy added enhancement New feature or request and removed nothing to do There is nothing to do labels May 8, 2023
shakedzy added a commit that referenced this issue May 8, 2023
@shakedzy shakedzy linked a pull request May 8, 2023 that will close this issue
@shakedzy
Copy link
Owner

shakedzy commented May 8, 2023

hey @RNarayan73 - install from source from this PR and let me know if it works.

verify you installed correctly by running:

import dython
print(dython.__version__)

This should yield 0.7.4.dev

@RNarayan73
Copy link
Author

Excellent, @shakedzy. It works fine now.

However, installing it via pip forced an update to numpy 1.24.
I had originally installed the conda version which didn't require a numpy udpate and I hope the new version through conda won't force one either.

Regards,
Narayan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants