ppscore interpretation #39
Hi Fernando, thank you for posting the question.
Hi Fernando, I gave the question quite some thought, and for now I would like to reply with the following. First, the technical interpretation of the PPS is:
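For reference, the ppscore README describes the score as a model score normalized against a naive baseline. The sketch below assumes that normalization. For regression, the model's MAE is compared with the MAE of a baseline that always predicts the median. For classification, the model's weighted F1 is compared with a naive baseline's F1. Anything worse than the baseline is clipped to 0. The function names are hypothetical helpers, not ppscore's public API, and the zero-baseline guards are an assumption of this sketch.

```python
def normalized_mae_score(model_mae: float, naive_mae: float) -> float:
    """Regression case: 0 if the model is no better than the naive baseline,
    otherwise the fraction of the baseline's MAE that the model removes."""
    # Assumption: a perfect naive baseline (constant target) leaves nothing to predict.
    if naive_mae == 0 or model_mae > naive_mae:
        return 0.0
    return 1.0 - model_mae / naive_mae


def normalized_f1_score(model_f1: float, naive_f1: float) -> float:
    """Classification case: rescale F1 so the naive baseline maps to 0
    and a perfect F1 of 1.0 maps to 1."""
    # Assumption: a naive F1 of 1.0 leaves no room for improvement.
    if naive_f1 >= 1.0 or model_f1 < naive_f1:
        return 0.0
    return (model_f1 - naive_f1) / (1.0 - naive_f1)


print(normalized_mae_score(5.0, 20.0))  # 0.75: model removes 75% of the baseline error
print(normalized_mae_score(10.0, 5.0))  # 0.0: worse than the baseline
print(normalized_f1_score(0.75, 0.5))   # 0.5: halfway between baseline and perfect
```

Read this way, a PPS of 0 means "no better than the naive guess", not necessarily "no relationship". A PPS of 1 means perfect prediction.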
The interpretation depends on the context. Nevertheless, there are some levels that are often helpful in everyday use:
Based on those levels, it is often important to check the PPS for multiple columns and then interpret the scores together. What do you think about this explanation? Do you have specific scenarios, use cases or questions that you want the PPS to answer?
Hi @FlorianWetschoreck, thanks for your attention and commitment. I liked your point about interpreting in context; it makes sense. The ranges you described also give me some insight, so I can code a way to automate a ppsThreshold (keep reading to understand what I mean). Specifically, I'm using ppscore as part of an approach to detect relationships (linear, nonlinear, quadratic, trigonometric, logarithmic, exponential, etc.) among variables. I think this needs a multifaceted approach, since relationships can be asymmetric and take different shapes (linear, nonlinear, etc.). I'm considering combining ppscore, Spearman correlation and MIC.
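A toy example can show why relationships may be asymmetric and non-monotonic. The sketch below is pure Python and deliberately simplified. Spearman correlation is computed as the Pearson correlation of average ranks. `lookup_predictive_score` is a hypothetical, in-sample stand-in for a PPS-style score: it predicts the target as the mean target value for each distinct feature value and normalizes MAE against a median baseline. It is not ppscore's cross-validated decision tree. MIC is not covered here.

```python
from statistics import mean, median


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of average ranks (symmetric in a, b)."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    return cov / (var_a * var_b) ** 0.5


def lookup_predictive_score(feature, target):
    """Hypothetical, simplified PPS-like score (in-sample, no cross-validation)."""
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    group_mean = {f: mean(ts) for f, ts in groups.items()}
    model_mae = mean(abs(t - group_mean[f]) for f, t in zip(feature, target))
    naive = median(target)  # naive baseline: always predict the median
    naive_mae = mean(abs(t - naive) for t in target)
    if naive_mae == 0 or model_mae > naive_mae:
        return 0.0
    return 1.0 - model_mae / naive_mae


x = list(range(-10, 11))
y = [v * v for v in x]

print(round(spearman(x, y), 6))      # 0.0: no monotonic relationship
print(lookup_predictive_score(x, y))  # 1.0: x fully determines y
print(lookup_predictive_score(y, x))  # 0.0: y cannot recover the sign of x
```

Spearman correlation is symmetric and misses the parabola entirely. The predictive score is directional: x predicts y perfectly, but y says nothing useful about x. This is the asymmetry a combined approach would need to capture.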
So ultimately I would be able to conclude something like:
Based on your personal and professional experience, does the "relationship detection" approach I described make sense to you? Are you aware of any (preferably open-source) Python package that does this? Regards, Fernando
Hi Fernando, About your solution approach:
Hi @FlorianWetschoreck, @tkrabel, @SuryaThiru
Quick question: how do I properly interpret ppscore?
Say you have a dataset of 3000 rows x 30 columns; you apply pps.matrix() and then sort the values by ppscore. Is there a "rule of thumb" or rational guideline for categorizing ppscore levels?
Like the following:
Note: the ranges and categories I gave are totally arbitrary
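A categorization like this can be automated once cut points are chosen. The sketch below assumes the long-format output of recent ppscore versions, where each row of `pps.matrix(df)` has `x`, `y` and `ppscore` columns. The rows can be turned into plain dicts with `pps.matrix(df)[["x", "y", "ppscore"]].to_dict("records")`. The thresholds and labels are hypothetical and just as arbitrary as the ones above; they are not ppscore guidance.

```python
from bisect import bisect_right

# Hypothetical, arbitrary cut points and labels (not official ppscore guidance).
THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
LABELS = ["very weak", "weak", "moderate", "strong", "very strong"]


def label_score(score: float) -> str:
    """Map a PPS in [0, 1] to a label; a score equal to a cut point goes to the higher bucket."""
    return LABELS[bisect_right(THRESHOLDS, score)]


def categorize(records):
    """Drop self-pairs, sort by descending ppscore, and attach a level label.

    `records` is assumed to be a list of dicts shaped like pps.matrix rows,
    e.g. pps.matrix(df)[["x", "y", "ppscore"]].to_dict("records").
    """
    # A column predicting itself is trivial, so skip the x == y diagonal.
    pairs = [r for r in records if r["x"] != r["y"]]
    ranked = sorted(pairs, key=lambda r: r["ppscore"], reverse=True)
    return [{**r, "level": label_score(r["ppscore"])} for r in ranked]


# Toy rows with made-up scores, only to show the output shape.
rows = [
    {"x": "a", "y": "b", "ppscore": 0.15},
    {"x": "c", "y": "b", "ppscore": 0.72},
    {"x": "b", "y": "a", "ppscore": 0.40},
    {"x": "a", "y": "a", "ppscore": 1.0},
]
for row in categorize(rows):
    print(row["x"], "->", row["y"], row["ppscore"], row["level"])
```

Keeping the cut points in one list makes it easy to tune them per dataset or per use case. This matches the point in the thread that the interpretation depends on context.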
I read this article, https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598, but couldn't find an answer there.
Thanks a million, Fernando