different baseline scores for the same y #73

Open
LiuYYSS opened this issue Apr 29, 2023 · 1 comment


LiuYYSS commented Apr 29, 2023

Hi, if my understanding is correct, the calculation of the baseline score does not involve X. For the same Y, the baseline score should therefore be identical regardless of which X is used. However, I ran into a strange result when using pps.matrix(): as shown in the attached image, three different baseline scores appear for Y=5. I have uploaded a pickle dump of my pandas DataFrame. Could you test it and tell me whether you can reproduce this?

image

dataframe.zip

import ppscore as pps
pps.matrix(dataset_df)

Versions: ppscore=1.30, pandas=1.5.3

@alessandro-lica-DO

Hi, this may be caused by null values in your dataset. I think the baseline model only uses the rows of the dataset where x is defined, so X columns with different missing-value patterns would give different baselines for the same y.
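
To illustrate the idea, here is a minimal sketch in plain Python rather than ppscore itself. It assumes a naive regression baseline (mean absolute error of always predicting the median of y), computed only on rows where both x and y are present. The function name `naive_baseline_mae` and the toy data are hypothetical, and the actual ppscore internals may differ.

```python
from statistics import median


def naive_baseline_mae(rows, x, y):
    """MAE of always predicting the median of y.

    Assumption: rows where x or y is missing are dropped first,
    so the baseline depends on which x column is paired with y.
    """
    kept = [r[y] for r in rows if r[x] is not None and r[y] is not None]
    m = median(kept)
    return sum(abs(v - m) for v in kept) / len(kept)


# Toy data: "x1" has no missing values, "x2" is missing in two rows.
rows = [
    {"x1": 1.0, "x2": 1.0, "y": 1.0},
    {"x1": 2.0, "x2": None, "y": 2.0},
    {"x1": 3.0, "x2": 3.0, "y": 3.0},
    {"x1": 4.0, "x2": None, "y": 10.0},
]

baseline_x1 = naive_baseline_mae(rows, "x1", "y")  # uses all 4 rows
baseline_x2 = naive_baseline_mae(rows, "x2", "y")  # uses only 2 rows

print(baseline_x1)  # 2.5
print(baseline_x2)  # 1.0
```

The y column is the same in both calls, but the baseline changes because a different subset of rows survives the null filter. If this is what happens in your data, counting the nulls per column (for example with `dataset_df.isna().sum()`) should show different counts for the X columns that produce different baselines.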
