Problem Description
In the current implementation of AutoDist, the PSI is infinite if one of the bins from the test dataset is empty. It would be nice to have the option to add one 'fake' observation to the empty bin and get an estimate of the PSI that is closer to reality.
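To make the problem concrete, here is a minimal sketch of a plain PSI computation on binned counts, using the usual formula sum((actual - expected) * log(actual / expected)). The helper name `psi_from_counts` and the example counts are hypothetical and are not taken from the AutoDist code; they only illustrate why an empty test bin gives an infinite value and how one 'fake' observation makes it finite.

```python
import numpy as np


def psi_from_counts(d1, d2):
    """Plain PSI on two arrays of bin counts (hypothetical helper, no empty-bin handling)."""
    expected_ratio = d1 / d1.sum()
    actual_ratio = d2 / d2.sum()
    # An empty bin in d2 gives actual_ratio == 0, so log(0) evaluates to -inf.
    # The divide warning is silenced here only to keep the demo output clean.
    with np.errstate(divide="ignore"):
        return np.sum((actual_ratio - expected_ratio) * np.log(actual_ratio / expected_ratio))


d1 = np.array([30, 40, 30])  # counts per bin in the reference data (made up)
d2 = np.array([35, 0, 65])   # counts per bin in the test data; the middle bin is empty

psi_empty = psi_from_counts(d1, d2)                           # infinite
psi_filled = psi_from_counts(d1, np.where(d2 == 0, 1, d2))    # finite

print(psi_empty, psi_filled)
```

The empty bin contributes (0 - 0.4) * log(0), which is positive infinity, so the whole sum is infinite regardless of how similar the other bins are.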
Another possibility is to adjust the PSI function itself:
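    import warnings

    import numpy as np


    def psi(d1, d2, verbose=False, avoid_empty_bins=False):
        # Optionally correct for empty bins in d2
        previous_count = d2.sum()
        if avoid_empty_bins:
            d2 = np.where(d2 == 0, 1, d2)
        # Calculate the number of samples in each distribution
        n = d1.sum()
        m = d2.sum()
        if previous_count != m:
            # The correction changed the counts, so tell the user
            warnings.warn("avoid_empty_bins added fake observations to empty bins in d2")
        # Calculate the ratio of samples in each bin
        expected_ratio = d1 / n
        actual_ratio = d2 / m
        psi_value = np.sum((actual_ratio - expected_ratio) * np.log(actual_ratio / expected_ratio))
        return psi_value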
It would also be good to emit a warning whenever avoid_empty_bins actually changes the counts.
In this dummy example it leads to a PSI of zero, but in larger datasets the impact is limited.
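One way to read "the impact is limited" is that a single added observation barely shifts the proportions of the other bins once the sample is large. The sketch below checks that reading on made-up counts. The helper `max_ratio_shift` is hypothetical and only measures how far the non-empty bins' shares move after filling the empty bins; it is not part of AutoDist.

```python
import numpy as np


def max_ratio_shift(d2):
    """Largest change in any non-empty bin's share after adding one
    observation to each empty bin (hypothetical helper)."""
    filled = np.where(d2 == 0, 1, d2)
    before = d2 / d2.sum()
    after = filled / filled.sum()
    nonempty = d2 > 0
    return np.max(np.abs(after[nonempty] - before[nonempty]))


small = np.array([5, 0, 5])   # tiny test sample with one empty bin (made up)
large = small * 1000          # same shape, 1000x more observations

shift_small = max_ratio_shift(small)
shift_large = max_ratio_shift(large)

print(shift_small, shift_large)
```

With ten observations, each non-empty bin's share drops from 5/10 to 5/11. With ten thousand, it drops from 5000/10000 to 5000/10001, a much smaller change.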