infinite values PSI #115

gverbock · 2021-03-26T12:54:03Z

Problem Description
In the current implementation of AutoDist, the PSI is infinite if one of the bins from the test dataset is empty. It would be nice to have the option to add one 'fake' observation to each empty bin and get an estimate of the PSI that is closer to reality.
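A minimal sketch of the problem, using plain NumPy rather than AutoDist itself. The bin counts are made up for illustration, and the formula is the one quoted in the solution outline of this issue:

```python
import numpy as np

# Hypothetical bin counts: the last bin of the test sample (d2) is empty.
d1 = np.array([10, 20, 30])
d2 = np.array([15, 45, 0])

expected_ratio = d1 / d1.sum()
actual_ratio = d2 / d2.sum()

# log(0) evaluates to -inf; silence NumPy's divide-by-zero warning for the demo.
with np.errstate(divide="ignore"):
    psi_value = np.sum(
        (actual_ratio - expected_ratio) * np.log(actual_ratio / expected_ratio)
    )

print(psi_value)  # inf
```

The empty bin contributes `(0 - 0.5) * log(0)`, which is `+inf`, so the whole sum is infinite regardless of the other bins.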

Desired Outcome

Solution Outline

class DistributionStatistics(object):

    def compute(self, d1, d2, avoid_empty_bins, verbose=False):

        if self.binning_strategy:
            self.binner.fit(d1)
            d1_preprocessed = self.binner.counts_
            d2_preprocessed = self.binner.compute(d2)
            if avoid_empty_bins:
                # Put one 'fake' observation in every empty d2 bin
                d2_preprocessed = np.where(d2_preprocessed == 0, 1, d2_preprocessed)
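The same idea can be tried outside the library. The sketch below is not the library's binner: `np.histogram` stands in for the fit/compute steps, and `bin_counts_sketch` and `n_bins` are hypothetical names used only for illustration.

```python
import numpy as np

def bin_counts_sketch(d1, d2, n_bins=4, avoid_empty_bins=False):
    """Bin d1 and d2 on edges fitted to d1, optionally filling empty d2 bins."""
    # Fit equal-width bin edges on d1 (stand-in for the binner's fit step).
    d1_counts, edges = np.histogram(d1, bins=n_bins)
    # Count d2 in the same bins; d2 values outside the d1 range are dropped here.
    d2_counts, _ = np.histogram(d2, bins=edges)
    if avoid_empty_bins:
        # Put one 'fake' observation in every empty d2 bin.
        d2_counts = np.where(d2_counts == 0, 1, d2_counts)
    return d1_counts, d2_counts

d1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
d2 = np.array([0.5, 1.5, 2.5, 3.0])

_, raw_counts = bin_counts_sketch(d1, d2)
_, filled_counts = bin_counts_sketch(d1, d2, avoid_empty_bins=True)
print(raw_counts.tolist())     # [2, 2, 0, 0]
print(filled_counts.tolist())  # [2, 2, 1, 1]
```

Filling at the binning stage means every statistic computed afterwards sees the adjusted counts, not only the PSI.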

Another possibility is to adjust the PSI function itself:

def psi(d1, d2, avoid_empty_bins, verbose=False):

    # Optionally correct for empty bins in d2
    previous_count = d2.sum()
    if avoid_empty_bins:
        d2 = np.where(d2 == 0, 1, d2)

    # Calculate the number of samples in each distribution
    n = d1.sum()
    m = d2.sum()
    if previous_count != m:
        # The correction changed the counts, so tell the user (needs `import warnings`)
        warnings.warn("Empty bins in d2 were filled with one fake observation each.")

    # Calculate the ratio of samples in each bin
    expected_ratio = d1 / n
    actual_ratio = d2 / m

    psi_value = np.sum((actual_ratio - expected_ratio) * np.log(actual_ratio / expected_ratio))

It would also be good to emit a warning whenever avoid_empty_bins actually changes the counts.

In this dummy example it leads to a PSI of zero, but in larger datasets the impact is limited.
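The dummy example itself is not reproduced in this issue, so the counts below are hypothetical. This sketch shows one way to read the remark: with a handful of observations the fake count can erase the difference between the two samples, while with thousands of observations it is a negligible share of the test sample. `psi_filled` is an illustrative helper, not a library function.

```python
import numpy as np

def psi_filled(d1, d2):
    """PSI after putting one fake observation in each empty d2 bin."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    d2 = np.where(d2 == 0, 1, d2)
    expected_ratio = d1 / d1.sum()
    actual_ratio = d2 / d2.sum()
    return float(
        np.sum((actual_ratio - expected_ratio) * np.log(actual_ratio / expected_ratio))
    )

# Tiny sample: the fake observation makes d2 identical to d1.
small = psi_filled([1, 1, 1], [1, 1, 0])
# Larger sample: the fake observation is 1 out of 3001 test observations.
large = psi_filled([1000, 1000, 1000], [1500, 1500, 0])

print(small)  # 0.0
print(large)  # finite and clearly positive
```

In the tiny case the fake observation makes up a third of the corrected test sample, which is why the shift disappears. In the larger case the PSI is finite but still reflects that the third bin is almost empty.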

gverbock added the enhancement New feature or request label Mar 26, 2021

Matgrb mentioned this issue Mar 26, 2021

Fix issue with empty bins #116

Merged

Matgrb closed this as completed in #116 Mar 31, 2021
