Detection of linear patterns and decoupling of concerns #37

Open
FlorianWetschoreck opened this issue Aug 31, 2020 · 0 comments
When the PPS is applied to linear relationships with the same error but different slopes, the score varies a lot, e.g. from 0.1 to 0.7 depending on the slope.

This might not be the behaviour we intuitively expect, and normalizing the target does not help.
The reason for this is that the ppscore compares the model's error to the error of a naive baseline. If the slope is steep, the baseline makes much larger errors than the model, so the score is high. If the slope is flat, the two errors are nearly the same, so the score is low.
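The effect can be reproduced without the library. The helper below mimics the regression normalization (1 - MAE of the model divided by MAE of a naive median predictor); `pps_like` is a hypothetical name and a simplification (the real ppscore fits a cross-validated decision tree), but even a perfect model shows the slope dependence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 100_000)
error = rng.uniform(-0.5, 0.5, 100_000)

def pps_like(y, y_pred):
    # PPS-style normalization for regression (a sketch, not the
    # library's implementation): 1 - MAE(model) / MAE(naive median)
    mae_model = np.mean(np.abs(y - y_pred))
    mae_naive = np.mean(np.abs(y - np.median(y)))
    return max(0.0, 1 - mae_model / mae_naive)

for slope in (0.3, 0.5, 1.0):
    y = slope * x + error
    # even the *true* model (predicting slope * x) scores very
    # differently per slope, because the baseline's MAE changes
    print(slope, round(pps_like(y, slope * x), 2))
```

The model's MAE is the same (the noise) in every case; only the baseline's MAE grows with the slope, which is exactly the coupling described above.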

The underlying problem is that the current metric and calculation of the ppscore couples two questions:

  1. Is there a valid pattern? (e.g. statistical significance, or predictive power after cross-validation)
  2. Is the variance of the pattern low compared to the baseline variance?

If either of those two criteria fails or is weak, the ppscore will be low, too.
Only if both hold will the ppscore be high.

The problem with the linear cases is that the pattern is valid BUT its variance is not low, because there is a lot of noise relative to the signal, even when the pattern is statistically significant (a high error-to-signal ratio).
For this scenario (and maybe others, too), we might want to find a calculation that decouples those two concerns.

Some rough code:

import pandas as pd
import numpy as np

import ppscore as pps

df = pd.DataFrame()
df["x"] = np.random.uniform(-2, 2, 1_000_000)
df["error"] = np.random.uniform(-0.5, 0.5, 1_000_000)
df["y"] = df["x"] * df["x"] + df["error"]

# same noise, different slopes -> very different scores
df["0.3_linear_x"] = 0.3 * df["x"] + df["error"]  # 0.11 pps
df["0.5_linear_x"] = 0.5 * df["x"] + df["error"]  # 0.4 pps
df["1_linear_x"] = 1 * df["x"] + df["error"]  # 0.68 pps

# normalizing the target to [0, 1] via +2 and /4 does not change the score
df["1_linear_x_norm"] = (df["1_linear_x"] + 2) / 4  # 0.68 pps, too