The root cause is that we don't ignore nulls when sampling training data. The fact that this works for L2 and Cosine appears to largely be an accident, and may be producing incorrect values.
We will change the sampling method to only sample non-null rows.
We were not filtering out null values when sampling. Because we often
call `array.values()` on Arrow arrays, which ignores the null bitmap, we
are often silently treating the nulls as zeros (or possibly undefined
values). The only thing that caught these nulls was an assertion. However,
residualization occurring with L2 and Cosine often meant that these
values were transformed and null information was lost before the
assertion, which is why the bug got past previous unit tests.
This PR adds more assertions validating there aren't nulls, and makes
sure the sampling code handles null vectors.
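
To illustrate the failure mode described above, here is a minimal sketch in plain Python. It is not the code from this PR: `ToyVectorColumn` and `sample_non_null` are hypothetical names, and the column is a toy stand-in for an Arrow fixed-size-list array (a flat value buffer plus a validity mask). It shows how reading the raw values makes a null row look like a zero vector, and how sampling only from valid rows avoids that.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ToyVectorColumn:
    """Hypothetical stand-in for a vector column: flat values plus a validity mask."""
    values: List[float]   # flat buffer, `dim` floats per row; null rows still occupy slots
    validity: List[bool]  # True where the row is non-null
    dim: int

    def row(self, i: int) -> List[float]:
        # Reads the raw buffer only; the validity mask is NOT consulted here.
        return self.values[i * self.dim:(i + 1) * self.dim]


def sample_non_null(col: ToyVectorColumn, n: int, seed: int = 0) -> List[List[float]]:
    """Sample up to `n` training vectors, drawing only from non-null rows."""
    valid_rows = [i for i, ok in enumerate(col.validity) if ok]
    rng = random.Random(seed)
    picked = sorted(rng.sample(valid_rows, min(n, len(valid_rows))))
    # Extra guard, in the spirit of the added assertions: no null row may slip through.
    assert all(col.validity[i] for i in picked), "sampled a null vector"
    return [col.row(i) for i in picked]


col = ToyVectorColumn(
    values=[1.0, 2.0, 0.0, 0.0, 3.0, 4.0],  # row 1 is null; its slots happen to hold zeros
    validity=[True, False, True],
    dim=2,
)

# Naive read: the null row is indistinguishable from a real zero vector.
naive = [col.row(i) for i in range(len(col.validity))]

# Null-aware sampling: only rows 0 and 2 are eligible.
filtered = sample_non_null(col, n=10)
```

In this toy model the null row's slots contain zeros, but as noted above, real buffers may hold arbitrary values there, so filtering on validity before sampling is the safer choice.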
Closes #3402
Closes #3400