-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecation warning for lists passed into DMatrix #3970
Conversation
The documentation does not include lists as an allowed type for the data inputted into DMatrix. Despite this, a list can be passed in without an error. This change would prevent a list form being passed in directly. I experienced an issue where passing in a list vs a np.array resulted in different predictions (sometimes over 10% relative difference) for the same data. Though these differences were infrequent (~1.5% of cases tested), in certain applications this could cause serious issues.
Internally the list is converted to xgboost/python-package/xgboost/core.py Lines 400 to 403 in ce1bbf4
@scwilkinson @hcho3 WDYT? |
@trivialfis As far as I can tell, lists are the only problem datatype that can currently get through DMatrix. It seems then like the simplest way to avoid any issues they cause would be to specifically catch them. |
@scwilkinson Make sense to me. :) But this breaks a test from distributed tests. So I think maybe it's more appropriate to display a deprecation warning for now and remove it completely after the next release. |
@trivialfis That sounds like a good idea! Also handy for people who may be used to passing in lists, so they don't get surprised by their code suddenly throwing TypeErrors. I'd appreciate the help! |
@scwilkinson It doesn't seem I have the permission to push to your branch. Here is a small patch I made, thanks. From c5e1f558b1bc8fb2661d3c38039f376eaef8a1f9 Mon Sep 17 00:00:00 2001
From: fis <[email protected]>
Date: Mon, 10 Dec 2018 01:35:39 +1300
Subject: [PATCH] Use warning instead.
---
python-package/xgboost/core.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/python-package/xgboost/core.py b/python-package/xgboost/core.py
index 225fe756..748a3da0 100644
--- a/python-package/xgboost/core.py
+++ b/python-package/xgboost/core.py
@@ -14,6 +14,7 @@ import ctypes
import os
import re
import sys
+import warnings
import numpy as np
import scipy.sparse
@@ -382,7 +383,8 @@ class DMatrix(object):
weight = _maybe_dt_array(weight)
if isinstance(data, list):
- raise TypeError('can not initialize DMatrix from list')
+ warnings.warn('Initializing DMatrix from List is deprecated.',
+ DeprecationWarning)
elif isinstance(data, STRING_TYPES):
self.handle = ctypes.c_void_p()
_check_call(_LIB.XGDMatrixCreateFromFile(c_str(data),
--
2.19.1
|
@trivialfis I've added a change similar to yours, but it continues with initialisation after raising the warning. Hopefully this will fix the tests! |
Codecov Report
@@ Coverage Diff @@
## master #3970 +/- ##
===========================================
+ Coverage 56.35% 56.66% +0.3%
Complexity 205 205
===========================================
Files 186 187 +1
Lines 14775 14826 +51
Branches 498 498
===========================================
+ Hits 8327 8401 +74
+ Misses 6209 6186 -23
Partials 239 239
Continue to review full report at Codecov.
|
Seems Travis hangs. Just trying to trigger a rerun. |
@scwilkinson Thanks for the good contribution ! |
@trivialfis Thank you for the help! |
The documentation does not include lists as an allowed type for the data inputted into DMatrix. Despite this, a list can be passed in without an error. This change would prevent a list form being passed in directly.
I experienced an issue where passing in a list vs a np.array resulted in different predictions (sometimes over 10% relative difference) for the same data. Though these differences were infrequent (~1.5% of cases tested), in certain applications this could cause serious issues.