DMatrix handling of one-hot labels (Python) #10095
Comments
At the moment, the DMatrix supports consuming 2-D labels but doesn't support returning them. We implemented the receiving part for DMatrix so that we can get basic multi-target/label training to work; the returning part is still a work in progress as we are trying to improve the support for custom objectives with multi-target/label.
I see. In the meantime, suppose I needed to calculate a custom metric or objective on predicted values vs. DMatrix.get_label(). Should I simply reference the original dataset instead of pulling the labels out of DMatrix as a workaround? Any idea when the returning functionality will become available?
Unfortunately, it's not a high priority at the moment.
If you can use the original dataset, use it, whether or not DMatrix supports returning 2-D labels; one less conversion is more efficient. That said, the returned label is a row-major matrix (this is internal knowledge rather than something we want to document), so you can reshape the NumPy array accordingly.
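A minimal sketch of the reshape described above, using NumPy only. The flat array here is a stand-in for what DMatrix.get_label() is reported to return in this thread; the names `y`, `flat_labels`, and `restored` are illustrative, and the row-major layout is the undocumented internal detail mentioned above, so it may change between releases.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 100, 5

# One-hot labels with the shape from the report: [100, 5].
class_ids = rng.integers(0, n_classes, size=n_samples)
y = np.eye(n_classes, dtype=np.float32)[class_ids]

# Assumption: stand-in for DMatrix.get_label(), i.e. the labels
# flattened in row-major (C) order into shape [100 * 5,].
flat_labels = y.ravel(order="C")

# Recover the 2-D layout; NumPy reshapes in row-major order by default.
restored = flat_labels.reshape(n_samples, n_classes)

print(flat_labels.shape, restored.shape)
```

With a real DMatrix, the same idea would presumably be `dtrain.get_label().reshape(dtrain.num_row(), -1)`, though keeping a reference to the original `y` avoids depending on the internal layout at all.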
Closing in favor of #9043
It appears that DMatrix does not handle one-hot encoded labels appropriately.
Symptom:
DMatrix flattens one-hot encoded labels into a 1-D array of shape [samples * classes,] instead of preserving the original 2-D shape.
Example:
Explanation of output:
The original y variable in the above example has shape [100, 5] (i.e., 100 samples one-hot encoded over five classes).
However, if I try to extract labels from the DMatrix, it has been reshaped to [100 * 5, ].
Question:
Is this working as intended? Does DMatrix not support one-hot encoded labels?
Additional Notes:
There does not appear to be any parameter to fix this issue in the official docs.