-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change Dataset interface to support sparse matrix. #165
Conversation
Codecov Report
@@ Coverage Diff @@
## master #165 +/- ##
==========================================
- Coverage 89.46% 89.46% -0.01%
==========================================
Files 37 37
Lines 1557 1566 +9
==========================================
+ Hits 1393 1401 +8
- Misses 164 165 +1
Continue to review full report at Codecov.
|
The changes looks good to my. |
@yangarbiter Do you want me to fix them? |
Yes, please fix them |
Last two questions and I'll merge. |
""" | ||
return list(filter(lambda entry: entry[1] is not None, self.data)) | ||
return self._X[self.get_labeled_mask()], self._y[self.get_labeled_mask()].tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to make y into a list, can we just keep it numpy array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually need this.
For some multi-label code, I think they are taking advantage of the nested structure of out output. I removed it ran against the unit test and got back with a lot of fails. So I would suggest just keep it as a list. And I don't think the performance would not be affected too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think we can merge first and I'll look into this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I am starting to adapt libact to my own research experiments. So might start opening more full requests for improvements in the future :)
Changed the
libact.base.dataset.Dataset
interface to support sparse matrix as X.The interfaces for
get_entries
,get_labeled_entries
andget_unlabled_entries
are changed.Since most of the usage of these methods are getting the list of tuple and zip them back to the a feature matrix and list of labels, directly change the interface to output in this format would benefit both using and storing the data.
This would also directly support
scipy.sparse.csr_matrix
since the zipping during the initialization is removed.The interface of
Dataset.data[]
is still implemented via__getitem__
magic method to support some of the use case that involve direct access to the entries.