Change Dataset interface to support sparse matrix. #165

eugene-yang · 2019-06-30T19:42:58Z

Changed the libact.base.dataset.Dataset interface to support a sparse matrix as X.
The interfaces of get_entries, get_labeled_entries, and get_unlabeled_entries have changed.
Most callers of these methods take the list of tuples and zip it back into a feature matrix and a list of labels, so changing the interface to return the data in that format directly benefits both using and storing the data.

This also directly supports scipy.sparse.csr_matrix, since the zipping during initialization is removed.
Dataset.data[] is still available, implemented via the __getitem__ magic method, to support the use cases that involve direct access to individual entries.
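
A minimal sketch of the interface described above, for readers who want to see the shape of the return values. SketchDataset is a hypothetical stand-in, not libact's actual code: the attribute names _X and _y and get_labeled_mask come from the diff in this PR, while the (indices, features) return value of get_unlabeled_entries and the data property are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp


class SketchDataset:
    """Hypothetical stand-in for the Dataset described in this PR."""

    def __init__(self, X, y):
        # X is kept as given (dense array or CSR matrix); nothing is zipped.
        self._X = X
        # None marks an unlabeled entry, so labels live in an object array.
        self._y = np.empty(len(y), dtype=object)
        for i, label in enumerate(y):
            self._y[i] = label

    def get_labeled_mask(self):
        return np.array([label is not None for label in self._y], dtype=bool)

    def get_entries(self):
        return self._X, self._y.tolist()

    def get_labeled_entries(self):
        mask = self.get_labeled_mask()
        return self._X[mask], self._y[mask].tolist()

    def get_unlabeled_entries(self):
        # Assumption: return the row indices together with the features.
        idx = np.where(~self.get_labeled_mask())[0]
        return idx, self._X[idx]

    @property
    def data(self):
        # Assumption: ds.data[i] is routed through __getitem__.
        return self

    def __getitem__(self, idx):
        return self._X[idx], self._y[idx]


X = sp.csr_matrix(np.array([[1, 0], [0, 2], [3, 0], [0, 0]]))
ds = SketchDataset(X, [0, None, 1, None])
X_labeled, y_labeled = ds.get_labeled_entries()  # X_labeled stays sparse
```

Because the feature matrix is only ever sliced by rows, a csr_matrix passes through unchanged and never has to be expanded into per-entry tuples.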

codecov-io · 2019-06-30T19:48:51Z

Codecov Report

Merging #165 into master will decrease coverage by <.01%.
The diff coverage is 98.52%.

@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
- Coverage   89.46%   89.46%   -0.01%     
==========================================
  Files          37       37              
  Lines        1557     1566       +9     
==========================================
+ Hits         1393     1401       +8     
- Misses        164      165       +1

Impacted Files	Coverage Δ
...ery_strategies/multiclass/hierarchical_sampling.py	`95.85% <100%> (ø)`	⬆️
...query_strategies/multilabel/binary_minimization.py	`100% <100%> (ø)`	⬆️
...ct/query_strategies/active_learning_by_learning.py	`85.71% <100%> (ø)`	⬆️
...ltilabel/cost_sensitive_reference_pair_encoding.py	`92.1% <100%> (ø)`	⬆️
libact/query_strategies/variance_reduction.py	`68.88% <100%> (-1.33%)`	⬇️
libact/labelers/ideal_labeler.py	`100% <100%> (ø)`	⬆️
libact/query_strategies/hintsvm.py	`93.02% <100%> (ø)`	⬆️
...es/multilabel/multilabel_with_auxiliary_learner.py	`89.58% <100%> (ø)`	⬆️
libact/models/multilabel/dummy_clf.py	`94.11% <100%> (ø)`	⬆️
.../multiclass/active_learning_with_cost_embedding.py	`100% <100%> (ø)`	⬆️
... and 10 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d86b7b8...776ee7b.

libact/base/dataset.py

yangarbiter · 2019-07-01T18:12:04Z

The changes look good to me.
Thanks for the contribution.
Just fix the coding style (mainly whitespace) and I think it'll be ready to merge.

eugene-yang · 2019-07-01T18:14:19Z

@yangarbiter Do you want me to fix them?
Like the whitespaces around the brackets?

yangarbiter · 2019-07-01T18:17:27Z

Yes, please fix them.
For example, lb = lbr.label( trn_ds.data[ask_id][0] ) should be lb = lbr.label(trn_ds.data[ask_id][0]).
I didn't mark all of them.
But try to comply with Google's style guide (https://google.github.io/styleguide/pyguide.html#36-whitespace).
Thanks.

yangarbiter · 2019-07-03T07:34:37Z

Last two questions and I'll merge.
Thanks for the hard work, @eugene-yang.

examples/albl_plot.py

examples/plot.py

libact/base/dataset.py

yangarbiter · 2019-07-03T07:13:59Z

libact/base/dataset.py

        """
-        return list(filter(lambda entry: entry[1] is not None, self.data))
+        return self._X[self.get_labeled_mask()], self._y[self.get_labeled_mask()].tolist()


Do we need to convert y to a list? Can we just keep it as a NumPy array?

eugene-yang · 2019-07-03

We actually need this.
For some of the multi-label code, I think they are taking advantage of the nested structure of our output. I removed it, ran the unit tests, and got a lot of failures. So I would suggest just keeping it as a list. And I don't think the performance would be affected too much.

yangarbiter · 2019-07-03

OK, I think we can merge first and I'll look into this later.

eugene-yang · 2019-07-03

Cool. I am starting to adapt libact to my own research experiments, so I might start opening more pull requests for improvements in the future :)
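
The thread does not say which list behavior the multi-label code depends on. As a hedged illustration only, the sketch below shows a few ways a nested list from .tolist() and the NumPy array it came from behave differently. Any one of these could make tests fail if the return type changed. The sample labels are made up.

```python
import numpy as np

y_arr = np.array([[1, 0, 1], [0, 1, 1]])  # made-up multi-label targets
y_list = y_arr.tolist()                   # nested Python lists

# "+" concatenates lists but adds element-wise (with broadcasting) on arrays.
appended = y_list + [[1, 1, 0]]           # three rows
summed = y_arr + [[1, 1, 0]]              # still two rows, values changed

# "==" yields one bool for lists and an array of bools for arrays.
list_equal = (y_list == [[1, 0, 1], [0, 1, 1]])
array_equal = (y_arr == [[1, 0, 1], [0, 1, 1]])


def truthiness(obj):
    """Return bool(obj), or the string 'ambiguous' if NumPy refuses."""
    try:
        return bool(obj)
    except ValueError:
        return "ambiguous"
```

Code written against the nested-list form (for example `if y:` checks or list concatenation) would need to be audited before the method could return the array directly.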

libact/query_strategies/quire.py

change dataset interface to support numpy arrays and scipy csr sparse matrix

f497ac8

update examples

12eb065

yangarbiter assigned yangarbiter and sian-chen Jun 30, 2019

yangarbiter requested review from yangarbiter and sian-chen June 30, 2019 23:59

sian-chen approved these changes Jul 1, 2019

remove redundant whitespace

37da965

eugene-yang added 3 commits July 2, 2019 10:37

update coding style

acf7136

oops, miss 2 places

3e32656

should be all...

f629d3a

yangarbiter reviewed Jul 3, 2019

whitespace

776ee7b

yangarbiter approved these changes Jul 3, 2019

yangarbiter merged commit 33975f8 into ntucllab:master Jul 3, 2019

eugene-yang mentioned this pull request Jul 3, 2019

libact can't adequately handle sparse matrices (csr_matrix) #155

Closed

eugene-yang deleted the sparse-dataset branch July 3, 2019 23:22
