Add CSD #258

daletovar · 2019-06-18T04:23:59Z

This implements a generalization of CSR/CSC (#125). The format is based on GRCS/GCCS, but instead of linearizing the odd and even dimensions, we linearize arbitrarily chosen axes. Essentially, it stores a CSR matrix under the hood, with data, indices, and indptr attributes. What's nice about this is that we get 2-D CSR/CSC for free. CSR would look like this: CSD((data,indices,indptr),shape=(100,100),compressed_axes=(0,)). After allowing arbitrary axis compression, the indexing that I had written no longer works; however, single-element indexing seems to work just fine. One tricky aspect is that compressing arbitrary dimensions involves transposing axes, which changes the ordering so that the array is often not C-ordered. Consequently, changes in storage coming from reshape, resize, or change_compressed_axes involve (for now) converting to COO first. Indeed, COO is used a lot under the hood for converting to and from different formats, a little more than I'd like.
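To make the "linearize arbitrarily, then store a CSR matrix" idea concrete, here is a minimal sketch of how COO coordinates could be turned into data, indices, and indptr for a chosen set of compressed axes. The function name `coo_to_compressed` and its signature are hypothetical and are not taken from this PR; the sketch only assumes NumPy.

```python
import numpy as np


def coo_to_compressed(coords, data, shape, compressed_axes):
    """Hypothetical helper: build (data, indices, indptr) from COO input.

    coords has shape (ndim, nnz). The compressed axes are linearized into
    a single "row" index and the remaining axes into a single "column"
    index, after which the layout is that of an ordinary CSR matrix.
    """
    coords = np.asarray(coords)
    data = np.asarray(data)
    other_axes = [a for a in range(len(shape)) if a not in compressed_axes]
    row_shape = tuple(shape[a] for a in compressed_axes)
    col_shape = tuple(shape[a] for a in other_axes)

    # Linearize each group of axes into one flat index.
    rows = np.ravel_multi_index(tuple(coords[a] for a in compressed_axes), row_shape)
    cols = np.ravel_multi_index(tuple(coords[a] for a in other_axes), col_shape)

    # Sort by row first, then by column within each row.
    order = np.lexsort((cols, rows))
    rows, cols, data = rows[order], cols[order], data[order]

    # indptr[i] is the offset of the first stored element of row i.
    n_rows = int(np.prod(row_shape, dtype=np.intp))
    indptr = np.zeros(n_rows + 1, dtype=np.intp)
    indptr[1:] = np.cumsum(np.bincount(rows, minlength=n_rows))
    return data, cols, indptr
```

With a 2-D shape and compressed_axes=(0,) this reduces to the usual CSR layout, and with compressed_axes=(1,) to CSC.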

hameerabbasi

A few high-level comments. Great work!

hameerabbasi · 2019-06-18T07:13:24Z

sparse/compressed/compressed.py

+        if compressed_axes == None:
+            compressed_axes = (np.argmin(self.shape),)
+        elif len(compressed_axes) >= len(shape):
+            raise ValueError('cannot compress all axes')


Nice check!

hameerabbasi · 2019-06-18T07:18:50Z

sparse/compressed/convert.py

+from operator import mul
+import numba
+
+def convert_prep(inds,shape,axisptr):


Anything with heavy loops like this should have the Numba decorator added. Python loops are slow; Numba makes them fast, at the expense of some flexibility, since only a subset of Python and NumPy is supported in compiled code.
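As an illustration of that suggestion, a loop-heavy helper might be decorated as below. This is only a sketch: `count_per_row` is a hypothetical function, not the PR's `convert_prep`, and the `ImportError` fallback is added here solely so the example still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback for this sketch only: run the plain Python version.
    def njit(func):
        return func


@njit
def count_per_row(rows, n_rows):
    """Hypothetical loop-heavy helper: build an indptr-style offset array."""
    counts = np.zeros(n_rows + 1, dtype=np.intp)
    # Count how many stored elements fall in each row.
    for r in rows:
        counts[r + 1] += 1
    # Turn the counts into running offsets.
    for i in range(n_rows):
        counts[i + 1] += counts[i]
    return counts
```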

hameerabbasi · 2019-06-18T07:19:51Z

sparse/coo/core.py

@@ -1704,6 +1704,9 @@ def reshape(self, shape, order='C'):

        if self.shape == shape:
            return self
+
+        if np.prod(self.shape) != np.prod(shape):


This should either be changed to use reduce, as is done for the shape, or have dtype=np.intp added.

This looks like remnants of my past PR. I'm not sure how but this commit should be deleted.

It's fine, just don't delete it. Just rewrite the line in a subsequent commit.
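The two options mentioned in this thread can be sketched as follows. The helper names are hypothetical; the point is that multiplying Python integers with `reduce` cannot overflow, while `np.prod` needs an explicit `dtype=np.intp` to avoid a narrower default integer on platforms where NumPy's default integer is 32 bits.

```python
from functools import reduce
from operator import mul

import numpy as np


def size_via_reduce(shape):
    # Python integers have arbitrary precision, so this never overflows.
    return reduce(mul, shape, 1)


def size_via_prod(shape):
    # Force a pointer-sized integer instead of the platform default.
    return int(np.prod(shape, dtype=np.intp))


def sizes_match(old_shape, new_shape):
    """Hypothetical check mirroring the reshape guard in the diff above."""
    return size_via_reduce(old_shape) == size_via_reduce(new_shape)
```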

hameerabbasi · 2019-06-18T07:46:44Z

My bad... Should've been GXCS. 🙁 A simple find/replace should do it. 😄

daletovar · 2019-06-18T08:25:57Z

Thanks for the comments. I really appreciate it. Do you have ideas about how we should do the indexing? I have some thoughts but it seemed like you were interested in working on that.

daletovar · 2019-06-25T00:10:11Z

Hi @hameerabbasi, I've been working on the indexing a little bit and I wanted to run a few things by you. Firstly, indexing generally preserves compressed dimensions. For instance, for a (10,10,10) array with compressed_axes=(1,2), indexing with [:6,0,:5] will produce an array of shape (6,5) with compressed_axes=(1,). If one indexes along only compressed axes or only non-compressed axes, then I think the result will default to compressed_axes=(0,). I'm still working on that bit. The main algorithm performs a binary search for every element in the new array. I think this is probably fine for now, but I'd like to add indexing routines that make better use of the storage scheme. As a last thought, GXCS doesn't make much sense for 1-D arrays. I'm thinking of returning a COO array if indexing produces a 1-D result. Alternatively, a 1-D GXCS array could have an empty indptr and an empty compressed_axes. I'm curious if you have thoughts on any of this.
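The rule described in this comment can be written down as a small sketch. `result_layout` is a hypothetical name, the index is assumed to be a full tuple of integers and slices, and the fallback to compressed_axes=(0,) follows the tentative default mentioned above rather than any confirmed behavior. The open question about 1-D results is not addressed here.

```python
def result_layout(shape, compressed_axes, index):
    """Hypothetical helper: shape and compressed axes after basic indexing.

    `index` must contain one int or slice per axis of `shape`.
    """
    new_shape = []
    new_compressed = []
    for axis, (size, idx) in enumerate(zip(shape, index)):
        if isinstance(idx, int):
            continue  # an integer index removes the axis
        length = len(range(*idx.indices(size)))
        if axis in compressed_axes:
            # The surviving compressed axis is renumbered to its new position.
            new_compressed.append(len(new_shape))
        new_shape.append(length)
    # Only compressed or only non-compressed axes survived: fall back to (0,).
    if not new_compressed or len(new_compressed) == len(new_shape):
        new_compressed = [0]
    return tuple(new_shape), tuple(new_compressed)
```

For the example in the comment, `result_layout((10, 10, 10), (1, 2), (slice(None, 6), 0, slice(None, 5)))` gives shape (6, 5) with compressed axes (1,).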

hameerabbasi · 2019-06-25T10:07:00Z

Ahh, yes, of course. Indexing is done as you suggest. The compressed axis is split into two parts:

a = indptr[:-1].reshape(compressed_axes_shape)
b = indptr[1:].reshape(compressed_axes_shape)

starts, ends = a[compressed_idx], b[compressed_idx]

For the uncompressed part, you can use the COO indexing almost as-is, converting to and from flat indices on the fly rather than all the time.
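A minimal sketch of the start/end lookup above, combined with the per-element binary search mentioned earlier in the thread, might look like this. The function names and the `fill_value` parameter are hypothetical, and the column indices within each compressed row are assumed to be sorted.

```python
import numpy as np


def row_bounds(indptr, compressed_axes_shape, compressed_idx):
    """Return the start and end offsets selected by an index on the compressed axes."""
    a = indptr[:-1].reshape(compressed_axes_shape)
    b = indptr[1:].reshape(compressed_axes_shape)
    return a[compressed_idx], b[compressed_idx]


def lookup(data, indices, indptr, compressed_axes_shape,
           compressed_idx, flat_col, fill_value=0):
    """Fetch one element: compressed_idx is a tuple of ints, flat_col the
    linearized index over the uncompressed axes."""
    start, end = row_bounds(indptr, compressed_axes_shape, compressed_idx)
    # Binary search inside the slice that belongs to this compressed row.
    pos = start + np.searchsorted(indices[start:end], flat_col)
    if pos < end and indices[pos] == flat_col:
        return data[pos]
    return fill_value
```

Because `a` and `b` have the shape of the compressed axes, slices and integers on those axes can be applied to them with ordinary NumPy indexing.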

hameerabbasi · 2019-06-25T13:33:34Z

Also, I apologize for the unresponsiveness. I'll do my best to be more responsive! Feel free to ask for a meeting if you have further issues/questions!

hameerabbasi · 2019-08-04T06:58:42Z

This pull request introduces 1 alert when merging 21817d5 into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-04T08:30:37Z

This pull request introduces 1 alert when merging 366a8ed into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-08T09:12:51Z

This pull request introduces 1 alert when merging 78752f9 into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-08T21:16:56Z

This pull request introduces 1 alert when merging 2445e1e into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-09T07:37:19Z

This pull request introduces 1 alert when merging 4b523b9 into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-09T09:05:02Z

This pull request introduces 1 alert when merging 5edf2be into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-09T21:00:51Z

This pull request introduces 1 alert when merging 02a8589 into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-10T07:54:01Z

This pull request introduces 1 alert when merging 7b9dccf into 939f3ff - view on LGTM.com

new alerts:

1 for __eq__ not overridden when adding attributes

Comment posted by LGTM.com

hameerabbasi · 2019-08-11T18:44:30Z

Thanks, @daletovar

daletovar and others added 18 commits April 18, 2019 22:23

Update core.py

9ea4053

Update core.py

26d8a1f

Add files via upload

4033683

Update __init__.py

e7d8141

test

dfcd83d

fix imports

7ce64f1

Update convert.py

5f3f4e9

Add files via upload

b903b22

Delete csr_indexing.py

0a37664

Delete csc.py

b722d73

Delete csc_indexing.py

92bd1fd

Delete csr.py

147ec5c

Update indexing.py

8375ae6

Update compressed.py

bcc4c7d

Update compressed.py

a4fe539

Update indexing.py

d4b323b

Add files via upload

7caa68c

Add files via upload

02f3af9

hameerabbasi reviewed Jun 18, 2019

daletovar and others added 2 commits June 18, 2019 00:33

Update sparse/compressed/compressed.py

8a1ef1d

Co-Authored-By: Hameer Abbasi <[email protected]>

change CSD to GCXS

d127dae

daletovar added 2 commits June 18, 2019 01:17

Update compressed.py

388c76b

change to GXCS

55ef6c0

daletovar added 2 commits June 21, 2019 13:12

Delete Untitled.ipynb

5515274

Update compressed.py

e5ee15d

Update indexing.py

366a8ed

add indexing with None

78752f9

daletovar added 2 commits August 8, 2019 13:31

fix indexing for compressed axes

c11ed49

use numba.typed.List

2445e1e

daletovar added 3 commits August 8, 2019 23:54

Update convert.py

c30db2e

Update compressed.py

20216a7

Update indexing.py

4b523b9

daletovar added 5 commits August 9, 2019 00:58

Update test_compressed.py

c63ebfa

Update utils.py

8bb2b56

Update convert.py

f39c3ea

Update core.py

d9d9614

Update convert.py

5edf2be

Update convert.py

02a8589

daletovar added 4 commits August 10, 2019 00:03

Update convert.py

3792523

Update indexing.py

bb54ec1

Update indexing.py

2f4b538

Update convert.py

7b9dccf

hameerabbasi merged commit 8a80598 into pydata:master Aug 10, 2019

rok mentioned this pull request Nov 25, 2019

ARROW-4226: [C++] Add sparse CSF tensor support apache/arrow#5716

Closed

rgommers mentioned this pull request Mar 9, 2020

Sparse ndarray class scipy/scipy#8162

Closed

luk-f-a mentioned this pull request Jun 27, 2020

Add CSD #125

Open

7 tasks
