Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CSD #258

Merged
merged 72 commits into from
Aug 10, 2019
Merged

Add CSD #258

merged 72 commits into from
Aug 10, 2019

Conversation

daletovar
Copy link
Contributor

This is to implement a generalization of csr/csc (#125). The format is based on GRCS/GCCS but instead of linearizing the odd and even dimensions, we linearize arbitrarily. Essentially it stores a csr matrix under-the-hood with the data, indices, and indptr attributes. What's nice about this is that we get 2d csr/csc for free. csr would look like this: CSD((data,indices,indptr),shape=(100,100),compressed_axes=(0,)). After allowing arbitrary axis compression the indexing that I had written no longer works. However, single element indexing seems to work just fine. One tricky aspect is that compressing arbitrary dimensions, which involves transposing axes, changes the ordering so that the array is often not C-ordered. Consequently, changes in storage, coming from reshape, resize, or change_compressed_axes (for now) involves converting to COO first. Indeed, COO is used a lot under-the-hood for converting to-and-from different formats. A little more than I'd like.

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few high-level comments. Great work!

sparse/compressed/compressed.py Outdated Show resolved Hide resolved
if compressed_axes == None:
compressed_axes = (np.argmin(self.shape),)
elif len(compressed_axes) >= len(shape):
raise ValueError('cannot compress all axes')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice check!

sparse/compressed/compressed.py Outdated Show resolved Hide resolved
sparse/compressed/compressed.py Outdated Show resolved Hide resolved
sparse/compressed/compressed.py Outdated Show resolved Hide resolved
from operator import mul
import numba

def convert_prep(inds,shape,axisptr):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything with heavy loops like this should have the numba decorator added. Python is slow, Numba makes it fast, at the expense of some stuff.

@@ -1704,6 +1704,9 @@ def reshape(self, shape, order='C'):

if self.shape == shape:
return self

if np.prod(self.shape) != np.prod(shape):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should either be changed to reduce as in the shape or dtype=np.intp added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like remnants of my past PR. I'm not sure how but this commit should be deleted.

Copy link
Collaborator

@hameerabbasi hameerabbasi Jun 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine, just don't delete it. Just rewrite the line in a subsequent commit.

@hameerabbasi
Copy link
Collaborator

My bad... Should've been GXCS. 🙁 A simple find/replace should do it. 😄

@daletovar
Copy link
Contributor Author

Thanks for the comments. I really appreciate it. Do you have ideas about how we should do the indexing? I have some thoughts but it seemed like you were interested in working on that.

@daletovar
Copy link
Contributor Author

Hi @hameerabbasi, I've been working on the indexing a little bit and I wanted to run a few things by you. Firstly, indexing generally preserves compressed dimensions. For instance, for an (10,10,10) array with compressed_axes=(1,2), indexing with [:6,0,:5) will produce an array of shape (6,5) with compressed_axes=(1,). If one indexes along only compressed axes or only non-compressed axes then I think the result will default to compressed_axes=(0,). I'm still working on that bit. The main algorithm performs a binary search for every element in the new array. I think this is probably fine for now but I'd like to add indexing routines that make better use of the storage scheme. As a last thought, GXCS doesn't make much sense for 1d arrays. I'm thinking of returning a COO array if indexing into 1d. Alternatively, a 1d GXCS array could have an empty indptr and an empty compressed_axes. I'm curious if you have thoughts on any of this.

@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Jun 25, 2019

Ahh, yes, of course. Indexing is done as you suggest. The compressed axis is split into two parts:

a = indptr[:-1].reshape(compressed_axes_shape)
b = indptr[1:].reshape(compressed_axes_shape)

starts, ends = a[compressed_idx], b[compressed_idx]

For the uncompressed part, you can use the COO indexing almost as-is, with just converting to- and from- flat indices on the fly instead of all the time.

@hameerabbasi
Copy link
Collaborator

Also, I apologize for the unresponsiveness. I'll do my best to be more responsive! Feel free to ask for a meeting if you have further issues/questions!

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 21817d5 into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 366a8ed into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 78752f9 into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 2445e1e into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 4b523b9 into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 5edf2be into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 02a8589 into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi
Copy link
Collaborator

This pull request introduces 1 alert when merging 7b9dccf into 939f3ff - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@hameerabbasi hameerabbasi merged commit 8a80598 into pydata:master Aug 10, 2019
@hameerabbasi
Copy link
Collaborator

Thanks, @daletovar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants