Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Refactor the Buffer class #11447

Merged
merged 40 commits into from
Aug 11, 2022

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Aug 3, 2022

Description

This PR introduces factory functions to create Buffer instances, which makes it possible to change the returned buffer type based on a configuration option in a follow-up PR.

Beside simplifying the code base a bit, this is motivated by the spilling work in #10746. We would like to introduce a new spillable Buffer class that requires minimal changes to the existing code and is only used when enabled explicitly. This way, we can introduce spilling in cuDF as an experimental feature with minimal risk to the existing code.

@shwina and I discussed the possibility to let Buffer.__new__ return different class type instances instead of using factory functions but we concluded that having Buffer() return anything other than an instance of Buffer is simply too surprising :)

Notice, this is breaking because it removes unused methods such as Buffer.copy() and Buffer.nbytes.
However, we still support creating a buffer directly by calling Buffer(obj). AFAIK, this is the only way Buffer is created outside of cuDF, which a github search seems to confirm.
This PR doesn't change the signature of Buffer.__init__() anymore.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@madsbk madsbk added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function breaking Breaking change labels Aug 3, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label Aug 3, 2022
@codecov
Copy link

codecov bot commented Aug 3, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@e1a4e03). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head ff3c948 differs from pull request most recent head 19114a8. Consider uploading reports for the commit 19114a8 to get more accurate results

@@               Coverage Diff               @@
##             branch-22.10   #11447   +/-   ##
===============================================
  Coverage                ?   86.48%           
===============================================
  Files                   ?      145           
  Lines                   ?    22844           
  Branches                ?        0           
===============================================
  Hits                    ?    19756           
  Misses                  ?     3088           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@madsbk madsbk changed the title [WIP] Refactoring of the Buffer class [WIP] Refactor the Buffer class Aug 3, 2022
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits

madsbk and others added 2 commits August 3, 2022 18:34
Co-authored-by: Lawrence Mitchell <[email protected]>
Co-authored-by: Lawrence Mitchell <[email protected]>
@vyasr
Copy link
Contributor

vyasr commented Aug 3, 2022

I generally like the changes in this PR. Overall I might prefer to make more of the factories classmethods so that 1) any access to internals of the Buffer class is restricted to methods, not free functions, and 2) subclasses can inherit and modify the factories as needed. Also relevant in principle (although we are dealing with Python and not C++ here) is item 23 from Scott Meyer's Effective C++: prefer free functions rather than member functions because free functions provide better encapsulation (much more directly relevant in C++ where attributes can actually be made private). As a general guideline I would use classmethods for any factory that uses internal methods or attributes (as_buffer using Buffer.__new__ is a good example). I would use free functions if all they're doing is preprocessing arguments and then forwarding to a constructor, unless we anticipate needing to customize the behavior for subclasses.

buffer_from_pointer currently looks like a good example of the latter, but I think that's misleading. It seems like the sole purpose of buffer_from_pointer is to provide an API-level statement that it Never copies any data and ptr must represents device memory, right? However, that's not really true right now since it just forwards arguments to the constructor, so the type hints are really the only thing indicating that a user shouldn't pass an arbitrary object here. It might make sense to implement this as a classmethod that directly sets the members (essentially just the pointer branch of __init__) so that there's no ambiguity where the function works for arbitrary objects in practice despite being documented otherwise.

I started writing this comment then had to step out. Looks like @wence- had similar comments!

@madsbk
Copy link
Member Author

madsbk commented Aug 4, 2022

Overall, I agree with both of you, my concern with classmethods arise when we want to switch to other Buffer types seamlessly.
For example, in #10746 we want as_bufer() to return a SpilleableBuffer when the spilling option is enabled globally (e.g. by setting CUDF_SPILLING=True).
Would it be OK if calling Buffer.as_buffer(obj) returns a SpilleableBuffer instance?

@madsbk
Copy link
Member Author

madsbk commented Aug 4, 2022

Another design issue I like to discuss is how we support Buffer sub-classes like SpilleableBuffer that will be implemented in Cython since Cython classes cannot inherent from Python classes.

I see two approaches:

  • Implement Buffer in Cython
  • Rename the Buffer class to something like NonSpilleableBuffer and then define a typing.Protocol named Buffer that NonSpilleableBuffer and SpilleableBuffer (and others) implements.

Initially, I thought that using a typing.Protocol would be a great solution but it has the side effect that calling Buffer() becomes illegal. A fair mount of downstream projects calls Buffer(obj) to convert obj to a buffer thus we would force them to use as_buffer() instead.

So maybe implementing Buffer in Cython is the best solution? Or is there a third option I haven't considered?

@wence-
Copy link
Contributor

wence- commented Aug 4, 2022

Overall, I agree with both of you, my concern with classmethods arise when we want to switch to other Buffer types seamlessly.
For example, in #10746 we want as_bufer() to return a SpilleableBuffer when the spilling option is enabled globally (e.g. by setting CUDF_SPILLING=True).
Would it be OK if calling Buffer.as_buffer(obj) returns a SpilleableBuffer instance?

I think this needs some thought. Do you envisage that SpillableBuffer vs Buffer will be a global configuration switch, or do you think that you'll want buffer creation to make spillable objects on a per-buffer basis?

@wence-
Copy link
Contributor

wence- commented Aug 4, 2022

Initially, I thought that using a typing.Protocol would be a great solution but it has the side effect that calling Buffer() becomes illegal. A fair mount of downstream projects calls Buffer(obj) to convert obj to a buffer thus we would force them to use as_buffer() instead.

On the whole I like interfaces over inheritance (which is what Protocol provides). In terms of migration and downstream projects, one could

  1. Add a deprecation warning in Buffer pointing at as_buffer for 22.10 (removing in 22.12)
  2. introduce the Protocol-based approach in 22.12

Note that this is multiply-breaking because you can't isinstance(foo, SomeProtocol) unless you ask for a @runtime_checkable protocol.

@madsbk
Copy link
Member Author

madsbk commented Aug 5, 2022

I think this needs some thought. Do you envisage that SpillableBuffer vs Buffer will be a global configuration switch, or do you think that you'll want buffer creation to make spillable objects on a per-buffer basis?

We want a global configuration switch. I hope at some point we can remove the old Buffer functionality and use SpillableBuffer exclusively.

On the whole I like interfaces over inheritance

Agree, we could also embrace this fully and introduce a DeviceBufferLike Protocol that cuDF uses in combination with as_buffer()? (maybe as_device_buffer_like() is a better name?).

Then we could keep Buffer as is for now?

Co-authored-by: Bradley Dice <[email protected]>
@madsbk
Copy link
Member Author

madsbk commented Aug 10, 2022

Thanks for the reviews, I think I have addressed all of them?

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits

@wence-
Copy link
Contributor

wence- commented Aug 10, 2022

Thanks for the reviews, I think I have addressed all of them?

Sorry, I had some still in-flight

Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! My concerns were addressed. Let's wait for all the other reviewers to approve before merging

@madsbk
Copy link
Member Author

madsbk commented Aug 10, 2022

@shwina What do think of creating a utils/config.py module instead utils/string.py #11447 (comment) ?

@madsbk
Copy link
Member Author

madsbk commented Aug 10, 2022

Thanks for the reviews, I think I have addressed all of them?

Sorry, I had some still in-flight

No worries, I think I have addressed them now?

@wence- wence- self-requested a review August 10, 2022 16:19
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question -- otherwise LGTM.

@madsbk
Copy link
Member Author

madsbk commented Aug 11, 2022

@vyasr do you have anything?

@madsbk
Copy link
Member Author

madsbk commented Aug 11, 2022

@brandon-b-miller do you have anything more?

@shwina
Copy link
Contributor

shwina commented Aug 11, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 95935db into rapidsai:branch-22.10 Aug 11, 2022
@madsbk madsbk deleted the buffer_refactor branch August 12, 2022 07:02
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Nov 1, 2022
cuDF's `Buffer` doesn't have `Buffer.empty()` since rapidsai/cudf#11447. Can we remove `CumlArray.copy()`?

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4958
rapids-bot bot pushed a commit that referenced this pull request Nov 2, 2022
This PR replaces `DeviceBufferLike` with `Buffer` and clear the way for a spillable sub-class of `Buffer`.

#### Context 
The introduction of the [`DeviceBufferLike`](#11447) protocol was motivated by [the spilling work](#11553), which we initially thought would have to be implemented in Cython. However, it can be done in pure Python, which makes `DeviceBufferLike` an unneeded complexity.  

#### Review notes 

- In order to introduce a spillable-buffer in the future, we still use a factory function, `as_buffer()`, to create Buffers. 
- `buffer.py` is moved into the submodule `core.buffer` to ease organization when adding the spillable-buffer and spilling manager. 

#### Breaking
This PR breaks external use of `Buffer` e.g. `Buffer.__init__` raise an exception now and the `"constructor-kwargs"` header from #4164 has been removed. 
Submitted a PR to fix this in cuml: rapidsai/cuml#4965

##

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Ashwin Srinath (https://github.com/shwina)

URL: #12009
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
cuDF's `Buffer` doesn't have `Buffer.empty()` since rapidsai/cudf#11447. Can we remove `CumlArray.copy()`?

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4958
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants