Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ht.array() default to copy=None (e.g., only if necessary) #1119

Merged
merged 29 commits into from
Jun 19, 2023

Conversation

ClaudiaComito
Copy link
Contributor

@ClaudiaComito ClaudiaComito commented Mar 15, 2023

Description

This PR changes the default of the ht.array(copy) attribute to None. Previous default was copy=True. We introduce a deviation from the NumPy API to reduce the memory footprint of Heat operations. The changes comply with the Python array-API standard.

  • Previous default behaviour when creating a DNDarray: copy=True

    • ht.array by default would make a copy of the underlying data object.
    • Users could disable this by explicitly setting copy=False
    • Pros: we'd be sure that by default, process-local data objects are contiguous
    • Cons: obviously by default we are copying huge arrays
  • New default: following the Python array-API standard for asarray, the following options are now available for the copy attribute:
    - copy (Optional[bool]) – boolean indicating whether or not to copy the input. If True, the function must always copy. If False, the function must never copy for input which supports the buffer protocol and must raise a ValueError in case a copy would be necessary. If None, the function must reuse existing memory buffer if possible and copy otherwise. Default: None.

Issue/s resolved: #1117 #990

Changes proposed:

  • factories.array() attribute copy now defaults to None, i.e. existing memory buffer is reused if possible, copied otherwise.
  • copy=False raises a ValueError if a copy would be necessary. Note: this only holds for "input which supports buffer protocol", i.e. not for scalars.
  • manipulations.diagonal() modified to return a contiguous local array
  • updated tests

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
    The underlying local data object of a DNDarray (dndarray.larray) might now be a view of a torch tensor. Copy might be necessary before communication.

Memory requirements

In progress

Performance

In progress

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Title of PR is suitable for corresponding CHANGELOG entry

Does this change modify the behaviour of other functions? If so, which?

yes

The underlying local data object of a DNDarray (dndarray.larray) might now be a view of a torch tensor. Copy might be necessary before communication.

@codecov
Copy link

codecov bot commented Mar 15, 2023

Codecov Report

Merging #1119 (04b9259) into main (6ddc295) will decrease coverage by 0.09%.
The diff coverage is 100.00%.

❗ Current head 04b9259 differs from pull request most recent head 17ca24f. Consider uploading reports for the commit 17ca24f to get more accurate results

@@            Coverage Diff             @@
##             main    #1119      +/-   ##
==========================================
- Coverage   91.85%   91.77%   -0.09%     
==========================================
  Files          74       72       -2     
  Lines       10712    10505     -207     
==========================================
- Hits         9840     9641     -199     
+ Misses        872      864       -8     
Flag Coverage Δ
unit 91.77% <100.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
heat/core/factories.py 97.12% <100.00%> (-0.92%) ⬇️
heat/core/manipulations.py 98.86% <100.00%> (ø)
heat/core/types.py 96.83% <100.00%> (+0.01%) ⬆️

... and 6 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ghost
Copy link

ghost commented Mar 15, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

Mystic-Slice
Mystic-Slice previously approved these changes Mar 25, 2023
Copy link
Collaborator

@Mystic-Slice Mystic-Slice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me!

@Mystic-Slice
Copy link
Collaborator

Mystic-Slice commented Mar 25, 2023

The tests' failing has something to do with the numpy type system. After searching around, I found out that:
They have a separate dtype structure.
So, obj.dtype returns "int64" which is actually numpy.dtype[int64] which even though is equal to numpy.int64, is not the same object.

> np.dtype("int64") == np.int64 # True
> np.dtype("int64") is np.int64 # False

Source

Workaround

It is to make use of the type attribute that the numpy.dtype object has, which gives us back the usual data type (here, numpy.int64)
Add this snippet after line 516 in heat/core/type.py

# Get around the numpy type system (np.dtype[<actual_type_we_want>])
if(hasattr(a_type, "type")):
    a_type = a_type.type

This solves the issue for me locally.

PS: Are there any other objects that might also have a type attribute?

@ClaudiaComito ClaudiaComito added this to the 1.3.0 milestone Mar 29, 2023
@ClaudiaComito ClaudiaComito changed the base branch from release/1.2.x to main March 29, 2023 10:12
@ClaudiaComito ClaudiaComito dismissed Mystic-Slice’s stale review March 29, 2023 10:12

The base branch was changed.

@github-actions
Copy link
Contributor

Thank you for the PR!

@ClaudiaComito
Copy link
Contributor Author

The tests' failing has something to do with the numpy type system. After searching around, I found out that: They have a separate dtype structure. So, obj.dtype returns "int64" which is actually numpy.dtype[int64] which even though is equal to numpy.int64, is not the same object.

> np.dtype("int64") == np.int64 # True
> np.dtype("int64") is np.int64 # False

Source

Workaround

It is to make use of the type attribute that the numpy.dtype object has, which gives us back the usual data type (here, numpy.int64) Add this snippet after line 516 in heat/core/type.py

# Get around the numpy type system (np.dtype[<actual_type_we_want>])
if(hasattr(a_type, "type")):
    a_type = a_type.type

This solves the issue for me locally.

PS: Are there any other objects that might also have a type attribute?

Thanks a lot for reviewing @Mystic-Slice , and for proposing a solution. I have modified ht.canonical_heat_types similarly to what you suggested. torch.dtype have no type attribute... yet! We will have to revisit if that changes.

@ClaudiaComito ClaudiaComito added API Anything relating the API factories array API memory footprint and removed API Anything relating the API labels Mar 30, 2023
@ClaudiaComito ClaudiaComito changed the title ht.array shall not copy by default ht.array(copy) default to None (e.g., only if necessary) instead of True Mar 30, 2023
Mystic-Slice
Mystic-Slice previously approved these changes Mar 30, 2023
@ClaudiaComito
Copy link
Contributor Author

@JuanPedroGHM is there a good way of checking for memory usage within the benchmarks?

@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Jun 6, 2023

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

@ClaudiaComito
Copy link
Contributor Author

I think I'm done here @JuanPedroGHM, let me know if something's missing.

@ClaudiaComito ClaudiaComito mentioned this pull request Jun 13, 2023
4 tasks
@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito assigned mrfh92 and unassigned JuanPedroGHM Jun 19, 2023
Copy link
Collaborator

@mrfh92 mrfh92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view the changes look fine. Therefore, I recommend merging if the CI runs through.

@mrfh92 mrfh92 merged commit 3a840cc into main Jun 19, 2023
@mrfh92 mrfh92 deleted the features/#1117-array-copy-None branch June 19, 2023 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set factories default to copy=Nonein line with Python array-API
4 participants