Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring GPU extensions #1365

Merged
merged 66 commits into from
Apr 12, 2024

Conversation

kmp5VT
Copy link
Collaborator

@kmp5VT kmp5VT commented Mar 27, 2024

Description

In this PR I am working to address some refactoring issues. Such as removing using statements from module files, try to diagnose/fix GPU blocksparse backend failures, update adapt functions to use storagemode like AMDGPU, remove gpu examples from testing based on plan listed in google drive

Checklist:

  • Investigate why the AMDGPU and CUDA backends don’t work for BlockSparse storage. Implement resize! for AMDGPUs to get block sparse operations on AMDGPU working (see Add resize! JuliaGPU/Metal.jl#279 for a reference on how it might be implemented).
  • Update the adapt functions to have storagemode as the input parameter
    Update the Extensions libraries to move the bare using statements from the NDTensorsXExt.jl file to the files where they are necessary.
  • Delete NDTensors/ext/examples.
  • Remove examples from NDTensors test
  • Define a MtlArrayAdaptor for adapting storage to MtlArray, analagous to CuArrayAdaptor, to be used here: https://github.com/ITensor/ITensors.jl/blob/v0.3.57/NDTensors/ext/NDTensorsMetalExt/adapt.jl
  • update the generic_zeros and generic_rand functions for Dense and AbstractArray
  • All unittests pass

@kmp5VT kmp5VT marked this pull request as draft March 27, 2024 00:46
@mtfishman
Copy link
Member

Note that resize! is defined for AMDGPU.ROCVector: https://github.com/JuliaGPU/AMDGPU.jl/blob/v0.8.11/src/array.jl#L266.

@codecov-commenter
Copy link

codecov-commenter commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 50.23%. Comparing base (e8197ce) to head (9ce1991).
Report is 4 commits behind head on main.

❗ Current head 9ce1991 differs from pull request most recent head 57abe1f. Consider uploading reports for the commit 57abe1f to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1365       +/-   ##
===========================================
- Coverage   79.59%   50.23%   -29.36%     
===========================================
  Files         114      113        -1     
  Lines        9032     8979       -53     
===========================================
- Hits         7189     4511     -2678     
- Misses       1843     4468     +2625     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kmp5VT
Copy link
Collaborator Author

kmp5VT commented Mar 27, 2024

So far I have updated the CUDA library to remove using statements from the module package. Next I am looking into this

Investigate why the CUDA backends don’t work for BlockSparse storage (there is no append! function) and find a way to fix this issue.

@kmp5VT kmp5VT marked this pull request as ready for review April 5, 2024 12:43
return StoreT(data)
end

function generic_zeros(StoreT::Type{<:Dense}, dims::Tuple{Integer})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function generic_zeros(StoreT::Type{<:Dense}, dims::Tuple{Integer})
function generic_zeros(StoreT::Type{<:Dense}, dims::Integer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My proposal was to define both of them in terms of ::Integer, not ::Tuple{Integer}, since it is a simpler interface for new types to overload in general.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is that if you do something like this generic_zeros(Dense, (2,)) it calls the AbstractArray version of this funciton

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so then you should rewrite the AbstractArray version so that doesn't happen.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand what you're suggesting to do. I reverted the code back the original design you suggested which works for generic_zeros(Dense, 2) and generic_zeros(Dense, (2,))

@kmp5VT
Copy link
Collaborator Author

kmp5VT commented Apr 12, 2024

@mtfishman If you have a minute can you please take a look at this PR again? Thank you I appreciate your time!

NDTensors/Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
@mtfishman
Copy link
Member

Looks good, thanks!

@mtfishman mtfishman merged commit 2a6afa5 into ITensor:main Apr 12, 2024
15 of 16 checks passed
@mtfishman mtfishman changed the title [WIP] Refactoring GPU extensions Refactoring GPU extensions Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants