Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] [1/5] Header structure: replace specializations #1437

Conversation

ahendriksen
Copy link
Contributor

@ahendriksen ahendriksen commented Apr 20, 2023

This PR replaces the specialization headers with a split header structure. It is a more reviewable version of #1415.

This is achieved by following the strategy outlined in issue #1416.

The following steps are performed:

Preparatory

  • Move files around.
    • d9801e8 Move header files with expensive function templates from header.cuh to header-inl.cuh to preserve history and minimize the diff in later comits.
    • 50b374d Move raft_runtime source files into separate directory. They would otherwise clash with source files added in later commits.
  • Fix existing issues in RAFT unearthed by this PR:
    • 8974ae3 Fix several locations where files were depending on transitive includes
    • 95ef31b Fix distance::getWorkspaceSize. It did not compile previously.
  • Prepare for changes that are coming:
    • 7edcb6a Create a functor for rbf_fin_op that can be used to instantiate templates later on.
    • 71de7bd Similar move for distance functors used in registers.cuh.

Meat of PR: Use header structure as outlined in the new documentation.

Fixup: Add some more template instances, remove unnecessary functions, fix a test.

@ahendriksen
Copy link
Contributor Author

Review tip: I have moved the headers by adding the -inl suffix in the first commit. The diff is significantly smaller if you start from there.

@ahendriksen ahendriksen changed the title [ENH] [0/5] Header structure: replace specializations [ENH] [1/5] Header structure: replace specializations Apr 20, 2023
@tfeher
Copy link
Contributor

tfeher commented Apr 21, 2023

Review tip: I have moved the headers by adding the -inl suffix in the first commit. The diff is significantly smaller if you start from there.

Indeed filtering the commits help tremendously. We have large mechanical changes in, the following commits:

It make sense to review these separately. The rest (commits 2-9, 12-16) can be then lumped together during review. It is good to follow review the PR according to your PR description that gives context for each commit.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Allard for breaking up the oiriginal PR (#1415) into multiple pieces! Also thanks for the detailed description of the changes in the PR description. The questions I had for #1415 are already answered here (or a tracking issue is created). I have one question left, but that could be answered in a follow up PR.

Overall the PR looks good to me, and I am excited about the compile time improvements that we can get once we go through all the steps outlined here and in he follow up PRs.

cpp/include/raft/distance/distance-ext.cuh Show resolved Hide resolved
Copy link
Member

@benfred benfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great to me - I'm really looking forward to having this series of PR's merged, I think the improvements to build times will make RAFT development much easier.

I'd like to see the corresponding downstream PR's pass CI before getting this merged (like rapidsai/cuml#5363) - do you know why its failing with cuda 11.2/11.4 but working with 11.8? The failure looks like a missing symbol w/ the RusselRao distance

In [3]: import cpp_demangle

In [4]: cpp_demangle.demangle("_ZN4raft8distance8distanceILNS0_12DistanceTypeE18EfffiEEvRKNS_9resourcesEPKT0_S8_PT2_T3_SB_SB_bS6_")
Out[4]: 'void raft::distance::distance<(raft::distance::DistanceType)18, float, float, float, int>(raft::resources const&, float const*, float const*, float*, int, int, int, bool, float)'

@cjnolet
Copy link
Member

cjnolet commented Apr 25, 2023

do you know why its failing with cuda 11.2/11.4 but working with 11.8?

@vyasr actually fixed this one today. It turns out some recent changes to the conda packages were causing the cuda toolkit to get pinned to a specific version (11.8) and it was causing cuml's CI to pull in an old version of raft packages for other cuda versions.

Cuml's CI should run fine now (assuming theres no failures from the changes in this PR)

@tfeher
Copy link
Contributor

tfeher commented Apr 26, 2023

Some of the headers was split under the detail folder, like (like select_k.cuh) while others were above the detail folder (public level). @ahendriksen was it done this way to minimize the changes, or was there any other motivation?

The public API did not change in any of the cases, so functionally the current status is fine. But I think it is nicer if we split the files under the detail folder when possible. I would prefer if we do not hold up this PR for such cosmetic changes, but we should consider returning to this question.

@ahendriksen
Copy link
Contributor Author

Some of the headers was split under the detail folder, like (like select_k.cuh) while others were above the detail folder (public level). @ahendriksen was it done this way to minimize the changes, or was there any other motivation?

Splitting at detail rather than at public level was done for three reasons:

  • The detail header was used internally: for instance the ivf_flat_interleaved_scan_kernel is called in the ivf_flat header as well as the refine header.
  • Kind of accidentally: the coalesced reduction kernel was duplicated many many times so I added an instance. In hindsight, it might be better to instantiate at a higher level (for instance linalg::reduce, linalg::norm, etc). To tease apart which raft::linalg kernels are worth instantiating and how could be investigated in the future. The coalesced reduction instances show how it could be done.
  • As part of design: the raft::distance kernels were optimized for compile time previously. Compiling the kernels without the public header includes can be substantially faster.

But I think it is nicer if we split the files under the detail folder when possible.

I think we should favor splitting at the public level. This is beneficial for two reasons: (1) to document which instances exist and (2) to allow downstream libraries to instantiate templates themselves. Instancing at public level allows downstream libraries to opt in to RAFT_EXPLICIT_INSTANTIATE_ONLY: when they use a template that has not been instantiated in libraft.so, they can instantiate it themselves. Here, it is better that these templates are part of the public API so they don't break as often.

@tfeher
Copy link
Contributor

tfeher commented Apr 26, 2023

I think we should favor splitting at the public level. This is beneficial for two reasons: (1) to document which instances exist ,,,,

Good points!

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really looking forward to these changes, @ahendriksen! I've gone through all the files and I think everything looks good, with the exception of some trivial things (mostly docs related).

cpp/include/raft/distance/distance.cuh Outdated Show resolved Hide resolved
cpp/include/raft/distance/distance-inl.cuh Show resolved Hide resolved
docs/source/developer_guide.md Outdated Show resolved Hide resolved
docs/source/developer_guide.md Outdated Show resolved Hide resolved
docs/source/developer_guide.md Outdated Show resolved Hide resolved
docs/source/using_libraft.md Outdated Show resolved Hide resolved
docs/source/using_libraft.md Outdated Show resolved Hide resolved
@ahendriksen ahendriksen force-pushed the enh-replace-specializations-by-split-headers branch from 80abe51 to fa523aa Compare April 27, 2023 12:47
@cjnolet cjnolet closed this Apr 28, 2023
rapids-bot bot pushed a commit that referenced this pull request Apr 28, 2023
This is a rebase of all the commits in PRs:
- #1437 
- #1438 
- #1439 
- #1440 
- #1441 

The original PRs have not been rebased to preserve review comments. This PR is up to date with branch 23.06.

Closes #1416

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1469
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Build Time Improvement CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants