This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
nvc++ will soon stop defining `__NVCOMPILER_CUDA_ARCH__`, removing the ability to determine the PTX arch at compile time. This updates agents and collective algorithms so they no longer require the `PTX_ARCH` template parameter, and changes `CUB_WARP_SIZE(PTX_ARCH)` and similar helpers to take no argument. Those macros only mattered on obsolete architectures and have no effect on currently supported ones.
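To make the shape of this change concrete, here is a minimal host-only sketch. Every name in it (`SKETCH_OLD_WARP_THREADS`, `SKETCH_WARP_THREADS`, `OldBlockLayout`, `BlockLayout`) is a hypothetical stand-in invented for illustration, not CUB's actual definitions, and the warp width of 32 is an assumption about currently supported architectures.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; NOT CUB's real macros.

// Before: the helper took the PTX arch, even though (by assumption here)
// every supported arch gives the same answer.
#define SKETCH_OLD_WARP_THREADS(arch) (32)

// After: the helper takes no argument.
#define SKETCH_WARP_THREADS() (32)

// Before: a collective carried PTX_ARCH as a template parameter just to
// forward it to the helper macro.
template <int BLOCK_THREADS, int PTX_ARCH>
struct OldBlockLayout
{
  static constexpr int warps()
  {
    return (BLOCK_THREADS + SKETCH_OLD_WARP_THREADS(PTX_ARCH) - 1) /
           SKETCH_OLD_WARP_THREADS(PTX_ARCH);
  }
};

// After: the PTX_ARCH parameter is gone; nothing depends on a
// compile-time architecture value.
template <int BLOCK_THREADS>
struct BlockLayout
{
  static constexpr int warps()
  {
    return (BLOCK_THREADS + SKETCH_WARP_THREADS() - 1) / SKETCH_WARP_THREADS();
  }
};

// Runtime sanity check: both forms agree for a 128-thread block.
inline void sketch_check_layouts()
{
  assert((OldBlockLayout<128, 800>::warps() == BlockLayout<128>::warps()));
}
```

In this sketch, callers that previously spelled out an architecture value simply drop that template argument; the computed layout is unchanged.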
This fixes the issue reported in NVIDIA#299. There's no clear reason why this should use `RandomBits` unconditionally.
This can be used to restrict the number of kernel instantiations until `__CUDA_ARCH_LIST__` is available.
This check was being used to detect host vs. device code at compile time, which is no longer possible on all of our platforms. Changed it to detect only the RDC state and nothing else. Some tests wouldn't compile with RDC enabled; fixed them.
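A check that reports only the RDC state might look like the sketch below. It assumes nvcc's `__CUDACC_RDC__` macro, which nvcc defines when relocatable device code is enabled; `SKETCH_RDC_ENABLED` and `sketch_rdc_enabled` are hypothetical names, and the macro this PR actually changed is not shown in the conversation.

```cpp
#include <cassert>

// Assumption: nvcc defines __CUDACC_RDC__ when compiling with -rdc=true.
// SKETCH_RDC_ENABLED is a hypothetical name used only in this sketch.
#if defined(__CUDACC_RDC__)
#  define SKETCH_RDC_ENABLED 1
#else
#  define SKETCH_RDC_ENABLED 0
#endif

// Reports RDC state only. It deliberately says nothing about whether the
// current compilation pass is for host or device code.
constexpr bool sketch_rdc_enabled()
{
  return SKETCH_RDC_ENABLED == 1;
}

// Runtime sanity check that the function mirrors the macro.
inline void sketch_check_rdc()
{
  assert(sketch_rdc_enabled() == (SKETCH_RDC_ENABLED == 1));
}
```

Because the result depends on a build flag rather than on `__CUDA_ARCH__`, it has the same value in every compilation pass of a translation unit.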
NVBug 2431416: If a kernel is only launched on the host (e.g. the launch is ifdef'd out when !defined(__CUDA_ARCH__)), spurious unused parameter warnings will be emitted from the __wrapper_device_stub_[kernel name] stub. We regularly hit this due to per-architecture tunings, and there is no known workaround that can be easily applied. We just disable unused parameter warnings around the kernel definitions.
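One way to scope such a suppression is with diagnostic push/pop pragmas around the definitions, sketched below for GCC-compatible host compilers. The macro names and the stand-in function are hypothetical, and the pragmas the PR actually uses (and for which compilers) are not shown in the conversation; a real kernel would be a `__global__` function.

```cpp
#include <cassert>

// Hypothetical macro names. GCC and Clang both accept the
// "GCC diagnostic" pragmas; other compilers get empty macros here.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_KERNEL_WARNINGS_PUSH \
     _Pragma("GCC diagnostic push")   \
     _Pragma("GCC diagnostic ignored \"-Wunused-parameter\"")
#  define SKETCH_KERNEL_WARNINGS_POP _Pragma("GCC diagnostic pop")
#else
#  define SKETCH_KERNEL_WARNINGS_PUSH
#  define SKETCH_KERNEL_WARNINGS_POP
#endif

SKETCH_KERNEL_WARNINGS_PUSH

// Stand-in for a kernel definition: `tuning_hint` is intentionally unused,
// mimicking a parameter that only some per-architecture paths read.
inline int sketch_kernel_stub(int n, int tuning_hint)
{
  return n * 2;
}

SKETCH_KERNEL_WARNINGS_POP

// Runtime sanity check that the stand-in still behaves normally.
inline void sketch_check_kernel_stub()
{
  assert(sketch_kernel_stub(21, 0) == 42);
}
```

The push/pop pair keeps the suppression local, so unused-parameter warnings remain active for code outside the wrapped definitions.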
Closing; these changes will be merged in a series of smaller PRs.
Labels
compiler: nvc++
Specific to the NVC++ compiler.
P0: must have
Absolutely necessary. Critical issue, major blocker, etc.
No description provided.