Do not execute tests using EventHostManualTrigger if not supported #546

Merged

BenjaminW3
Member

Adds a trait and a test helper, isEventHostManualTriggerSupported, so that EventHostManualTrigger is not used where it is not supported. It was always supported with CUDA 8.0 but is only conditionally enabled in CUDA 9.
This fixes the tests in #537 that fail because the feature is disabled: the trait is now checked before the test is executed.
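
A minimal sketch of how such a trait-based check could look, for readers unfamiliar with the pattern. It is not the code from this PR: the device types `DevCpuSketch` and `DevCudaSketch`, the `canUseStreamMemOps` member, the conservative default of "unsupported", and the `runManualTriggerTest` helper are all hypothetical names chosen for illustration. Only the name isEventHostManualTriggerSupported is taken from the description above.

```cpp
#include <iostream>

// Hypothetical stand-ins for device types; these are NOT alpaka's real classes.
struct DevCpuSketch {};
struct DevCudaSketch
{
    // Assumption: filled in elsewhere from a driver query such as
    // CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS.
    bool canUseStreamMemOps;
};

// Primary trait (assumed default): unknown devices are treated as unsupported.
template<typename TDev>
struct IsEventHostManualTriggerSupported
{
    static constexpr bool check(TDev const&) { return false; }
};

// Assumption for this sketch: a host-side device can always be triggered manually.
template<>
struct IsEventHostManualTriggerSupported<DevCpuSketch>
{
    static constexpr bool check(DevCpuSketch const&) { return true; }
};

// The CUDA-like device reports support only if stream memory operations are usable.
template<>
struct IsEventHostManualTriggerSupported<DevCudaSketch>
{
    static constexpr bool check(DevCudaSketch const& dev) { return dev.canUseStreamMemOps; }
};

// Convenience wrapper that dispatches to the trait specialization.
template<typename TDev>
constexpr bool isEventHostManualTriggerSupported(TDev const& dev)
{
    return IsEventHostManualTriggerSupported<TDev>::check(dev);
}

// Hypothetical test wrapper: returns true if the test body ran, false if it was skipped.
template<typename TDev>
bool runManualTriggerTest(TDev const& dev)
{
    if(!isEventHostManualTriggerSupported(dev))
    {
        std::cout << "skipped: EventHostManualTrigger not supported on this device\n";
        return false;
    }
    // The actual test body using the manual trigger would go here.
    return true;
}
```

With this shape, a test calls the check first and returns early when it yields false, so an unsupported device produces a skipped test rather than a runtime error.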

@psychocoderHPC
Member

Why is it only conditionally enabled since CUDA 9? What is the reason for it?

@BenjaminW3
Member Author

I do not know. I have not yet found any reasoning.

@BenjaminW3
Member Author

If you look for other usages of CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS on GitHub, you will find two occurrences that do the same. I had a look at those places before opening the PR.

@tdd11235813
Contributor

The error still occurs. I am trying to find out why the test is executed at all.

$ test/unit/event/event

Running 20 test cases...
unknown location(0): fatal error: in "event/eventTestShouldBeFalseWhileInQueueAndTrueAfterBeingProcessed<std__tuple<alpaka__dev__DevCudaRt,_alpaka__queue__QueueCudaRtAsync>>": std::runtime_error: /home/matwerne/cuda-workspace/alpaka/test/common/include/alpaka/test/event/EventHostManualTrigger.hpp(649) cuStreamWaitValue32( static_cast<CUstream>(queue.m_spQueueImpl->m_CudaQueue), reinterpret_cast<CUdeviceptr>(event.m_spEventImpl->m_devMem), 0x01010101u, CU_STREAM_WAIT_VALUE_GEQ) : 'unrecognized error code': 'unrecognized error code'!
/home/matwerne/cuda-workspace/alpaka/test/unit/event/src//EventTest.cpp(70): last checkpoint: "eventTestShouldBeFalseWhileInQueueAndTrueAfterBeingProcessed" entry.
unknown location(0): fatal error: in "event/eventReEnqueueShouldBePossibleIfNobodyWaitsFor<std__tuple<alpaka__dev__DevCudaRt,_alpaka__queue__QueueCudaRtAsync>>": std::runtime_error: /home/matwerne/cuda-workspace/alpaka/test/common/include/alpaka/test/event/EventHostManualTrigger.hpp(649) cuStreamWaitValue32( static_cast<CUstream>(queue.m_spQueueImpl->m_CudaQueue), reinterpret_cast<CUdeviceptr>(event.m_spEventImpl->m_devMem), 0x01010101u, CU_STREAM_WAIT_VALUE_GEQ) : 'unrecognized error code': 'unrecognized error code'!
/home/matwerne/cuda-workspace/alpaka/test/unit/event/src//EventTest.cpp(115): last checkpoint: "eventReEnqueueShouldBePossibleIfNobodyWaitsFor" entry.
unknown location(0): fatal error: in "event/eventReEnqueueShouldBePossibleIfSomeoneWaitsFor<std__tuple<alpaka__dev__DevCudaRt,_alpaka__queue__QueueCudaRtAsync>>": std::runtime_error: /home/matwerne/cuda-workspace/alpaka/test/common/include/alpaka/test/event/EventHostManualTrigger.hpp(649) cuStreamWaitValue32( static_cast<CUstream>(queue.m_spQueueImpl->m_CudaQueue), reinterpret_cast<CUdeviceptr>(event.m_spEventImpl->m_devMem), 0x01010101u, CU_STREAM_WAIT_VALUE_GEQ) : 'unrecognized error code': 'unrecognized error code'!
/home/matwerne/cuda-workspace/alpaka/test/unit/event/src//EventTest.cpp(176): last checkpoint: "eventReEnqueueShouldBePossibleIfSomeoneWaitsFor" entry.
unknown location(0): fatal error: in "event/waitForEventThatAlreadyFinishedShouldBeSkipped<std__tuple<alpaka__dev__DevCudaRt,_alpaka__queue__QueueCudaRtAsync>>": std::runtime_error: /home/matwerne/cuda-workspace/alpaka/test/common/include/alpaka/test/event/EventHostManualTrigger.hpp(649) cuStreamWaitValue32( static_cast<CUstream>(queue.m_spQueueImpl->m_CudaQueue), reinterpret_cast<CUdeviceptr>(event.m_spEventImpl->m_devMem), 0x01010101u, CU_STREAM_WAIT_VALUE_GEQ) : 'unrecognized error code': 'unrecognized error code'!
/home/matwerne/cuda-workspace/alpaka/test/unit/event/src//EventTest.cpp(254): last checkpoint: "waitForEventThatAlreadyFinishedShouldBeSkipped" entry.

*** 4 failures are detected in the test module "event"

@tdd11235813
Contributor

tdd11235813 commented Jun 25, 2018

It is only an issue on CUDA 8. With CUDA 9 the test is not run, because stream memory operations are not supported (querying CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS returns 0).
However, CUDA 8 does not have this attribute, and if stream memory operations are supported by default on CUDA 8, the issue is somewhere else ...

@BenjaminW3
Member Author

Maybe they are not supported by default on CUDA 8? Maybe the flag was simply missing: the feature was expected to work everywhere, it did not, and so they introduced the new flag?

@tdd11235813
Contributor

I agree with you. It is not really supported in CUDA 8, so the test can be disabled in this case?

/alpaka/test/common/include/alpaka/test/event/EventHostManualTrigger.hpp(588) cuStreamWaitValue32( static_cast<CUstream>(queue.m_spQueueImpl->m_CudaQueue), reinterpret_cast<CUdeviceptr>(event.m_spEventImpl->m_devMem), 0x01010101u, CU_STREAM_WAIT_VALUE_GEQ) : 'CUDA_ERROR_NOT_SUPPORTED': 'operation not supported'!
/alpaka/test/unit/event/src//EventTest.cpp(232): last checkpoint: "waitForEventThatAlreadyFinishedShouldBeSkipped" entry.

(see PR #548, which fixes error reporting for driver API errors)

@BenjaminW3
Member Author

@psychocoderHPC As far as I remember, this test once worked on your system using CUDA 8. Can you confirm or re-evaluate this?
If we cannot be sure that this works with CUDA 8, we may have to disable the test for CUDA 8.

@BenjaminW3
Member Author

@psychocoderHPC Could you please retest this on your CUDA 8 system?
I will nevertheless deactivate the EventHostManualTrigger on CUDA 8 because there does not seem to be a way to tell if those operations are supported.
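
The decision described here can be summarized in a small sketch. It is an illustration under stated assumptions, not the merged code: the function name `canUseEventHostManualTrigger` and its parameters are hypothetical. In real code the major version would presumably come from the toolkit's compile-time version macro, and the attribute value from a driver query of CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS; here both are plain integers so the logic stands alone.

```cpp
// Hypothetical helper: decides whether the manual trigger may be used.
// cudaMajorVersion:        major version of the CUDA toolkit (e.g. 8 or 9).
// canUseStreamMemOpsAttr:  value of the stream-mem-ops device attribute;
//                          ignored for CUDA 8, which has no such attribute.
constexpr bool canUseEventHostManualTrigger(int cudaMajorVersion, int canUseStreamMemOpsAttr)
{
    // CUDA 8 and older: support cannot be queried, so it is conservatively
    // treated as unsupported.
    // CUDA 9 and newer: trust the queried attribute (nonzero means supported).
    return (cudaMajorVersion >= 9) && (canUseStreamMemOpsAttr != 0);
}
```

Treating CUDA 8 as unsupported trades some test coverage on systems where the operations happen to work for tests that do not fail spuriously elsewhere.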

@BenjaminW3 BenjaminW3 force-pushed the topic-event-test-mem-ops branch from 2ea6309 to c0af872 Compare June 28, 2018 07:08
@BenjaminW3
Member Author

@tdd11235813 Now the tests should not fail anymore on your system and #537 should be finished. Please approve if it is ok for you.
I am integrating make test / ctest into the CI in another PR.

@BenjaminW3 BenjaminW3 force-pushed the topic-event-test-mem-ops branch from c0af872 to 1945774 Compare June 28, 2018 10:10
Contributor

@tdd11235813 tdd11235813 left a comment

works for me, thanks a bunch for your work!

@BenjaminW3 BenjaminW3 merged commit accde3a into alpaka-group:develop Jun 28, 2018
@BenjaminW3 BenjaminW3 deleted the topic-event-test-mem-ops branch June 28, 2018 14:40