-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Add support for empty work division grids #2312
base: develop
Are you sure you want to change the base?
Conversation
Use a template parameter to distinguish blocking (sync) and non-blocking (async) queues, instead of writing two almost identical copies of the enqueue function.
f920750
to
7651495
Compare
7651495
to
bb824d7
Compare
bb824d7
to
519ae79
Compare
// No extent is allowed to be zero or greater then the allowed maximum. | ||
if((gridBlockExtent[i] < 1) || (blockThreadExtent[i] < 1) || (threadElemExtent[i] < 1) | ||
|| (gridBlockExtentMax[i] < gridBlockExtent[i]) || (blockThreadExtentMax[i] < blockThreadExtent[i]) | ||
// The grid extent is allowed to be zero, indicating that no work should be done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The grid extent is allowed to be zero, indicating that no work should be done. | |
// The grid-block extent is allowed to be zero (or zero-vector), indicating that no work should be done. |
WorkDiv empty_grid{{0u, 0u, 0u}, {1u, 1u, 1u}, {1u, 1u, 1u}}; | ||
CHECK_NOTHROW(alpaka::exec<Acc>(queue, empty_grid, TestKernelAbort{})); | ||
|
||
CHECK_NOTHROW(alpaka::wait(queue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] since the queue is blocking it blocks the host and the program will not end before the end of kernel execution. Hence this CHECK_NOTHROW will never be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the question.
Are you asking
- if
alpaka::wait(queue)
is redundant ? or - if, even if it throws,
CHECK_NOTHROW
will not catch it ?
If the first: maybe, it depends if the implementation of alpaka::exec
for a blocking queue is guaranteed to catch all errors, or not. An extra check will not harm.
If the second: alpaka::wait()
may throw is the runtime is in error; CHECK_NOTHROW
should catch that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last line alpaka::wait(queue) does nothing since alpaka::exec is already blocking the host. But i see extra check will not harm.
// The grid extent is allowed to be zero, indicating that no work should be done. | ||
// The threads and elements extents are not allowed to be zero. | ||
// No extent is allowed to be greater then the allowed maximum. | ||
if((std::is_signed_v<TIdx> && gridBlockExtent[i] < 0) || (blockThreadExtent[i] < 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if((std::is_signed_v<TIdx> && gridBlockExtent[i] < 0) || (blockThreadExtent[i] < 1) | |
if constexpr((std::is_signed_v<TIdx> && gridBlockExtent[i] < 0) || (blockThreadExtent[i] < 1) |
to ensure that is checked in compile time and avoid silly compile warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you can do that.
A common pattern in the CMS software has been to fix the block size at compile time and determine the number of blocks at runtime based on the problem size, hat is the total number of elements to be processed.
On a few occasions, the number of elements turned out to be zero.
This pointed out a discrepancy between the CUDA/HIP and the CPU serial back-ends:
To address this discrepancy, this PR proposed to support zero-sized grids in all back-ends, amending the definition of what constitutes a valid work division, and adding an explicit test for the 1D and 3D cases.
For CUDA/HIP back-ends, the library now simply does not submit the kernel launch at all.
Note: this change cannot break any existing code, since until now zero-sized grids could not be launched. However, once this feature is relied upon, users may inadvertently introduce bugs in their code if they rely on some operations to be done on every kernel launch, and do not realise that an empty grid does not result in a kernel not being launched at all.
An alternative approach could be to continue forbidding empty grids, and enforce it in the CPU back-end.
This PR depends on (and includes) #2311.