Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add support for empty work division grids #2312

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

fwyzard
Copy link
Contributor

@fwyzard fwyzard commented Jul 6, 2024

A common pattern in the CMS software has been to fix the block size at compile time and determine the number of blocks at runtime based on the problem size, hat is the total number of elements to be processed.

On a few occasions, the number of elements turned out to be zero.
This pointed out a discrepancy between the CUDA/HIP and the CPU serial back-ends:

  • the CUDA/HIP runtimes fail to launch a kernel with an empty grid (that is, with zero blocks), and report an error;
  • the CPU serial back-end simple does not do any work.

To address this discrepancy, this PR proposed to support zero-sized grids in all back-ends, amending the definition of what constitutes a valid work division, and adding an explicit test for the 1D and 3D cases.

For CUDA/HIP back-ends, the library now simply does not submit the kernel launch at all.


Note: this change cannot break any existing code, since until now zero-sized grids could not be launched. However, once this feature is relied upon, users may inadvertently introduce bugs in their code if they rely on some operations to be done on every kernel launch, and do not realise that an empty grid does not result in a kernel not being launched at all.

An alternative approach could be to continue forbidding empty grids, and enforce it in the CPU back-end.


This PR depends on (and includes) #2311.

Use a template parameter to distinguish blocking (sync) and non-blocking (async)
queues, instead of writing two almost identical copies of the enqueue function.
@fwyzard fwyzard force-pushed the test_empty_workdivision branch 3 times, most recently from f920750 to 7651495 Compare July 6, 2024 23:29
@fwyzard fwyzard changed the title Add a test for empty work divisions Add support for empty work divisions Jul 7, 2024
@fwyzard fwyzard changed the title Add support for empty work divisions Add support for empty work division grids Jul 7, 2024
@fwyzard fwyzard changed the title Add support for empty work division grids [RFC] Add support for empty work division grids Jul 7, 2024
// No extent is allowed to be zero or greater then the allowed maximum.
if((gridBlockExtent[i] < 1) || (blockThreadExtent[i] < 1) || (threadElemExtent[i] < 1)
|| (gridBlockExtentMax[i] < gridBlockExtent[i]) || (blockThreadExtentMax[i] < blockThreadExtent[i])
// The grid extent is allowed to be zero, indicating that no work should be done.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The grid extent is allowed to be zero, indicating that no work should be done.
// The grid-block extent is allowed to be zero (or zero-vector), indicating that no work should be done.

WorkDiv empty_grid{{0u, 0u, 0u}, {1u, 1u, 1u}, {1u, 1u, 1u}};
CHECK_NOTHROW(alpaka::exec<Acc>(queue, empty_grid, TestKernelAbort{}));

CHECK_NOTHROW(alpaka::wait(queue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] since the queue is blocking it blocks the host and the program will not end before the end of kernel execution. Hence this CHECK_NOTHROW will never be tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the question.

Are you asking

  1. if alpaka::wait(queue) is redundant ? or
  2. if, even if it throws, CHECK_NOTHROW will not catch it ?

If the first: maybe, it depends if the implementation of alpaka::exec for a blocking queue is guaranteed to catch all errors, or not. An extra check will not harm.

If the second: alpaka::wait() may throw is the runtime is in error; CHECK_NOTHROW should catch that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last line alpaka::wait(queue) does nothing since alpaka::exec is already blocking the host. But i see extra check will not harm.

// The grid extent is allowed to be zero, indicating that no work should be done.
// The threads and elements extents are not allowed to be zero.
// No extent is allowed to be greater then the allowed maximum.
if((std::is_signed_v<TIdx> && gridBlockExtent[i] < 0) || (blockThreadExtent[i] < 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if((std::is_signed_v<TIdx> && gridBlockExtent[i] < 0) || (blockThreadExtent[i] < 1)
if constexpr((std::is_signed_v<TIdx> && gridBlockExtent[i] < 0) || (blockThreadExtent[i] < 1)

to ensure that is checked in compile time and avoid silly compile warnings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants