Document different alpaka terms and it relationships #2275

Open
wants to merge 5 commits into
base: develop
70 changes: 70 additions & 0 deletions docs/source/basic/terms.rst
@@ -0,0 +1,70 @@
Terms
=====

This page provides an overview of the terms used in ``alpaka`` and the relationships between them.

Platform
--------

* A ``platform`` contains information about the system, e.g. the available devices.
* Depending on the platform, it also contains a runtime context.
* A ``platform`` can be shared by many devices.

Device
------

* A ``device`` represents a compute unit, such as a CPU or a GPU.
* Each ``device`` is bound to a specific ``platform``.
* Each ``device`` can be used by many specific ``accelerators``.

Work division
-------------

* Describes the domain decomposition of a contiguous N-dimensional index domain in ``blocks``, ``threads`` and ``elements``. A ``block`` contains one or more ``threads`` and a ``thread`` process one or more ``elements``.
Contributor
Suggested change
- * Describes the domain decomposition of a contiguous N-dimensional index domain in ``blocks``, ``threads`` and ``elements``. A ``block`` contains one or more ``threads`` and a ``thread`` process one or more ``elements``.
+ * Describes the domain decomposition of a contiguous N-dimensional index domain in ``blocks``, ``threads`` and ``elements``. A ``block`` contains one or more ``threads`` and each ``thread`` processes one or more ``elements``.

* A ``work division`` has limitations depending on the ``kernel`` function and ``accelerator``.
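
The relationship between ``blocks``, ``threads`` and ``elements`` can be illustrated with a minimal one-dimensional sketch. The names ``WorkDiv1D``, ``divCeil``, ``makeWorkDiv`` and ``capacity`` are hypothetical and not part of the alpaka API; the sketch only assumes that enough blocks are chosen to cover the whole index domain.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 1D work division; this is NOT an alpaka type.
struct WorkDiv1D
{
    std::size_t blocks;            // number of blocks
    std::size_t threadsPerBlock;   // threads in each block
    std::size_t elementsPerThread; // elements processed by each thread
};

// Integer division that rounds up.
inline std::size_t divCeil(std::size_t a, std::size_t b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

// Choose enough blocks to cover `extent` elements.
inline WorkDiv1D makeWorkDiv(std::size_t extent, std::size_t threadsPerBlock, std::size_t elementsPerThread)
{
    return WorkDiv1D{divCeil(extent, threadsPerBlock * elementsPerThread), threadsPerBlock, elementsPerThread};
}

// Number of elements the work division can cover; this may exceed the extent.
inline std::size_t capacity(WorkDiv1D const& wd)
{
    return wd.blocks * wd.threadsPerBlock * wd.elementsPerThread;
}
```

Because the capacity can be larger than the extent, a kernel written against such a work division would typically need to check that an element index is inside the domain before touching it.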

Accelerator
-----------

* Describes "how" a kernel work division is mapped to device threads.
* N-dimensional work divisions (1D, 2D, 3D) are supported.
* Holds implementations of shared memory, atomic operations, math operations etc.
* ``Accelerators`` are instantiated only when a kernel is executed, and can only be accessed in device code.
* Each device function can (should) be templated on the accelerator type, and take an accelerator as its first argument.
* The accelerator object can be used to extract the ``work division`` and indices of the current block and thread.
* The accelerator type can be used to implement per-accelerator behaviours.
* An ``accelerator`` is bound to a specific ``platform``.
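
The pattern of templating a device function on the accelerator type and passing the accelerator as the first argument might look roughly like the following sketch. ``MockAccCpu``, ``MockAccGpu``, ``ElementsPerThread``, ``globalThreadIdx`` and ``scaleKernel`` are hypothetical stand-ins written in plain C++; they are not alpaka classes or functions and only mimic the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accelerator stand-ins; these are NOT alpaka types.
struct MockAccCpu
{
    std::size_t blockIdx;
    std::size_t threadIdx;
    std::size_t threadsPerBlock;
};

struct MockAccGpu
{
    std::size_t blockIdx;
    std::size_t threadIdx;
    std::size_t threadsPerBlock;
};

// Per-accelerator behaviour selected through the accelerator type.
template<typename TAcc>
struct ElementsPerThread
{
    static constexpr std::size_t value() { return 1; }
};

template<>
struct ElementsPerThread<MockAccCpu>
{
    static constexpr std::size_t value() { return 4; }
};

// The accelerator object provides the indices of the current block and thread.
template<typename TAcc>
std::size_t globalThreadIdx(TAcc const& acc)
{
    return acc.blockIdx * acc.threadsPerBlock + acc.threadIdx;
}

// Device-style function: templated on the accelerator, accelerator first.
template<typename TAcc>
void scaleKernel(TAcc const& acc, std::vector<float>& data, float factor)
{
    std::size_t const n = ElementsPerThread<TAcc>::value();
    std::size_t const first = globalThreadIdx(acc) * n;
    for(std::size_t i = first; i < first + n && i < data.size(); ++i)
        data[i] *= factor;
}
```

In this sketch the CPU-like stand-in processes four elements per thread and the GPU-like one processes a single element, which is one way a per-accelerator behaviour could be expressed.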

Queue
-----

Contributor
I would add that a queue can be either "synchronous" (blocking) or "asynchronous" (non-blocking), and that this may determine how the work is submitted to the device, and if a synchronisation is performed after each operation.

* Stores tasks which should be executed on a ``device``.
* Operations can be ``TaskKernels``, ``HostTasks``, ``Events``, ``Sets`` and ``Copies``.
* A Queue can be ``Blocking`` (host thread is waiting for finishing the API call) or ``NonBlocking`` (host thread continues after calling the API independent if the call finished or not).
Contributor
Suggested change
- * A Queue can be ``Blocking`` (host thread is waiting for finishing the API call) or ``NonBlocking`` (host thread continues after calling the API independent if the call finished or not).
+ * A Queue can be ``Blocking`` (the host thread waits for each API call or task to finish) or ``NonBlocking`` (the host thread continues after making an API call or submitting a task without waiting for it to finish, or even start).

* All operations in a queue will be executed sequential in FIFO order.
Contributor
Suggested change
- * All operations in a queue will be executed sequential in FIFO order.
+ * All operations in a queue will be executed sequentially in FIFO order.

* Operations in different queues can run in parallel even on blocking queues.
Contributor
With blocking queues, this requires using different host tasks, right?

* ``wait()`` can be called on a queue to block the calling host thread until all previously enqueued work has finished.
* Each ``queue`` is bounded to a specific ``device``.
Contributor
Suggested change
- * Each ``queue`` is bounded to a specific ``device``.
+ * Each ``queue`` is bound to a specific ``device``.
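
The FIFO ordering and the difference between blocking and non-blocking submission can be modelled with a small single-threaded sketch. ``MockQueue`` is a hypothetical class, not the alpaka queue API. As a simplifying assumption, the non-blocking variant here defers its tasks until ``wait()`` is called, whereas a real non-blocking queue would usually process them in the background.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a queue; this is NOT the alpaka queue API.
class MockQueue
{
public:
    explicit MockQueue(bool blocking) : m_blocking(blocking) {}

    // Submit a task. A blocking queue finishes it before returning.
    void enqueue(std::function<void()> task)
    {
        m_pending.push_back(std::move(task));
        if(m_blocking)
            wait();
    }

    // Run all pending tasks in FIFO order, then return to the caller.
    void wait()
    {
        while(!m_pending.empty())
        {
            auto task = std::move(m_pending.front());
            m_pending.pop_front();
            task();
        }
    }

    // Number of tasks that have been submitted but not yet executed.
    std::size_t pending() const { return m_pending.size(); }

private:
    bool m_blocking;
    std::deque<std::function<void()>> m_pending;
};
```

With the non-blocking variant, ``enqueue`` returns while the task is still pending and only ``wait()`` guarantees completion; with the blocking variant, nothing is pending once ``enqueue`` returns.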


Task
----

* A ``TaskKernel`` contains the algorithm which should be executed on a ``device``.
* A ``HostTask`` is a functor without an ``acc`` argument, which can be enqueued and is always executed on the host device.

Event
-----

* A ``event`` is a marker in a ``queue``.
Contributor
Suggested change
- * A ``event`` is a marker in a ``queue``.
+ * An ``event`` is a marker in a ``queue``.

* ``events`` can be used to describe dependencies between different ``queues``.
* An ``event`` allows waiting until all previously enqueued work in a queue has finished.
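
The idea of an event as a marker can be sketched as a task that flips a flag once everything enqueued before it has run. ``MockEvent``, ``TaskList``, ``enqueueEvent`` and ``waitFor`` are hypothetical names for this single-threaded illustration and do not correspond to alpaka functions.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical event: a flag that flips once the queue reaches the marker.
struct MockEvent
{
    std::shared_ptr<bool> reached = std::make_shared<bool>(false);

    bool isComplete() const { return *reached; }
};

// Hypothetical queue model: pending tasks in FIFO order.
using TaskList = std::deque<std::function<void()>>;

// Place the event marker behind everything already in the queue.
inline void enqueueEvent(TaskList& queue, MockEvent const& ev)
{
    auto flag = ev.reached;
    queue.push_back([flag] { *flag = true; });
}

// Run tasks in FIFO order until the event marker has been reached.
inline void waitFor(TaskList& queue, MockEvent const& ev)
{
    while(!ev.isComplete() && !queue.empty())
    {
        auto task = std::move(queue.front());
        queue.pop_front();
        task();
    }
}
```

Work enqueued after the marker is not required to have finished when the wait returns, which is what distinguishes waiting on an event from waiting on the whole queue.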

Set
---

* A ``Set`` sets a memory region to a specific value byte-wise.
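
"Byte-wise" means that the value is written into every byte of the region, not into every element. The following sketch uses ``std::memset`` from the C++ standard library as a stand-in to show the consequence for a buffer of 32-bit integers; ``setBytes`` is a hypothetical helper, not an alpaka function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Fill the whole buffer byte by byte, as std::memset does.
inline void setBytes(std::vector<std::uint32_t>& buf, std::uint8_t byteValue)
{
    if(!buf.empty())
        std::memset(buf.data(), byteValue, buf.size() * sizeof(std::uint32_t));
}
```

Setting the byte value ``0x01`` therefore yields elements equal to ``0x01010101`` rather than ``1``, while a byte value of ``0`` zeroes the elements as expected.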

Copy
----

* A ``Copy`` performs a deep memory copy from one memory location to another.
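
A deep copy duplicates the contents, so source and destination stay independent afterwards. A minimal host-only sketch, with ``deepCopy`` as a hypothetical helper built on ``std::copy`` rather than on any alpaka call:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Copy the contents of `src` into the separately allocated `dst`.
inline void deepCopy(std::vector<int> const& src, std::vector<int>& dst)
{
    assert(dst.size() >= src.size());
    std::copy(src.begin(), src.end(), dst.begin());
}
```

Modifying the source after the copy leaves the destination unchanged, because the two buffers do not share storage.
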
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -41,6 +41,7 @@ Individual chapters are based on the information of the chapters before.
basic/intro.rst
basic/install.rst
basic/example.rst
basic/terms.rst
basic/abstraction.rst
basic/library.rst
basic/cheatsheet.rst