From 208d8e54401f174153d72c7e0eacbb09ba52f7ee Mon Sep 17 00:00:00 2001
From: Simeon Ehrig
Date: Tue, 21 May 2024 16:29:58 +0200
Subject: [PATCH 1/5] document different alpaka terms and it relationships

---
 docs/source/basic/terms.rst            | 57 +++++++++++++++++
 docs/source/images/alpaka_terms.drawio | 86 ++++++++++++++++++++++++++
 docs/source/images/alpaka_terms.svg    |  4 ++
 docs/source/index.rst                  |  1 +
 4 files changed, 148 insertions(+)
 create mode 100644 docs/source/basic/terms.rst
 create mode 100644 docs/source/images/alpaka_terms.drawio
 create mode 100644 docs/source/images/alpaka_terms.svg

diff --git a/docs/source/basic/terms.rst b/docs/source/basic/terms.rst
new file mode 100644
index 000000000000..f91a529be084
--- /dev/null
+++ b/docs/source/basic/terms.rst
@@ -0,0 +1,57 @@
+Terms
+=====
+
+This page provides an overview of the terms used in ``alpaka`` and the relationships between them.
+
+.. image:: /images/alpaka_terms.svg
+
+Platform
+--------
+
+- A ``platform`` contains information about the system, e.g. the available devices.
+- Depending on the platform, it also contains a runtime context.
+- A ``platform`` has a least one device but it can also has many device.
+- Each ``platform`` can be used with N ``accelerator``. ``platforms`` and ``accelerator`` cannot freely combined. An ``accelerator`` supports only a specific ``platform``.
+
+Device
+------
+
+- A ``device`` represent a compute unit, such as a CPU or a GPU.
+- Each ``device`` is bounded to a specific ``platform``.
+- Each ``device`` can have N ``queues``.
+
+Accelerator
+-----------
+
+- A ``accelerator`` is a index mapping function. It distributes the index space to the chunks. The ``accelerator`` maps a continues index space to a blocked index domain decomposition.
+- It is not allowed to create an instance of an ``accelerator``.
+- A ``accelerator`` is bounded to a specific ``platform``.
+
+Queue
+-----
+
+- Stores operations which should be executed on a ``device``.
+- Operations can be ``TaskKernels``, ``Events``, ``Sets`` and ``Copies``.
+- Each ``queue`` is bounded to a specific ``device``.
+
+TaskKernel
+----------
+
+- A ``TaskKernel`` contains the algorithm which should be executed on a ``device``.
+
+Event
+-----
+
+- A ``event`` is a marker in the ``queue``.
+- ``events`` can be used to describe dependencies between different ``queues``.
+- A ``event`` allows to wait until a specific time point.
+
+Set
+---
+
+- A ``Set`` set byte wise a memory to a specific value.
+
+Copy
+----
+
+- Copies memory from memory location to another memory location.
diff --git a/docs/source/images/alpaka_terms.drawio b/docs/source/images/alpaka_terms.drawio
new file mode 100644
index 000000000000..76395fd1a8c8
--- /dev/null
+++ b/docs/source/images/alpaka_terms.drawio
@@ -0,0 +1,86 @@
[86 added lines of draw.io XML; markup not recoverable]
diff --git a/docs/source/images/alpaka_terms.svg b/docs/source/images/alpaka_terms.svg
new file mode 100644
index 000000000000..83c9b938775c
--- /dev/null
+++ b/docs/source/images/alpaka_terms.svg
@@ -0,0 +1,4 @@
[4 added lines of SVG; markup not recoverable. Diagram text labels: Platform, Device, Accelerator, Queue, TaskKernel, Event, Set, Copy; edge label "enqueue"; multiplicities "1" and "1..N"]
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 88151199d263..0ca0083181b2 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -41,6 +41,7 @@ Individual chapters are based on the information of the chapters before.
    basic/intro.rst
    basic/install.rst
    basic/example.rst
+   basic/terms.rst
    basic/abstraction.rst
    basic/library.rst
    basic/cheatsheet.rst

From 76280ec286afe27d1d8b5c12b734935c2e745788 Mon Sep 17 00:00:00 2001
From: Simeon Ehrig
Date: Thu, 23 May 2024 09:05:44 +0200
Subject: [PATCH 2/5] add HostTask documentation

CI_FILTER: ^nope
---
 docs/source/basic/terms.rst            | 14 +++++++++++---
 docs/source/images/alpaka_terms.drawio | 20 ++++++++++++++------
 docs/source/images/alpaka_terms.svg    |  2 +-
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/docs/source/basic/terms.rst b/docs/source/basic/terms.rst
index f91a529be084..a5180768ca00 100644
--- a/docs/source/basic/terms.rst
+++ b/docs/source/basic/terms.rst
@@ -11,14 +11,14 @@ Platform
 - A ``platform`` contains information about the system, e.g. the available devices.
 - Depending on the platform, it also contains a runtime context.
 - A ``platform`` has a least one device but it can also has many device.
-- Each ``platform`` can be used with N ``accelerator``. ``platforms`` and ``accelerator`` cannot freely combined. An ``accelerator`` supports only a specific ``platform``.
+- Each ``platform`` can be used with any number of ``accelerator``. ``platforms`` and ``accelerator`` cannot freely combined. An ``accelerator`` supports only a specific ``platform``.

 Device
 ------

 - A ``device`` represent a compute unit, such as a CPU or a GPU.
 - Each ``device`` is bounded to a specific ``platform``.
-- Each ``device`` can have N ``queues``.
+- Each ``device`` can have any number of ``queues``.

 Accelerator
 -----------
@@ -31,14 +31,22 @@ Queue
 -----

 - Stores operations which should be executed on a ``device``.
-- Operations can be ``TaskKernels``, ``Events``, ``Sets`` and ``Copies``.
+- Operations can be ``TaskKernels``, ``HostTasks``, ``Events``, ``Sets`` and ``Copies``.
 - Each ``queue`` is bounded to a specific ``device``.
+- A Queue can be ``Blocking`` (host thread is waiting for finishing the API call) or ``NonBlocking`` (host thread continues after calling the API independent if the call finished or not).
+- All operations in a queue will be executed sequentiell.
+- Operations in different queues runs in parallel.

 TaskKernel
 ----------

 - A ``TaskKernel`` contains the algorithm which should be executed on a ``device``.

+HostTasks
+---------
+
+- A ``HostTask`` is a functor without ``acc`` argument, which can be enqueued and is always executed on the host device.
+
 Event
 -----

diff --git a/docs/source/images/alpaka_terms.drawio b/docs/source/images/alpaka_terms.drawio
index 76395fd1a8c8..9868a465102c 100644
--- a/docs/source/images/alpaka_terms.drawio
+++ b/docs/source/images/alpaka_terms.drawio
[hunks @@ -1,4 +1,4 @@, @@ -29,7 +29,7 @@, @@ -37,13 +37,13 @@, @@ -51,7 +51,7 @@ and @@ -75,11 +75,19 @@ changing the draw.io XML; markup not recoverable]
diff --git a/docs/source/images/alpaka_terms.svg b/docs/source/images/alpaka_terms.svg
index 83c9b938775c..16326a00c2ed 100644
--- a/docs/source/images/alpaka_terms.svg
+++ b/docs/source/images/alpaka_terms.svg
@@ -1,4 +1,4 @@
[removed SVG line; markup not recoverable. Diagram text labels: Platform, Device, Accelerator, Queue, TaskKernel, Event, Set, Copy; edge label "enqueue"; multiplicities "1" and "1..N"]
\ No newline at end of file
[added SVG line; markup not recoverable. Diagram text labels: Platform, Device, Accelerator, Queue, TaskKernel, Event, Set, Copy, HostTask; edge label "enqueue"; multiplicities "1" and "1..N"]
\ No newline at end of file

From 1473738cf83e7766a0b93760e95b30c1d2413365 Mon Sep 17 00:00:00 2001
From: Simeon Ehrig
Date: Thu, 6 Jun 2024 10:34:16 +0200
Subject: [PATCH 3/5] remove image

---
 docs/source/basic/terms.rst            |  2 -
 docs/source/images/alpaka_terms.drawio | 94 --------------------------
 docs/source/images/alpaka_terms.svg    |  4 --
 3 files changed, 100 deletions(-)
 delete mode 100644 docs/source/images/alpaka_terms.drawio
 delete mode 100644 docs/source/images/alpaka_terms.svg

diff --git a/docs/source/basic/terms.rst b/docs/source/basic/terms.rst
index a5180768ca00..9f387671f8b1 100644
--- a/docs/source/basic/terms.rst
+++ b/docs/source/basic/terms.rst
@@ -3,8 +3,6 @@ Terms

 This page provides an overview of the terms used in ``alpaka`` and the relationships between them.

-.. image:: /images/alpaka_terms.svg
-
 Platform
 --------

diff --git a/docs/source/images/alpaka_terms.drawio b/docs/source/images/alpaka_terms.drawio
deleted file mode 100644
index 9868a465102c..000000000000
--- a/docs/source/images/alpaka_terms.drawio
+++ /dev/null
@@ -1,94 +0,0 @@
[94 deleted lines of draw.io XML; markup not recoverable]
diff --git a/docs/source/images/alpaka_terms.svg b/docs/source/images/alpaka_terms.svg
deleted file mode 100644
index 16326a00c2ed..000000000000
--- a/docs/source/images/alpaka_terms.svg
+++ /dev/null
@@ -1,4 +0,0 @@
[4 deleted lines of SVG; markup not recoverable. Diagram text labels: Platform, Device, Accelerator, Queue, TaskKernel, Event, Set, Copy, HostTask; edge label "enqueue"; multiplicities "1" and "1..N"]
\ No newline at end of file

From 8097c57ad489a54c22a8bdf0c3b2a6927bc12e30 Mon Sep 17 00:00:00 2001
From: René Widera
Date: Mon, 10 Jun 2024 10:43:52 +0200
Subject: [PATCH 4/5] small updates of the alpaka terminology

---
 docs/source/basic/terms.rst | 47 +++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/docs/source/basic/terms.rst b/docs/source/basic/terms.rst
index 9f387671f8b1..181fb629ff96 100644
--- a/docs/source/basic/terms.rst
+++ b/docs/source/basic/terms.rst
@@ -8,49 +8,56 @@ Platform

 - A ``platform`` contains information about the system, e.g. the available devices.
 - Depending on the platform, it also contains a runtime context.
-- A ``platform`` has a least one device but it can also has many device.
-- Each ``platform`` can be used with any number of ``accelerator``. ``platforms`` and ``accelerator`` cannot freely combined. An ``accelerator`` supports only a specific ``platform``.
-
+- A ``platform`` can be shared by many devices.
+
 Device
 ------

 - A ``device`` represent a compute unit, such as a CPU or a GPU.
 - Each ``device`` is bounded to a specific ``platform``.
-- Each ``device`` can have any number of ``queues``.
+- Each ``device`` can be used by many specific ``accelerators``.
+
+Work division
+-------------
+
+- Describes the domain decomposition of a contiguous N-dimensional index domain in ``blocks``, ``threads`` within a ``block``, and ``elements`` per ``thread``.
+- A ``work division`` has limitations depending on the ``kernel`` function and ``accelerator``.

 Accelerator
 -----------

-- A ``accelerator`` is a index mapping function. It distributes the index space to the chunks. The ``accelerator`` maps a continues index space to a blocked index domain decomposition.
-- It is not allowed to create an instance of an ``accelerator``.
-- A ``accelerator`` is bounded to a specific ``platform``.
+- Describes "how" a kernel work division is mapped to device threads.
+  - N-dimensional work divisions (1D, 2D, 3D) are supported.
+  - Holds implementations of shared memory, atomic operations, math operations etc.
+- ``Accelerators`` are instantiated only when a kernel is executed, and can only be accessed in device code.
+  - Each device function can (should) be templated on the accelerator type, and take an accelerator as its first argument.
+  - The accelerator object can be used to extract the ``work division`` and indices of the current block and thread.
+  - The accelerator type can be used to implement per-accelerator behaviours.
+- An ``accelerator`` is bounded to a specific ``platform``.

 Queue
 -----

-- Stores operations which should be executed on a ``device``.
+- Stores tasks which should be executed on a ``device``.
 - Operations can be ``TaskKernels``, ``HostTasks``, ``Events``, ``Sets`` and ``Copies``.
-- Each ``queue`` is bounded to a specific ``device``.
 - A Queue can be ``Blocking`` (host thread is waiting for finishing the API call) or ``NonBlocking`` (host thread continues after calling the API independent if the call finished or not).
-- All operations in a queue will be executed sequentiell.
-- Operations in different queues runs in parallel.
+- All operations in a queue will be executed sequential in FIFO order.
+- Operations in different queues can run in parallel.
+- ``wait()`` can be executed for queues to block the caller host thread until all previous enqueued work is finished.
+- Each ``queue`` is bounded to a specific ``device``.

-TaskKernel
------------
+Task
+----

 - A ``TaskKernel`` contains the algorithm which should be executed on a ``device``.
-
-HostTasks
-----------
-
 - A ``HostTask`` is a functor without ``acc`` argument, which can be enqueued and is always executed on the host device.

 Event
 -----

-- A ``event`` is a marker in the ``queue``.
+- A ``event`` is a marker in a ``queue``.
 - ``events`` can be used to describe dependencies between different ``queues``.
-- A ``event`` allows to wait until a specific time point.
+- A ``event`` allows to wait until all previous enqueued work in a queue has finished.

 Set
 ---
@@ -60,4 +67,4 @@ Set
 Copy
 ----

-- Copies memory from memory location to another memory location.
+- Deep memory copy from one memory to another memory location.

From 2c5316760074a7ec895f47033c5cfa9c2a695fa6 Mon Sep 17 00:00:00 2001
From: René Widera
Date: Mon, 17 Jun 2024 10:10:23 +0200
Subject: [PATCH 5/5] applay review comments

---
 docs/source/basic/terms.rst | 60 ++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/docs/source/basic/terms.rst b/docs/source/basic/terms.rst
index 181fb629ff96..301cf8ccbfd3 100644
--- a/docs/source/basic/terms.rst
+++ b/docs/source/basic/terms.rst
@@ -6,65 +6,65 @@ This page provides an overview of the terms used in ``alpaka`` and the relations
 Platform
 --------

-- A ``platform`` contains information about the system, e.g. the available devices.
-- Depending on the platform, it also contains a runtime context.
-- A ``platform`` can be shared by many devices.
+* A ``platform`` contains information about the system, e.g. the available devices.
+* Depending on the platform, it also contains a runtime context.
+* A ``platform`` can be shared by many devices.

 Device
 ------

-- A ``device`` represent a compute unit, such as a CPU or a GPU.
-- Each ``device`` is bounded to a specific ``platform``.
-- Each ``device`` can be used by many specific ``accelerators``.
+* A ``device`` represents a compute unit, such as a CPU or a GPU.
+* Each ``device`` is bound to a specific ``platform``.
+* Each ``device`` can be used by many different ``accelerators``.

 Work division
 -------------

-- Describes the domain decomposition of a contiguous N-dimensional index domain in ``blocks``, ``threads`` within a ``block``, and ``elements`` per ``thread``.
-- A ``work division`` has limitations depending on the ``kernel`` function and ``accelerator``.
+* Describes the domain decomposition of a contiguous N-dimensional index domain into ``blocks``, ``threads`` and ``elements``. A ``block`` contains one or more ``threads`` and a ``thread`` processes one or more ``elements``.
+* The valid values of a ``work division`` are limited by the ``kernel`` function and the ``accelerator`` (e.g. the maximum number of threads per block).

 Accelerator
 -----------

-- Describes "how" a kernel work division is mapped to device threads.
-  - N-dimensional work divisions (1D, 2D, 3D) are supported.
-  - Holds implementations of shared memory, atomic operations, math operations etc.
-- ``Accelerators`` are instantiated only when a kernel is executed, and can only be accessed in device code.
-  - Each device function can (should) be templated on the accelerator type, and take an accelerator as its first argument.
-  - The accelerator object can be used to extract the ``work division`` and indices of the current block and thread.
-  - The accelerator type can be used to implement per-accelerator behaviours.
-- An ``accelerator`` is bounded to a specific ``platform``.
+* Describes "how" a kernel work division is mapped to device threads.
+  * N-dimensional work divisions (1D, 2D, 3D) are supported.
+  * Provides implementations of shared memory, atomic operations, math functions, etc.
+* ``Accelerators`` are instantiated only when a kernel is executed, and can only be accessed in device code.
+  * Each device function can (and should) be templated on the accelerator type and take an accelerator as its first argument.
+  * The accelerator object can be used to extract the ``work division`` and the indices of the current block and thread.
+  * The accelerator type can be used to implement per-accelerator behaviours.
+* An ``accelerator`` is bound to a specific ``platform``.

 Queue
 -----

-- Stores tasks which should be executed on a ``device``.
-- Operations can be ``TaskKernels``, ``HostTasks``, ``Events``, ``Sets`` and ``Copies``.
-- A Queue can be ``Blocking`` (host thread is waiting for finishing the API call) or ``NonBlocking`` (host thread continues after calling the API independent if the call finished or not).
-- All operations in a queue will be executed sequential in FIFO order.
-- Operations in different queues can run in parallel.
-- ``wait()`` can be executed for queues to block the caller host thread until all previous enqueued work is finished.
-- Each ``queue`` is bounded to a specific ``device``.
+* Stores tasks to be executed on a ``device``.
+* Operations can be ``TaskKernels``, ``HostTasks``, ``Events``, ``Sets`` and ``Copies``.
+* A ``queue`` can be ``Blocking`` (the calling host thread waits until the enqueued operation has finished) or ``NonBlocking`` (the host thread continues right after the API call, whether or not the operation has finished).
+* All operations in a queue are executed sequentially in FIFO order.
+* Operations in different queues can run in parallel, even on blocking queues.
+* ``wait()`` can be called on a queue to block the calling host thread until all previously enqueued work has finished.
+* Each ``queue`` is bound to a specific ``device``.

 Task
 ----

-- A ``TaskKernel`` contains the algorithm which should be executed on a ``device``.
-- A ``HostTask`` is a functor without ``acc`` argument, which can be enqueued and is always executed on the host device.
+* A ``TaskKernel`` contains the algorithm to be executed on a ``device``.
+* A ``HostTask`` is a functor without an ``acc`` argument; it can be enqueued and is always executed on the host device.

 Event
 -----

-- A ``event`` is a marker in a ``queue``.
-- ``events`` can be used to describe dependencies between different ``queues``.
-- A ``event`` allows to wait until all previous enqueued work in a queue has finished.
+* An ``event`` is a marker in a ``queue``.
+* ``events`` can be used to describe dependencies between different ``queues``.
+* An ``event`` allows waiting until all work enqueued before it in the ``queue`` has finished.

 Set
 ---

-- A ``Set`` set byte wise a memory to a specific value.
+* A ``Set`` sets a memory region to a specific value byte-wise.

 Copy
 ----

-- Deep memory copy from one memory to another memory location.
+* A ``Copy`` performs a deep copy from one memory location to another.
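The ``Work division`` entry describes splitting an N-dimensional index domain into blocks, threads per block, and elements per thread. A minimal 1D sketch of that arithmetic in plain C++17 follows. ``WorkDiv1D``, ``makeWorkDiv`` and ``globalElemIdx`` are hypothetical names, not alpaka's API, and the contiguous per-thread element mapping is one common choice assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 1D work division; the names are illustrative, not alpaka's API.
struct WorkDiv1D {
    std::size_t blocks;           // number of blocks in the grid
    std::size_t threadsPerBlock;  // threads in each block
    std::size_t elemsPerThread;   // elements processed by each thread
};

// Picks the smallest block count so that blocks * threadsPerBlock * elemsPerThread >= n.
inline WorkDiv1D makeWorkDiv(std::size_t n, std::size_t threadsPerBlock, std::size_t elemsPerThread) {
    assert(threadsPerBlock > 0 && elemsPerThread > 0);
    std::size_t const elemsPerBlock = threadsPerBlock * elemsPerThread;
    std::size_t const blocks = (n + elemsPerBlock - 1) / elemsPerBlock;  // ceiling division
    return WorkDiv1D{blocks, threadsPerBlock, elemsPerThread};
}

// Global index of element e handled by thread t of block b, assuming each
// thread owns a contiguous chunk of elemsPerThread elements.
inline std::size_t globalElemIdx(WorkDiv1D const& wd, std::size_t b, std::size_t t, std::size_t e) {
    std::size_t const globalThreadIdx = b * wd.threadsPerBlock + t;
    return globalThreadIdx * wd.elemsPerThread + e;
}
```

For ``n = 1000`` with 128 threads per block and 4 elements per thread, ``makeWorkDiv`` yields 2 blocks that cover 1024 indices, so a kernel using this decomposition still has to skip indices ``>= n``.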
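The ``Accelerator`` entry says device functions should be templated on the accelerator type and receive the accelerator as their first argument. The host-only sketch below shows that calling pattern with a toy accelerator; ``SerialToyAcc``, ``ScaleKernel`` and ``launchSerial`` are hypothetical stand-ins. In real alpaka code the kernel's ``operator()`` is additionally marked ``ALPAKA_FN_ACC``, and indices come from calls such as ``alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)`` rather than from plain struct members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy accelerator: exposes only the global thread index and thread count.
struct SerialToyAcc {
    std::size_t globalThreadIdx;
    std::size_t globalThreadCount;
};

// Kernel templated on the accelerator type; the accelerator is the first argument,
// so the same kernel body can be instantiated for different accelerator types.
struct ScaleKernel {
    template<typename TAcc>
    void operator()(TAcc const& acc, float* data, std::size_t n, float factor) const {
        // Grid-stride loop: thread i handles indices i, i + count, i + 2 * count, ...
        for (std::size_t i = acc.globalThreadIdx; i < n; i += acc.globalThreadCount) {
            data[i] *= factor;
        }
    }
};

// Runs the kernel once per simulated thread, one after another on the host.
template<typename TKernel, typename... TArgs>
void launchSerial(std::size_t threadCount, TKernel const& kernel, TArgs&&... args) {
    assert(threadCount > 0);  // require at least one simulated thread
    for (std::size_t t = 0; t < threadCount; ++t) {
        kernel(SerialToyAcc{t, threadCount}, args...);
    }
}
```

With two simulated threads and a five-element ``std::vector<float>``, thread 0 scales indices 0, 2 and 4 and thread 1 scales indices 1 and 3, so every element is processed exactly once.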
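The ``Queue`` entries state that operations in one queue run sequentially in FIFO order and that ``wait()`` blocks the calling host thread until previously enqueued work has finished. The toy model below illustrates those two guarantees for a ``NonBlocking``-style queue using only the C++ standard library. ``ToyQueue`` and ``runFifoDemo`` are hypothetical names, and this is not how alpaka implements its queues (on a GPU backend a queue maps onto e.g. a CUDA stream).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy model of a NonBlocking, in-order queue (not alpaka's implementation):
// a single worker thread runs the enqueued tasks one at a time in FIFO order.
class ToyQueue {
public:
    ToyQueue() : worker_([this] { run(); }) {}

    ~ToyQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();  // the worker drains any remaining tasks before exiting
    }

    ToyQueue(ToyQueue const&) = delete;
    ToyQueue& operator=(ToyQueue const&) = delete;

    // Returns immediately; the task runs later on the worker thread.
    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
            ++pending_;
        }
        cv_.notify_all();
    }

    // Blocks the calling host thread until all previously enqueued tasks have finished.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
            if (tasks_.empty()) {
                return;  // stop requested and nothing left to do
            }
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();
            task();  // runs outside the lock, strictly one task at a time
            lock.lock();
            --pending_;
            cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the members above exist
};

// Enqueues n tasks that record their index, waits, and returns the observed order.
inline std::vector<int> runFifoDemo(int n) {
    assert(n >= 0);
    std::vector<int> order;
    ToyQueue queue;
    for (int i = 0; i < n; ++i) {
        queue.enqueue([&order, i] { order.push_back(i); });
    }
    queue.wait();
    return order;
}
```

In this toy model, a ``Blocking`` variant could simply run each task inside ``enqueue`` on the calling thread, which would make ``wait()`` trivially satisfied.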
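The ``Set`` entry stresses that the value is applied byte-wise. The consequence is easiest to see with ``std::memset`` on the host, used here as a stand-in rather than alpaka's own set operation: filling 32-bit integers with the byte ``1`` yields ``0x01010101`` per element, not ``1``. A byte-wise set can therefore only produce element values whose bytes are all identical, such as ``0`` or all bits set. ``setBytewise`` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fills `count` 32-bit integers by writing `byteValue` into every byte,
// mimicking the byte-wise semantics of a memset-style operation.
inline std::vector<std::uint32_t> setBytewise(std::size_t count, unsigned char byteValue) {
    std::vector<std::uint32_t> buffer(count);
    if (!buffer.empty()) {
        std::memset(buffer.data(), byteValue, buffer.size() * sizeof(std::uint32_t));
    }
    return buffer;
}
```

Because every byte receives the same value, the result does not depend on the host's endianness.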