
Optimise logic in ReorderBuffer::commitMicroOps #366

Open · wants to merge 3 commits into base: dev
Conversation

@rahahahat (Contributor):

No description provided.

}
if (!validForCommit) return;
size_t index = 0;
uint8_t ins_cnt = 0;
@rahahahat (Contributor, Author) commented on Dec 22, 2023:

@jj16791 Can the number of micro-ops generated per macro-op exceed 256?

Contributor:

I think this is the number of micro-ops for a single macro-op rather than the total micro-ops in the ROB.

In that case, the max number of micro-ops generated per macro-op is implementation defined. The only candidate I can think of among currently supportable instructions is the AArch64 gather/scatter loads IF the vector length is 2048 bits and you are doing a byte load (i.e. loading 256 byte elements).

However, checking the spec, the gather/scatter instructions which load/store single bytes only permit the source/destination register to be of the .S or .D variant, meaning that there would be a max of 64 or 32 elements per vector respectively to scatter or gather.

In summary: no, the number of micro-ops per macro-op is very, very unlikely to exceed 256.
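As a purely illustrative cross-check of the element-count arithmetic above (the 2048-bit figure is the SVE architectural maximum mentioned in the comment, not a value taken from the codebase):

```cpp
// Sketch only: micro-ops for a gather/scatter = elements per vector
//            = vector length / element size.
constexpr unsigned vlBits = 2048;         // assumed max SVE vector length
constexpr unsigned elemsS = vlBits / 32;  // .S (32-bit) elements -> 64
constexpr unsigned elemsD = vlBits / 64;  // .D (64-bit) elements -> 32
static_assert(elemsS == 64 && elemsD == 32, "both well below 256");
```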

@rahahahat (Contributor, Author) commented on Dec 22, 2023:

Sorry, I should've specified: I did mean the number of micro-ops per macro-op.

In summary: no, the number of micro-ops per macro-op is very, very unlikely to exceed 256.

When you say "very, very unlikely", that means it could exceed 256, right? If that's the case, the data type above needs to be widened, otherwise the logic is broken.

Contributor:

Currently it could never exceed 256, but in the future there is an extremely small chance it could. For now I would keep it the same.

Contributor:

After a full review, there would be no harm in making this a uint16 to "future proof" this
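A minimal sketch of that suggestion, assuming the declaration shown in the diff above (illustrative only):

```cpp
#include <cstdint>

// Widened from uint8_t so a macro-op could, in principle, expand to more
// than 255 micro-ops without the counter wrapping.
uint16_t ins_cnt = 0;
```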

@FinnWilkinson (Contributor):

Please add a description, assignee, and labels, and ensure the status on the associated GitHub project is correct.

@rahahahat self-assigned this on Dec 22, 2023
@rahahahat added the 'performance' (Performance optimisation) label on Dec 22, 2023
@rahahahat (Contributor, Author):

Please add a description, assignee, and labels, and ensure the status on the associated GitHub project is correct.

I've added labels and an assignee. I don't think the description is necessary, as the title communicates everything that is done; I've only changed one function.

@FinnWilkinson (Contributor) left a review:

All looks fine, just a few comments made.

Could you add to the description of this PR the performance improvement seen on your local system and on the Mac Studio (in Release mode)? Looking at the Jenkins performance pipeline results, there doesn't seem to be much improvement (with the exception of triad_gcc_a64fx). There are also 2 regressions...

@@ -2,6 +2,7 @@
.vscode
.idea
.DS_Store
.cache
Contributor:

This is duplicated from PR #353; please remove it from this PR or close the other one.

}
if (!validForCommit) return;
size_t index = 0;
uint8_t ins_cnt = 0;
Contributor:

To avoid confusion akin to the other comment left on this variable, perhaps rename it to uopCount or microOpCount?

if (mop_id == insnId) {
if (!uop->isWaitingCommit()) return;
ins_cnt++;
} else if (ins_cnt && mop_id != insnId) {
Contributor:

The mop_id != insnId check in this else if seems redundant, given that this branch can only be reached when that condition is already true.
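A sketch of the suggested simplification: the else branch is only reachable when `mop_id == insnId` is false, so the second check can be dropped:

```cpp
if (mop_id == insnId) {
  if (!uop->isWaitingCommit()) return;
  ins_cnt++;
} else if (ins_cnt) {  // mop_id != insnId is already implied here
  break;
}
```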

Comment on lines +44 to +62
while (index < bsize) {
auto& uop = buffer_[index];
uint64_t mop_id = uop->getInstructionId();

if (mop_id == insnId) {
if (!uop->isWaitingCommit()) return;
ins_cnt++;
} else if (ins_cnt && mop_id != insnId) {
break;
}
index++;
}

index = index - ins_cnt;
for (int x = 0; x < ins_cnt; x++) {
buffer_[index]->setCommitReady();
index++;
}

Contributor:

Generally looks good. I think some comments akin to those in the previous implementation, plus one explaining that the setCommitReady logic is only reached iff all micro-ops of a macro-op are waitingCommit, would be beneficial.
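For illustration, a commented sketch of the loop (identifiers as in the diff; the comments reflect one reading of the logic, not confirmed author intent):

```cpp
// Scan the buffer for the contiguous run of micro-ops belonging to insnId.
// Bail out early if any of them is not yet waiting to commit: the
// setCommitReady logic below must only run once ALL micro-ops of the
// macro-op are waiting to commit.
while (index < bsize) {
  auto& uop = buffer_[index];
  uint64_t mop_id = uop->getInstructionId();

  if (mop_id == insnId) {
    if (!uop->isWaitingCommit()) return;
    ins_cnt++;  // another micro-op of this macro-op found
  } else if (ins_cnt) {
    break;      // reached the end of the contiguous run
  }
  index++;
}

// Rewind to the first micro-op of the run and mark the whole run as
// ready to commit.
index = index - ins_cnt;
for (int x = 0; x < ins_cnt; x++) {
  buffer_[index]->setCommitReady();
  index++;
}
```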


}

index = index - ins_cnt;
for (int x = 0; x < ins_cnt; x++) {
Contributor:

Could x be given the same type as ins_cnt?
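i.e., a sketch of that tweak:

```cpp
for (uint8_t x = 0; x < ins_cnt; x++) {  // x now matches ins_cnt's type
  buffer_[index]->setCommitReady();
  index++;
}
```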

@rahahahat (Contributor, Author):

All looks fine, just a few comments made.

Could you add to the description of this PR the performance improvement seen on your local system and on the Mac Studio (in Release mode)? Looking at the Jenkins performance pipeline results, there doesn't seem to be much improvement (with the exception of triad_gcc_a64fx). There are also 2 regressions...

Hmm, that seems strange. My access to the Mac Studio is a bit messed up given my laptop broke; let me see what I can do to address this. It might be compiler-dependent, but I will investigate.
Thanks.

@dANW34V3R (Contributor) left a review:

This seems like a very "minimal gains" optimisation: there is no fundamental algorithmic change, rather fewer repeated array loads/getter calls in the high-level language (which may not survive compilation anyway). I would need to see actual performance gains before approving. Some comments explaining what is going on are also needed (it took me a bit of time to work it all through and understand).

@jj16791 (Contributor) commented on Jan 30, 2024:

#rerun tests

Labels: performance (Performance optimisation)
Project status: Changes Requested
Linked issues: none yet
4 participants