From 491f5fbf3a480a3be4018e03a0ba195d65902199 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Thu, 11 May 2023 17:16:36 +0100
Subject: [PATCH]

---
 openpower/sv/svp64.mdwn | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn
index 0f198f13e..e0c0a9ee0 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -372,21 +372,6 @@ the example having VL=5. Thus on "wrapping" - sequential progression from GPR(1)
 into GPR(2) - the 5th result modifies **only** the bottom 16 LSBs of GPR(1).
 
-*Engineering note: to avoid a Read-Modify-Write at the register
-file it is strongly recommended to implement byte-level write-enable lines
-exactly as has been implemented in DRAM ICs for many decades. Additionally
-the predicate mask bit is advised to be associated with the element
-operation and alongside the result ultimately passed to the register file.
-When element-width is set to 64-bit the relevant predicate mask bit
-may be repeated eight times and pull all eight write-port byte-level
-lines HIGH. Clearly when element-width is set to 8-bit the relevant
-predicate mask bit corresponds directly with one single byte-level
-write-enable line. It is up to the Hardware Architect to then amortise
-(merge) elements together into both PredicatedSIMD Pipelines as well
-as simultaneous non-overlapping Register File writes, to achieve High
-Performance designs. Overall it helps to think of the GPR and FPR
-register files as being much more akin to a 64-bit-wide byte-level-addressable SRAM.*
-
 If the 16-bit operation were to be followed up with a 32-bit Vectorised
 Operation, the exact same contents would be viewed as follows:
 
@@ -410,6 +395,21 @@ form because `MSR.LE` is directly in control of the Memory-to-Register
 byte-ordering. This section is exclusively about how to correctly perceive
 Simple-V-Augmented **Register** Files.
 
+*Engineering note: to avoid a Read-Modify-Write at the register
+file it is strongly recommended to implement byte-level write-enable lines
+exactly as has been implemented in DRAM ICs for many decades. Additionally
+the predicate mask bit is advised to be associated with the element
+operation and alongside the result ultimately passed to the register file.
+When element-width is set to 64-bit the relevant predicate mask bit
+may be repeated eight times and pull all eight write-port byte-level
+lines HIGH. Clearly when element-width is set to 8-bit the relevant
+predicate mask bit corresponds directly with one single byte-level
+write-enable line. It is up to the Hardware Architect to then amortise
+(merge) elements together into both PredicatedSIMD Pipelines as well
+as simultaneous non-overlapping Register File writes, to achieve High
+Performance designs. Overall it helps to think of the GPR and FPR
+register files as being much more akin to a 64-bit-wide byte-level-addressable SRAM.*
+
 **Comparative equivalent using VSR registers**
 
 For a comparative data point the VSR Registers may be expressed in the
@@ -425,7 +425,6 @@ element (numbered zero) being at the bitwise-numbered **LSB** end of the
 register, where VSX does the reverse: places the numerically-*highest*
 (last-numbered) element at the LSB end of the register.
 
-
 ```
 #pragma pack
 typedef union {
-- 
2.30.2
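
Review note on the relocated Engineering note: the mapping it describes, from a per-element predicate bit to byte-level write-enable lines, could be modelled in software roughly as below. This is only a sketch under stated assumptions, not text from the specification: the names `byte_write_enable` and `masked_write` are hypothetical, element widths are given in bytes (1, 2, 4 or 8), and byte 0 is assumed to sit at the LSB end of the 64-bit register, matching the element numbering used in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the specification): expand a
 * per-element predicate mask into a per-byte write-enable mask for one
 * 64-bit register. Bit i of the result enables byte i of the register,
 * with byte 0 assumed to be at the LSB end. */
static uint8_t byte_write_enable(uint8_t pred, unsigned elwidth_bytes)
{
    assert(elwidth_bytes == 1 || elwidth_bytes == 2 ||
           elwidth_bytes == 4 || elwidth_bytes == 8);

    unsigned n_elems = 8 / elwidth_bytes;
    /* One element's worth of enable lines: elwidth_bytes consecutive ones. */
    uint8_t lanes = (uint8_t)((1u << elwidth_bytes) - 1u);
    uint8_t wen = 0;

    for (unsigned e = 0; e < n_elems; e++) {
        /* Each predicate bit is repeated across its element's bytes. */
        if (pred & (1u << e))
            wen |= (uint8_t)(lanes << (e * elwidth_bytes));
    }
    return wen;
}

/* Software model of a write port with byte-level enables: only enabled
 * bytes take the new result. The model has to merge old and new values
 * explicitly; in hardware the enable lines make that merge unnecessary,
 * which is the Read-Modify-Write the note wants to avoid. */
static uint64_t masked_write(uint64_t old, uint64_t result, uint8_t wen)
{
    uint64_t bitmask = 0;

    for (unsigned b = 0; b < 8; b++) {
        if (wen & (1u << b))
            bitmask |= 0xFFull << (8 * b);
    }
    return (old & ~bitmask) | (result & bitmask);
}
```

With these assumptions a single set predicate bit at 64-bit element width enables all eight byte lines, while at 8-bit element width each predicate bit maps onto exactly one line, as the note states. Merging several elements into one register write then amounts to OR-ing their enable masks before calling `masked_write` once.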