From: Luke Kenneth Casson Leighton
Date: Mon, 3 Apr 2023 14:06:20 +0000 (+0100)
Subject: add example of GPR-wrapping
X-Git-Tag: opf_rfc_ls012_v1~154
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=cffd66c857f15dcd70ec8def0b2d4c7a50b483a0;p=libreriscv.git

add example of GPR-wrapping
---

diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn
index a4980c339..b439f5130 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -233,9 +233,24 @@ From this Canonical definition it should be clear that sequential elements
 begin at the LSB end of any given underlying Scalar GPR, progress to the MSB
 end, and then to the LSB end of the *next numerically-larger Scalar GPR*.
 In the example above if VL=5 and RT=1 then the contents of GPR(1) and GPR(2) will
-be as follows:
+be as follows. For clarity in the table below:
 
+* Both MSB0-ordered bitnumbering *and* LSB-ordered bitnumbering are shown
+* The GPR-numbering is considered LSB0-ordered
+* The Element-numbering (result0-result4) is LSB0-ordered
 
+```
+    | MSB0:  | 0:15    | 16:31   | 32:47   | 48:63   |
+    | LSB0:  | 63:48   | 47:32   | 31:16   | 15:0    |
+    |--------|---------|---------|---------|---------|
+    | GPR(0) | result3 | result2 | result1 | result0 |
+    | GPR(1) | same    | same    | same    | result4 |
+```
+
+Note that the upper 48 bits of GPR(1) would **not** be modified because
+the example has VL=5. Thus on "wrapping" - sequential progression from
+GPR(0) into GPR(1) - the 5th result modifies
+**only** the bottom 16 LSBs of GPR(1).
 
 Hardware Architectural note: to avoid a Read-Modify-Write at the register
 file it is strongly recommended to implement byte-level write-enable lines
@@ -249,7 +264,8 @@ predicate mask bit corresponds directly with one single byte-level
 write-enable line. It is up to the Hardware Architect to then amortise
 (merge) elements together into both PredicatedSIMD Pipelines as well
 as simultaneous non-overlapping Register File writes, to achieve High
-Performance designs.
+Performance designs. Overall it helps to think of the register files
+as being much more akin to a byte-level-addressable SRAM.
 
 **Comparative equivalent using VSR registers**
 
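
The GPR-wrapping behaviour added by this patch can be sketched in a few lines of Python. This is only an illustrative model, not the specification's pseudocode. It follows the patch's own suggestion of treating the register file as a byte-addressable SRAM. The names `REG_BYTES`, `write_elements` and `read_gpr` are hypothetical, as is the little-endian byte layout within each 64-bit register. The element width is assumed to be 16 bits and RT is taken as 0 so that register numbers line up with the row labels in the table.

```python
REG_BYTES = 8  # one 64-bit GPR, modelled as 8 consecutive bytes (assumption)


def write_elements(regfile, rt, results, elwidth=16):
    """Write VL = len(results) elements starting at the LSB end of GPR(rt).

    Element i lands at byte offset rt*8 + i*(elwidth/8), so once a register
    is full the next element falls into the LSB end of the next register.
    Only the bytes covered by an element are touched, mimicking byte-level
    write-enable lines (no read-modify-write of the whole register).
    """
    step = elwidth // 8
    mask = (1 << elwidth) - 1
    for i, value in enumerate(results):
        offset = rt * REG_BYTES + i * step
        regfile[offset:offset + step] = (value & mask).to_bytes(step, "little")


def read_gpr(regfile, n):
    """Return the 64-bit value of GPR(n), with bit 0 as the LSB."""
    return int.from_bytes(regfile[n * REG_BYTES:(n + 1) * REG_BYTES], "little")


# Four registers pre-filled with 0xFF so that untouched bytes stay visible.
regfile = bytearray(b"\xff" * (4 * REG_BYTES))

# VL=5, 16-bit elements: result0..result4 (arbitrary illustrative values).
write_elements(regfile, rt=0, results=[0xA000, 0xA001, 0xA002, 0xA003, 0xA004])

gpr0 = read_gpr(regfile, 0)
gpr1 = read_gpr(regfile, 1)
print(f"GPR(0) = {gpr0:016x}")  # GPR(0) = a003a002a001a000
print(f"GPR(1) = {gpr1:016x}")  # GPR(1) = ffffffffffffa004
```

In this model the first four results fill GPR(0) from the LSB end upwards. The fifth result wraps into the bottom 16 bits of GPR(1), whose upper 48 bits keep their previous contents, matching the "same" cells in the table.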