From 4e35f44f057adc516b137b03a29f0a476e097dc3 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sat, 13 May 2023 12:03:38 +0100
Subject: [PATCH]

---
 openpower/sv/normal.mdwn | 46 +++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/openpower/sv/normal.mdwn b/openpower/sv/normal.mdwn
index 8e3a539a6..b0c0516ef 100644
--- a/openpower/sv/normal.mdwn
+++ b/openpower/sv/normal.mdwn
@@ -47,15 +47,14 @@
 The Mode table for Arithmetic and Logical operations, being bits 19-23 of
 SVP64 `RM`, is laid out as follows:
 
-| 0-1 | 2 | 3 4 | description |
-| --- | --- |---------|-------------------------- |
-| 00 | 0 | dz sz | simple mode |
-| 00 | 1 | 0 RG | scalar reduce mode (mapreduce) |
-| 00 | 1 | 1 / | reserved |
-| 01 | inv | CR-bit | Rc=1: ffirst CR sel |
-| 01 | inv | VLi RC1 | Rc=0: ffirst z/nonz |
-| 10 | N | dz sz | sat mode: N=0/1 u/s |
-| 11 | / | / / | reserved |
+| 0-1 | 2 | 3 4 | description |
+| ------ | --- |---------|-------------------------- |
+| 0 0 | 0 | dz sz | simple mode |
+| 0 0 | 1 | 0 RG | scalar reduce mode (mapreduce) |
+| 0 0 | 1 | 1 / | reserved |
+| 1 0 | N | dz sz | sat mode: N=0/1 u/s |
+| VLi 1 | inv | CR-bit | Rc=1: ffirst CR sel |
+| VLi 1 | inv | zz RC1 | Rc=0: ffirst z/nonz |
 
 Fields:
 
@@ -165,18 +164,24 @@ each case the assumption is that vector elements are required
 to appear to be executed in sequential Program Order. When REMAP is
 not active, element 0 would be the first.
 
-Data-driven (CR-field-driven) fail-on-first activates when Rc=1 or other
-CR-creating operation produces a result (including cmp).  Similar to
-Branch-Conditional,
-an analysis of the CR is performed and if the test fails, the
-vector operation terminates and discards all element operations **at and
-above the current one**, and VL is truncated to either the *previous*
+Arithmetic/Logical Data-driven (CR-field-driven) fail-on-first performs a
+test of the result, similar to
+Branch-Conditional `BO` field testing, and if the test fails, the
+Vector Loop operation terminates, and VL is truncated to either the *previous*
 element or the current one, depending on whether VLi (VL "inclusive")
 is clear or set, respectively.
 
 Thus the new VL comprises a contiguous vector of results, all of which
 pass the testing criteria (equal to zero, less than zero etc as defined
-by the CR-bit test).
+by the CR-bit test). When Rc=1 the Condition Register Field for
+the element just tested is always written out (regardless of VLi).
+
+* **VLi=0** Only elements that passed the test are written out. When Rc=1
+  the co-result CR Field element is written out (even if the current test failed).
+  Vector length is truncated to "elements that passed"
+* **VLi=1** Elements that were *tested* are written out. When Rc=1
+  the co-result CR Field element is written out.
+  Vector length is truncated to "elements tested up to the first fail point"
 
 *Note: when VLi is clear, the behaviour at first seems counter-intuitive.
 A result is calculated but if the test fails it is prohibited from being
@@ -189,8 +194,9 @@ or RVV. At the same time it is "old" because it is almost identical
 to a generalised form of Z80's `CPIR` instruction. It is extremely useful
 for reducing instruction count, however requires speculative execution
 involving modifications of VL to get high performance implementations.
-An additional mode (RC1=1) effectively turns what would otherwise be an
-arithmetic operation into a type of `cmp`.  The CR is stored (and the
+An additional mode (RC1=1) allows instructions that would not normally
+have an Rc=1 mode to at least be tested for zero or non-zero.
+The CR is stored (and the
 CR.eq bit tested against the `inv` field).
 If the CR.eq bit is equal to `inv` then the Vector is truncated and
 the loop ends.
@@ -228,7 +234,9 @@ the test-failure point **MUST** be cancelled. This is no different
 from standard Out-of-Order Execution and the modification effort
 to efficiently support Data-Dependent Fail-First within a pre-existing
 Multi-Issue Out-of-Order Engine is anticipated to be minimal. In-Order systems on
-the other hand are expected, unavoidably, to be low-performance*.
+the other hand are expected, unavoidably, to be low-performance unless they
+also make use of `SVSTATE.hphint` and exploit it to safely implement rudimentary
+Shadow-Commit-Hold normally only found in Out-of-Order systems*.
 
 Two extremely important aspects of ffirst are:
 
-- 
2.30.2
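The VLi=0 / VLi=1 truncation rules introduced by this patch can be illustrated with a small behavioural model. This is a hypothetical Python sketch, not taken from the specification or from any simulator: the names `cr_field` and `ffirst_rc1`, the (LT, GT, EQ, SO) tuple layout, and the use of a subtract as the element operation are illustrative assumptions. It also assumes, generalising what the RC1 paragraph states for CR.eq, that the test fails when the selected CR bit equals `inv`. Predication, saturation and REMAP are left out; `None` marks an element that is not written (and would keep its previous contents).

```python
import operator
from typing import Callable, List, Optional, Sequence, Tuple

# CR Field bit positions: LT, GT, EQ, SO.
LT, GT, EQ, SO = range(4)

CRField = Tuple[bool, bool, bool, bool]


def cr_field(result: int) -> CRField:
    """Hypothetical Rc=1 CR Field for a signed result (SO modelled as always clear)."""
    return (result < 0, result > 0, result == 0, False)


def ffirst_rc1(
    op: Callable[[int, int], int],
    a: Sequence[int],
    b: Sequence[int],
    vl: int,
    cr_bit: int,
    inv: bool,
    vli: bool,
) -> Tuple[List[Optional[int]], List[Optional[CRField]], int]:
    """Hypothetical model of Rc=1 data-dependent fail-first.

    Returns (results, cr_fields, new_vl). Assumption: a test fails when
    the selected CR bit equals inv.
    """
    rt: List[Optional[int]] = [None] * vl
    cr: List[Optional[CRField]] = [None] * vl
    for i in range(vl):  # no REMAP: element 0 is first
        result = op(a[i], b[i])
        field = cr_field(result)
        cr[i] = field  # Rc=1: the tested element's CR Field is always written
        if field[cr_bit] == inv:  # test failed: the Vector Loop ends here
            if vli:
                rt[i] = result  # VLi=1: the tested (failing) element is kept
                return rt, cr, i + 1
            return rt, cr, i  # VLi=0: only elements that passed are kept
        rt[i] = result
    return rt, cr, vl


if __name__ == "__main__":
    a, b = [5, 3, 7, 2], [1, 3, 2, 2]  # element results would be 4, 0, 5, 0
    # Stop at the first zero result: test EQ with inv=True.
    print(ffirst_rc1(operator.sub, a, b, 4, EQ, True, vli=False))  # new VL = 1
    print(ffirst_rc1(operator.sub, a, b, 4, EQ, True, vli=True))   # new VL = 2
```

With VLi=0 the zero result of element 1 is discarded while its CR Field is still written; with VLi=1 that element's result is kept as well. Fixing `cr_bit` to `EQ` also gives a rough model of the RC1 zero/non-zero test.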