From 55e308f16fe0771bf3deaea741f0cce33a768cbf Mon Sep 17 00:00:00 2001
From: lkcl
Date: Thu, 31 Dec 2020 13:24:25 +0000
Subject: [PATCH]

---
 openpower/sv/overview.mdwn | 71 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/openpower/sv/overview.mdwn b/openpower/sv/overview.mdwn
index 218395d95..39a5f1e00 100644
--- a/openpower/sv/overview.mdwn
+++ b/openpower/sv/overview.mdwn
@@ -368,6 +368,77 @@ of the destination.
 The only situation where a full overwrite occurs is on "default" behaviour. This is extremely important to consider the register file as a byte-level store, not a 64-bit-level store.
+## Source and Destination overrides
+
+A minor fly in the ointment: what happens if the source and destination are overridden to different widths? For example, FP16 arithmetic is not accurate enough: performing the calculation at FP16 and only then up-converting to an FP32 output may introduce rounding errors. The rule is therefore:
+
+    The operation MUST take place at the larger of the two widths
+
+In pseudocode this is:
+
+    for i = 0 to VL-1:
+        src1 = get_polymorphed_reg(RA, srcwid, i)
+        src2 = get_polymorphed_reg(RB, srcwid, i)
+        opwidth = max(srcwid, destwid)
+        result = op_add(src1, src2, opwidth) # at max width
+        set_polymorphed_reg(rd, destwid, i, result)
+
+Under some conditions, extending the source registers and then truncating the result merely discards bits that never mattered: the operation might as well have taken place at the narrower width, which could save resources. Logical OR is one example: zero-extension places zeros in the upper bits of the sources, and truncating the result throws those zeros away again.
+
+Counterexamples include the previously mentioned FP16 arithmetic: for operations such as dividing large numbers by very small ones, it should be clear that internal accuracy plays a major role in the result.
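The max-width rule and its counterexamples can also be illustrated with integers. The following is a minimal Python sketch, not part of the SV specification: it assumes unsigned elements that are zero-extended on read, and the helpers `mask`, `elwidth_op` and `narrow_op` are hypothetical stand-ins for `get_polymorphed_reg`, `set_polymorphed_reg` and the operation itself.

```python
import operator


def mask(value: int, width: int) -> int:
    """Truncate value to its low `width` bits (unsigned model)."""
    return value & ((1 << width) - 1)


def elwidth_op(op, src1: int, src2: int, srcwid: int, destwid: int) -> int:
    """Hypothetical model of the rule: calculate at max(srcwid, destwid),
    then truncate to destwid."""
    opwidth = max(srcwid, destwid)
    a, b = mask(src1, srcwid), mask(src2, srcwid)  # read at srcwid, zero-extend
    result = mask(op(a, b), opwidth)               # calculation at max width
    return mask(result, destwid)                   # truncation follows afterwards


def narrow_op(op, src1: int, src2: int, srcwid: int, destwid: int) -> int:
    """Counter-model for comparison: calculate at srcwid only,
    then zero-extend (or truncate) to destwid."""
    a, b = mask(src1, srcwid), mask(src2, srcwid)
    return mask(mask(op(a, b), srcwid), destwid)


# 8-bit sources, 16-bit destination
print(elwidth_op(operator.mul, 200, 200, 8, 16))  # 40000
print(narrow_op(operator.mul, 200, 200, 8, 16))   # 64: upper bits lost
print(elwidth_op(operator.or_, 0xA5, 0x0F, 8, 16),
      narrow_op(operator.or_, 0xA5, 0x0F, 8, 16))  # 175 175
```

Logical OR gives the same answer either way, so an implementation could compute it at the narrower width; the multiply does not, because the narrow calculation discards the upper bits of 40000 before they can reach the 16-bit destination.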
+Hence the rule: the calculation takes place at the maximum bitwidth, and truncation follows afterwards.
+
+## Signed arithmetic
+
+What happens when the operation involves signed arithmetic? Here the implementor has to use common sense and make sure the behaviour is accurately documented. If the result of the unmodified operation is sign-extended because one of the inputs is signed, then the source operands must first be read at their overridden bitwidth and *then* sign-extended:
+
+    for i = 0 to VL-1:
+        src1 = get_polymorphed_reg(RA, srcwid, i)
+        src2 = get_polymorphed_reg(RB, srcwid, i)
+        opwidth = max(srcwid, destwid)
+        # sources known to be narrower than the result width
+        src1 = sign_extend(src1, srcwid, destwid)
+        src2 = sign_extend(src2, srcwid, destwid)
+        result = op_signed(src1, src2, opwidth) # at max width
+        set_polymorphed_reg(rd, destwid, i, result)
+
+The key point is that the cue for whether to sign-extend is taken from the underlying, unmodified operation.
+
+## Saturation
+
+Audio DSPs need to be able to clip sound when the "volume" is adjusted: if the signal is too loud and wraps around, distortion occurs. The solution is to clip (saturate) the audio and allow this to be detected. In practical terms this is a post-result analysis; however, it needs to take place at the largest bitwidth, i.e. before the result is truncated to the element width. Only then can the arithmetic saturation condition be detected:
+
+    for i = 0 to VL-1:
+        src1 = get_polymorphed_reg(RA, srcwid, i)
+        src2 = get_polymorphed_reg(RB, srcwid, i)
+        opwidth = max(srcwid, destwid)
+        # unsigned add
+        result = op_add(src1, src2, opwidth) # at max width
+        # now saturate (unsigned)
+        sat = min(result, (1<