* This page: [http://libre-soc.org/openpower/sv/overview](http://libre-soc.org/openpower/sv/overview)
* [FOSDEM2021 SimpleV for OpenPOWER](https://fosdem.org/2021/schedule/event/the_libresoc_project_simple_v_vectorisation/)
+* FOSDEM2021 presentation <https://www.youtube.com/watch?v=FS6tbfyb2VA>
* [[discussion]] and
[bugreport](https://bugs.libre-soc.org/show_bug.cgi?id=556)
feel free to add comments, questions.
* [[SV|sv]]
* [[sv/svp64]]
+* [x86 REP instruction](https://c9x.me/x86/html/file_module_x86_id_279.html):
+ a useful way to quickly understand that the core of the SV concept
+ is not new.
+* [Article about register tagging](http://science.lpnu.ua/sites/default/files/journal-paper/2019/jul/17084/volum3number1text-9-16_1.pdf) showing
+ that tagging is not a new idea either. Register tags
+ are also used in the Mill Architecture.
-Contents:
[[!toc]]
or in the assembly code.
SimpleV takes the Cray style Vector principle and applies it in the
-abstract to a Scalar ISA, in the process allowing register file size
-increases using "tagging" (similar to how x86 originally extended
+abstract to a Scalar ISA, in the same way that x86 does with its "REP"
+instruction. In the process, "context" is applied, allowing amongst
+other things a register file size
+increase using "tagging" (similar to how x86 originally extended
registers from 32 to 64 bit).
## SV
-The fundamentals are:
+The fundamentals are (just like x86 "REP"):
* The Program Counter (PC) gains a "Sub Counter" context (Sub-PC)
* Vectorisation pauses the PC and runs a Sub-PC loop from 0 to VL-1
src1 = get_polymorphed_reg(RA, srcwid, i)
src2 = get_polymorphed_reg(RB, srcwid, i)
result = src1 + src2 # actual add here
- set_polymorphed_reg(rd, destwid, i, result)
+ set_polymorphed_reg(RT, destwid, i, result)
With this loop, if elwidth=16 and VL=3 the first 48 bits of the target
register will contain three 16 bit addition results, and the upper 16
# unsigned add
result = op_add(src1, src2, opwidth) # at max width
# now saturate (unsigned)
- sat = max(result, (1<<destwid)-1)
+ sat = min(result, (1<<destwid)-1)
set_polymorphed_reg(rd, destwid, i, sat)
# set sat overflow
if Rc=1:
# logical op, signed has no meaning
result = op_xor(src1, src2, opwidth)
# now saturate (signed)
- sat = max(result, (1<<destwid-1)-1)
- sat = min(result, -(1<<destwid-1))
+ sat = min(result, (1<<destwid-1)-1)
+ sat = max(sat, -(1<<destwid-1))
set_polymorphed_reg(rd, destwid, i, sat)
Overall here the rule is: apply common sense then document the behaviour
why CR-based pred-result analysis was added, because that at least is
entirely paralleliseable.
+# Vertical-First Mode
+
+This is a relatively new addition to SVP64 under development as of
+July 2021. Where Horizontal-First is the standard Cray-style for-loop,
+Vertical-First typically executes just the **one** scalar element
+in each Vectorised operation. That element is selected by srcstep
+and dststep *neither of which are changed as a side-effect of execution*.
+This is illustrated in pseudocode below, with a branch/loop.
+To create loops, a new instruction `svstep` must be called
+explicitly, with Rc=1:
+
+```
+loop:
+ sv.addi r0.v, r8.v, 5 # GPR(0+dststep) = GPR(8+srcstep) + 5
+ sv.addi r0.v, r8, 5 # GPR(0+dststep) = GPR(8 ) + 5
+ sv.addi r0, r8.v, 5 # GPR(0 ) = GPR(8+srcstep) + 5
+ svstep. # srcstep++, dststep++, CR0.eq = srcstep==VL
+ bne loop # loop back until srcstep reaches VL
+```
+
+The three `sv.addi` instructions illustrate different types of Scalar-Vector
+operations. Note that in its simplest form **only one** element is
+executed per instruction, **not** multiple elements per instruction.
+(The more advanced version of Vertical-First mode may execute multiple
+elements per instruction, however the number executed **must** remain
+a fixed quantity.)
+
+Because such explicit loops increment inexorably towards VL,
+a way is needed to test whether srcstep or dststep have reached
+VL. This is achieved in one of two ways: [[sv/svstep]] has an Rc=1 mode
+where CR0 will be updated if VL is reached. A standard v3.0B Branch
+Conditional may rely on that. Alternatively, the number of elements
+may be transferred into CTR, as is standard practice in Power ISA.
+Here, SVP64 [[sv/branches]] have a mode which allows CTR to be decremented
+by the number of vertical elements executed.
+
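The Vertical-First loop described above can be sketched as a small Python model. This is only an illustrative sketch, not the reference simulator: the names `State`, `sv_addi`, `svstep` and `run_vertical_first` are hypothetical, the register file is assumed to be a plain list of 32 integers, and only the behaviour stated in the text is modelled (one element per instruction, steps changed solely by `svstep`, and a CR0.eq-style flag set when srcstep reaches VL).

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical machine state: VL, the two step counters and the GPRs."""
    vl: int
    gpr: list = field(default_factory=lambda: [0] * 32)
    srcstep: int = 0
    dststep: int = 0


def sv_addi(st, rt, rt_vec, ra, ra_vec, imm):
    """Vertical-First add-immediate: executes exactly one element.

    srcstep and dststep select the element but are NOT modified here.
    A register marked as Vector is offset by its step; a Scalar is not.
    """
    src = ra + (st.srcstep if ra_vec else 0)
    dst = rt + (st.dststep if rt_vec else 0)
    st.gpr[dst] = st.gpr[src] + imm


def svstep(st):
    """Explicitly advance both steps; return the assumed CR0.eq flag."""
    st.srcstep += 1
    st.dststep += 1
    return st.srcstep == st.vl


def run_vertical_first(st, imm):
    """Model of the loop: one element per pass, until VL is reached."""
    while True:
        sv_addi(st, 0, True, 8, True, imm)  # sv.addi r0.v, r8.v, imm
        if svstep(st):                      # flag set: stop looping
            break


st = State(vl=3)
st.gpr[8:11] = [10, 20, 30]
run_vertical_first(st, 5)
print(st.gpr[0:3])
```

In this model the destination elements are written one per loop iteration, in contrast to Horizontal-First where a single instruction would iterate over all VL elements itself.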
# Instruction format
Whilst this overview shows the internals, it does not go into detail