[[!tag standards]]

# Simple-V Compliancy Levels

The purpose of the Compliancy Levels is to provide a documented,
stable base for implementors to achieve software interoperability
without incurring a high hardware cost unrelated to their needs.
The bare minimum requirement, particularly suited to Ultra-embedded,
requires just one instruction and the reservation of SPRs; the rest
may be entirely Soft-emulated by raising Illegal Instruction traps.
At the other end of the spectrum is the full REMAP Structure Packing
suitable for traditional Vector Processing workloads and
High-performance energy-efficient DSP workloads.

To achieve full soft-emulated interoperability, all implementations
**must**, at the bare minimum, raise Illegal Instruction traps for
all SPRs including all reserved SPRs, all SVP64-related Context
instructions (REMAP), as well as for the entire SVP64 Prefix space.

*Even if the Power ISA Scalar Specification states that a given
Scalar instruction need not or must not raise an illegal instruction
on UNDEFINED behaviour, unimplemented parts of SVP64 **must** raise an
illegal instruction trap when (and only when) that same Scalar
instruction is Prefixed*. It is absolutely critical to note that when
not Prefixed, under no circumstances shall the Scalar instruction
deviate from the Scalar Power ISA Specification.

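The trap rule above can be read as a three-way decision. The following
is a minimal Python sketch under that reading; the function name
`dispatch` and its string results are illustrative labels, not taken
from the specification.

```python
def dispatch(prefixed, sv_feature_implemented):
    """Decide how one instruction is handled (illustrative model only).

    prefixed: True if the instruction carries an SVP64 Prefix.
    sv_feature_implemented: True if the hardware implements the SVP64
        behaviour that this particular Prefix requests.
    """
    if not prefixed:
        # Unprefixed: behaviour is exactly that of the Scalar Power ISA,
        # including whatever it says about UNDEFINED behaviour.
        return "follow-scalar-power-isa"
    if sv_feature_implemented:
        return "execute-svp64"
    # Prefixed but unimplemented: always trap, so software can emulate.
    return "illegal-instruction-trap"
```
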
Summary of Compliancy Levels (each Level includes all lower Levels):

* **Ultra-embedded**: `setvl` instruction and context-switching of SVSTATE
  to/from SVSRR1. Register Files as Standard Power ISA. `scalar identity`
  implemented.
* **Embedded**: `svstep` instruction, and support for Hardware for-looping
  in both Horizontal-First and Vertical-First Mode, as well as Predication
  (Single and Twin) for the GPRs r3, r10 and r30. CR-Field-based
  Predicates, if used, may still raise an Illegal Instruction trap.
* **DSP/AV**: 128 registers, element-width overrides, and Saturation and
  Mapreduce/Iteration Modes.
* **3D/Advanced/Supercomputing**: all SV Branch instructions;
  crweird and vector-assist instructions (`set-before-first` etc.);
  Swizzle Move instructions; Matrix, DCT/FFT and Indexing
  REMAP capability; Fail-First and Predicate-Result Modes.

These requirements within each Level constitute the minimum mandatory
capabilities. Any Level is also permitted to include any part of a
higher Compliancy Level. For example: an Embedded Level implementation
is permitted to have 128 GPRs, FPRs and CR Fields, but the Compliance
Tests for Embedded will only test for 32. A DSP/AV Level implementation
is permitted to implement the DCT REMAP capability, but will not be
permitted to declare meeting the 3D/Advanced Level unless it implements
*all* REMAP Capabilities.

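The cumulative nature of the Levels, and the rule that extra features
do not raise the declared Level, can be sketched as a subset test.
This is a rough Python model only: the feature labels are hypothetical
names paraphrased from the summary list, and `highest_level` is not
part of any Compliance Test suite.

```python
# Hypothetical feature labels, paraphrased from the Level summary.
# Order matters: each Level includes all Levels listed before it.
LEVELS = [
    ("Ultra-embedded",
     {"setvl", "svstate-context-switch", "scalar-identity"}),
    ("Embedded",
     {"svstep", "hardware-looping", "gpr-predication"}),
    ("DSP/AV",
     {"128-registers", "elwidth-overrides", "saturation",
      "mapreduce-iteration"}),
    ("3D/Advanced/Supercomputing",
     {"sv-branches", "crweird-vector-assist", "swizzle-move",
      "remap-matrix", "remap-dct-fft", "remap-indexing",
      "fail-first", "predicate-result"}),
]


def highest_level(implemented):
    """Return the highest Level whose mandatory features, and those of
    every lower Level, are all present; None if not even the lowest."""
    achieved = None
    for name, mandatory in LEVELS:
        if not mandatory <= implemented:
            break  # a missing mandatory feature stops the climb here
        achieved = name
    return achieved
```

For instance, a set containing every DSP/AV feature plus
`"remap-dct-fft"` still yields `"DSP/AV"`, because the remaining
3D/Advanced features are absent.
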
**Power ISA Compliancy Levels**

The SV Compliancy Levels have nothing to do with the Power ISA Compliancy
Levels (SFS, SFFS, Linux, AIX). They are separate and independent. It
is perfectly fine to implement Ultra-Embedded on AIX, and perfectly fine
to implement 3D/Advanced on SFS. **Compliance with SV Levels does not
convey or remove the obligation of Compliance with SFS/SFFS/Linux/AIX
Levels and vice-versa**.

# Ultra-Embedded Level

This level exists as an entry-level into SVP64, most suited to
resource-constrained soft cores, or Hardware implementations where unit
cost is a much higher priority than execution speed.

This level sets the bare minimum requirements, where everything with the
exception of `scalar identity` and the `setvl` instruction may be
software-emulated through JIT Translation or Illegal Instruction traps.
SVSTATE, as effectively a Sub-Program-Counter, joins MSR and PC (CIA, NIA)
as direct peers and must be switched on any context-switch (Trap or
Exception):

* PC is saved/restored to/from SRR0
* MSR is saved/restored to/from SRR1
* SVSTATE **must** also be saved/restored to/from SVSRR1

Any implementation that implements Hypervisor Mode must also
correspondingly follow the Power ISA Spec guidelines for HSRR0 and HSRR1,
and must save/restore SVSTATE to/from HSVSRR1 in all circumstances
involving save/restore to/from HSRR0 and HSRR1.

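The save/restore rule for the three peers can be sketched as a small
state model. This is a deliberately simplified illustration: the class
`Core`, its method names and the handler address are hypothetical, and
the real Power ISA rules for which MSR bits reach SRR1 are not
modelled. A Hypervisor variant would apply the same pattern to HSRR0,
HSRR1 and HSVSRR1.

```python
from dataclasses import dataclass


@dataclass
class Core:
    # Live state: PC, MSR and SVSTATE are treated as direct peers.
    pc: int = 0
    msr: int = 0
    svstate: int = 0
    # Save/restore registers named in the text.
    srr0: int = 0
    srr1: int = 0
    svsrr1: int = 0

    def enter_trap(self, handler_address):
        """Save all three peers, then jump to the handler."""
        self.srr0 = self.pc
        self.srr1 = self.msr
        self.svsrr1 = self.svstate
        self.pc = handler_address

    def return_from_trap(self):
        """Restore all three peers from their save registers."""
        self.pc = self.srr0
        self.msr = self.srr1
        self.svstate = self.svsrr1
```

Whatever the handler does to the live SVSTATE in between, the
interrupted program resumes with the value that was captured in SVSRR1.
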
An Illegal Instruction trap **must** be raised on:

* any SV instructions not implemented
* any unimplemented SV Context SPRs read or written
* all unimplemented uses of the SVP64 Prefix
* non-scalar-identity SVP64 instructions

Implementors are free to implement any other features of SVP64;
however, only by meeting all of the mandatory requirements above
will Compliance with the Ultra-Embedded Level be achieved.

Note that `scalar identity` is defined as being when the execution of
an SVP64 Prefixed instruction is identical in every respect to
Scalar non-prefixed, i.e. as if the Prefix had not been present.
Additionally, all SV SPRs must be zero and the 24-bit `RM` field must
be zero.

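The two zero conditions in the `scalar identity` definition can be
expressed as a simple check. This is a sketch only: the function name
and the dictionary representation of the SV SPRs are assumptions made
for illustration.

```python
def is_scalar_identity(rm, sv_sprs):
    """Check the two zero conditions for `scalar identity`.

    rm: the 24-bit RM field of the SVP64 Prefix, as an integer.
    sv_sprs: mapping of SV SPR name to current value (assumed layout).
    """
    if not 0 <= rm < (1 << 24):
        raise ValueError("RM is a 24-bit field")
    # Every SV SPR must be zero, and so must the whole RM field.
    return rm == 0 and all(value == 0 for value in sv_sprs.values())
```
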
# Embedded Level

This level is more suitable for Hardware implementations where
performance and power saving begin to matter. A second instruction,
`svstep`, used by Vertical-First Mode, is required, as is hardware-level
looping in Horizontal-First Mode. Illegal Instruction traps may not be
used to emulate `svstep`.

At the bare minimum, Twin and Single Predication must be supported for
at least the GPRs r3, r10 and r30. CR Field Predication may also be
supported in hardware but only by also increasing the number of CR Fields
to the required total 128.

Another important aspect is that when Rc=1 is set, CR Field Vector
co-results are produced. Should these extend beyond CR7 (into CR8-CR127)
and the number of CR Fields has not been increased to 128, then an
Illegal Instruction trap must be raised. In practical terms, to avoid
this occurrence in Embedded software, MAXVL should not exceed 8 for
Arithmetic or Logical operations with Rc=1.

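The Rc=1 limit can be illustrated with a small range check. This is a
sketch under the assumption that element `i` of the operation writes
its co-result to CR Field `first_cr + i`, starting at CR0 for
Arithmetic and Logical operations; `rc1_needs_trap`, `first_cr` and
`num_cr_fields` are hypothetical names.

```python
def rc1_needs_trap(vl, first_cr=0, num_cr_fields=8):
    """True if Rc=1 co-results would run past the last CR Field.

    vl: number of elements processed (at most MAXVL).
    first_cr: CR Field receiving element 0's co-result (assumed CR0).
    num_cr_fields: 8 for a standard register file, 128 if extended.
    """
    if vl == 0:
        return False  # no elements, so no co-results are written
    return first_cr + vl > num_cr_fields
```

With eight CR Fields, `vl=8` fills CR0 to CR7 exactly, while `vl=9`
would need CR8 and therefore has to trap.
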
Zeroing on source and destination for Predicates (sz, dz) must also be
supported; however, all other Modes (Saturation, Fail-First,
Predicate-Result, Iteration/Reduction) are entirely optional.
Implementation of Element-Width Overrides is also optional.

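As an illustration of destination zeroing, the following Python sketch
models a Single-Predicated element loop. It assumes that bit `i` of the
predicate mask controls element `i`, and that masked-out destination
elements are left unchanged when `dz` is clear and set to zero when
`dz` is set. The function name and the list-based register model are
hypothetical; source zeroing (sz) under Twin Predication is not
modelled.

```python
def predicated_add_imm(dest, src, imm, mask, vl, dz=False):
    """Add `imm` to each active element of `src`, writing into `dest`.

    mask: integer predicate; bit i set means element i is active.
    dz:   destination zeroing; masked-out elements become 0 if True,
          and are left untouched if False.
    Returns a new list; the inputs are not modified.
    """
    out = list(dest)
    for i in range(vl):
        if (mask >> i) & 1:
            out[i] = src[i] + imm
        elif dz:
            out[i] = 0
    return out
```
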
One of the important side-benefits of this SV Compliancy Level is that it
brings Hardware-level support for Scalar Predication (VL=MAXVL=1)
to the entire Scalar Power ISA, without modifying the Scalar Power ISA
at all. The cost in software is that Predicated instructions are
Prefixed to 64-bit.

# DSP / Audio / Video Level

This level is best suited to high-performance, power-efficient but
specialist Compute workloads. 128 GPRs, FPRs and CR Fields are all
required, as are element-width overrides to allow data processing
down to the 8-bit level. SUBVL support (Sub-Vector vec2/3/4) is also
required, as is the Pack/Unpack EXTRA format (which helps with Pixel
and Audio Stream Structured data).

All SVP64 Modes must be implemented in hardware: Saturation
in particular is a necessity for Audio DSP work, and Reduction
likewise assists with Audio/Video.

It is not mandatory for this Level to have DCT/FFT REMAP Capability in
hardware, but due to the high prevalence of DCT and FFT in Audio, Video
and DSP workloads it is strongly recommended. Matrix (Dimensional) REMAP
and Swizzle may also be useful to help with 24-bit (3 byte) Structured
Audio Streams, and are also recommended but not mandatory.

# 3D / Advanced / Supercomputing

This Compliancy Level is for the highest performance and energy
efficiency. All aspects of SVP64 must be entirely implemented, in full,
in Hardware. How that is achieved is entirely at the discretion of the
implementor: there are no hard requirements of any kind on the level of
performance, just as there are none in the Vulkan(TM) Specification.

Throughout the SV Specification, however, there are hints to
Micro-Architects: byte-level write-enable lines on Register Files are
strongly recommended, for example, in order to avoid unnecessary
Read-Modify-Write cycles and additional Register Hazard Dependencies on
fine-grained (8/16/32-bit) operations. Just as with SRAMs, multiple
write-enable lines may be raised to update higher-width elements.

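The byte-level write-enable hint can be made concrete with a small
mask calculation. This is a sketch assuming a 64-bit register with one
write-enable line per byte and elements packed from the least
significant byte upwards; the function name and parameters are
hypothetical.

```python
def byte_write_enables(elwidth_bits, index, reg_bytes=8):
    """Return a bitmask of byte write-enable lines for one element.

    elwidth_bits: element width in bits (8, 16, 32 or 64).
    index: element number within the register, counted from the
           least significant byte (assumed packing order).
    reg_bytes: register width in bytes (8 for a 64-bit register).
    """
    if elwidth_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported element width")
    width = elwidth_bits // 8
    if index < 0 or (index + 1) * width > reg_bytes:
        raise ValueError("element does not fit in the register")
    # Raise `width` adjacent enable lines, starting at the element's
    # first byte: wider elements simply raise more lines at once.
    return ((1 << width) - 1) << (index * width)
```

Writing the second 16-bit element raises only lines 2 and 3 (mask
`0b1100`), so the other six bytes need not be read back and rewritten.
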
# Examples

Assuming that hardware implements scalar operations only,
and implements predication but not elwidth overrides:

    setvli r0, 4               # sets VL equal to 4
    sv.addi r5, r0, 1          # raises a 0x700 trap
    setvli r0, 1               # sets VL equal to 1
    sv.addi r5, r0, 1          # gets executed by hardware
    sv.addi/ew=8 r5, r0, 1     # raises a 0x700 trap
    sv.ori/sm=EQ r5, r0, 1     # executed by hardware

The first `sv.addi` raises an Illegal Instruction trap because
VL has been set to 4, and this implementation supports only VL=1.
Likewise, elwidth overrides, if requested, always raise Illegal
Instruction traps.

Such an implementation would qualify for the "Ultra-Embedded" SV Level.
It would not qualify for the "Embedded" Level because when VL=4 an
Illegal Instruction trap is raised, and the Embedded Level requires full
VL Loop support in hardware.

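The behaviour of the example implementation can be summarised as a
decision function. This is a rough Python model of that one
hypothetical piece of hardware, not of SVP64 in general; the function
name and its arguments are illustrative.

```python
def example_hw(vl, elwidth_override=False, predicated=False):
    """Model the example hardware: scalar-only, predication supported,
    no elwidth overrides. Returns what happens to a Prefixed
    instruction under the given conditions."""
    if vl != 1:
        return "trap 0x700"   # no hardware VL loop: emulate in software
    if elwidth_override:
        return "trap 0x700"   # elwidth overrides are not implemented
    # Scalar case; `predicated` makes no difference to the outcome
    # because this hardware implements predication.
    return "execute"
```

Applied to the listing above: `example_hw(4)` traps,
`example_hw(1)` executes, `example_hw(1, elwidth_override=True)` traps,
and `example_hw(1, predicated=True)` executes.
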