https://bugs.libre-soc.org/show_bug.cgi?id=985
# Simple-V Compliancy Levels

The purpose of the Compliancy Levels is to provide a documented
stable base for implementors to achieve software interoperability
without requiring a high and unnecessary hardware cost unrelated
to their needs. The bare minimum, particularly suited to Ultra-embedded,
requires just one instruction and the reservation of SPRs; the rest
may be entirely Soft-emulated by raising Illegal Instruction traps.
At the other end of the spectrum is the full REMAP Structure Packing
suitable for traditional Vector Processing workloads and High-performance
energy-efficient DSP workloads.

To achieve full soft-emulated interoperability, all implementations
**must**, at the bare minimum, raise Illegal Instruction traps for
all SPRs including all reserved SPRs, all SVP64-related Context
instructions (REMAP), as well as for the entire SVP64 Prefix space.

*Even if the Power ISA Scalar Specification states that a given
Scalar instruction need not or must not raise an illegal instruction
on UNDEFINED behaviour, unimplemented parts of SVP64 **MUST** raise an
illegal instruction trap when (and only when) that same Scalar
instruction is Prefixed*. It is absolutely critical
to note that when not Prefixed, under no circumstances shall the Scalar
instruction deviate from the Scalar Power ISA Specification.

Summary of Compliancy Levels. Each Level includes all lower Levels:

* **Zero-Level**: Simple-V is not implemented (at all) in hardware. This
  Level is required to be listed because all capabilities of Simple-V
  must be Soft-emulatable by way of Illegal Instruction Traps.
* **Ultra-embedded**: `setvl` instruction. Register Files as Standard Power
  ISA. `scalar identity behaviour` implemented.
* **Embedded**: `svstep` instruction, and support for Hardware for-looping
  in both Horizontal-First and Vertical-First Mode as well as Predication
  (Single and Twin) for the GPRs r3, r10 and r30. CR-Field-based
  Predicates do not need to be added.
* **Embedded DSP/AV**: 128 registers, element-width overrides,
  and Saturation and Mapreduce/Iteration Modes.
* **High-end DSP/AV**: Same as Embedded-DSP/AV except also
  including Indexed and Offset REMAP capability.
* **3D/Advanced/Supercomputing**: all SV Branch instructions;
  crweird and vector-assist instructions (`set-before-first` etc);
  Swizzle Move instructions; Matrix, DCT/FFT and Indexing
  REMAP capability; Fail-First and Predicate-Result Modes.

These requirements within each Level constitute the minimum mandatory
capabilities. It is also permitted that any Level include any part of
a higher Compliancy Level. For example, an Embedded Level implementation
is permitted to have 128 GPRs, FPRs and CR Fields,
but the Compliance Tests for Embedded will only test for 32. A DSP/VPU Level
implementation is permitted to implement the DCT REMAP capability, but will
not be permitted to declare that it meets the 3D/Advanced Level unless it
implements *all* REMAP Capabilities.

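The cumulative rule above can be sketched in Python. This is only an illustration: the feature labels are hypothetical abbreviations of the summary list, and `SVLevel`, `MANDATORY`, `required_features` and `highest_level` are names invented here, not part of any Simple-V tooling. The usage lines show that an Embedded implementation adding DCT/FFT REMAP still only declares Embedded.

```python
from enum import IntEnum


class SVLevel(IntEnum):
    """Compliancy Levels in ascending order (names are illustrative)."""
    ZERO = 0
    ULTRA_EMBEDDED = 1
    EMBEDDED = 2
    EMBEDDED_DSP_AV = 3
    HIGH_END_DSP_AV = 4
    ADVANCED_3D = 5


# Features first made mandatory at each Level. The labels are hypothetical
# shorthand for the summary list, not official identifiers.
MANDATORY = {
    SVLevel.ZERO: set(),
    SVLevel.ULTRA_EMBEDDED: {"setvl", "scalar-identity"},
    SVLevel.EMBEDDED: {"svstep", "hw-loop", "gpr-predication"},
    SVLevel.EMBEDDED_DSP_AV: {"128-regs", "elwidth", "saturation", "mapreduce"},
    SVLevel.HIGH_END_DSP_AV: {"indexed-remap", "offset-remap"},
    SVLevel.ADVANCED_3D: {"sv-branch", "crweird", "swizzle", "matrix-remap",
                          "dct-fft-remap", "fail-first", "predicate-result"},
}


def required_features(level):
    """Each Level includes everything mandated by all lower Levels."""
    required = set()
    for lvl in SVLevel:
        if lvl <= level:
            required |= MANDATORY[lvl]
    return required


def highest_level(implemented):
    """Highest Level whose cumulative mandatory set is fully implemented."""
    best = SVLevel.ZERO
    for lvl in SVLevel:
        if required_features(lvl) <= implemented:
            best = lvl
        else:
            break
    return best


# An Embedded implementation may add a higher-Level feature without
# thereby qualifying for the higher Level.
embedded_plus_dct = required_features(SVLevel.EMBEDDED) | {"dct-fft-remap"}
declared = highest_level(embedded_plus_dct)
```
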
**Power ISA Compliancy Levels**

The SV Compliancy Levels have nothing to do with the Power ISA Compliancy
Levels (SFS, SFFS, Linux, AIX). They are separate and independent. It
is perfectly fine to implement Ultra-Embedded on AIX, and perfectly fine
to implement 3D/Advanced on SFS. **Compliance with SV Levels does not
convey or remove the obligation of Compliance with SFS/SFFS/Linux/AIX
Levels and vice-versa**.

## Zero-Level

This level exists to stress that any attempt to execute a Simple-V
feature on hardware that has no support at all for Simple-V is
**required** to raise an Illegal Instruction trap.
**This includes existing Power ISA Implementations:** IBM POWER being
the most notable.

Because parts of the Power ISA are "silently executed" (hints, for example),
it is absolutely critical that all capabilities of Simple-V sit
within the full Illegal Instruction space of existing and future Hardware.

## Ultra-Embedded Level

This level exists as an entry level into SVP64, most suited to
resource-constrained soft cores, or Hardware implementations where unit
cost is a much higher priority than execution speed.

This level sets the bare minimum requirements, where everything with the
exception of `scalar identity` and the `setvl` instruction may be
software-emulated through JIT Translation or Illegal Instruction traps.
SVSTATE, as effectively a Sub-Program-Counter, joins MSR and PC (CIA, NIA)
as direct peers and must be switched on any context-switch (Trap or
Exception):

* PC is saved/restored to/from SRR0
* MSR is saved/restored to/from SRR1
* SVSTATE **must** also be saved/restored to/from SVSRR1

Any implementation that implements Hypervisor Mode must also
correspondingly follow the Power ISA Spec guidelines for HSRR0 and HSRR1,
and must save/restore SVSTATE to/from HSVSRR1 in all circumstances
involving save/restore to/from HSRR0 and HSRR1.

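A minimal Python model of the save/restore rule may help. It treats each register as a plain integer and ignores the detail that real SRR1 carries only part of MSR. `CoreState`, `enter_trap` and `return_from_trap` are hypothetical names, and clearing SVSTATE on trap entry is an assumption of this sketch, not something stated above.

```python
from dataclasses import dataclass


@dataclass
class CoreState:
    """Simplified register model: every register is a plain integer."""
    pc: int = 0
    msr: int = 0
    svstate: int = 0
    srr0: int = 0
    srr1: int = 0
    svsrr1: int = 0


def enter_trap(state, handler_pc, handler_msr):
    """Save PC, MSR and SVSTATE as peers, then switch to the handler."""
    state.srr0 = state.pc
    state.srr1 = state.msr
    state.svsrr1 = state.svstate
    state.pc = handler_pc
    state.msr = handler_msr
    # Assumption of this sketch: the handler starts with SVSTATE cleared.
    state.svstate = 0


def return_from_trap(state):
    """Restore all three peers so the interrupted loop resumes mid-vector."""
    state.pc = state.srr0
    state.msr = state.srr1
    state.svstate = state.svsrr1


# Interrupt a program part-way through a vector loop, then resume it.
core = CoreState(pc=0x1000, msr=0x9000, svstate=0x1234)
enter_trap(core, handler_pc=0x700, handler_msr=0x0)
saved = (core.srr0, core.srr1, core.svsrr1)
return_from_trap(core)
```

Without the SVSRR1 step the handler's value of SVSTATE would leak back into the interrupted program, which is why SVSTATE has to be treated as a peer of PC and MSR.
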
An Illegal Instruction Trap **must** be raised on:

* any SV instructions not implemented
* any unimplemented SV Context SPRs read or written
* all unimplemented uses of the SVP64 Prefix
* non-scalar-identity SVP64 instructions

Implementors are free to implement any other features of
SVP64; however, only by meeting all of the mandatory requirements above
will Compliance with the Ultra-Embedded Level be achieved.

Note that `scalar identity` is defined as being when the execution of
an SVP64 Prefixed instruction is identical in every respect to
Scalar non-prefixed, i.e. as if the Prefix had not been present.
Additionally all SV SPRs must be zero and the 24-bit `RM` field must be zero.

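Read literally, the definition above reduces to a simple predicate, sketched here in Python. The function names and the dictionary of SPR values are hypothetical, and the dispatcher assumes a minimal Ultra-Embedded implementation with no optional features.

```python
RM_BITS = 24  # width of the SVP64 RM field, as stated in the text


def is_scalar_identity(rm, sv_sprs):
    """True when the RM field and every SV SPR value are all zero."""
    if not 0 <= rm < (1 << RM_BITS):
        raise ValueError("RM must fit in 24 bits")
    return rm == 0 and all(value == 0 for value in sv_sprs.values())


def ultra_embedded_action(rm, sv_sprs):
    """Minimal implementation: run as scalar or hand over to software."""
    if is_scalar_identity(rm, sv_sprs):
        return "execute-as-scalar"
    return "illegal-instruction-trap"


# SVSTATE is the only SPR named here; a real core would list all SV SPRs.
idle = ultra_embedded_action(0, {"SVSTATE": 0})
with_rm_bits = ultra_embedded_action(0x000001, {"SVSTATE": 0})
with_svstate = ultra_embedded_action(0, {"SVSTATE": 0x10})
```
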
## Embedded Level

This level is more suitable for Hardware implementations where performance
and power saving begin to matter. A second instruction, `svstep`, used
by Vertical-First Mode, is required, as is hardware-level looping in
Horizontal-First Mode. An Illegal Instruction trap may not be used to
emulate `svstep`.

At the bare minimum, Twin and Single Predication must be supported for
at least the GPRs r3, r10 and r30. CR Field Predication may also be
supported in hardware but only by also increasing the number of CR Fields
to the required total of 128.

Another important aspect is that when Rc=1 is set, CR Field Vector co-results
are produced. Should these exceed CR7 (CR8-CR127) and the number of CR Fields
has not been increased to 128 then an Illegal Instruction Trap must be
raised. In practical terms, to avoid this occurrence in Embedded software,
MAXVL should not exceed 8 for Arithmetic or Logical operations with Rc=1.

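The Rc=1 limit can be illustrated with a short Python check. It assumes that element `i` of the operation writes its co-result to CR Field `first_cr + i`, which is consistent with the MAXVL limit of 8 quoted above but is not spelled out in this section; `rc1_needs_trap` and its parameters are hypothetical names.

```python
def rc1_needs_trap(vl, first_cr=0, num_cr_fields=8):
    """Return True if an Rc=1 vector operation would write past the last
    implemented CR Field and must therefore raise Illegal Instruction.

    Assumption of this sketch: element i targets CR Field first_cr + i.
    """
    if vl <= 0:
        return False  # no elements, so no CR co-results are produced
    last_cr = first_cr + vl - 1
    return last_cr >= num_cr_fields


# With the standard 8 CR Fields, VL=8 just fits (CR0-CR7) and VL=9 does not.
fits = rc1_needs_trap(vl=8)
overflows = rc1_needs_trap(vl=9)
# With the CR Field file extended to 128 the same operation is legal.
extended = rc1_needs_trap(vl=9, num_cr_fields=128)
```
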
Zeroing on source and destination for Predicates
must also be supported (sz, dz); however,
all other Modes (Saturation, Fail-First, Predicate-Result,
Iteration/Reduction) are entirely optional. Implementation of Element-Width
Overrides is also optional.

One of the important side-benefits of this SV Compliancy Level is that it
brings Hardware-level support for Scalar Predication (VL=MAXVL=1)
to the entire Scalar Power ISA, completely without
modifying the Scalar Power ISA. The cost in software is that Predicated
instructions are Prefixed to 64-bit.

## DSP / Audio / Video Level

This level is best suited to high-performance power-efficient but
specialist Compute workloads. 128 GPRs, FPRs and CR Fields are all
required, as are element-width overrides to allow data processing
down to the 8-bit level. SUBVL support (Sub-Vector vec2/3/4) is also
required, as is Pack/Unpack EXTRA format (helps with Pixel and
Audio Stream Structured data).

All SVP64 Modes must be implemented in hardware: Saturation
in particular is a necessity for Audio DSP work. Reduction is likewise
needed to assist with Audio/Video.

It is not mandatory for this Level to have DCT/FFT REMAP Capability in
hardware but, due to the high prevalence of DCT and FFT in Audio, Video
and DSP workloads, it is strongly recommended. Matrix (Dimensional) REMAP
and Swizzle may also be useful to help with 24-bit (3 byte) Structured
Audio Streams and are also recommended but not mandatory.

## High-end DSP

In this Compliancy Level the benefits of the Offset and Index REMAP
subsystem become worth their hardware cost. In lower-performing DSP
and A/V workloads they are not.

## 3D / Advanced / Supercomputing

This Compliancy Level is for highest performance and energy efficiency.
All aspects of SVP64 must be entirely implemented, in full, in Hardware.
How that is achieved is entirely at the discretion of the implementor:
there are no hard requirements of any kind on the level of performance,
just as there are none in the Vulkan(TM) Specification.

Throughout the SV Specification however there are hints to
Micro-Architects: byte-level write-enable lines on Register Files are
strongly recommended, for example, in order to avoid unnecessary
Read-Modify-Write cycles and additional Register Hazard Dependencies
on fine-grained (8/16/32-bit) operations. Just as with SRAMs, multiple
write-enable lines may be raised to update higher-width elements.

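As a rough illustration of the write-enable hint, the Python sketch below computes which byte lanes of a 64-bit register would be raised for one element. The packing order (element 0 in the least-significant bytes) and the name `byte_write_enable` are assumptions made for the example, not requirements taken from the Specification.

```python
def byte_write_enable(elwidth_bits, index, reg_bits=64):
    """Bitmask of byte write-enable lines for one element of a register.

    Bit n of the result enables byte lane n. Assumption of this sketch:
    elements are packed from the least-significant byte upwards.
    """
    if elwidth_bits <= 0 or elwidth_bits % 8 != 0:
        raise ValueError("element width must be a positive multiple of 8")
    if reg_bits % elwidth_bits != 0:
        raise ValueError("element width must divide the register width")
    elements_per_reg = reg_bits // elwidth_bits
    if not 0 <= index < elements_per_reg:
        raise ValueError("element index outside the register")
    nbytes = elwidth_bits // 8
    # Wider elements simply raise several adjacent enable lines at once.
    return ((1 << nbytes) - 1) << (index * nbytes)


one_byte = byte_write_enable(8, 0)    # a single lane
halfword = byte_write_enable(16, 1)   # lanes 2 and 3
word = byte_write_enable(32, 1)       # lanes 4 to 7
full = byte_write_enable(64, 0)       # all eight lanes
```

Because only the enabled lanes are written, an 8-bit update does not need to read the other 56 bits first, which is the Read-Modify-Write cycle the hint aims to avoid.
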
## Examples

Assuming that hardware implements scalar operations only,
and implements predication but not elwidth overrides:

    setvli r0, 4            # sets VL equal to 4
    sv.addi r5, r0, 1       # raises an 0x700 trap
    setvli r0, 1            # sets VL equal to 1
    sv.addi r5, r0, 1       # gets executed by hardware
    sv.addi/ew=8 r5, r0, 1  # raises an 0x700 trap
    sv.ori/sm=EQ r5, r0, 1  # executed by hardware

The first `sv.addi` raises an illegal instruction trap because
VL has been set to 4, and this is not supported. Likewise
elwidth overrides if requested always raise illegal instruction
traps.

Such an implementation would qualify for the "Ultra-Embedded" SV Level.
It would not qualify for the "Embedded" level because when VL=4 an
Illegal Exception is raised, and the Embedded Level requires full
VL Loop support in hardware.

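The decision made by the hypothetical hardware in this example can be modelled in a few lines of Python. `SVInsn` and `dispatch` are names invented for the sketch, and the model captures only the three properties the example relies on: the current VL, an elwidth override, and predication.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SVInsn:
    """One SVP64-prefixed instruction, reduced to what the example needs."""
    name: str
    elwidth: Optional[int] = None  # None means no element-width override
    predicated: bool = False


def dispatch(vl, insn):
    """Scalar-only core with predication but without elwidth overrides."""
    if insn.elwidth is not None:
        return "trap 0x700"  # elwidth overrides are not implemented
    if vl != 1:
        return "trap 0x700"  # no hardware VL loop, so only VL=1 can run
    return "execute"         # predication is handled in hardware


# Replay the listing above: (VL in force, instruction).
program = [
    (4, SVInsn("sv.addi")),
    (1, SVInsn("sv.addi")),
    (1, SVInsn("sv.addi/ew=8", elwidth=8)),
    (1, SVInsn("sv.ori/sm=EQ", predicated=True)),
]
results = [dispatch(vl, insn) for vl, insn in program]
```
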
[[!tag standards]]

-------

\newpage{}
