From b72fb1493fe7c5957e77fb794d812636393d45b2 Mon Sep 17 00:00:00 2001 From: Luke Kenneth Casson Leighton Date: Mon, 10 Apr 2023 12:52:14 +0100 Subject: [PATCH] add appendix and compliancy levels to ls010, update po9 page with additional Definitions --- openpower/sv/compliancy_levels.mdwn | 23 +++++---- openpower/sv/po9_encoding.mdwn | 75 +++++++++++++++++------------ openpower/sv/rfc/ls010.mdwn | 2 + openpower/sv/svp64/appendix.mdwn | 54 +++++++++++---------- 4 files changed, 89 insertions(+), 65 deletions(-) diff --git a/openpower/sv/compliancy_levels.mdwn b/openpower/sv/compliancy_levels.mdwn index 5eceaff8a..d46fa732b 100644 --- a/openpower/sv/compliancy_levels.mdwn +++ b/openpower/sv/compliancy_levels.mdwn @@ -1,5 +1,3 @@ -[[!tag standards]] - # Simple-V Compliancy Levels The purpose of the Compliancy Levels is to provide a documented @@ -67,7 +65,7 @@ The SV Compliancy Levels have nothing to do with the Power ISA Compliancy Levels (SFS, SFFS, Linux, AIX). They are separate and independent. It is perfectly fine to implement Ultra-Embedded on AIX, and perfectly fine to implement 3D/Advanced on SFS. **Compliance with SV Levels does not convey or remove the obligation of Compliance with SFS/SFFS/Linux/AIX Levels and vice-versa**. -# Zero-Level +## Zero-Level This level exists to indicate the critical importance of all and any features attempted to be executed on hardware that has no support at @@ -79,7 +77,7 @@ With parts of the Power ISA being "silent executed" (hints for example), it is absolutely critical to have all capabilities of Simple-V sit within full Illegal Instruction space of existing and future Hardware. -# Ultra-Embedded Level +## Ultra-Embedded Level This level exists as an entry-level into SVP64, most suited to resource constrained soft cores, or Hardware implementations where unit cost is a much @@ -118,7 +116,7 @@ an SVP64 Prefixed instruction is identical in every respect to Scalar non-prefixed, i.e. as if the Prefix had not been present. 
Additionally all SV SPRs must be zero and the 24-bit `RM` field must be zero. -# Embedded Level +## Embedded Level This level is more suitable for Hardware implementations where performance and power saving begins to matter. A second instruction, `svstep`, used by Vertical-First Mode, is required, as is hardware-level looping in @@ -151,7 +149,7 @@ modifying the Scalar Power ISA. The cost in software is that Predicated instructions are Prefixed to 64-bit. -# DSP / Audio / Video Level +## DSP / Audio / Video Level This level is best suited to high-performance power-efficient but specialist Compute workloads. 128 GPRs, FPRs and CR Fields are all @@ -170,13 +168,13 @@ due to the high prevalence of DCT and FFT in Audio, Video and DSP workloads it is strongly recommended. Matrix (Dimensional) REMAP and Swizzle may also be useful to help with 24-bit (3 byte) Structured Audio Streams and are also recommended but not mandatory. -# High-end DSP +## High-end DSP In this Compliancy Level the benefits of the Offset and Index REMAP subsystem becomes worth its hardware cost. In lower-performing DSP and A/V workloads it is not. -# 3D / Advanced / Supercomputing +## 3D / Advanced / Supercomputing This Compliancy Level is for highest performance and energy efficiency. All aspects of SVP64 must be entirely implemented, in full, in Hardware. @@ -192,7 +190,7 @@ additional Register Hazard Dependencies on fine-grained (8/16/32-bit) operations. Just as with SRAMs multiple write-enable lines may be raised to update higher-width elements. -# Examples +## Examples Assuming that hardware implements scalar operations only, and implements predication but not elwidth overrides: @@ -214,3 +212,10 @@ It would not qualify for the "Embedded" level because when VL=4 an Illegal Exception is raised, and the Embedded Level requires full VL Loop support in hardware. 
+[[!tag standards]] + +------- + +\newpage() + + diff --git a/openpower/sv/po9_encoding.mdwn b/openpower/sv/po9_encoding.mdwn index 5e186dfcb..e355e1a0d 100644 --- a/openpower/sv/po9_encoding.mdwn +++ b/openpower/sv/po9_encoding.mdwn @@ -2,31 +2,45 @@ **Proposal: Add the following Definition to Section 1.3.1 of Book I** -**Definition of SVP64 Prefixing:** +**Definition of Simple-V:** -In its simpest form, SVP64 is a 32-bit Prefix conceptually similar to Intel 8086 `REP` -instruction that both augments its following Defined Word Suffix, and also may -repeat that instruction with optional sequential register offsets from those given in the +In its simplest form, the Simple-V Loop/Vector concept is a Prefixing +system (similar to the 8086 `REP` instruction) that both augments its +following Defined Word Suffix, and also may repeat that instruction +with optional sequential register offsets from those given in the Suffix. Register numbers may also be extended (larger register files). -More advanced features add predication, element-width overrides, and Vertical-First -Mode. +More advanced features add predication, element-width overrides, and +Vertical-First Mode. + +**Definition of SVP64 Prefixing:** + +SVP64 is a well-defined implementation of the Simple-V Loop/Vector concept, +in a 32-bit Prefix format, that exploits the following instruction +(the Defined Word) using it as a "template". It requires 24 bits, +some of which are common to all Suffixes, and some Mode bits are specific +to the Defined Word class: Load/Store-Immediate, Load/Store-Indexed, +Arithmetic/Logical, Condition Register operations, and Branch-Conditional. +Anything not falling into those five categories is termed "UnVectoriseable". **Definition of Vertical-First:** -Normal Cray-style Vectorisation, designated Horizontal-First, performs element-level -operations (often in parallel) before moving in the usual fashion to the next -instruction.
Vertical-First on the other hand executes *one element operation only* -then moves on to the next instruction, whereupon if that is also an SVP64-Prefixed -instruction the exact same element offset is used. Element offsets are then explicitly -advanced by calling a special instruction, `svstep`. The term "Vertical-First" -stems from visually listing program instructions vertically and register files horizontally. +Normal Cray-style Vectorisation, designated Horizontal-First, performs +element-level operations (often in parallel) before moving in the usual +fashion to the next instruction. Vertical-First on the other hand executes +*one element operation only* then moves on to the next instruction, +whereupon if that is also an SVP64-Prefixed instruction the exact same +element offset is used. Element offsets are then explicitly advanced +by calling a special instruction, `svstep`. The term "Vertical-First" +stems from visually listing program instructions vertically and register +files horizontally. **Definition of SVP64Single Prefixing:** -A 32-bit Prefix in front of a Defined Word that extends register numbers -(allows larger register files), adds single-bit predication, element-width overrides, -and optionally adds Saturation to Arithmetic instructions that normally would not -have it. *SVP64 is in Draft only* and is yet to be defined. +A 32-bit Prefix in front of a Defined Word that extends register +numbers (allows larger register files), adds single-bit predication, +element-width overrides, and optionally adds Saturation to Arithmetic +instructions that normally would not have it. *SVP64 is in Draft only* +and is yet to be defined. **Definition of "UnVectoriseable":** @@ -35,14 +49,13 @@ Prefixing) is termed "UnVectoriseable" or "UnVectorised". Examples include `sc` or `sync` which have no registers. `mtmsr` is also classed as UnVectoriseable because there is only one `MSR`. 
-UnVectorised instructions are required to be detected as such if -Prefixed (either SVP64 or SVP64Single) and an Illegal Instruction -Trap raised. +UnVectorised instructions are required to be detected as such if Prefixed +(either SVP64 or SVP64Single) and an Illegal Instruction Trap raised. -*Architectural Note: Given that a "pre-classification" Decode Phase is -required (identifying whether the Suffix - Defined Word - is -Arithmetic/Logical, CR-op, Load/Store or Branch-Conditional), -adding "UnVectorised" to this phase is not unreasonable.* +*Architectural Note: Given that a "pre-classification" Decode Phase +is required (identifying whether the Suffix - Defined Word - is +Arithmetic/Logical, CR-op, Load/Store or Branch-Conditional), adding +"UnVectorised" to this phase is not unreasonable.* # New 64-bit Instruction Encoding spaces @@ -86,13 +99,13 @@ the Post-Increment Mode of LD/ST-Update and Vectorised Branch-Conditional.* Encoding spaces and their potential are illustrated: -| Encoding |Available bits|Scalar|Vectoriseable | SVP64Single | PO1-Prefixable | -|----------|--------------|------|--------------|--------------|----------------| -|EXT000-063| 32 | yes | yes |yes |yes | -|EXT100-163| 64 | yes | no |no |not twice | -|RESERVED2 | 57 | N/A |not applicable|not applicable|not applicable | -|EXT232-263| 32 | yes | yes |yes |no | -|RESERVED1 | 32 | N/A | no |no |no | +| Encoding |Available bits|Scalar|Vectoriseable | SVP64Single |PO1-Prefixable | +|----------|--------------|------|--------------|--------------|---------------| +|EXT000-063| 32 | yes | yes |yes |yes | +|EXT100-163| 64 | yes | no |no |not twice | +|RESERVED2 | 57 | N/A |not applicable|not applicable|not applicable | +|EXT232-263| 32 | yes | yes |yes |no | +|RESERVED1 | 32 | N/A | no |no |no | Notes: diff --git a/openpower/sv/rfc/ls010.mdwn b/openpower/sv/rfc/ls010.mdwn index d9383c99b..e5da902f9 100644 --- a/openpower/sv/rfc/ls010.mdwn +++ b/openpower/sv/rfc/ls010.mdwn @@ -100,3 +100,5 @@ Add 
the following entries to: [[!inline pages="openpower/sv/ldst" raw=yes ]] [[!inline pages="openpower/sv/branches" raw=yes ]] [[!inline pages="openpower/sv/cr_ops" raw=yes ]] +[[!inline pages="openpower/sv/svp64/appendix" raw=yes ]] +[[!inline pages="openpower/sv/compliancy_levels" raw=yes ]] diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn index 46b399834..5e6478ae0 100644 --- a/openpower/sv/svp64/appendix.mdwn +++ b/openpower/sv/svp64/appendix.mdwn @@ -1,5 +1,3 @@ -[[!tag standards]] - # Appendix * Saturation @@ -17,7 +15,7 @@ Table of contents: [[!toc]] -# Partial Implementations +## Partial Implementations It is perfectly legal to implement subsets of SVP64 as long as illegal instruction traps are always raised on unimplemented features, @@ -32,7 +30,7 @@ opportunity to emulate the context created by the given SPR. See [[sv/compliancy_levels]] for full details. -# XER, SO and other global flags +## XER, SO and other global flags Vector systems are expected to be high performance. This is achieved through parallelism, which requires that elements in the vector be @@ -81,7 +79,7 @@ may be performed by setting VL=8, and a one-instruction 1024-bit Add-with-Carry by setting VL=16, and so on. More on this in [[openpower/sv/biginteger]] -# EXTRA Field Mapping +## EXTRA Field Mapping The purpose of the 9-bit EXTRA field mapping is to mark individual registers (RT, RA, BFA) as either scalar or vector, and to extend @@ -139,7 +137,7 @@ through the Power ISA WG Process). It would be similar to deciding that `add` should be changed from X-Form to D-Form. -# Single Predication +## Single Predication This is a standard mode normally found in Vector ISAs. every element in every source Vector and in the destination uses the same bit of one single predicate mask. 
@@ -201,7 +199,7 @@ The following schedule for srcstep and dststep will occur: Here, both srcstep and dststep remain in lockstep because sz=dz=1 -# Twin Predication +## Twin Predication This is a novel concept that allows predication to be applied to a single source and a single dest register. The following types of traditional @@ -245,7 +243,7 @@ is not actually a Vector ISA: it is a loop-abstraction-concept that is applied *in general* to Scalar operations, just like the x86 `REP` instruction (if put on steroids). -# Pack/Unpack +## Pack/Unpack The pack/unpack concept of VSX `vpack` is abstracted out as Sub-Vector reordering. @@ -314,7 +312,7 @@ for Vertical-First Mode. Pack/Unpack is enabled (set up) through [[sv/svstep]]. -# Reduce modes +## Reduce modes Reduction in SVP64 is deterministic and somewhat of a misnomer. A normal Vector ISA would have explicit Reduce opcodes with defined characteristics @@ -344,7 +342,7 @@ Order. In essence it becomes the programmer's responsibility to leverage the pre-determined schedules to desired effect. -## Scalar result reduction and iteration +### Scalar result reduction and iteration Scalar Reduction per se does not exist, instead is implemented in SVP64 as a simple and natural relaxation of the usual restriction on the Vector @@ -462,7 +460,7 @@ as far as the user is concerned, all exceptions and interrupts **MUST** be precise. -# Fail-on-first +## Fail-on-first Data-dependent fail-on-first has two distinct variants: one for LD/ST (see [[sv/ldst]], @@ -548,7 +546,7 @@ will rely. REMAP will need to be activated to invert the ordering of element traversal.* -## Data-dependent fail-first on CR operations (crand etc) +### Data-dependent fail-first on CR operations (crand etc) Operations that actually produce or alter CR Field as a result do not also in turn have an Rc=1 mode. However it makes no @@ -566,7 +564,7 @@ There are two primary different types of CR operations: More details can be found in [[sv/cr_ops]]. 
-# pred-result mode +## pred-result mode Pred-result mode may not be applied on CR-based operations. @@ -584,7 +582,7 @@ there can be no pred-result mode for mtcr and other CR-based instructions Arithmetic and Logical Pred-result, which does have Rc=1 or for which RC1 Mode makes sense, is covered in [[sv/normal]] -# CR Operations +## CR Operations CRs are slightly more involved than INT or FP registers due to the possibility for indexing individual bits (crops BA/BB/BT). Again however @@ -592,7 +590,7 @@ the access pattern needs to be understandable in relation to v3.0B / v3.1B numbering, with a clear linear relationship and mapping existing when SV is applied. -## CR EXTRA mapping table and algorithm +### CR EXTRA mapping table and algorithm Numbering relationships for CR fields are already complex due to being in BE format (*the relationship is not clearly explained in the v3.0B @@ -667,7 +665,7 @@ batches of aligned 32-bit chunks (CR0-7, CR7-15). This is to greatly simplify internal design. If instructions are issued where CR Vectors do not start on a 32-bit aligned boundary, performance may be affected. -## CR fields as inputs/outputs of vector operations +### CR fields as inputs/outputs of vector operations CRs (or, the arithmetic operations associated with them) may be marked as Vectorised or Scalar. When Rc=1 in arithmetic operations that have no explicit EXTRA to cover the CR, the CR is Vectorised if the destination is Vectorised. Likewise if the destination is scalar then so is the CR. @@ -725,7 +723,7 @@ and VL truncation provide several benefits. (see [[discussion]]. some alternative schemes are described there) -## Rc=1 when SUBVL!=1 +### Rc=1 when SUBVL!=1 sub-vectors are effectively a form of Packed SIMD (length 2 to 4). Only 1 bit of predicate is allocated per subvector; likewise only one CR is allocated @@ -737,7 +735,7 @@ is to perform a bitwise OR or AND of the subvector tests. 
Given that OE is ignored in SVP64, this field may (when available) be used to select OR or AND behavior. -### Table of CR fields +#### Table of CR fields CRn is the notation used by the OpenPower spec to refer to CR field #i, so FP instructions with Rc=1 write to CR1 (n=1). @@ -753,15 +751,15 @@ are arranged. TODO a python program that auto-generates a CSV file which can be included in a table, which is in a new page (so as not to overwhelm this one). [[svp64/cr_names]] -# Register Profiles +## Register Profiles Instructions are broken down by Register Profiles as listed in the following auto-generated page: [[opcode_regs_deduped]]. These tables, despite being auto-generated, are part of the Specification. -# SV pseudocode illustration +## SV pseudocode illustration -## Single-predicated Instruction +### Single-predicated Instruction illustration of normal mode add operation: zeroing not included, elwidth overrides not included. if there is no predicate, it is set to all 1s @@ -805,7 +803,7 @@ intended, then an all-Scalar operation should be used. See -# Assembly Annotation +## Assembly Annotation Assembly code annotation is required for SV to be able to successfully mark instructions as "prefixed". @@ -861,7 +859,7 @@ For modes: - mr OR crm: "normal" map-reduce mode or CR-mode. - mr.svm OR crm.svm: when vec2/3/4 set, sub-vector mapreduce is enabled -# Parallel-reduction algorithm +## Parallel-reduction algorithm The principle of SVP64 is that SVP64 is a fully-independent Abstraction of hardware-looping in between issue and execute phases @@ -910,7 +908,7 @@ insert micro-architectural lane-crossing Move operations if necessary or desired, to give the level of efficiency or performance required.** -# Element-width overrides +## Element-width overrides Element-width overrides are best illustrated with a packed structure union in the c programming language. 
The following should be taken @@ -991,7 +989,7 @@ Thus it can be clearly seen that elements are packed by their element width, and the packing starts from the source (or destination) specified by the instruction. -# Twin (implicit) result operations +## Twin (implicit) result operations Some operations in the Power ISA already target two 64-bit scalar registers: `lq` for example, and LD with update. @@ -1090,3 +1088,9 @@ with an implicit 2nd destination: * [[isa/svfixedarith]] * [[isa/svfparith]] +[[!tag standards]] + +------ + +\newpage() + -- 2.30.2