thing on OpenPOWER would require a whopping 24 6-bit Major Opcodes which
is clearly impractical: other schemes need to be devised.
-In addition we would like to add SV-C32 which is a Vectorised version
+In addition we would like to add SV-C32 which is a Vectorized version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64, as well.
including simulators and compilers: OpenRISC 1200 took 12 years to
mature. Stable Open ISAs require Standards and Compliance Suites that
take more. A Vector or Packed SIMD ISA to reach stable *general-purpose*
-auto-vectorisation compiler support has never been achieved in the
+auto-vectorization compiler support has never been achieved in the
history of computing, not with the combined resources of ARM, Intel,
AMD, MIPS, Sun Microsystems, SGI, Cray, and many more. (*Hand-crafted
assembler and direct use of intrinsics is the Industry-standard norm
Slowly, at this point, a realisation should be sinking in that, actually,
there aren't that many really truly viable Vector ISAs out there, as the
-ones that are evolving in the general direction of Vectorisation are,
+ones that are evolving in the general direction of Vectorization are,
in various completely different ways, flawed.
**Successfully identifying a limitation marks the beginning of an
sequential carry-flag chaining of these scalar instructions.
* The Condition Register Fields of the Power ISA make a great candidate
for use as Predicate Masks, particularly when combined with
- Vectorised `cmp` and Vectorised `crand`, `crxor` etc.
+ Vectorized `cmp` and Vectorized `crand`, `crxor` etc.
It is only when looking slightly deeper into the Power ISA that
certain things turn out to be missing, and this is down in part to IBM's
so Scalar ones. Examples include that transfer operations between the
Integer and Floating-point Scalar register files were dropped approximately
a decade ago after the Packed SIMD variants were considered to be
-duplicates. With it being completely inappropriate to attempt to Vectorise
+duplicates. With it being completely inappropriate to attempt to Vectorize
a Packed SIMD ISA designed 20 years ago with no Predication of any kind,
-the Scalar ISA, a much better all-round candidate for Vectorisation
+the Scalar ISA, a much better all-round candidate for Vectorization
(the Scalar parts of Power ISA) is left anaemic.
A particular key instruction that is missing is `MV.X` which is
expensive instruction causing a huge swathe of Register Hazards
in one single hit is almost never added to a Scalar ISA but
is almost always added to a Vector one. When `MV.X` is
-Vectorised it allows for arbitrary
+Vectorized it allows for arbitrary
remapping of elements within a Vector to positions specified
by another Vector. A typical Scalar ISA will use Memory to
achieve this task, but with Vector ISAs the Vector Register Files are
have to be "massaged" by tools that insert intrinsics into the
source code, in order to identify the Basic Blocks that the Zero-Overhead
Loops can run. Can this be merged into standard gcc and llvm
-compilers? As intrinsics: of course. Can it become part of auto-vectorisation? Probably,
+compilers? As intrinsics: of course. Can it become part of auto-vectorization? Probably,
if an infinite supply of money and engineering time is thrown at it.
Is a half-way-house solution of compiler intrinsics good enough?
Intel, ARM, MIPS, Power ISA and RISC-V have all already said "yes" on that,
<img src="/openpower/sv/sv_horizontal_vs_vertical.svg" />
First, some important definitions, because there are two different
-Vectorisation Modes in SVP64:
+Vectorization Modes in SVP64:
* **Horizontal-First**: (aka standard Cray Vectors) walk
through **elements** first before moving to next **instruction**
the L1-L4 Cache and Virtual Memory Barriers is it possible to
ascertain, retrospectively, that time and power had just been wasted.
-SVP64 is able to do what is termed "Vertical-First" Vectorisation,
+SVP64 is able to do what is termed "Vertical-First" Vectorization,
combined with SVREMAP Matrix Schedules. Imagine that SVREMAP has been
extended, Snitch-style, to perform a deterministic memory-array walk of
a large Matrix.
The reason in this case for the use of Vertical-First Mode is the
conditional execution of the Multiply-and-Accumulate.
-Horizontal-First Mode is the standard Cray-Style Vectorisation:
+Horizontal-First Mode is the standard Cray-Style Vectorization:
loop on all *elements* with the same instruction before moving
on to the next instruction. Horizontal-First
Predication needs to be pre-calculated
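The two traversal orders can be sketched in a few lines (illustrative Python, not the spec's pseudocode; `ops` stands for the instructions in the loop body and `vl` for the Vector Length):

```python
def horizontal_first(ops, vl, trace):
    # Cray-style: complete all VL elements of one instruction
    # before moving on to the next instruction
    for op in ops:
        for i in range(vl):
            trace.append((op, i))

def vertical_first(ops, vl, trace):
    # walk a single element through every instruction, then
    # step to the next element
    for i in range(vl):
        for op in ops:
            trace.append((op, i))
```

With two instructions and VL=2, Horizontal-First visits element 0 then 1 of the first instruction before touching the second; Vertical-First carries element 0 through both instructions first.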
# Scalar OpenPOWER Audio and Video Opcodes
-the fundamental principle of SV is a hardware for-loop. therefore the first (and in nearly 100% of cases only) place to put Vector operations is first and foremost in the *scalar* ISA. However only by analysing those scalar opcodes *in* a SV Vectorisation context does it become clear why they are needed and how they may be designed.
+The fundamental principle of SV is a hardware for-loop. Therefore the first (and in nearly 100% of cases only) place to put Vector operations is first and foremost in the *scalar* ISA. However only by analysing those scalar opcodes *in* an SV Vectorization context does it become clear why they are needed and how they may be designed.
This page therefore has accompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> for evolution of suitable opcodes.
The fundamental principle for these instructions is:
* identify the scalar primitive
-* assume that longer runs of scalars will have Simple-V vectorisatin applied
+* assume that longer runs of scalars will have Simple-V vectorization applied
* assume that "swizzle" may be applied at the (vec2 - SUBVL=2) Vector level,
(even if that involves a mv.swizzle which may be macro-op fused)
in order to perform the necessary HI/LO selection normally hard-coded
their own right without SVP64. Thus the operations here are proposed
first as Scalar Extensions to the Power ISA.
-A secondary focus is that if Vectorised, implementors may choose
+A secondary focus is that if Vectorized, implementors may choose
to deploy macro-op fusion targeting back-end 256-bit or greater
Dynamic SIMD ALUs for maximum performance and effectiveness.
# Analysis
Covered in [[biginteger/analysis]] the summary is that standard `adde`
-is sufficient for SVP64 Vectorisation of big-integer addition (and `subfe`
+is sufficient for SVP64 Vectorization of big-integer addition (and `subfe`
for subtraction) but that big-integer shift, multiply and divide require an
extra 3-in 2-out instruction, similar to Intel's
[shld](https://www.felixcloutier.com/x86/shld)
Use of smaller sub-operations is a given: worst-case in a Scalar
context, addition is O(N) whilst multiply and divide are O(N^2),
-and their Vectorisation would reduce those (for small N) to
+and their Vectorization would reduce those (for small N) to
O(1) and O(N). Knuth's big-integer scalar algorithms provide
useful real-world grounding into the types of operations needed,
-making it easy to demonstrate how they would be Vectorised.
+making it easy to demonstrate how they would be Vectorized.
The basic principle behind Knuth's algorithms is to break the
problem down into a single scalar op against a Vector operand.
# Vector Add and Subtract
Surprisingly, no new additional instructions are required to perform
-a straightforward big-integer add or subtract. Vectorised `adde`
+a straightforward big-integer add or subtract. Vectorized `adde`
or `addex` is perfectly sufficient to produce arbitrary-length
big-integer add due to the rules set in SVP64 that all Vector Operations
are directly equivalent to the strict Program Order Execution of
of how SVP64 works!
Thus, due to sequential execution of `adde` both consuming and producing
a CA Flag, with no additions to SVP64 or to the v3.0 Power ISA,
-`sv.adde` is in effect an alias for Big-Integer Vectorised add. As such,
+`sv.adde` is in effect an alias for Big-Integer Vectorized add. As such,
implementors are entirely at liberty to recognise Horizontal-First Vector
adds and send the vector of registers to a much larger and wider back-end
ALU, and short-cut the intermediate storage of XER.CA on an element
bnz loop # do more digits
This is not that different from a Scalar Big-Int add, it is
-just that like all Cray-style Vectorisation, a variable number
+just that like all Cray-style Vectorization, a variable number
of elements are covered by one instruction. Of interest
to people unfamiliar with Cray-style Vectors: if VL is not
permitted to exceed 1 (because MAXVL is set to 1) then the above
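As an illustration of the carry chaining (a Python sketch, not normative pseudocode; 64-bit digits assumed), strict Program Order execution of `adde` with a chained XER.CA gives big-integer add for free:

```python
MASK64 = (1 << 64) - 1

def sv_adde(ra, rb, vl, ca=0):
    # sequential element loop: each adde consumes the carry (CA)
    # produced by the previous element, exactly as in Program Order
    rt = []
    for i in range(vl):
        s = ra[i] + rb[i] + ca
        rt.append(s & MASK64)
        ca = s >> 64
    return rt, ca
```

With VL=2, adding [2^64-1, 0] to [1, 0] ripples the carry from the first digit into the second.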
Keeping the shift amount within the range of the element (64 bit)
a Vector bit-shift may be synthesised from a pair of shift operations
and an OR, all of which are standard Scalar Power ISA instructions
-that when Vectorised are exactly what is needed.
+that when Vectorized are exactly what is needed.
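A Python sketch of that synthesis (hedged: little-endian digit order and 0 < s < 64 are assumed here):

```python
MASK64 = (1 << 64) - 1

def big_rshift(v, s):
    # per element: one shift-right, one shift-left of the
    # neighbouring digit, and an OR -- three standard Scalar ops
    n = len(v)
    r = [((v[i] >> s) | (v[i + 1] << (64 - s))) & MASK64
         for i in range(n - 1)]
    r.append(v[n - 1] >> s)
    return r
```

Each output digit depends only on a pair of adjacent input digits, which is what makes the per-element Vectorization straightforward.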
```
void bigrsh(unsigned s, uint64_t r[], uint64_t un[], int n) {
RT2, RC2 = RA2 * RB2 + RC1
Following up to add each partially-computed row to what will become
-the final result is achieved with a Vectorised big-int
+the final result is achieved with a Vectorized big-int
`sv.adde`. Thus, the key inner loop of
Knuth's Algorithm M may be achieved in four instructions, two of
which are scalar initialisation:
bool need_fixup = !ca; // for phase 3 correction
```
-In essence then the primary focus of Vectorised Big-Int divide is in
+In essence then the primary focus of Vectorized Big-Int divide is in
fact big-integer multiply.
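To see why, a row of Knuth's Algorithm M reduces to one scalar digit multiplied against a Vector with a chained 64-bit carry, which is what a 3-in 2-out multiply-and-add provides when element-looped. A hedged Python sketch (the function name is illustrative):

```python
MASK64 = (1 << 64) - 1

def mul_row(q, v, carry=0):
    # RT, RC = RA * RB + RC chained across elements:
    # q is the scalar digit, v the vector of 64-bit digits
    out = []
    for d in v:
        prod = q * d + carry
        out.append(prod & MASK64)
        carry = prod >> 64
    return out, carry
```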
Detection of the fixup (phase 3) is determined by the Carry (borrow)
bit at the end. Logically: if borrow was required then the qhat estimate
was too large and the correction is required, which is, again,
-nothing more than a Vectorised big-integer add (one instruction).
+nothing more than a Vectorized big-integer add (one instruction).
However this is not the full story.
**128/64-bit divisor**
The irony is, therefore, that attempting to
improve big-integer divide by moving to 64-bit digits in order to take
-advantage of the efficiency of 64-bit scalar multiply when Vectorised
+advantage of the efficiency of 64-bit scalar multiply when Vectorized
would instead
lock up CPU time performing a 128/64 scalar division. With the Vector
Multiply operations being critically dependent on that `qhat` estimate, and
this extension amalgamates bitmanipulation primitives from many sources,
including RISC-V bitmanip, Packed SIMD, AVX-512 and OpenPOWER VSX.
Also included are DSP/Multimedia operations suitable for Audio/Video.
-Vectorisation and SIMD are removed: these are straight scalar (element)
-operations making them suitable for embedded applications. Vectorisation
+Vectorization and SIMD are removed: these are straight scalar (element)
+operations making them suitable for embedded applications. Vectorization
Context is provided by [[openpower/sv]].
When combined with SV, scalar variants of bitmanip operations found in
for i in range(64):
RT[i] = lut2(CRs{BFA}, RB[i], RA[i])
-When Vectorised with SVP64, as usual both source and destination may be
+When Vectorized with SVP64, as usual both source and destination may be
Vector or Scalar.
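The 2-in LUT can be modelled directly (a sketch, not the spec's pseudocode; the 4-bit truth table read from the CR Field is represented here as a plain integer):

```python
def lut2(table4, b, a):
    # look up the bit-pair (b, a) in a 4-bit truth table
    return (table4 >> ((b << 1) | a)) & 1

def ternlut2(table4, rb, ra, width=64):
    # RT[i] = lut2(table, RB[i], RA[i]) for every bit i
    rt = 0
    for i in range(width):
        rt |= lut2(table4, (rb >> i) & 1, (ra >> i) & 1) << i
    return rt
```

For example, `table4=0b1000` selects only the (b,a)=(1,1) entry, giving bitwise AND, while `0b1110` gives OR.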
*Programmer's note: a dynamic ternary lookup may be synthesised from
a,b = CRs[BF][i], CRs[BFA][i]
if msk[i] CRs[BF][i] = lut2(CRs[BFB], a, b)
-When SVP64 Vectorised any of the 4 operands may be Scalar or
+When SVP64 Vectorized any of the 4 operands may be Scalar or
Vector, including `BFB` meaning that multiple different dynamic
lookups may be performed with a single instruction. Note that
this instruction is deliberately an overwrite in order to reduce
considered completely separate and distinct from standard scalar
OpenPOWER-approved v3.0B branches. **v3.0B branches are in no way
impacted, altered, changed or modified in any way, shape or form by the
-SVP64 Vectorised Variants**.
+SVP64 Vectorized Variants**.
It is also extremely important to note that Branches are the sole
pseudo-exception in SVP64 to `Scalar Identity Behaviour`. SVP64 Branches
Unless Branches are aware and capable of such analysis, additional
instructions would be required which perform Horizontal Cumulative
-analysis of Vectorised Condition Register Fields, in order to reduce
+analysis of Vectorized Condition Register Fields, in order to reduce
the Vector of CR Fields down to one single yes or no decision that a
Scalar-only v3.0B Branch-Conditional could cope with. Such instructions
would be unavoidable, required, and costly by comparison to a single
Given that Power ISA v3.0B is already quite powerful, particularly
the Condition Registers and their interaction with Branches, there are
-opportunities to create extremely flexible and compact Vectorised Branch
+opportunities to create extremely flexible and compact Vectorized Branch
behaviour. In addition, the side-effects (updating of CTR, truncation
of VL, described below) make it a useful instruction even if the branch
points to the next instruction (no actual branch).
a Great Big AND of all condition tests. Exit occurs
on the first **failed** test.
-Early-exit is enacted such that the Vectorised Branch does not
+Early-exit is enacted such that the Vectorized Branch does not
perform needless extra tests, which will help reduce reads on
the Condition Register file.
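The effect can be sketched as follows (illustrative Python, assuming each entry of `tests` is one element's Condition test):

```python
def branch_all_early_exit(tests):
    # Great Big AND with early exit: stop reading CR Fields at the
    # first failed test; return (taken, number of CR reads performed)
    reads = 0
    for t in tests:
        reads += 1
        if not t:
            return False, reads
    return True, reads
```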
prudent. This introduces a new immediate field, `SNZ`, which works in
conjunction with `sz`.
-Vectorised Branches can be used in either SVP64 Horizontal-First or
+Vectorized Branches can be used in either SVP64 Horizontal-First or
Vertical-First Mode. Essentially, at an element level, the behaviour
is identical in both Modes, although the `ALL` bit is meaningless in
Vertical-First Mode.
-It is also important to bear in mind that, fundamentally, Vectorised
+It is also important to bear in mind that, fundamentally, Vectorized
Branch-Conditional is still extremely close to the Scalar v3.0B
Branch-Conditional instructions, and that the same v3.0B Scalar
Branch-Conditional instructions are still *completely separate and
to occur because there will be no *successful* Condition Tests to make
it happen.
-## Vectorised CR Field numbering, and Scalar behaviour
+## Vectorized CR Field numbering, and Scalar behaviour
It is important to keep in mind that just like all SVP64 instructions,
the `BI` field of the base v3.0B Branch Conditional instruction may be
to be tested, and when `sz=0` skipping occurs. Even when VLSET mode is
not used, CTR may still be decremented by the total number of nonmasked
elements, acting in effect as either a popcount or cntlz depending
-on which mode bits are set. In short, Vectorised Branch becomes an
+on which mode bits are set. In short, Vectorized Branch becomes an
extremely powerful tool.
**Micro-Architectural Implementation Note**: *when implemented on top
is unconditional in v3.0B when LK=1, and conditional in SVP64 when LRu=1).
Inline comments highlight the fact that the Scalar Branch behaviour and
-pseudocode is still clearly visible and embedded within the Vectorised
+pseudocode is still clearly visible and embedded within the Vectorized
variant:
```
CRbits = CR{SVCRf}
# select predicate bit or zero/one
if predicate[srcstep]:
- if BRc = 1 then # CR0 vectorised
+ if BRc = 1 then # CR0 vectorized
CR{SVCRf+srcstep} = CRbits
testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
[^3]: A 2-Dimensional Scalable Vector ISA **specifically designed for the Power ISA** with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
[^4]: on specific operations. See [[opcode_regs_deduped]] for full list. Key: 2P - Twin Predication, 1P - Single-Predicate
[^5]: SVP64 provides a Vector concept on top of the **Scalar** GPR, FPR and CR Fields, extended to 128 entries.
-[^6]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
-[^7]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]]
+[^6]: SVP64 Vectorizes Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
+[^7]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorized by SVP64). See [[sv/biginteger/analysis]]
[^8]: LD/ST Fault-First: see [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
[^9]: Data-dependent Fail-First: Based on LD/ST Fail-first, extended to data. Truncates VL based on failing Rc=1 test. Similar to Z80 CPIR. See [[sv/svp64/appendix]]
[^10]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]]
which are power-2 based on Silicon-partner SIMD width. Non-power-2 not supported but [zero-input masking](https://www.realworldtech.com/forum/?threadid=202688&curpostid=207774) is.
[^x4]: [Advanced matrix Extensions](https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions) supports BF16 and INT8 only. Separate regfile, power-of-two "tiles". Not general-purpose at all.
[^b1]: Although registers may be 128-bit in NEON, SVE2, and AVX, unlike VSX there are very few (or no) actual arithmetic 128-bit operations. Only RVV and SVP64 have the possibility of 128-bit ops
-[^m1]: Mitch Alsup's MyISA 66000 is available on request. A powerful RISC ISA with a **Hardware-level auto-vectorisation** LOOP built-in as an extension named VVM. Classified as "Vertical-First".
+[^m1]: Mitch Alsup's MyISA 66000 is available on request. A powerful RISC ISA with a **Hardware-level auto-vectorization** LOOP built-in as an extension named VVM. Classified as "Vertical-First".
[^m2]: MyISA 66000 has a CARRY register up to 64-bit. Repeated application of FMA (esp. within Auto-Vectored LOOPS) automatically and inherently creates big-int operations with zero effort.
[^nc]: "Silicon-Partner" Scaling achieved through allowing same instruction to act on different regfile size and bitwidth. This catastrophically results in binary non-interoperability.
Firstly, we analyse the xchacha20 algorithm, showing what operations
are performed and in what order. Secondly, two innovative features
of SVP64 are described which are crucial to understanding of Simple-V
-Vectorisation: Vertical-First Mode and Indexed REMAP. Then we show
+Vectorization: Vertical-First Mode and Indexed REMAP. Then we show
how Index REMAP eliminates the need entirely for inline-loop-unrolling,
but note that in this particular algorithm REMAP is only useful for
us in Vertical-First Mode.
\newpage{}
-# Vectorised versions involving GPRs
+# Vectorized versions involving GPRs
The name "weird" refers to a minor violation of SV rules when it comes
-to deriving the Vectorised versions of these instructions.
+to deriving the Vectorized versions of these instructions.
Normally the progression of the SV for-loop would move on to the
next register. Instead however in the scalar case these instructions
interesting conceptual challenges for SVP64, which was designed
primarily for vectors of arithmetic and logical operations. However
if predicates may be bits of CR Fields it makes sense to extend
-Simple-V to cover CR Operations, especially given that Vectorised Rc=1
-may be processed by Vectorised CR Operations that usefully in turn
+Simple-V to cover CR Operations, especially given that Vectorized Rc=1
+may be processed by Vectorized CR Operations that usefully in turn
may become Predicate Masks to yet more Vector operations, like so:
```
operations are firmly out of scope for this section, being covered fully
by [[sv/normal]].
-* Examples of Vectoriseable Defined Words to which this section does
+* Examples of Vectorizable Defined Words to which this section does
apply are
- `mfcr` and `cmpi` (3 bit operands) and
- `crnor` and `crand` (5 bit operands).
decision. However with CR-based operations that CR Field result to be
tested is provided *by the operation itself*.
-Data-dependent SVP64 Vectorised Operations involving the creation
+Data-dependent SVP64 Vectorized Operations involving the creation
or modification of a CR can require an extra two bits, which are not
available in the compact space of the SVP64 RM `MODE` Field. With the
concept of element width overrides being meaningless for CR Fields it
is a much easier proposition to consider.
The prohibitions utilise the CR Field numbers implicitly to
-split out Vectorised CR operations to be considered completely
+split out Vectorized CR operations to be considered completely
separate and distinct from Scalar CR operations *even though
they both use the same binary encoding*. This does in turn
mean that at the Decode Phase it becomes necessary to examine
not only the operation (`sv.crand`, `sv.cmp`) but also
the CR Field numbers as well as whether, in the EXTRA2/3 Mode
-bits, the operands are Vectorised.
+bits, the operands are Vectorized.
A future version of Power ISA, where SVP64Single is proposed,
would in fact introduce "Conditional Execution", including
* Condition Registers. see note below
* FPR (if present)
-When Rc=1 is encountered in an SVP64 Context the destination is different (TODO) i.e. not CR0 or CR1. Implicit Rc=1 Condition Registers are still Vectorised but do **not** have EXTRA2/3 spec adjustments. The only part if the EXTRA2/3 spec that is observed and respected is whether the CR is Vectorised (isvec).
+When Rc=1 is encountered in an SVP64 Context the destination is different (TODO) i.e. not CR0 or CR1. Implicit Rc=1 Condition Registers are still Vectorized but do **not** have EXTRA2/3 spec adjustments. The only part of the EXTRA2/3 spec that is observed and respected is whether the CR is Vectorized (isvec).
## Increasing register file sizes
* <https://libre-soc.org/openpower/sv/propagation/>
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/svp64.py;hb=HEAD>
-## Vectorised Branches
+## Vectorized Branches
TODO [[sv/branches]]
-## Vectorised LD/ST
+## Vectorized LD/ST
TODO [[sv/ldst]]
# SVP64 polymorphic elwidth overrides
-SimpleV, the Draft Cray-style Vectorisation for OpenPOWER, may
+SimpleV, the Draft Cray-style Vectorization for OpenPOWER, may
independently override both or either of the source or destination
register bitwidth in the base operation used to create the Vector
operation. In the case of IEEE754 FP operands this gives an
Memory infrastructure (and the ISA itself) correspondingly needs Vector
Memory Operations as well.
-Vectorised Load and Store also presents an extra dimension (literally)
+Vectorized Load and Store also presents an extra dimension (literally)
which creates scenarios unique to Vector applications, that a Scalar (and
even a SIMD) ISA simply never encounters: not even the complex Addressing
Modes of the 68000 or S/360 resemble Vector Load/Store.
## Modes overview
-Vectorisation of Load and Store requires creation, from scalar operations,
+Vectorization of Load and Store requires creation, from scalar operations,
a number of different modes:
* **fixed aka "unit" stride** - contiguous sequence with no gaps
svctx.ldstmode = elementstride
```
-A summary of the effect of Vectorisation of src or dest:
+A summary of the effect of Vectorization of src or dest:
```
imm(RA) RT.v RA.v no stride allowed
imm(RA) RT.s RA.v no stride allowed
imm(RA) RT.v RA.s stride-select allowed
- imm(RA) RT.s RA.s not vectorised
+ imm(RA) RT.s RA.s not vectorized
RA,RB RT.v {RA|RB}.v Standard Indexed
RA,RB RT.s {RA|RB}.v Indexed but single LD (no VSPLAT)
RA,RB RT.v {RA&RB}.s VSPLAT possible. stride selectable
- RA,RB RT.s {RA&RB}.s not vectorised (scalar identity)
+ RA,RB RT.s {RA&RB}.s not vectorized (scalar identity)
```
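The addressing behind those rows can be sketched as follows (hedged: the function names and the explicit byte-width parameter are illustrative, not the spec's pseudocode; the element-strided form follows the `ofst*elidx` notation used in this document):

```python
def ea_unit(base, offs, elidx, elwidth_bytes):
    # fixed aka "unit" stride: contiguous elements, no gaps
    return base + offs + elidx * elwidth_bytes

def ea_element(base, offs, elidx):
    # element stride: the immediate offset is scaled per element
    return base + offs * elidx

def ea_indexed(base, idx_vec, elidx):
    # Vector Indexed: EA = (RA) + (RB[elidx])
    return base + idx_vec[elidx]
```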
Signed Effective Address computation is only relevant for Vector Indexed
truncating VL to that point. No branch is needed to issue that large
burst of LDs, which may be valuable in Embedded scenarios.
-## Vectorisation of Scalar Power ISA v3.0B
+## Vectorization of Scalar Power ISA v3.0B
Scalar Power ISA Load/Store operations may be seen from [[isa/fixedload]]
and [[isa/fixedstore]] pseudocode to be of the form:
named LD/ST Indexed**.
Whilst it may be costly in terms of register reads to allow REMAP Indexed
-Mode to be applied to any Vectorised LD/ST Indexed operation such as
+Mode to be applied to any Vectorized LD/ST Indexed operation such as
`sv.ld *RT,RA,*RB`, or even misleadingly labelled as redundant, firstly
the strict application of the RISC Paradigm that Simple-V follows makes
it awkward to consider *preventing* the application of Indexed REMAP to
to be cancelled. Additionally an implementor may choose to truncate VL
for any arbitrary reason *except for the very first*.
-ffirst LD/ST to multiple pages via a Vectorised Index base is
+ffirst LD/ST to multiple pages via a Vectorized Index base is
considered a security risk due to the abuse of probing multiple
pages in rapid succession and getting speculative feedback on which
pages would fail. Therefore Vector Indexed LD/ST is prohibited
in pairs.
By contrast, in Vertical-First Mode it is in fact possible to issue
-the pairs, and consequently allowing Vectorised Data-Dependent Fail-First is
+the pairs, and consequently allowing Vectorized Data-Dependent Fail-First is
useful.
Programmer's note: Care should be taken when VL is truncated in
Although Rc=1 on LD/ST is a rare occurrence at present, future versions
of Power ISA *might* conceivably have Rc=1 LD/ST Scalar instructions, and
-with the SVP64 Vectorisation Prefixing being itself a RISC-paradigm that
+with the SVP64 Vectorization Prefixing being itself a RISC-paradigm that
is itself fully-independent of the Scalar Suffix Defined Words, prohibiting
the possibility of Rc=1 Data-Dependent Mode on future potential LD/ST
operations is not strategically sound.
REMAP easily covers this capability, and with dest elwidth overrides
and saturation may do so with built-in conversion that would normally
-require additional width-extension, sign-extension and min/max Vectorised
+require additional width-extension, sign-extension and min/max Vectorized
instructions as post-processing stages.
Thus we do not need to provide specialist LD/ST "Structure Packed" opcodes
this section covers assembly notation for the immediate and indexed LD/ST.
the summary is that in immediate mode for LD it is not clear that if the
-destination register is Vectorised `RT.v` but the source `imm(RA)` is scalar
+destination register is Vectorized `RT.v` but the source `imm(RA)` is scalar
the memory being read is *still a vector load*, known as "unit or element strides".
This anomaly is made clear with the following notation:
sv.ld/els r#.v, ofst(r#2).v -> vector at ofst*elidx+r#2
mem@r#2 +0 ... +offs ... +offs*2
destreg r# r#+1 r#+2
- imm(RA) RT.s RA.s not vectorised
+ imm(RA) RT.s RA.s not vectorized
sv.ld r#, ofst(r#2)
indexed mode:
RA,RB RT.s RA.v RB.v
RA,RB RT.s RA.s RB.v
RA,RB RT.s RA.v RB.s
- RA,RB RT.s RA.s RB.s not vectorised
+ RA,RB RT.s RA.s RB.s not vectorized
* Thirdly, just because of the PO9-Prefix it is prohibited to
put an entirely different instruction into the Suffix position.
If `{PO14}` as a 32-bit instruction is defined as "addi", then
- it is **required** that `{PO9}-{PO14}` **be** a Vectorised "addi",
- **not** a Vectorised multiply.
+ it is **required** that `{PO9}-{PO14}` **be** a Vectorized "addi",
+ **not** a Vectorized multiply.
* Fourthly, where PO1-Prefixing of operand fields (often resulting
in "split field" redefinitions such as `si0||si1`) is an arbitrary
manually-hand-crafted procedure,
[and anticipate someone in the future to
define a 128-bit variant to match RISC-V RV128].
-bear in mind that SVP64 *has* to have Scalar Operations first, because by design and by definition *only Scalar operations may be Vectorised*. SVP64 *DOES NOT* add *ANY* Vector Instructions. SVP64 is a generic loop around *Scalar* operations and it us up to the Architecture to take advantage of that, at the back-end.
+bear in mind that SVP64 *has* to have Scalar Operations first, because by design and by definition *only Scalar operations may be Vectorized*. SVP64 *DOES NOT* add *ANY* Vector Instructions. SVP64 is a generic loop around *Scalar* operations and it is up to the Architecture to take advantage of that, at the back-end.
without SVP64 Sub-Looping it would on the face of it seem absolutely mental and a total waste of time and resources to define an 8 or 16 bit General-Purpose ISA in the year 2022 until you recall that:
(in particular, anyone who remembers how hard programming the Cell Processor turned out to be will be having that familiar "lightbulb moment" right about now)
-more than that: what if those 8 and 16 bit cores had a Supercomputing-class Vectorisation option in the ISA, and there were implementations out there with back-end ALUs that could perform 64 or 128 8 or 16 bit operations per clock cycle?
+more than that: what if those 8 and 16 bit cores had a Supercomputing-class Vectorization option in the ISA, and there were implementations out there with back-end ALUs that could perform 64 or 128 8 or 16 bit operations per clock cycle?
Quantity several thousand per processor, all of them capable of adapting to run massive AI number crunching or (at lower IPC than "normal" processors) general-purpose compute?
swizzle-copied to
a contiguous array of vec2. A contiguous array of vec2 sources
may have multiple of each vec2 elements (XY) copied to a contiguous
-vec4 array (YYXX or XYXX). For this reason, *when Vectorised*
+vec4 array (YYXX or XYXX). For this reason, *when Vectorized*
Swizzle Moves support independent subvector lengths for both
source and destination.
ISA this is not practical. A compromise is to cut the registers required
by half, placing it on-par with `lq`, `stq` and Indexed
Load-with-update instructions.
-When part of the Scalar Power ISA (not SVP64 Vectorised)
+When part of the Scalar Power ISA (not SVP64 Vectorized)
mv.swiz and fmv.swiz operate on four 32-bit
quantities, reducing this instruction to a feasible
2-in, 2-out pairs of 64-bit registers:
as in `lq` and `stq`. Scalar Swizzle instructions must be atomically
indivisible: an Exception or Interrupt may not occur during the Moves.
-Note that unlike the Vectorised variant, when `RT=RA` the Scalar variant
+Note that unlike the Vectorized variant, when `RT=RA` the Scalar variant
*must* buffer (read) both 64-bit RA registers before writing to the
RT pair (in an Out-of-Order Micro-architecture, both of the register
pair must be "in-flight").
This ensures that register file corruption does not occur.
-**SVP64 Vectorised**
+**SVP64 Vectorized**
-Vectorised Swizzle may be considered to
+Vectorized Swizzle may be considered to
contain an extended static predicate
mask for subvectors (SUBVL=2/3/4). Due to the skipping caused by
the static predication capability, the destination
length, and consequently the destination subvector length is
encoded into the Swizzle.
-When Vectorised, given the use-case is for a High-performance GPU,
+When Vectorized, given the use-case is for a High-performance GPU,
the fundamental assumption is that Micro-coding or
other technique will
be deployed in hardware to issue multiple Scalar MV operations and
Additionally, in order to make life easier for implementers, some of
whom may wish, especially for Embedded GPUs, to use multi-cycle Micro-coding,
the usual strict Element-level Program Order is relaxed.
-An overlap between all and any Vectorised
+An overlap between all and any Vectorized
sources and destination Elements for the entirety of
the Vector Loop `0..VL-1` is `UNDEFINED` behaviour.
violate that expectation. The exceptions to this, explained
later, are when Pack/Unpack is enabled.
-**Effect of Saturation on Vectorised Swizzle**
+**Effect of Saturation on Vectorized Swizzle**
A useful convenience for pixel data is to be able to insert values
0x7f or 0xff as magic constants for arbitrary R,G,B or A. Therefore,
# Pack/Unpack Mode:
-It is possible to apply Pack and Unpack to Vectorised
+It is possible to apply Pack and Unpack to Vectorized
swizzle moves. The interaction requires specific explanation
because it involves the separate SUBVLs (with destination SUBVL
being separate). Key to understanding is that the
also exist.
In SVP64, Pack and Unpack are achieved *in the abstract* for application on *all*
-Vectoriseable instructions.
+Vectorizeable instructions.
* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>
[[sv/ldst]], [[sv/cr_ops]] and [[sv/branches]] are covered separately:
the following Modes apply to Arithmetic and Logical SVP64 operations:
-* **simple** mode is straight vectorisation. No augmentations: the
+* **simple** mode is straight vectorization. No augmentations: the
vector comprises an array of independently created results.
* **ffirst** or data-dependent fail-on-first: see separate section.
The vector may be truncated depending on certain criteria.
The CR overflow bit is therefore simply set to zero if saturation did
not occur, and to one if it did. This behaviour (ignoring XER.SO) is
actually optional in the SFFS Compliancy Subset: for SVP64 it is made
-mandatory *but only on Vectorised instructions*.
+mandatory *but only on Vectorized instructions*.
Note also that saturate on operations that set OE=1 must raise an Illegal
Instruction due to the conflicting use of the CR.so bit for storing
-if saturation occurred. Vectorised Integer Operations that produce a
+if saturation occurred. Vectorized Integer Operations that produce a
Carry-Out (CA, CA32): these two bits will be `UNDEFINED` if saturation
is also requested.
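The interaction of saturation with the overflow-reporting bit can be modelled as follows (a behavioural sketch only; the `saturating_add` helper is hypothetical, not the ISA's formal pseudocode):

```python
def saturating_add(a, b, bits=8):
    """Signed saturating add: clamp to the representable range and report
    whether saturation occurred (the analogue of the CR overflow bit
    being set to one on saturation, zero otherwise)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    result = a + b
    saturated = result < lo or result > hi
    return max(lo, min(hi, result)), saturated

print(saturating_add(100, 100))  # (127, True): clamped, overflow reported
print(saturating_add(10, 20))    # (30, False): no saturation
```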
In CR-based data-driven fail-on-first there is only the option to select
and test one bit of each CR (just as with branch BO). For more complex
-tests this may be insufficient. If that is the case, a vectorised crop
+tests this may be insufficient. If that is the case, a vectorized crop
such as crand, cror or [[sv/cr_int_predication]] crweirder may be used,
and ffirst applied to the crop instead of to the arithmetic vector. Note
that crops are covered by the [[sv/cr_ops]] Mode format.
* CR-based data-dependent ffirst on the other hand **can** set VL equal
to zero. When VL is set
zero due to the first element failing the CR bit-test, all subsequent
- vectorised operations are effectively `nops` which is
+ vectorized operations are effectively `nops` which is
*precisely the desired and intended behaviour*.
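A behavioural sketch of CR-based data-dependent fail-first, with a generic `test` callback standing in for the selected CR bit-test (illustrative only, not the specification's pseudocode):

```python
def ffirst(values, test, vl):
    """Data-dependent fail-first: process elements until the CR bit-test
    fails, then truncate VL at that point. VL may legitimately become
    zero, making all subsequent vectorized operations effective nops."""
    results = []
    for i in range(vl):
        if not test(values[i]):
            return results, i  # new (possibly zero) VL
        results.append(values[i])
    return results, vl

# truncate at the first non-positive element
print(ffirst([3, 5, -1, 7], lambda x: x > 0, 4))  # ([3, 5], 2)
print(ffirst([-9, 1, 2], lambda x: x > 0, 3))     # ([], 0): VL set to zero
```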
The second crucial aspect, compared to LDST Ffirst:
Links:
* This page: [http://libre-soc.org/openpower/sv/overview](http://libre-soc.org/openpower/sv/overview)
-* [FOSDEM2021 SimpleV for Power ISA](https://fosdem.org/2021/schedule/event/the_libresoc_project_simple_v_vectorisation/)
+* [FOSDEM2021 SimpleV for Power ISA](https://fosdem.org/2021/schedule/event/the_libresoc_project_simple_v_vectorization/)
* FOSDEM2021 presentation <https://www.youtube.com/watch?v=FS6tbfyb2VA>
* [[discussion]] and
[bugreport](https://bugs.libre-soc.org/show_bug.cgi?id=556)
The fundamentals are (just like x86 "REP"):
* The Program Counter (PC) gains a "Sub Counter" context (Sub-PC)
-* Vectorisation pauses the PC and runs a Sub-PC loop from 0 to VL-1
+* Vectorization pauses the PC and runs a Sub-PC loop from 0 to VL-1
(where VL is Vector Length)
* The [[Program Order]] of "Sub-PC" instructions must be preserved,
just as is expected of instructions ordered by the PC.
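The REP-like fundamentals above can be sketched as a conceptual model (a simplification, not the specification's formal pseudocode; register-element addressing here assumes a flat register file):

```python
def execute_sv_instruction(op, regs, RT, RA, RB, VL):
    """Horizontal-First model: the PC pauses while a Sub-PC loop runs
    from 0 to VL-1, applying the scalar operation to successive register
    elements in strict Program Order."""
    for subpc in range(VL):  # Sub-PC loop; the PC advances only afterwards
        regs[RT + subpc] = op(regs[RA + subpc], regs[RB + subpc])

regs = list(range(32))
execute_sv_instruction(lambda a, b: a + b, regs, RT=0, RA=8, RB=16, VL=4)
print(regs[0:4])  # [24, 26, 28, 30]
```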
that loop size to one.
The important insight from the above is that, strictly speaking, Simple-V
-is not really a Vectorisation scheme at all: it is more of a hardware
+is not really a Vectorization scheme at all: it is more of a hardware
ISA "Compression scheme", allowing as it does for what would normally
require multiple sequential instructions to be replaced with just one.
This is where the rule that Program Order must be preserved in Sub-PC
execution derives from. However in other ways, which will emerge below,
the "tagging" concept presents an opportunity to include features
definitely not common outside of Vector ISAs, and in that regard it's
-definitely a class of Vectorisation.
+definitely a class of Vectorization.
## Register "tagging"
The reason for using so few bits is because there are up to *four*
registers to mark in this way (`fma`, `isel`) which starts to be of
concern when there are only 24 available bits to specify the entire SV
-Vectorisation Context. In fact, for a small subset of instructions it
+Vectorization Context. In fact, for a small subset of instructions it
is just not possible to tag every single register. Under these rare
circumstances a tag has to be shared between two registers.
an associated post-result "test", placing this test into an implicit
Condition Register. The original researchers who created the POWER ISA
chose CR0 for Integer, and CR1 for Floating Point. These *also become
-Vectorised* - implicitly - if the associated destination register is
-also Vectorised. This allows for some very interesting savings on
+Vectorized* - implicitly - if the associated destination register is
+also Vectorized. This allows for some very interesting savings on
instruction count due to the very same CR Vectors being predication masks.
# Adding single predication
is VGATHER (and VSCATTER): moving registers by specifying a vector of
register indices (`regs[rd] = regs[regs[rs]]` in a loop). This one is
tricky because it typically does not exist in standard scalar ISAs.
-If it did it would be called [[sv/mv.x]]. Once Vectorised, it's a
+If it did it would be called [[sv/mv.x]]. Once Vectorized, it's a
VGATHER/VSCATTER.
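Once Vectorized, the `regs[rd] = regs[regs[rs]]` loop behaves as a gather; a minimal behavioural sketch (the `vgather` helper is illustrative only):

```python
def vgather(regs, rd, rs, VL):
    """Vectorized mv.x model: each destination element is fetched from
    the register whose number is held in the corresponding source
    element (register-indirect addressing, i.e. VGATHER)."""
    for i in range(VL):
        regs[rd + i] = regs[regs[rs + i]]

regs = [0] * 16
regs[8:12] = [3, 1, 2, 0]     # vector of register indices
regs[0:4] = [10, 20, 30, 40]  # data registers
vgather(regs, rd=4, rs=8, VL=4)
print(regs[4:8])  # [40, 20, 30, 10]
```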
# Exception-based Fail-on-first
-One of the major issues with Vectorised LD/ST operations is when a
+One of the major issues with Vectorized LD/ST operations is when a
batch of LDs cross a page-fault boundary. With considerable resources
being taken up with in-flight data, a large Vector LD being cancelled
or unable to roll back is either a detriment to performance or can cause
This is a relatively new addition to SVP64 under development as of
July 2021. Where Horizontal-First is the standard Cray-style for-loop,
Vertical-First typically executes just the **one** scalar element
-in each Vectorised operation. That element is selected by srcstep
+in each Vectorized operation. That element is selected by srcstep
and dststep *neither of which are changed as a side-effect of execution*.
Illustrating this in pseudocode, with a branch/loop.
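A possible illustration in Python (an informal reconstruction, with `svstep` modelled as explicit increments of srcstep/dststep and a plain branch closing the loop):

```python
# Vertical-First model: each sv.-prefixed instruction operates on the
# SINGLE element selected by srcstep/dststep, which are NOT changed as a
# side-effect of execution; svstep advances them explicitly.
srcstep = dststep = 0
VL = 4
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
result = [0] * VL

while True:
    # one "vectorized" add executes on just one element (Vertical-First)
    result[dststep] = a[srcstep] + b[srcstep]
    # svstep: explicitly advance the element counters
    srcstep += 1
    dststep += 1
    if srcstep == VL:  # loop complete; otherwise branch back
        break

print(result)  # [11, 22, 33, 44]
```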
To create loops, a new instruction `svstep` must be called,
by embedding Scalar instructions - unmodified - into a Vector "context"
using "Prefixing". With careful thought, this technique reaches 90%
par with good Vector ISAs, increasing to 95% with the addition of a
-mere handful of additional context-vectoriseable scalar instructions
+mere handful of additional context-vectorizeable scalar instructions
([[sv/mv.x]] amongst them).
What is particularly cool about the SV concept is that custom extensions
and research need not be concerned about inventing new Vector instructions
and how to get them to interact with the Scalar ISA: they are effectively
one and the same. Any new instruction added at the Scalar level is
-inherently and automatically Vectorised, following some simple rules.
+inherently and automatically Vectorized, following some simple rules.
**Definition of Horizontal-First:**
-Normal Cray-style Vectorisation, designated Horizontal-First, performs
+Normal Cray-style Vectorization, designated Horizontal-First, performs
element-level operations (often in parallel) before moving in the usual
fashion to the next instruction. The term "Horizontal-First"
stems from naturally visually listing program instructions vertically,
revoked under any circumstances. A useful way to think of this is that
the Prefix Encoding is, like the 8086 REP instruction, an independent
32-bit Defined Word. The only semi-exceptions are the Post-Increment
-Mode of LD/ST-Update and Vectorised Branch-Conditional.*
+Mode of LD/ST-Update and Vectorized Branch-Conditional.*
Note a particular consequence of the application of the above paragraph:
due to the fact that the Prefix Encodings are independent, **by
Encoding spaces and their potential are illustrated:
-| Encoding |Available bits|Scalar|Vectoriseable | SVP64Single |PO1-Prefixable |
+| Encoding |Available bits|Scalar|Vectorizeable | SVP64Single |PO1-Prefixable |
|----------|--------------|------|--------------|--------------|---------------|
|EXT000-063| 32 | yes | yes |yes |yes |
|EXT100-163| 64 | yes | no |no |not twice |
SVP64Single.
* Considerable care is needed both on Architectural Resource Allocation
as well as instruction design itself. All new Scalar instructions automatically
- and inherently must be designed taking their Vectoriseable potential into
+ and inherently must be designed taking their Vectorizeable potential into
consideration *including VSX* in future.
* Once an instruction is allocated
- in an Unvectorizable area it can never be Vectorised without providing
+ in an Unvectorizable area it can never be Vectorized without providing
an entirely new Encoding.
[[!tag standards]]
XER.SO (sticky overflow) is known to cause massive slowdown in pretty much every microarchitecture and it definitely compromises the performance of out-of-order systems. The reason is that it introduces a READ-MODIFY-WRITE cycle between XER.SO and CR0 (which contains a copy of the SO field after inclusion of the overflow). The result and source registers branch off as RaW and WaR hazards from this RMW chain.
-This is even before predication or vectorisation were to be added on top, i.e. these are existing weaknesses in OpenPOWER as a scalar ISA.
+This is even before predication or vectorization were to be added on top, i.e. these are existing weaknesses in OpenPOWER as a scalar ISA.
-As well-known weaknesses that compromise performance, very little use of OE=1 is actually made, outside of unit tests and Conformance Tests. Consequently it makes very little sense to continue to propagate OE=1 in the Vectorisation context of SV.
+These weaknesses are well known to compromise performance, yet very little use of OE=1 is actually made outside of unit tests and Conformance Tests. Consequently it makes very little sense to continue to propagate OE=1 into the Vectorization context of SV.
### Vector Chaining
In addition, those scalar 64-bit bitmanip operations, although some of them are obscure and unusual in the scalar world, do actually have practical applications outside of a vector context.
-(Hilariously and confusingly those very same scalar bitmanip opcodes may themselves be SV-vectorised however with VL only being up to 64 elements it is not anticipated that SV-bitmanip would be used to generate up to 64 bit predicate masks, when a single 64 bit scalar operation will suffice).
+(Hilariously and confusingly those very same scalar bitmanip opcodes may themselves be SV-vectorized however with VL only being up to 64 elements it is not anticipated that SV-bitmanip would be used to generate up to 64 bit predicate masks, when a single 64 bit scalar operation will suffice).
The summary is that adding a full set special vector opcodes just for manipulating predicate masks and being able to transfer them to other regfiles (a la mfcr) is anomalous, costly, and unnecessary.
type of special virtual register port or datapath that masks out the
required predicate bits closer to the regfile.
-another disadvantage is that the CR regfile needs to be expanded from 8x 4bit CRs to a minimum of 64x or preferably 128x 4-bit CRs. Beyond that they can be transferred using vectorised mfcr and mtcrf into INT regs. this is a huge number of CR regs, each of which will need a DM column in the FU-REGs Matrix. however this cost can be mitigated through regfile cacheing, bringing FU-REGs column numbers back down to "sane".
+Another disadvantage is that the CR regfile needs to be expanded from 8x 4-bit CRs to a minimum of 64x or preferably 128x 4-bit CRs. Beyond that they can be transferred using vectorized mfcr and mtcrf into INT regs. This is a huge number of CR regs, each of which will need a DM column in the FU-REGs Matrix; however this cost can be mitigated through regfile caching, bringing FU-REGs column numbers back down to "sane".
### Predicated SIMD HI32-LO32 FUs
The disadvantages appear on closer analysis:
* Unlike the "full" CR port (which reads 8x CRs CR0-7 in one hit) trying the same trick on the scalar integer regfile, to obtain just 8 predicate bits (each being an LSB of a given 64 bit scalar int), would require a whopping 8x64bit set of reads to the INT regfile instead of a scant 1x32bit read. Resource-wise, then, this idea is expensive.
-* With predicate bits being distributed out amongst 64 bit scalar registers, scalar bitmanipulation operations that can be performed after transferring Vectors of CMP operations from CRs to INTs (vectorised-mfcr) are more challenging and costly. Rather than use vectorised mfcr, complex transfers of the LSBs into a single scalar int are required.
+* With predicate bits being distributed out amongst 64 bit scalar registers, scalar bitmanipulation operations that can be performed after transferring Vectors of CMP operations from CRs to INTs (vectorized-mfcr) are more challenging and costly. Rather than use vectorized mfcr, complex transfers of the LSBs into a single scalar int are required.
In a "normal" Vector ISA this would be solved by adding opcodes that perform the kinds of bitmanipulation operations normally needed for predicate masks, as specialist operations *on* those masks. However for SV the rule has been set: "no unnecessary additional Vector Instructions" because it is possible to use existing PowerISA scalar bitmanip opcodes to cover the same job.
The problem is that vectors of LSBs need to be transferred *to* scalar int regs, bitmanip operations carried out, *and then transferred back*, which is exceptionally costly.
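Even the first half of that round-trip is illustrative: gathering predicate LSBs scattered across scalar registers into one mask (a sketch only; `gather_predicate_lsbs` is a hypothetical helper, not an ISA operation):

```python
def gather_predicate_lsbs(regs):
    """Build a predicate mask from the LSB of each 64-bit scalar
    register: the transfer *to* a single scalar int that must precede
    any scalar bitmanip on the mask (and be reversed afterwards)."""
    mask = 0
    for i, r in enumerate(regs):
        mask |= (r & 1) << i
    return mask

# four CMP results scattered as LSBs across four registers
print(bin(gather_predicate_lsbs([1, 0, 1, 1])))  # 0b1101
```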
-On balance this is a less favourable option than vectorising CRs
+On balance this is a less favourable option than vectorizing CRs
## Scalar (single) integer as predicate, with one DM row
-This idea has merit in that to perform predicate bitmanip operations the predicate is already in scalar INT reg form and consequently standard scalar INT bitmanip operations can be done straight away. Vectorised mfcr can be used to get CMP results or Vectorised Rc=1 CRs into the scalar INT, easily.
+This idea has merit in that, to perform predicate bitmanip operations, the predicate is already in scalar INT reg form; consequently standard scalar INT bitmanip operations can be done straight away. Vectorized mfcr can be used to get CMP results or Vectorized Rc=1 CRs into the scalar INT, easily.
This idea has several disadvantages.
The amount of information needed to do so is however quite large: consequently it is only practical to apply indirectly, via Context propagation.
Vectors may be remapped such that Matrix multiply of any arbitrary size
-is performed in one Vectorised `fma` instruction as long as the total
+is performed in one Vectorized `fma` instruction as long as the total
number of elements is less than 64 (maximum for VL).
Additionally, in a fashion known as "Structure Packing" in NEON and RVV, it may be used to perform "zipping" and "unzipping" of
otherwise usual `0..VL-1` hardware for-loop
* `svremap` to set which registers a given reordering is to apply to
(RA, RT etc)
-* `sv.{instruction}` where any Vectorised register marked by `svremap`
+* `sv.{instruction}` where any Vectorized register marked by `svremap`
will have its ordering REMAPPED according to the schedule set
by `svshape`.
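A behavioural model of how a Matrix-multiply REMAP schedule allows a single Vectorized `fma` to compute a full matrix product (the index formulas here are illustrative of the re-ordering concept, not the exact svshape encoding):

```python
def matmul_remap(A, B, X, Y, Z):
    """Model of a REMAP Matrix-multiply schedule: one vectorized fma
    whose element indices for the source and destination registers are
    re-ordered so that a single hardware for-loop of X*Y*Z steps
    computes C = A @ B. Total elements must not exceed 64 (VL maximum).
    A is X*Z (row-major), B is Z*Y, C is X*Y."""
    C = [0.0] * (X * Y)
    for i in range(X * Y * Z):  # the single hardware for-loop
        x, y, z = i // (Y * Z), (i // Z) % Y, i % Z
        # remapped element indices for the one fma operation
        C[x * Y + y] += A[x * Z + z] * B[z * Y + y]
    return C

A = [1, 2, 3, 4]  # 2x2
B = [5, 6, 7, 8]  # 2x2
print(matmul_remap(A, B, 2, 2, 2))  # [19.0, 22.0, 43.0, 50.0]
```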
* <https://bugs.libre-soc.org/show_bug.cgi?id=924>
This proposal is to extend the Power ISA with an Abstract RISC-Paradigm
-Vectorisation Concept that may be orthogonally applied to **all and any**
+Vectorization Concept that may be orthogonally applied to **all and any**
suitable Scalar instructions, present and future, in the Scalar Power ISA.
-The Vectorisation System is called
+The Vectorization System is called
["Simple-V"](https://libre-soc.org/openpower/sv/)
and the Prefix Format is called
["SVP64"](https://libre-soc.org/openpower/sv/).
Audio/Visual DSPs to 3D GPUs and Supercomputing. As it does **not**
add actual Vector Instructions, relying solely and exclusively on the
**Scalar** ISA, it is **Scalar** instructions that need to be added to
-the **Scalar** Power ISA before Simple-V may orthogonally Vectorise them.
+the **Scalar** Power ISA before Simple-V may orthogonally Vectorize them.
The goal of RED Semiconductor Ltd, an OpenPOWER
Stakeholder, is to bring to market mass-volume general-purpose compute
It is also critical to note that Simple-V **does not modify the Scalar
Power ISA**, that **only** Scalar words may be
-Vectorised, and that Vectorised instructions are **not** permitted to be
+Vectorized, and that Vectorized instructions are **not** permitted to be
different from their Scalar words (`addi` must use the same Word encoding
as `sv.addi`, and any new Prefixed instruction added **must** also
be added as Scalar).
-The sole semi-exception is Vectorised
+The sole semi-exception is Vectorized
Branch Conditional, in order to provide the usual Advanced Branching
capability present in every Commercial 3D GPU ISA, but it
-is the *Vectorised* Branch-Conditional that is augmented, not Scalar
+is the *Vectorized* Branch-Conditional that is augmented, not Scalar
Branch.
# Basic principle
**Simple-V SPRs**
-* **SVSTATE** - 64-bit Vectorisation State sufficient for Precise-Interrupt
+* **SVSTATE** - 64-bit Vectorization State sufficient for Precise-Interrupt
Context-switching and no adverse latency, it may be considered to
be a "Sub-PC" and as such absolutely must be treated with the same
respect and priority as MSR and PC.
easily be passed downstream in a fully forward-progressive pipelined fashion
to independent parallel units for further analysis.
-**Vectorised Branch-Conditional**
+**Vectorized Branch-Conditional**
As mentioned in the introduction this is the one sole instruction group
that
its various Mode bits and options can be set such that in the degenerate
case the behaviour becomes identical to Scalar Branch-Conditional.
-The two additional Modes within Vectorised Branch-Conditional, both of
+The two additional Modes within Vectorized Branch-Conditional, both of
which may be combined, are `CTR-Mode` and `VLI-Test` (aka "Data Fail First").
CTR Mode extends the way that CTR may be decremented unconditionally
within Scalar Branch-Conditional, and not only makes it conditional but
and restoring of LR and SVLR may be deferred until the final decision
as to whether to branch. In this way `sv.bclrl` does not corrupt `LR`.
-Vectorised Branch-Conditional due to its side-effects (e.g. reducing CTR
+Vectorized Branch-Conditional due to its side-effects (e.g. reducing CTR
or truncating VL) has practical uses even if the Branch is deliberately
set to the next instruction (CIA+8). For example it may be used to reduce
CTR by the number of bits set in a GPR, if that GPR is given as the predicate
One confusing thing is the unfortunate naming of LD/ST Indexed and
REMAP Indexed: some care is taken in the spec to discern the two.
LD/ST Indexed is Scalar `EA=RA+RB` (where **either** RA or RB
-may be marked as Vectorised), where obviously the order in which
+may be marked as Vectorized), where the
Vector of RA (or RB) is read in the usual linear sequential
fashion. REMAP Indexed affects the
**order** in which the Vector of RA (or RB) is accessed,
through **registers** (or, register *elements* in traditional
Cray-Vector ISAs) in full before moving on to the next *instruction*.
-Mitch Alsup's VVM Extension is a form of hardware-level auto-vectorisation
+Mitch Alsup's VVM Extension is a form of hardware-level auto-vectorization
based around Zero-Overhead Loops. Using a Variable-Length Encoding all
loop-invariant registers are "tagged" such that the Hazard Management
Engine may perform optimally and do less work in automatically identifying
to introduce into compilers, because all looping, as far as programs
is concerned, remains expressed as *Scalar assembler*.[^autovec]
Whilst Mitch Alsup's
-VVM biggest strength is its hardware-level auto-vectorisation
+VVM's biggest strength is its hardware-level auto-vectorization,
it is limited in its ability to call
functions; Simple-V's Vertical-First provides explicit control over the
parallelism ("hphint")[^hphint] and also allows for full state to be stored/restored
Simple-V Vertical-First Looping requires an explicit instruction to
move `SVSTATE` regfile offsets forward: `svstep`. An early version of
-Vectorised
+Vectorized
Branch-Conditional attempted to merge the functionality of `svstep`
into `sv.bc`: it became CISC-like in its complexity and was quickly reverted.
temporary registers to compute results that have a Vector source
or destination or both.
Contrast this with a Standard Horizontal-First Vector ISA where the only
-way to perform Vectorised Complex Arithmetic would be to add Complex Vector
+way to perform Vectorized Complex Arithmetic would be to add Complex Vector
Arithmetic operations, because due to the Horizontal (element-level)
progression there is no way to utilise intermediary temporary (scalar)
variables.[^complex]
be required. The entire 24-bits is **required** for the abstracted
Hardware-Looping Concept **even when these 24-bits are zero**
* Any Scalar 64-bit instruction (regardless of how it is encoded) is unsafe to
- then Vectorise because this creates the situation of Prefixed-Prefixed,
+ then Vectorize because this creates the situation of Prefixed-Prefixed,
resulting in deep complexity in Hardware Decode at a critical juncture, as
well as introducing 96-bit instructions.
-* **All** of these Scalar instructions are candidates for Vectorisation.
+* **All** of these Scalar instructions are candidates for Vectorization.
Thus none of them may be 64-bit-Scalar-only.
**Minor Opcodes to fit candidates above**
The primary point is that once an instruction is defined in Scalar
32-bit form its corresponding space **must** be reserved in the
SVP64 area with the exact same 32-bit form, even if that instruction
-is "Unvectoriseable" (`sc`, `sync`, `rfid` and `mtspr` for example).
+is "Unvectorizeable" (`sc`, `sync`, `rfid` and `mtspr` for example).
Instructions may **not** be added in the Vector space without also
-being added in the Scalar space, and vice-versa, *even if Unvectoriseable*.
+being added in the Scalar space, and vice-versa, *even if Unvectorizeable*.
This is extremely important because the worst possible situation
is if a conflicting Scalar instruction is added by another Stakeholder,
-which then turns out to be Vectoriseable: it would then have to be
+which then turns out to be Vectorizeable: it would then have to be
added to the Vector Space with a *completely different Defined Word*
and things go rapidly downhill in the Decode Phase from there.
Setting a simple inviolate rule helps avoid this scenario but does
need to be borne in mind when discussing potential allocation
-schemes, as well as when new Vectoriseable Opcodes are proposed
+schemes, as well as when new Vectorizeable Opcodes are proposed
for addition by future RFCs: the opcodes **must** be uniformly
added to Scalar **and** Vector spaces, or added in one and reserved
in the other, or
pressure on the EXT000-EXT063 (32-bit) opcode space to such a degree that
it risks jeopardising the Power ISA. These requirements are:
-* all of the scalar operations must be Vectoriseable
-* all of the scalar operations intended for Vectorisation
+* all of the scalar operations must be Vectorizeable
+* all of the scalar operations intended for Vectorization
must be in a 32-bit encoding (not prefixed-prefixed to 96-bit)
* bringing Scalar Power ISA up-to-date from the past 12 years
needs 75% of two Major opcodes all on its own
There exists a potential scheme which meets (exceeds) the above criteria,
-providing plenty of room for both Scalar (and Vectorised) operations,
+providing plenty of room for both Scalar (and Vectorized) operations,
*and* provides SVP64-Single with room to grow. It
is based loosely around Public v3.1 EXT001 Encoding.[^ext001]
If not allocated within the scope of this RFC
then these are requested to be `RESERVED` for a future Simple-V
proposal.
-* **SVP64** - a (well-defined, 2 years) DRAFT Proposal for a Vectorisation
+* **SVP64** - a (well-defined, 2 years) DRAFT Proposal for a Vectorization
Augmentation of suffixes.
For the needs identified by Libre-SOC (75% of 2 POs),
|old bit6=1| `RESERVED2`:{EXT300-363} | `RESERVED4`:SVP64-Single:{EXT000-063} | SVP64:{EXT000-063} |
* **`RESERVED2`:{EXT300-363}** (not strictly necessary to be added) is not
- and **cannot** ever be Vectorised or Augmented by Simple-V or any future
+ and **cannot** ever be Vectorized or Augmented by Simple-V or any future
Simple-V Scheme.
it is a pure **Scalar-only** word-length PO Group. It may remain `RESERVED`.
* **`RESERVED1`:{EXT200-263}** is also a new set of 64 word-length Major
in effect Single-Augmented-Prefixed variants of the v3.0 32-bit Power ISA.
Alternative instruction encodings other than the exact same 32-bit word
from EXT000-EXT063 are likewise prohibited.
-* **`SVP64:{EXT000-063}`** and **`SVP64:{EXT200-263}`** - Full Vectorisation
+* **`SVP64:{EXT000-063}`** and **`SVP64:{EXT200-263}`** - Full Vectorization
of EXT000-063 and EXT200-263 respectively, these Prefixed instructions
are likewise prohibited from being a different encoding from their
32-bit scalar versions.
`SVP64-Reserved` which will have to be achieved with SPRs (PCR or MSR).
*Most importantly what this scheme does not do is provide large areas
-for other (non-Vectoriseable) RFCs.*
+for other (non-Vectorizeable) RFCs.*
# Potential Opcode allocation solution (2)
as a Prefix, which is a new RESERVED encoding.
* when bit 6 is 0b0 and bits 32-33 are 0b11 the encoding is **defined** as also
allocated to Simple-V
-* all other patterns are `RESERVED` for other non-Vectoriseable
+* all other patterns are `RESERVED` for other non-Vectorizeable
purposes (just over 37.5%).
| 0-5 | 6 | 7 | 8-31 | 32:33 | Description |
This ensures that any potential for future conflict over uses of the
EXT009 space, jeopardising Simple-V in the process, is avoided,
yet leaves huge areas (just over 37.5% of the 64-bit space) for other
-(non-Vectoriseable) uses.
+(non-Vectorizeable) uses.
These areas thus need to be Allocated (SVP64 and Scalar EXT248-263):
* SVP64Single (`RESERVED3/4`) is *planned* for a future RFC
(but needs reserving as part of this RFC)
* `RESERVED1/2` is available for new general-purpose
- (non-Vectoriseable) 32-bit encodings (other RFCs)
+ (non-Vectorizeable) 32-bit encodings (other RFCs)
* EXT248-263 is for "new" instructions
which **must** be granted corresponding space
in SVP64.
-* Anything Vectorised-EXT000-063 is **automatically** being
+* Anything Vectorized-EXT000-063 is **automatically** being
requested as 100% Reserved for every single "Defined Word"
- (Public v3.1 1.6.3 definition). Vectorised-EXT001 or EXT009
+ (Public v3.1 1.6.3 definition). Vectorized-EXT001 or EXT009
is defined as illegal.
* Any **future** instruction
added to EXT000-063 likewise, must **automatically** be
assigned corresponding reservations in the SVP64:EXT000-063
and SVP64Single:EXT000-063 area, regardless of whether the
- instruction is Vectoriseable or not.
+ instruction is Vectorizeable or not.
Bit-allocation Summary:
* EXT3nn and other areas provide space for up to
- QTY 4of non-Vectoriseable EXTn00-EXTn47 ranges.
+ QTY 4 of non-Vectorizeable EXTn00-EXTn47 ranges.
* QTY 3 of 55-bit spaces also exist for future use (longer by 3 bits
than opcodes allocated in EXT001)
* Simple-V EXT2nn is restricted to range EXT248-263
-* non-Simple-V (non-Vectoriseable) EXT2nn (if ever requested in any future RFC) is restricted to range EXT200-247
+* non-Simple-V (non-Vectorizeable) EXT2nn (if ever requested in any future RFC) is restricted to range EXT200-247
* Simple-V EXT0nn takes up 50% of PO9 for this and future Simple-V RFCs
**This however potentially puts SVP64 under pressure (in 5-10 years).**
The clear separation between Simple-V and non-Simple-V stops
conflict in future RFCs, both of which get plenty of space.
-EXT000-063 pressure is reduced in both Vectoriseable and
-non-Vectoriseable, and the 100+ Vectoriseable Scalar operations
+EXT000-063 pressure is reduced in both Vectorizeable and
+non-Vectorizeable, and the 100+ Vectorizeable Scalar operations
identified by Libre-SOC may safely be proposed and each evaluated
on their merits.
**SVP64:{EXT000-063}** bit6=old bit7=vector
This encoding is identical to **SVP64:{EXT248-263}** except it
-is the Vectorisation of existing v3.0/3.1 Scalar-words, EXT000-063.
+is the Vectorization of existing v3.0/3.1 Scalar-words, EXT000-063.
All the same rules apply with the addition that
-Vectorisation of EXT001 or EXT009 is prohibited.
+Vectorization of EXT001 or EXT009 is prohibited.
| 0-5 | 6 | 7 | 8-31 | 32-63 |
|--------|---|---|-------|---------|
**SVP64:{EXT248-263}** bit6=new bit7=vector
This encoding, which permits VL to be dynamic (settable from GPR or CTR)
-is the Vectorisation of EXT248-263.
+is the Vectorization of EXT248-263.
Instructions may not be placed in this category without also being
implemented as pure Scalar *and* SVP64Single. Unlike SVP64Single
however, there is **no reserved encoding** (bits 8-24 zero).
| 64bit | ss.fishmv | 0x26!zero | 0x12345678| scalar SVP64Single:EXT0nn |
| 64bit | unallocated | 0x27nnnnnn | 0x12345678| vector SVP64:EXT0nn |
-This is illegal because the instruction is possible to Vectorise,
-therefore it should be **defined** as Vectoriseable.
+This is illegal because the instruction is possible to Vectorize,
+therefore it should be **defined** as Vectorizeable.
-**illegal due to unvectoriseable**
+**illegal due to unvectorizeable**
| width | assembler | prefix? | suffix | description |
|-------|-----------|--------------|-----------|---------------|
| 64bit | ss.mtmsr | 0x26!zero | 0x12345678| scalar SVP64Single:EXT0nn |
| 64bit | sv.mtmsr | 0x27nnnnnn | 0x12345678| vector SVP64:EXT0nn |
-This is illegal because the instruction `mtmsr` is not possible to Vectorise,
+This is illegal because the instruction `mtmsr` is not possible to Vectorize,
at all. This does **not** convey an opportunity to allocate the
space to an alternative instruction.
-**illegal unvectoriseable in EXT2nn**
+**illegal unvectorizeable in EXT2nn**
| width | assembler | prefix? | suffix | description |
|-------|-----------|--------------|-----------|---------------|
| 64bit | ss.mtmsr2 | 0x24!zero | 0x12345678| scalar SVP64Single:EXT2nn |
| 64bit | sv.mtmsr2 | 0x25nnnnnn | 0x12345678| vector SVP64:EXT2nn |
-For a given hypothetical `mtmsr2` which is inherently Unvectoriseable
+For a given hypothetical `mtmsr2` which is inherently Unvectorizeable:
whilst it may be put into the scalar EXT2nn space, it may **not** be
-allocated in the Vector space. As with Unvectoriseable EXT0nn opcodes
+allocated in the Vector space. As with Unvectorizeable EXT0nn opcodes
this does not convey the right to use the 0x24/0x26 space for alternative
-opcodes. This hypothetical Unvectoriseable operation would be better off
+opcodes. This hypothetical Unvectorizeable operation would be better off
being allocated as EXT001 Prefixed, EXT000-063, or hypothetically in
EXT300-363.
The use of 0x12345678 for fredmv in scalar but fishmv in Vector is
illegal. The suffix in both 64-bit locations
-must be allocated to a Vectoriseable EXT000-063
+must be allocated to a Vectorizeable EXT000-063
"Defined Word" (Public v3.1 Section 1.6.3 definition)
or not at all.
legal for Primary Opcodes in the range 232-263, where the top
two MSBs are 0b11. Thus this faulty attempt actually falls
unintentionally
-into `RESERVED` "Non-Vectoriseable" Encoding space.
+into `RESERVED` "Non-Vectorizeable" Encoding space.
**illegal attempt to put Scalar EXT001 into Vector space**
which are illegal due to cost at the Decode Phase (Variable-Length
Encoding). Likewise attempting to embed EXT009 (chained) is also
illegal. The implications are clear unfortunately that all 64-bit
-EXT001 Scalar instructions are Unvectoriseable.
+EXT001 Scalar instructions are Unvectorizeable.
\newpage{}
# Use cases
<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bigint.py;hb=HEAD>
\newpage{}
-# Vectorised strncpy
+# Vectorized strncpy
Aside from the `blr` return instruction this is an entire fully-functional
implementation of `strncpy` which demonstrates some of the remarkably
powerful capabilities of Simple-V. Load Fault-First avoids instruction
-traps and page faults in the middle of the Vectorised Load, providing
+traps and page faults in the middle of the Vectorized Load, providing
the *micro-architecture* with the opportunity to notify the program of
the successful Vector Length. `sv.cmpi` is the next strategically-critical
instruction, as it searches for a zero and yet *includes* it in a new
Vector Length - bearing in mind that the previous instruction (the Load)
*also* truncated down to the valid number of LDs performed. Finally,
-a Vectorised Branch-Conditional automatically decrements CTR by the number
+a Vectorized Branch-Conditional automatically decrements CTR by the number
of elements copied (VL), rather than decrementing simply by one.
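The three-instruction interplay described above can be modelled in plain Python. This is a behavioural sketch only, not the assembler itself: the helper names (`ld_ffirst`, `strncpy_model`) and the 64-element chunk size are invented for illustration.

```python
# Behavioural model of the SVP64 strncpy loop (illustrative only).
# ld_ffirst: stop loading at the first inaccessible byte, truncating VL.
# The compare step then truncates VL again, *including* the zero byte.

def ld_ffirst(mem, addr, vl):
    """Load up to vl bytes; truncate VL at the first fault (no trap)."""
    out = []
    for i in range(vl):
        if addr + i not in mem:        # page fault: truncate instead
            break
        out.append(mem[addr + i])
    return out, len(out)               # data, new VL

def strncpy_model(mem, dst, src, n):
    copied = 0
    while n > 0:
        data, vl = ld_ffirst(mem, src, min(n, 64))
        if vl == 0:
            break                      # a fault on the first element traps
        # sv.cmpi with inclusive fail-first: keep elements up to and
        # including the first zero byte.
        new_vl = vl
        for i, b in enumerate(data):
            if b == 0:
                new_vl = i + 1
                break
        for i in range(new_vl):        # store the surviving elements
            mem[dst + i] = data[i]
        src += new_vl; dst += new_vl
        n -= new_vl; copied += new_vl  # the branch decrements CTR by VL
        if 0 in data[:new_vl]:
            break                      # terminating zero copied: done
    return copied
```

Copying `b"hello\x00"` through this model copies six bytes (including the terminator) in one pass, with the fault-first load bounding each chunk.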
```
[^ext001]: Recall that EXT100 to EXT163 is for Public v3.1 64-bit-augmented Operations prefixed by EXT001, for which, from Section 1.6.3, bit 6 is set to 1. This concept is where the above scheme originated. Section 1.6.3 uses the term "defined word" to refer to pre-existing EXT000-EXT063 32-bit instructions so prefixed to create the new numbering EXT100-EXT163, respectively
[^futurevsx]: A future version or other Stakeholder *may* wish to drop Simple-V onto VSX: this would be a separate RFC
[^vsx256]: Imagine a hypothetical future VSX-256 using the exact same instructions as VSX: the binary incompatibility introduced would catastrophically **and retroactively** damage existing IBM POWER8, 9 and 10 hardware's reputation and that of the Power ISA overall.
-[^autovec]: Compiler auto-vectorisation for best exploitation of SIMD and Vector ISAs on Scalar programming languages (c, c++) is an Indusstry-wide known-hard decades-long problem. Cross-reference the number of hand-optimised assembler algorithms.
+[^autovec]: Compiler auto-vectorization for best exploitation of SIMD and Vector ISAs on Scalar programming languages (C, C++) is an Industry-wide known-hard decades-long problem. Cross-reference the number of hand-optimised assembler algorithms.
[^hphint]: intended for use when the compiler has determined the extent of Memory or register aliases in loops: `a[i] += a[i+4]` would necessitate a Vertical-First hphint of 4
[^svshape]: although SVSHAPE0-3 should, realistically, be regarded as high a priority as SVSTATE, and given corresponding SVSRR and SVLR equivalents, it was felt that having to context-switch **five** SPRs on Interrupts and function calls was too much.
[^whoops]: two efforts were made to mix non-uniform encodings into Simple-V space: one deliberate to see how it would go, and one accidental. They both went extremely badly, the deliberate one costing over two months to add then remove.
the additional requirements are:
-* all of the scalar operations must be Vectoriseable
+* all of the scalar operations must be Vectorizeable
* all of the scalar operations must be in a 32-bit encoding (not prefixed-prefixed)
# use 75% of QTY 3 MAJOR ops
having this `RESERVED` encoding in the middle of the
space does complicate multi-issue decoding somewhat,
but it does provide an entire new (independent,
-non-vectorisable) 32-bit opcode space. **two** separate
+non-vectorizable) 32-bit opcode space. **two** separate
RESERVED Major opcode areas can be provided: numbering them
EXT200-263 and EXT300-363 respectively seems sane.
EXT300-363 for `RESERVED1` comes with a caveat that it can
**
It is unlikely that we (Libre-SOC) will initially implement any of v3.1
-64-bit prefixing (it cannot be Vectorised, resulting unacceptably in
+64-bit prefixing (it cannot be Vectorized, resulting unacceptably in
96-bit instructions, which we decided is too much). That said, the LD
addressing immediate extended range is extremely useful
(along with the PC-relative modes and also other instructions
**Keywords**:
```
- Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
+ Cray Supercomputing, Vectorization, Zero-Overhead-Loop-Control (ZOLC),
Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
Digital Signal Processing (DSP)
```
**Motivation**
Power ISA is synonymous with Supercomputing and the early Supercomputers
-(ETA-10, ILLIAC-IV, CDC200, Cray) had Vectorisation. It is therefore anomalous
+(ETA-10, ILLIAC-IV, CDC Cyber 205, Cray) had Vectorization. It is therefore anomalous
that Power ISA does not have Scalable Vectors. This presents the opportunity to
modernise Power ISA keeping it at the top of Supercomputing.
**Keywords**:
```
- Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
+ Cray Supercomputing, Vectorization, Zero-Overhead-Loop-Control (ZOLC),
Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
Digital Signal Processing (DSP)
```
DCT REMAP is RADIX2 only. Convolutions may be applied as usual
to create non-RADIX2 DCT. Combined with appropriate Twin-butterfly
instructions, the algorithm below (written in python3) becomes part
-of an in-place in-registers Vectorised DCT. The algorithms work
+of an in-place in-registers Vectorized DCT. The algorithms work
by loading data such that as the nested loops progress the result
is sorted into correct sequential order.
**Keywords**:
```
- Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
+ Cray Supercomputing, Vectorization, Zero-Overhead-Loop-Control (ZOLC),
True-Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
Digital Signal Processing (DSP), High-level Assembler
```
The purpose of this RFC is:
* to give a full list of upcoming **Scalar** opcodes developed by Libre-SOC
- (being cognisant that *all* of them are Vectoriseable)
+ (being cognisant that *all* of them are Vectorizeable)
* to give OPF Members and non-Members alike the opportunity to comment and get
involved early in RFC submission
* formally agree a priority order on an iterative basis with new versions
**separate** organisations*.
Worth bearing in mind during evaluation that every "Defined Word" may
-or may not be Vectoriseable, but that every "Defined Word" should have
-merits on its own, not just when Vectorised, precisely because the
+or may not be Vectorizeable, but that every "Defined Word" should have
+merits on its own, not just when Vectorized, precisely because the
instructions are Scalar. An example of a borderline
-Vectoriseable Defined Word is `mv.swizzle` which only really becomes
+Vectorizeable Defined Word is `mv.swizzle` which only really becomes
high-priority for Audio/Video, Vector GPU and HPC Workloads, but has
less merit as a Scalar-only operation, yet when SVP64Single-Prefixed
can be part of an atomic Compare-and-Swap sequence.
Future versions of SVP64 and SVP64Single are expected to be developed
by future Power ISA Stakeholders on top of VSX. The decisions made
-there about the meaning of Prefixed Vectorised VSX may be *completely
+there about the meaning of Prefixed Vectorized VSX may be *completely
different* from those made for Prefixed SFFS instructions. At which
point the lack of SFFS equivalents would penalise SFFS implementors in a
much more severe way, effectively expecting them and SFFS programmers to
These without question have to go in EXT0xx. Future extended variants,
bringing even more powerful capabilities, can be followed up later with
EXT1xx prefixed variants, which is not possible if placed in EXT2xx.
-*Only `svstep` is actually Vectoriseable*, all other Management
-instructions are UnVectoriseable. PO1-Prefixed examples include
+*Only `svstep` is actually Vectorizeable*, all other Management
+instructions are UnVectorizeable. PO1-Prefixed examples include
adding psvshape in order to support both Inner and Outer Product Matrix
Schedules, by providing the option to directly reverse the order of the
triple loops. Outer is used for standard Matrix Multiply (on top of a
standard MAC or FMAC instruction), but Inner is required for Warshall
Transitive Closure (on top of a cumulatively-applied max instruction).
-Excpt for `svstep` which is Vectoriseable the Management Instructions
+Except for `svstep`, which is Vectorizeable, the Management Instructions
themselves are all 32-bit Defined Words (Scalar Operations), so
PO1-Prefixing is perfectly reasonable. SVP64 Management instructions
of which there are only 6 are all 5 or 6 bit XO, meaning that the opcode
Found at [[sv/av_opcodes]] these do not require Saturated variants
because Saturation is added via [[sv/svp64]] (Vector Prefixing) and
via [[sv/svp64-single]] Scalar Prefixing. This is important to note for
-Opcode Allocation because placing these operations in the UnVectoriseable
+Opcode Allocation because placing these operations in the UnVectorizeable
areas would irredeemably damage their value. Unlike PackedSIMD ISAs
the actual number of AV Opcodes is remarkably small once the usual
cascading-option-multipliers (SIMD width, bitwidth, saturation,
operations, typically performing for example one multiply but in-place
subtracting that product from one operand and adding it to the other.
The *in-place* aspect is strategically extremely important for significant
-reductions in Vectorised register usage, particularly for DCT.
+reductions in Vectorized register usage, particularly for DCT.
Further: even without Simple-V the number of instructions saved is huge: 8 for
integer and 4 for floating-point vs one.
Whilst some of these instructions have VSX equivalents they must not
be excluded on that basis. SVP64/VSX may have a different meaning from
-SVP64/SFFS i e. the two *Vectorised* instructions may not be equivalent.
+SVP64/SFFS, i.e. the two *Vectorized* instructions may not be equivalent.
## Bitmanip LUT2/3
SVP64Single Predication, whereupon the end result is the RISC-synthesis
of Compare-and-Swap, in two instructions.
-Where this instruction comes into its full value is when Vectorised.
+Where this instruction comes into its full value is when Vectorized.
3D GPU and HPC numerical workloads astonishingly contain between 10 and 15%
swizzle operations: access to YYZ, XY, of an XYZW Quaternion, performing
balancing of ARGB pixel data. The usage is so high that 3D GPU ISAs make
\newpage{}
-# Vectorisation: SVP64 and SVP64Single
+# Vectorization: SVP64 and SVP64Single
To be submitted as part of [[ls001]], [[ls008]], [[ls009]] and [[ls010]],
with SVP64Single to follow in a subsequent RFC, SVP64 is conceptually
becomes a candidate for Vector-Prefixing. This in turn means that when
a new instruction is proposed, it becomes a hard requirement to consider
not only the implications of its inclusion as a Scalar-only instruction,
-but how it will best be utilised as a Vectorised instruction **as well**.
+but how it will best be utilised as a Vectorized instruction **as well**.
Extreme examples of this are the Big-Integer 3-in 2-out instructions
that use one 64-bit register effectively as a Carry-in and Carry-out. The
instructions were designed in a *Scalar* context to be inline-efficient
1-out), but in a *Vector* context it is extremely straightforward to
Micro-code an entire batch onto 128-bit SIMD pipelines, 256-bit SIMD
pipelines, and to perform a large internal Forward-Carry-Propagation on
-for example the Vectorised-Multiply instruction.
+for example the Vectorized-Multiply instruction.
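The carry-propagation claim can be illustrated with a short Python model of what a chain of such 64-bit multiply-add elements computes. This is a sketch only: `bigmul_vector_scalar` is an invented name, and the element semantics (low 64 bits to the result element, high 64 bits forwarded along the chain) follow the maddedu-style description given later in this document.

```python
# Sketch of big-integer "Vector times Scalar" multiply: the operation a
# chained batch of 64-bit multiply-add elements performs when the hi64
# of each element is forward-propagated into the next.

MASK = (1 << 64) - 1

def bigmul_vector_scalar(ra, rb_scalar, rc_in=0):
    """ra: list of 64-bit limbs, least-significant first.
    Returns (limbs of ra * rb_scalar + rc_in, final carry limb)."""
    result = []
    carry = rc_in
    for limb in ra:                    # the hardware for-loop over elements
        prod = limb * rb_scalar + carry
        result.append(prod & MASK)     # low 64 bits -> result element
        carry = prod >> 64             # high 64 bits chain to next element
    return result, carry

# (2**128 - 1) * 2 == 2**129 - 2
limbs = [(1 << 64) - 1, (1 << 64) - 1]
res, top = bigmul_vector_scalar(limbs, 2)
```

A wide SIMD backend can compute several of these elements per cycle precisely because the chain is a pure forward carry propagation.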
Thirdly: as far as Opcode Allocation is concerned, SVP64 needs to be
considered as an independent stand-alone instruction (just like `REP`).
upcoming RFCs in development may be found.
*Reading advance Draft RFCs and providing feedback strongly advised*,
it saves time and effort for the OPF ISA Workgroup.
-* **SVP64** - Vectoriseable (SVP64-Prefixable) - also implies that
+* **SVP64** - Vectorizeable (SVP64-Prefixable) - also implies that
SVP64Single is also permitted (required).
* **page** - Libre-SOC wiki page at which further information can
be found. Again: **advance reading strongly advised due to the
sheer volume of information**.
* **PO1** - the instruction is capable of being PO1-Prefixed
(given an EXT1xx Opcode Allocation). Bear in mind that this option
- is **mutually exclusively incompatible** with Vectorisation.
+ is **mutually exclusive** with Vectorization.
* **group** - the Primary Opcode Group recommended for this instruction.
Options are EXT0xx (EXT000-EXT063), EXT1xx and EXT2xx. A third area
- (UnVectoriseable),
+ (UnVectorizeable),
EXT3xx, was available in an early Draft RFC but has been made "RESERVED"
instead. see [[sv/po9_encoding]].
* **Level** - Compliancy Subset and Simple-V Level. `SFFS` indicates "mandatory"
register to selectively target any four bits of a given CR Field
* CR-to-CR version of the same, allowing multiple bits to be AND/OR/XORed
in one hit.
-* Optional Vectorisation of the same when SVP64 is implemented
+* Optional Vectorization of the same when SVP64 is implemented
Purpose:
* To provide a merged version of what is currently a multi-sequence of
CR operations (crand, cror, crxor) with mfcr and mtcrf, reducing
instruction count.
-* To provide a vectorised version of the same, suitable for advanced
+* To provide a vectorized version of the same, suitable for advanced
predication
Useful side-effects:
RAp instructions, these instructions would not be proposed.
4. The read and write of two overlapping registers normally requires
   an intermediate register (similar to the justification for CAS -
- Compare-and-Swap). When Vectorised the situation becomes even
+ Compare-and-Swap). When Vectorized the situation becomes even
worse: an entire *Vector* of intermediate temporaries is required.
Thus *even if implemented inefficiently* requiring more cycles to
complete (taking an extra cycle to write the second result) these
instructions still save on resources.
5. Macro-op fusion equivalents of these instructions are *not possible* for
exactly the same reason that the equivalent CAS sequence may not be
- macro-op fused. Full in-place Vectorised FFT and DCT algorithms *only*
+ macro-op fused. Full in-place Vectorized FFT and DCT algorithms *only*
become possible due to these instructions atomically reading **both**
Butterfly operands into internal Reservation Stations (exactly like CAS).
5. Although desirable (particularly to detect overflow) Rc=1 is hard to
SV Link Register, exactly analogous to LR (Link Register) may
be used for temporary storage of SVSTATE, and, in particular,
-Vectorised Branch-Conditional instructions may interchange
+Vectorized Branch-Conditional instructions may interchange
SVLR and SVSTATE whenever LR and NIA are.
Note that there is no equivalent Link variant of SVREMAP or
The creation and maintenance of SVP64 Categorisation is an automated
process that uses "Register profiling", reading machine-readable
versions of the Power ISA Specification and tables in order to
-make the Vectorisation Categorisation. To create this information
+make the Vectorization Categorisation. To create this information
by hand is neither sensible nor desirable: it may take far longer
and introduce errors.
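As a rough illustration of the idea (not the actual tooling, which reads the machine-readable Power ISA tables; the operand labels and mnemonic list here are invented), register profiling amounts to something like:

```python
# Invented sketch of register-profile-driven categorisation.  The real
# process derives operand kinds from the machine-readable ISA tables.

UNVECTORIZABLE = {"sc", "mtmsr", "attn", "tlbie"}   # illustrative list

def categorise(mnemonic, operands):
    """operands: set of operand-kind labels for one instruction."""
    if mnemonic in UNVECTORIZABLE:
        return "Unvectorizable"
    if "branch-target" in operands:
        return "Branch-Conditional"
    if "effective-address" in operands:
        return "Load/Store"
    if operands and operands <= {"BT", "BA", "BB", "BF", "BFA"}:
        return "CR-op"                  # only CR-field operands present
    return "Arithmetic/Logical"
```

Driving the categorisation from operand profiles in this way keeps it mechanical and reproducible, which is exactly why doing it by hand is undesirable.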
## Introduction
-Simple-V is a type of Vectorisation best described as a "Prefix Loop
+Simple-V is a type of Vectorization best described as a "Prefix Loop
Subsystem" similar to the 5 decades-old Zilog Z80 `LDIR`[^bib_ldir] instruction and
to the 8086 `REP`[^bib_rep] Prefix instruction. More advanced features are similar
to the Z80 `CPIR`[^bib_cpir] instruction. If naively viewed one-dimensionally as an
(significantly reducing hot-loop instruction count) that one bit in
the Prefix is reserved for it (*Note the intention to release that bit
and move Post-Increment instructions to EXT2xx, as part of [[sv/rfc/ls011]]*).
-Vectorised Branch-Conditional operations "embed" the original Scalar
+Vectorized Branch-Conditional operations "embed" the original Scalar
Branch-Conditional behaviour into a much more advanced variant that is
highly suited to High-Performance Computation (HPC), Supercomputing,
and parallel GPU Workloads.
*Architectural Note: Given that a "pre-classification" Decode Phase is
required (identifying whether the Suffix - Defined Word - is
Arithmetic/Logical, CR-op, Load/Store or Branch-Conditional),
-adding "Unvectorised" to this phase is not unreasonable.*
+adding "Unvectorized" to this phase is not unreasonable.*
Vectorizable Defined Word-instructions are **required** to be Vectorized,
or they may not be permitted to be added at all to the Power ISA as Defined
* The GPR-numbering is considered LSB0-ordered
* The Element-numbering (result0-result4) is LSB0-ordered
* Each of the results (result0-result4) are 16-bit
-* "same" indicates "no change as a result of the Vectorised add"
+* "same" indicates "no change as a result of the Vectorized add"
```
| MSB0: | 0:15 | 16:31 | 32:47 | 48:63 |
from GPR(1) into GPR(2) - the 5th result modifies **only** the bottom
16 LSBs of GPR(1).
-If the 16-bit operation were to be followed up with a 32-bit Vectorised
+If the 16-bit operation were to be followed up with a 32-bit Vectorized
Operation, the exact same contents would be viewed as follows:
```
## Register Naming and size
As indicated above SV Registers are simply the GPR, FPR and CR register
-files extended linearly to larger sizes; SV Vectorisation iterates
+files extended linearly to larger sizes; SV Vectorization iterates
sequentially through these registers (LSB0 sequential ordering from 0
to VL-1).
| 110 | so/un | `CR[offs+i].FU` is set |
| 111 | ns/nu | `CR[offs+i].FU` is clear |
-`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorised
+`offs` is defined as CR32 (4x8) so as to mesh cleanly with Vectorized
Rc=1 operations (see below). Rc=1 operations start from CR8 (TBD).
-The CR Predicates chosen must start on a boundary that Vectorised CR
+The CR Predicates chosen must start on a boundary that Vectorized CR
operations can access cleanly, in full. With EXTRA2 restricting starting
-points to multiples of 8 (CR0, CR8, CR16...) both Vectorised Rc=1 and
+points to multiples of 8 (CR0, CR8, CR16...) both Vectorized Rc=1 and
CR Predicate Masks have to be adapted to fit on these boundaries as well.
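A sketch of how the mode table above turns a run of CR Fields into a predicate mask. The bit position of SO/FU within the 4-bit field is an assumption here, and only the so/un (110) and ns/nu (111) rows are modelled.

```python
# Sketch: predicate mask derived from a run of 4-bit CR Fields.
SO_BIT = 0  # assumed LSB0 position of SO/FU within a field (illustrative)

def cr_predicate(cr_fields, offs, vl, invert=False):
    """Bit i of the result is set when element i is active.
    offs must sit on an EXTRA2-compatible boundary (CR0, CR8, CR16...)."""
    assert offs % 8 == 0, "EXTRA2 restricts start points to multiples of 8"
    mask = 0
    for i in range(vl):
        bit = (cr_fields[offs + i] >> SO_BIT) & 1
        mask |= (bit ^ invert) << i    # mode 111 (ns/nu) inverts the test
    return mask
```

The `offs % 8` assertion is the boundary restriction in code form: a predicate run starting at, say, CR35 could not be produced or consumed cleanly by Vectorized CR operations.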
## Extra Remapped Encoding <a name="extra_remap"> </a>
XER.CA/CA32 on the other hand is expected and required to be implemented
according to standard Power ISA Scalar behaviour. Interestingly, due
to SVP64 being in effect a hardware for-loop around Scalar instructions
-executing in precise Program Order, a little thought shows that a Vectorised
+executing in precise Program Order, a little thought shows that a Vectorized
Carry-In-Out add is in effect a Big Integer Add, taking a single bit Carry In
and producing, at the end, a single bit Carry out. High performance
implementations may exploit this observation to deploy efficient
In CR-based data-driven fail-on-first there is only the option to select
and test one bit of each CR (just as with branch BO). For more complex
-tests this may be insufficient. If that is the case, a vectorised crops
+tests this may be insufficient. If that is the case, vectorized crops
(crand, cror) may be used, and ffirst applied to the crop instead of to
the arithmetic vector.
to zero. This is the only means in the entirety of SV by which VL may be set
to zero (with the exception of via the SV.STATE SPR). When VL is set
zero due to the first element failing the CR bit-test, all subsequent
- vectorised operations are effectively `nops` which is
+ vectorized operations are effectively `nops` which is
*precisely the desired and intended behaviour*.
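Why VL=0 turns the remainder of the sequence into `nops` falls straight out of the for-loop model. The helper names below are invented, not real SVP64 mnemonics; this is a behavioural sketch.

```python
# Every Simple-V instruction is a for-loop over elements 0..VL-1, so an
# empty range simply executes nothing: no special "skip" logic needed.

class SVState:
    def __init__(self, vl):
        self.VL = vl

def sv_ffirst_cmp(sv, vec, bit_test):
    """Data-driven fail-first: truncate VL at the first failing test."""
    for i in range(sv.VL):
        if not bit_test(vec[i]):
            sv.VL = i                  # may legitimately become zero
            return

def sv_add(sv, dst, a, b):
    for i in range(sv.VL):             # VL=0: loop body never runs
        dst[i] = a[i] + b[i]
```

With the first element failing the test, VL becomes 0 and every following vectorized operation iterates over an empty range.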
Another aspect is that for ffirst LD/STs, VL may be truncated arbitrarily
### CR fields as inputs/outputs of vector operations
CRs (or, the arithmetic operations associated with them)
-may be marked as Vectorised or Scalar. When Rc=1 in arithmetic operations that have no explicit EXTRA to cover the CR, the CR is Vectorised if the destination is Vectorised. Likewise if the destination is scalar then so is the CR.
+may be marked as Vectorized or Scalar. When Rc=1 in arithmetic operations that have no explicit EXTRA to cover the CR, the CR is Vectorized if the destination is Vectorized. Likewise if the destination is scalar then so is the CR.
When vectorized, the CR inputs/outputs are sequentially read/written
-to 4-bit CR fields. Vectorised Integer results, when Rc=1, will begin
+to 4-bit CR fields. Vectorized Integer results, when Rc=1, will begin
writing to CR8 (TBD evaluate) and increase sequentially from there.
This is so that:
CR when Rc=1 is written to. This is CR0 for integer operations and CR1
for FP operations.
-Note that yes, the CR Fields are genuinely Vectorised. Unlike in SIMD VSX which
-has a single CR (CR6) for a given SIMD result, SV Vectorised OpenPOWER
+Note that yes, the CR Fields are genuinely Vectorized. Unlike in SIMD VSX which
+has a single CR (CR6) for a given SIMD result, SV Vectorized OpenPOWER
v3.0B scalar operations produce a **tuple** of element results: the
result of the operation as one part of that element *and a corresponding
CR element*. Greatly simplified pseudocode:
the Vector of CRs, using cr ops (crand, crnor) to do so. This provides far
more flexibility in analysing vectors than standard Vector ISAs. Normal
Vector ISAs are typically restricted to "were all results nonzero" and
-"were some results nonzero". The application of mapreduce to Vectorised
+"were some results nonzero". The application of mapreduce to Vectorized
cr operations allows far more sophisticated analysis, particularly in
conjunction with the new crweird operations see [[sv/cr_int_predication]].
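As a sketch of the kind of analysis meant here (the bit layout within the CR Field is assumed, and `cr_mapreduce` is an invented name): reducing one chosen bit across the Vector of CR Fields with an AND gives "all elements passed", with an OR "some element passed", and other combining ops give richer answers still.

```python
# Sketch: crand/cror-style reduction over a Vector of 4-bit CR Fields.
EQ = 1  # assumed bit position of EQ within a CR Field (illustrative)

def cr_mapreduce(cr_fields, op, bit):
    """Fold op across the chosen bit of every CR Field in the vector."""
    acc = (cr_fields[0] >> bit) & 1
    for f in cr_fields[1:]:
        acc = op(acc, (f >> bit) & 1)
    return acc

fields = [0b0010, 0b0010, 0b0000]      # EQ, EQ, not-EQ
all_eq = cr_mapreduce(fields, lambda a, b: a & b, EQ)   # crand-style
any_eq = cr_mapreduce(fields, lambda a, b: a | b, EQ)   # cror-style
```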
so FP instructions with Rc=1 write to CR1 (n=1).
CRs are not stored in SPRs: they are registers in their own right.
-Therefore context-switching the full set of CRs involves a Vectorised
+Therefore context-switching the full set of CRs involves a Vectorized
mfcr or mtcr, using VL=8 to do so. This is exactly how
scalar OpenPOWER context-switches CRs: it is just that there are now
more of them.
multiply into RT and RT+1.
What, then, of `sv.maddedu`? If the destination is hard-coded to RT and
-RT+1 the instruction is not useful when Vectorised because the output
+RT+1 the instruction is not useful when Vectorized because the output
will be overwritten on the next element. To solve this is easy: define
the destination registers as RT and RT+MAXVL respectively. This makes
it easy for compilers to statically allocate registers even when VL
| 111 | ~R30 |
-# CR Vectorisation
+# CR Vectorization
Some thoughts on this: the sensible (sane) number of CRs to have is 64. A case could be made for having 128 but it is an awful lot. 64 CRs also has the advantage that it is only 4x 64 bit registers on a context-switch (programmerjake: yeah, but we already have 256 64-bit registers, a few more won't change much).
## only 1 src/dest
-Instructions in this category are usually Unvectoriseable
+Instructions in this category are usually Unvectorizeable
or they are Load-Immediates. `fmvis`, for example, is 1-Write,
whilst SV.Branch-Conditional is BI (CR field bit).
in the decode phase was too great. The lesson was learned, the
hard way: it would be infinitely preferable
to add a 32-bit Scalar Load-with-Shift
-instruction *first*, which then inherently becomes Vectorised.
+instruction *first*, which then inherently becomes Vectorized.
Perhaps a future Power ISA spec will have this Load-with-Shift instruction:
both ARM and x86 have it, because it saves greatly on instruction count in
hot-loops.
32-bit encoding is ever allocated in a future revision
of the Power ISA
to a completely unrelated operation
-then how can a Vectorised version of that new instruction ever be added?
+then how can a Vectorized version of that new instruction ever be added?
The uniformity and RISC Abstraction is irreparably damaged.
Bottom line here is that the fundamental RISC Principle is strictly adhered
to, even though these are Advanced 64-bit Vector instructions.
The basic principle of SVP64 is the prefix, which contains mode
as well as register augmentation and predicates. When thinking of
-instructions and Vectorising them, it is natural for arithmetic
+instructions and Vectorizing them, it is natural for arithmetic
operations (ADD, OR) to be the first to spring to mind.
Arithmetic instructions have registers, therefore augmentation
applies, end of story, right?
Power ISA has Condition Register Fields: how can element widths
apply there? And branches: how can you have Saturation on something
that does not return an arithmetic result? In short: there are actually
-four different categories (five including those for which Vectorisation
+four different categories (five including those for which Vectorization
makes no sense at all, such as `sc` or `mtmsr`). The categories are:
* arithmetic/logical including floating-point
Condition Register Fields are 4-bit wide and consequently element-width
overrides make absolutely no sense whatsoever. Therefore the elwidth
-override field bits can be used for other purposes when Vectorising
+override field bits can be used for other purposes when Vectorizing
CR Field instructions. Moreover, Rc=1 is completely invalid for
CR operations such as `crand`: Rc=1 is for arithmetic operations, producing
a "co-result" that goes into CR0 or CR1. Thus, Saturation makes no sense.
All of these differences, which require quite a lot of logical
reasoning and deduction, help explain why there is an entirely different
-CR ops Vectorisation Category.
+CR ops Vectorization Category.
A particularly strange quirk of CR-based Vector Operations is that the
Scalar Power ISA CR Register is 32-bits, but actually comprises eight
With SVP64 extending the number of CR *Fields* to 128, the number of
32-bit CR *Registers* extends to 16, in order to hold all 128 CR *Fields*
(8 per CR Register). Then, it gets even more strange, when it comes
-to Vectorisation, which applies to the CR Field *numbers*. The
+to Vectorization, which applies to the CR Field *numbers*. The
hardware-for-loop for Rc=1 for example starts at CR0 for element 0,
and moves to CR1 for element 1, and so on. The reason here is quite
simple: each element result has to have its own CR Field co-result.
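A sketch of that tuple-producing for-loop (illustrative Python; the CR Field bit layout of LT, GT, EQ, SO and the helper name are assumptions for the example):

```python
# Sketch of the Rc=1 hardware for-loop: each element produces a tuple
# (result, CR Field co-result), the CR Field number tracking the
# element number.  Field bits assumed: LT=bit3, GT=bit2, EQ=bit1 (LSB0).

def sv_addi_rc(cr_fields, dst, src, imm, vl, cr_base=0):
    for i in range(vl):
        dst[i] = src[i] + imm
        lt, gt, eq = dst[i] < 0, dst[i] > 0, dst[i] == 0
        # element i writes its co-result into CR Field cr_base + i
        cr_fields[cr_base + i] = (lt << 3) | (gt << 2) | (eq << 1)
```

Running this over four elements leaves four independent CR Field co-results, one per element, rather than the single shared CR of a Packed SIMD compare.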
and attention is advised, here, when reading the specification,
especially on arithmetic loads (lbarx, lharx etc.)
-**Non-vectorised**
+**Non-vectorized**
-The concept of a Vectorised halt (`attn`) makes no sense. There are never
+The concept of a Vectorized halt (`attn`) makes no sense. There is never
going to be a Vector of global MSRs (Machine State Register). `mtcr`
-on the other hand is a grey area: `mtspr` is clearly Vectoriseable.
+on the other hand is a grey area: `mtspr` is clearly Vectorizeable.
Even `td` and `tdi` make a strange type of sense to permit them to be
-Vectorised, because a sequence of comparisons could be Vectorised.
-Vectorised System Calls (`sc`) or `tlbie` and other Cache or Virtual
+Vectorized, because a sequence of comparisons could be Vectorized.
+Vectorized System Calls (`sc`) or `tlbie` and other Cache or Virtual
Memory Management
-instructions, these make no sense to Vectorise.
+instructions: these make no sense to Vectorize.
However, it is really quite important to not be tempted to conclude that
-just because these instructions are un-vectoriseable, the Prefix opcode space
+just because these instructions are un-vectorizeable, the Prefix opcode space
must be free for reinterpretation and use for other purposes. This would
be a serious mistake because a future revision of the specification
might *retire* the Scalar instruction, and, worse, replace it with another.
Again this comes down to being quite strict about the rules: only Scalar
-instructions get Vectorised: there are *no* actual explicit Vector
+instructions get Vectorized: there are *no* actual explicit Vector
instructions.
**Summary**
of a Scalar ISA and then adds additional instructions which only
make sense in a Vector Context, such as Vector Shuffle, SVP64 goes to
considerable lengths to keep strictly to augmentation and embedding
-of an entire Scalar ISA's instructions into an abstract Vectorisation
+of an entire Scalar ISA's instructions into an abstract Vectorization
Context. That abstraction subdivides down into Categories appropriate
for the type of operation (Branch, CRs, Memory, Arithmetic),
and each Category has its own relevant but
conditions are met, whereas Scalar `bclrl` for example unconditionally
overwrites LR.
-Another is that the Vectorised Branch-Conditional instructions are the
+Another is that the Vectorized Branch-Conditional instructions are the
only ones where there are side-effects on predication when skipping
is enabled. This is so as to be able to use CTR to count down
*masked-out* elements.
-Well over 500 Vectorised branch instructions exist in SVP64 due to the
+Well over 500 Vectorized branch instructions exist in SVP64 due to the
number of options available: close integration and interaction with
the base Scalar Branch was unavoidable in order to create Conditional
Branching suitable for parallel 3D / CUDA GPU workloads.
As explained in the introduction [[sv/svp64]] and [[sv/cr_ops]]
Scalar Power ISA lacks "Conditional Execution" present in ARM
-Scalar ISA of several decades. When Vectorised the fact that
+Scalar ISA for several decades. When Vectorized, the fact that
Rc=1 Vector results can immediately be used as a Predicate Mask
back into the following instruction can result in large latency
unless "Vector Chaining" is used in the Micro-Architecture.
**Description**
svstep may be used to enquire about the REMAP Schedule and it may be
-used to alter Vectorisation State. When `vf=1` then stepping occurs.
+used to alter Vectorization State. When `vf=1` then stepping occurs.
When `vf=0` the enquiry is performed without altering internal state.
If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.
* Horizontal-First Mode can be used to return all indices,
i.e. walks through all possible states.
-**Vectorisation of svstep itself**
+**Vectorization of svstep itself**
As a 32-bit instruction, `svstep` may itself be Vector-Prefixed, as
`sv.svstep`. This will work perfectly well in Horizontal-First
A mode of srcstep (SVi=0) is called which can move srcstep and dststep
on to the next element, still respecting predicate masks.
-In other words, where normal SVP64 Vectorisation acts "horizontally"
+In other words, where normal SVP64 Vectorization acts "horizontally"
by looping first through 0 to VL-1 and only then moving the PC to the
next instruction, Vertical-First moves the PC onwards (vertically)
through multiple instructions **with the same srcstep and dststep**,
* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from
Mitch under NDA
on direct contact with him. It is a different approach from the
- others, which may be termed "Cray-Style Horizontal-First" Vectorisation.
+ others, which may be termed "Cray-Style Horizontal-First" Vectorization.
66000 is a *Vertical-First* Vector ISA with hardware-level
- auto-vectorisation.
+ auto-vectorization.
* [ETA-10](http://50.204.185.175/collections/catalog/102641713)
an extremely rare Scalable Vector Architecture from 1986,
similar to the CDC Cyber 205.
The variant of iotacr which is vidcr: it is not appropriate to have BA=0, and it would be pointless anyway. The integer version covers it by not reading the int regfile at all.
-scalar variant which can be Vectorised to give iotacr:
+scalar variant which can be Vectorized to give iotacr:
def crtaddi(RT, RA, BA, BO, D):
if test_CR_bit(BA, BO):