From 7d51e7aae7028af9ab77bf9c4c12e7bd308182d9 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Mon, 16 Apr 2018 00:58:13 +0100
Subject: [PATCH] clarify conclusion

---
 simple_v_extension.mdwn | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/simple_v_extension.mdwn b/simple_v_extension.mdwn
index 030667d26..e4459f758 100644
--- a/simple_v_extension.mdwn
+++ b/simple_v_extension.mdwn
@@ -271,10 +271,16 @@ implementation*.
 Normally, it requires a superscalar architecture and out-of-order execution
 capabilities to "pre-process" instructions in order to keep ALU pipelines
 100% occupied.
-By bringing that capability in, this proposal offers a way to increase
+By bringing that capability in, this proposal could offer a way to increase
 pipeline activity even in simpler implementations in the one key area
 which really matters: the inner loop.
 
+However when looking at much more comprehensive schemes
+"A portable specification of zero-overhead loop control hardware
+applied to embedded processors" (ZOLC), optimising only the single
+inner loop seems inadequate, tending to suggest that ZOLC may be
+better off being proposed as an entirely separate Extension.
+
 ## Mask and Tagging (Predication)
 
 Tagging (aka Masks aka Predication) is a pseudo-method of implementing
@@ -416,19 +422,24 @@ follows:
 * Fixed vs variable parallelism: variable
 * Implicit (indirect) vs fixed (integral) instruction bit-width: indirect
 * Implicit vs explicit type-conversion: explicit
-* Implicit vs explicit inner loops: implicit
-* Tag or no-tag: Complex and needs further thought
-
-In particular: variable-length vectors came out on top because of the
-high setup, teardown and corner-cases associated with the fixed width
-of SIMD.  Implicit bit-width helps to extend the ISA to escape from
-former limitations and restrictions (in a backwards-compatible fashion),
-and implicit (zero-overhead) loops provide a means to keep pipelines
-potentially 100% occupied *without* requiring a super-scalar or out-of-order
-architecture.
-
-Constructing a SIMD/Simple-Vector proposal based around even only these four
-(five?) requirements would therefore seem to be a logical thing to do.
+* Implicit vs explicit inner loops: implicit but best done separately
+* Tag or no-tag: Complex but highly beneficial
+
+In particular:
+
+* variable-length vectors came out on top because of the high setup, teardown
+  and corner-cases associated with the fixed width of SIMD.
+* Implicit bit-width helps to extend the ISA to escape from
+  former limitations and restrictions (in a backwards-compatible fashion),
+  whilst also leaving implementors free to simplify implementations
+  by using actual explicit internal parallelism.
+* Implicit (zero-overhead) loops provide a means to keep pipelines
+  potentially 100% occupied in a single-issue in-order implementation
+  i.e. *without* requiring a super-scalar or out-of-order architecture,
+  but doing a proper, full job (ZOLC) is an entirely different matter.
+
+Constructing a SIMD/Simple-Vector proposal based around four of these five
+requirements would therefore seem to be a logical thing to do.
 
 # Instruction Format
 
-- 
2.30.2