X-Git-Url: https://git.libre-soc.org/?p=libreriscv.git;a=blobdiff_plain;f=openpower.mdwn;h=5e534f619e4b55f7659b12415aa080d6479dc827;hp=551dd6e8d76726845121fd1646ea565397ffcd7c;hb=HEAD;hpb=a222e34f37a52cb94627ef56b72ac10a3448d0c3 diff --git a/openpower.mdwn b/openpower.mdwn index 551dd6e8d..da186f35b 100644 --- a/openpower.mdwn +++ b/openpower.mdwn @@ -1,41 +1,182 @@ +# OpenPOWER + +In the late 1980s [[!wikipedia IBM]] developed a POWER family of processors. +This evolved to a specification known as the POWER ISA. In 2019 IBM made the POWER ISA [[!wikipedia Open_source]], to be looked after by the existing [[!wikipedia OpenPOWER_Foundation]]. Here is a longer history of [[!wikipedia IBM_POWER_microprocessors]]. These IBM proprietary processors +happen to implement what is now known as the POWER ISA. The names +POWER8, POWER9, POWER10 etc. are product designations equivalent to Intel +i5, i7, i9 etc. and are frequently conflated with versions of the POWER ISA (v2.07, v3.0c, v3.1b). + +Libre-SOC is basing its [[Simple-V Vectorisation|sv]] CPU extensions on POWER ISA, because it wants to be able to specify a machine that can be completely trusted, and because POWER, thanks to IBM's involvement, +is designed for high performance. + +See wikipedia page + + +very useful resource describing all assembly instructions + + # Evaluation -* FP32 is converted to FP64. Requires SV to be active. +EULA released! looks good. + + +# Links + +* OpenPOWER Membership + +* OpenPower HDL Mailing list +* [[openpower/isatables]] +* [[openpower/whitepapers]] +* [[openpower/isa]] - pseudo-code extracted from POWER V3.0B PDF spec +* [[openpower/gem5]] +* [[openpower/sv]] +* [[openpower/prefix_codes]] Decode/encode prefix-codes, used by JPEG, DEFLATE, etc. 
+* [[openpower/opcode_regs_deduped]] +* [[openpower/simd_vsx]] +* [[openpower/ISA_WG]] - OpenPOWER ISA Working Group +* [[openpower/pearpc]] +* [[openpower/pipeline_operands]] - the allocation of operands on each pipeline +* [[3d_gpu/architecture/decoder]] +* +* +* +* +* +* + +PowerPC Unit Tests + +* +* + +Summary + +* FP32 is converted to FP64. Requires SimpleV to be active. * FP16 needed +* transcendental FP opcodes needed (sin, cos, atan2, root, log1p) * FCVT between 16/32/64 needed * c++11 atomics not very efficient -* no 16/48/64 opcodes, needs a shuffle of opcodes -* needs escape sequencing (ISAMUX/NS) +* no 16/48/64 opcodes, needs a shuffle of opcodes. TODO investigate Power VLE +* needs escape sequencing (ISAMUX/NS) - see [[openpower/isans_letter]] + +# What we are *NOT* doing: + +* A processor that is fundamentally incompatible (noncompliant) with Power. + (**escape-sequencing requires and guarantees compatibility**). +* Opcode 4 Signal Processing (SPE) +* Opcode 4 Vectors or Opcode 60 VSX (600+ additional instructions) +* Avoidable legacy opcodes +* SIMD. it's awful. + +# SimpleV + +see [[openpower/sv]]. +SimpleV: a "hardware for-loop" which involves type-casting (both) the +register files to "a sequence of elements". The **one** instruction +(an unmodified **scalar** instruction) is interpreted as a *hardware +for-loop* that issues **multiple** internal instructions with +sequentially-incrementing register numbers. + +Thus it is completely unnecessary to add any vector opcodes - at all - +saving hugely on both hardware and compiler development time when +the concept is dropped on top of a pre-existing ISA. + +# Integer Overflow / Saturate + +Typically used on vector operations (audio DSP), it makes no sense to have separate opcodes (Opcode 4 SPE). To be done instead as CSRs / vector-flags on *standard* arithmetic operations. # atomics Single instruction on RV, and x86, but multiple on Power. Needs investigation, particularly as to why cache flush exists. 
+https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html + +Hot loops contain significant instruction count, really need new c++11 atomics. To be proposed as new extension because other OpenPower members will need them too + # FP16 Doesn't exist in Power, need to work out suitable opcodes, basically means duplicating the entire range of FP32/64 ops, symmetrically. Usually done with a fmt field, 2 bit, last one is FP128 +idea: rather than add dozens of new opcodes, add "repurposer" instructions that remap FP32 to 16/32/64/128 and FP64 likewise. can also be done as C instruction, only needs 4 bits to specify. + # Escape Sequencing -Absolutely critical, also to have official endorsement from OpenPower Foundation. +aka "ISAMUX/NS". Absolutely critical, also to have official endorsement +from OpenPower Foundation. + +This will allow extending ISA (see ISAMUX/NS) in a clean fashion +(including for and by OpenPower Foundation) + +## Branches in namespaces + +Branches are fine as it is up to the compiler to decide whether to let the +ISAMUX/NS/escape-sequence countdown run out. + +This is all a software / compiler / ABI issue. + +## Function calls in namespaces + +Storing and restoring the state of the page/subpage CSR should be done by the caller. Or, again, let the countdowns run out. + +If certain alternative configs are expected, they are part of the function ABI which must be spec'd. + +All of this is a software issue (compiler / ABI). # Compressed, 48, 64, VBLOCK -Under Esc Seq, move mulli, twi, tdi out of major OP000 then use the entire row, 2 bits instead of 3. +TODO investigate Power VLE (Freescale doc Ref 314-68105) + +Under Esc Seq, move mulli, twi, tdi out of major OP000 then use the +entire row, 2 bits instead of 3. greatly simplifies decoder. * OP 000-000 and 000-001 for 16 bit compressed, 11 bit instructions * OP 000-010 and 000-011 for 48 bit. 11 bits for SVP P48 * OP 000-100 and 000-101 for 64 bit. 11 bits for SVP P64 * OP 000-110 and 000-111 for VBLOCK.
11 bits available. +Note that this requires BE instruction encoding (separate from +data BE/LE encoding). BE encoding always places the major opcode in +the first 2 bytes of the raw (uninterpreted) sequential instruction +byte stream. + +Thus in BE-instruction-mode, the first 2 bytes may be analysed to +detect whether the instruction is 16-bit Compressed, 48-bit SVP-P48, +64-bit SVP-64, variable-length VBLOCK, or plain 32-bit. + +It is not possible to distinguish LE-encoded 32-bit instructions +from LE-encoded 16-bit instructions because in LE-encoded 32-bit +instructions, the opcode falls into: + +* bytes 2 and 3 of any given raw (uninterpreted) sequential instruction + byte stream for a 32-bit instruction +* bytes 0 and 1 for a 16-bit Compressed instruction +* bytes 4 and 5 for a 48-bit SVP P48 +* bytes 6 and 7 for a 64-bit SVP P64 + +Clearly this is an impossible situation, therefore BE is the only +option. Note: *this is completely separate from BE/LE for data* + # Compressed 16 -Only 11 bits. Idea: have "pages" where one instruction selects the page number. It also specifies for how long that page is activated (terminated on a branch) +Further "escape-sequencing". + +Only 11 bits available. Idea: have "pages" where one instruction selects +the page number. It also specifies for how long that page is activated +(terminated on a branch) The length to be a maximum of 4 bits, where 0b1111 indicates "permanently active". Perhaps split OP000-000 and OP000-001 so that 2 pages can be active. Store activation length in a CSR. + +2nd idea: 11 bits can be used for extremely common operations, then length-encoding page selection for further ops, using the full 16 bit range and an entirely new encoding scheme. 1 bit specifies which of 2 pages was selected? + +3rd idea: "stack" mechanism. Allow subpages like a stack, to page in new pages. + +3 bits for subpage number. 4 bits for length, gives 7 bits. 4x7 is 28, then 3 bits can be used to specify "stack depth". 
+ +Requirements are to have one instruction in each subpage which resets all the way back to PowerISA default. The other is a "back up stack by 1". +
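
The "stack" idea for Compressed 16 can be modelled in a few lines. The following is only a rough sketch, not part of the proposal: it takes the numbers given in the text (3 bits of subpage, 4 bits of length, four 7-bit entries, 3 bits of stack depth, `0b1111` meaning "permanently active") and assumes everything else. The bit layout (entries in bits 0 to 27, depth in bits 28 to 30), the function names `push`, `pop`, `reset`, `tick`, and the rule that the countdown decrements once per instruction are all hypothetical choices made for illustration.

```python
# Hypothetical model of a page/subpage "stack" CSR. Field widths come from the
# text; the layout and all names below are assumptions made for this sketch.
SUBPAGE_BITS = 3
LENGTH_BITS = 4
ENTRY_BITS = SUBPAGE_BITS + LENGTH_BITS      # 7 bits per stack entry
MAX_DEPTH = 4                                # 4 x 7 = 28 bits of entries
DEPTH_SHIFT = ENTRY_BITS * MAX_DEPTH         # depth field assumed at bits 28..30
DEPTH_MASK = 0b111
ENTRY_MASK = (1 << ENTRY_BITS) - 1
PERMANENT = 0b1111                           # length meaning "permanently active"


def depth(csr):
    """Number of subpages currently stacked (0 means PowerISA default)."""
    return (csr >> DEPTH_SHIFT) & DEPTH_MASK


def entry(csr, index):
    """Return (subpage, length) for the stack entry at the given index."""
    raw = (csr >> (ENTRY_BITS * index)) & ENTRY_MASK
    return raw >> LENGTH_BITS, raw & ((1 << LENGTH_BITS) - 1)


def _write_entry(csr, index, subpage, length):
    shift = ENTRY_BITS * index
    csr &= ~(ENTRY_MASK << shift)
    return csr | (((subpage << LENGTH_BITS) | length) << shift)


def _write_depth(csr, new_depth):
    csr &= ~(DEPTH_MASK << DEPTH_SHIFT)
    return csr | (new_depth << DEPTH_SHIFT)


def push(csr, subpage, length):
    """Page in a new subpage on top of the stack."""
    d = depth(csr)
    if d >= MAX_DEPTH:
        raise OverflowError("subpage stack is full")
    if not 0 <= subpage < (1 << SUBPAGE_BITS) or not 1 <= length <= PERMANENT:
        raise ValueError("subpage or length out of range")
    return _write_depth(_write_entry(csr, d, subpage, length), d + 1)


def pop(csr):
    """The 'back up stack by 1' operation; a no-op when already at default."""
    d = depth(csr)
    if d == 0:
        return csr
    return _write_depth(_write_entry(csr, d - 1, 0, 0), d - 1)


def reset(csr):
    """Reset all the way back to PowerISA default, whatever was stacked."""
    return 0


def active_subpage(csr):
    """Subpage on top of the stack, or None for PowerISA default."""
    d = depth(csr)
    return None if d == 0 else entry(csr, d - 1)[0]


def tick(csr):
    """Assumed countdown: one step per instruction, popping when it runs out."""
    d = depth(csr)
    if d == 0:
        return csr
    subpage, length = entry(csr, d - 1)
    if length == PERMANENT:
        return csr
    if length <= 1:
        return pop(csr)
    return _write_entry(csr, d - 1, subpage, length - 1)
```

With this layout the whole state fits in 31 bits, so it could sit in a single 32-bit CSR, and the caller-saves convention from "Function calls in namespaces" reduces to saving and restoring one word. Whether a branch should also terminate the top entry, as the first idea suggests, is left out of the sketch.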