X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower.mdwn;h=da186f35b69cb6ad4c6d427ea795e555f6f0888e;hb=9f37111eb74b3ca2505cdc12b0f701a7405b37c1;hp=4c7ceef5db1aa352c43e0ca680ba0f1d0d431a30;hpb=9b56efb6e8bb373b200b9ac4f646a836df15755f;p=libreriscv.git diff --git a/openpower.mdwn b/openpower.mdwn index 4c7ceef5d..da186f35b 100644 --- a/openpower.mdwn +++ b/openpower.mdwn @@ -1,8 +1,182 @@ +# OpenPOWER + +In the late 1980s [[!wikipedia IBM]] developed a POWER family of processors. +This evolved to a specification known as the POWER ISA. In 2019 IBM made the POWER ISA [[!wikipedia Open_source]], to be looked after by the existing [[!wikipedia OpenPOWER_Foundation]]. Here is a longer history of [[!wikipedia IBM_POWER_microprocessors]]. These IBM proprietary processors +happen to implement what is now known as the POWER ISA. The names +POWER8, POWER9, POWER10 etc. are product designations equivalent to Intel +i5, i7, i9 etc. and are frequently conflated with versions of the POWER ISA (v2.07, v3.0c, v3.1b). + +Libre-SOC is basing its [[Simple-V Vectorisation|sv]] CPU extensions on POWER ISA, because it wants to be able to specify a machine that can be completely trusted, and because POWER, thanks to IBM's involvement, +is designed for high performance. + +See wikipedia page + + +very useful resource describing all assembly instructions + + # Evaluation -* FP32 is converted to FP64. Requires SV to be active. +EULA released! looks good. + + +# Links + +* OpenPOWER Membership + +* OpenPower HDL Mailing list +* [[openpower/isatables]] +* [[openpower/whitepapers]] +* [[openpower/isa]] - pseudo-code extracted from POWER V3.0B PDF spec +* [[openpower/gem5]] +* [[openpower/sv]] +* [[openpower/prefix_codes]] Decode/encode prefix-codes, used by JPEG, DEFLATE, etc. +* [[openpower/opcode_regs_deduped]] +* [[openpower/simd_vsx]] +* [[openpower/ISA_WG]] - OpenPOWER ISA Working Group +* [[openpower/pearpc]] +* [[openpower/pipeline_operands]] - the allocation of operands on each pipeline +* [[3d_gpu/architecture/decoder]] +* +* +* +* +* +* + +PowerPC Unit Tests + +* +* + +Summary + +* FP32 is converted to FP64. Requires SimpleV to be active. * FP16 needed +* transcendental FP opcodes needed (sin, cos, atan2, root, log1p) * FCVT between 16/32/64 needed * c++11 atomics not very efficient -* no 16/48/64 opcodes, needs a shuffle of opcodes -* needs escape sequencing (ISAMUX/NS) +* no 16/48/64 opcodes, needs a shuffle of opcodes. TODO investigate Power VLE +* needs escape sequencing (ISAMUX/NS) - see [[openpower/isans_letter]] + +# What we are *NOT* doing: + +* A processor that is fundamentally incompatible (noncompliant) with Power. + (**escape-sequencing requires and guarantees compatibility**). +* Opcode 4 Signal Processing (SPE) +* Opcode 4 Vectors or Opcode 60 VSX (600+ additional instructions) +* Avoidable legacy opcodes +* SIMD. it's awful. + +# SimpleV + +see [[openpower/sv]]. +SimpleV: a "hardware for-loop" which involves type-casting (both) the +register files to "a sequence of elements". The **one** instruction +(an unmodified **scalar** instruction) is interpreted as a *hardware +for-loop* that issues **multiple** internal instructions with +sequentially-incrementing register numbers. + +Thus it is completely unnecessary to add any vector opcodes - at all - +saving hugely on both hardware and compiler development time when +the concept is dropped on top of a pre-existing ISA. + +# Integer Overflow / Saturate + +Typically used on vector operations (audio DSP), it makes no sense to have separate opcodes (Opcode 4 SPE). To be done instead as CSRs / vector-flags on *standard* arithmetic operations. + +# atomics + +Single instruction on RV, and x86, but multiple on Power. Needs investigation, particularly as to why cache flush exists. + +https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html + +Hot loops contain significant instruction count, really need new c++11 atomics. To be proposed as new extension because other OpenPower members will need them too + +# FP16 + +Doesn't exist in Power, need to work out suitable opcodes, basically means duplicating the entire range of FP32/64 ops, symmetrically. + +Usually done with a fmt field, 2 bit, last one is FP128 + +idea: rather than add dozens of new opcodes, add "repurposer" instructions that remap FP32 to 16/32/64/128 and FP64 likewise. can also be done as C instruction, only needs 4 bits to specify. + +# Escape Sequencing + +aka "ISAMUX/NS". Absolutely critical, also to have official endorsement +from OpenPower Foundation. + +This will allow extending ISA (see ISAMUX/NS) in a clean fashion +(including for and by OpenPower Foundation) + +## Branches in namespaces + +Branches are fine as it is up to the compiler to decide whether to let the +ISAMUX/NS/escape-sequence countdown run out. + +This is all a software / compiler / ABI issue. + +## Function calls in namespaces + +Storing and restoring the state of the page/subpage CSR should be done by the caller. Or, again, let the countdowns run out. + +If certain alternative configs are expected, they are part of the function ABI which must be spec'd. + +All of this is a software issue (compiler / ABI). + +# Compressed, 48, 64, VBLOCK + +TODO investigate Power VLE (Freescale doc Ref 314-68105) + +Under Esc Seq, move mulli, twi, tdi out of major OP000 then use the +entire row, 2 bits instead of 3. greatly simplifies decoder. + +* OP 000-000 and 000-001 for 16 bit compressed, 11 bit instructions +* OP 000-010 and 000-011 for 48 bit. 11 bits for SVP P48 +* OP 000-100 and 000-201 for 64 bit. 11 bits for SVP P64 +* OP 000-110 and 000-111 for VBLOCK. 11 bits available. + +Note that this requires BE instruction encoding (separate from +data BE/LE encoding). BE encoding always places the major opcode in +the first 2 bytes of the raw (uninterpreted) sequential instruction +byte stream. + +Thus in BE-instruction-mode, the first 2 bytes may be analysed to +detect whether the instruction is 16-bit Compressed, 48-bit SVP-P48, +64-bit SVP-64, variable-length VBLOCK, or plain 32-bit. + +It is not possible to distinguish LE-encoded 32-bit instructions +from LE-encoded 16-bit instructions because in LE-encoded 32-bit +instructions, the opcode falls into: + +* bytes 2 and 3 of any given raw (uninterpreted) sequential instruction + byte stream for a 32-bit instruction +* bytes 0 and 1 for a 16-bit Compressed instruction +* bytes 4 and 5 for a 48-bit SVP P48 +* bytes 6 and 7 for a 64-bit SVP P64 + +Clearly this is an impossible situation, therefore BE is the only +option. Note: *this is completely separate from BE/LE for data* + +# Compressed 16 + +Further "escape-sequencing". + +Only 11 bits available. Idea: have "pages" where one instruction selects +the page number. It also specifies for how long that page is activated +(terminated on a branch) + +The length to be a maximum of 4 bits, where 0b1111 indicates "permanently active". + +Perhaps split OP000-000 and OP000-001 so that 2 pages can be active. + +Store activation length in a CSR. + +2nd idea: 11 bits can be used for extremely common operations, then length-encoding page selection for further ops, using the full 16 bit range and an entirely new encoding scheme. 1 bit specifies which of 2 pages was selected? + +3rd idea: "stack" mechanism. Allow subpages like a stack, to page in new pages. + +3 bits for subpage number. 4 bits for length, gives 7 bits. 4x7 is 28, then 3 bits can be used to specify "stack depth". + +Requirements are to have one instruction in each subpage which resets all the way back to PowerISA default. The other is a "back up stack by 1". +