(Draft Status)

# Letter regarding ISAMUX / NS

This is a quick overview of the way that we would like to add the changes
that we are proposing to the PowerPC instruction set. It is based on
an Open Standardisation of the way that existing "mode switches",
already found in the POWER instruction set, are added:

* FPSCR's "NI" bit, setting non-IEEE754 FP mode
* MSR's "LE" bit (and associated HILE bit), setting little-endian mode
* MSR's "SF" bit, setting either 32-bit or 64-bit mode
* PCR's "compatibility" bits 60-62, selecting V2.05, V2.06 or V2.07 mode

Each of these is set by a single instruction and, once set, radically
changes the behaviour and characteristics of subsequent instructions.

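As a purely illustrative sketch (not taken from the POWER specification),
the effect of such a mode bit can be modelled in a few lines of Python.
The `ToyCore` class and its `little_endian` flag are hypothetical stand-ins
for a real core and for MSR's "LE" bit: the same load operation returns a
different value depending only on the mode that was set beforehand.

```python
class ToyCore:
    """Hypothetical toy model: one mode bit changes how loads behave."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.little_endian = False  # stand-in for MSR's "LE" bit

    def set_le(self, enabled: bool) -> None:
        # The single "mode switch": affects every load issued afterwards.
        self.little_endian = enabled

    def load_word(self, addr: int) -> int:
        # Same operation, same address: only the byte order differs.
        raw = self.memory[addr:addr + 4]
        return int.from_bytes(raw, "little" if self.little_endian else "big")


core = ToyCore(bytes([0x12, 0x34, 0x56, 0x78]))
big_endian_value = core.load_word(0)      # 0x12345678
core.set_le(True)
little_endian_value = core.load_word(0)   # 0x78563412
```
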
With these (and other) long-established precedents already in POWER,
there is conceptually nothing new in what we propose: we simply ask
that the process by which such "switching" is added be formalised and
standardised, so that we (and others) have a clear,
standards-non-disruptive, atomic and non-intrusive path to extend the
POWER ISA.

# Summary of Libre-SOC Project

TODO brief summary of Libre-SOC project (hybrid CPU-GPU-VPU), thereby
helping explain exactly why we need extensive augmentation of POWER ISA.

Basically it's because it is not a separate GPU-VPU, it's an *actual*
CPU-GPU-VPU. No separate GPU, because the CPU *is* the GPU. No separate
VPU, because the CPU *is* the VPU. There is not even a separate "pipeline":
the CPU pipelines *are* the GPU and VPU pipelines.

Closest equivalents include the ARC core (which has VPU extensions and
3D extensions in the form of Broadcom's VideoCore IV) and the ICubeCorp
IC3128. Both are considered "hybrid" CPU-GPU-VPU processors.

With the project being Libre - not proprietary, secretive and never
to be published - it is no good having the extensions as "custom",
because "custom" is specifically for cases where the augmented
toolchain is never, under any circumstances, published by the
proprietary company. For business and commercial reasons, Libre-SOC is
the total opposite of this proprietary, secretive approach.

## Overview

The PowerPC Instruction Set Architecture (ISA) is an abstract model of a
computer. This is what programmers use when they write programs for the
machine, even if indirectly via a compiler for a high level language. We
must be conservative in how we add to the ISA, in order to:

* not break existing programs
* be mindful as to how others may wish to add to the ISA in the future

This document describes our strategy.


## ISA modes and escape sequences

New chips usually need to be able to run older (legacy) software that is
incompatible with the latest and greatest ISA. E.g. a 64-bit chip must be
able to run older 16-bit and 32-bit software.

To enable backwards compatibility the CPU is set into a 'legacy' mode. This
is done with an ISA mode switch, also known as ISA Muxing or ISA Namespaces.

The operating system is able to quickly change between 'modern' ISA mode and
various legacy modes.

Another technique is an ISA escape-sequence. This is a type of mode that is
only operational for a short time, unlike 32-bit or 64-bit mode, which would
be in effect for the entire run of a program.


## What are we adding to the ISA

When high quality graphical displays were developed, the CPUs of the time
were shown to be unable to drive them fast enough. The solution was the use
of graphics cards: specialised computers that are good at rendering
pixels, often by doing the same thing in different parts of the screen at
the same time (in parallel). These specialised computers are called
Graphics Processing Units (GPUs).

Some GPUs run thousands of operations in parallel. This has led to GPUs
being used to solve non-graphical problems where high parallelism is useful.

**break**

# Letter regarding ISAMUX / NS

Hardware-level dynamic ISA Muxing (also known as ISA Namespaces and ISA
escape-sequencing) is commonly used in instruction sets in an arbitrary
and ad-hoc fashion, often added on an on-demand basis. Examples include:

* Setting an SPR to switch the meaning of certain opcodes for Little-Endian /
  Big-Endian behaviour (present in POWER and SPARC)
* Setting an SPR to provide "backwards-compatibility" for features from
  older versions of an ISA (such as changing to new ratified versions of
  the IEEE754 standard)

(These we term "ISA Muxing" because, ultimately, they add extra bits
(or change existing bits) in the actual instruction decoder phase,
which involves "MUXes" to switch them on and off.)

The Libre-SOC team, developing a hybrid CPU-VPU-GPU, needs to add
significantly and strategically to the POWER ISA to support, for example,
Khronos Vulkan IEEE754 Conformance, whilst *at the same time being able
to run full POWER9 compliant instructions*.

There is absolutely no way that we are going to duplicate the
entire FP opcode set as a custom extension to POWER, just to add a
literally-identical suite of FP opcodes that are compliant with the
Khronos Conformance Suites: this would be a significant and irresponsible
use of opcode space.

In addition, as this processor is likely to be used for parallel
compute purposes in high-efficiency environments, we also need to add
FP16 support. Again: there is no way that we are going to add *triple*
duplicated opcodes to POWER, given that the opcodes needed are absolutely
identical to those that already exist, apart from the FP bitwidth (16
rather than the existing 32 / 64).

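The idea can be sketched as follows, under stated assumptions: `fp_add`
and its `width` argument are hypothetical names, with `width` standing in
for a mode-switch setting, and none of this is the actual Libre-SOC design.
A single "add" operation rounds its operands and result to whichever
precision the current mode selects, rather than needing three separate
opcodes.

```python
import struct

# struct format codes for IEEE754 half, single and double precision.
_FORMATS = {16: "<e", 32: "<f", 64: "<d"}


def round_to_width(value: float, width: int) -> float:
    """Round a Python float (binary64) to the given IEEE754 width."""
    fmt = _FORMATS[width]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def fp_add(a: float, b: float, width: int) -> float:
    """One 'opcode'; the mode (width) decides the precision of the result."""
    total = round_to_width(a, width) + round_to_width(b, width)
    return round_to_width(total, width)


tiny = 2.0 ** -20
# The same operation under three hypothetical mode settings.
results = {width: fp_add(1.0, tiny, width) for width in (16, 32, 64)}
```

Here `results[16]` is exactly `1.0`, because `tiny` is far below what half
precision can resolve next to 1.0, whereas the 32-bit and 64-bit modes both
keep the small term.
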
There are several other strategically critical uses to which we would
like to put such a scheme (related to power consumption and to reducing
throughput bottlenecks in the heavy-computation workloads of GPU
and VPU scenarios).

In addition, the scheme has several other key advantages over other ISA
"extending" ideas (such as extending the general POWER ISA opcode space
to 64-bit) in that, unlike 64-bit opcodes, its judicious and careful use
does not require large increases in I-Cache size because all opcodes,
ultimately, remain 32-bit. The scheme also allows future *official*
POWER extensions to the ISA - managed by the OpenPOWER Foundation -
to be strategically managed in a controlled, long-term way that does
not damage the reputation and stability of OpenPOWER.

Therefore we advocate being able to set "ISAMUX/NS" mode-switching bits
that, like the *existing* LE/BE mode-switching bits, change the behaviour
of *existing* opcodes to an alternative "meaning", followed by another
mode-switch that returns them to their original meaning. (Note: to reduce
binary code-size, alternative schemes include setting a countdown which,
when it expires, automatically disables the requested mode-switch.)

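The countdown variant can be sketched as below. This is a minimal model
under our own assumptions, not a proposed encoding: `ModeState`, `switch`
and `step` are hypothetical names, and the countdown is assumed to tick
once per decoded instruction.

```python
class ModeState:
    """Hypothetical model of a mode-switch that expires by itself."""

    def __init__(self):
        self.alt_mode = False
        self.countdown = 0

    def switch(self, count: int) -> None:
        # Enable the alternative meaning for the next `count` instructions.
        if count < 1:
            raise ValueError("count must be at least 1")
        self.alt_mode = True
        self.countdown = count

    def step(self) -> bool:
        """Return the mode the next instruction decodes under, then tick."""
        active = self.alt_mode
        if self.alt_mode:
            self.countdown -= 1
            if self.countdown == 0:
                # Expiry: no second mode-switch instruction is needed.
                self.alt_mode = False
        return active


state = ModeState()
state.switch(2)
decoded_under_alt = [state.step() for _ in range(4)]
```

With a count of 2, the first two instructions decode under the alternative
meaning and the following ones revert to the original meaning, saving the
explicit "switch back" instruction.
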
Note also that, to ensure that kernels and hypervisors are not impacted
by userspace ISAMUX/NS mode-switching, it is critical that Supervisor
and Hypervisor modes have their own completely separate ISAMUX/NS SPRs
(imagine a userspace application setting the LE/BE bit on a global basis,
or setting a global IEEE754 FP Standards compatibility flag).

Further, it is critical that Supervisor / Hypervisor modes have access to
and control over userspace ISAMUX/NS SPRs (without themselves being
affected by the setting *of* userspace ISAMUX/NS SPRs), in order to be
able to correctly context-switch userspace applications to their correct
(former) running state.

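A minimal sketch of these two requirements follows. All names (`Priv`,
`ToyCore`, `context_switch`) are hypothetical, and the access rule shown
(a level may write its own register and those of less privileged levels)
is an assumption made for illustration rather than something this letter
specifies.

```python
from enum import IntEnum


class Priv(IntEnum):
    USER = 0
    SUPERVISOR = 1
    HYPERVISOR = 2


class ToyCore:
    """Hypothetical model: one ISAMUX/NS register per privilege level."""

    def __init__(self):
        self.priv = Priv.USER
        self.isamux = {level: 0 for level in Priv}

    def active_mode(self) -> int:
        # Decoding only ever consults the register of the current level.
        return self.isamux[self.priv]

    def write_isamux(self, target: Priv, value: int) -> None:
        # Assumption: a level may write its own register and those below it.
        if target > self.priv:
            raise PermissionError(
                f"{self.priv.name} cannot write {target.name} ISAMUX")
        self.isamux[target] = value


def context_switch(core: ToyCore, incoming_mode: int) -> int:
    """At supervisor level: save the outgoing user mode, load the next."""
    outgoing_mode = core.isamux[Priv.USER]
    core.write_isamux(Priv.USER, incoming_mode)
    return outgoing_mode


core = ToyCore()
core.write_isamux(Priv.USER, 0b01)    # task A selects an alternative mode
core.priv = Priv.SUPERVISOR           # trap into the kernel
kernel_mode = core.active_mode()      # kernel still sees its own setting
saved_a = context_switch(core, 0b10)  # swap in task B's saved mode
core.priv = Priv.USER                 # return to userspace as task B
task_b_mode = core.active_mode()
```
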
Given the number of mode-switch bits that we anticipate using, we advocate
that such a scheme be formalised, and that the OpenPOWER Foundation be
the "atomic arbiter", similar to IANA and JEDEC, in the formal allocation
of mode-switch bits to OpenPOWER implementors.

We envisage that some of these bits will be unary, some will be binary,
some will be allocated for exclusive use by the OpenPOWER Foundation,
some allocated to OpenPOWER Members (by the OpenPOWER Foundation),
and some reserved for "custom and experimentation usage".

(For this latter category - custom experimentation - it should be
explicitly documented that upstream compiler and toolchain support will
never, under any circumstances, be accepted by the OpenPOWER Foundation,
and that this be enforced through the EULA and through Trademark law.)


However, as we are quite new to POWER 3.0B (a 1300+ page PDF), we do
appreciate that such a formal scheme may already be present in POWER9
3.0B, and that we have simply overlooked it.
