% Introduction
\chapter{Introduction}

\section{Why has Libre-SOC chosen PowerPC?}

For a hybrid CPU-VPU-GPU, intended for mass-volume adoption in tablets,
netbooks, Chromebooks and industrial embedded (SBC) systems, our choice was
between Nyuzi, MIAOW, RISC-V, PowerPC, MIPS and OpenRISC.

Of these options, the PowerPC architecture is the most complete and by far the
most mature. It is also more widely adopted by Linux distributions.

Following IBM's release of the Power Architecture instruction set to the Linux
Foundation in August 2019, the barrier to using it is no higher than that of
using RISC-V. We are encouraged that the OpenPOWER Foundation is supportive of
what we are doing and is helping, e.g.\ by putting us in touch with people who
can help us.

\subsection{Summary}

\vspace{-0.1in}
\begin{itemize}
\parskip 0pt
\itemsep 1pt

\item
We propose the standardisation of the way that the \gls{PowerPC} Instruction Set
Architecture (PPC ISA) is extended, enabling many different flavours within a
well-supported family to co-exist, long-term, without conflict, right across
the board.

\item

This is about more than just our project. Our proposals will facilitate the
use of PPC in novel or niche applications without breaking the PPC ISA into
incompatible islands.

\item

PPC will gain a competitive market advantage by removing the need for
separate VPU or GPU functions in RTL or ASICs, thus enabling lower-cost
systems. Libre-SOC's project is to extend PPC so that GPU and VPU
functionality is integrated directly into the PPC ISA (for comparison,
Broadcom's VideoCore IV is based around extensions to an ARC core).

\item

Libre-SOC's extensions will be easy to adopt, as standard GNU/Linux
distributions will, by deliberate design, run unmodified on our ISA, including
full compatibility with illegal instruction trap requirements.

\end{itemize}

\subsection{One CPU, multiple ISAs}

This is a quick overview of how we would like to add the changes that we
are proposing to the PowerPC instruction set (ISA). It is based on an Open
Standardisation of the way that \textbf{mode switches} are added. Several are
already found in the POWER instruction set:

\begin{itemize}
\parskip 0pt
\itemsep 1pt

\item

FPSCR's \textbf{NI} bit, setting non-\gls{IEEE754} FP mode

\item

MSR's \textbf{LE} bit (and associated HILE bit), setting little-endian mode

\item

MSR's \textbf{SF} bit, setting either 32-bit or 64-bit mode

\item

PCR's \textbf{compatibility} bits 60--62, setting V2.05, V2.06 or V2.07 mode

\end{itemize}

[Note that unless the relevant \textbf{mode switch} bit is set, any alternative
(additional) instructions (and functionality) are completely inaccessible, and
will result in \textbf{illegal instruction} traps being raised. This is recognised as
being critically important.]

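The gating described above can be sketched in a few lines of Python. This is
a minimal illustration only, not part of any specification: the bit value
\texttt{MODE\_GPU}, the opcode table \texttt{REQUIRED\_MODE} and the function
\texttt{decode} are hypothetical names invented for this example, and the bit
position does not correspond to any real PCR, MSR or FPSCR field.

```python
class IllegalInstruction(Exception):
    """Models an illegal instruction trap."""


# Hypothetical mode bits, for illustration only (not real register positions).
MODE_BASE = 0          # no extra mode required
MODE_GPU = 1 << 0      # assumed bit enabling GPU-style instructions

# Hypothetical opcode table: mnemonic -> mode bits that must all be set.
REQUIRED_MODE = {
    "add": MODE_BASE,
    "atan2": MODE_GPU,
}


def decode(mnemonic, mode_bits):
    """Return the mnemonic if it is legal under mode_bits, else trap."""
    required = REQUIRED_MODE.get(mnemonic)
    # Unknown instructions, and known ones whose mode bit is clear, both trap.
    if required is None or (mode_bits & required) != required:
        raise IllegalInstruction(mnemonic)
    return mnemonic
```

With no mode bit set, \texttt{decode("atan2", 0)} raises the exception, while
\texttt{decode("add", 0)} succeeds. This is the behaviour that lets an
unmodified system trap (and, if desired, emulate) instructions that it does not
support.
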
These bits effectively create multiple incompatible, run-time-switchable ISAs
within one CPU. They are selectable for the needs of the individual program (or
OS) being run.

Each of these bits is set by an instruction and, once set, radically changes
the behaviour and characteristics of subsequent instructions.

With these (and other) long-established precedents already in POWER, there is
conceptually nothing new about what we propose: we simply
ask that the process by which such \textbf{switching} is added be formalised and
standardised, such that we (and others, including IBM itself) have a clear,
well-defined, atomic and non-intrusive path, one that does not disrupt the
standard, to extend the POWER ISA for use in markets that it presently cannot
enter.

We advocate that some of the \textbf{mode-setting} (escape-sequencing) bits be binary
encoded and some unary encoded, and that some of the space be marked for
\textbf{official} use, some \textbf{experimental}, some \textbf{custom} and some
\textbf{reserved}. We propose that the available space in a suitably chosen SPR
be formalised, and recommend that the OpenPOWER Foundation be given an IANA-like
role in atomically allocating mode bits.

The IANA-like atomic role ensures that newly allocated PCR mode bits are
unique world-wide. In combination with a mandatory illegal instruction
exception to be raised on any system not supporting a given mode, the
opportunity exists for all systems to trap and emulate all other systems and
thus retain some semblance of interoperability. (Contrast this with either
allocating the same mode bit(s) to two (or more) designers, or not making
illegal instruction exceptions mandatory: binary interoperability becomes
unachievable and the result is irrevocable damage to POWER's reputation.)

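As a rough illustration of the allocation scheme described above, the sketch
below partitions a small binary-encoded mode field into the four kinds of space
and hands out each official encoding at most once. Everything here is assumed
for the purposes of the example: the 4-bit field width, the range boundaries,
and the names \texttt{RANGES}, \texttt{classify} and \texttt{ModeRegistry} are
hypothetical and are not taken from any proposal text.

```python
# Hypothetical partition of a 4-bit, binary-encoded mode field.
# The sizes are assumptions chosen for illustration only.
RANGES = {
    "official": range(0, 8),
    "experimental": range(8, 12),
    "custom": range(12, 15),
    "reserved": range(15, 16),
}


def classify(code):
    """Return the name of the range that a mode encoding falls in."""
    for name, codes in RANGES.items():
        if code in codes:
            return name
    raise ValueError(f"encoding {code} is outside the mode field")


class ModeRegistry:
    """Toy allocator: each official encoding is handed out at most once."""

    def __init__(self):
        self.owners = {}  # encoding -> owner name

    def allocate(self, owner):
        """Assign the lowest free official encoding to owner and return it."""
        for code in RANGES["official"]:
            if code not in self.owners:
                self.owners[code] = owner
                return code
        raise RuntimeError("no free official encodings")
```

Because a single registry object makes every allocation, two designers can
never receive the same official encoding. In this sketch the experimental and
custom ranges are only classified, not centrally allocated, which is one
possible reading of how such space might be used.
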
We also advocate that consideration be given to reserving some bits as a
\textbf{countdown}, where the new mode is enabled only for a certain number of
instructions. This avoids an explicit need to \textbf{flip back}, reducing
binary code size. Note that the counter should not be allowed to cross a branch
or other change in PC: an illegal instruction trap should be raised if this is
attempted. However, traps and exceptions themselves will need to save (and
restore) the countdown, just as the rest of the PCR and other mode-switching
bits need to be saved.

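The countdown behaviour can be modelled with a very small state machine. The
following is a sketch under stated assumptions, not a definition of the
mechanism: the class \texttt{Core} and its methods are hypothetical, the
countdown is assumed to decrement once per executed instruction, and trap entry
is assumed to suspend the alternative mode until the trap returns.

```python
class IllegalInstruction(Exception):
    """Models an illegal instruction trap."""


class Core:
    """Toy model of a countdown-limited mode switch."""

    def __init__(self):
        self.countdown = 0  # instructions left in the alternative mode
        self.saved = []     # countdown values saved on trap entry

    def set_mode(self, n):
        """Enable the alternative mode for the next n instructions."""
        self.countdown = n

    def step(self, is_branch=False):
        """Execute one instruction; return True if it ran in the new mode."""
        in_alt = self.countdown > 0
        if in_alt:
            if is_branch:
                # The counter must not cross a change in PC.
                raise IllegalInstruction("PC change while countdown active")
            self.countdown -= 1
        return in_alt

    def trap_enter(self):
        """Save the countdown and run the handler in the base mode."""
        self.saved.append(self.countdown)
        self.countdown = 0

    def trap_return(self):
        """Restore the countdown saved at trap entry."""
        self.countdown = self.saved.pop()
```

After \texttt{set\_mode(2)}, two calls to \texttt{step()} run in the new mode
and the third runs in the base mode, with no explicit instruction needed to
switch back.
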
Functionality that we need to add, which is a normal part of GPUs, includes
ATAN2, LOG, NORMALISE, YUV2RGB, a Khronos Compliance FP mode (different from both
IEEE754 and \textbf{NI} mode), and much more. Many of these may turn out to be useful
in a wider context; however, they need to be fully isolated behind
\textbf{mode-setting} before being in any way considered for Standards-track formal
adoption.

Some mode-setting instructions are privileged, i.e.\ the mode can only be set by the
kernel (e.g.\ 32-bit or 64-bit mode). Most of the escape sequences that we propose
will have to be usable without the overhead of an expensive system call,
because some of the instructions needed will be in extremely tight
inner loops.

\subsection{About the Libre-SOC Commercial Project}

The Libre-SOC Commercial Product is a hybrid \gls{CPU}-\gls{GPU}-\gls{VPU} intended for
mass-volume production. There is no separate GPU, because the CPU is the GPU.
There is no separate VPU, because the CPU is the VPU. There is not even a
separate pipeline: the CPU pipelines are the GPU and VPU pipelines.

The closest equivalents include the ARC core (which has VPU extensions and 3D
extensions in the form of Broadcom's \gls{VideoCoreIV}) and the \gls{ICubeCorpIC3128}.
Both are considered \textbf{hybrid} CPU-GPU-VPU processors.

\textbf{Normal} commercial GPUs are entirely separate processors. The development cost
and complexity of the software drivers alone is immense. We reject
that approach (and as a small team we do not have the resources anyway).

Because the project is Libre (the opposite of proprietary, secretive and never
to be published), it is no good having the extensions as \textbf{custom}:
\textbf{custom} is specifically for cases where the augmented toolchain is never,
under any circumstances, published by the proprietary company
(and would never be accepted upstream anyway). For commercial reasons,
Libre-SOC is the total opposite of this proprietary, secretive approach.

Therefore, to meet our business objectives:

\begin{itemize}
\parskip 0pt
\itemsep 1pt

\item

As shown by Nyuzi and Larrabee, a \textbf{traditional} general-purpose, fully
IEEE754-compliant Vector ISA (such as that in POWER9), although ideally suited
to high performance compute tasks, is not an adequate
basis for a commercially competitive GPU. Nyuzi's conclusion is that using
such general-purpose Vector ISAs results in reaching only 25\% of the
performance of current commercial-grade GPUs (or requiring a four-fold
increase in power consumption to achieve parity).

\item

We are not going the \textbf{traditional} (separate custom GPU) route because it
is not practical for a new team to design hardware and also spend 8+ man-years
on massively complex inter-processor driver development.

\item

We cannot meet our objectives with a \textbf{custom extension} because the
financial burden on our team of maintaining a total hard fork, not just of
toolchains but also of entire GNU/Linux distros, is highly undesirable and
completely impractical (we know for certain that Red Hat would strongly
object to any efforts to hard-fork Fedora).

\item

We could invent our own custom GPU instruction set (or use and extend an
existing one, to save a man-decade on toolchain development). However, even
to switch over to that \textbf{Dual ISA} GPU instruction set in the next clock
cycle still requires a PCR modeswitch bit, in order to avoid needing a full
Inter-Processor Bus Architecture like that of \textbf{traditional} GPUs.

\item

If extending any instruction set, rather than have a Dual ISA (which needs
the PCR modeswitch bit to access it), we would rather extend POWER.

\item

We cannot \textbf{go ahead anyway} because to do so would be highly irresponsible
and cause massive disruption to the POWER community.

\end{itemize}

With all impractical options eliminated, the only remaining responsible option
is to extend the POWER ISA in an atomically-managed (IANA-style) formal
fashion, whilst (critically and absolutely essentially) always providing a PCR
compatibility mode that is fully POWER compliant, including all illegal
instruction traps.