# Shakti M-Class Libre SoC

This SoC is a proposed libre design that draws on expertise from mass-volume
SoCs of the past six years and beyond, and is being designed to cover just
as wide a range of target embedded / low-power / industrial markets as those
SoCs. Pincount is to be kept low in order to reduce cost as well as increase
yields.

* See <http://rise.cse.iitm.ac.in/shakti.html> for the top-level M-Class overview
* See [[pinouts]] for the auto-generated table of pinouts (including mux)
* See [[peripheralschematics]] for example Reference Layouts
* See [[ramanalysis]] for a comprehensive analysis of why DDR3 is to be used.

## Rough specification.

Quad-core 28nm RISC-V 64-bit (RISCV64GC core with Vector SIMD Media / 3D
extensions), 300-pin 15x15mm BGA at 0.8mm pitch, a 32-bit DDR3/DDR3L/LPDDR3
memory interface, libre / open interfaces, and accelerated hardware
functions suitable for the higher-end, low-power, embedded, industrial
and mobile space.

A 0.8mm pitch BGA allows relatively large (low-cost) VIA drill sizes
to be used (8-10mil) and 4-5mil tracks with 4mil clearance. For
details see
<http://processors.wiki.ti.com/index.php/General_hardware_design/BGA_PCB_design>

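As a rough sanity check that 300 balls fit on a 15x15mm body at 0.8mm pitch, the full-array capacity can be estimated. This is a minimal sketch, assuming a square, fully-populated array and a 0.7mm clearance from the package edge to the outermost ball centre (an illustrative assumption, not a figure from this page; real package drawings vary). Integer micrometres are used to avoid floating-point rounding:

```python
def max_balls_per_side(body_um: int, pitch_um: int, edge_margin_um: int) -> int:
    """Ball positions that fit along one edge of a square BGA body.

    edge_margin_um is an ASSUMED clearance from the package edge to the
    centre of the outermost ball; real package drawings vary.
    """
    span = body_um - 2 * edge_margin_um  # distance between outermost ball centres
    if span < 0:
        return 0
    return span // pitch_um + 1  # N balls need (N - 1) pitches


def full_array_capacity(body_um: int, pitch_um: int, edge_margin_um: int) -> int:
    """Upper bound on ball count for a fully-populated square array."""
    n = max_balls_per_side(body_um, pitch_um, edge_margin_um)
    return n * n


# 15x15mm body, 0.8mm pitch, assumed 0.7mm edge margin (all in micrometres)
per_side = max_balls_per_side(15_000, 800, 700)
capacity = full_array_capacity(15_000, 800, 700)
print(f"{per_side}x{per_side} grid = {capacity} positions; 300 fit: {capacity >= 300}")
```

Under those assumptions an 18x18 grid (324 positions) fits, leaving some positions spare, e.g. to depopulate for escape routing.
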
## Targeting full Libre Licensing to the bedrock.

The only barrier to being able to replicate the masks from scratch
is the proprietary cells (e.g. memory cells) designed by the Foundries:
there is a potential long-term strategy in place to deal with that issue.

The only proprietary interface utilised in the entire SoC is the DDR3
PHY plus Controller, which will be replaced in a future revision, making
the entire SoC exclusively designed and made from fully libre-licensed
BSD and LGPL openly and freely accessible VLSI and VHDL source.

In addition, no proprietary firmware whatsoever will be required to
operate or boot the device right from the bedrock: the entire software
stack will also be libre-licensed (even for programming the initial
proprietary DDR3 PHY+Controller).

# Inspiration from several sources

The design of this SoC is drawn from at least the following SoCs, which
have significant multiplexing for pinouts, reducing pincount whilst at
the same time permitting the SoC to be utilised across a very wide range
of markets:

* A10/A20 EVB <http://hands.com/~lkcl/eoma/A10-EVB-V1-2-20110726.pdf>
* RK3288 T-Firefly <http://www.t-firefly.com/download/firefly-rk3288/hardware/FR_RK3288_0930.pdf>
* Ingenic JZ4760B <ftp://ftp.ingenic.cn/SOC/JZ4760B/JZ4760B_DS_REVISION.PDF>
  and the LEPUS Board <ftp://ftp.ingenic.cn/DevSupport/Hardware/RD4760B_LEPUS/RD4760B_LEPUS_V1.3.2.PDF>
* GPL-violating CT-PC89e <http://hands.com/~lkcl/seatron/>
  and <http://lkcl.net/arm_systems/CT-PC89E/>: an 8.9in netbook
  weighing only 0.72kg with a **3-hour** battery life on a single 2100mAh
  cell. Its casework alone inspired a decade of copycat Chinese clone
  netbooks, as it was slowly morphed from its original 8.9in up to (currently)
  an 11in form-factor almost a decade later, in 2017.
* A64 Reference Designs, for example: <http://linux-sunxi.org/images/3/32/Banana_pi_BPI-M64-V1_1-Release_201609.pdf>

TI boards such as the BeagleXXXX series, or the Freescale iMX6
WandBoard etc., whilst interesting, have a different kind of focus
and "feel" about them, as they are typically designed by Western firms
with less access to, or knowledge of, the kinds of low-cost tricks deployed
to ingenious and successful effect by Chinese Design Houses. Not only
that, but Chinese Design Houses typically know the best components to buy:
Western-designed PCBs typically source exclusively from Digikey, AVNet,
Mouser etc., and the prices are often two to **TEN** times higher as a result.

The TI and Freescale (now NXP) series SoCs themselves are also just as
interesting to study, but again have a subtly different focus: low cost of
manufacture of the PCBs that use them is not one of their primary foci.
Freescale's iMX6 is well-known for its impressively long intended lifespan
and support: **nineteen** years. That does, however, have some unintended
knock-on effects on its pricing.

Instead, the primary input is taken from Chinese-designed SoCs, where cost
and ease of production, manufacturing and design of a PCB using the planned
SoC, as well as support for high-volume mass-produced peripherals, are
firmly a priority focus.

# Target Markets

* EOMA68 Computer Card form-factor (general-purpose, eco-conscious)
* Smartphone / Tablet (basically the same thing, different LCD/CTP size)
* Low-end (ChromeOS style) laptop
* Industrial uses when augmented by a suitable MCU (for ADC/DAC/CAN etc.)

## Peripherals common to the majority of target markets

* SPI or 8080 or RGB/TTL or LVDS LCD display (e.g. SPI: 320x240, LVDS: 1440x900)
* LCD Backlight, requires GPIO power-control plus PWM for brightness control
* USB-OTG Port (OTG-Host, OTG Client, Charging capability)
* Baseband Modem (GSM / GPRS / 3G / LTE) requiring USB, UART, and PCM audio
* Bluetooth, requires either full UART or SD/MMC or USB, plus control GPIO
* WIFI, requires either USB (but with power penalties) or, better, SD/MMC
* SD/MMC for external MicroSD
* SD/MMC for on-PCB eMMC (care needed on power/boot sequence)
* NAND Flash (not recommended), requires 8080/ATI-style Bus with dedicated CS#
* Optional 4-wire SPI NAND/NOR for boot (XIP - Execute In-Place - recommended).
* Audio over I2S (5-pin: 4 for output, 1 for input), fall-back to USB Audio
* Some additional SPI peripherals, e.g. connection to a low-power MCU.
* GPIO (EINT-capable, with wakeup) for buttons, power, volume etc.
* Camera(s) either by CSI-1 (parallel CSI) or, better, by USB
* I2C sensors: accelerometer, compass, etc. Each requires EINT and RST GPIO.
* Capacitive Touchpanel (I2C and also requiring EINT and RST GPIO)
* Real-time Clock (usually an I2C device but may be on-board a support MCU)

## Peripherals unique to the laptop market

* Keyboard (USB or keyboard-matrix managed by MCU)
* USB, I2C or SPI Mouse-trackpad (plus button GPIO, EINT capable)

## Peripherals common to the laptop and Industrial Markets

* Ethernet (RGMII or, better, 8080-style XT/AT/ATI MCU bus)

## Augmentation by an embedded MCU

Some functions, particularly analog ones, are tricky to implement
in an early SoC. In addition, CAN is still patented. For unusual, patented
or analog functionality such as CAN, RTC, ADC, DAC, SPDIF, One-wire Bus
and so on, it is easier and simpler to deploy an ultra-low-cost low-speed
companion Micro-Controller such as the crystal-less STM8S003 ($0.24) or
the crystal-less STM32F072 or other suitable MCU, depending on requirements.
For high-speed interconnect it may be wired up as an SPI device, and for
lower-speed communication UART would be the simplest and easiest means of
two-way communication.

This technique can be deployed in all scenarios (phone, tablet, laptop,
industrial), and is an extremely low-cost way of getting RTC functionality,
for example. Dedicated I2C devices that provide RTC, ADC, DAC or "Digipot"
functionality are, relatively speaking, surprisingly expensive, whereas
some very simple software on a general-purpose MCU does the exact same job.
In particularly cost-sensitive applications, a DAC may be substituted by a
PWM output, an RC filter, and an optional feedback loop into an ADC pin to
monitor situations where a changing load on the RC circuit alters the output
voltage, all done entirely in the MCU's software.

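The PWM-plus-RC "DAC" with ADC feedback could look something like the following. It is a hedged simulation rather than MCU firmware: the 3.3v PWM level, 8-bit duty resolution, resistor values and helper names (`open_loop_duty`, `filtered_output`, `closed_loop_duty`, `read_adc`) are all illustrative assumptions. The load is modelled as a plain resistive divider with the RC filter's series resistor, which is what makes the open-loop output sag; the feedback loop then nudges the duty cycle until the ADC reading is within one duty step of the target. Because each correction step slightly undershoots (loop gain below one), the loop converges without oscillating.

```python
VCC = 3.3    # assumed PWM high level, volts
FULL = 255   # assumed 8-bit PWM: duty counts 0..255


def open_loop_duty(target_v: float) -> int:
    """Duty count an ideal, unloaded PWM + RC filter would need for target_v."""
    return max(0, min(FULL, round(target_v / VCC * FULL)))


def filtered_output(duty: int, r_filter: float, r_load: float) -> float:
    """Hypothetical plant: the filter's series resistor and the load form a
    divider, so the averaged PWM voltage sags below its ideal value."""
    return duty / FULL * VCC * r_load / (r_filter + r_load)


def closed_loop_duty(target_v: float, read_adc, max_iterations: int = 20) -> int:
    """Trim the duty using ADC feedback (simple integral-style correction).

    read_adc(duty) stands in for: set the PWM duty, wait for the RC filter
    to settle, then sample the ADC pin monitoring the output.
    """
    duty = open_loop_duty(target_v)
    for _ in range(max_iterations):
        step = round((target_v - read_adc(duty)) / VCC * FULL)
        if step == 0:
            break  # within one duty step of the target
        duty = max(0, min(FULL, duty + step))
    return duty


def read_adc(duty: int) -> float:
    # Assumed example values: 1k filter resistor, 10k load, ideal ADC
    return filtered_output(duty, r_filter=1_000, r_load=10_000)


open_duty = open_loop_duty(1.5)
duty = closed_loop_duty(1.5, read_adc)
print(f"open loop: {read_adc(open_duty):.3f} V, closed loop: {read_adc(duty):.3f} V")
```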
An MCU may even be used to emulate an SPI "XIP" (Execute In-Place) flash
device, such that there is no longer a need to deploy a dedicated SPI
NOR bootloader IC (which are really quite expensive). By emulating
an SPI XIP device, the SoC may boot from the Flash storage built into
the embedded MCU, or the MCU may even feed the SoC data from a USB-OTG
or other interface. This makes for an extremely flexible bootloader
capability, without the need to totally redo the SoC masks just to
add extra BOOTROM functions.

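The core of such an emulation is small. As a hedged sketch: the function below models only the standard single-lane SPI NOR Read Data command (opcode 0x03 followed by a 24-bit, most-significant-byte-first address), treating one chip-select-low transaction as a sequence of full-duplex bytes. The function name, the 0xFF idle value and the wrap-around at the end of the image are assumptions for illustration; a real MCU would run this logic in its SPI-slave interrupt or DMA handler and must keep pace with the SoC's SPI clock, which this model ignores.

```python
READ_DATA = 0x03  # standard SPI NOR "Read Data" opcode


def emulate_spi_flash(mosi: bytes, image: bytes, idle: int = 0xFF) -> bytes:
    """Return the MISO bytes an emulated SPI flash shifts out during one
    chip-select-low transaction (SPI is full-duplex: one MISO byte per MOSI byte).

    Only READ_DATA is modelled; any other opcode yields idle bytes.
    """
    miso = bytearray([idle] * len(mosi))
    if len(mosi) < 4 or mosi[0] != READ_DATA:
        return bytes(miso)
    addr = (mosi[1] << 16) | (mosi[2] << 8) | mosi[3]  # 24-bit address, MSB first
    # Every byte the host clocks after the address returns the next data byte.
    for i in range(4, len(mosi)):
        miso[i] = image[addr % len(image)]  # assumed wrap at end of image
        addr += 1
    return bytes(miso)


# Hypothetical 1 KiB boot image held in the MCU's own flash
boot_image = bytes(range(256)) * 4
# SoC reads 4 bytes from address 0x000100 (4 dummy bytes clock the data out)
reply = emulate_spi_flash(bytes([READ_DATA, 0x00, 0x01, 0x00, 0, 0, 0, 0]), boot_image)
print(reply.hex())  # -> ffffffff00010203
```

Faster boot paths typically use other read commands as well (for example Fast Read, 0x0B, which adds one dummy byte after the address); those would follow the same pattern.
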
## Common Internal (on-board) acceleration and hardware functions

* 2D accelerated display
* 3D accelerated graphics
* Video encode / decode
* Image encode / decode
* Crypto functions (SHA, Rijndael, DES, etc., Diffie-Hellman, RSA)
* Cryptographically-secure PRNG (hard to get right)

### 2D acceleration

The ORSOC GPU contains basic primitives for 2D: rectangles, sprites,
image acceleration, scalable fonts, Z-buffering and much more.

<https://opencores.org/project,orsoc_graphics_accelerator>

### 3D acceleration

* MIAOW: ATI-compatible shader engine <http://miaowgpu.org/>
* ORSOC GPU contains some primitives that can be used
* SIMD RISC-V extensions can obviate the need for a "full" separate GPU

### Video encode / decode

* video primitives <https://opencores.org/project,video_systems>
* MPEG decoder <https://opencores.org/project,mpeg2fpga>
* Google make free VP8 and VP9 hard macros available for production use only

### Image encode / decode

Partially covered by the ORSOC GPU.

### Crypto functions

TBD

### Cryptographically-secure PRNG

TBD

# Proposed Interfaces

* RGB/TTL up to 1440x900 @ 60fps, 24-bit colour
* 2x 1-lane SPI
* 1x 4-lane (quad) SPI
* 4x SD/MMC (1x 1/2/4/8-bit, 3x 1/2/4-bit)
* 2x full UART incl. CTS/RTS
* 3x UART (TX/RX only)
* 3x [[I2C]] (in case of address clashes between peripherals)
* 8080-style AT/XT/ATI MCU Bus Interface, with multiple (8x CS#) lines
* 3x PWM-capable GPIO
* 32x EINT-capable GPIO with full edge-triggered and low/high IRQ capability
* 1x I2S audio with 4-wire output and 1-wire input.
* 3x USB2 (ULPI for reduced pincount) each capable of USB-OTG support
* DDR3/DDR3L/LPDDR3 32-bit-wide memory controller

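The reason for three I2C controllers is that two peripherals with the same fixed address cannot share one bus. A minimal sketch of the resulting board-level bookkeeping, assuming a simple greedy first-fit placement; the device names and addresses are made up for illustration and do not refer to specific parts. It also rejects the 7-bit addresses that the I2C specification reserves (0x00-0x07 and 0x78-0x7F), and raises an error when a clash cannot be resolved with the buses available:

```python
from __future__ import annotations


def assign_i2c_buses(devices: dict[str, int], n_buses: int = 3) -> dict[str, int]:
    """Greedy first-fit: put each device on the first bus where its 7-bit
    address is still free. Returns {device name: bus index}."""
    taken = [set() for _ in range(n_buses)]
    placement = {}
    for name, addr in devices.items():
        if not 0x08 <= addr <= 0x77:
            raise ValueError(f"{name}: 0x{addr:02X} is a reserved I2C address")
        for bus, used in enumerate(taken):
            if addr not in used:
                used.add(addr)
                placement[name] = bus
                break
        else:  # no bus had this address free
            raise ValueError(f"{name}: 0x{addr:02X} clashes on all {n_buses} buses")
    return placement


# Hypothetical board: two identical camera sensors share a fixed address
board = {
    "accelerometer": 0x1D,
    "compass": 0x1E,
    "touchpanel": 0x38,
    "camera_front": 0x3C,
    "camera_rear": 0x3C,
}
print(assign_i2c_buses(board))
```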
Existing implementations of some of these interfaces:

* <https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/>
  includes GPIO, SPI, UART, JTAG, I2C, PinCtrl and PWM. Also included
  are a Watchdog Timer and others.
* <https://github.com/sifive/freedom/blob/master/src/main/scala/everywhere/e300artydevkit/Platform.scala>
  Pinmux ("IOF") for multiplexing several I/O functions onto a single pin

## I2C

Covered on its own page: [[I2C]]

## I2S

<https://github.com/skristiansson/i2s>

## FlexBus

FlexBus is capable of emulating the 8080-style / ATI MCU Bus, as well as
providing support for access to SRAM. It is extremely likely that it will
provide access to MCU-bus Ethernet controller ICs such as the DM9000, the
AX88180 (gigabit ethernet but an enormous number of pins), and the AX88796A
(8/16-bit 80186 or MC68k bus).

## RGB/TTL interface

<https://opencores.org/project,vga_lcd> (a full Linux kernel driver is also available)

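To put the "1440x900 @ 60fps, 24-bit colour" target into numbers, the sketch below computes the minimum pixel clock (active pixels only; real timings add horizontal and vertical blanking, modelled here by an assumed `blanking_overhead` fraction) and the framebuffer read bandwidth that continuous scan-out places on main memory. It also counts the pins a parallel 24-bit RGB/TTL port typically uses (24 data lines plus PCLK, HSYNC, VSYNC and DE). The function names are illustrative:

```python
RGB_TTL_PINS = 24 + 4  # 24 data lines + PCLK, HSYNC, VSYNC, DE (typical parallel RGB)


def pixel_clock_hz(h_active: int, v_active: int, fps: int,
                   blanking_overhead: float = 0.0) -> float:
    """Pixel clock for a parallel RGB/TTL panel. blanking_overhead is an
    ASSUMED fractional allowance for blanking; 0.0 gives the absolute minimum."""
    return h_active * v_active * fps * (1.0 + blanking_overhead)


def scanout_bytes_per_s(h_active: int, v_active: int, fps: int,
                        bytes_per_pixel: int) -> int:
    """Memory read bandwidth consumed by continuously refreshing the panel."""
    return h_active * v_active * fps * bytes_per_pixel


pclk = pixel_clock_hz(1440, 900, 60)
packed = scanout_bytes_per_s(1440, 900, 60, 3)   # 24-bit packed framebuffer
padded = scanout_bytes_per_s(1440, 900, 60, 4)   # 24-bit colour in 32-bit words
print(f"min pixel clock {pclk / 1e6:.2f} MHz, "
      f"scan-out {packed / 1e6:.1f} MB/s packed, {padded / 1e6:.1f} MB/s padded, "
      f"{RGB_TTL_PINS} pins")
```

Storing 24-bit pixels in 32-bit words costs a third more bandwidth but keeps every pixel word-aligned.
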
## SPI

* APB to SPI <https://opencores.org/project,apb2spi>
* ASIC-proven <https://opencores.org/project,spi_master_slave>
* Wishbone-compliant <https://opencores.org/project,simple_spi>

## SD/MMC (including eMMC)

* <https://opencores.org/project,sd_mmc_emulator>
* (needs work) <https://opencores.org/project,sdcard_mass_storage_controller>

# Pin Multiplexing

Complex! Covered in [[pinouts]]. The general idea is to target several
distinct applications and, by trial-and-error, create a pinmux table that
successfully covers all the target scenarios by providing absolutely all
required functions for each and every target. A few general rules:

* Different functions (SPI, I2C) which overlap on the same pins on one
  bank should also be duplicated on completely different banks, separate
  both from each other and from the bank on which they overlap. With each
  bank having separate Power Domains, this strategy increases the chances
  of being able to place low-power and high-power peripherals and sensors
  on separate GPIO banks without needing external level-shifters.
* Functions which have optional bus-widths (eMMC: 1/2/4/8) may have more
  functions overlapping them than would otherwise normally be considered.
* Then the same overlapped high-order bus pins can also be mapped onto
  other pins. This particularly applies to the very large buses, such
  as FlexBus (over 50 pins). However, if the overlapped pins are on a
  different bank, it becomes necessary to have both banks run in the same
  GPIO Power Domain.
* All functions should really be pin-muxed at least twice, preferably
  three times. Four or more times on average makes it pointless to
  even have four-way pinmuxing at all, so this should be avoided.
  The only exceptions (functions which have not been pinmuxed multiple
  times) are the RGB/TTL LCD channel, and both ULPI interfaces.

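Rules like these lend themselves to automated checking as the table evolves. The sketch below is hypothetical (it is not the tool behind the auto-generated [[pinouts]] table): it models a 4-way pinmux as a 2-bit selector per pin, and flags any signal that can only be reached from a single bank. Counting distinct banks rather than raw pin positions is a choice made here to reflect the "duplicate on completely different banks" rule; the GPIO, RGB/TTL (`LCD_`) and ULPI exemptions are matched by assumed signal-name prefixes, and the table contents are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fragment of a 4-way pinmux table: pin -> signal for mux values
# 0..3 (a 2-bit selector per pin). None marks an unused slot.
PINMUX = {
    "A0": ["GPIOA0", "SPI1_CLK", "UART1_TX", "I2C1_SCL"],
    "A1": ["GPIOA1", "SPI1_MOSI", "UART1_RX", "I2C1_SDA"],
    "B0": ["GPIOB0", "I2C1_SCL", "SPI1_CLK", None],
    "B1": ["GPIOB1", "I2C1_SDA", "SPI1_MOSI", None],
    "C0": ["GPIOC0", "UART1_TX", "LCD_D0", None],
}

EXEMPT_PREFIXES = ("GPIO", "LCD_", "ULPI")  # assumed naming for the exceptions


def mux_select(pin: str, sel: int):
    """Signal routed to `pin` for a 2-bit mux selector value."""
    if not 0 <= sel <= 3:
        raise ValueError("a 2-bit mux selector must be 0..3")
    return PINMUX[pin][sel]


def banks_per_signal(table):
    """Map each signal to the set of banks it can be muxed onto.
    Assumes pins are named <bank letter><number>, e.g. "B12"."""
    banks = defaultdict(set)
    for pin, signals in table.items():
        for signal in signals:
            if signal is not None:
                banks[signal].add(pin[0])
    return banks


def single_bank_signals(table):
    """Signals breaking the 'mux at least twice, on different banks' rule."""
    return {
        signal: sorted(banks)
        for signal, banks in banks_per_signal(table).items()
        if len(banks) < 2 and not signal.startswith(EXEMPT_PREFIXES)
    }


print(mux_select("B0", 2))            # SPI1_CLK
print(single_bank_signals(PINMUX))    # {'UART1_RX': ['A']}
```

The same loop could also report the average number of placements per signal, to catch the opposite problem of over-muxing.
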
## GPIO Pinmux Power Domains

Of particular importance are the Power Domains for the GPIO. Realistically
they have to be flexible (simplest option: recommended to be between
1.8v and 3.3v), as the majority of low-cost mass-produced sensors and
peripherals on I2C, SPI, UART and SD/MMC are at, or are compatible with,
this voltage range. Long-tail (older / stable / low-cost / mass-produced)
peripherals in particular tend to be 3.3v, whereas newer ones with a
particular focus on Mobile tend to be 1.2v to 1.8v.

A large percentage of sensors and peripherals have separate IO voltage
domains from their main supply voltage: a good example is the SN75LVDS83b,
which has one power domain for the RGB/TTL I/O, one for the LVDS output,
and one for the internal logic controller (typical deployments tend not
to notice the different power-domain capability, as they usually supply all
three voltages at 3.3v).

Relying on this capability, however, by selecting a fixed voltage for
the entire SoC's GPIO domain, is simply not a good idea: all sensors
and peripherals which do not have a variable (VREF) capability for the
logic side, and which do not happen to be at the exact same fixed voltage,
will simply not be compatible if they are high-speed CMOS-level push-pull
driven. Open-Drain lines, on the other hand, can be handled with a MOSFET
for two-way or even a diode for one-way translation, depending on the
levels, but this means significant numbers of external components if the
number of lines is large.

So, selecting a fixed voltage (such as 1.8v or 3.3v) results in a bit of a
problem: external level-shifting is required on pretty much every single
pin, particularly the high-speed (CMOS) push-pull I/O. An example: the
DM9000 is best run at 3.3v. A fixed 1.8v FlexBus would require a whopping
18 pins (possibly even 24 for a 16-bit-wide bus) worth of level-shifting,
which is not just costly but also takes a huge amount of PCB space: bear in
mind that for level-shifting, an IC with **double** the number of pins being
level-shifted is required.

Given that level-shifting is an unavoidable necessity, and external
level-shifting has such high cost(s), the workable solution is to
include GPIO-group level-shifting on the SoC die itself, after the
pin-muxer at the front-end (on the I/O pads of the die), on a per-bank
basis. This is an extremely common technique that is deployed across a
very wide range of mass-volume SoCs.

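The per-bank approach can be summarised as a small bookkeeping exercise: for each GPIO bank, intersect the I/O voltage windows of the devices placed on it; if they overlap, the bank's on-die level-shifters are simply set to a common voltage, otherwise that bank needs external level-shifting, at two IC signal pins per shifted line. A minimal sketch; the voltage windows below are illustrative assumptions loosely based on the examples in this section (a 3.3v-only Ethernet controller, 1.2v-1.8v mobile sensors), not datasheet values, and choosing the lowest common voltage (to save power) is a design choice made here:

```python
def bank_voltage(windows):
    """Lowest I/O voltage acceptable to every device on a bank, or None.

    windows: {device name: (min_volts, max_volts)} for each device's
    logic-side I/O supply. The lowest common value is chosen to save power.
    """
    lowest = max(v_min for v_min, _ in windows.values())
    highest = min(v_max for _, v_max in windows.values())
    return lowest if lowest <= highest else None


def external_shifter_signal_pins(lines: int) -> int:
    """Signal pins on an external level-shifter IC: one per line on each side
    (supply and direction/enable pins not included)."""
    return 2 * lines


# Illustrative banks (voltage windows are assumptions, not datasheet values)
mobile_bank = {"accelerometer": (1.2, 1.8), "touchpanel": (1.8, 3.3)}
mixed_bank = {"ethernet_controller": (3.3, 3.3), "compass": (1.2, 1.8)}

print(bank_voltage(mobile_bank))   # 1.8: both devices accept it
print(bank_voltage(mixed_bank))    # None: no common voltage on one bank
# The 18- and 24-line FlexBus examples above, and a 50-line SRAM bus
print([external_shifter_signal_pins(n) for n in (18, 24, 50)])
```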
One very useful side-effect of a variable Power Domain voltage, for
example on a GPIO bank containing SD/MMC functionality, is being able to
change the bank's voltage from 3.3v to 1.8v to match an SD Card's
capabilities, as permitted under the SD/MMC Specification. The alternative
is to be forced to deploy an external level-shifter IC (if PCB space and
BOM target allow) or to fix the voltage at 3.3v and thus lose access to
the low-power and higher-speed capabilities of modern SD Cards.

In summary: putting level shifters right at the I/O pads of the SoC, after
the pin-mux (so that the core logic remains at the core voltage), is a
cost-effective solution that can have additional unintended side-benefits
and cost savings beyond simply saving on external level-shifting components
and board space.

# Items requiring clarification, or proposals TBD

## Core Voltage Domains from the PMIC

See [[peripheralschematics]]: what default (start-up) voltage can the
core of the proposed 28nm SoC cope with for short durations? The AXP209
PMIC defaults to a 1.25v CPU core voltage, and 1.2v for the logic. These
can be changed by the SoC by communicating over I2C, but the start-up
voltage of the PMIC may not be changed. What is the maximum voltage
that the SoC can run at, for short durations, at a greatly-reduced clock rate?

## 3.3v tolerance

Can the GPIO be made at least 3.3v tolerant?

## Shakti Flexbus implementation: 32-bit word-aligned access

The FlexBus implementation may only make accesses onto the back-end
AXI bus on 32-bit word-aligned boundaries. How this affects FlexBus
memory accesses (read and write) on 8-bit and 16-bit boundaries is
yet to be determined. It is particularly relevant, e.g., for 24-bit
pixel accesses on 8080 (MCU) style LCD controllers that have their
own on-board SRAM.

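Since this is still to be determined, the following is only one possible approach, sketched under stated assumptions: if the back-end only accepts aligned 32-bit accesses, a byte-granular write (such as a 3-byte packed pixel) can be emulated by read-modify-write of every word it touches. Little-endian byte-lane order, the dictionary standing in for memory, and the function name are all assumptions. Note the cost: an unaligned 24-bit pixel straddling two words takes four bus transactions instead of one.

```python
def write_bytes_rmw(mem: dict, addr: int, data: bytes) -> int:
    """Emulate a byte-granular write over a bus that only accepts aligned
    32-bit accesses, using read-modify-write on each partially-covered word.

    mem maps word-aligned addresses to 32-bit values (a stand-in for SRAM).
    Byte lane i of a word holds address word_addr + i (little-endian lanes).
    Returns the number of 32-bit bus transactions (reads + writes) used.
    """
    transactions = 0
    offset = 0
    while offset < len(data):
        a = addr + offset
        word_addr = a & ~0x3          # align down to a 32-bit boundary
        lane = a & 0x3                # first byte lane touched in this word
        count = min(4 - lane, len(data) - offset)
        mask = value = 0
        for i in range(count):
            shift = 8 * (lane + i)
            mask |= 0xFF << shift
            value |= data[offset + i] << shift
        if mask != 0xFFFFFFFF:        # partial word: keep the other lanes
            value |= mem.get(word_addr, 0) & ~mask
            transactions += 1         # the extra read
        mem[word_addr] = value
        transactions += 1             # the write
        offset += count
    return transactions


sram = {0x100: 0xAABBCCDD, 0x104: 0xEEEEEEEE}
# A 24-bit pixel (bytes 0x11, 0x22, 0x33) at an unaligned address
used = write_bytes_rmw(sram, 0x102, bytes([0x11, 0x22, 0x33]))
print(hex(sram[0x100]), hex(sram[0x104]), used)
```

Two caveats apply. Read-modify-write is only safe for memory-like targets such as SRAM: on an LCD controller's data port or a FIFO, a read can have side-effects. And if the AXI path carries byte-lane write strobes (AXI's WSTRB signal), the extra read is unnecessary, since partial words can be written directly.
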
## Confirmation of GPIO Power Domains

The proposed plan is to stick with a fixed 1.8v GPIO level across all
GPIO banks. However, as outlined in the GPIO Pinmux Power Domains section
above, this has some distinct disadvantages, particularly for e.g. SRAM
access over FlexBus: that would often require a 50-way bi-directional
level-shifter Bus IC, with over 100 pins!

## Proposal / Concept to include "Minion Cores" on a 7-way pinmux

The lowRISC team first came up with the idea of, instead of having a pinmux,
effectively bit-banging pretty much all GPIO using **multiple** 32-bit
RISC-V non-SMP integer-only cores, each with a tiny instruction and data
cache (or, simpler, access to their own independent on-die SRAM).
The reasoning behind this is: if it's a dedicated core, it's not really
bit-banging any more. The technique is very commonly deployed, typically
using an 8051 MCU engine, as it means that a mass-produced peripheral may
be firmware-updated in the field, for example if a Standard has unanticipated
flaws or otherwise requires updating.

The proposal here is to add four extra pin-mux selectors (an extra bit
to what is currently a 2-bit mux per pin), and for each GPIO bank to map to
one of four such ultra-small "Minion Cores". For each pin, Pin-mux 4 would
select the first Minion core, Pin-mux 5 would select the second, and so on.
The sizes of the GPIO banks are as follows:

* Bank A: 16
* Bank B: 28
* Bank C: 24
* Bank D: 24
* Bank E: 24
* Bank F: 10

Therefore, it is proposed that each Minion Core have 28 EINT-capable
GPIOs, and that all but Banks A and F map their GPIO number (minus the
Bank Designation letter) directly to the Minion Core GPIOs. For Banks
A and F, the numbering is proposed to be concatenated, so that A0 through
A15 map to a Minion Core's GPIO 0 to 15, and F0 through F9 map to a Minion
Core's GPIO 16 to 25 (another alternative idea would be to split Banks
A and F to complete B through E, taking them up to 32 I/O per Minion core).

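The proposed numbering can be captured in a few lines. A hedged sketch, assuming a 3-bit per-pin mux in which values 0-3 keep today's functions and 4-7 hand the pin to Minion Core 0-3 (following "Pin-mux 4 would select the first Minion core"), and with Bank F concatenated after Bank A; the function and constant names are hypothetical:

```python
BANK_SIZES = {"A": 16, "B": 28, "C": 24, "D": 24, "E": 24, "F": 10}
MINION_GPIOS = 28               # proposed EINT-capable GPIOs per Minion Core
F_OFFSET = BANK_SIZES["A"]      # Bank F numbering continues after Bank A


def minion_core_for_mux(sel: int):
    """Minion Core index selected by a 3-bit pin-mux value, or None when one
    of the existing four functions (0..3) is selected."""
    if not 0 <= sel <= 7:
        raise ValueError("a 3-bit mux selector must be 0..7")
    return sel - 4 if sel >= 4 else None


def minion_gpio_index(bank: str, pin: int) -> int:
    """GPIO number at which pin <bank><pin> appears inside a Minion Core."""
    if not 0 <= pin < BANK_SIZES[bank]:
        raise ValueError(f"{bank}{pin} does not exist")
    return F_OFFSET + pin if bank == "F" else pin


# A15 -> 15, F0 -> 16, F9 -> 25, B27 -> 27; mux value 5 -> second core (index 1)
print(minion_gpio_index("A", 15), minion_gpio_index("F", 0),
      minion_gpio_index("F", 9), minion_gpio_index("B", 27), minion_core_for_mux(5))
```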
With careful selection from different banks it should be possible to map
unused spare pins to a complete, contiguous, sequential set of any given
Minion Core's GPIOs, such that the Minion Core could then bit-bang anything
up to a 28-bit-wide Bus. Theoretically this could make up a second RGB/TTL
LCD interface with up to 24 bits per pixel.

For low-speed interfaces, particularly those with an independent clock
where the data changes on one clock edge and is sampled on the other (so
that the exact timing of each data transition is not critical), this should
work perfectly fine. Whether the idea is practical for higher-speed
interfaces or not will critically depend on whether the Minion Core can do
mask-spread atomic reads/writes from a register to/from memory-addressed
GPIO. Faster I/O streams will almost certainly require some form of
serialiser/de-serialiser hardware-assist, and each will definitely need
its own DMA Engine.

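To illustrate both points, the sketch below simulates (in Python rather than Minion Core firmware) bit-banging one byte in SPI mode 0, where data changes while the clock is low and is sampled on the rising edge, so the exact timing of each data transition does not matter. The GPIO port model, its `masked_write` operation and the pin assignments are hypothetical: `masked_write` stands in for the single atomic mask-spread write discussed above, changing only the clock and data bits and leaving every other pin on the port untouched. The delays a real core would insert between writes to set the clock rate are omitted.

```python
SCK = 1 << 0    # hypothetical: Minion GPIO 0 drives the clock
MOSI = 1 << 1   # hypothetical: Minion GPIO 1 drives the data line


class GpioPort:
    """Toy model of a memory-mapped GPIO output register."""

    def __init__(self, value: int = 0):
        self.value = value
        self.trace = []  # register value after every write, for inspection

    def masked_write(self, mask: int, bits: int) -> None:
        # One access that only updates the bits selected by `mask`
        self.value = (self.value & ~mask) | (bits & mask)
        self.trace.append(self.value)


def bitbang_spi_mode0(port: GpioPort, byte: int) -> None:
    """Shift one byte out MSB-first in SPI mode 0 (timing delays omitted)."""
    for bit in range(7, -1, -1):
        data = MOSI if (byte >> bit) & 1 else 0
        port.masked_write(SCK | MOSI, data)  # clock low, set up the data bit
        port.masked_write(SCK, SCK)          # rising edge: receiver samples
    port.masked_write(SCK, 0)                # leave the clock idle-low


def sample_rising_edges(trace) -> int:
    """What a mode-0 receiver would capture from the recorded pin states."""
    received, prev = 0, 0
    for value in trace:
        if (value & SCK) and not (prev & SCK):
            received = (received << 1) | (1 if value & MOSI else 0)
        prev = value
    return received


port = GpioPort(value=0x80)      # some unrelated pin (bit 7) is already high
bitbang_spi_mode0(port, 0xA5)
print(hex(sample_rising_edges(port.trace)), hex(port.value & 0x80))
```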
If the idea proves successful, it would be extremely nice to have a
future version that has direct access to generic LVDS lines, plus
8b/10b encoding hardware-assist engines. If the voltage may be set
externally and accurate PLL clock timing provided, it may become possible
to bit-bang and software-emulate high-speed interfaces such as SATA, HDMI,
PCIe and many more.

# Research (to investigate)

* <https://level42.ca/projects/ultra64/Documentation/man/pro-man/pro25/index25.1.html>
* <http://n64devkit.square7.ch/qa/graphics/ucode.htm>
* <https://dac.com/media-center/exhibitor-news/synopsys%E2%80%99-designware-universal-ddr-memory-controller-delivers-30-percent> 110nm DDR3 PHY

[[!tag cpus]]