# Pinmux, IO Pads, and JTAG Boundary scan

Links:

* <http://www2.eng.cam.ac.uk/~dmh/4b7/resource/section14.htm>
* <https://www10.edacafe.com/book/ASIC/CH02/CH02.7.php>
* <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
* <https://bugs.libre-soc.org/show_bug.cgi?id=50>
* <https://git.libre-soc.org/?p=c4m-jtag.git;a=tree;hb=HEAD>

Managing IO on an ASIC is nowhere near as simple as on an FPGA.
An FPGA has built-in IO Pads: the wires terminate inside an
existing silicon block which has been tested for you.
In an ASIC, you are going to have to do everything yourself.
A bi-directional IO Pad requires three wires (in, out,
out-enable) to be routed all the way from the ASIC core to
the IO Pad, where only then does a wire bond connect
it to a single pin.

[[!img CH02-44.gif]]

When designing an ASIC, there is no guarantee that the IO pad is
working when manufactured. Worse, the peripheral could be
faulty. How can you tell what the cause is? There are two
possible faults, but only one symptom ("it dunt wurk").
This problem is what JTAG Boundary Scan is designed to solve.
JTAG can be operated from an external digital clock
at very low frequencies (5 kHz is perfectly acceptable),
so there is very little risk of clock skew during that testing.

Additionally, an SoC is designed to be low cost and to use low cost
packaging. ASICs are typically only 32 to 128 pin QFP
in the Embedded Controller range, and between 300 and 650 pin FBGA
in the Tablet / Smartphone range, with an absolute maximum of 19 mm
on a side. The 2 to 3 inch square, 1,000 pin packages common to Intel
desktop processors are absolutely out of the question.

(*With each pin, the wire bond smashes
into the ASIC, using purely heat of impact to melt the wire,
and cracks in the die can occur. The more times
the bonding equipment smashes into the die, the higher the
chances of irreversible damage. This is why larger pin-count
packaged ASICs are much more expensive: not because of their
manufacturing cost but because far more of them fail due to
having been literally hit with a hammer many more times*)

Yet the expectation from the market is to be able to fit 1,000+
pins worth of peripherals into only 200 to 400 actual
IO Pads. The solution here is a GPIO Pinmux, described in some
detail here: <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
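
As a rough illustration of the idea (not taken from the Pin Control Subsystem document), a pinmux can be modelled as a per-pad table of alternative functions with one selector per pad. The class name `ToyPinMux`, the pad names and the function names below are all hypothetical.

```python
class ToyPinMux:
    """Toy model of a GPIO pinmux: several functions share each IO Pad."""

    def __init__(self, pad_functions):
        # pad_functions: {pad_name: [function 0, function 1, ...]}
        self.pad_functions = {p: list(f) for p, f in pad_functions.items()}
        # one selector per pad; every pad starts on function 0
        self.select = {pad: 0 for pad in self.pad_functions}

    def set_function(self, pad, function):
        """Route `function` to `pad` (ValueError if the pad cannot carry it)."""
        self.select[pad] = self.pad_functions[pad].index(function)

    def active(self, pad):
        """Name of the function currently connected to `pad`."""
        return self.pad_functions[pad][self.select[pad]]

    def total_functions(self):
        """Number of peripheral pins that the pads can carry between them."""
        return sum(len(f) for f in self.pad_functions.values())


# hypothetical pads and functions, four-way muxing on each pad
mux = ToyPinMux({
    "A20": ["gpio0", "uart0_tx", "spi0_clk", "pwm0"],
    "A21": ["gpio1", "uart0_rx", "spi0_mosi", "pwm1"],
})
mux.set_function("A20", "uart0_tx")
```

With four-way muxing as assumed here, 250 pads would be enough to carry 1,000 peripheral pins, at the cost that only one function per pad is usable at any one time.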

This page goes over the details and issues involved in creating
an ASIC that combines **both** JTAG Boundary Scan **and** GPIO
Muxing, down to layout considerations using coriolis2.

# Resources, Platforms and Pins

nmigen HDL Modules typically know nothing about FPGA
Boards or ASICs. They especially do not know anything about the
Peripheral ICs (UART, I2C, USB, SPI, PCIe) connected to a given FPGA
on a given PCB, and they should not have to.

The Resources, Platforms and Pins API provides a level of abstraction
between peripherals, boards and HDL designs. Peripherals
may be given `(name, number)` tuples. The HDL design may "request"
a peripheral, which is described in terms of Resources, managed
by a ResourceManager, and a Platform may provide that peripheral.
The Platform is given
the responsibility to wire up the Pins to the correct FPGA (or ASIC)
IO Pads, and it is the HDL design's responsibility to connect up
those same named Pins, on the other side, to the implementation
of the PHY/Controller, in the HDL.

Here is a function that defines a UART Resource:

    #!/usr/bin/env python3
    from nmigen.build.dsl import Resource, Subsignal, Pins

    def UARTResource(*args, rx, tx):
        io = []
        io.append(Subsignal("rx", Pins(rx, dir="i", assert_width=1)))
        io.append(Subsignal("tx", Pins(tx, dir="o", assert_width=1)))
        return Resource.family(*args, default_name="uart", ios=io)

Note that each Subsignal is given a convenient name (tx, rx) and that
there are Pins associated with it.
UARTResource would typically be part of a larger function that defines,
for either an FPGA or an ASIC, a full array of IO Connections:

    def create_resources(pinset=None):
        # pinset is not used in this simplified example
        resources = []
        resources.append(UARTResource('uart', 0, tx='A20', rx='A21'))
        # add clock and reset
        clk = Resource("clk", 0, Pins("sys_clk", dir="i"))
        rst = Resource("rst", 0, Pins("sys_rst", dir="i"))
        resources.append(clk)
        resources.append(rst)
        return resources

For an FPGA, the Pins names are typically the Ball Grid Array
Pad or Pin name: A12, or N20. ASICs can do likewise: when
referring to schematics, it is convenient to use the most
recognisable, well-known name.

Next, these Resources need to be handed to a ResourceManager or
a Platform (Platform derives from ResourceManager):

    from nmigen.build.plat import TemplatedPlatform

    class ASICPlatform(TemplatedPlatform):
        def __init__(self, resources):
            super().__init__()
            self.add_resources(resources)

An HDL Module may now be created which, if given
a platform instance during elaboration, may request
a UART (caveat below):

    from nmigen import Elaboratable, Module, Signal

    class Blinker(Elaboratable):
        def elaborate(self, platform):
            m = Module()
            # get the UART resource, mess with the output tx
            uart = platform.request('uart')
            intermediary = Signal()
            m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(uart.rx) # pass rx to tx

            return m

The caveat here is that the Resources of the platform actually
have to have a UART in order for it to be requestable! Thus:

    resources = create_resources()
    asic = ASICPlatform(resources)
    hdl = Blinker()
    asic.build(hdl)

Finally the association between HDL, Resources, and ASIC Platform
is made:

* The Resources contain the abstract expression of the
  type of peripheral, its port names, and the corresponding
  names of the IO Pads associated with each port.
* The HDL, which knows nothing about IO Pad names, requests
  a Resource by name.
* The ASIC Platform, given the list of Resources, takes care
  of connecting requests for Resources to actual IO Pads.
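
The three roles above can be sketched without nmigen at all. The following is a pure-Python toy, not nmigen's ResourceManager: `ToyPlatform` and `ToyResourceError` are hypothetical names, and Resources are reduced to plain dictionaries. It also shows the caveat mentioned earlier, that only a Resource which actually exists can be requested.

```python
class ToyResourceError(Exception):
    """Raised when a request cannot be satisfied."""


class ToyPlatform:
    """Toy stand-in for a Platform: maps (name, number) requests onto pads."""

    def __init__(self, resources):
        # resources: {(name, number): {port_name: pad_name}}
        self.resources = dict(resources)
        self.requested = set()

    def request(self, name, number=0):
        key = (name, number)
        if key not in self.resources:
            raise ToyResourceError("resource %s#%d does not exist" % key)
        if key in self.requested:
            raise ToyResourceError("resource %s#%d already requested" % key)
        self.requested.add(key)
        return self.resources[key]


# the "Resources": peripheral type, port names and IO Pad names
platform = ToyPlatform({("uart", 0): {"tx": "A20", "rx": "A21"}})

# the "HDL" side only ever uses the name, never the pad names
uart = platform.request("uart")
```

Requesting `"i2c"` from this toy platform raises `ToyResourceError`, because no such Resource was handed to it.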

This is the simple version. When JTAG Boundary Scan needs
to be added, it gets a lot more complex.

# JTAG Boundary Scan

JTAG Scanning is a (paywalled) IEEE Standard, 1149.1, which with
a little searching can be found online. Its purpose is to allow
a well-defined method of testing ASIC IO pads that a Foundry or
ASIC test house may apply easily with off-the-shelf equipment.
Scan chaining can also connect multiple ASICs together so that
the same test can be run on a large batch of ASICs at the same
time.

IO Pads generally come in four primary types:

* Input
* Output
* Output with Tristate (enable)
* Bi-directional Tristate Input/Output with direction enable

Interestingly these can all be synthesised from one
Bi-directional Tristate IO Pad. Other types such as Differential
Pair Transmit may also be constructed from an inverter and a pair
of IO Pads. Other more advanced features include pull-up
and pull-down resistors, Schmitt triggering for interrupts,
different drive strengths, and so on, but the basics are
that the Pad is either an input, or an output, or both.
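
A minimal behavioural sketch of that first point follows. It assumes a much simplified pad (no pull-ups, no drive strength, and contention between the pad and an external driver is ignored); `BidirPad` and the three helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidirPad:
    """Behavioural toy model of a bi-directional tristate IO Pad."""
    o: int = 0                 # value the core wants to drive
    oe: int = 0                # 1: pad drives the pin, 0: high-impedance
    ext: Optional[int] = None  # value driven from outside the chip, if any

    @property
    def pin(self):
        """Level on the physical pin (None models a floating pin)."""
        return self.o if self.oe else self.ext

    @property
    def i(self):
        """What the core reads back on the 'in' wire."""
        return self.pin


def input_pad(ext):
    """Input-only pad: output-enable tied low."""
    return BidirPad(oe=0, ext=ext)


def output_pad(o):
    """Output-only pad: output-enable tied high, 'in' wire ignored."""
    return BidirPad(o=o, oe=1)


def tristate_output_pad(o, oe):
    """Output with tristate: 'in' wire ignored, enable left controllable."""
    return BidirPad(o=o, oe=oe)
```

Each of the three simpler pad types is just the bi-directional pad with one of its three wires tied off or left unused.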

The JTAG Boundary Scan therefore needs to know what type
each pad is (In/Out/Bi) and has to "insert" itself in between
*all* the Pad's wires, which may be just an input, or just an output,
and, if bi-directional, an "output enable" line.

The "insertion" (or, "Tap") into those wires requires a
pair of Muxes for each wire. Under normal operation
the Muxes bypass JTAG entirely: the IO Pad is connected,
through the two Muxes,
directly to the Core (a hardware term for a "peripheral",
in Software terminology).

When JTAG Scan is enabled, then for every pin that is
"tapped into", the Muxes flip such that:

* The IO Pad is connected directly to latches controlled
  by the JTAG Shift Register
* The Core (peripheral) is likewise connected to the Shift
  Register, but to *different bits* from those that the Pad
  is connected to

In this way, not only can JTAG control or read the IO Pad,
but it can also read or control the Core (peripheral).
This is its entire purpose: interception to allow for the detection
and triaging of faults.
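
The behaviour of the Muxes on a single wire can be sketched as below. This is a toy model under simplifying assumptions, not the C4M implementation: `ToyScanCell` and its attribute names are hypothetical, and the shifting of the register itself is left out.

```python
class ToyScanCell:
    """Toy model of the JTAG tap on one wire running from a source to a sink.

    For an input pin the source is the IO Pad and the sink is the Core;
    for an output pin it is the other way round.
    """

    def __init__(self):
        self.scan_enable = False
        self.sr_capture = 0  # Shift Register bit that observes the source
        self.sr_drive = 0    # a *different* SR bit that drives the sink

    def propagate(self, source_value):
        """Return the value seen by the sink for a given source value."""
        if not self.scan_enable:
            return source_value          # normal operation: JTAG bypassed
        self.sr_capture = source_value   # source is observed by JTAG
        return self.sr_drive             # sink is controlled by JTAG


cell = ToyScanCell()
normal = cell.propagate(1)       # pad value reaches the core untouched

cell.scan_enable = True
cell.sr_drive = 0
intercepted = cell.propagate(1)  # core now sees the SR bit, not the pad
```

In scan mode the two ends of the wire are fully decoupled, which is what allows each end to be tested independently of the other.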

* Software may be uploaded and run which sets a bit on
  one of the peripheral outputs (UART TX for example).
  If the UART TX IO Pad were faulty, there would be no way,
  without Boundary Scan, to determine whether the pad or the
  peripheral was at fault. With the UART TX pin function being
  redirected to a JTAG Shift Register, the results of the
  software setting UART TX may be detected by checking
  the appropriate Shift Register bit.
* Likewise, a voltage may be applied to the UART RX Pad,
  and the corresponding SR bit checked to see if the
  pad is working. Without Boundary Scan, this would not be
  possible if the UART RX peripheral were faulty.
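
The triage logic for an output pin such as UART TX can be written down as a small decision procedure. This is only a sketch: `triage_uart_tx` and the two fault models are hypothetical, and each callable stands in for a real measurement made through the Shift Register or on the pin.

```python
def triage_uart_tx(core_output, pad_output):
    """Locate a fault on an output pin using the two Boundary Scan tests.

    core_output(v): value captured in the core-side SR bit when software
                    asks the peripheral to output v.
    pad_output(v):  value measured on the pin when the pad-side SR bit
                    is set to v.
    Returns the list of faulty blocks (empty if both are working).
    """
    faults = []
    if any(core_output(v) != v for v in (0, 1)):
        faults.append("core")
    if any(pad_output(v) != v for v in (0, 1)):
        faults.append("pad")
    return faults


def working(v):
    """A block that faithfully passes on what it is given."""
    return v


def stuck_at_zero(v):
    """A faulty block whose output never leaves 0."""
    return 0
```

For example, `triage_uart_tx(working, stuck_at_zero)` points at the pad alone, which a single "it dunt wurk" observation on the pin could never do.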

<img src="https://libre-soc.org/shakti/m_class/JTAG/jtag-block.jpg"
  width=500 />

## C4M JTAG TAP

Staf Verhaegen's Chips4Makers JTAG TAP module includes everything
needed to create JTAG Boundary Scan Shift Registers. However,
connecting up cores (a hardware term: the equivalent software
term is "peripherals") on one side and the pads on the other is
especially confusing, but deceptively simple. The actual addition
to the Scan Shift Register is this straightforward:

    from c4m.nmigen.jtag.tap import IOType, TAP

    # DMITAP and Pins are not shown in this excerpt
    class JTAG(DMITAP, Pins):
        def __init__(self, pinset, domain, wb_data_wid=32):
            TAP.__init__(self, ir_width=4)
            self.u_tx = self.add_io(iotype=IOType.Out, name="tx")
            self.u_rx = self.add_io(iotype=IOType.In, name="rx")

This results in the creation of:

* Two Records, one of type In named rx, the other an output
  named tx
* Each Record contains a pair of sub-Records: one core-side
  and the other pad-side
* Entries in the Boundary Scan Shift Register which if set
  may control (or read) either the peripheral / core or
  the IO PAD
* A suite of Muxes (as shown in the diagrams above) which
  allow either direct connection between pad and core
  (bypassing JTAG) or interception

It is then your responsibility to:

* connect up each and every peripheral input and output
  to the right IO Core Record in your HDL
* connect up each and every IO Pad Record input and output
  to the right IO Pad in the Platform

Both of these tasks are painstaking and tedious in the
extreme if done manually, and prone to either sheer boredom,
transliteration errors, dyslexia triggering or just utter
confusion. Despite this, let us proceed, and, augmenting
the Blinker example, wire up a JTAG instance:

    class Blinker(Elaboratable):
        def elaborate(self, platform):
            m = Module()
            # constructor arguments omitted for brevity
            m.submodules.jtag = jtag = JTAG()

            # get the records from JTAG instance
            utx, urx = jtag.u_tx, jtag.u_rx
            # get the UART resource, mess with the output tx
            uart = platform.request('uart')

            # uart core-side from JTAG
            intermediary = Signal()
            m.d.comb += utx.core.o.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(urx.core.i) # pass rx to tx

            # wire up the IO Pads (in right direction) to Platform
            m.d.comb += uart.tx.eq(utx.pad.o) # transmit JTAG to pad
            m.d.comb += urx.pad.i.eq(uart.rx) # pass rx to JTAG
            return m

JTAG TAP capability on UART TX and RX has now been inserted into
the chain. Using OpenOCD or another program it is possible to
drive the TDI, TMS and TCK signals and read back TDO, according to
IEEE 1149.1, in order
to intercept both the core and IO Pads, both input and output,
and confirm the correct functionality of one even if the other is
broken, during ASIC testing.
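
What the external program sees is, at its simplest, one long shift register. The sketch below is a deliberately reduced model: the TAP state machine driven by TMS and the capture/update stages are left out, and `ToyScanChain` is a hypothetical name, not part of C4M JTAG or OpenOCD.

```python
class ToyScanChain:
    """Toy Boundary Scan Shift Register: TDI in at one end, TDO out the other.

    The TAP state machine (driven by TMS) is deliberately not modelled:
    every call to shift() stands for one TCK pulse in the Shift-DR state.
    """

    def __init__(self, length):
        self.bits = [0] * length

    def shift(self, tdi):
        tdo = self.bits[-1]                  # last bit falls out on TDO
        self.bits = [tdi] + self.bits[:-1]   # everything moves along by one
        return tdo

    def scan(self, tdi_bits):
        """Shift a whole sequence in, returning the sequence shifted out."""
        return [self.shift(bit) for bit in tdi_bits]


chain = ToyScanChain(4)
chain.bits = [1, 0, 1, 1]            # pretend these were captured from pins
captured = chain.scan([1, 1, 0, 0])  # read them out while loading new bits
```

A single pass therefore both reads out the previously captured values and loads the next set of values to be driven, which is why one pair of pins (TDI, TDO) is enough regardless of how many IO Pads are in the chain.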

Libre-SOC's JTAG TAP Boundary Scan system is a little more sophisticated:
it hooks into (replaces) ResourceManager.request(), intercepting the request
and recording what was requested. The above manual linkup to JTAG TAP
is then taken care of **automatically and transparently**, while to
all intents and purposes looking exactly like a Platform, even to
the extent of taking the exact same list of Resources.
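
The general shape of such an interception can be sketched in plain Python. This is an assumption-laden toy and not Libre-SOC's actual code: `ToyJTAGPlatform`, its methods and the generated signal names are all hypothetical, and Resources are reduced to dictionaries.

```python
class ToyJTAGPlatform:
    """Toy platform that records requests so JTAG can be wired in later."""

    def __init__(self, resources):
        # same form of Resources as an ordinary (toy) platform would take:
        # {(name, number): {port_name: pad_name}}
        self.resources = dict(resources)
        self.requests_made = []

    def request(self, name, number=0):
        ports = self.resources[(name, number)]
        self.requests_made.append((name, number))
        # the HDL is handed core-side JTAG signals instead of the pads
        return {port: "jtag_%s%d_%s_core" % (name, number, port)
                for port in ports}

    def pad_connections(self):
        """Pad-side hookups implied by everything requested so far."""
        return {"jtag_%s%d_%s_pad" % (name, number, port): pad
                for (name, number) in self.requests_made
                for port, pad in self.resources[(name, number)].items()}


platform = ToyJTAGPlatform({("uart", 0): {"tx": "A20", "rx": "A21"}})
uart = platform.request("uart")   # looks just like an ordinary request
```

Because every request is recorded, the pad-side wiring can be generated afterwards from `pad_connections()` rather than typed in by hand for each pin.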

## Clock synchronisation

Take for example USB ULPI:

<img src="https://www.crifan.com/files/pic/serial_story/other_site/p_blog_bb.JPG"
  width=400 />

Here there is an external incoming clock, generated by the PHY, to which
both Received *and Transmitted* data and control are synchronised. Notice
very specifically that it is *not the main processor* generating that clock
Signal, but the external peripheral (known as a PHY in Hardware terminology).

Firstly: note that the Clock will, obviously, also need to be routed
through JTAG Boundary Scan because, after all, it is being received
through just another ordinary IO Pad. Secondly: note that
if it were not, then clock skew would occur for that peripheral because
although the Data Wires went through JTAG Boundary Scan MUXes, the
clock did not. Clearly this would be a problem.

However, clocks are very special signals: they have to be distributed
evenly to all and any Latches (DFFs) inside the peripheral so that
data corruption does not occur because of tiny delays.
To avoid that scenario, Clock Domain Crossing (CDC) is used, with
Asynchronous FIFOs:

    rx_fifo = stream.AsyncFIFO([("data", 8)], self.rx_depth,
                               w_domain="ulpi", r_domain="sync")
    tx_fifo = stream.AsyncFIFO([("data", 8)], self.tx_depth,
                               w_domain="sync", r_domain="ulpi")
    m.submodules.rx_fifo = rx_fifo
    m.submodules.tx_fifo = tx_fifo

However, the entire FIFO must be covered by two Clock H-Trees: one
by the ULPI external clock, and the other the main system clock.
The size of the ULPI clock H-Tree, and consequently the size of
the PHY on-chip, will result in more Clock Tree Buffers being
inserted into the chain, and, correspondingly, matching buffers
on the ULPI data input side likewise must be inserted so that
the input data timing precisely matches that of its clock.

The problem is not receiving of data, though: it is transmission
on the output ULPI side. With the ULPI Clock Tree having buffers
inserted, each buffer creates delay. The ULPI output FIFO has to
correspondingly be synchronised not to the original incoming clock
but to that clock *after going through H-Tree Buffers*. Therefore,
there will be a lag on the output data compared to the incoming
(external) clock.
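
A first-order estimate of that lag is simply the number of buffer levels multiplied by the delay per buffer. The figures below (six levels, 0.25 ns per buffer) are assumed purely for illustration and do not come from any real layout; the 60 MHz figure is the ULPI clock frequency. The function names are hypothetical.

```python
def clock_tree_lag_ns(buffer_levels, buffer_delay_ns):
    """Delay added to the clock by a chain of Clock Tree Buffers."""
    return buffer_levels * buffer_delay_ns


def lag_as_fraction_of_period(lag_ns, clock_mhz):
    """Output lag expressed as a fraction of one clock period."""
    period_ns = 1000.0 / clock_mhz
    return lag_ns / period_ns


# assumed figures, for illustration only
lag = clock_tree_lag_ns(buffer_levels=6, buffer_delay_ns=0.25)
fraction = lag_as_fraction_of_period(lag, clock_mhz=60.0)
```

With these assumed numbers the output data lags the external clock by 1.5 ns, or 9% of the clock period, which has to fit inside whatever output timing margin the PHY allows.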

# GPIO Muxing

[[!img gpio_block.png]]

# Core/Pad Connection + JTAG Mux

Diagram constructed from the nmigen plat.py file.

[[!img i_o_io_tristate_jtag.JPG]]