# Pinmux, IO Pads, and JTAG Boundary scan

Links:

* <http://www2.eng.cam.ac.uk/~dmh/4b7/resource/section14.htm>
* <https://www10.edacafe.com/book/ASIC/CH02/CH02.7.php>
* <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
* <https://bugs.libre-soc.org/show_bug.cgi?id=50>
* <https://git.libre-soc.org/?p=c4m-jtag.git;a=tree;hb=HEAD>
* Extra info: [[/docs/pinmux/temp_pinmux_info]]

Managing IO on an ASIC is nowhere near as simple as on an FPGA.
An FPGA has built-in IO Pads: the wires terminate inside an
existing silicon block which has been tested for you.
On an ASIC, you are going to have to do everything yourself.
There, a bi-directional IO Pad requires three wires (in, out,
out-enable) to be routed all the way from the core of the ASIC
to the IO Pad, where only then does a wire bond connect it to a
single external pin.

[[!img CH02-44.gif]]

When designing an ASIC, there is no guarantee that the IO pad
will work when manufactured. Worse, the peripheral behind it
could be faulty. How can you tell what the cause is? There are
two possible faults, but only one symptom ("it dunt wurk").
This is the problem that JTAG Boundary Scan is designed to solve.
JTAG can be operated from an external digital clock,
at very low frequencies (5 kHz is perfectly acceptable),
so there is very little risk of clock skew during that testing.

Additionally, an SoC is designed to be low cost, and to use
low-cost packaging. ASICs are typically only 32 to 128 pin QFP
in the Embedded
Controller range, and between 300 and 650 ball FBGA in the Tablet /
Smartphone range, with an absolute maximum of 19 mm on a side.
The 2 to 3 inch square, 1,000+ pin packages common to Intel desktop
processors are absolutely out of the question.

(*With each pin's wire bond smashing
into the ASIC, using purely the heat of impact to melt the wire,
cracks in the die can occur. The more times
the bonding equipment smashes into the die, the higher the
chances of irreversible damage, which is why larger pin-count
packaged ASICs are much more expensive: not because of their
manufacturing cost, but because far more of them fail due to
having been literally hit with a hammer many more times.*)

Yet, the expectation from the market is to be able to fit 1,000+
pins' worth of peripherals into only 200 to 400 worth of actual
IO Pads. The solution here: a GPIO Pinmux, described in some
detail in <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>

This page goes over the details and issues involved in creating
an ASIC that combines **both** JTAG Boundary Scan **and** GPIO
Muxing, down to layout considerations using coriolis2.

# Resources, Platforms and Pins

nmigen HDL Modules typically know nothing about FPGA
Boards or ASICs. They especially do not know anything about the
Peripheral ICs (UART, I2C, USB, SPI, PCIe) connected to a given FPGA
on a given PCB, and they should not have to.

Through the Resources, Platforms and Pins API, a level of abstraction
between peripherals, boards and HDL designs is provided. Peripherals
may be given `(name, number)` tuples, the HDL design may "request"
a peripheral, which is described in terms of Resources, managed
by a ResourceManager, and a Platform may provide that peripheral.
The Platform is given
the responsibility to wire up the Pins to the correct FPGA (or ASIC)
IO Pads, and it is the HDL design's responsibility to connect up
those same named Pins, on the other side, to the implementation
of the PHY/Controller in the HDL.

Here is a function that defines a UART Resource:

    #!/usr/bin/env python3
    from nmigen.build.dsl import Resource, Subsignal, Pins

    def UARTResource(*args, rx, tx):
        io = []
        io.append(Subsignal("rx", Pins(rx, dir="i", assert_width=1)))
        io.append(Subsignal("tx", Pins(tx, dir="o", assert_width=1)))
        return Resource.family(*args, default_name="uart", ios=io)

Note that each Subsignal is given a convenient name (tx, rx) and that
there are Pins associated with it.
UARTResource would typically be part of a larger function that defines,
for either an FPGA or an ASIC, a full array of IO Connections:

    def create_resources():
        resources = []
        resources.append(UARTResource('uart', 0, tx='A20', rx='A21'))
        # add clock and reset
        clk = Resource("clk", 0, Pins("sys_clk", dir="i"))
        rst = Resource("rst", 0, Pins("sys_rst", dir="i"))
        resources.append(clk)
        resources.append(rst)
        return resources

For an FPGA, the Pins names are typically the Ball Grid Array
Pad or Pin name: A12, or N20. ASICs can do likewise: for
convenience when referring to schematics, it is best to use the
most recognisable well-known name.

Next, these Resources need to be handed to a ResourceManager or
a Platform (Platform derives from ResourceManager):

    from nmigen.build.plat import TemplatedPlatform

    class ASICPlatform(TemplatedPlatform):
        def __init__(self, resources):
            super().__init__()
            self.add_resources(resources)

An HDL Module may now be created which, if given
a platform instance during elaboration, may request
a UART (caveat below):

    from nmigen import Elaboratable, Module, Signal

    class Blinker(Elaboratable):
        def elaborate(self, platform):
            m = Module()
            # get the UART resource, mess with the output tx
            uart = platform.request('uart')
            intermediary = Signal()
            m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(uart.rx)  # pass rx to tx

            return m

The caveat here is that the Resources of the platform actually
have to include a UART in order for it to be requestable! Thus:

    resources = create_resources() # contains a resource named "uart"
    asic = ASICPlatform(resources)
    hdl = Blinker()
    asic.build(hdl)

Finally the association between HDL, Resources, and ASIC Platform
is made:

* The Resources contain the abstract expression of the
  type of peripheral, its port names, and the corresponding
  names of the IO Pads associated with each port.
* The HDL, which knows nothing about IO Pad names, requests
  a Resource by name.
* The ASIC Platform, given the list of Resources, takes care
  of connecting requests for Resources to actual IO Pads.

This is the simple version. When JTAG Boundary Scan needs
to be added, it gets a lot more complex.

# JTAG Boundary Scan

JTAG Scanning is a (paywalled) IEEE Standard: 1149.1, which with
a little searching can be found online. Its purpose is to provide
a well-defined method of testing ASIC IO pads that a Foundry or
ASIC test house may apply easily with off-the-shelf equipment.
Scan chaining can also connect multiple ASICs together, so that
the same test can be run on a large batch of ASICs at the same
time.

IO Pads generally come in four primary types:

* Input
* Output
* Output with Tristate (enable)
* Bi-directional Tristate Input/Output with direction enable

Interestingly, these can all be synthesised from one
Bi-directional Tristate IO Pad. Other types, such as Differential
Pair Transmit, may also be constructed from an inverter and a pair
of IO Pads. Other more advanced features include pull-up
and pull-down resistors, Schmitt triggering for interrupts,
different drive strengths, and so on, but the basics are
that the Pad is either an input, or an output, or both.

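For intuition, the synthesis of all four types from one bi-directional
tristate cell can be modelled in a few lines of plain Python (a
behavioural sketch only, not HDL; the function names here are made up
purely for illustration):

```python
def bidir_pad(core_o, core_oe, ext_drive=None):
    """Behavioural model of a bi-directional tristate IO pad.

    core_o:    value the core wishes to drive out
    core_oe:   output-enable; 1 means the pad drives the pin
    ext_drive: value an external device places on the pin (input mode)

    Returns (pin_value, core_i): what appears on the physical pin,
    and what the core reads back on its input wire.
    """
    pin = core_o if core_oe else ext_drive  # tristate: drive, or listen
    core_i = pin                            # the input receiver always sees the pin
    return pin, core_i

def input_pad(ext_drive):
    # an Input-only pad is the same cell with oe permanently tied low
    return bidir_pad(0, 0, ext_drive)

def output_pad(core_o):
    # an Output-only pad is the same cell with oe permanently tied high
    return bidir_pad(core_o, 1)
```

The single bi-directional cell therefore covers the whole list above,
which is one reason ASIC cell libraries often provide just that one pad.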
The JTAG Boundary Scan therefore needs to know what type
each pad is (In/Out/Bi) and has to "insert" itself in between
*all* the Pad's wires, which may be just an input, or just an output,
and, if bi-directional, an "output enable" line as well.

The "insertion" (or "Tap") into those wires requires a
pair of Muxes for each wire. Under normal operation
the Muxes bypass JTAG entirely: the IO Pad is connected,
through the two Muxes,
directly to the Core (a hardware term for what Software
terminology calls a "peripheral").

When JTAG Scan is enabled, then for every pin that is
"tapped into", the Muxes flip such that:

* The IO Pad is connected directly to latches controlled
  by the JTAG Shift Register
* The Core (peripheral) is likewise, but to *different bits*
  from those that the Pad is connected to

In this way, not only can JTAG control or read the IO Pad,
but it can also read or control the Core (peripheral).
This is its entire purpose: interception to allow for the detection
and triaging of faults.

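The mux pair on a single output wire can be sketched in plain Python
(a conceptual model only, not the actual c4m-jtag implementation):

```python
def scan_mux_out(scan_en, core_o, sr_drive):
    """Model of the two muxes tapping one output wire.

    In bypass mode (scan_en=0) the core drives the pad directly.
    In scan mode (scan_en=1) the Shift Register bit drives the pad
    instead, while the value the core is producing goes to a
    *different* Shift Register bit for JTAG to read out.

    Returns (pad_value, sr_capture).
    """
    # mux nearest the pad: bypass (core) or intercept (shift register)
    pad = sr_drive if scan_en else core_o
    # the capture flip-flop samples what the core is driving
    capture = core_o
    return pad, capture
```

In bypass mode the pad simply mirrors the core; in scan mode the pad and
the core become independently observable and controllable.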
* Software may be uploaded and run which sets a bit on
  one of the peripheral outputs (UART Tx for example).
  If the UART Tx IO Pad were faulty, there would be no way,
  without Boundary Scan, to determine whether the peripheral
  was at fault. With the UART Tx pin function
  redirected to a JTAG Shift Register, the result of the
  software setting UART Tx may be detected by checking
  the appropriate Shift Register bit.
* Likewise, a voltage may be applied to the UART Rx Pad,
  and the corresponding Shift Register bit checked to see if the
  pad is working. If the UART Rx peripheral were faulty,
  this would not be possible to determine otherwise.

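The triage logic implied by these two checks can be summarised in a
tiny helper (hypothetical and purely illustrative; the argument names
are made up):

```python
def triage(core_to_sr_ok, pad_to_sr_ok):
    """Classify the outcome of the two Boundary Scan checks above.

    core_to_sr_ok: software toggled the peripheral output and the
                   matching Shift Register bit changed accordingly
    pad_to_sr_ok:  a voltage applied externally to the pad showed
                   up in the matching Shift Register bit
    """
    if core_to_sr_ok and pad_to_sr_ok:
        return "core and pad both working"
    if pad_to_sr_ok:
        return "pad ok: fault is in the core (peripheral)"
    if core_to_sr_ok:
        return "core ok: fault is in the IO pad"
    return "both core and pad faulty"
```

Two faults, one external symptom: the Shift Register taps turn that
single symptom back into two independently testable signals.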
<img src="https://libre-soc.org/shakti/m_class/JTAG/jtag-block.jpg"
width=500 />

## C4M JTAG TAP

Staf Verhaegen's Chips4Makers JTAG TAP module includes everything
needed to create JTAG Boundary Scan Shift Registers,
as well as the IEEE 1149.1 Finite State Machine to access
them through TMS, TDO, TDI and TCK signalling. However,
connecting up cores (a hardware term: the equivalent software
term is "peripherals") on one side and the pads on the other is
especially confusing, but deceptively simple. The actual addition
to the Scan Shift Register is as straightforward as this:

    from c4m.nmigen.jtag.tap import IOType, TAP

    class JTAG(TAP):
        def __init__(self):
            TAP.__init__(self, ir_width=4)
            self.u_tx = self.add_io(iotype=IOType.Out, name="tx")
            self.u_rx = self.add_io(iotype=IOType.In, name="rx")

This results in the creation of:

* Two Records, one of type In named rx, the other an Output
  named tx
* Each Record contains a pair of sub-Records: one core-side
  and the other pad-side
* Entries in the Boundary Scan Shift Register which, if set,
  may control (or read) either the peripheral / core or
  the IO Pad
* A suite of Muxes (as shown in the diagrams above) which
  allow either direct connection between pad and core
  (bypassing JTAG) or interception

During Interception Mode (Scanning), pad and core are connected
to the Shift Register. During "Production" Mode, pad and
core are wired directly to each other, on a per-pin basis,
for every pin. Clearly this is a lot of work.

It is then your responsibility to:

* connect up each and every peripheral input and output
  to the right IO Core Record in your HDL
* connect up each and every IO Pad input and output
  to the right IO Pad in the Platform. **This
  does not happen automatically and is not the
  responsibility of the TAP Interface**

The TAP interface connects the **other** side of the pads
and cores Records: **to the Muxes**. You **have** to
connect **your** side of both core and pads Records in
order for the Scan to be fully functional.

Both of these tasks are painstaking and tedious in the
extreme if done manually, and prone to sheer boredom,
transliteration errors, dyslexia triggering or just utter
confusion. Despite this, let us proceed, and, augmenting
the Blinker example, wire up a JTAG instance:

    class Blinker(Elaboratable):
        def elaborate(self, platform):
            m = Module()
            m.submodules.jtag = jtag = JTAG()

            # get the records from the JTAG instance
            utx, urx = jtag.u_tx, jtag.u_rx
            # get the UART resource, mess with the output tx
            p_uart = platform.request('uart')

            # uart core-side from JTAG
            intermediary = Signal()
            m.d.comb += utx.core.o.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(urx.core.i)  # pass rx to tx

            # wire up the IO Pads (in the right direction) to Platform
            m.d.comb += p_uart.tx.eq(utx.pad.o) # JTAG pad side drives tx pin
            m.d.comb += urx.pad.i.eq(p_uart.rx) # rx pin feeds JTAG pad side
            return m

Compared to the non-scan-capable version, which connected UART
Core Tx and Rx directly to the Platform Resource (and the Platform
took care of wiring to IO Pads):

* Core HDL is instead wired to the core-side of the JTAG Scan
* The JTAG Pad side is instead wired to the Platform
* (the Platform still takes care of wiring to actual IO Pads)

JTAG TAP capability on UART Tx and Rx has now been inserted into
the chain. Using openocd or another program, it is possible to
send TDI, TMS, TDO and TCK signals according to IEEE 1149.1 in order
to intercept both the core and IO Pads, both input and output,
and confirm the correct functionality of one even if the other is
broken, during ASIC testing.

## Libre-SOC Automatic Boundary Scan

Libre-SOC's JTAG TAP Boundary Scan system is a little more sophisticated:
it hooks into (replaces) ResourceManager.request(), intercepting the request
and recording what was requested. The manual linkup to the JTAG TAP
above is then taken care of **automatically and transparently**, to
all intents and purposes looking exactly like a Platform, even to
the extent of taking the exact same list of Resources.

    class Blinker(Elaboratable):
        def __init__(self, resources):
            self.jtag = JTAG(resources)

        def elaborate(self, platform):
            m = Module()
            m.submodules.jtag = jtag = self.jtag

            # get the UART resource, mess with the output tx
            uart = jtag.request('uart')
            intermediary = Signal()
            m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(uart.rx)  # pass rx to tx

            return jtag.boundary_elaborate(m, platform)

Connecting up and building the ASIC is as simple as with a non-JTAG,
non-scanning-aware Platform:

    resources = create_resources()
    asic = ASICPlatform(resources)
    hdl = Blinker(resources)
    asic.build(hdl)

The differences:

* The list of resources was also passed to the HDL Module,
  such that JTAG may create a complete, identical list
  of both core and pad matching Pins
* Resources were requested from the JTAG instance,
  not the Platform
* A "magic function" (JTAG.boundary_elaborate) is called
  which wires up all of the seamlessly intercepted
  Platform resources to the JTAG core/pads Resources,
  with the HDL connected to the core side, exactly
  as if this were a non-JTAG-Scan-aware Platform.
* ASICPlatform still takes care of connecting to actual
  IO Pads, except that the Platform.resource requests were
  triggered "behind the scenes". For that to work it
  is absolutely essential that the JTAG instance and the
  ASICPlatform be given the exact same list of Resources.

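The request-interception idea itself can be sketched in plain Python
(a deliberately stripped-down model: the real Libre-SOC JTAG class
wraps nmigen's ResourceManager and creates matching core/pad Pins,
none of which is reproduced here, and the class name is invented for
illustration):

```python
class RecordingRequester:
    """Records every resource request so that boundary-scan muxes
    can later be wired in between the requester and the platform."""
    def __init__(self, resources):
        # resources: mapping of name -> resource (plain dicts here)
        self.resources = resources
        self.requested = []

    def request(self, name):
        self.requested.append(name)   # remember what was asked for
        return self.resources[name]   # hand back the resource as usual

# usage: the HDL asks the recorder, not the platform, for its resources
recorder = RecordingRequester({"uart": {"tx": "A20", "rx": "A21"}})
uart = recorder.request("uart")
```

Because every request goes through `request()`, the recorder ends up
with the full list of what the HDL asked for, which is exactly the
information needed to insert the scan muxes transparently.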
## Clock synchronisation

Take for example USB ULPI:

<img src="https://www.crifan.com/files/pic/serial_story/other_site/p_blog_bb.JPG"
width=400 />

Here there is an external incoming clock, generated by the PHY, to which
both Received *and Transmitted* data and control are synchronised. Notice
very specifically that it is *not the main processor* generating that clock
Signal, but the external peripheral (known as a PHY in Hardware
terminology).

Firstly: note that the Clock will, obviously, also need to be routed
through JTAG Boundary Scan because, after all, it is being received
through just another ordinary IO Pad. Secondly: note that
if it were not, then clock skew would occur for that peripheral, because
although the Data Wires went through the JTAG Boundary Scan Muxes, the
clock did not. Clearly this would be a problem.

However, clocks are very special signals: they have to be distributed
evenly to all and any Latches (DFFs) inside the peripheral so that
data corruption does not occur because of tiny delays.
To avoid that scenario, Clock Domain Crossing (CDC) is used, with
Asynchronous FIFOs:

    rx_fifo = stream.AsyncFIFO([("data", 8)], self.rx_depth,
                               w_domain="ulpi", r_domain="sync")
    tx_fifo = stream.AsyncFIFO([("data", 8)], self.tx_depth,
                               w_domain="sync", r_domain="ulpi")
    m.submodules.rx_fifo = rx_fifo
    m.submodules.tx_fifo = tx_fifo

However, the entire FIFO must be covered by two Clock H-Trees: one
by the ULPI external clock, and the other by the main system clock.
The size of the ULPI clock H-Tree, and consequently the size of
the PHY on-chip, will result in more Clock Tree Buffers being
inserted into the chain, and, correspondingly, matching buffers
on the ULPI data input side must likewise be inserted so that
the input data timing precisely matches that of its clock.

The problem is not reception of data, though: it is transmission
on the ULPI output side. With the ULPI Clock Tree having buffers
inserted, each buffer creates delay. The ULPI output FIFO has to
correspondingly be synchronised not to the original incoming clock
but to that clock *after going through the H-Tree Buffers*. Therefore,
there will be a lag on the output data compared to the incoming
(external) clock.

# GPIO Muxing

[[!img gpio_block.png]]

[[!img io_mux_bank_planning.JPG]]

# Core/Pad Connection + JTAG Mux

Diagram constructed from the nmigen plat.py file.

[[!img i_o_io_tristate_jtag.JPG]]