# Pinmux, IO Pads, and JTAG Boundary scan

Links:

* <http://www2.eng.cam.ac.uk/~dmh/4b7/resource/section14.htm>
* <https://www10.edacafe.com/book/ASIC/CH02/CH02.7.php>
* <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
* <https://bugs.libre-soc.org/show_bug.cgi?id=50>
* <https://git.libre-soc.org/?p=c4m-jtag.git;a=tree;hb=HEAD>

Managing IO on an ASIC is nowhere near as simple as on an FPGA.
An FPGA has built-in IO Pads: the wires terminate inside an
existing silicon block which has been tested for you.
In an ASIC, you are going to have to do everything yourself.
In an ASIC, a bi-directional IO Pad requires three wires (in, out,
out-enable) to be routed right the way from the ASIC core, all
the way to the IO Pad, where only then does a wire bond connect
it to a single pin.

[[!img CH02-44.gif]]

When designing an ASIC, there is no guarantee that the IO pad is
working when manufactured. Worse, the peripheral could be
faulty. How can you tell what the cause is? There are two
possible faults, but only one symptom ("it dunt wurk").
This problem is what JTAG Boundary Scan is designed to solve.
JTAG can be operated from an external digital clock,
at very low frequencies (5 kHz is perfectly acceptable)
so there is very little risk of clock skew during that testing.

Additionally, an SoC is designed to be low cost, to use low cost
packaging. ASICs are typically only 32 to 128 pin QFP
in the Embedded
Controller range, and between 300 and 650 pin FBGA in the Tablet /
Smartphone range, with an absolute maximum of 19 mm on a side.
The 2 to 3 inch square, 1,000 pin packages common to Intel desktop
processors are absolutely out of the question.

(*With each pin, the wire bond smashes
into the ASIC using purely heat of impact to melt the wire,
and cracks in the die can occur. The more times
the bonding equipment smashes into the die, the higher the
chances of irreversible damage, hence why larger pin packaged
ASICs are much more expensive: not because of their manufacturing
cost but because far more of them fail due to having been
literally hit with a hammer many more times*)

Yet, the expectation from the market is to be able to fit 1,000+
pins worth of peripherals into only 200 to 400 worth of actual
IO Pads. The solution here: a GPIO Pinmux, described in some
detail here: <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
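As a rough illustration of the idea (not taken from the Pin Control Subsystem document), the sketch below models a pinmux in plain Python: each pad has a small list of candidate functions and a mux setting that picks one of them. The class name `PinMux`, the pad and function names, and the four-way mux width are all assumptions made for the example.

```python
class PinMux:
    """Toy pinmux model: every pad carries exactly one of its candidate functions."""

    def __init__(self, functions_per_pad):
        # functions_per_pad maps a pad name to a list of function names;
        # the list index is the (hypothetical) mux setting for that pad.
        self.functions = {pad: list(funcs)
                          for pad, funcs in functions_per_pad.items()}
        self.select = {pad: 0 for pad in self.functions}

    def set_mux(self, pad, setting):
        # reject settings that the pad has no function for
        if not 0 <= setting < len(self.functions[pad]):
            raise ValueError(f"pad {pad} has no mux setting {setting}")
        self.select[pad] = setting

    def active_function(self, pad):
        # the one function currently routed to this pad
        return self.functions[pad][self.select[pad]]

    def total_functions(self):
        # number of peripheral signals sharing the available pads
        return sum(len(funcs) for funcs in self.functions.values())


# Two pads with four functions each: eight peripheral signals on two pads.
mux = PinMux({
    "A12": ["gpio0", "uart0_tx", "spi0_clk", "pwm0"],
    "N20": ["gpio1", "uart0_rx", "spi0_mosi", "i2c0_sda"],
})
mux.set_mux("A12", 1)   # route uart0_tx to pad A12
mux.set_mux("N20", 1)   # route uart0_rx to pad N20
```

By the same arithmetic, a four-way mux on each of 250 pads would cover 1,000 peripheral signals, at the cost that only one function per pad is usable at any one time.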

This page goes over the details and issues involved in creating
an ASIC that combines **both** JTAG Boundary Scan **and** GPIO
Muxing, down to layout considerations using coriolis2.

# JTAG Boundary Scan

JTAG Scanning is a (paywalled) IEEE Standard, 1149.1, which with
a little searching can be found online. Its purpose is to allow
a well-defined method of testing ASIC IO pads that a Foundry or
ASIC test house may apply easily with off-the-shelf equipment.
Scan chaining can also connect multiple ASICs together so that
the same test can be run on a large batch of ASICs at the same
time.

IO Pads generally come in four primary types:

* Input
* Output
* Output with Tristate (enable)
* Bi-directional Tristate Input/Output with direction enable

Interestingly these can all be synthesised from one
Bi-directional Tristate IO Pad. Other types such as Differential
Pair Transmit may also be constructed from an inverter and a pair
of IO Pads. Other more advanced features include pull-up
and pull-down resistors, Schmitt triggering for interrupts,
different drive strengths, and so on, but the basics are
that the Pad is either an input, or an output, or both.
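To make the "synthesised from one Bi-directional Tristate IO Pad" point concrete, here is a minimal behavioural sketch in plain Python. It is only a logic-level model under simplifying assumptions: the function names are invented for the example, `None` stands for a floating (high-impedance) pad, and electrical details such as drive strength are ignored.

```python
def bidir_pad(core_out, out_enable, external):
    """Bi-directional tristate pad.

    core_out   -- value the core wants to drive
    out_enable -- 1: pad drives core_out, 0: pad is high-impedance
    external   -- value driven onto the pin from outside, or None if nothing drives it
    Returns (pad_level, core_in).
    """
    pad_level = core_out if out_enable else external
    core_in = pad_level          # the input buffer always observes the pad
    return pad_level, core_in


def input_pad(external):
    # Input only: output-enable tied low, output wire unused.
    _, core_in = bidir_pad(0, 0, external)
    return core_in


def output_pad(core_out):
    # Output only: output-enable tied high, input wire ignored.
    pad_level, _ = bidir_pad(core_out, 1, None)
    return pad_level


def tristate_output_pad(core_out, out_enable):
    # Output with tristate: output-enable exposed, input wire ignored.
    pad_level, _ = bidir_pad(core_out, out_enable, None)
    return pad_level
```

Each of the three simpler pad types is just the bi-directional pad with one or two of its three wires tied off or left unconnected.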

The JTAG Boundary Scan therefore needs to know what type
each pad is (In/Out/Bi) and has to "insert" itself in between
*all* the Pad's wires, which may be just an input, or just an output,
and, if bi-directional, an "output enable" line.

The "insertion" (or, "Tap") into those wires requires a
pair of Muxes for each wire. Under normal operation
the Muxes bypass JTAG entirely: the IO Pad is connected,
through the two Muxes,
directly to the Core (a hardware term for a "peripheral",
in Software terminology).

When JTAG Scan is enabled, then for every pin that is
"tapped into", the Muxes flip such that:

* The IO Pad is connected directly to latches controlled
  by the JTAG Shift Register
* The Core (peripheral) is likewise connected to latches, but to
  *different bits* from those that the Pad is connected to

In this way, not only can JTAG control or read the IO Pad,
but it can also read or control the Core (peripheral).
This is its entire purpose: interception to allow for the detection
and triaging of faults.

* Software may be uploaded and run which sets a bit on
  one of the peripheral outputs (UART Tx for example).
  If the UART Tx IO Pad was faulty, no possibility existed
  without Boundary Scan to determine if the peripheral
  was at fault. With the UART Tx pin function being
  redirected to a JTAG Shift Register, the results of the
  software setting UART Tx may be detected by checking
  the appropriate Shift Register bit.
* Likewise, a voltage may be applied to the UART Rx Pad,
  and the corresponding SR bit checked to see if the
  pad is working. If the UART Rx peripheral was faulty
  this would not be possible without Boundary Scan.
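The mux behaviour and the two triage cases above can be sketched as a small behavioural model in plain Python. This is an illustration only: it ignores the TAP state machine and the update latches of IEEE 1149.1, and the names `tapped_wire` and `ScanChain` are assumptions made for the example, not part of c4m-jtag.

```python
def tapped_wire(source, scan_enable, sr_drive):
    """Model one tapped wire between Core and Pad.

    source      -- value at the driving end (Core for an output, Pad for an input)
    scan_enable -- False: muxes bypass JTAG; True: muxes select the shift register
    sr_drive    -- shift-register bit that drives the receiving end in scan mode
    Returns (sink, sr_capture): the value seen at the receiving end, and the
    driving end's value as seen by a *different* shift-register bit.
    """
    sink = sr_drive if scan_enable else source
    sr_capture = source
    return sink, sr_capture


class ScanChain:
    """Shift register linking all captured bits from TDI to TDO."""

    def __init__(self, length):
        self.bits = [0] * length

    def capture(self, values):
        # parallel-load the values observed on the tapped wires
        self.bits = list(values)

    def shift(self, tdi):
        # one clock: the last bit leaves on TDO, a new bit enters from TDI
        tdo = self.bits[-1]
        self.bits = [tdi] + self.bits[:-1]
        return tdo


# Triage example: software sets UART Tx high, a test voltage holds the Rx pad high.
uart_tx_core = 1
uart_rx_pad = 1

# Normal operation: the pad simply follows the core.
normal_tx_pad, _ = tapped_wire(uart_tx_core, scan_enable=False, sr_drive=0)

# Scan enabled: the pad is driven from the shift register instead, while the
# core's Tx output and the Rx pad level are each captured into their own bit.
scan_tx_pad, tx_seen = tapped_wire(uart_tx_core, scan_enable=True, sr_drive=0)
_, rx_seen = tapped_wire(uart_rx_pad, scan_enable=True, sr_drive=0)

chain = ScanChain(2)
chain.capture([tx_seen, rx_seen])
shifted_out = [chain.shift(0) for _ in range(2)]
```

If `tx_seen` shows the value the software wrote but the physical Tx pin does not, the pad (or its bond wire) is the suspect rather than the peripheral; the Rx case works the same way in the other direction.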

<img src="https://libre-soc.org/shakti/m_class/JTAG/jtag-block.jpg"
width=500 />

## Clock synchronisation

Take for example USB ULPI:

<img src="https://www.crifan.com/files/pic/serial_story/other_site/p_blog_bb.JPG"
width=400 />

Here there is an external incoming clock, generated by the PHY, to which
both Received *and Transmitted* data and control is synchronised. Notice
very specifically that it is *not the main processor* generating that clock
signal, but the external peripheral (known as a PHY in Hardware terminology).

Firstly: note that the Clock will, obviously, also need to be routed
through JTAG Boundary Scan, because, after all, it is being received
through just another ordinary IO Pad. Secondly: note that
if it didn't, then clock skew would occur for that peripheral because
although the Data Wires went through JTAG Boundary Scan MUXes, the
clock did not. Clearly this would be a problem.

However, clocks are very special signals: they have to be distributed
evenly to all and any Latches (DFFs) inside the peripheral so that
data corruption does not occur because of tiny delays.
To avoid that scenario, Clock Domain Crossing (CDC) is used, with
Asynchronous FIFOs:

    # receive: written in the "ulpi" clock domain, read in the system ("sync") domain
    rx_fifo = stream.AsyncFIFO([("data", 8)], self.rx_depth, w_domain="ulpi", r_domain="sync")
    # transmit: written in the system ("sync") domain, read in the "ulpi" clock domain
    tx_fifo = stream.AsyncFIFO([("data", 8)], self.tx_depth, w_domain="sync", r_domain="ulpi")
    m.submodules.rx_fifo = rx_fifo
    m.submodules.tx_fifo = tx_fifo

However the entire FIFO must be covered by two Clock H-Trees: one
by the ULPI external clock, and the other by the main system clock.
The larger the ULPI clock H-Tree (and consequently the larger
the PHY on-chip), the more Clock Tree Buffers will be
inserted into the chain, and, correspondingly, matching buffers
must likewise be inserted on the ULPI data input side so that
the input data timing precisely matches that of its clock.

The problem is not receiving of data, though: it is transmission
on the output ULPI side. With the ULPI Clock Tree having buffers
inserted, each buffer creates delay. The ULPI output FIFO has to
correspondingly be synchronised not to the original incoming clock
but to that clock *after going through H-Tree Buffers*. Therefore,
there will be a lag on the output data compared to the incoming
(external) clock.
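The size of that lag can be estimated with simple arithmetic: the number of buffers between the clock pad and the output flip-flops, multiplied by the delay of each buffer. The sketch below does this in Python. The ULPI interface clock is 60 MHz; the buffer count and per-buffer delay are purely hypothetical numbers chosen for illustration and would in practice come from the clock tree synthesis report for the actual cell library.

```python
ULPI_CLOCK_HZ = 60e6   # ULPI interface clock frequency


def output_lag_ns(n_buffers, buffer_delay_ns):
    # total delay added to the clock by the H-Tree buffers in its path
    return n_buffers * buffer_delay_ns


def lag_fraction_of_period(n_buffers, buffer_delay_ns, clock_hz=ULPI_CLOCK_HZ):
    # the same lag, expressed as a fraction of one clock period
    period_ns = 1e9 / clock_hz
    return output_lag_ns(n_buffers, buffer_delay_ns) / period_ns


# Hypothetical example: 6 buffers of 0.25 ns each in the ULPI clock path.
lag = output_lag_ns(6, 0.25)
fraction = lag_fraction_of_period(6, 0.25)
```

With these assumed figures the output data lags the external clock by 1.5 ns, roughly 9% of the 16.7 ns clock period, which has to fit within whatever output timing budget the PHY allows.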

# GPIO Muxing

[[!img gpio_block.png]]

# Core/Pad Connection + JTAG Mux

Diagram constructed from the nmigen plat.py file.

[[!img i_o_io_tristate_jtag.JPG]]

# Resources, Platforms and Pins

nmigen HDL, when written as Modules, typically knows nothing about FPGA
Boards or ASICs. Modules especially do not know anything about the
Peripheral ICs (UART, I2C, USB, SPI, PCIe) connected to a given FPGA
on a given PCB, and they should not have to.

Through the Resources, Platforms and Pins API, a level of abstraction
between peripherals, boards and HDL designs is provided. Peripherals
may be given `(name, number)` tuples; the HDL design may "request"
a peripheral, which is described in terms of Resources, managed
by a ResourceManager; and a Platform may provide that peripheral.
The Platform is given
the responsibility to wire up the Pins to the correct FPGA (or ASIC)
IO Pads, and it is the HDL design's responsibility to connect up
those same named Pins, on the other side, to the implementation
of the PHY/Controller, in the HDL.

Here is a function that defines a UART Resource:

    #!/usr/bin/env python3
    from nmigen.build.dsl import Resource, Subsignal, Pins

    def UARTResource(*args, rx, tx):
        io = []
        io.append(Subsignal("rx", Pins(rx, dir="i", assert_width=1)))
        io.append(Subsignal("tx", Pins(tx, dir="o", assert_width=1)))
        return Resource.family(*args, default_name="uart", ios=io)

It would typically be part of a larger function that defines, for either
an FPGA or an ASIC, a full array of IO Connections:

    def create_resources(pinset):
        resources = []
        resources.append(UARTResource('uart', 0, tx='tx', rx='rx'))
        # add clock and reset
        clk = Resource("clk", 0, Pins("sys_clk", dir="i"))
        rst = Resource("rst", 0, Pins("sys_rst", dir="i"))
        resources.append(clk)
        resources.append(rst)
        return resources

For an FPGA, the Pins names are typically the Ball Grid Array
Pad or Pin name: A12, or N20. ASICs can do likewise: it is
convenient, when referring to schematics, to use the most
recognisable well-known name.

Next, these Resources need to be handed to a ResourceManager or
a Platform (Platform derives from ResourceManager):

    from nmigen.build.plat import TemplatedPlatform

    class ASICPlatform(TemplatedPlatform):
        def __init__(self, resources):
            super().__init__()
            self.add_resources(resources)

An HDL Module may now be created, which, if given
a platform instance during elaboration, may request
a UART (caveat below):
239