Merge pull request #829 from abdelrahmanhosny/master
[yosys.git] / passes / techmap / flowmap.cc
1 /*
2 * yosys -- Yosys Open SYnthesis Suite
3 *
4 * Copyright (C) 2018 whitequark <whitequark@whitequark.org>
5 *
6 * Permission to use, copy, modify, and/or distribute this software for any
7 * purpose with or without fee is hereby granted, provided that the above
8 * copyright notice and this permission notice appear in all copies.
9 *
10 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17 *
18 */
19
20 // [[CITE]] FlowMap algorithm
21 // Jason Cong; Yuzheng Ding, "An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs,"
22 // Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Vol. 13, pp. 1-12, Jan. 1994.
23 // doi: 10.1109/43.273754
24
25 // [[CITE]] FlowMap-r algorithm
26 // Jason Cong; Yuzheng Ding, "On Area/Depth Tradeoff in LUT-Based FPGA Technology Mapping,"
27 // Very Large Scale Integration Systems, IEEE Transactions on, Vol. 2, June 1994.
28 // doi: 10.1109/92.28574
29
30 // Required reading material:
31 //
32 // Min-cut max-flow theorem:
33 // https://www.coursera.org/lecture/algorithms-part2/maxflow-mincut-theorem-beb9G
34 // FlowMap paper:
35 // http://cadlab.cs.ucla.edu/~cong/papers/iccad92.pdf (short version)
36 // https://limsk.ece.gatech.edu/book/papers/flowmap.pdf (long version)
37 // FlowMap-r paper:
38 // http://cadlab.cs.ucla.edu/~cong/papers/dac93.pdf (short version)
39 // https://sci-hub.tw/10.1109/92.285741 (long version)
40
41 // Notes on correspondence between paper and implementation:
42 //
43 // 1. In the FlowMap paper, the nodes are logic elements (analogous to Yosys cells) and edges are wires. However, in our implementation,
44 // we use an inverted approach: the nodes are Yosys wire bits, and the edges are derived from (but aren't represented by) Yosys cells.
45 // This may seem counterintuitive. Three observations may help understanding this. First, for a cell with a 1-bit Y output that is
46 // the sole driver of its output net (which is the typical case), these representations are equivalent, because there is an exact
47 // correspondence between cells and output wires. Second, in the paper, primary inputs (analogous to Yosys cell or module ports) are
48 // nodes, and in Yosys, inputs are wires; our approach allows a direct mapping from both primary inputs and 1-output logic elements to
49 // flow graph nodes. Third, Yosys cells may have multiple outputs or multi-bit outputs, and by using Yosys wire bits as flow graph nodes,
50 // such cells are supported without any additional effort; any Yosys cell with n output wire bits ends up being split into n flow graph
51 // nodes.
52 //
53 // 2. The FlowMap paper introduces three networks: Nt, Nt', and Nt''. The network Nt is directly represented by a subgraph of RTLIL graph,
54 // which is parsed into an equivalent but easier to traverse representation in FlowmapWorker. The network Nt' is built explicitly
55 // from a subgraph of Nt, and uses a similar representation in FlowGraph. The network Nt'' is implicit in FlowGraph, which is possible
56 // because of the following observation: each Nt' node corresponds to an Nt'' edge of capacity 1, and each Nt' edge corresponds to
57 // an Nt'' edge of capacity ∞. Therefore, we only need to explicitly record flow for Nt' edges and through Nt' nodes.
58 //
59 // 3. The FlowMap paper ambiguously states: "Moreover, we can find such a cut (X′′, X̅′′) by performing a depth first search starting at
60 // the source s, and including in X′′ all the nodes which are reachable from s." This actually refers to a specific kind of search,
61 // min-cut computation. Min-cut computation involves computing the set of nodes reachable from s by an undirected path with no full
62 // (i.e. zero capacity) forward edges or empty (i.e. no flow) backward edges. In addition, the depth first search is required to compute
63 // a max-volume max-flow min-cut specifically, because a max-flow min-cut is not, in general, unique.
64
65 // Notes on implementation:
66 //
67 // 1. To compute depth optimal packing, an intermediate representation is used, where each cell with n output bits is split into n graph
68 // nodes. Each such graph node is represented directly with the wire bit (RTLIL::SigBit instance) that corresponds to the output bit
69 // it is created from. Fan-in and fan-out are represented explicitly by edge lists derived from the RTLIL graph. This IR never changes
70 // after it has been computed.
71 //
72 // In terms of data, this IR is comprised of `inputs`, `outputs`, `nodes`, `edges_fw` and `edges_bw` fields.
73 //
74 // We call this IR "gate IR".
75 //
76 // 2. To compute area optimal packing, another intermediate representation is used, which consists of some K-feasible cone for every node
77 // that exists in the gate IR. Immediately after depth optimal packing with FlowMap, each such cone occupies the lowest possible depth,
78 // but this is not true in general, and transformations of this IR may change the cones, although each transformation has to keep each
79 // cone K-feasible. In this IR, LUT fan-in and fan-out are represented explicitly by edge lists; if a K-feasible cone chosen for node A
80 // includes nodes B and C, there are edges between all predecessors of A, B and C in the gate IR and node A in this IR. Moreover, in
81 // this IR, cones may be *realized* or *derealized*. Only realized cones will end up mapped to actual LUTs in the output of this pass.
82 //
83 // Intuitively, this IR contains (some, ideally but not necessarily optimal) LUT representation for each input cell. By starting at outputs
84 // and traversing the graph of this IR backwards, each K-feasible cone is converted to an actual LUT at the end of the pass. This is
85 // the same as iterating through each realized LUT.
86 //
87 // The following are the invariants of this IR:
88 // a) Each gate IR node corresponds to a K-feasible cut.
89 // b) Each realized LUT is reachable through backward edges from some output.
90 // c) The LUT fan-in is exactly the fan-in of its constituent gates minus the fan-out of its constituent gates.
91 // The invariants are kept even for derealized LUTs, since the whole point of this IR is ease of packing, unpacking, and repacking LUTs.
92 //
93 // In terms of data, this IR is comprised of `lut_nodes` (the set of all realized LUTs), `lut_gates` (the map from a LUT to its
94 // constituent gates), `lut_edges_fw` and `lut_edges_bw` fields. The `inputs` and `outputs` fields are shared with the gate IR.
95 //
96 // We call this IR "LUT IR".
97
98 #include "kernel/yosys.h"
99 #include "kernel/sigtools.h"
100 #include "kernel/modtools.h"
101 #include "kernel/consteval.h"
102
103 USING_YOSYS_NAMESPACE
104 PRIVATE_NAMESPACE_BEGIN
105
106 struct GraphStyle
107 {
108 string label;
109 string color, fillcolor;
110
111 GraphStyle(string label = "", string color = "black", string fillcolor = "") :
112 label(label), color(color), fillcolor(fillcolor) {}
113 };
114
115 static string dot_escape(string value)
116 {
117 std::string escaped;
118 for (char c : value) {
119 if (c == '\n')
120 {
121 escaped += "\\n";
122 continue;
123 }
124 if (c == '\\' || c == '"')
125 escaped += "\\";
126 escaped += c;
127 }
128 return escaped;
129 }
130
131 static void dump_dot_graph(string filename,
132 pool<RTLIL::SigBit> nodes, dict<RTLIL::SigBit, pool<RTLIL::SigBit>> edges,
133 pool<RTLIL::SigBit> inputs, pool<RTLIL::SigBit> outputs,
134 std::function<GraphStyle(RTLIL::SigBit)> node_style =
135 [](RTLIL::SigBit) { return GraphStyle{}; },
136 std::function<GraphStyle(RTLIL::SigBit, RTLIL::SigBit)> edge_style =
137 [](RTLIL::SigBit, RTLIL::SigBit) { return GraphStyle{}; },
138 string name = "")
139 {
140 FILE *f = fopen(filename.c_str(), "w");
141 fprintf(f, "digraph \"%s\" {\n", name.c_str());
142 fprintf(f, " rankdir=\"TB\";\n");
143
144 dict<RTLIL::SigBit, int> ids;
145 for (auto node : nodes)
146 {
147 ids[node] = ids.size();
148
149 string shape = "ellipse";
150 if (inputs[node])
151 shape = "box";
152 if (outputs[node])
153 shape = "octagon";
154 auto prop = node_style(node);
155 string style = "";
156 if (!prop.fillcolor.empty())
157 style = "filled";
158 fprintf(f, " n%d [ shape=%s, fontname=\"Monospace\", label=\"%s\", color=\"%s\", fillcolor=\"%s\", style=\"%s\" ];\n",
159 ids[node], shape.c_str(), dot_escape(prop.label.c_str()).c_str(), prop.color.c_str(), prop.fillcolor.c_str(), style.c_str());
160 }
161
162 fprintf(f, " { rank=\"source\"; ");
163 for (auto input : inputs)
164 if (nodes[input])
165 fprintf(f, "n%d; ", ids[input]);
166 fprintf(f, "}\n");
167
168 fprintf(f, " { rank=\"sink\"; ");
169 for (auto output : outputs)
170 if (nodes[output])
171 fprintf(f, "n%d; ", ids[output]);
172 fprintf(f, "}\n");
173
174 for (auto edge : edges)
175 {
176 auto source = edge.first;
177 for (auto sink : edge.second) {
178 if (nodes[source] && nodes[sink])
179 {
180 auto prop = edge_style(source, sink);
181 fprintf(f, " n%d -> n%d [ label=\"%s\", color=\"%s\", fillcolor=\"%s\" ];\n",
182 ids[source], ids[sink], dot_escape(prop.label.c_str()).c_str(), prop.color.c_str(), prop.fillcolor.c_str());
183 }
184 }
185 }
186
187 fprintf(f, "}\n");
188 fclose(f);
189 }
190
191 struct FlowGraph
192 {
193 const RTLIL::SigBit source;
194 RTLIL::SigBit sink;
195 pool<RTLIL::SigBit> nodes = {source};
196 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> edges_fw, edges_bw;
197
198 const int MAX_NODE_FLOW = 1;
199 dict<RTLIL::SigBit, int> node_flow;
200 dict<pair<RTLIL::SigBit, RTLIL::SigBit>, int> edge_flow;
201
202 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> collapsed;
203
204 void dump_dot_graph(string filename)
205 {
206 auto node_style = [&](RTLIL::SigBit node) {
207 string label = (node == source) ? "(source)" : log_signal(node);
208 for (auto collapsed_node : collapsed[node])
209 label += stringf(" %s", log_signal(collapsed_node));
210 int flow = node_flow[node];
211 if (node != source && node != sink)
212 label += stringf("\n%d/%d", flow, MAX_NODE_FLOW);
213 else
214 label += stringf("\n%d/∞", flow);
215 return GraphStyle{label, flow < MAX_NODE_FLOW ? "green" : "black"};
216 };
217 auto edge_style = [&](RTLIL::SigBit source, RTLIL::SigBit sink) {
218 int flow = edge_flow[{source, sink}];
219 return GraphStyle{stringf("%d/∞", flow), flow > 0 ? "blue" : "black"};
220 };
221 ::dump_dot_graph(filename, nodes, edges_fw, {source}, {sink}, node_style, edge_style);
222 }
223
224 // Here, we are working on the Nt'' network, but our representation is the Nt' network.
225 // The difference between these is that where in Nt' we have a subgraph:
226 //
227 // v1 -> v2 -> v3
228 //
229 // in Nt'' we have a corresponding subgraph:
230 //
231 // v'1b -∞-> v'2t -f-> v'2b -∞-> v'3t
232 //
233 // To address this, we split each node v into two nodes, v't and v'b. This representation is virtual,
234 // in the sense that nodes v't and v'b are overlaid on top of the original node v, and only exist
235 // in paths and worklists.
236
237 struct NodePrime
238 {
239 RTLIL::SigBit node;
240 bool is_bottom;
241
242 NodePrime(RTLIL::SigBit node, bool is_bottom) :
243 node(node), is_bottom(is_bottom) {}
244
245 bool operator==(const NodePrime &other) const
246 {
247 return node == other.node && is_bottom == other.is_bottom;
248 }
249 bool operator!=(const NodePrime &other) const
250 {
251 return !(*this == other);
252 }
253 unsigned int hash() const
254 {
255 return hash_ops<pair<RTLIL::SigBit, int>>::hash({node, is_bottom});
256 }
257
258 static NodePrime top(RTLIL::SigBit node)
259 {
260 return NodePrime(node, /*is_bottom=*/false);
261 }
262
263 static NodePrime bottom(RTLIL::SigBit node)
264 {
265 return NodePrime(node, /*is_bottom=*/true);
266 }
267
268 NodePrime as_top() const
269 {
270 log_assert(is_bottom);
271 return top(node);
272 }
273
274 NodePrime as_bottom() const
275 {
276 log_assert(!is_bottom);
277 return bottom(node);
278 }
279 };
280
281 bool find_augmenting_path(bool commit)
282 {
283 NodePrime source_prime = {source, true};
284 NodePrime sink_prime = {sink, false};
285 vector<NodePrime> path = {source_prime};
286 pool<NodePrime> visited = {};
287 bool found;
288 do {
289 found = false;
290
291 auto node_prime = path.back();
292 visited.insert(node_prime);
293
294 if (!node_prime.is_bottom) // vt
295 {
296 if (!visited[node_prime.as_bottom()] && node_flow[node_prime.node] < MAX_NODE_FLOW)
297 {
298 path.push_back(node_prime.as_bottom());
299 found = true;
300 }
301 else
302 {
303 for (auto node_pred : edges_bw[node_prime.node])
304 {
305 if (!visited[NodePrime::bottom(node_pred)] && edge_flow[{node_pred, node_prime.node}] > 0)
306 {
307 path.push_back(NodePrime::bottom(node_pred));
308 found = true;
309 break;
310 }
311 }
312 }
313 }
314 else // vb
315 {
316 if (!visited[node_prime.as_top()] && node_flow[node_prime.node] > 0)
317 {
318 path.push_back(node_prime.as_top());
319 found = true;
320 }
321 else
322 {
323 for (auto node_succ : edges_fw[node_prime.node])
324 {
325 if (!visited[NodePrime::top(node_succ)] /* && edge_flow[...] < ∞ */)
326 {
327 path.push_back(NodePrime::top(node_succ));
328 found = true;
329 break;
330 }
331 }
332 }
333 }
334
335 if (!found && path.size() > 1)
336 {
337 path.pop_back();
338 found = true;
339 }
340 } while(path.back() != sink_prime && found);
341
342 if (commit && path.back() == sink_prime)
343 {
344 auto prev_prime = path.front();
345 for (auto node_prime : path)
346 {
347 if (node_prime == source_prime)
348 continue;
349
350 log_assert(prev_prime.is_bottom ^ node_prime.is_bottom);
351 if (prev_prime.node == node_prime.node)
352 {
353 auto node = node_prime.node;
354 if (!prev_prime.is_bottom && node_prime.is_bottom)
355 {
356 log_assert(node_flow[node] == 0);
357 node_flow[node]++;
358 }
359 else
360 {
361 log_assert(node_flow[node] != 0);
362 node_flow[node]--;
363 }
364 }
365 else
366 {
367 if (prev_prime.is_bottom && !node_prime.is_bottom)
368 {
369 log_assert(true /* edge_flow[...] < ∞ */);
370 edge_flow[{prev_prime.node, node_prime.node}]++;
371 }
372 else
373 {
374 log_assert((edge_flow[{node_prime.node, prev_prime.node}] > 0));
375 edge_flow[{node_prime.node, prev_prime.node}]--;
376 }
377 }
378 prev_prime = node_prime;
379 }
380
381 node_flow[source]++;
382 node_flow[sink]++;
383 }
384 return path.back() == sink_prime;
385 }
386
387 int maximum_flow(int order)
388 {
389 int flow = 0;
390 while (flow < order && find_augmenting_path(/*commit=*/true))
391 flow++;
392 return flow + find_augmenting_path(/*commit=*/false);
393 }
394
395 pair<pool<RTLIL::SigBit>, pool<RTLIL::SigBit>> edge_cut()
396 {
397 pool<RTLIL::SigBit> x, xi;
398
399 NodePrime source_prime = {source, true};
400 pool<NodePrime> visited;
401 vector<NodePrime> worklist = {source_prime};
402 while (!worklist.empty())
403 {
404 auto node_prime = worklist.back();
405 worklist.pop_back();
406 if (visited[node_prime])
407 continue;
408 visited.insert(node_prime);
409
410 if (!node_prime.is_bottom)
411 x.insert(node_prime.node);
412
413 // Mincut is constructed by traversing a graph in an undirected way along forward edges that aren't full, or backward edges
414 // that aren't empty.
415 if (!node_prime.is_bottom) // top
416 {
417 if (node_flow[node_prime.node] < MAX_NODE_FLOW)
418 worklist.push_back(node_prime.as_bottom());
419 for (auto node_pred : edges_bw[node_prime.node])
420 if (edge_flow[{node_pred, node_prime.node}] > 0)
421 worklist.push_back(NodePrime::bottom(node_pred));
422 }
423 else // bottom
424 {
425 if (node_flow[node_prime.node] > 0)
426 worklist.push_back(node_prime.as_top());
427 for (auto node_succ : edges_fw[node_prime.node])
428 if (true /* edge_flow[...] < ∞ */)
429 worklist.push_back(NodePrime::top(node_succ));
430 }
431 }
432
433 for (auto node : nodes)
434 if (!x[node])
435 xi.insert(node);
436
437 for (auto collapsed_node : collapsed[sink])
438 xi.insert(collapsed_node);
439
440 log_assert(!x[sink] && xi[sink]);
441 return {x, xi};
442 }
443 };
444
445 struct FlowmapWorker
446 {
447 int order;
448 int r_alpha, r_beta, r_gamma;
449 bool debug, debug_relax;
450
451 RTLIL::Module *module;
452 SigMap sigmap;
453 ModIndex index;
454
455 dict<RTLIL::SigBit, ModIndex::PortInfo> node_origins;
456
457 // Gate IR
458 pool<RTLIL::SigBit> nodes, inputs, outputs;
459 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> edges_fw, edges_bw;
460 dict<RTLIL::SigBit, int> labels;
461
462 // LUT IR
463 pool<RTLIL::SigBit> lut_nodes;
464 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> lut_gates;
465 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> lut_edges_fw, lut_edges_bw;
466 dict<RTLIL::SigBit, int> lut_depths, lut_altitudes, lut_slacks;
467
468 int gate_count = 0, lut_count = 0, packed_count = 0;
469 int gate_area = 0, lut_area = 0;
470
471 enum class GraphMode {
472 Label,
473 Cut,
474 Slack,
475 };
476
477 void dump_dot_graph(string filename, GraphMode mode,
478 pool<RTLIL::SigBit> subgraph_nodes = {}, dict<RTLIL::SigBit, pool<RTLIL::SigBit>> subgraph_edges = {},
479 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> collapsed = {},
480 pair<pool<RTLIL::SigBit>, pool<RTLIL::SigBit>> cut = {})
481 {
482 if (subgraph_nodes.empty())
483 subgraph_nodes = nodes;
484 if (subgraph_edges.empty())
485 subgraph_edges = edges_fw;
486
487 auto node_style = [&](RTLIL::SigBit node) {
488 string label = log_signal(node);
489 for (auto collapsed_node : collapsed[node])
490 if (collapsed_node != node)
491 label += stringf(" %s", log_signal(collapsed_node));
492 switch (mode)
493 {
494 case GraphMode::Label:
495 if (labels[node] == -1)
496 {
497 label += "\nl=?";
498 return GraphStyle{label};
499 }
500 else
501 {
502 label += stringf("\nl=%d", labels[node]);
503 string fillcolor = stringf("/set311/%d", 1 + labels[node] % 11);
504 return GraphStyle{label, "", fillcolor};
505 }
506
507 case GraphMode::Cut:
508 if (cut.first[node])
509 return GraphStyle{label, "blue"};
510 if (cut.second[node])
511 return GraphStyle{label, "red"};
512 return GraphStyle{label};
513
514 case GraphMode::Slack:
515 label += stringf("\nd=%d a=%d\ns=%d", lut_depths[node], lut_altitudes[node], lut_slacks[node]);
516 return GraphStyle{label, lut_slacks[node] == 0 ? "red" : "black"};
517 }
518 return GraphStyle{label};
519 };
520 auto edge_style = [&](RTLIL::SigBit, RTLIL::SigBit) {
521 return GraphStyle{};
522 };
523 ::dump_dot_graph(filename, subgraph_nodes, subgraph_edges, inputs, outputs, node_style, edge_style, module->name.str());
524 }
525
526 void dump_dot_lut_graph(string filename, GraphMode mode)
527 {
528 pool<RTLIL::SigBit> lut_and_input_nodes;
529 lut_and_input_nodes.insert(lut_nodes.begin(), lut_nodes.end());
530 lut_and_input_nodes.insert(inputs.begin(), inputs.end());
531 dump_dot_graph(filename, mode, lut_and_input_nodes, lut_edges_fw, lut_gates);
532 }
533
534 pool<RTLIL::SigBit> find_subgraph(RTLIL::SigBit sink)
535 {
536 pool<RTLIL::SigBit> subgraph;
537 pool<RTLIL::SigBit> worklist = {sink};
538 while (!worklist.empty())
539 {
540 auto node = worklist.pop();
541 subgraph.insert(node);
542 for (auto source : edges_bw[node])
543 {
544 if (!subgraph[source])
545 worklist.insert(source);
546 }
547 }
548 return subgraph;
549 }
550
551 FlowGraph build_flow_graph(RTLIL::SigBit sink, int p)
552 {
553 FlowGraph flow_graph;
554 flow_graph.sink = sink;
555
556 pool<RTLIL::SigBit> worklist = {sink}, visited;
557 while (!worklist.empty())
558 {
559 auto node = worklist.pop();
560 visited.insert(node);
561
562 auto collapsed_node = labels[node] == p ? sink : node;
563 if (node != collapsed_node)
564 flow_graph.collapsed[collapsed_node].insert(node);
565 flow_graph.nodes.insert(collapsed_node);
566
567 for (auto node_pred : edges_bw[node])
568 {
569 auto collapsed_node_pred = labels[node_pred] == p ? sink : node_pred;
570 if (node_pred != collapsed_node_pred)
571 flow_graph.collapsed[collapsed_node_pred].insert(node_pred);
572 if (collapsed_node != collapsed_node_pred)
573 {
574 flow_graph.edges_bw[collapsed_node].insert(collapsed_node_pred);
575 flow_graph.edges_fw[collapsed_node_pred].insert(collapsed_node);
576 }
577 if (inputs[node_pred])
578 {
579 flow_graph.edges_bw[collapsed_node_pred].insert(flow_graph.source);
580 flow_graph.edges_fw[flow_graph.source].insert(collapsed_node_pred);
581 }
582
583 if (!visited[node_pred])
584 worklist.insert(node_pred);
585 }
586 }
587 return flow_graph;
588 }
589
590 void discover_nodes(pool<IdString> cell_types)
591 {
592 for (auto cell : module->selected_cells())
593 {
594 if (!cell_types[cell->type])
595 continue;
596
597 if (!cell->known())
598 log_error("Cell %s (%s.%s) is unknown.\n", cell->type.c_str(), log_id(module), log_id(cell));
599
600 pool<RTLIL::SigBit> fanout;
601 for (auto conn : cell->connections())
602 {
603 if (!cell->output(conn.first)) continue;
604 int offset = -1;
605 for (auto bit : conn.second)
606 {
607 offset++;
608 if (!bit.wire) continue;
609 auto mapped_bit = sigmap(bit);
610 if (nodes[mapped_bit])
611 log_error("Multiple drivers found for wire %s.\n", log_signal(mapped_bit));
612 nodes.insert(mapped_bit);
613 node_origins[mapped_bit] = ModIndex::PortInfo(cell, conn.first, offset);
614 fanout.insert(mapped_bit);
615 }
616 }
617
618 int fanin = 0;
619 for (auto conn : cell->connections())
620 {
621 if (!cell->input(conn.first)) continue;
622 for (auto bit : sigmap(conn.second))
623 {
624 if (!bit.wire) continue;
625 for (auto fanout_bit : fanout)
626 {
627 edges_fw[bit].insert(fanout_bit);
628 edges_bw[fanout_bit].insert(bit);
629 }
630 fanin++;
631 }
632 }
633
634 if (fanin > order)
635 log_error("Cell %s (%s.%s) with fan-in %d cannot be mapped to a %d-LUT.\n",
636 cell->type.c_str(), log_id(module), log_id(cell), fanin, order);
637
638 gate_count++;
639 gate_area += 1 << fanin;
640 }
641
642 for (auto edge : edges_fw)
643 {
644 if (!nodes[edge.first])
645 {
646 inputs.insert(edge.first);
647 nodes.insert(edge.first);
648 }
649 }
650
651 for (auto node : nodes)
652 {
653 auto node_info = index.query(node);
654 if (node_info->is_output && !inputs[node])
655 outputs.insert(node);
656 for (auto port : node_info->ports)
657 if (!cell_types[port.cell->type] && !inputs[node])
658 outputs.insert(node);
659 }
660
661 if (debug)
662 {
663 dump_dot_graph("flowmap-initial.dot", GraphMode::Label);
664 log("Dumped initial graph to `flowmap-initial.dot`.\n");
665 }
666 }
667
668 void label_nodes()
669 {
670 for (auto node : nodes)
671 labels[node] = -1;
672 for (auto input : inputs)
673 {
674 if (input.wire->attributes.count("\\$flowmap_level"))
675 labels[input] = input.wire->attributes["\\$flowmap_level"].as_int();
676 else
677 labels[input] = 0;
678 }
679
680 pool<RTLIL::SigBit> worklist = nodes;
681 int debug_num = 0;
682 while (!worklist.empty())
683 {
684 auto sink = worklist.pop();
685 if (labels[sink] != -1)
686 continue;
687
688 bool inputs_have_labels = true;
689 for (auto sink_input : edges_bw[sink])
690 {
691 if (labels[sink_input] == -1)
692 {
693 inputs_have_labels = false;
694 break;
695 }
696 }
697 if (!inputs_have_labels)
698 continue;
699
700 if (debug)
701 {
702 debug_num++;
703 log("Examining subgraph %d rooted in %s.\n", debug_num, log_signal(sink));
704 }
705
706 pool<RTLIL::SigBit> subgraph = find_subgraph(sink);
707
708 int p = 1;
709 for (auto subgraph_node : subgraph)
710 p = max(p, labels[subgraph_node]);
711
712 FlowGraph flow_graph = build_flow_graph(sink, p);
713 int flow = flow_graph.maximum_flow(order);
714 pool<RTLIL::SigBit> x, xi;
715 if (flow <= order)
716 {
717 labels[sink] = p;
718 auto cut = flow_graph.edge_cut();
719 x = cut.first;
720 xi = cut.second;
721 }
722 else
723 {
724 labels[sink] = p + 1;
725 x = subgraph;
726 x.erase(sink);
727 xi.insert(sink);
728 }
729 lut_gates[sink] = xi;
730
731 pool<RTLIL::SigBit> k;
732 for (auto xi_node : xi)
733 {
734 for (auto xi_node_pred : edges_bw[xi_node])
735 if (x[xi_node_pred])
736 k.insert(xi_node_pred);
737 }
738 log_assert((int)k.size() <= order);
739 lut_edges_bw[sink] = k;
740 for (auto k_node : k)
741 lut_edges_fw[k_node].insert(sink);
742
743 if (debug)
744 {
745 log(" Maximum flow: %d. Assigned label %d.\n", flow, labels[sink]);
746 dump_dot_graph(stringf("flowmap-%d-sub.dot", debug_num), GraphMode::Cut, subgraph, {}, {}, {x, xi});
747 log(" Dumped subgraph to `flowmap-%d-sub.dot`.\n", debug_num);
748 flow_graph.dump_dot_graph(stringf("flowmap-%d-flow.dot", debug_num));
749 log(" Dumped flow graph to `flowmap-%d-flow.dot`.\n", debug_num);
750 log(" LUT inputs:");
751 for (auto k_node : k)
752 log(" %s", log_signal(k_node));
753 log(".\n");
754 log(" LUT packed gates:");
755 for (auto xi_node : xi)
756 log(" %s", log_signal(xi_node));
757 log(".\n");
758 }
759
760 for (auto sink_succ : edges_fw[sink])
761 worklist.insert(sink_succ);
762 }
763
764 if (debug)
765 {
766 dump_dot_graph("flowmap-labeled.dot", GraphMode::Label);
767 log("Dumped labeled graph to `flowmap-labeled.dot`.\n");
768 }
769 }
770
771 int map_luts()
772 {
773 pool<RTLIL::SigBit> worklist = outputs;
774 while (!worklist.empty())
775 {
776 auto lut_node = worklist.pop();
777 lut_nodes.insert(lut_node);
778 for (auto input_node : lut_edges_bw[lut_node])
779 if (!lut_nodes[input_node] && !inputs[input_node])
780 worklist.insert(input_node);
781 }
782
783 int depth = 0;
784 for (auto label : labels)
785 depth = max(depth, label.second);
786 log("Mapped to %zu LUTs with maximum depth %d.\n", lut_nodes.size(), depth);
787
788 if (debug)
789 {
790 dump_dot_lut_graph("flowmap-mapped.dot", GraphMode::Label);
791 log("Dumped mapped graph to `flowmap-mapped.dot`.\n");
792 }
793
794 return depth;
795 }
796
797 void realize_derealize_lut(RTLIL::SigBit lut, pool<RTLIL::SigBit> *changed = nullptr)
798 {
799 pool<RTLIL::SigBit> worklist = {lut};
800 while (!worklist.empty())
801 {
802 auto lut = worklist.pop();
803 if (inputs[lut])
804 continue;
805
806 bool realized_successors = false;
807 for (auto lut_succ : lut_edges_fw[lut])
808 if (lut_nodes[lut_succ])
809 realized_successors = true;
810
811 if (realized_successors && !lut_nodes[lut])
812 lut_nodes.insert(lut);
813 else if (!realized_successors && lut_nodes[lut])
814 lut_nodes.erase(lut);
815 else
816 continue;
817
818 for (auto lut_pred : lut_edges_bw[lut])
819 worklist.insert(lut_pred);
820
821 if (changed)
822 changed->insert(lut);
823 }
824 }
825
826 void add_lut_edge(RTLIL::SigBit pred, RTLIL::SigBit succ, pool<RTLIL::SigBit> *changed = nullptr)
827 {
828 log_assert(!lut_edges_fw[pred][succ] && !lut_edges_bw[succ][pred]);
829 log_assert((int)lut_edges_bw[succ].size() < order);
830
831 lut_edges_fw[pred].insert(succ);
832 lut_edges_bw[succ].insert(pred);
833 realize_derealize_lut(pred, changed);
834
835 if (changed)
836 {
837 changed->insert(pred);
838 changed->insert(succ);
839 }
840 }
841
842 void remove_lut_edge(RTLIL::SigBit pred, RTLIL::SigBit succ, pool<RTLIL::SigBit> *changed = nullptr)
843 {
844 log_assert(lut_edges_fw[pred][succ] && lut_edges_bw[succ][pred]);
845
846 lut_edges_fw[pred].erase(succ);
847 lut_edges_bw[succ].erase(pred);
848 realize_derealize_lut(pred, changed);
849
850 if (changed)
851 {
852 if (lut_nodes[pred])
853 changed->insert(pred);
854 changed->insert(succ);
855 }
856 }
857
858 pair<pool<RTLIL::SigBit>, pool<RTLIL::SigBit>> cut_lut_at_gate(RTLIL::SigBit lut, RTLIL::SigBit lut_gate)
859 {
860 pool<RTLIL::SigBit> gate_inputs = lut_edges_bw[lut];
861 pool<RTLIL::SigBit> other_inputs;
862 pool<RTLIL::SigBit> worklist = {lut};
863 while (!worklist.empty())
864 {
865 auto node = worklist.pop();
866 for (auto node_pred : edges_bw[node])
867 {
868 if (node_pred == lut_gate)
869 continue;
870 if (lut_gates[lut][node_pred])
871 worklist.insert(node_pred);
872 else
873 {
874 gate_inputs.erase(node_pred);
875 other_inputs.insert(node_pred);
876 }
877 }
878 }
879 return {gate_inputs, other_inputs};
880 }
881
882 void compute_lut_distances(dict<RTLIL::SigBit, int> &lut_distances, bool forward,
883 pool<RTLIL::SigBit> initial = {}, pool<RTLIL::SigBit> *changed = nullptr)
884 {
885 pool<RTLIL::SigBit> terminals = forward ? inputs : outputs;
886 auto &lut_edges_next = forward ? lut_edges_fw : lut_edges_bw;
887 auto &lut_edges_prev = forward ? lut_edges_bw : lut_edges_fw;
888
889 if (initial.empty())
890 initial = terminals;
891 for (auto node : initial)
892 lut_distances.erase(node);
893
894 pool<RTLIL::SigBit> worklist = initial;
895 while (!worklist.empty())
896 {
897 auto lut = worklist.pop();
898 int lut_distance = 0;
899 if (forward && inputs[lut])
900 lut_distance = labels[lut]; // to support (* $flowmap_level=n *)
901 for (auto lut_prev : lut_edges_prev[lut])
902 if ((lut_nodes[lut_prev] || inputs[lut_prev]) && lut_distances.count(lut_prev))
903 lut_distance = max(lut_distance, lut_distances[lut_prev] + 1);
904 if (!lut_distances.count(lut) || lut_distances[lut] != lut_distance)
905 {
906 lut_distances[lut] = lut_distance;
907 if (changed != nullptr && !inputs[lut])
908 changed->insert(lut);
909 for (auto lut_next : lut_edges_next[lut])
910 if (lut_nodes[lut_next] || inputs[lut_next])
911 worklist.insert(lut_next);
912 }
913 }
914 }
915
916 void check_lut_distances(const dict<RTLIL::SigBit, int> &lut_distances, bool forward)
917 {
918 dict<RTLIL::SigBit, int> gold_lut_distances;
919 compute_lut_distances(gold_lut_distances, forward);
920 for (auto lut_distance : lut_distances)
921 if (lut_nodes[lut_distance.first])
922 log_assert(lut_distance.second == gold_lut_distances[lut_distance.first]);
923 }
924
925 // LUT depth is the length of the longest path from any input in LUT fan-in to LUT.
926 // LUT altitude (for lack of a better term) is the length of the longest path from LUT to any output in LUT fan-out.
927 void update_lut_depths_altitudes(pool<RTLIL::SigBit> worklist = {}, pool<RTLIL::SigBit> *changed = nullptr)
928 {
929 compute_lut_distances(lut_depths, /*forward=*/true, worklist, changed);
930 compute_lut_distances(lut_altitudes, /*forward=*/false, worklist, changed);
931 if (debug_relax && !worklist.empty()) {
932 check_lut_distances(lut_depths, /*forward=*/true);
933 check_lut_distances(lut_altitudes, /*forward=*/false);
934 }
935 }
936
937 // LUT critical output set is the set of outputs whose depth will increase (equivalently, slack will decrease) if the depth of
938 // the LUT increases. (This is referred to as RPOv for LUTv in the paper.)
939 void compute_lut_critical_outputs(dict<RTLIL::SigBit, pool<RTLIL::SigBit>> &lut_critical_outputs,
940 pool<RTLIL::SigBit> worklist = {})
941 {
942 if (worklist.empty())
943 worklist = lut_nodes;
944
945 while (!worklist.empty())
946 {
947 bool updated_some = false;
948 for (auto lut : worklist)
949 {
950 if (outputs[lut])
951 lut_critical_outputs[lut] = {lut};
952 else
953 {
954 bool all_succ_computed = true;
955 lut_critical_outputs[lut] = {};
956 for (auto lut_succ : lut_edges_fw[lut])
957 {
958 if (lut_nodes[lut_succ] && lut_depths[lut_succ] == lut_depths[lut] + 1)
959 {
960 if (lut_critical_outputs.count(lut_succ))
961 lut_critical_outputs[lut].insert(lut_critical_outputs[lut_succ].begin(), lut_critical_outputs[lut_succ].end());
962 else
963 {
964 all_succ_computed = false;
965 break;
966 }
967 }
968 }
969 if (!all_succ_computed)
970 {
971 lut_critical_outputs.erase(lut);
972 continue;
973 }
974 }
975 worklist.erase(lut);
976 updated_some = true;
977 }
978 log_assert(updated_some);
979 }
980 }
981
982 // Invalidating LUT critical output sets is tricky, because increasing the depth of a LUT may take other, adjacent LUTs off the critical
983 // path to the output. Conservatively, if we increase depth of some LUT, every LUT in its input cone needs to have its critical output
984 // set invalidated, too.
985 pool<RTLIL::SigBit> invalidate_lut_critical_outputs(dict<RTLIL::SigBit, pool<RTLIL::SigBit>> &lut_critical_outputs,
986 pool<RTLIL::SigBit> worklist)
987 {
988 pool<RTLIL::SigBit> changed;
989 while (!worklist.empty())
990 {
991 auto lut = worklist.pop();
992 changed.insert(lut);
993 lut_critical_outputs.erase(lut);
994 for (auto lut_pred : lut_edges_bw[lut])
995 {
996 if (lut_nodes[lut_pred] && !changed[lut_pred])
997 {
998 changed.insert(lut_pred);
999 worklist.insert(lut_pred);
1000 }
1001 }
1002 }
1003 return changed;
1004 }
1005
1006 void check_lut_critical_outputs(const dict<RTLIL::SigBit, pool<RTLIL::SigBit>> &lut_critical_outputs)
1007 {
1008 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> gold_lut_critical_outputs;
1009 compute_lut_critical_outputs(gold_lut_critical_outputs);
1010 for (auto lut_critical_output : lut_critical_outputs)
1011 if (lut_nodes[lut_critical_output.first])
1012 log_assert(lut_critical_output.second == gold_lut_critical_outputs[lut_critical_output.first]);
1013 }
1014
1015 void update_lut_critical_outputs(dict<RTLIL::SigBit, pool<RTLIL::SigBit>> &lut_critical_outputs,
1016 pool<RTLIL::SigBit> worklist = {})
1017 {
1018 if (!worklist.empty())
1019 {
1020 pool<RTLIL::SigBit> invalidated = invalidate_lut_critical_outputs(lut_critical_outputs, worklist);
1021 compute_lut_critical_outputs(lut_critical_outputs, invalidated);
1022 check_lut_critical_outputs(lut_critical_outputs);
1023 }
1024 else
1025 compute_lut_critical_outputs(lut_critical_outputs);
1026 }
1027
1028 void update_breaking_node_potentials(dict<RTLIL::SigBit, dict<RTLIL::SigBit, int>> &potentials,
1029 const dict<RTLIL::SigBit, pool<RTLIL::SigBit>> &lut_critical_outputs)
1030 {
1031 for (auto lut : lut_nodes)
1032 {
1033 if (potentials.count(lut))
1034 continue;
1035 if (lut_gates[lut].size() == 1 || lut_slacks[lut] == 0)
1036 continue;
1037
1038 if (debug_relax)
1039 log(" Computing potentials for LUT %s.\n", log_signal(lut));
1040
1041 for (auto lut_gate : lut_gates[lut])
1042 {
1043 if (lut == lut_gate)
1044 continue;
1045
1046 if (debug_relax)
1047 log(" Considering breaking node %s.\n", log_signal(lut_gate));
1048
1049 int r_ex, r_im, r_slk;
1050
1051 auto cut_inputs = cut_lut_at_gate(lut, lut_gate);
1052 pool<RTLIL::SigBit> gate_inputs = cut_inputs.first, other_inputs = cut_inputs.second;
1053 if (gate_inputs.empty() && (int)other_inputs.size() == order)
1054 {
1055 if (debug_relax)
1056 log(" Breaking would result in a (k+1)-LUT.\n");
1057 continue;
1058 }
1059
1060 pool<RTLIL::SigBit> elim_fanin_luts;
1061 for (auto gate_input : gate_inputs)
1062 {
1063 if (lut_edges_fw[gate_input].size() == 1)
1064 {
1065 log_assert(lut_edges_fw[gate_input][lut]);
1066 elim_fanin_luts.insert(gate_input);
1067 }
1068 }
1069 if (debug_relax)
1070 {
1071 if (!lut_nodes[lut_gate])
1072 log(" Breaking requires a new LUT.\n");
1073 if (!gate_inputs.empty())
1074 {
1075 log(" Breaking eliminates LUT inputs");
1076 for (auto gate_input : gate_inputs)
1077 log(" %s", log_signal(gate_input));
1078 log(".\n");
1079 }
1080 if (!elim_fanin_luts.empty())
1081 {
1082 log(" Breaking eliminates fan-in LUTs");
1083 for (auto elim_fanin_lut : elim_fanin_luts)
1084 log(" %s", log_signal(elim_fanin_lut));
1085 log(".\n");
1086 }
1087 }
1088 r_ex = (lut_nodes[lut_gate] ? 0 : -1) + elim_fanin_luts.size();
1089
1090 pool<pair<RTLIL::SigBit, RTLIL::SigBit>> maybe_mergeable_luts;
1091
1092 // Try to merge LUTv with one of its successors.
1093 RTLIL::SigBit last_lut_succ;
1094 int fanout = 0;
1095 for (auto lut_succ : lut_edges_fw[lut])
1096 {
1097 if (lut_nodes[lut_succ])
1098 {
1099 fanout++;
1100 last_lut_succ = lut_succ;
1101 }
1102 }
1103 if (fanout == 1)
1104 maybe_mergeable_luts.insert({lut, last_lut_succ});
1105
1106 // Try to merge LUTv with one of its predecessors.
1107 for (auto lut_pred : other_inputs)
1108 {
1109 int fanout = 0;
1110 for (auto lut_pred_succ : lut_edges_fw[lut_pred])
1111 if (lut_nodes[lut_pred_succ] || lut_pred_succ == lut_gate)
1112 fanout++;
1113 if (fanout == 1)
1114 maybe_mergeable_luts.insert({lut_pred, lut});
1115 }
1116
1117 // Try to merge LUTw with one of its predecessors.
1118 for (auto lut_gate_pred : lut_edges_bw[lut_gate])
1119 {
1120 int fanout = 0;
1121 for (auto lut_gate_pred_succ : lut_edges_fw[lut_gate_pred])
1122 if (lut_nodes[lut_gate_pred_succ] || lut_gate_pred_succ == lut_gate)
1123 fanout++;
1124 if (fanout == 1)
1125 maybe_mergeable_luts.insert({lut_gate_pred, lut_gate});
1126 }
1127
1128 r_im = 0;
1129 for (auto maybe_mergeable_pair : maybe_mergeable_luts)
1130 {
1131 log_assert(lut_edges_fw[maybe_mergeable_pair.first][maybe_mergeable_pair.second]);
1132 pool<RTLIL::SigBit> unique_inputs;
1133 for (auto fst_lut_pred : lut_edges_bw[maybe_mergeable_pair.first])
1134 if (lut_nodes[fst_lut_pred])
1135 unique_inputs.insert(fst_lut_pred);
1136 for (auto snd_lut_pred : lut_edges_bw[maybe_mergeable_pair.second])
1137 if (lut_nodes[snd_lut_pred])
1138 unique_inputs.insert(snd_lut_pred);
1139 unique_inputs.erase(maybe_mergeable_pair.first);
1140 if ((int)unique_inputs.size() <= order)
1141 {
1142 if (debug_relax)
1143 log(" Breaking may allow merging %s and %s.\n",
1144 log_signal(maybe_mergeable_pair.first), log_signal(maybe_mergeable_pair.second));
1145 r_im++;
1146 }
1147 }
1148
1149 int lut_gate_depth;
1150 if (lut_nodes[lut_gate])
1151 lut_gate_depth = lut_depths[lut_gate];
1152 else
1153 {
1154 lut_gate_depth = 0;
1155 for (auto lut_gate_pred : lut_edges_bw[lut_gate])
1156 lut_gate_depth = max(lut_gate_depth, lut_depths[lut_gate_pred] + 1);
1157 }
1158 if (lut_depths[lut] >= lut_gate_depth + 1)
1159 r_slk = 0;
1160 else
1161 {
1162 int depth_delta = lut_gate_depth + 1 - lut_depths[lut];
1163 if (depth_delta > lut_slacks[lut])
1164 {
1165 if (debug_relax)
1166 log(" Breaking would increase depth by %d, which is more than available slack.\n", depth_delta);
1167 continue;
1168 }
1169
1170 if (debug_relax)
1171 {
1172 log(" Breaking increases depth of LUT by %d.\n", depth_delta);
1173 if (lut_critical_outputs.at(lut).size())
1174 {
1175 log(" Breaking decreases slack of outputs");
1176 for (auto lut_critical_output : lut_critical_outputs.at(lut))
1177 {
1178 log(" %s", log_signal(lut_critical_output));
1179 log_assert(lut_slacks[lut_critical_output] > 0);
1180 }
1181 log(".\n");
1182 }
1183 }
1184 r_slk = lut_critical_outputs.at(lut).size() * depth_delta;
1185 }
1186
1187 int p = 100 * (r_alpha * r_ex + r_beta * r_im + r_gamma) / (r_slk + 1);
1188 if (debug_relax)
1189 log(" Potential for breaking node %s: %d (Rex=%d, Rim=%d, Rslk=%d).\n",
1190 log_signal(lut_gate), p, r_ex, r_im, r_slk);
1191 potentials[lut][lut_gate] = p;
1192 }
1193 }
1194 }
1195
1196 bool relax_depth_for_bound(bool first, int depth_bound, dict<RTLIL::SigBit, pool<RTLIL::SigBit>> &lut_critical_outputs)
1197 {
1198 size_t initial_count = lut_nodes.size();
1199
1200 for (auto node : lut_nodes)
1201 {
1202 lut_slacks[node] = depth_bound - (lut_depths[node] + lut_altitudes[node]);
1203 log_assert(lut_slacks[node] >= 0);
1204 }
1205 if (debug)
1206 {
1207 dump_dot_lut_graph(stringf("flowmap-relax-%d-initial.dot", depth_bound), GraphMode::Slack);
1208 log(" Dumped initial slack graph to `flowmap-relax-%d-initial.dot`.\n", depth_bound);
1209 }
1210
1211 dict<RTLIL::SigBit, dict<RTLIL::SigBit, int>> potentials;
1212 for (int break_num = 1; ; break_num++)
1213 {
1214 update_breaking_node_potentials(potentials, lut_critical_outputs);
1215
1216 if (potentials.empty())
1217 {
1218 log(" Relaxed to %zu (+%zu) LUTs.\n", lut_nodes.size(), lut_nodes.size() - initial_count);
1219 if (!first && break_num == 1)
1220 {
1221 log(" Design fully relaxed.\n");
1222 return true;
1223 }
1224 else
1225 {
1226 log(" Slack exhausted.\n");
1227 break;
1228 }
1229 }
1230
1231 RTLIL::SigBit breaking_lut, breaking_gate;
1232 int best_potential = INT_MIN;
1233 for (auto lut_gate_potentials : potentials)
1234 {
1235 for (auto gate_potential : lut_gate_potentials.second)
1236 {
1237 if (gate_potential.second > best_potential)
1238 {
1239 breaking_lut = lut_gate_potentials.first;
1240 breaking_gate = gate_potential.first;
1241 best_potential = gate_potential.second;
1242 }
1243 }
1244 }
1245 log(" Breaking LUT %s to %s LUT %s (potential %d).\n",
1246 log_signal(breaking_lut), lut_nodes[breaking_gate] ? "reuse" : "extract", log_signal(breaking_gate), best_potential);
1247
1248 if (debug_relax)
1249 log(" Removing breaking gate %s from LUT.\n", log_signal(breaking_gate));
1250 lut_gates[breaking_lut].erase(breaking_gate);
1251
1252 auto cut_inputs = cut_lut_at_gate(breaking_lut, breaking_gate);
1253 pool<RTLIL::SigBit> gate_inputs = cut_inputs.first, other_inputs = cut_inputs.second;
1254
1255 pool<RTLIL::SigBit> worklist = lut_gates[breaking_lut];
1256 pool<RTLIL::SigBit> elim_gates = gate_inputs;
1257 while (!worklist.empty())
1258 {
1259 auto lut_gate = worklist.pop();
1260 bool all_gate_preds_elim = true;
1261 for (auto lut_gate_pred : edges_bw[lut_gate])
1262 if (!elim_gates[lut_gate_pred])
1263 all_gate_preds_elim = false;
1264 if (all_gate_preds_elim)
1265 {
1266 if (debug_relax)
1267 log(" Removing gate %s from LUT.\n", log_signal(lut_gate));
1268 lut_gates[breaking_lut].erase(lut_gate);
1269 for (auto lut_gate_succ : edges_fw[lut_gate])
1270 worklist.insert(lut_gate_succ);
1271 }
1272 }
1273 log_assert(!lut_gates[breaking_lut].empty());
1274
1275 pool<RTLIL::SigBit> directly_affected_nodes = {breaking_lut};
1276 for (auto gate_input : gate_inputs)
1277 {
1278 if (debug_relax)
1279 log(" Removing LUT edge %s -> %s.\n", log_signal(gate_input), log_signal(breaking_lut));
1280 remove_lut_edge(gate_input, breaking_lut, &directly_affected_nodes);
1281 }
1282 if (debug_relax)
1283 log(" Adding LUT edge %s -> %s.\n", log_signal(breaking_gate), log_signal(breaking_lut));
1284 add_lut_edge(breaking_gate, breaking_lut, &directly_affected_nodes);
1285
1286 if (debug_relax)
1287 log(" Updating slack and potentials.\n");
1288
1289 pool<RTLIL::SigBit> indirectly_affected_nodes = {};
1290 update_lut_depths_altitudes(directly_affected_nodes, &indirectly_affected_nodes);
1291 update_lut_critical_outputs(lut_critical_outputs, indirectly_affected_nodes);
1292 for (auto node : indirectly_affected_nodes)
1293 {
1294 lut_slacks[node] = depth_bound - (lut_depths[node] + lut_altitudes[node]);
1295 log_assert(lut_slacks[node] >= 0);
1296 if (debug_relax)
1297 log(" LUT %s now has depth %d and slack %d.\n", log_signal(node), lut_depths[node], lut_slacks[node]);
1298 }
1299
1300 worklist = indirectly_affected_nodes;
1301 pool<RTLIL::SigBit> visited;
1302 while (!worklist.empty())
1303 {
1304 auto node = worklist.pop();
1305 visited.insert(node);
1306 potentials.erase(node);
1307 // We are invalidating the entire output cone of the gate IR node, not just of the LUT IR node. This is done to also invalidate
1308 // all LUTs that could contain one of the indirectly affected nodes as a *part* of them, as they may not be in the output cone
1309 // of any of the LUT IR nodes, e.g. if we have a LUT IR node A and node B as predecessors of node C, where node B includes all
1310 // gates from node A.
1311 for (auto node_succ : edges_fw[node])
1312 if (!visited[node_succ])
1313 worklist.insert(node_succ);
1314 }
1315
1316 if (debug)
1317 {
1318 dump_dot_lut_graph(stringf("flowmap-relax-%d-break-%d.dot", depth_bound, break_num), GraphMode::Slack);
1319 log(" Dumped slack graph after break %d to `flowmap-relax-%d-break-%d.dot`.\n", break_num, depth_bound, break_num);
1320 }
1321 }
1322
1323 return false;
1324 }
1325
1326 void optimize_area(int depth, int optarea)
1327 {
1328 dict<RTLIL::SigBit, pool<RTLIL::SigBit>> lut_critical_outputs;
1329 update_lut_depths_altitudes();
1330 update_lut_critical_outputs(lut_critical_outputs);
1331
1332 for (int depth_bound = depth; depth_bound <= depth + optarea; depth_bound++)
1333 {
1334 log("Relaxing with depth bound %d.\n", depth_bound);
1335 bool fully_relaxed = relax_depth_for_bound(depth_bound == depth, depth_bound, lut_critical_outputs);
1336
1337 if (fully_relaxed)
1338 break;
1339 }
1340 }
1341
1342 void pack_cells(int minlut)
1343 {
1344 ConstEval ce(module);
1345 for (auto input_node : inputs)
1346 ce.stop(input_node);
1347
1348 pool<RTLIL::SigBit> mapped_nodes;
1349 for (auto node : lut_nodes)
1350 {
1351 if (node_origins.count(node))
1352 {
1353 auto origin = node_origins[node];
1354 if (origin.cell->getPort(origin.port).size() == 1)
1355 log("Packing %s.%s.%s (%s).\n",
1356 log_id(module), log_id(origin.cell), origin.port.c_str(), log_signal(node));
1357 else
1358 log("Packing %s.%s.%s [%d] (%s).\n",
1359 log_id(module), log_id(origin.cell), origin.port.c_str(), origin.offset, log_signal(node));
1360 }
1361 else
1362 {
1363 log("Packing %s.%s.\n", log_id(module), log_signal(node));
1364 }
1365
1366 for (auto gate_node : lut_gates[node])
1367 {
1368 log_assert(node_origins.count(gate_node));
1369
1370 if (gate_node == node)
1371 continue;
1372
1373 auto gate_origin = node_origins[gate_node];
1374 if (gate_origin.cell->getPort(gate_origin.port).size() == 1)
1375 log(" Packing %s.%s.%s (%s).\n",
1376 log_id(module), log_id(gate_origin.cell), gate_origin.port.c_str(), log_signal(gate_node));
1377 else
1378 log(" Packing %s.%s.%s [%d] (%s).\n",
1379 log_id(module), log_id(gate_origin.cell), gate_origin.port.c_str(), gate_origin.offset, log_signal(gate_node));
1380 }
1381
1382 vector<RTLIL::SigBit> input_nodes(lut_edges_bw[node].begin(), lut_edges_bw[node].end());
1383 RTLIL::Const lut_table(State::Sx, max(1 << input_nodes.size(), 1 << minlut));
1384 unsigned const mask = 1 << input_nodes.size();
1385 for (unsigned i = 0; i < mask; i++)
1386 {
1387 ce.push();
1388 for (size_t n = 0; n < input_nodes.size(); n++)
1389 ce.set(input_nodes[n], ((i >> n) & 1) ? State::S1 : State::S0);
1390
1391 RTLIL::SigSpec value = node, undef;
1392 if (!ce.eval(value, undef))
1393 {
1394 string env;
1395 for (auto input_node : input_nodes)
1396 env += stringf(" %s = %s\n", log_signal(input_node), log_signal(ce.values_map(input_node)));
1397 log_error("Cannot evaluate %s because %s is not defined.\nEvaluation environment:\n%s",
1398 log_signal(node), log_signal(undef), env.c_str());
1399 }
1400
1401 lut_table[i] = value.as_bool() ? State::S1 : State::S0;
1402 ce.pop();
1403 }
1404
1405 RTLIL::SigSpec lut_a, lut_y = node;
1406 for (auto input_node : input_nodes)
1407 lut_a.append_bit(input_node);
1408 lut_a.append(RTLIL::Const(State::Sx, minlut - input_nodes.size()));
1409
1410 RTLIL::Cell *lut = module->addLut(NEW_ID, lut_a, lut_y, lut_table);
1411 mapped_nodes.insert(node);
1412 for (auto gate_node : lut_gates[node])
1413 {
1414 auto gate_origin = node_origins[gate_node];
1415 lut->add_strpool_attribute("\\src", gate_origin.cell->get_strpool_attribute("\\src"));
1416 packed_count++;
1417 }
1418 lut_count++;
1419 lut_area += lut_table.size();
1420
1421 if ((int)input_nodes.size() >= minlut)
1422 log(" Packed into a %zu-LUT %s.%s.\n", input_nodes.size(), log_id(module), log_id(lut));
1423 else
1424 log(" Packed into a %zu-LUT %s.%s (implemented as %d-LUT).\n", input_nodes.size(), log_id(module), log_id(lut), minlut);
1425 }
1426
1427 for (auto node : mapped_nodes)
1428 {
1429 auto origin = node_origins[node];
1430 RTLIL::SigSpec driver = origin.cell->getPort(origin.port);
1431 driver[origin.offset] = module->addWire(NEW_ID);
1432 origin.cell->setPort(origin.port, driver);
1433 }
1434 }
1435
1436 FlowmapWorker(int order, int minlut, pool<IdString> cell_types, int r_alpha, int r_beta, int r_gamma,
1437 bool relax, int optarea, bool debug, bool debug_relax,
1438 RTLIL::Module *module) :
1439 order(order), r_alpha(r_alpha), r_beta(r_beta), r_gamma(r_gamma), debug(debug), debug_relax(debug_relax),
1440 module(module), sigmap(module), index(module)
1441 {
1442 log("Labeling cells.\n");
1443 discover_nodes(cell_types);
1444 label_nodes();
1445 int depth = map_luts();
1446
1447 if (relax)
1448 {
1449 log("\n");
1450 log("Optimizing area.\n");
1451 optimize_area(depth, optarea);
1452 }
1453
1454 log("\n");
1455 log("Packing cells.\n");
1456 pack_cells(minlut);
1457 }
1458 };
1459
1460 static void split(std::vector<std::string> &tokens, const std::string &text, char sep)
1461 {
1462 size_t start = 0, end = 0;
1463 while ((end = text.find(sep, start)) != std::string::npos) {
1464 tokens.push_back(text.substr(start, end - start));
1465 start = end + 1;
1466 }
1467 tokens.push_back(text.substr(start));
1468 }
1469
1470 struct FlowmapPass : public Pass {
1471 FlowmapPass() : Pass("flowmap", "pack LUTs with FlowMap") { }
1472 void help() YS_OVERRIDE
1473 {
1474 // |---v---|---v---|---v---|---v---|---v---|---v---|---v---|---v---|---v---|---v---|
1475 log("\n");
1476 log(" flowmap [options] [selection]\n");
1477 log("\n");
1478 log("This pass uses the FlowMap technology mapping algorithm to pack logic gates\n");
1479 log("into k-LUTs with optimal depth. It allows mapping any circuit elements that can\n");
1480 log("be evaluated with the `eval` pass, including cells with multiple output ports\n");
1481 log("and multi-bit input and output ports.\n");
1482 log("\n");
1483 log(" -maxlut k\n");
1484 log(" perform technology mapping for a k-LUT architecture. if not specified,\n");
1485 log(" defaults to 3.\n");
1486 log("\n");
1487 log(" -minlut n\n");
1488 log(" only produce n-input or larger LUTs. if not specified, defaults to 1.\n");
1489 log("\n");
1490 log(" -cells <cell>[,<cell>,...]\n");
1491 log(" map specified cells. if not specified, maps $_NOT_, $_AND_, $_OR_,\n");
1492 log(" $_XOR_ and $_MUX_, which are the outputs of the `simplemap` pass.\n");
1493 log("\n");
1494 log(" -relax\n");
1495 log(" perform depth relaxation and area minimization.\n");
1496 log("\n");
1497 log(" -r-alpha n, -r-beta n, -r-gamma n\n");
1498 log(" parameters of depth relaxation heuristic potential function.\n");
1499 log(" if not specified, alpha=8, beta=2, gamma=1.\n");
1500 log("\n");
1501 log(" -optarea n\n");
1502 log(" optimize for area by trading off at most n logic levels for fewer LUTs.\n");
1503 log(" n may be zero, to optimize for area without increasing depth.\n");
1504 log(" implies -relax.\n");
1505 log("\n");
1506 log(" -debug\n");
1507 log(" dump intermediate graphs.\n");
1508 log("\n");
1509 log(" -debug-relax\n");
1510 log(" explain decisions performed during depth relaxation.\n");
1511 log("\n");
1512 }
1513 void execute(std::vector<std::string> args, RTLIL::Design *design) YS_OVERRIDE
1514 {
1515 int order = 3;
1516 int minlut = 1;
1517 vector<string> cells;
1518 bool relax = false;
1519 int r_alpha = 8, r_beta = 2, r_gamma = 1;
1520 int optarea = 0;
1521 bool debug = false, debug_relax = false;
1522
1523 size_t argidx;
1524 for (argidx = 1; argidx < args.size(); argidx++)
1525 {
1526 if (args[argidx] == "-maxlut" && argidx + 1 < args.size())
1527 {
1528 order = atoi(args[++argidx].c_str());
1529 continue;
1530 }
1531 if (args[argidx] == "-minlut" && argidx + 1 < args.size())
1532 {
1533 minlut = atoi(args[++argidx].c_str());
1534 continue;
1535 }
1536 if (args[argidx] == "-cells" && argidx + 1 < args.size())
1537 {
1538 split(cells, args[++argidx], ',');
1539 continue;
1540 }
1541 if (args[argidx] == "-relax")
1542 {
1543 relax = true;
1544 continue;
1545 }
1546 if (args[argidx] == "-r-alpha" && argidx + 1 < args.size())
1547 {
1548 r_alpha = atoi(args[++argidx].c_str());
1549 continue;
1550 }
1551 if (args[argidx] == "-r-beta" && argidx + 1 < args.size())
1552 {
1553 r_beta = atoi(args[++argidx].c_str());
1554 continue;
1555 }
1556 if (args[argidx] == "-r-gamma" && argidx + 1 < args.size())
1557 {
1558 r_gamma = atoi(args[++argidx].c_str());
1559 continue;
1560 }
1561 if (args[argidx] == "-optarea" && argidx + 1 < args.size())
1562 {
1563 relax = true;
1564 optarea = atoi(args[++argidx].c_str());
1565 continue;
1566 }
1567 if (args[argidx] == "-debug")
1568 {
1569 debug = true;
1570 continue;
1571 }
1572 if (args[argidx] == "-debug-relax")
1573 {
1574 debug = debug_relax = true;
1575 continue;
1576 }
1577 break;
1578 }
1579 extra_args(args, argidx, design);
1580
1581 pool<IdString> cell_types;
1582 if (!cells.empty())
1583 {
1584 for (auto &cell : cells)
1585 cell_types.insert(cell);
1586 }
1587 else
1588 {
1589 cell_types = {"$_NOT_", "$_AND_", "$_OR_", "$_XOR_", "$_MUX_"};
1590 }
1591
1592 const char *algo_r = relax ? "-r" : "";
1593 log_header(design, "Executing FLOWMAP pass (pack LUTs with FlowMap%s).\n", algo_r);
1594
1595 int gate_count = 0, lut_count = 0, packed_count = 0;
1596 int gate_area = 0, lut_area = 0;
1597 for (auto module : design->selected_modules())
1598 {
1599 FlowmapWorker worker(order, minlut, cell_types, r_alpha, r_beta, r_gamma, relax, optarea, debug, debug_relax, module);
1600 gate_count += worker.gate_count;
1601 lut_count += worker.lut_count;
1602 packed_count += worker.packed_count;
1603 gate_area += worker.gate_area;
1604 lut_area += worker.lut_area;
1605 }
1606
1607 log("\n");
1608 log("Packed %d cells (%d of them duplicated) into %d LUTs.\n", packed_count, packed_count - gate_count, lut_count);
1609 log("Solution takes %.1f%% of original gate area.\n", lut_area * 100.0 / gate_area);
1610 }
1611 } FlowmapPass;
1612
1613 PRIVATE_NAMESPACE_END