\documentclass[slidestop]{beamer}
\usepackage{beamerthemesplit}
\title{Data-Dependent-Fail-First}
\author{Luke Kenneth Casson Leighton and Shriya Sharma}
\huge{The Libre-SOC Hybrid 3D CPU}\\
\Large{Data-Dependent-Fail-First}\\
\large{Sponsored by NLnet's PET Programme}\\
\begin{frame}[fragile]
\frametitle{Simple-V CMPI in a nutshell}
function op\_cmpi(BA, RA, SI) # cmpi not vector-cmpi!
  (assuming you know power-isa)
  for (i = 0; i < VL; i++)
    CR[BA+id] <= compare(ireg[RA+ira], SI);
    if (reg\_is\_vectorised[BA]) \{ id += 1; \}
    if (reg\_is\_vectorised[RA]) \{ ira += 1; \}
\item Above is oversimplified: predication etc. left out
\item Scalar-scalar and scalar-vector and vector-vector now all in one
\item OoO may choose to push CMPIs into instr. queue (v. busy!)
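The element loop above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the Libre-SOC simulator: the function name \texttt{sv\_cmpi}, the \texttt{(lt, gt, eq)} tuple standing in for a CR field, and the two boolean "is vectorised" flags are all hypothetical, and predication is left out exactly as on the slide.

```python
def compare(a, b):
    """Stand-in for a CR field: a (lt, gt, eq) tuple for a versus b."""
    return (a < b, a > b, a == b)


def sv_cmpi(cr, ireg, ba_is_vec, ra_is_vec, BA, RA, SI, VL):
    """Wrap a scalar cmpi in the Simple-V element loop (no predication).

    ba_is_vec / ra_is_vec are hypothetical flags saying whether the CR
    destination and the integer source are vectors or scalars.
    """
    id_ = 0   # offset into the CR fields
    ira = 0   # offset into the integer registers
    for _ in range(VL):
        cr[BA + id_] = compare(ireg[RA + ira], SI)
        if ba_is_vec:
            id_ += 1
        if ra_is_vec:
            ira += 1
    return cr


# Vector-vector case: compare r4..r7 against the immediate 5.
ireg = [0] * 32
ireg[4:8] = [1, 5, 9, 5]
cr = sv_cmpi([None] * 16, ireg, True, True, BA=2, RA=4, SI=5, VL=4)
```

With both flags set, each element gets its own CR field; with \texttt{ra\_is\_vec} cleared, the same scalar register is compared on every iteration.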
\frame{\frametitle{Load/Store Fault-First}
\item Problem: vector load and store can cause a page fault
\item Solution: a protocol that allows optional load/store
\item instruction \textit{requests} a number of elements
\item instruction \textit{informs} the number actually loaded
\item first element load/store is not optional (cannot fail)
\item ARM SVE: https://arxiv.org/pdf/1803.06185.pdf
\item more: wikipedia Vector processor page: Fault/Fail First
\item Load/Store is Memory to/from Register, what about
\item Register-to-register: ``Data-Dependent Fail-First.''
\item Z80 LDIR: Mem-Register, CPIR: Register-Register
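The request/inform protocol for fault-first loads can be illustrated with a toy memory model. This is only a sketch under assumptions: memory is a Python dict of mapped addresses, and the names \texttt{PageFault} and \texttt{ffirst\_load} are hypothetical, not taken from any ISA or library.

```python
class PageFault(Exception):
    """Raised when the mandatory first element cannot be loaded."""


def ffirst_load(mem, addr, requested):
    """Request `requested` elements; return those actually loaded.

    The length of the returned list is the count the instruction
    "informs" back to the program (the truncated vector length).
    """
    loaded = []
    for i in range(requested):
        if addr + i not in mem:          # unmapped address: would fault
            if i == 0:
                raise PageFault(addr)    # first element is not optional
            break                        # later elements are optional
        loaded.append(mem[addr + i])
    return loaded


mem = {100: 7, 101: 8, 102: 9}           # only three addresses mapped
got = ffirst_load(mem, 100, requested=8)
```

Here eight elements are requested but only three are delivered, so the caller would continue its loop from address 103 and take the real fault there, on a first element.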
\begin{frame}[fragile]
\frametitle{Data-Dependent-Fail-First in a nutshell}
function op\_cmpi(BA, RA, SI) # cmpi not vector-cmpi!
  for (i = 0; i < VL; i++)
    CR[BA+id] <= compare(ireg[RA+ira], SI);
    if test(CR[BA+id]) == FAIL: \{ VL = i + 1; break \}
    if (reg\_is\_vectorised[BA]) \{ id += 1; \}
    if (reg\_is\_vectorised[RA]) \{ ira += 1; \}
\item Parallelism still perfectly possible
      (``hold'' writing results until sequential post-analysis
      carried out. Best done with OoO)
\item VL truncation can be inclusive or exclusive
      (include or exclude a NULL pointer or a
      string-end character, or overflow result)
\item \textit{Truncation can be to zero Vector Length}
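Inclusive versus exclusive truncation, and truncation to zero, can be seen in a small Python model of the loop. This is a hedged sketch rather than the specification: \texttt{ddffirst\_cmpi}, the \texttt{inclusive} parameter and the \texttt{not\_equal} test are hypothetical names, and all registers are assumed to be vectorised.

```python
def compare(a, b):
    """Stand-in for a CR field: a (lt, gt, eq) tuple for a versus b."""
    return (a < b, a > b, a == b)


def ddffirst_cmpi(ireg, RA, SI, VL, test, inclusive=True):
    """Element loop that truncates VL at the first failing test.

    Returns (results, new_VL). With inclusive=True the failing element
    is kept (VL = i + 1); with inclusive=False it is dropped (VL = i).
    """
    results = []
    for i in range(VL):
        field = compare(ireg[RA + i], SI)
        results.append(field)
        if not test(field):
            VL = i + 1 if inclusive else i
            break
    return results[:VL], VL


def not_equal(field):
    """Test passes while the element differs from the immediate."""
    return not field[2]


# Scan "hi" followed by a zero terminator and two trailing bytes.
regs = [ord("h"), ord("i"), 0, ord("x"), ord("x")]
_, vl_inclusive = ddffirst_cmpi(regs, 0, 0, 5, not_equal, inclusive=True)
_, vl_exclusive = ddffirst_cmpi(regs, 0, 0, 5, not_equal, inclusive=False)
# The very first element fails: exclusive truncation gives VL = 0.
_, vl_zero = ddffirst_cmpi([0, 1, 2], 0, 0, 3, not_equal, inclusive=False)
```

The inclusive result covers the terminator (useful when it must be copied too), the exclusive result stops just before it, and the last call shows the zero-length case.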
\frame{\frametitle{Power ISA v3.1 vstribr}
\lstinputlisting[language={}]{vstribr.txt}
\item ironically this hard-coded instruction is
      identical to general-purpose Simple-V DD-FFirst...
\frame{\frametitle{maxloc}
\frame{\frametitle{Pospopcount}
\item Positional popcount counts, for each bit position, how many
      values in an array of inputs have that bit set to 1.
\item Notoriously difficult to do in SIMD assembler: typically 550 lines
\item https://github.com/clausecker/pospop
\lstinputlisting[language={}]{pospopcount.c}
\frame{\frametitle{Pospopcount}
\includegraphics[width=0.5\textwidth]{pospopcount.png}
\item The challenge is to perform an appropriate transpose of the data (the CPU can only work on registers, horizontally),
      in blocks that suit the processor and the ISA capacity.
\frame{\frametitle{Pospopcount}
\includegraphics[width=0.6\textwidth]{array_popcnt.png}
\item The draft gbbd instruction implements the transpose (shown above),
      preparing the data to use standard popcount.
      (gbbd is based on Power ISA vgbbd, v3.1 p445)
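The transpose-then-popcount idea can be sketched in Python for 8-bit inputs. This is an illustration under assumptions, not the gbbd definition: \texttt{transpose8x8} and \texttt{pospopcount8} are hypothetical names, and bits are numbered from the least significant end, which differs from the Power ISA convention.

```python
def transpose8x8(rows):
    """Bit-matrix transpose of eight bytes.

    Bit j of output i is bit i of input j, so output i gathers the
    bits at position i from all eight inputs into one byte.
    """
    out = [0] * 8
    for j, byte in enumerate(rows):
        for i in range(8):
            out[i] |= ((byte >> i) & 1) << j
    return out


def pospopcount8(data):
    """Positional popcount of a byte sequence, in blocks of eight."""
    counts = [0] * 8
    for start in range(0, len(data), 8):
        block = list(data[start:start + 8])
        block += [0] * (8 - len(block))      # zero-pad the last block
        for pos, column in enumerate(transpose8x8(block)):
            # After the transpose, an ordinary popcount on one byte
            # gives the total for one bit position.
            counts[pos] += bin(column).count("1")
    return counts


counts = pospopcount8(bytes([0b00000001, 0b00000011, 0b10000001]))
```

Once the data is transposed, the per-position totals fall out of a standard popcount, which is the role the slide assigns to gbbd.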
\frame{\frametitle{Pospopcount.s}
\lstinputlisting[language={}]{pospopcount.s}
\frame{\frametitle{strncpy}
\lstinputlisting[language={}]{strncpy.c}
\frame{\frametitle{strncpy assembler}
\lstinputlisting[language={}]{strncpy.s}
\frame{\frametitle{linked-list walking}
\frame{\frametitle{Summary}
\item Goal is to create a mass-volume low-power embedded SoC suitable
      for use in netbooks, chromebooks, tablets, smartphones, IoT SBCs.
\item No way we could implement a project of this magnitude without
      nmigen (being able to use Python OO to generate HDL)
\item Collaboration with OpenPOWER Foundation and Members absolutely
      essential. No short-cuts. Standards to be developed and ratified
      so that everyone benefits.
\item Riding the wave of huge stability of OpenPOWER ecosystem
\item Greatly simplified open 3D and Video drivers reduce product
      development costs for customers
\item It also happens to be fascinating, deeply rewarding, technically
      challenging, and funded by NLnet
\frame{\frametitle{How can you help?}
\item Start here! https://libre-soc.org \\
      Mailing lists https://lists.libre-soc.org \\
      IRC Freenode libre-soc \\
      etc. etc. (it's a Libre project, go figure) \\
\item Can I get paid? Yes! NLnet funded\\
      See https://libre-soc.org/nlnet/\#faq \\
\item Also profit-sharing in any commercial ventures \\
\item How many opportunities to develop Libre SoCs exist,\\
      and actually get paid for it?
\item I'm not a developer, how can I help?\\
      - Plenty of research needed, artwork, website \\
      - Help find customers and OEMs willing to commit (LOI)
{\Huge The end\vspace{12pt}\\
Thank you\vspace{12pt}\\
Questions?\vspace{12pt}
\item Discussion: http://lists.libre-soc.org
\item Freenode IRC \#libre-soc
\item http://libre-soc.org/
\item http://nlnet.nl/PET
\item https://libre-soc.org/nlnet/\#faq