From: Joshua Harlan Lifton
Date: Tue, 28 May 2019 15:45:14 +0000 (-0700)
Subject: Copy edit update 018
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=7a6d792f83309e0e2085d46a5b0fd6760e84fe4f;p=crowdsupply.git

Copy edit update 018

---

updates/018_2019may27_nlnet_grant_approved.mdwn

The application for funding from NLnet, from back in November of last
year, has been approved. It means that we have EUR 50,000 to pay for
full-time engineering work to be carried out over the next year, and to
pay for bounty-style tasks. For the right people, with the right skills,
there is money now available.

However, this is not all: by splitting the tasks up into separate groups,
and using a second European-based individual as the applicant, we can
apply for a second grant (also of up to EUR 50,000). In the next couple
of days, we will put in an application for "Formal Mathematical Proofs"
of the processor design.

There are several reasons for doing so. The primary one is down to the
fact that we anticipate that this (commercial, libre) product will be
closely and independently examined by third parties, to verify for
themselves that it does not contain spying backdoor co-processors, and
that it meets the usual security and correctness guarantees. If there
exist *formal mathematical proofs* that the processor and its
sub-components operate correctly, that independent third-party
verification task is a lot easier.

In addition, it turns out that when writing unit tests, using formal
mathematical proofs makes for *complete* code coverage - far better than
hand-written unit tests alone can provide, and with *mathematically
provable* accuracy. Additional, much simpler unit tests can then be
written which are more along the lines of "HOWTOs" - examples of how to
use the unit.

This is one of those "epiphany" moments that has me, as a software
engineer of 25 years' experience, stunned and wondering why on earth this
is not more generally and widely deployed in software. The answer, I
believe, is down to the nature of what a processor actually is.

A processor is developed much more along the lines of how functional
programming works. Functional programming can have formal mathematical
proofs applied to it because, for any given inputs, the output is
*guaranteed* to be the same. This of course breaks down when the function
has "side-effects," such as reading from a file or accessing other
external state outside of the "control" of the function.
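To make that concrete, here is a minimal software analogy (illustrative
only: the actual proofs are written against the HDL, and the tiny adder
model below is invented purely for the example). A pure function can be
checked over its *entire* input space - the "complete coverage" referred
to above - precisely because there are no side effects; the moment a file
read or a global variable is involved, that guarantee evaporates.

```python
# Illustrative software analogy only: the real proofs target the HDL, and
# this 4-bit adder model is invented purely for the example.

def add4(a, b):
    """Pure model of a 4-bit adder: returns (sum, carry-out)."""
    total = (a & 0xF) + (b & 0xF)
    return total & 0xF, total >> 4

# "Proof-style" check: every possible input is covered, so the property
# holds for the whole model, not just for a handful of sampled cases.
for a in range(16):
    for b in range(16):
        s, c = add4(a, b)
        assert s + (c << 4) == a + b

# Example-based unit test: still useful, but only as a "HOWTO" - it
# samples one point of the input space rather than proving anything.
assert add4(0x9, 0x8) == (0x1, 1)
```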
In the design of a processor, by the very nature of hardware, you simply
cannot create a verilog module that has access to "files" or to "global
variables." In addition, hardware is based purely on boolean logic and on
if/else constructs. In essence, then, the *entire hardware design* has to
be made according to far simpler rules than "normal" software is expected
to conform to. Even memory accesses in hardware have to be implemented
according to these strict rules (we're *implementing* LOAD and STORE
instructions here, not *using* those LOAD and STORE instructions).

Consequently, adding in formal proofs is a little bit easier, brings huge
benefits as well in terms of code readability, reliability, and time-cost
savings, and has the crucial advantage of being aligned with the overall
privacy goal *and* with NLnet's funding remit.

### Summary from the past couple of months

The past few months have seen a *lot* of activity. The IEEE754 ADD unit
has been completed, both as a finite state machine (FSM) and as a fully
pipelined design, both of which have parameters that allow them to do
FP16, FP32 or FP64. The DIV unit has been implemented as an FSM, and
will stay that way for now. MUL has been completed; however, it needs to
be turned into an FMAC (three operands: multiply and accumulate).

A provisional pipeline API has been developed, which the IEEE754 FPU is
using. It includes data "funneling" (multiplexing) blocks that allow
for the creation of what Mitch Alsup calls a "concurrent computation
unit." It's basically an array of matched operand latches and result
latches (as input and output, respectively), in front of a single
pipeline. This arrangement allows a batch of operations to be presented
to the CDC6600-style "dependency matrices."

Jacob has been working on a fascinating design: a dynamically
partitionable adder and multiplier unit. Given that we are doing a
vector-processing front-end onto SIMD back-end operations, it makes sense
to save gates by allowing the ADD and MUL units to optionally handle a
batch of 8-bit operations, or half the number of 16-bit operations, or a
quarter of the number of 32-bit operations, or one eighth of the number
of 64-bit operations. In this way, many fewer gates are required than if
they were separate units. The unit tests demonstrate that the code Jacob
has written provides RISC-V mul, mulh, mulhu and mulhsu functionality.

The augmented 6600 scoreboard took literally six weeks to correctly
implement read-after-write and write-after-read hazards. It required
extraordinary and excruciating patience to get right. Adding in
write-after-write, however, only took two days, as the infrastructure to
do so had already been developed.
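For readers new to the terminology, the sketch below is a deliberately
tiny software model of the three hazard classes those dependency matrices
have to track. The real design is HDL, and the set-based structure,
register numbers and method names here are invented purely for
illustration - this is not the actual scoreboard logic.

```python
# Toy software model of hazard detection (illustration only; the real
# 6600-style scoreboard is implemented in HDL, not Python).

class ToyScoreboard:
    def __init__(self):
        self.writes_pending = set()  # registers an in-flight op will write
        self.reads_pending = set()   # registers an in-flight op still reads

    def hazards(self, srcs, dest):
        """Return which of RAW / WAR / WAW would block issuing this op now."""
        found = []
        if any(r in self.writes_pending for r in srcs):
            found.append("RAW")  # read-after-write: a source is not yet written
        if dest in self.reads_pending:
            found.append("WAR")  # write-after-read: an older op still needs the old value
        if dest in self.writes_pending:
            found.append("WAW")  # write-after-write: result ordering would be lost
        return found

    def issue(self, srcs, dest):
        self.reads_pending.update(srcs)
        self.writes_pending.add(dest)

sb = ToyScoreboard()
sb.issue(srcs={1, 2}, dest=3)           # e.g. add r3, r1, r2
print(sb.hazards(srcs={3, 4}, dest=5))  # ['RAW'] - r3 not yet written
print(sb.hazards(srcs={6, 7}, dest=1))  # ['WAR'] - r1 still being read
print(sb.hazards(srcs={6, 7}, dest=3))  # ['WAW'] - r3 already being written
```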
Currently being implemented is "branch shadowing" - this is not the
same as branch *prediction*, which is a different algorithm that, when
*combined* with "branch shadowing," provides the feature known as
branch *speculation*. This is the source of a lot of confusion about
out-of-order (OoO) designs in general. It seems to be assumed that an
OoO design *has* to have branch speculation: it doesn't. It's just
that, given all the pieces, adding in branch speculation is actually
quite straightforward, and provides such a large performance increase
that it is hard to justify leaving it out.

One huge surprise came out of a recent discussion with Mitch Alsup. It
has been assumed all along that turning this design from single-issue to
multi-issue would be difficult, or would require significant gates and
latency. The "simple" approach to multi-issue, using the dependency
matrices, would be to analyse a batch of instructions and, if there are
no overlaps (no registers in common), allow that batch to proceed in
parallel. This is a naive approach.

In the processor family that AMD had to publish "Intel equivalent" speed
numbers for - because it was so much more efficient and effective than
Intel's designs - each instruction "accumulated" the dependencies of all
prior instructions being issued in the same batch. This works because
read and write dependencies are *transitive* (Google it...); a small
software sketch at the end of this update illustrates the idea.

What that means, in practical terms, is that we have a way to create a
design that could, if ramped up, take on the big boys. To make that
clear: there's no technical barrier that would prevent us from creating
a quad-issue (or higher) design.

There is still a heck of a lot to get done. However, it has to be said
that actually adding an instruction decoder onto the 6600-style
dependency matrices is relatively straightforward, this being RISC, after
all. It is possible, then, that we may have a subset of functionality
operational far sooner than anticipated.
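As promised above, here is a small software sketch of the "accumulated
dependencies" idea (the register bitmasks, batch encoding and helper
function are all invented for the example; the real issue logic is HDL,
not Python):

```python
# Illustrative sketch only: each instruction is (read_mask, write_mask),
# where bit n of a mask stands for register n.

def batch_dependencies(batch):
    """For each instruction in an issue batch, return the accumulated
    read/write masks of all earlier instructions in that same batch."""
    deps = []
    acc_reads, acc_writes = 0, 0
    for read_mask, write_mask in batch:
        deps.append((acc_reads, acc_writes))
        acc_reads |= read_mask
        acc_writes |= write_mask
    return deps

# add r3,r1,r2 ; add r5,r3,r4 ; add r7,r6,r1
batch = [(0b00000110, 0b00001000),
         (0b00011000, 0b00100000),
         (0b01000010, 0b10000000)]

for (reads, writes), (acc_r, acc_w) in zip(batch, batch_dependencies(batch)):
    raw = reads & acc_w    # a source is written by an earlier batch member
    war = writes & acc_r   # the destination is still read by an earlier member
    waw = writes & acc_w   # the destination is also written by an earlier member
    print(f"RAW={raw:#010b}  WAR={war:#010b}  WAW={waw:#010b}")
# Only the second instruction reports a dependency (RAW on r3), yet the whole
# batch can still be issued together with that dependency recorded, rather
# than being split up as the naive "no registers in common" approach would do.
```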