From: Staf Verhaegen
Date: Sat, 28 Mar 2020 14:08:25 +0000 (+0100)
Subject: [libre-riscv-dev] Clock Gating (was cache SRAM organisation)
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=d14221be2a5e62e16bd5f4964e9bad2cfb94c518;p=libre-riscv-dev.git
To: libre-riscv-dev@lists.libre-riscv.org
Message-ID: <0d35e45bd81eeaecedeb64dc5061c1e33c89630c.camel@fibraservi.eu>
Organization: FibraServi bvba

Luke Kenneth Casson Leighton wrote on Fri 27-03-2020 at 10:59 [+0000]:
> On Fri, Mar 27, 2020 at 10:36 AM Staf Verhaegen wrote:
> > Yes and no, it is the basic functionality of a pipeline :(
>
> yes.
>
> > You have the same latency but can have double the number of
> > operations in flight.
>
> yes. hence why it is so important to have, because double the number
> of operations means that we need double the number of Function Units
> in the Dependency Matrix in order to keep the entire out-of-order
> engine occupied.
>
> also, double the number of operations in flight means that we need
> double the number of Branch Prediction Units, and much more complex
> BPUs at that, just to deal with the (now very likely) scenario of
> having far more overlapping inner loops "in flight".
>
> all this from just extending the pipeline length(s) from 5 to 10. so
> it's not just a "nice-to-have" feature, it's actually really important
> to keeping the overall size of the chip down.

There is an (IMO better) alternative to what you are doing with your
pass-through registers, and that is clock gating (wikipedia,
allaboutcircuits). The principle is that you save power by not clocking
the parts of the circuit that don't have to do any computing. I think
this could be a more general way to enable only the stages in your
pipeline that are actually doing computation.

In the above example you would always use a 10-stage pipeline running
at 1600MHz, but to mimic the 5-stage pipeline you submit an operation
only every other clock cycle and alternately enable the odd and even
stages in your pipeline. This way the MUXes are removed from the
computation path.

Using a shift register, this could easily be generalized to enable only
the stages through which an operation is currently passing. When an
operation is submitted you set the first bit in the shift register to
enable the first stage of the pipeline. With each cycle you then shift
this bit, so that the stage needed for the execution of that operation
is active.

This is a general power optimization, because it means that if you are
running a program that only uses integer operations, your FPU and GPU
will use almost no power.

The way to implement it is using EnableInserter. Some untested code
showing how I think it can be done:

    from nmigen import Signal, Cat, EnableInserter

    # one enable bit per pipeline stage
    stages_en = Signal(10)
    stage1 = EnableInserter(stages_en[0])(Stage1())
    stage2 = EnableInserter(stages_en[1])(Stage2())
    ...

    # shift the enables along each cycle; newop is 1 when an operation is submitted
    m.d.sync += stages_en.eq(Cat(newop, stages_en[0:9]))

That said, I think this feature does not fit in the MVP scope of the
October prototype, so that chip should IMO use neither clock gating nor
the pass-through register feature from the original discussion. The
reason is that implementing it is easier said than done. Several things
need to be done:
- You first need a clock gating cell. This is not available in nsxlib
  and is currently not planned to be implemented. I don't want to commit
  to something extra for the May test chip tape-out either.
- nmigen/yosys need to properly support clock gating for ASICs. Likely
  this means work in yosys to insert the clock gates from if clauses in
  the RTL.
- Your P&R tool (e.g. Coriolis) needs to support the clock gates. It
  means your clock tree synthesis (CTS) needs to support more than just
  buffers in the clock tree. This is not a simple task and has to be
  discussed with Jean-Paul & co.

greets,
Staf.
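For illustration, the shift-register enable scheme described in the
message can be modelled in plain Python. This is only a behavioural
sketch under assumptions, not nmigen code and not a clock-gating
implementation: the helper names step_enables, simulate and
active_stages are hypothetical, and the model tracks only which enable
bits are set on each cycle. Bit 0 stands for the first stage, matching
the bit order of Cat(newop, stages_en[0:9]).

```python
def step_enables(stages_en, newop, n_stages=10):
    """Advance the enable shift register by one clock cycle.

    Existing enable bits move one stage further down the pipeline,
    the bit leaving the last stage is dropped, and newop (0 or 1)
    becomes the enable bit of the first stage (bit 0).
    """
    mask = (1 << n_stages) - 1
    return ((stages_en << 1) | int(bool(newop))) & mask


def active_stages(stages_en, n_stages=10):
    """Return the indices of the stages whose enable bit is set."""
    return [i for i in range(n_stages) if (stages_en >> i) & 1]


def simulate(ops, n_stages=10):
    """Feed a sequence of newop values, one per cycle, and record
    the enable register after each cycle."""
    stages_en = 0
    history = []
    for newop in ops:
        stages_en = step_enables(stages_en, newop, n_stages)
        history.append(stages_en)
    return history


if __name__ == "__main__":
    # Submit an operation every other cycle, as in the 5-stage example:
    # the even and odd stages end up enabled on alternating cycles.
    for cycle, en in enumerate(simulate([1, 0, 1, 0, 1, 0])):
        print(cycle, active_stages(en))
```

With an operation submitted every other cycle, the model shows the even
stages enabled on one cycle and the odd stages on the next, and a stage
with no operation in it is never enabled.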