From: Staf Verhaegen
To: libre-riscv-dev@lists.libre-riscv.org
Date: Fri, 27 Mar 2020 10:25:24 +0100
Subject: Re: [libre-riscv-dev] cache SRAM organisation
Organization: FibraServi bvba
Message-ID: <6fbfb2a3258be77f4fce69661b283dc31a683f7b.camel@fibraservi.eu>
References: <29b1a9ecedda151dc9c8da6516c3691dfede62ef.camel@fibraservi.eu> <6fa40cb78b3f8c013ca4953ccb4daa5c23e3b501.camel@fibraservi.eu>

Luke Kenneth Casson Leighton wrote on Thu 26-03-2020 at 21:37 [+0000]:
> ---crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
>
> On Thu,
> Mar 26, 2020 at 8:18 PM Staf Verhaegen wrote:
> > Luke Kenneth Casson Leighton wrote on Thu 26-03-2020 at 13:05 [+0000]:
> > > On Thursday, March 26, 2020, Staf Verhaegen wrote:
> > > > I would like to make a separate side remark here. In ASICs, MUXes are
> > > > relatively expensive gates with respect to delay and power. So if this
> > > > principle is generally applied over the whole design, it will be
> > > > difficult to make a chip that is competitive in power/performance
> > > > with ARM/x86 CPUs.
> > >
> > > just the ALU pipeline registers. we felt that the advantage of being
> > > able to drop to say 500mhz and halve the number of pipeline stages to
> > > say 5, and also be able to ramp up to 1.6ghz and double back up to 10
> > > stages, was worth considering.
> >
> > What would be the advantage over running at 800MHz with 5 pipeline stages?
>
> i assume you mean a fixed 5-stage pipeline.
>
> the problem is, if you *want* to run at 1.6ghz and have complex pipeline
> stages, you simply can't: 5 stages are too long, the gate propagation
> delay is too large. the only way to get to 1.6ghz is: split those 5
> stages into 10 smaller stages.
>
> the problem with _that_ is: if you then run those 10 stages at say
> 800mhz, or say even 400mhz or 100mhz (because you are in power-saving
> mode), you just *massively* increased the latency for completion of
> any given operation.
>
> so even though those 10 stages are so fast (because you are in 14nm)
> that, at 100mhz, they complete in under 5% of a 100mhz clock period, if
> you have a fixed 10-stage pipeline you are absolutely screwed, you
> *have* to have the penalty of the 10-stage pipeline latency.
>
> screwed 1: 5-stage pipeline FORCES you to ONLY be able to run at
> BELOW (e.g.) 800mhz
>
> screwed 2: 10-stage pipeline FORCES you to have massive instruction
> completion latency at below (e.g.) 800mhz.
>
> solution: give every other pipeline stage's registers a "combinatorial
> bypass".
> un-screwed 1: when speed is above 800mhz, switch off the combinatorial
> bypass, pipeline becomes 10-stage.
>
> un-screwed 2: when speed is below 800mhz, switch ON the combinatorial
> bypass, latency due to slower clock rate DISAPPEARS because all
> pipelines are now only 5-stage, not 10.

My point is that you will have the same performance for the fixed 5-stage
pipeline running at 800MHz as for the 10-stage pipeline running at 1600MHz.
Why do you want to run at 1600MHz?

Actually, the fixed 5-stage, 800MHz-capable pipeline will not be able to run
at 1600MHz when converted to a configurable 5/10-stage pipeline, due to the
additional delay from the MUXes inserted in the path, plus the fact that you
likely can't split each stage into two stages that each have exactly half of
the delay.

greets,
Staf.
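
The latency arithmetic behind both positions in this thread can be checked with a few lines of Python. This is only a rough sketch under stated assumptions: the helper names `latency_ns` and `max_clock_mhz`, the 6.25 ns of total logic delay (5 stages that exactly fill an 800MHz period, with zero register overhead) and the 0.1 ns per-stage MUX overhead are all hypothetical illustration values, not figures from the discussion or from any real process.

```python
def latency_ns(stages: int, clock_mhz: float) -> float:
    """Time for one operation to traverse the whole pipeline, in ns."""
    return stages * 1000.0 / clock_mhz


def max_clock_mhz(logic_ns: float, stages: int, overhead_ns: float = 0.0) -> float:
    """Highest clock if logic_ns of logic is split evenly over `stages`,
    with overhead_ns of extra delay per stage (e.g. a bypass MUX)."""
    return 1000.0 / (logic_ns / stages + overhead_ns)


# Assumed total combinatorial delay: 5 stages x 1.25 ns (one 800MHz period).
LOGIC_NS = 6.25
# Assumed (hypothetical) extra delay per stage from the bypass MUX.
MUX_NS = 0.1

if __name__ == "__main__":
    # Same latency for 5 stages at 800MHz and 10 stages at 1600MHz.
    print("5 stages @ 800MHz  :", latency_ns(5, 800), "ns")
    print("10 stages @ 1600MHz:", latency_ns(10, 1600), "ns")

    # In power-saving mode, a fixed 10-stage pipeline doubles the latency
    # compared with the bypassed (5-stage) configuration.
    print("10 stages @ 100MHz :", latency_ns(10, 100), "ns")
    print("5 stages @ 100MHz  :", latency_ns(5, 100), "ns")

    # Ideal even split reaches 1600MHz; any MUX overhead lowers that.
    print("ideal 10-stage max :", max_clock_mhz(LOGIC_NS, 10), "MHz")
    print("with MUX overhead  :", max_clock_mhz(LOGIC_NS, 10, MUX_NS), "MHz")
```

Under these assumptions, the bypass halves the latency at low clock rates, while any non-zero MUX delay (or an uneven split of the stages) keeps the configurable pipeline below the ideal 1600MHz.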