Return-path:
Envelope-to: publicinbox@libre-riscv.org
Delivery-date: Wed, 25 Mar 2020 10:54:33 +0000
Received: from localhost ([::1] helo=libre-riscv.org)
	by libre-riscv.org with esmtp (Exim 4.89)
	(envelope-from )
	id 1jH3g8-0006Xs-H7; Wed, 25 Mar 2020 10:54:32 +0000
Received: from vps2.stafverhaegen.be ([85.10.201.15])
	by libre-riscv.org with esmtp (Exim 4.89)
	(envelope-from )
	id 1jH3g7-0006Xm-BV
	for libre-riscv-dev@lists.libre-riscv.org; Wed, 25 Mar 2020 10:54:31 +0000
Received: from hpdc7800 (hpdc7800 [10.0.0.1])
	by vps2.stafverhaegen.be (Postfix) with ESMTP id 96CA511C027D
	for ; Wed, 25 Mar 2020 11:54:30 +0100 (CET)
Message-ID: <29b1a9ecedda151dc9c8da6516c3691dfede62ef.camel@fibraservi.eu>
From: Staf Verhaegen
To: Libre RISC-V dev list
Date: Wed, 25 Mar 2020 11:54:20 +0100
In-Reply-To:
References:
Organization: FibraServi bvba
X-Mailer: Evolution 3.28.5 (3.28.5-5.el7)
Mime-Version: 1.0
X-Content-Filtered-By: Mailman/MimeDel 2.1.23
Subject: Re: [libre-riscv-dev] cache SRAM organisation
X-BeenThere: libre-riscv-dev@lists.libre-riscv.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Libre-RISCV General Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: Libre-RISCV General Development
Content-Type: multipart/mixed; boundary="===============2511390821872612374=="
Errors-To: libre-riscv-dev-bounces@lists.libre-riscv.org
Sender: "libre-riscv-dev"

--===============2511390821872612374==
Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="=-hcKE8vRVlmNXPJh1IKgI"

--=-hcKE8vRVlmNXPJh1IKgI
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Libre-SOC developers,

That discussion is mainly on system level and I don't want to get too deep into this as I don't have time for that.
I am providing the SRAM blocks and then it is up to the system guys to see how they use them. In this case you guys (libre-soc + LIP6) are the system guys.

On ASICs commonly three types of SRAM are provided: a single port RAM, a 2-port RAM and a dual port RAM. Currently for NLNet only a single port SRAM is foreseen as this is the most common, the smallest in area per bit and the fastest.

A single port SRAM has one port where you can do a read or a write each clock cycle. The 2-port one has one read port and one write port so you can do a read and a write each clock cycle. The dual port one has two ports that can each do a read or a write each clock cycle. So you can do two reads, two writes or a read+write each clock cycle.

For each of them you can have a synchronous or an asynchronous version. A synchronous RAM has a clock input, and the address and data inputs are latched on that clock signal. This means that the FFs are integrated in the SRAM, i.e. very close :) . The RAM currently being developed in my NLNet project is a synchronous SRAM as this is easier from a timing point of view because all the timing can be related to the clock. A synchronous RAM actually functions as an addressable bunch of FFs and the synthesis and P&R tools know how to handle them.

Given this building block you can now make blocks that look to the outside world like blocks with a higher number of ports. You do this by instantiating multiple RAM blocks and making sure that the content is mirrored between all the blocks. This way you can read from the different blocks in parallel. Writing still has to happen to all the blocks at the same time.

So if you take four single port SRAM blocks you can make a four port SRAM block. Each cycle you can do 1-4 reads or 1 write but you can't read and write at the same time. With four 2-port RAMs you can do 4 reads and 1 write each clock cycle. With four dual port RAMs you can do 4 reads or 3 reads + 1 write or 2 reads + 2 writes each cycle.

I will provide the single block; the combining of the blocks has to happen in RTL/HDL.
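
The mirroring scheme described above can be sketched as a small behavioural model. This is only an illustration in plain Python, not nmigen RTL and not the actual SRAM block; the class names `SinglePortSram` and `MirroredMultiReadRam` and their parameters are hypothetical, chosen for this sketch.

```python
class SinglePortSram:
    """Behavioural model (assumed, not the real block) of a synchronous
    single-port SRAM: one read OR one write per clock cycle."""

    def __init__(self, depth, width=32):
        self.mask = (1 << width) - 1
        self.mem = [0] * depth

    def cycle(self, addr, we=False, wdata=0):
        """One clock cycle on the single port.

        Returns the read data, or None when the cycle is a write."""
        if we:
            self.mem[addr] = wdata & self.mask
            return None
        return self.mem[addr]


class MirroredMultiReadRam:
    """N mirrored single-port blocks: up to N reads, or one write, per cycle."""

    def __init__(self, n_ports, depth, width=32):
        self.blocks = [SinglePortSram(depth, width) for _ in range(n_ports)]

    def cycle(self, read_addrs=(), write=None):
        if write is not None:
            # A write occupies the only port of every block, so no reads fit.
            if read_addrs:
                raise ValueError("cannot read and write in the same cycle")
            addr, data = write
            for blk in self.blocks:
                blk.cycle(addr, we=True, wdata=data)  # keep all copies identical
            return []
        if len(read_addrs) > len(self.blocks):
            raise ValueError("more reads requested than mirrored blocks")
        # Each read is served by its own block, all in the same cycle.
        return [blk.cycle(addr) for blk, addr in zip(self.blocks, read_addrs)]


ram = MirroredMultiReadRam(n_ports=4, depth=16)
ram.cycle(write=(3, 0xAB))
ram.cycle(write=(5, 0xCD))
print(ram.cycle(read_addrs=[3, 5, 3, 0]))  # prints [171, 205, 171, 0]
```

The model enforces the restriction stated above for single port blocks: a cycle is either up to four reads or one write, never both.
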
For Libre-SOC, combining the blocks means doing it in nmigen and using Coriolis for placing and connecting the single blocks.

Although the SRAM does an operation each clock cycle, its clock frequency could be different from that of the rest of the logic. If the RAM is fast enough it could run at double the frequency of the core, so a single port RAM could basically look like a dual port RAM to the rest of the logic, which is running at half the frequency. If the RAM is not fast enough, wait states need to be implemented for each operation. The maximum clock frequency will go down when you increase the size of a RAM block. So on a CPU the L1 cache typically runs at the same clock frequency as the core without any wait states, and higher level caches are bigger but also introduce more wait states for accessing them.

If you are thinking about having different clock frequencies in your design you have to first discuss this with Jean-Paul/LIP6, as doing multi clock designs opens its own can of worms (cross clock domain problems etc). For the October prototype I feel we need to stick with the single port SRAM block and run the whole chip from the same clock. IMO, on this prototype you should accept any performance implication this has.

greets,
Staf.

Luke Kenneth Casson Leighton wrote on Tue 24-03-2020 at 22:32 [+0000]:
> https://groups.google.com/d/msg/comp.arch/cbGAlcCjiZE/mgMZVINVIAAJ
> Staf can i ask you the favour of reviewing Mitch's comments about cache design?
> in particular the comments about the possibility of using multiported SRAM cells as long as only 1R or 1W is done on any given cell?
> also something about doing the FFs yourself, close to the SRAM cells?
> l.
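
As a footnote to the clock-frequency paragraph in the reply above: the idea of a single port RAM running at double the core frequency and looking like a dual port RAM can be sketched behaviourally as below. This is a plain Python illustration under stated assumptions, not RTL; the names `SinglePortSram` and `DoublePumpedRam` are hypothetical, and the ordering (port A served in the first RAM cycle, port B in the second) is an assumption of this sketch, not something the message specifies.

```python
class SinglePortSram:
    """Assumed behavioural model: one read OR one write per RAM clock cycle."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.cycles = 0  # number of RAM clock cycles elapsed

    def cycle(self, addr, we=False, wdata=0):
        self.cycles += 1
        if we:
            self.mem[addr] = wdata
            return None
        return self.mem[addr]

    def idle(self):
        """A RAM clock cycle in which the port is not used."""
        self.cycles += 1


class DoublePumpedRam:
    """Single-port RAM clocked at 2x the core clock (hypothetical wrapper).

    To the core it looks like two ports per core cycle."""

    def __init__(self, depth):
        self.ram = SinglePortSram(depth)

    def core_cycle(self, port_a=None, port_b=None):
        """Each request is (addr, we, wdata) or None.

        Assumption: port A uses the first RAM cycle and port B the second,
        so a read on B sees a write made on A in the same core cycle."""
        results = []
        for req in (port_a, port_b):
            if req is None:
                self.ram.idle()
                results.append(None)
            else:
                addr, we, wdata = req
                results.append(self.ram.cycle(addr, we, wdata))
        return tuple(results)


dp = DoublePumpedRam(depth=8)
print(dp.core_cycle(port_a=(2, True, 7), port_b=(2, False, 0)))  # prints (None, 7)
print(dp.ram.cycles)  # prints 2
```

This needs two clocks with a fixed 2:1 relationship, so it falls under the multi clock caveat in the reply and is not what is proposed for the October prototype.
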

--=-hcKE8vRVlmNXPJh1IKgI--

--===============2511390821872612374==--