Envelope-to: publicinbox@libre-riscv.org
Delivery-date: Wed, 25 Mar 2020 10:54:33 +0000
Message-ID: <29b1a9ecedda151dc9c8da6516c3691dfede62ef.camel@fibraservi.eu>
From: Staf Verhaegen <staf@fibraservi.eu>
To: Libre RISC-V dev list <libre-riscv-dev@lists.libre-riscv.org>
Date: Wed, 25 Mar 2020 11:54:20 +0100
In-Reply-To: <CAPweEDx5QCCKxSr1gfuyuw_2D68Ld8fK85bEmmMTZi8S3w2E9g@mail.gmail.com>
References: <CAPweEDx5QCCKxSr1gfuyuw_2D68Ld8fK85bEmmMTZi8S3w2E9g@mail.gmail.com>
Organization: FibraServi bvba
Mime-Version: 1.0
Subject: Re: [libre-riscv-dev] cache SRAM organisation
Precedence: list
Reply-To: Libre-RISCV General Development
 <libre-riscv-dev@lists.libre-riscv.org>
Content-Type: multipart/mixed; boundary="===============2511390821872612374=="
Errors-To: libre-riscv-dev-bounces@lists.libre-riscv.org
Sender: "libre-riscv-dev" <libre-riscv-dev-bounces@lists.libre-riscv.org>


--===============2511390821872612374==
Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature";
	boundary="=-hcKE8vRVlmNXPJh1IKgI"


--=-hcKE8vRVlmNXPJh1IKgI
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Libre-SOC developers,

That discussion is mainly at system level and I don't want to go too deep
into it, as I don't have time for that.
I am providing the SRAM blocks and then it is up to the system guys to see
how they use them. In this case you guys (libre-soc + LIP6) are the system
guys.

On ASICs three types of SRAM are commonly provided: a single-port RAM, a
2-port RAM and a dual-port RAM. Currently only a single-port SRAM is
foreseen for NLNet, as this is the most common, the smallest in area per
bit and the fastest.
A single-port SRAM has one port on which you can do a read or a write each
clock cycle. The 2-port one has one read port and one write port, so you
can do a read and a write each clock cycle. The dual-port one has two
ports that can each do a read or a write each clock cycle, so you can do
two reads, two writes or a read + write each clock cycle.
Each of them can come in a synchronous or an asynchronous version. A
synchronous RAM has a clock input, and the address and data inputs are
latched on that clock signal. This means that the FFs are integrated in
the SRAM, i.e. very close :) . The RAM currently being developed in my
NLNet project is a synchronous SRAM, as this is easier from a timing point
of view because all the timing can be related to the clock. A synchronous
RAM actually functions as an addressable bunch of FFs, and the synthesis
and P&R tools know how to handle them.
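
As a rough illustration of the synchronous single-port behaviour described
above, here is a small behavioural model in plain Python. It is only a
sketch: the class name SyncSinglePortSRAM, the clock() method and the
choice to hold the read output during a write are assumptions made for
this example, not properties of the actual NLNet SRAM block.

```python
class SyncSinglePortSRAM:
    """Behavioural sketch of a synchronous single-port SRAM (hypothetical)."""

    def __init__(self, depth, width):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1
        self.q = 0  # read data output, updated on the clock edge

    def clock(self, addr, we=False, d=0):
        """Model one clock edge: inputs are sampled, one operation is done."""
        if we:
            # Write cycle: store the data. Assumption: q keeps its old value.
            self.mem[addr] = d & self.mask
        else:
            # Read cycle: the addressed word appears on q after this edge.
            self.q = self.mem[addr]
        return self.q


ram = SyncSinglePortSRAM(depth=256, width=8)
ram.clock(addr=5, we=True, d=0xAB)  # cycle 1: the single port does a write
ram.clock(addr=5)                   # cycle 2: the same port does a read
print(hex(ram.q))
```

The point of the model is that there is exactly one clock() call per cycle,
and each call is either a read or a write, never both.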

Given this building block you can now make blocks that look to the outside
world like blocks with a higher number of ports. You do this by
instantiating multiple RAM blocks and making sure that the content is
mirrored between all the blocks. This way you can read from the different
blocks in parallel. A write still has to go to all the blocks at the same
time.

So if you take four single-port SRAM blocks you can make a four-port SRAM
block. Each cycle you can do 1-4 reads or 1 write, but you can't read and
write at the same time. With four 2-port RAMs you can do 4 reads and 1
write each clock cycle. With four dual-port RAMs you can do 4 reads, or 3
reads + 1 write, or 2 reads + 2 writes each cycle.
I will provide the single block; combining the blocks has to happen in
RTL/HDL. For Libre-SOC this means in nmigen, using Coriolis for placement
and for connecting the single blocks.
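
The mirroring scheme for the single-port case can be sketched in plain
Python as follows. This is a behavioural model only, not nmigen code, and
the name MirroredSinglePortSRAM and its cycle() interface are invented for
the example. Each bank is treated as a single-port RAM: in one cycle the
wrapper accepts either up to one read per bank, or one write that is
broadcast to every bank so the copies stay identical.

```python
class MirroredSinglePortSRAM:
    """N mirrored single-port banks seen as one N-read-port block (sketch)."""

    def __init__(self, n_banks, depth):
        self.banks = [[0] * depth for _ in range(n_banks)]

    def cycle(self, reads=(), write=None):
        """One clock cycle: up to N reads, or one (addr, data) write."""
        if write is not None and reads:
            # Every bank's only port is busy with the write.
            raise ValueError("cannot read and write in the same cycle")
        if len(reads) > len(self.banks):
            raise ValueError("more reads than banks")
        if write is not None:
            addr, data = write
            for bank in self.banks:  # broadcast keeps all copies mirrored
                bank[addr] = data
            return []
        # Read i is served by bank i; all banks hold the same content.
        return [bank[addr] for bank, addr in zip(self.banks, reads)]


mem = MirroredSinglePortSRAM(n_banks=4, depth=16)
mem.cycle(write=(3, 42))
mem.cycle(write=(7, 99))
print(mem.cycle(reads=[3, 7, 3, 7]))
```

The cost of the extra read ports is area: four banks store the same data
four times.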

Although the SRAM does an operation each clock cycle, its clock frequency
could be different from that of the rest of the logic. If the RAM is fast
enough it could run at double the frequency of the core, so a single-port
RAM could look like a dual-port RAM to the rest of the logic, which is
running at half the frequency. If the RAM is not fast enough, wait states
need to be implemented for each operation. The maximum clock frequency
goes down when you increase the size of a RAM block. So on a CPU the L1
cache typically runs at the same clock frequency as the core without any
wait states, and higher-level caches are bigger but also introduce more
wait states for accessing them.
If you are thinking about having different clock frequencies in your
design you first have to discuss this with Jean-Paul/LIP6, as doing
multi-clock designs opens its own can of worms (clock domain crossing
problems etc). For the October prototype I feel we need to stick with
single-port SRAM blocks and run the whole chip from the same clock. IMO,
on this prototype you should accept any performance implications this
has.
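
For illustration only (this is not what is proposed for the October
prototype), the double-frequency idea mentioned above can be sketched as a
behavioural Python model. The class DoublePumpedSRAM and the tuple format
of the operations are assumptions made for this example. One core cycle
contains two RAM cycles, so the single port can serve two requests per
core cycle; the ordering (port A first, then port B) is also an assumption.

```python
class DoublePumpedSRAM:
    """Single-port RAM clocked at 2x the core clock (hypothetical sketch)."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def _ram_cycle(self, op):
        """One fast RAM cycle: op is None, ("r", addr) or ("w", addr, data)."""
        if op is None:
            return None
        kind, addr, *data = op
        if kind == "w":
            self.mem[addr] = data[0]
            return None
        return self.mem[addr]

    def core_cycle(self, port_a=None, port_b=None):
        """One slow core cycle = two RAM cycles: port A first, then port B."""
        result_a = self._ram_cycle(port_a)
        result_b = self._ram_cycle(port_b)
        return result_a, result_b


ram2x = DoublePumpedSRAM(depth=16)
ram2x.core_cycle(("w", 1, 10), ("w", 2, 20))     # two writes in one core cycle
print(ram2x.core_cycle(("r", 1), ("r", 2)))      # two reads in one core cycle
```

The model hides exactly the part that makes this hard in silicon: the two
clocks have to be related and the crossings between them handled correctly.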

greets,
Staf.

Luke Kenneth Casson Leighton wrote on Tue 24-03-2020 at 22:32 [+0000]:
> https://groups.google.com/d/msg/comp.arch/cbGAlcCjiZE/mgMZVINVIAAJ
> Staf can i ask you the favour of reviewing Mitch's comments about cache
> design?
> in particular the comments about the possibility of using multiported
> SRAM cells as long as only 1R or 1W is done on any given cell?
> also something about doing the FFs yourself, close to the SRAM cells?
> l.


--=-hcKE8vRVlmNXPJh1IKgI--


--===============2511390821872612374==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KbGlicmUtcmlz
Y3YtZGV2IG1haWxpbmcgbGlzdApsaWJyZS1yaXNjdi1kZXZAbGlzdHMubGlicmUtcmlzY3Yub3Jn
Cmh0dHA6Ly9saXN0cy5saWJyZS1yaXNjdi5vcmcvbWFpbG1hbi9saXN0aW5mby9saWJyZS1yaXNj
di1kZXYK

--===============2511390821872612374==--