git.libre-soc.org Git - microwatt.git/commit

author	Paul Mackerras <paulus@ozlabs.org>
	Fri, 18 Dec 2020 22:25:04 +0000 (09:25 +1100)
committer	Paul Mackerras <paulus@ozlabs.org>
	Mon, 18 Jan 2021 11:32:54 +0000 (22:32 +1100)
commit	0fb207be606969e7fb8b55241461596c2792c3dc
tree	ecb6e0d0400c489e14dcbf5e2364e0eb3af28ede
parent	f7b855dfc36cd1d916e019ab31edbcc679077255

fetch1: Implement a simple branch target cache

This implements a cache in fetch1, where each entry stores the address
of a simple branch instruction (b or bc) and the target of the branch.
When fetching sequentially, if the address being fetched matches the
cache entry, then fetching will be redirected to the branch target.
The cache has 1024 entries and is direct-mapped, i.e. indexed by bits
11..2 of the NIA.
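
As a rough behavioural illustration of the direct-mapped lookup described
above (not the actual VHDL in fetch1), the following Python sketch selects
bits 11..2 of the NIA as the index and compares the stored branch address
with the fetch address. The names btc_index, btc_lookup and the dictionary
layout of an entry are hypothetical, and storing the full branch address as
the tag is an assumption made for simplicity.

```python
BTC_ADDR_BITS = 10              # 2**10 = 1024 entries, as stated in the commit message
BTC_SIZE = 1 << BTC_ADDR_BITS


def btc_index(nia: int) -> int:
    """Select bits 11..2 of the next-instruction address (NIA)."""
    return (nia >> 2) & (BTC_SIZE - 1)


def btc_lookup(cache, nia: int):
    """Return the cached branch target for nia, or None if there is no hit.

    Assumption: each entry keeps the full branch address as its tag.
    """
    entry = cache[btc_index(nia)]
    if entry is not None and entry["valid"] and entry["branch_addr"] == nia:
        return entry["target"]
    return None


# One entry for a branch at 0x2004 whose target is 0x3000.
cache = [None] * BTC_SIZE
cache[btc_index(0x2004)] = {"valid": True, "branch_addr": 0x2004, "target": 0x3000}
```

Because the cache is direct-mapped, addresses 0x2004 and 0x3004 share index
1 in this model; the address comparison keeps a fetch of 0x3004 from being
redirected by the entry for 0x2004.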

The bus from execute1 now carries information about taken and
not-taken simple branches, which fetch1 uses to update the cache.
The cache entry is updated for both taken and not-taken branches, with
the valid bit being set if the branch was taken and cleared if the
branch was not taken.
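
The update policy above can be sketched in Python as follows. This is a
simplified model under stated assumptions, not the fetch1 implementation:
the function names and entry layout are hypothetical, and the sketch assumes
the address and target fields are rewritten on every update, with only the
valid bit depending on whether the branch was taken.

```python
BTC_SIZE = 1024  # direct-mapped, indexed by bits 11..2 of the address


def btc_index(addr: int) -> int:
    return (addr >> 2) & (BTC_SIZE - 1)


def btc_update(cache, branch_addr: int, target: int, taken: bool) -> None:
    """Record the outcome of a simple branch reported by execute1.

    The entry is written for taken and not-taken branches alike; the
    valid bit is set only when the branch was taken.
    """
    cache[btc_index(branch_addr)] = {
        "valid": taken,
        "branch_addr": branch_addr,  # assumption: full address kept as tag
        "target": target,
    }


def btc_predict(cache, nia: int):
    """Return the redirect target for nia, or None to keep fetching sequentially."""
    entry = cache[btc_index(nia)]
    if entry is not None and entry["valid"] and entry["branch_addr"] == nia:
        return entry["target"]
    return None


cache = [None] * BTC_SIZE
btc_update(cache, 0x1010, 0x1800, taken=True)
first = btc_predict(cache, 0x1010)     # branch was taken: redirect to 0x1800
btc_update(cache, 0x1010, 0x1800, taken=False)
second = btc_predict(cache, 0x1010)    # branch fell through: no redirect
```

In this model a single not-taken outcome is enough to stop the redirect,
which corresponds to the one-valid-bit behaviour described above.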

If fetching is redirected to the branch target then that goes down the
pipe as a predicted-taken branch, and decode1 does not do any static
branch prediction.  If fetching is not redirected, then the next
instruction goes down the pipe as normal and decode1 does its static
branch prediction.

In order to make timing, the lookup of the cache is pipelined, so on
each cycle the cache entry for the current NIA + 8 is read.  This
means that after a redirect (from decode1 or execute1), only the third
and subsequent sequentially-fetched instructions will be able to be
predicted.
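
To make the consequence of the pipelined lookup concrete, here is a small
Python model of which sequential fetches after a redirect have a lookup
result available. It is only an illustration: the function name is
hypothetical, and the model assumes purely sequential fetching with no
further redirects and ignores the cache contents and exact cycle timing.

```python
def fetches_after_redirect(redirect_target: int, n_fetches: int):
    """List (address, lookup_available) for sequential fetches after a redirect.

    On each cycle the cache entry for NIA + 8 is read, so an address can
    only be predicted if a read for it was started on an earlier cycle.
    """
    looked_up = set()
    result = []
    nia = redirect_target
    for _ in range(n_fetches):
        result.append((nia, nia in looked_up))
        looked_up.add(nia + 8)   # read started this cycle for NIA + 8
        nia += 4                 # sequential fetch: next instruction word
    return result


trace = fetches_after_redirect(0x100, 4)
```

With a redirect to 0x100, the fetches at 0x100 and 0x104 have no lookup
result in this model, while 0x108 and 0x10c do, matching the "third and
subsequent" wording above.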

This improves the CoreMark score on the Arty A7-100 from about 180 to
about 190 (more than 5%).

The BTC is optional.  Builds for the Artix 7 35-T part have it off by
default because the extra ~1420 LUTs it takes mean that the design
doesn't fit on the Arty A7-35 board.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

common.vhdl
core.vhdl
decode1.vhdl
execute1.vhdl
fetch1.vhdl
fpga/top-arty.vhdl
fpga/top-generic.vhdl
fpga/top-nexys-video.vhdl
icache.vhdl
microwatt.core
soc.vhdl