Paul Mackerras [Sat, 2 Jul 2022 04:17:18 +0000 (14:17 +1000)]
FPU: Add stage-2 stall ability to FPU
This makes the FPU able to stall other units at execute stage 2 and be
stalled by other units (specifically the LSU).
This means that the completion and writeback for an instruction can
now end up being deferred until the second cycle of a following
instruction, i.e. the cycle when the state machine has gone through
IDLE state into one of the DO_* states, which means we need to latch
the destination FPR number, CR mask, etc. from the previous
instruction so that we present the correct information to writeback.
The advantage of this is that we can get rid of the in_progress signal
from the LSU.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Tue, 28 Jun 2022 08:18:08 +0000 (18:18 +1000)]
Do CR0 setting for Rc=1 instructions in execute2 instead of writeback
This lets us forward the CR0 result to following instructions that
use CR, meaning they get to issue one cycle earlier.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Mon, 27 Jun 2022 08:53:04 +0000 (18:53 +1000)]
Allow integer instructions and load/store instructions to execute together
Execute1 and loadstore1 now send each other stall signals that
indicate that a valid instruction in stage 2 can't complete in this
cycle, and hence any valid instruction in stage 1 in the other unit
can't move to stage 2. With this in place, an ALU instruction can
move into stage 1 while a LSU instruction is in stage 2.
Since the FPU doesn't yet have a way to stall completion, we can't yet
start FPU instructions while any LSU or ALU instruction is in
progress.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Mon, 27 Jun 2022 22:40:42 +0000 (08:40 +1000)]
Add a bypass path from the execute2 stage
This enables some instructions to issue earlier and thus improves
performance, at the cost of some extra multiplexers in decode2.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Thu, 30 Jun 2022 10:33:33 +0000 (20:33 +1000)]
Add a second execute stage to the pipeline
This adds a second execute stage to the pipeline, in order to match up
the length of the pipeline through loadstore and dcache with the
length through execute1. This will ultimately enable us to get rid of
the 1-cycle bubble that we currently have when issuing ALU
instructions after one or more LSU instructions.
Most ALU instructions execute in the first stage, except for
count-zeroes and popcount instructions (which take two cycles and do
some of their work in the second stage) and mfspr/mtspr to "slow" SPRs
(TB, DEC, PVR, LOGA/LOGD, CFAR). Multiply and divide/mod instructions
take several cycles but the instruction stays in the first stage (ex1)
and ex1.busy is asserted until the operation is complete.
There is currently a bypass from the first stage but not the second
stage. Performance is down somewhat because of that and because this
doesn't yet eliminate the bubble between LSU and ALU instructions.
The forwarding of XER common bits has been changed somewhat because
now there is another pipeline stage between ex1 and the committed
state in cr_file. The simplest thing for now is to record the last
value written and use that, unless there has been a flush, in which
case the committed state (obtained via e_in.xerc) is used.
Note that this fixes what was previously a benign bug in control.vhdl,
where it was possible for control to forget an instruction's dependency
on a value from a previous instruction (a GPR or the CR) if this
instruction writes the value and the instruction gets to the point
where it could issue but is blocked by the busy signal from execute1.
In that situation, control may incorrectly not indicate that a bypass
should be used. That didn't matter previously because, for ALU and
FPU instructions, there was only one previous instruction in flight
and once the current instruction could issue, the previous instruction
was completing and the correct value would be obtained from
register_file or cr_file. For loadstore instructions there could be
two being executed, but because there are no bypass paths, failing to
indicate use of a bypass path is fine.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Sat, 18 Jun 2022 07:29:43 +0000 (17:29 +1000)]
execute1: Rename 'r' to 'ex1'
Maybe this will give us slightly better names in critical path reports
and the like.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Sat, 18 Jun 2022 06:24:30 +0000 (16:24 +1000)]
execute1: Restructure to separate out execution of side effects
We now have a record that represents the actions taken in executing an
instruction, and a process that computes that for the incoming
instruction. We no longer have 'current' or 'r.cur_instr', instead
things like the destination register are put into r.e in the first
cycle of an instruction and not reinitialized in subsequent busy
cycles.
For mfspr and mtspr, we now decode "slow" SPR numbers (those SPRs that
are not stored in the register file) to a new "spr_selector" record
in decode1 (excluding those in the loadstore unit). With this, the
result for mfspr is determined in the data path.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Wed, 29 Jun 2022 10:02:36 +0000 (20:02 +1000)]
Move XER low bits out of register file
Besides the overflow and status carry bits, XER has 18 bits which need
to retain the value written by mtxer (in case software wants to
emulate the move-assist instructions lswi, lswx, stswi and stswx).
Until now these bits (and others) have been stored in the GPR file as
a "fast" SPR, but this causes complications because XER is not really
a fast SPR.
Instead, we now store these 18 bits in the 'ctrl' signal, which exists
in execute1. This will enable us to simplify the data path in future,
and has the added bonus that with a little bit of plumbing, we can get
the full XER value printed when dumping registers at the end of a
simulation.
Therefore this changes scripts/run_test.sh to remove the greps which
exclude XER from the comparison of actual and expected register
results.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Sat, 11 Jun 2022 09:20:57 +0000 (19:20 +1000)]
Simplify flow control in the dcache and loadstore units
Simplify the flow control by stalling the whole upstream pipeline when
a stage can't proceed, instead of trying to let each stage progress
independently when it can.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Thu, 16 Jun 2022 23:46:57 +0000 (09:46 +1000)]
Merge pull request #353 from tianrui-wei/master
fix: fix icache_tb not finishing correctly
Michael Neuling [Thu, 16 Jun 2022 23:13:49 +0000 (09:13 +1000)]
Merge pull request #373 from antonblanchard/icache-insn-u-state
icache: Don't output X on i_out.insn
Michael Neuling [Thu, 16 Jun 2022 06:47:33 +0000 (16:47 +1000)]
Merge pull request #376 from antonblanchard/loadstore-init
loadstore1: reduce U state being output
Michael Neuling [Thu, 16 Jun 2022 06:45:41 +0000 (16:45 +1000)]
Merge pull request #374 from antonblanchard/icache-unused-sig
core: Remove unused icache_inv signal
Michael Neuling [Thu, 16 Jun 2022 04:38:12 +0000 (14:38 +1000)]
Merge pull request #364 from shenki/readme-updates
Readme updates
Michael Neuling [Thu, 16 Jun 2022 04:36:50 +0000 (14:36 +1000)]
Merge pull request #372 from antonblanchard/dcache-unused-sig
dcache: remove unused do_write signal
Michael Neuling [Thu, 16 Jun 2022 04:35:10 +0000 (14:35 +1000)]
Merge pull request #371 from antonblanchard/unused-sig
execute1: sub_mux_sel and result_mux_sel are unused
Michael Neuling [Thu, 16 Jun 2022 04:33:45 +0000 (14:33 +1000)]
Merge pull request #370 from antonblanchard/divider-init
divider: Fix d_out.overflow U state issue
Paul Mackerras [Wed, 15 Jun 2022 01:02:58 +0000 (11:02 +1000)]
Merge pull request #368 from antonblanchard/icache-pmu-events
icache: Hook up PMU events
Anton Blanchard [Tue, 14 Jun 2022 08:10:37 +0000 (18:10 +1000)]
Merge pull request #377 from antonblanchard/fpu-init
fpu: Reduce uninitialised signals
Anton Blanchard [Tue, 14 Jun 2022 05:14:19 +0000 (15:14 +1000)]
fpu: Reduce uninitialised signals
Reduce uninitialised signals coming out of the FPU.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Michael Neuling [Tue, 14 Jun 2022 03:09:57 +0000 (13:09 +1000)]
Merge pull request #366 from antonblanchard/hello-world-bss
Zero BSS in hello world test
Anton Blanchard [Sun, 12 Jun 2022 21:15:55 +0000 (07:15 +1000)]
Merge pull request #375 from antonblanchard/core_debug-init
core_debug: Initialise gspr_index
Anton Blanchard [Sun, 12 Jun 2022 12:15:11 +0000 (22:15 +1000)]
loadstore1: reduce U state being output
While these signals should only be read when valid is true, they
are only a small number of bits and we want to reduce the amount of
U/X state bouncing around the chip.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sun, 12 Jun 2022 11:49:13 +0000 (21:49 +1000)]
core_debug: Initialise gspr_index
Another case of U state being driven out of a module.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sun, 12 Jun 2022 11:04:16 +0000 (21:04 +1000)]
core: Remove unused icache_inv signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sun, 12 Jun 2022 01:42:32 +0000 (11:42 +1000)]
icache: Don't output X on i_out.insn
decode1 has a lot of logic that uses i_out.insn without first looking at
i_out.valid. Play it safe and never output X state.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sun, 12 Jun 2022 01:39:31 +0000 (11:39 +1000)]
dcache: remove unused do_write signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sun, 12 Jun 2022 00:49:26 +0000 (10:49 +1000)]
execute1: sub_mux_sel and result_mux_sel are unused
Remove them.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sun, 12 Jun 2022 00:34:20 +0000 (10:34 +1000)]
divider: Fix d_out.overflow U state issue
While we should only look at this when d_out.valid = 1, we may as well remove
some U state across interfaces.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sun, 12 Jun 2022 00:24:54 +0000 (10:24 +1000)]
Merge pull request #369 from antonblanchard/loadstore-pmu-init
loadstore1: Initialise PMU events
Anton Blanchard [Sat, 11 Jun 2022 23:29:46 +0000 (09:29 +1000)]
loadstore1: Initialise PMU events
The loadstore1 PMU events are U state until a load and a store complete.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Sat, 11 Jun 2022 23:32:59 +0000 (09:32 +1000)]
Merge pull request #367 from antonblanchard/fpu-typo
fpu: Fix capitalisation of Execute1ToFPUType
Anton Blanchard [Sat, 11 Jun 2022 23:21:56 +0000 (09:21 +1000)]
icache: Hook up PMU events
We weren't connecting the icache PMU events up.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Thu, 9 Jun 2022 22:10:27 +0000 (08:10 +1000)]
fpu: Fix capitalisation of Execute1ToFPUType
While this is not an issue in VHDL, I noticed this when running
a script over the source and we may as well fix it.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Wed, 8 Jun 2022 05:20:07 +0000 (15:20 +1000)]
Zero BSS in hello world test
While trying to reduce U/X state issues, I noticed that our BSS is not
being initialised in the hello world test.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Wed, 8 Jun 2022 04:54:48 +0000 (14:54 +1000)]
Merge pull request #365 from antonblanchard/less-fpga-init
Remove some FPGA style signal inits
Anton Blanchard [Tue, 7 Jun 2022 10:01:14 +0000 (20:01 +1000)]
Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Tue, 7 Jun 2022 07:38:24 +0000 (17:38 +1000)]
Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Joel Stanley [Tue, 7 Jun 2022 03:20:03 +0000 (12:50 +0930)]
README: Add Linux on Microwatt instructions
These instructions are similar to those at
https://ozlabs.org/~joel/microwatt/README
except they describe how to build the artifacts from scratch instead of
downloading them.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Joel Stanley [Tue, 7 Jun 2022 03:18:42 +0000 (12:48 +0930)]
README: Add uart to fusesoc instructions
The SoC defaults to using the uart16550 so provide instructions on how
to fetch that library when setting up fusesoc.
Also remove the text about a working directory; fusesoc doesn't need
one.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Michael Neuling [Tue, 22 Mar 2022 00:55:54 +0000 (11:55 +1100)]
Merge pull request #361 from antonblanchard/alt-reset-address
Allow ALT_RESET_ADDRESS to be overridden
Anton Blanchard [Mon, 21 Mar 2022 22:35:17 +0000 (09:35 +1100)]
Allow ALT_RESET_ADDRESS to be overridden
This allows us to boot from flash for example.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Michael Neuling [Fri, 18 Mar 2022 07:28:34 +0000 (18:28 +1100)]
Merge pull request #360 from antonblanchard/log2ceil-issue
wishbone_bram_wrapper ram_addr_bits is 1 bit off
Anton Blanchard [Thu, 17 Mar 2022 07:03:29 +0000 (18:03 +1100)]
wishbone_bram_wrapper ram_addr_bits is 1 bit off
log2ceil() returns the number of bits required to store a value, so we
need to pass in memory_size-1, not memory_size.
Every other user of log2ceil() gets this right.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
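The off-by-one described above can be illustrated with a minimal Python sketch. It assumes, as the message states, that log2ceil() returns the number of bits required to store its argument; the Python body is a stand-in for the VHDL helper, and addr_bits() and memory_size are illustrative names, not the wrapper's actual code.

```python
def log2ceil(value: int) -> int:
    """Number of bits required to store `value` (stand-in for the VHDL helper)."""
    return value.bit_length()


def addr_bits(memory_size: int) -> int:
    """Address width for a memory with `memory_size` addressable locations."""
    # The highest address is memory_size - 1, so that is the value to size for.
    return log2ceil(memory_size - 1)


if __name__ == "__main__":
    size = 4096
    print("log2ceil(size)     =", log2ceil(size))      # one bit too many
    print("log2ceil(size - 1) =", addr_bits(size))     # the intended width
```

For a power-of-two size, passing memory_size directly yields one bit more than is needed to address every location.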
Michael Neuling [Tue, 15 Mar 2022 23:49:47 +0000 (10:49 +1100)]
Merge pull request #358 from antonblanchard/unused-sig
Remove unused sequential signal from Fetch1ToIcacheType
Michael Neuling [Tue, 15 Mar 2022 23:49:29 +0000 (10:49 +1100)]
Merge pull request #356 from antonblanchard/fpu-constant
fpu: Make inverse_table a constant
Michael Neuling [Tue, 15 Mar 2022 23:48:59 +0000 (10:48 +1100)]
Merge pull request #357 from antonblanchard/xics-warning
xics: Fix warning when comparing two std_ulogic_vectors
Anton Blanchard [Tue, 15 Mar 2022 07:27:48 +0000 (18:27 +1100)]
Remove unused sequential signal from Fetch1ToIcacheType
GHDL synthesis is flagging a warning about this.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Tue, 15 Mar 2022 05:04:18 +0000 (16:04 +1100)]
xics: Fix warning when comparing two std_ulogic_vectors
Use unsigned() to make it clear what we are doing.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard [Tue, 15 Mar 2022 05:03:34 +0000 (16:03 +1100)]
fpu: Make inverse_table a constant
GHDL synthesis is complaining that inverse_table is never stored to.
Change it to a constant.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Tianrui Wei [Tue, 1 Mar 2022 15:51:35 +0000 (23:51 +0800)]
fix: fix icache_tb not finishing correctly
Set the icache to be privileged and to access physical memory directly.
Also set big_endian to 0 to correspond to the testbench result.
Signed-off-by: Tianrui Wei <tianrui@tianruiwei.com>
Michael Neuling [Sun, 27 Feb 2022 21:17:50 +0000 (08:17 +1100)]
Merge pull request #352 from mkj/static-urjtag
mw_debug: Add STATIC_URJTAG flag
Matt Johnston [Fri, 25 Feb 2022 09:43:28 +0000 (17:43 +0800)]
mw_debug: Add STATIC_URJTAG flag
Revert to linking dynamically by default, can statically link with
`make STATIC_URJTAG=1`
Fixes #351
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Michael Neuling [Fri, 25 Feb 2022 02:18:38 +0000 (13:18 +1100)]
Update the README Issues (#350)
We've had these for a while now:
- D/I cache
- GPR bypassing
- Supervisor state (and can boot linux)
We still need Vector/VMX/VSX (and probably some other things)
Signed-off-by: Michael Neuling <mikey@neuling.org>
Michael Neuling [Fri, 25 Feb 2022 00:08:57 +0000 (11:08 +1100)]
Merge pull request #349 from madscientist159/master
Extend LiteDRAM VHDL wrapper to allow more than one clock line
Raptor Engineering Development Team [Tue, 22 Feb 2022 17:49:33 +0000 (11:49 -0600)]
Extend LiteDRAM VHDL wrapper to allow more than one clock line
This is necessary for the upcoming Arctic Tern system enablement,
since Arctic Tern uses two DRAM devices and a separate clock line
is routed to each device. LiteX handles this behavior correctly,
therefore we assume other hardware exists that uses a similar
DRAM clock design.
Updates from Mikey to fix some compile issues.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Michael Neuling [Wed, 23 Feb 2022 01:03:59 +0000 (12:03 +1100)]
Merge pull request #348 from paulusmack/reduce
Reduce LUT usage
Paul Mackerras [Tue, 19 Oct 2021 04:13:31 +0000 (15:13 +1100)]
xics: Rework the irq_gen process
At present, the loop in the irq_gen process generates a chain of
comparators and other logic to work out the source number and priority
of the most-favoured (lowest priority number) pending interrupt.
This replaces that chain with (1) logic to generate an array of bits,
one per priority, indicating whether any interrupt is pending at that
priority, (2) a priority encoder to select the most favoured priority
with an interrupt pending, (3) logic to generate an array of bits, one
per source, indicating whether an interrupt is pending at the priority
calculated in step 2, and (4) a priority encoder to work out the
lowest numbered source that has an interrupt pending at the selected
priority. This reduces LUT utilization.
The priority encoder function implemented here uses the optimized
count-leading-zeroes logic from helpers.vhdl.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
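The four steps described above can be sketched in Python as follows. This is a behavioural illustration only, not the VHDL: the function names, the list-based representation, and the num_priorities parameter are assumptions made for the example, and masked priorities are not modelled.

```python
def priority_encode(bits):
    """Priority encoder: index of the first set entry, or None if none is set."""
    for index, bit in enumerate(bits):
        if bit:
            return index
    return None


def most_favoured(pending, priority, num_priorities):
    """Return (source, priority) of the most-favoured pending interrupt.

    pending[i]  -- True if source i has an interrupt pending
    priority[i] -- priority number of source i (lower is more favoured)
    """
    # (1) One bit per priority: is any interrupt pending at that priority?
    pri_pending = [False] * num_priorities
    for is_pending, pri in zip(pending, priority):
        if is_pending:
            pri_pending[pri] = True

    # (2) Priority encoder: the lowest priority number with something pending.
    best_pri = priority_encode(pri_pending)
    if best_pri is None:
        return None

    # (3) One bit per source: is it pending at the selected priority?
    src_match = [p and pri == best_pri for p, pri in zip(pending, priority)]

    # (4) Priority encoder: the lowest-numbered matching source.
    return priority_encode(src_match), best_pri


if __name__ == "__main__":
    print(most_favoured([False, True, True, True], [0, 5, 2, 2], 8))
```

Each step is a flat array of bits or a single encoder, which is what replaces the serial chain of comparators in the original loop.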
Paul Mackerras [Mon, 21 Feb 2022 01:06:11 +0000 (12:06 +1100)]
Use alternative count-leading-zeroes algorithm in the FPU and LSU
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Sun, 20 Feb 2022 22:58:07 +0000 (09:58 +1100)]
countzero: Use alternative algorithm for higher bits
This implements an alternative count-leading-zeroes algorithm which
uses fewer LUTs to generate the higher-order bits (2..5) of the
result.
By doing (v | -v) rather than (v & -v), we get a value which has ones
from the MSB down to the rightmost 1 bit in v and then zeroes down to
the LSB. This means that we can generate the MSB of the result (the
index of the rightmost 1 bit in v) just by looking at bits 63 and 31
of (v | -v), assuming that v is 64 bits. Bit 4 of the result requires
looking at bits 63, 47, 31 and 15. In contrast, each bit of the
result using (v & -v), which has a single 1, requires ORing together
32 bits.
It turns out that the minimum LUT usage comes from using (v & -v) to
generate bits 0 and 1 of the result, and using (v | -v) to generate
bits 2 to 5. This saves almost 60 6-input LUTs on the Artix-7.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
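The split between (v & -v) and (v | -v) can be checked with a small Python model. This is a sketch of the bit selection the message describes, assuming a nonzero 64-bit v; the function name and the loop structure are illustrative and do not mirror the VHDL.

```python
MASK64 = (1 << 64) - 1


def rightmost_one_index(v: int) -> int:
    """Index (0..63) of the rightmost 1 bit of a nonzero 64-bit value."""
    if not 0 < v <= MASK64:
        raise ValueError("v must be a nonzero 64-bit value")
    neg = (-v) & MASK64
    onehot = v & neg   # a single 1 at the rightmost set bit of v
    ormask = v | neg   # ones from bit 63 down to that bit, zeroes below it

    result = 0
    # Bits 0 and 1: OR together the 32 one-hot bits whose index has that bit set.
    if onehot & 0xAAAAAAAAAAAAAAAA:
        result |= 1
    if onehot & 0xCCCCCCCCCCCCCCCC:
        result |= 2
    # Bits 2..5: inspect only block-boundary bits of (v | -v).
    # For bit 5 that is bits 63 and 31; for bit 4, bits 63, 47, 31 and 15.
    for k in range(2, 6):
        half = 1 << k
        for base in range(0, 64, 2 * half):
            top = (ormask >> (base + 2 * half - 1)) & 1
            mid = (ormask >> (base + half - 1)) & 1
            if top and not mid:
                # The rightmost 1 lies in the upper half of this block.
                result |= half
                break
    return result


if __name__ == "__main__":
    print(rightmost_one_index(0x50))
```

Result bit k is set exactly when the rightmost 1 falls in the upper half of a block of 2^(k+1) bits, which shows up in (v | -v) as a 1 at the top of the block and a 0 at the top of its lower half.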
Paul Mackerras [Mon, 11 Oct 2021 06:46:44 +0000 (17:46 +1100)]
soc: Re-do peripheral address decode to improve timing
This generates a series of io_cycle_* signals which are clean latches
and which become the 'cyc' signals of the wishbone buses going to
various peripherals (syscon, uarts, XICS, GPIO, etc.). Effectively
this is done by moving the address decoding into the slave_io_latch
process. The slave_io_type, which drives the multiplexer which
selects which wishbone to look for a response on, is reduced to just 8
values in the expectation that an 8-way multiplexer will use less
logic than one with more than 8 inputs.
With this, timing is considerably better on the A7-100T.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Michael Neuling [Mon, 7 Feb 2022 22:09:22 +0000 (09:09 +1100)]
Merge pull request #346 from mkj/dmi_ecp5
Add DMI and mw_debug for ECP5
Anton Blanchard [Mon, 7 Feb 2022 06:57:08 +0000 (17:57 +1100)]
Merge pull request #343 from mikey/orange-crab-ci
ci: Add new Orange Crab build
Matt Johnston [Fri, 4 Feb 2022 07:29:40 +0000 (15:29 +0800)]
mw_debug: Add Lattice ECP5 support
"-b ecp5" will select ECP5 interface that talks to a JTAGG
primitive.
For example with a FT232H JTAG board:
./mw_debug -t 'ft2232 vid=0x0403 pid=0x6014' -s 30000000 -b ecp5 mr ff003888 6
Connected to libftdi driver.
Found device ID: 0x41113043
00000000ff003888: 6d6f636c65570a0a ..Welcom
00000000ff003890: 63694d206f742065 e to Mic
00000000ff003898: 2120747461776f72 rowatt !
00000000ff0038a0: 0000000000000a0a ........
00000000ff0038a8: 67697320636f5320 Soc sig
00000000ff0038b0: 203a65727574616e nature:
Core: running
NIA: c0000000000187f8
MSR: 9000000000001033
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 26 Nov 2021 02:47:07 +0000 (10:47 +0800)]
dmi_dtm_ecp5: Use ECP5 JTAGG for DMI
This uses the JTAGG primitive which is similar to BSCANE2.
The LUT4 delay approach came from Florian and Greg in
https://github.com/enjoy-digital/litex/pull/1087
Has been tested on an OrangeCrab with 48MHz sysclk
FT232H up to 30MHz (though libusb/urjtag is by far the bottleneck vs
the JTAG clock)
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 4 Feb 2022 06:40:42 +0000 (14:40 +0800)]
mw_debug: Link urjtag statically
liburjtag isn't in Debian, so usually we're pointing at a urjtag
build directory when building mw_debug
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 4 Feb 2022 04:08:07 +0000 (12:08 +0800)]
mw_debug: use isxdigit for hex arguments
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 26 Nov 2021 02:43:06 +0000 (10:43 +0800)]
mw_debug: Add -s frequency argument
Chose -s for speed, vs -f for --force
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Thu, 25 Nov 2021 06:12:13 +0000 (14:12 +0800)]
mw_debug: pass target parameters to urjtag
An example
./mw_debug -d -t 'ft2232 vid=0x0403 pid=0x6014'
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Paul Mackerras [Mon, 11 Oct 2021 06:23:08 +0000 (17:23 +1100)]
fetch1/icache1: Remove the use_previous logic
This removes logic that I added some time ago with the thought that it
would enable us to do prefetching in the icache. This logic detects
when the fetch address is an odd multiple of 4 and the next address in
sequence from the previous cycle. In that case the instruction we
want is in the output register of the icache RAM already so there is
no need to do another read or any icache tag or TLB lookup.
However, this logic adds complexity, and removing it improves timing,
so this removes it.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
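For reference, the condition that the removed logic detected can be written as a short predicate. This is a Python sketch assuming byte addresses and 4-byte instructions; the function and argument names are illustrative.

```python
def use_previous(fetch_addr: int, prev_addr: int) -> bool:
    """True when the fetch could have reused the icache RAM output register."""
    # An odd multiple of 4 is an address of the form 8*n + 4.
    odd_multiple_of_4 = (fetch_addr & 0x7) == 0x4
    # The fetch must directly follow the previous cycle's address.
    sequential = fetch_addr == prev_addr + 4
    return odd_multiple_of_4 and sequential


if __name__ == "__main__":
    print(use_previous(0x104, 0x100))
    print(use_previous(0x108, 0x104))
```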
Paul Mackerras [Fri, 4 Feb 2022 00:43:42 +0000 (11:43 +1100)]
Merge pull request #345 from antonblanchard/popcnt-go-fast
popcnt* timing improvements from Paul
Paul Mackerras [Tue, 19 Oct 2021 01:22:10 +0000 (12:22 +1100)]
core: Make popcnt* take two cycles
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result. The motivation for this is that the popcnt*
calculation was showing up as a critical path.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Michael Neuling [Tue, 18 Jan 2022 01:41:03 +0000 (12:41 +1100)]
ci: Add new Orange Crab build
This builds the Orange Crab v0.21 + litedram image
Signed-off-by: Michael Neuling <mikey@neuling.org>
Michael Neuling [Tue, 18 Jan 2022 02:27:27 +0000 (13:27 +1100)]
Merge pull request #342 from mkj/orangecrab-merge
Orangecrab working with litedram
Fixed up a few simple merge conflicts in the Makefile.
Michael Neuling [Tue, 18 Jan 2022 01:03:46 +0000 (12:03 +1100)]
Merge branch 'master' into orangecrab-merge
Michael Neuling [Tue, 18 Jan 2022 00:51:54 +0000 (11:51 +1100)]
Merge pull request #341 from mkj/progtools
orangecrab programming targets
Michael Neuling [Tue, 18 Jan 2022 00:50:22 +0000 (11:50 +1100)]
Merge pull request #340 from mkj/orangecrab-ghdl-plugin
Makefile: detect when ghdl is a yosys plugin
Matt Johnston [Fri, 19 Nov 2021 05:13:15 +0000 (13:13 +0800)]
orangecrab: Fix sdcard wishbone addressing
Orangecrab missed out on:
Make wishbone addresses be in units of doublewords or words
Author: Paul Mackerras <paulus@ozlabs.org>
Date: Wed Sep 15 18:18:09 2021 +1000
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 14 Jan 2022 00:04:18 +0000 (08:04 +0800)]
orangecrab: use litesdcard
Currently not working (tested in Linux)
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 14 Jan 2022 00:08:09 +0000 (08:08 +0800)]
litesdcard: add lattice, regenerate
Modifies litesdcard generate script to take a clock speed.
Regenerated verilog with latest litesdcard
e52c731 ("Bump year.")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Mon, 23 Aug 2021 02:30:40 +0000 (10:30 +0800)]
orangecrab: No BTC, LOG_LENGTH, dram NUM_LINES
Reduce litedram NUM_LINES 64->8
This allows us to meet timing. Can probably
be improved in future with better BRAM usage.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Thu, 13 Jan 2022 08:51:57 +0000 (16:51 +0800)]
orangecrab: Use litedram
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Wed, 24 Nov 2021 08:47:16 +0000 (16:47 +0800)]
orangecrab: set HAS_SHORT_MULT
It seems free, generated as a single MULT18X18D
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Wed, 11 Aug 2021 05:11:57 +0000 (13:11 +0800)]
orangecrab: add Orange Crab r0.2 target
top-orangecrab0.2 is a copy of top-arty with various changes.
USRMCLK is added for the SPI clock
ethernet is removed
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 13 Aug 2021 02:07:15 +0000 (10:07 +0800)]
litedram: Add orangecrab-85-0.2 target
Parameters are based on
https://github.com/gregdavill/OrangeCrab-test-sw/blob/main/hw/OrangeCrab-bitstream.py
and litex-boards orangecrab.py
rtt_nom and cmd_delay are overridden for OrangeCrab, we do the same here.
Generated with litedram and litex
62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.")
add2746a ("tools/litex_cli: Rename wb to bus.")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 24 Sep 2021 04:24:29 +0000 (12:24 +0800)]
litedram: set Makefile -Werror
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Thu, 13 Jan 2022 23:23:35 +0000 (07:23 +0800)]
litedram: disable block_until_ready, regenerate
Recent litedram gets stuck at memtest unless block_until_ready=False.
(discussion in https://github.com/enjoy-digital/litedram/pull/292)
This change regenerates with latest litedram and litex
62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.")
add2746a ("tools/litex_cli: Rename wb to bus.")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Fri, 26 Nov 2021 02:33:55 +0000 (10:33 +0800)]
Makefile: add ecpprog targets
The 0x80000 offset is specific to the OrangeCrab bootloader.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Wed, 11 Aug 2021 05:07:34 +0000 (13:07 +0800)]
Makefile: Add DFU programming
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Matt Johnston [Wed, 11 Aug 2021 07:17:39 +0000 (15:17 +0800)]
Makefile: detect when ghdl is a yosys plugin
oss-cad-suite builds it as a plugin, some other toolchains
have it built in.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Anton Blanchard [Sat, 8 Jan 2022 21:08:48 +0000 (08:08 +1100)]
Merge pull request #338 from shenki/yosys-read-verilog
Makefile: Use read_verilog with yosys
Joel Stanley [Mon, 20 Dec 2021 22:32:24 +0000 (09:02 +1030)]
Makefile: Use read_verilog with yosys
Yosys changed command line behaviour following the v0.12 release. Work
around this by using read_verilog, which maintains the old behaviour.
This should work fine for current yosys and be compatible with
future releases.
See https://github.com/YosysHQ/yosys/issues/3109
Signed-off-by: Joel Stanley <joel@jms.id.au>
Michael Neuling [Mon, 25 Oct 2021 05:49:19 +0000 (16:49 +1100)]
Merge pull request #337 from paulusmack/fixes
ECP5: Adjust PLL constants so the PLL lock indication works
Paul Mackerras [Sat, 16 Oct 2021 08:24:14 +0000 (19:24 +1100)]
ECP5: Adjust PLL constants so the PLL lock indication works
At present, code (such as simple_random) which produces serial port
output during the first few milliseconds of operation produces garbled
output. The reason is that the clock has not yet stabilized and is
running slow, resulting in the bit time of the serial characters being
too long.
The ECP5 data sheet says that the phase detector should be operated
between 10 and 400 MHz. The current code operates it at 2MHz.
Consequently, the PLL lock indication doesn't work, i.e. it is always
zero. The current code works around that by inverting it, i.e. taking
the "not locked" indication to mean "locked".
Instead, we now run it at 12MHz, chosen because the common external
clock inputs on ECP5 boards are 12MHz and 48MHz. Normally this would
mean that the available system clock frequencies would be multiples of
12MHz, but this is a little inconvenient as we use 40MHz on the Orange
Crab v0.21 boards. Instead, by using the secondary clock output for
feedback, we can have any divisor of the PLL frequency as the system
clock frequency.
The ECP5 data sheet says the PLL oscillator can run at 400 to 800
MHz. Here we choose 480MHz since that allows us to generate 40MHz and
48MHz easily and is a multiple of 12MHz.
With this, the lock signal works correctly, and the inversion can be
removed.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
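The frequency arithmetic above can be worked through in a few lines of Python. The 12MHz phase detector frequency, the 480MHz oscillator frequency and the data sheet ranges are taken from the message; the function name and the dictionary keys are hypothetical and are not the ECP5 PLL primitive's parameter names.

```python
PFD_MHZ = 12    # phase detector frequency chosen in the commit
VCO_MHZ = 480   # PLL oscillator frequency chosen in the commit


def pll_dividers(clk_in_mhz: int, sys_clk_mhz: int) -> dict:
    """Integer dividers for a given external clock and system clock (in MHz)."""
    # Ranges quoted from the ECP5 data sheet in the commit message.
    assert 10 <= PFD_MHZ <= 400
    assert 400 <= VCO_MHZ <= 800
    if clk_in_mhz % PFD_MHZ != 0:
        raise ValueError("external clock must be a multiple of 12 MHz")
    if VCO_MHZ % sys_clk_mhz != 0:
        raise ValueError("system clock must divide 480 MHz exactly")
    return {
        "ref_div": clk_in_mhz // PFD_MHZ,   # external clock -> 12 MHz
        "vco_mult": VCO_MHZ // PFD_MHZ,     # 12 MHz -> 480 MHz via feedback
        "sys_div": VCO_MHZ // sys_clk_mhz,  # 480 MHz -> system clock
    }


if __name__ == "__main__":
    print(pll_dividers(48, 40))
    print(pll_dividers(12, 48))
```

With these choices a 48MHz input is divided by 4 to reach the phase detector, and both 40MHz (480/12) and 48MHz (480/10) come out as exact integer divisions of the oscillator frequency.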
Michael Neuling [Wed, 13 Oct 2021 06:44:47 +0000 (17:44 +1100)]
Merge pull request #336 from paulusmack/fixes
Makefile: Correct parameters for the Orange Crab 85F
Paul Mackerras [Tue, 12 Oct 2021 07:30:36 +0000 (18:30 +1100)]
Makefile: Add a target for the Orange Crab v0.21 with LFE5U-85F
The existing orange crab target is for an older board with a
LFE5UM5G-85F device. Newer orange crab boards (v0.21) have a
LFE5U-85F device in the -8 speed grade, so make a new target for them
called ORANGE-CRAB-0.21.
Also add flags to ecppack to indicate that the bitstream should be
compressed and can be loaded at 38.8MHz.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Michael Neuling [Mon, 27 Sep 2021 23:06:18 +0000 (09:06 +1000)]
Merge pull request #334 from antonblanchard/icbi-issue
Add a test for icbi and dcbz issues
Anton Blanchard [Mon, 27 Sep 2021 20:18:59 +0000 (06:18 +1000)]
Merge pull request #335 from ozbenh/misc
Misc cleanups and icache fix
Benjamin Herrenschmidt [Mon, 27 Sep 2021 12:03:18 +0000 (22:03 +1000)]
icache: req_laddr becomes req_raddr
Uses real_addr_t and only stores the real address bits
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 27 Sep 2021 11:53:52 +0000 (21:53 +1000)]
Introduce addr_to_wb() and wb_to_addr() helpers
These convert addresses to/from wishbone addresses, and use them
in parts of the caches, in order to make the code a bit more readable.
Along the way, rename some functions in the caches to make it a bit
clearer what they operate on and fix a bug in the icache STOP_RELOAD state where
the wb address wasn't properly converted.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
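As a rough model of what such helpers do, the Python sketch below assumes wishbone addresses count 8-byte doublewords, so the conversion drops or restores the low 3 bits of a byte address. The constant name and the integer representation are assumptions for illustration; the real helpers operate on VHDL vectors.

```python
WB_LOG2_WIDTH = 3  # assumption: 64-bit (8-byte) wishbone data bus


def addr_to_wb(addr: int) -> int:
    """Byte address -> wishbone address (in units of the bus width)."""
    return addr >> WB_LOG2_WIDTH


def wb_to_addr(wb_addr: int) -> int:
    """Wishbone address -> byte address of the start of that bus word."""
    return wb_addr << WB_LOG2_WIDTH


if __name__ == "__main__":
    print(hex(addr_to_wb(0x1008)))
    print(hex(wb_to_addr(0x201)))
```

Converting to a wishbone address and back aligns a byte address down to the start of its bus word, which is why a missed conversion yields an address that is off by the bus-width factor.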