Luke Kenneth Casson Leighton [Wed, 10 Oct 2018 14:01:42 +0000 (15:01 +0100)]
add operators test class
Luke Kenneth Casson Leighton [Wed, 10 Oct 2018 08:49:43 +0000 (09:49 +0100)]
add operators library to contain operator-overloads of +/-/*/div/>/>= etc
for all types int8, uint8, float16_t, float32_t etc.
all to be template-ified
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 18:33:53 +0000 (19:33 +0100)]
get predicated-vectorised branch working
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 16:01:12 +0000 (17:01 +0100)]
save branch address and predication merged result, and test after branch
if the predication was all good, go ahead with the branch
still not tested for predicated / vectorised branch yet however
at least scalar branches work
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 15:17:42 +0000 (16:17 +0100)]
add explanatory comment
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 15:02:00 +0000 (16:02 +0100)]
add explanatory comment
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 14:51:12 +0000 (15:51 +0100)]
start adding explicit twin-predicated branch identification (rs2)
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 10:39:49 +0000 (11:39 +0100)]
extend sv register file from 64 to 128 after discussion.
evaluation of even embedded GPUs shows that they have really enormous
register files. an fp16 x 4 to express quads, times four, takes up eight
consecutive registers just on its own.
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 14:06:39 +0000 (15:06 +0100)]
override setpc macro so that sv can redirect it in branch
this is slightly complicated. setpc is a global macro, intended for
use in the templates. however for SV it needs to be conditional,
so needs to redirect to a function in sv_insn_t.
that in turn needs quite a few extra parameters: the current loop
element offset, the argument to set_pc (in case it turns out that this
really is a branch rather than a predication scenario), and so on.
have not tried out vectorisation yet, but at least straight non-vectorised
branch operations (unit tests) pass.
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 08:09:21 +0000 (09:09 +0100)]
swap #ifdef USING_NOREGS so that it is possible to redefine set_pc
to avoid having to modify the branch instructions the set_pc macro will
be #undefd and redefined to modify the predication target
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 07:03:11 +0000 (08:03 +0100)]
add rd bit-setting function
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 05:52:11 +0000 (06:52 +0100)]
add extra debug printing for c.lwsp
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 03:51:10 +0000 (04:51 +0100)]
add rvc_swsp_imm sv overload, provides vector unit stride now
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 19:18:32 +0000 (20:18 +0100)]
c.swsp and c.fswsp predication and offset enabling
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 16:07:16 +0000 (17:07 +0100)]
allow x2 (sp) to be redirected in C.LWSP
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 15:49:20 +0000 (16:49 +0100)]
temporary hack disabling SV in anything other than user mode
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 15:48:55 +0000 (16:48 +0100)]
whoops inverted ldsp and lwsp immediates
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 09:24:25 +0000 (10:24 +0100)]
create ldsp immediate offset overrides
this allows C.LWSP/C.LDSP to do predicated load/stores from contiguous
blocks of memory
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 05:51:46 +0000 (06:51 +0100)]
add in predication for immediate, for C.LWSP
Luke Kenneth Casson Leighton [Fri, 5 Oct 2018 14:26:20 +0000 (15:26 +0100)]
reorganise src and dest vector-element offsets
pass in pointer to offsets from processor_t->state.srcoffs and destoffs
into sv_insn_t so that the element state information about how many
parallel elements have been executed is recorded.
in this way it will be possible to restart a loop at the right place
if there is an exception (such as a memory cache miss on a LOAD/STORE)
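
A rough sketch of the restart idea described above, as a Python model rather than the simulator's C++. The names `OffsetState`, `run_elements` and `demo` are hypothetical, and the single shared step for src and dest offsets is a simplifying assumption: progress lives outside the loop, so an exception part-way through leaves the offsets pointing at the element to retry.

```python
class OffsetState:
    """Hypothetical stand-in for the srcoffs/destoffs held in processor state."""
    def __init__(self):
        self.srcoffs = 0
        self.destoffs = 0

def run_elements(state, vl, op):
    """Run op on each remaining element, recording progress in state."""
    while state.destoffs < vl:
        op(state.srcoffs, state.destoffs)   # may raise part-way through
        state.srcoffs += 1
        state.destoffs += 1
    state.srcoffs = state.destoffs = 0      # loop complete: reset for next insn

def demo(vl=4, fault_at=2):
    """Fault once at element fault_at (assumed < vl), then resume."""
    state, done, faulted = OffsetState(), [], []
    def op(src, dest):
        if dest == fault_at and not faulted:
            faulted.append(dest)
            raise MemoryError("simulated cache miss")
        done.append(dest)
    try:
        run_elements(state, vl, op)
    except MemoryError:
        pass
    offs_at_fault = state.destoffs           # where the loop stopped
    run_elements(state, vl, op)              # restart at the recorded offset
    return offs_at_fault, done
```

In this model no element is executed twice: the second call picks up at the faulting element because the offsets were only advanced after each successful operation.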
Luke Kenneth Casson Leighton [Fri, 5 Oct 2018 14:09:48 +0000 (15:09 +0100)]
add srcoffs and destoffs sv state, alter CSRs
remove SVREALVL, replace with SVSTATE
Luke Kenneth Casson Leighton [Thu, 4 Oct 2018 13:11:02 +0000 (14:11 +0100)]
reorganise twin-predication
move the offset incrementing to outside of the sv_insn_t, and pass in
the src_offs and dest_offs by reference, mirroring and matching the
predication src and dest referencing
Luke Kenneth Casson Leighton [Thu, 4 Oct 2018 09:16:10 +0000 (10:16 +0100)]
big reorganisation to support twin-predication
twin predication is on certain operations like LOAD, STORE, C.MV, FCVT
where the effect is similar to VSPLAT, VINSERT, and also bitmanip
scatter/gather.
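
To illustrate why twin predication resembles scatter/gather, here is a minimal Python model. It is an assumption about the general idea, not the simulator code: `twin_predicated_move` is a hypothetical name, predicates are plain integer bitmasks, and the source and destination offsets each skip their own masked-out elements independently.

```python
def twin_predicated_move(src, dest, src_pred, dest_pred, vl):
    """Copy src elements to dest, skipping masked-out elements on each side."""
    out = list(dest)
    i = j = 0                      # independent src and dest element offsets
    while i < vl and j < vl:
        if not (src_pred >> i) & 1:
            i += 1                 # source element masked out: skip it
            continue
        if not (dest_pred >> j) & 1:
            j += 1                 # destination element masked out: skip it
            continue
        out[j] = src[i]
        i += 1
        j += 1
    return out
```

With a sparse source predicate and a full destination predicate the selected elements are packed together (gather-like); with the roles reversed they are spread out (scatter-like).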
Luke Kenneth Casson Leighton [Wed, 3 Oct 2018 09:42:23 +0000 (10:42 +0100)]
add in twin-predication identification
pass in second predicate for twin-predication operands
Luke Kenneth Casson Leighton [Wed, 3 Oct 2018 06:16:34 +0000 (07:16 +0100)]
decided not to change the behaviour of LOAD/STORE
Luke Kenneth Casson Leighton [Tue, 2 Oct 2018 15:01:22 +0000 (16:01 +0100)]
start work on parallelising LOAD, pass in parameter to reinterpret immed
Luke Kenneth Casson Leighton [Tue, 2 Oct 2018 11:26:47 +0000 (12:26 +0100)]
debug print for floating-point regs
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 14:09:21 +0000 (15:09 +0100)]
add comment explaining why invert isn't done in zeroing test
(already inverted basically)
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 14:08:25 +0000 (15:08 +0100)]
add comment explaining use of insn._rd() in zeroing
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 11:00:27 +0000 (12:00 +0100)]
whoops vloop continuation logic the wrong way round
the loop has to continue if there is one vectorised register left
rather than stop if there are *no* vectorised registers
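
One way to picture the continuation test, as a Python sketch only: `run_vloop` and its parameters are hypothetical, and treating per-register "is vector" flags as a fixed list is a simplification of whatever the loop really tracks.

```python
def run_vloop(vl, reg_is_vector, execute):
    """Run execute(offs) per element; return how many iterations ran."""
    executed = 0
    for offs in range(vl):
        execute(offs)
        executed += 1
        # continue while at least one register is still vectorised;
        # an all-scalar instruction runs exactly once
        if not any(reg_is_vector):
            break
    return executed
```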
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 01:16:21 +0000 (02:16 +0100)]
skip parallelisation of complex LR/SC operations
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 01:04:57 +0000 (02:04 +0100)]
identify type of instruction with additional #defines
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 12:33:07 +0000 (13:33 +0100)]
add a #define to id_regs.py which indicates name of the instruction
the c preprocessor cannot cope with detecting what the name of the
instruction is (and when it does, if it is "and" that throws a
compile-error), so as a workaround have id_regs.py create a
#define INSN_ADD, #define INSN_ADD_I etc.
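
A minimal sketch of the workaround, assuming the define is derived from the instruction's file name. `insn_define` is a hypothetical helper, not the function in id_regs.py, and the exact naming scheme used there (such as INSN_ADD_I) is not reproduced here.

```python
import os

def insn_define(filename):
    """Turn an instruction file path into a '#define INSN_<NAME>' line."""
    name = os.path.splitext(os.path.basename(filename))[0]
    # upper-casing sidesteps names such as "and", which the compiler
    # would otherwise treat as an operator keyword
    return "#define INSN_" + name.upper()
```

For example, `insn_define("riscv/insns/and.h")` gives `#define INSN_AND`, which the templates can then test with `#ifdef`.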
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 09:37:05 +0000 (10:37 +0100)]
list of instructions to avoid parallelising
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 08:26:54 +0000 (09:26 +0100)]
update template comment
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 08:15:17 +0000 (09:15 +0100)]
lots of debugging of predication, found other errors
lots of stuff here:
* changed sv reg and pred entry layout to make them a bit more
human-readable: put the key and idx on byte-boundaries
* bug in SV REG setting (getting int table instead of fp one, twice)
* bug(s) in SV PRED table setting (zeroing the reg tables, using idx not
key and more)
* stop lookup of predication if the *REGISTER* entry is inactive
(but not if it is a scalar, because scalar predication is ok)
this was the final piece of the puzzle that got predication working
* added a useful macro for creating SV REG and PRED CAM table entries
predication now working including zeroing
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 07:00:09 +0000 (08:00 +0100)]
add sv support for zeroing predication in dest register
bit of a major rework:
* access to the "unpredicated" (non-zero-hacked) register was needed
* therefore all rd/rs1-3/rvc_xxx functions had to have _ variants
* the underscored variants are not predicated
* this in turn meant that the offset for each register was wrong
as it is incremented *after* being checked
* therefore a newoffs had to be added
* and the reset_cache function copies the newoffs values
bit of a mess but it works: this is a state machine after all...
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 06:07:43 +0000 (07:07 +0100)]
add in predication to sv instruction execution
this relies on setting the value of the destination register
(and source registers) to zero. a bad hack but it will do
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 05:14:17 +0000 (06:14 +0100)]
start linking in predication into sv
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 04:13:50 +0000 (05:13 +0100)]
use an alternative logic for detecting scalar / loop-end
instead of pre-checking do the check for "all-scalar" during the
first loop iteration i.e. when registers are first accessed
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 04:12:49 +0000 (05:12 +0100)]
add compressed-identifying patterns to id_regs.py
also skip LUI (and C.LUI) as non-paralleliseable instructions
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 03:25:55 +0000 (04:25 +0100)]
fix code template for when SPIKE_SIMPLEV is not defined
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 03:25:28 +0000 (04:25 +0100)]
yuk. break id_regs.py being a generic tool by skipping csr ops
trying to use c preprocessor macros to stop CSR ops in sv from being
parallelised is too painful, yet skipping them is necessary: a parallel
CSR read/write does not make sense
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:16:00 +0000 (12:16 +0100)]
fix bug in sv template where FRS2 was checking rs3
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:15:29 +0000 (12:15 +0100)]
add checks for RVC registers to sv template
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:13:25 +0000 (12:13 +0100)]
add sv_insn_t overloads for rvc registers
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:06:10 +0000 (12:06 +0100)]
also arrange for id_regs.py to identify compressed instruction usage
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 09:19:47 +0000 (10:19 +0100)]
a LOT of debugging and fixing, sv loop actually working
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 07:19:59 +0000 (08:19 +0100)]
move SV CSRs to user-read-write
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:57:19 +0000 (05:57 +0100)]
add near-duplicate of SV CFG REG CSRs, for predication
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:49:13 +0000 (05:49 +0100)]
add implementation of CSR SV CFG regs 0-7
this is a CAM table of key-value entries: 5-bit key (from instruction),
6-bit value (actual register table, now 64 entries)
TODO: obviously for RV32E that would be reduced.
TODO: make it optional to have 32-32
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:35:10 +0000 (05:35 +0100)]
assign SV REG CSRs (using new union ability)
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:34:41 +0000 (05:34 +0100)]
make sv csr tables a union so they can be assigned to a ushort easily
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:55:00 +0000 (04:55 +0100)]
add support for CSR_SVVL to CSRRWI as well
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:23:18 +0000 (04:23 +0100)]
fix bug in CSR set SVVL: val has already been looked up
csrrw.h has been modified to invert the order of set/get, so the
call to processor_t::set_csr(SVVL, val) will do the right thing
and the subsequent (delayed) call to get_csr will return the
state.vl value in the chosen RD
all good
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:19:19 +0000 (04:19 +0100)]
add stub for SV REG configs
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:05:27 +0000 (04:05 +0100)]
stop a compiler warning
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:05:00 +0000 (04:05 +0100)]
reorganise after moving sv_pred_* and sv_reg_* tables into processor_t
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 02:54:41 +0000 (03:54 +0100)]
have to move SV CSRs into processor_t
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 02:49:49 +0000 (03:49 +0100)]
add 8 CSRs for registers and predication each
each CSR contains 2 16-bit entries and is a CAM based on register as
key (5-bit) and target-register as value (6-bit) so that a 5-bit
RS1-3/RD can actually reach 64 actual registers, and *3-bit C instructions
can as well*
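
The CAM arrangement can be sketched in Python as below. The field widths (5-bit key, 6-bit value, two 16-bit entries per CSR) come from the message above; the bit positions, the valid flag, and the names `pack_entry`, `pack_csr` and `lookup` are assumptions made purely for illustration.

```python
KEY_MASK = 0x1F       # 5-bit key: register number as encoded in the instruction
VAL_MASK = 0x3F       # 6-bit value: one of 64 actual registers
VAL_SHIFT = 8         # assumed position of the value field
VALID_BIT = 1 << 15   # assumed "entry in use" flag

def pack_entry(key, value):
    """Build one 16-bit CAM entry (layout is an assumption)."""
    if not 0 <= key <= KEY_MASK or not 0 <= value <= VAL_MASK:
        raise ValueError("key or value out of range")
    return VALID_BIT | (value << VAL_SHIFT) | key

def pack_csr(lo, hi):
    """Pack two 16-bit entries into one CSR value."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def lookup(csrs, key):
    """Search every entry for a matching key; None means no redirection."""
    for csr in csrs:
        for entry in (csr & 0xFFFF, (csr >> 16) & 0xFFFF):
            if entry & VALID_BIT and (entry & KEY_MASK) == key:
                return (entry >> VAL_SHIFT) & VAL_MASK
    return None
```

Because the key is only compared, not used as an index, a 3-bit compressed register field can hit the same entries as a full 5-bit field once it has been expanded to a register number.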
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 02:14:15 +0000 (03:14 +0100)]
whoops don't need separate SVSETVL/SVGETVL CSRs
also add SVREALVL which is needed for state context save/restore
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 01:18:19 +0000 (02:18 +0100)]
revert addition of svsetvl as an actual opcode, add mvl CSR instead
this is less than ideal but better than having to add new opcodes
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 00:52:27 +0000 (01:52 +0100)]
Revert "sv setvl as a csr not going to work, add getvl only"
This reverts commit
996e0246aa614231a560f3e3e84793745470ca6f.
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 00:51:45 +0000 (01:51 +0100)]
Revert "manually add svsetvl instruction"
This reverts commit
b3e60c2c6e722bc181369d48642834422fa1082c.
Luke Kenneth Casson Leighton [Fri, 28 Sep 2018 07:42:48 +0000 (08:42 +0100)]
manually add svsetvl instruction
Luke Kenneth Casson Leighton [Fri, 28 Sep 2018 02:59:42 +0000 (03:59 +0100)]
sv setvl as a csr not going to work, add getvl only
Luke Kenneth Casson Leighton [Thu, 27 Sep 2018 13:24:48 +0000 (14:24 +0100)]
adding sv vector length CSR to processor state, and csr get/set
32 CSRs are used up, here, as SETVL requires not only an immediate
but also a target integer register in which the SETVL value is
stored.
Luke Kenneth Casson Leighton [Thu, 27 Sep 2018 11:03:26 +0000 (12:03 +0100)]
add sv predication function
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:25:04 +0000 (16:25 +0100)]
save some cpu cycles by |ing the checks for vectorop together
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:22:29 +0000 (16:22 +0100)]
whoops vectorop has to be |= not &= to accumulate "true"
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:21:19 +0000 (16:21 +0100)]
cache the sv redirected register values on each loop
if an emulated opcode ever calls insn.rd() or rs1-3 more than once
sv_insn_t::remap would accidentally increment the loop offset before
it was time to do so.
therefore put in a caching system and clear it only at the end
of each loop
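
The caching idea can be modelled in a few lines of Python. `OffsetRemap` and `RemapCache` are hypothetical names for this sketch, and the remap here is deliberately simplified to "base plus an offset that advances on every call" so that the double-call problem is visible.

```python
class OffsetRemap:
    """Toy remap with a side effect: each call advances the loop offset."""
    def __init__(self, base):
        self.base = base
        self.offs = 0

    def __call__(self, reg):
        result = self.base + self.offs
        self.offs += 1
        return result

class RemapCache:
    """Remember each remapped register until the loop iteration ends."""
    def __init__(self, remap_fn):
        self._remap = remap_fn
        self._cache = {}

    def get(self, name, reg):
        if name not in self._cache:        # first access this iteration
            self._cache[name] = self._remap(reg)
        return self._cache[name]

    def reset(self):
        self._cache.clear()                # called at the end of each loop
```

Calling `get("rd", 5)` twice in one iteration returns the same register; only after `reset()` does the next element's register come back.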
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:09:42 +0000 (16:09 +0100)]
remembered that the use of sv registers has to be loop-incremented separately
the SV parallelism loop has to respect whether each *individual* register
is a vector or a scalar.
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 10:14:24 +0000 (11:14 +0100)]
clarify comments on (key strategic) sv_insn_t::remap function
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 09:54:54 +0000 (10:54 +0100)]
actually implement sv register re-mapping
the algorithm here checks the (required) table (int or fp), checks if
the entry is "active", does a redirect, then checks if the entry is
scalar or vector. if vector, the loop-offset (passed by value) is
added
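
The steps listed above can be sketched as follows. This is an illustrative Python model, not the code in sv.cc: the `RegEntry` fields and the `remap` signature are assumptions, and the int/fp table selection is reduced to passing in whichever table applies.

```python
from dataclasses import dataclass

@dataclass
class RegEntry:
    # hypothetical table entry; field names are assumptions for this sketch
    active: bool = False
    regidx: int = 0          # register number to redirect to
    is_vector: bool = False

def remap(table, reg, voffs):
    """Return the actual register for `reg` at loop offset `voffs`."""
    entry = table.get(reg)
    if entry is None or not entry.active:
        return reg           # entry not active: use the register as encoded
    target = entry.regidx    # do the redirect
    if entry.is_vector:
        target += voffs      # vectors step through consecutive registers
    return target

# example integer table: x5 is a vector starting at 40, x6 a redirected scalar
int_table = {
    5: RegEntry(active=True, regidx=40, is_vector=True),
    6: RegEntry(active=True, regidx=50, is_vector=False),
}
```

With this table, `remap(int_table, 5, 3)` lands on register 43 while `remap(int_table, 6, 3)` stays at 50, because a scalar ignores the loop offset.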
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 09:39:05 +0000 (10:39 +0100)]
ok this is tricky: an extra parameter has to be passed into sv_insn_t::remap
the reason is that the remap has to know if the register being
remapped is an int or a float. the place where that is known
is at *decode* time... and that means that id_regs.py has to
look that up and pass it on (in a #define REGS_PATTERN).
the reason it is passed in as a pattern is so that sv_insn_t rd/rs1-3
have access to the information that is needed
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 08:07:45 +0000 (09:07 +0100)]
move sv remap function to sv.cc (not inline)
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 08:03:08 +0000 (09:03 +0100)]
check if register redirection is active, and if vectorisation enabled
check each register (if used) - this is what the #define USING_RD etc.
macros are for, to disable checks that are not needed
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 05:37:51 +0000 (06:37 +0100)]
comment why sv_insn_t is set up the way it is; add vector loop stub
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 05:35:32 +0000 (06:35 +0100)]
easier to #define USING_NOREGS if the opcode does not use any registers
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 05:28:32 +0000 (06:28 +0100)]
include auto-generated identification of use of registers per op
modified id_regs.py to take a single argument (file in riscv/insns to parse)
added call to id_regs.py in riscv.mk.in
included the auto-generated file in the insn_template.cc
now each instruction has a way - BEFORE the emulated instruction is called -
to identify which registers (RD, RS1-3, FRD, FRS1-3) are going to be used.
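
A hedged sketch of the kind of scan involved, in Python. The regular expression and the helpers `find_regs` and `emit_defines` are assumptions for illustration, not the contents of id_regs.py; the USING_RD-style names and USING_NOREGS follow the other commit messages in this log.

```python
import re

# match RD, RS1-3, FRD, FRS1-3, including inside macros such as WRITE_RD;
# the lookbehind only rejects a preceding letter or digit, so "_RD" matches
REG_PATTERN = re.compile(r"(?<![A-Za-z0-9])(FRD|FRS[123]|RD|RS[123])\b")

def find_regs(source):
    """Return the sorted set of register names used in an instruction body."""
    return sorted(set(REG_PATTERN.findall(source)))

def emit_defines(source):
    """Produce one #define per register used, or USING_NOREGS if none."""
    regs = find_regs(source)
    if not regs:
        return ["#define USING_NOREGS"]
    return ["#define USING_" + r for r in regs]
```

Run over a body such as `WRITE_RD(sext_xlen(RS1 + RS2));`, this yields defines for RD, RS1 and RS2, which is the information the template needs before the instruction body executes.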
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 04:38:56 +0000 (05:38 +0100)]
shuffle things around a bit for sv, put rv32/64_name back to like they were
decided to move sv to its own template file, and make a bit more use
of macro pre-processing
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 22:06:01 +0000 (23:06 +0100)]
skip id_regs.py parsing its own output; stop outputting "0" on empty
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 22:04:09 +0000 (23:04 +0100)]
change to instruction template parsing, create one file per instruction
id_regs.py looks for patterns in riscv/insns/*.h to find the use of
registers
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 05:47:30 +0000 (06:47 +0100)]
add decode.h header to sv.h
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 05:46:08 +0000 (06:46 +0100)]
rename sv vlen to sv voffs, add csr and reg tables
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 05:28:04 +0000 (06:28 +0100)]
add reference to vector length in sv
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 04:34:30 +0000 (05:34 +0100)]
use sv_insn_t class in instruction template
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 01:45:26 +0000 (02:45 +0100)]
add sv_insn_t class (inherits from insn_t)
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 01:34:29 +0000 (02:34 +0100)]
argh can't virtualise rd/rs1-3, due to union usage with rocc_insn_union_t
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 01:28:41 +0000 (02:28 +0100)]
sv: rd, rs1/2/3 become virtual so that sv_insn_t can override them
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 01:17:49 +0000 (02:17 +0100)]
clarify sv cam table
Luke Kenneth Casson Leighton [Mon, 24 Sep 2018 06:42:27 +0000 (07:42 +0100)]
define CSR and register tables for SV
Luke Kenneth Casson Leighton [Mon, 24 Sep 2018 06:24:11 +0000 (07:24 +0100)]
remove unneeded use of AM_CONDITIONAL
Luke Kenneth Casson Leighton [Mon, 24 Sep 2018 06:18:50 +0000 (07:18 +0100)]
add #define for SPIKE_SIMPLEV, re-run autoreconf
Luke Kenneth Casson Leighton [Mon, 24 Sep 2018 02:56:21 +0000 (03:56 +0100)]
create #defines from identified registers, per opcode
Luke Kenneth Casson Leighton [Mon, 24 Sep 2018 02:04:18 +0000 (03:04 +0100)]
clarify docstring on id_regs.py
Luke Kenneth Casson Leighton [Mon, 24 Sep 2018 02:03:58 +0000 (03:03 +0100)]
add function identifying the registers in each emulated instruction
Luke Kenneth Casson Leighton [Mon, 24 Sep 2018 01:53:14 +0000 (02:53 +0100)]
identify instructions, plan: extract registers
Andrew Waterman [Thu, 13 Sep 2018 06:56:49 +0000 (23:56 -0700)]
Update README
Tim Newsome [Thu, 6 Sep 2018 19:04:52 +0000 (12:04 -0700)]
Merge pull request #235 from riscv/sba
Fix cut-and-paste bug in 64-bit SBA loads.