Luke Kenneth Casson Leighton [Sun, 14 Oct 2018 04:22:18 +0000 (05:22 +0100)]
redirect add to rv_add
Luke Kenneth Casson Leighton [Sun, 14 Oct 2018 04:06:46 +0000 (05:06 +0100)]
bit of a mess: attempted to create a complete arithmetic overload
had to back most of it out, and left in a change to the amo* functions
passing in a 2nd parameter to the higher-order-function
Luke Kenneth Casson Leighton [Sat, 13 Oct 2018 13:43:06 +0000 (14:43 +0100)]
rename _zext_xlen
Luke Kenneth Casson Leighton [Sat, 13 Oct 2018 13:40:30 +0000 (14:40 +0100)]
add sv_reg_t
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 17:22:43 +0000 (18:22 +0100)]
redirect WRITE_FRD including different types (128/64/32)
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 17:05:04 +0000 (18:05 +0100)]
add WRITE_FRD macro redirect
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 15:21:11 +0000 (16:21 +0100)]
changed style, can revert changes to amomin/max
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 15:20:27 +0000 (16:20 +0100)]
add frs2 redirect
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 15:15:32 +0000 (16:15 +0100)]
add RS3 replacement
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 15:12:31 +0000 (16:12 +0100)]
simplify sv_proc_t redirection of RS1-3 / FRS1 macrhos
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 14:16:07 +0000 (15:16 +0100)]
redirect RS2 to sv_proc_t class
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 12:27:11 +0000 (13:27 +0100)]
proof-of-concept, redirect RS1 to class sv_proc_t
Luke Kenneth Casson Leighton [Fri, 12 Oct 2018 02:38:33 +0000 (03:38 +0100)]
combination of redirection through a "property" class overloads WRITE_RD
WRITE_RD, formerly a macro, now replaced with a function.
also required a redirector function rd
Luke Kenneth Casson Leighton [Thu, 11 Oct 2018 20:45:26 +0000 (21:45 +0100)]
redirect instructions through a class called sv_proc_t
preparing the groundwork for a total over-ride of macros such as
WRITE_RD, and so on, so that element width can be implemented
Luke Kenneth Casson Leighton [Thu, 11 Oct 2018 12:17:11 +0000 (13:17 +0100)]
more explicit testing, duplicating header file algorithms for div, rem and shift
Luke Kenneth Casson Leighton [Thu, 11 Oct 2018 07:09:16 +0000 (08:09 +0100)]
whoops run from 0-255 not 0-254, and other test corrections
Luke Kenneth Casson Leighton [Thu, 11 Oct 2018 06:56:32 +0000 (07:56 +0100)]
add some operator tests for int8_t being typecast to uint16_t
Luke Kenneth Casson Leighton [Thu, 11 Oct 2018 06:34:13 +0000 (07:34 +0100)]
add more experimenting on operators
Luke Kenneth Casson Leighton [Wed, 10 Oct 2018 14:01:42 +0000 (15:01 +0100)]
add operators test class
Luke Kenneth Casson Leighton [Wed, 10 Oct 2018 08:49:43 +0000 (09:49 +0100)]
add operators library to contain operator-overloads of +/-/*/div/>/>= etc
for all types int8, uint8, float16_t, float32_t etc.
all to be template-ified
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 18:33:53 +0000 (19:33 +0100)]
get predicated-vectorised branch working
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 16:01:12 +0000 (17:01 +0100)]
save branch address and predication merged result, and test after branch
if the predication was all good, go ahead with the branch
still not tested for predicated / vectorised branch yet however
at least scalar branches work
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 15:17:42 +0000 (16:17 +0100)]
add explanatory comment
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 15:02:00 +0000 (16:02 +0100)]
add explanatory comment
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 14:51:12 +0000 (15:51 +0100)]
start adding explicit twin-predicated branch identification (rs2)
Luke Kenneth Casson Leighton [Tue, 9 Oct 2018 10:39:49 +0000 (11:39 +0100)]
extend sv register file from 64 to 128 after discussion.
evaluation of even embedded GPUs shows that they have really enormous
register files. a fp16 x 4 to express quads, times four, takes up eight
consecutive registers just on its own.
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 14:06:39 +0000 (15:06 +0100)]
override setpc macro so that sv can redirect it in branch
this is slightly complicated. setpc is a global macro, intended for
use in the templates. however for SV it needs to be conditional,
so needs to redirect to a function in sv_insn_t.
that in turn needs quite a few extra parameters: the current loop
element offset, the argument to set_pc just in case actually it is
detected that this really is to be a branch not a predication scenario
and so on.
have not tried out vectorisation yet, at least straight non-vectorised
branch operations (unit tests) pass.
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 08:09:21 +0000 (09:09 +0100)]
swap #ifdef USING_NOREGS so that it is possible to redefine set_pc
to avoid having to modify the branch instructions the set_pc macro will
be #undefd and redefined to modify the predication target
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 07:03:11 +0000 (08:03 +0100)]
add rd bit-setting function
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 05:52:11 +0000 (06:52 +0100)]
add extra debug printing for c.lwsp
Luke Kenneth Casson Leighton [Sun, 7 Oct 2018 03:51:10 +0000 (04:51 +0100)]
add rvc_swsp_imm sv overload, provides vector unit stride now
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 19:18:32 +0000 (20:18 +0100)]
c.swsp and c.fswsp predication and offset enabling
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 16:07:16 +0000 (17:07 +0100)]
allow x2 (sp) to be redirected in C.LWSP
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 15:49:20 +0000 (16:49 +0100)]
temporary hack disabling SV in anything other than user mode
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 15:48:55 +0000 (16:48 +0100)]
whoops inverted ldsp and lwsp immediates
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 09:24:25 +0000 (10:24 +0100)]
create ldsp immediate offset overrides
this allows C.LWSP/C.LDSP to do predicated load/stores from contiguous
blocks of memory
Luke Kenneth Casson Leighton [Sat, 6 Oct 2018 05:51:46 +0000 (06:51 +0100)]
add in predication for immediate, for C.LWSP
Luke Kenneth Casson Leighton [Fri, 5 Oct 2018 14:26:20 +0000 (15:26 +0100)]
reorganise src and dest vector-element offsets
pass in pointer to offsets from processor_t->state.srcoffs and destoffs
into sv_insn_t so that the element state information about how many
parallel elements have been executed is recorded.
in this way it will be possible to restart a loop at the right place
if there is an exception (such as a memory cache miss on a LOAD/STORE)
Luke Kenneth Casson Leighton [Fri, 5 Oct 2018 14:09:48 +0000 (15:09 +0100)]
add srcoffs and destoffs sv state, alter CSRs
remove SVREALVL, replace with SVSTATE
Luke Kenneth Casson Leighton [Thu, 4 Oct 2018 13:11:02 +0000 (14:11 +0100)]
reorganise twin-predication
move the offset incrementing to outside of the sv_insn_t, and pass in
the src_offs and dest_offs by reference, mirroring and matching the
predication src and dest referencing
Luke Kenneth Casson Leighton [Thu, 4 Oct 2018 09:16:10 +0000 (10:16 +0100)]
big reorganisation to support twin-predication
twin predication is on certain operations like LOAD, STORE, C.MV, FCVT
where the effect is similar to VSPLAT, VINSERT, and also bitmanip
scatter/gather.
Luke Kenneth Casson Leighton [Wed, 3 Oct 2018 09:42:23 +0000 (10:42 +0100)]
add in twin-predication identification
pass in second predicate for twin-predication operands
Luke Kenneth Casson Leighton [Wed, 3 Oct 2018 06:16:34 +0000 (07:16 +0100)]
decided not to change the behaviour of LOAD/STORE
Luke Kenneth Casson Leighton [Tue, 2 Oct 2018 15:01:22 +0000 (16:01 +0100)]
start work on parallelsing LOAD, pass in parameter to reinterpret immed
Luke Kenneth Casson Leighton [Tue, 2 Oct 2018 11:26:47 +0000 (12:26 +0100)]
debug print for floating-point regs
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 14:09:21 +0000 (15:09 +0100)]
add comment explaining why invert isnt done in zeroing test
(already inverted basically)
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 14:08:25 +0000 (15:08 +0100)]
add comment explaining use of insn._rd() in zeroing
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 11:00:27 +0000 (12:00 +0100)]
whoops vloop continuation logic the wrong way round
the loop has to continue if there is one vectorised register left
rather than stop if there is *no* vectorised registers
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 01:16:21 +0000 (02:16 +0100)]
skip parallelisation of complex LR/SC operations
Luke Kenneth Casson Leighton [Mon, 1 Oct 2018 01:04:57 +0000 (02:04 +0100)]
identify type of instruction with additional #defines
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 12:33:07 +0000 (13:33 +0100)]
add a #define to id_regs.py which indicates name of the instruction
the c preprocessor cannot cope with detecting what the name of the
instruction is (and when it does, if it is "and" that throws a
compile-error), so as a workaround have id_regs.py create a
#define INSN_ADD, #define INSN_ADD_I etc.
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 09:37:05 +0000 (10:37 +0100)]
list of instructions to avoid parallelising
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 08:26:54 +0000 (09:26 +0100)]
update template comment
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 08:15:17 +0000 (09:15 +0100)]
lots of debugging of predication, found other errors
lots of stuff here:
* changed sv reg and pred entry layout to make them a bit more
human-readable: put the key and idx on byte-boundaries
* bug in SV REG setting (getting int table instead of fp one, twice)
* bug(s) in SV PRED table setting (zeroing the reg tables, using idx not
key and more)
* stop lookup of predication if the *REGISTER* entry is inactive
(but not if it is a scalar, because scalar predication is ok)
this was the final piece of the puzzle that got predication working
* added a useful macro for creating SV REG and PRED CAM table entries
predication now working including zeroing
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 07:00:09 +0000 (08:00 +0100)]
add sv support for zeroing predication in dest register
bit of a major rework:
* access to the "unpredicated" (non-zero-hacked) register was needed
* therefore all rd/rs1-3/rvc_xxx functions had to have _ variants
* the underscored variants are not predicated
* this in turn meant that the offset for each register was wrong
as it is incremented *after* being checked
* therefore a newoffs had to be added
* and the reset_cache function copies the newoffs values
bit of a mess but it works: this is a state machine after all...
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 06:07:43 +0000 (07:07 +0100)]
add in predication to sv instruction execution
this relies on setting the value of the destination register
(and source registers) to zero. a bad hack but it will do
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 05:14:17 +0000 (06:14 +0100)]
start linking in predication into sv
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 04:13:50 +0000 (05:13 +0100)]
use an alternative logic for detecting scalar / loop-end
instead of pre-checking do the check for "all-scalar" during the
first loop iteration i.e. when registers are first accessed
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 04:12:49 +0000 (05:12 +0100)]
add compressed-identifying patterns to id_regs.py
also skip LUI (and C.LUI) as non-paralleliseable instructions
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 03:25:55 +0000 (04:25 +0100)]
fix code template for when SPIKE_SIMPLEV is not defined
Luke Kenneth Casson Leighton [Sun, 30 Sep 2018 03:25:28 +0000 (04:25 +0100)]
yuk. break id_regs.py being a generic tool by skipping csr ops
trying to use c preprocessor macros to skip CSRs in sv from being
parallelised is too painful, and is necessary to do. a parallel CSR
read/write does not make sense
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:16:00 +0000 (12:16 +0100)]
fix bug in sv template where FRS2 was checking rs3
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:15:29 +0000 (12:15 +0100)]
add checks for RVC registers to sv template
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:13:25 +0000 (12:13 +0100)]
add sv_insn_t overloads for rvc registers
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 11:06:10 +0000 (12:06 +0100)]
also arrange for id_regs.py to identify compressed instruction usage
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 09:19:47 +0000 (10:19 +0100)]
a LOT of debugging and fixing, sv loop actually working
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 07:19:59 +0000 (08:19 +0100)]
move SV CSRs to user-read-write
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:57:19 +0000 (05:57 +0100)]
add near-duplicate of SV CFG REG CSRs, for predication
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:49:13 +0000 (05:49 +0100)]
add implementation of CSR SV CFG regs 0-7
this is a CAM table of key-value entries, 5-bits key (from instruction)
6-bits value (actual register table, now 64 entries)
TODO: obviously RV32E that would be reduced.
TODO: make it optional to have 32-32
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:35:10 +0000 (05:35 +0100)]
assign SV REG CSRs (using new union ability)
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 04:34:41 +0000 (05:34 +0100)]
make sv csr tables a union so they can be assigned to a ushort easily
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:55:00 +0000 (04:55 +0100)]
add support for CSR_SVVL to CSRRWI as well
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:23:18 +0000 (04:23 +0100)]
fix bug in CSR set SVVL: val has already been looked up
csrrw.h has been modified to invert the order of set/get, so the
call to processor_t::set_csr(SVVL, val) will do the right thing
and the subsequent (delayed) call to get_csr will return the
state.vl value in the chosen RD
all good
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:19:19 +0000 (04:19 +0100)]
add stub for SV REG configs
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:05:27 +0000 (04:05 +0100)]
stop a compiler warning
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 03:05:00 +0000 (04:05 +0100)]
reorganise from moving sv_pred_* and sv_reg_* tables into processor_t
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 02:54:41 +0000 (03:54 +0100)]
have to move SV CSRs into processor_t
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 02:49:49 +0000 (03:49 +0100)]
add 8 CSRs for registers and predication each
each CSR contains 2 16-bit entries and is a CAM based on register as
key (5-bit) and target-register as value (6-bit) so that a 5-bit
RS1-3/RD can actually reach 64 actual registers, and *3-bit C instructions
can as well*
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 02:14:15 +0000 (03:14 +0100)]
whoops dont need separate SVSETVL/SVGETVL CSRs
also add SVREALVL which is needed for state context save/restore
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 01:18:19 +0000 (02:18 +0100)]
revert addition of svsetvl as an actual opcode, add mvl CSR instead
this is less than ideal but better than having to add new opcodes
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 00:52:27 +0000 (01:52 +0100)]
Revert "sv setvl as a csr not going to work, add getvl only"
This reverts commit
996e0246aa614231a560f3e3e84793745470ca6f.
Luke Kenneth Casson Leighton [Sat, 29 Sep 2018 00:51:45 +0000 (01:51 +0100)]
Revert "manually add svsetvl instruction"
This reverts commit
b3e60c2c6e722bc181369d48642834422fa1082c.
Luke Kenneth Casson Leighton [Fri, 28 Sep 2018 07:42:48 +0000 (08:42 +0100)]
manually add svsetvl instruction
Luke Kenneth Casson Leighton [Fri, 28 Sep 2018 02:59:42 +0000 (03:59 +0100)]
sv setvl as a csr not going to work, add getvl only
Luke Kenneth Casson Leighton [Thu, 27 Sep 2018 13:24:48 +0000 (14:24 +0100)]
adding sv vector length CSR to processor state, and csr get/set
32 CSRs are used up, here, as SETVL requires not only an immediate
but also a target integer register in which the SETVL value is
stored.
Luke Kenneth Casson Leighton [Thu, 27 Sep 2018 11:03:26 +0000 (12:03 +0100)]
add sv predication function
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:25:04 +0000 (16:25 +0100)]
save some cpu cycles by |ing the checks for vectorop together
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:22:29 +0000 (16:22 +0100)]
whoops vectorop has to be |= not &= to accumulate "true"
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:21:19 +0000 (16:21 +0100)]
cache the sv redirected register values on each loop
if an emulated opcode ever calls insn.rd() or rs1-3 more than once
sv_inst_t::remap would accidentally increment the loop offset before
it was time to do so.
therefore put in a cacheing system and clear it only at the end
of each loop
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 15:09:42 +0000 (16:09 +0100)]
remembered that the use of sv registers have to be loop-incremented separately
the SV parallelism loop has to respect whether each *individual* register
is a vector or a scalar.
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 10:14:24 +0000 (11:14 +0100)]
clarify comments on (key strategic) sv_insn_t::remap function
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 09:54:54 +0000 (10:54 +0100)]
actually implement sv register re-mapping
the algorithm here checks the (required) table (int or fp), checks if
the entry is "active", does a redirect, then checks if the entry is
scalar or vector. if vector, the loop-offset (passed by value) is
added
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 09:39:05 +0000 (10:39 +0100)]
ok this is tricky: an extra parameter has to be passed into sv_insn_t::remap
the reason is that the remap has to know if the register being
remapped is an int or a float. the place where that is known
is at *decode* time... and that means that id_regs.py has to
look that up and pass it on (in a #define REGS_PATTERN).
the reason it is passed in as a pattern is so that svn_insn_t rd/rs1-3
have access to the information that is needed
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 08:07:45 +0000 (09:07 +0100)]
move sv remap function to sv.cc (not inline)
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 08:03:08 +0000 (09:03 +0100)]
check if register redirection is active, and if vectorisation enabled
check each register (if used) - this is what the #define USING_RD etc.
macros are for, to disable checks that are not needed
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 05:37:51 +0000 (06:37 +0100)]
comment why sv_insn_t is set up the way it is; add vector loop stub
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 05:35:32 +0000 (06:35 +0100)]
easier to #define USING_NOREGS if the opcode does not use any registers
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 05:28:32 +0000 (06:28 +0100)]
include auto-generated identification of use of registers per op
modified id_regs.py to take a single argument (file in riscv/insns to parse)
added call to id_regs.py in riscv.mk.in
included the auto-generated file in the insn_template.cc
now each instruction has a way - BEFORE the emulated instruction is called -
to identify which registers (RD, RS1-3, FRD, FRS1-3) are going to be used.
Luke Kenneth Casson Leighton [Wed, 26 Sep 2018 04:38:56 +0000 (05:38 +0100)]
shuffle things around a bit for sv, put rv32/64_name back to like they were
decided to move sv to its own template file, and make a bit more use
of macro pre-processing
Luke Kenneth Casson Leighton [Tue, 25 Sep 2018 22:06:01 +0000 (23:06 +0100)]
skip id_reg.py parsing its own output; stop outputting "0" on empty