fosdem2024_formal: add slides and diagrams

[libreriscv.git] / lxo / ChangeLog
diff --git a/lxo/ChangeLog b/lxo/ChangeLog

index 6710b0a9a5687302f3d43ef339c5836b69e24b74..d464b9040b01eb5a07c7d0fdd4410c9a5c96a4af 100644
--- a/lxo/ChangeLog
+++ b/lxo/ChangeLog
@@ -1,3 +1,296 @@
+2021-02-20
+
+       * GCC: Lowering DWARF_FRAME_REGISTERS, once I rebuilt the
+       compiler, libgcc, and the test program, avoided the problem.  That
+       didn't make much sense, so I reversed that change and got back to
+       debugging.  The signal frame seemed to be unwound correctly, but
+       instead of using the linux-unwind fallback frame stuff, that I'd
+       messed with a week before, I noticed it was using frame info from
+       the __kernel_sigtramp_rt64 entry point in the kernel-supplied vdso.
+       Though I'm pretty sure that changing that file got me some
+       different results the week before, with vdso it couldn't possibly
+       be where things got wrong.  So I proceeded to unwinding the frame
+       until we hit the caller of the infinitely-recursive function, and
+       found we got to the end of the stack before reaching it.  Huh?  A
+       GDB stack trace also hit the same problem.  Oh, maybe there was
+       something wrong with the frame info for those early calls in the
+       thread.  But the stack trace only stopped at the third or fourth
+       recursive call.  That seemed fishy, so I started the program over,
+       and checked the stack trace at the point of the signal delivery,
+       and found it was fine.  I stepped into the signal handler, and
+       into the exception raising machinery, and it was still fine.  Only
+       after we started the unwinding did it get corrupted.  At first I
+       suspected something going wrong because of out-of-range accesses
+       to the regs array, recompiled compiler and library and program
+       just to be sure, and still hit the same issue.  Finally, it
+       occurred to me to check where the alternate stack, in which the
+       stack overflow signal was handled, was located, and found it to be
+       running into the other end of the task's stack.  Turns out the Ada
+       runtime, when starting a task, allocates an alt stack to handle
+       stack overflows out of the stack itself.  With the larger register
+       file, unwinding was taking up more of the alt stack space,
+       overflowing it and thus overwriting part of the task's call stack,
+       corrupting it to the point that the unwinder could no longer reach
+       the exception handler in the task setup code, supposed to catch an
+       escaping exception for the task parent to analyze/reraise.
+       Growing the alt stack size in the Ada runtime fixes the problem,
+       but since this explains why lowering DWARF_FRAME_REGISTERS avoided
+       the problem, I'm now happy to have it set to the lower value, at
+       least until call-saved SVP64 regs are needed.  Adjusted other
+       references to ARG_POINTER_REGNUM in libgcc to use a fixed index.
+       Wrote a blog post about this, while regstrapping the fix.
+       https://www.fsfla.org/blogs/lxo/2021-02-20-longest-debugging-session.en.html
+       Success, no regressions.  (9:09)
+
+2021-02-13
+
+       * GCC: Found libgcc/config/linux-unwind.h using GCC's internal
+       register numbers, and thus in need of renumbering as well.  Alas,
+       the right fix didn't jump at me.  There's some confusion about
+       using mapped register numbers or not.  Using the pristine
+       libgcc_eh.a to link the program built with the new compiler, using
+       newly-built libraries, it works, but with the new libgcc_eh.a, it
+       fails, whether using 291 or 99 or 67 for R_AR, that used to be
+       ARG_POINTER_REGNUM.  Changing R_AR and rebuilding doesn't alter
+       anything within gcc/ada, so it's not the Ada runtime.  I guess
+       I may have to go back to debugging, as it's not clear whether GCC
+       is losing track of the frames or not finding the handler that
+       would propagate the EH to the thread that activated the task.
+       Tried experimenting with overriding DWARF_FRAME_REGISTERS to its
+       original value.  (6:13)
+
+2021-02-10
+
+       * MW (0:48)
+
+2021-02-09
+
+       * VC (1:59)
+
+2021-02-05
+
+       * GCC: Started investigating the remaining regressions, all in
+       Ada.  They all turn out to be -fstack-check tests.  (0:40)
+
+2021-01-31
+
+       * GCC: Started debugging regressions in the stage1 non-svp64
+       compiler.  Noticed that the renaming of mov to @altivec_mov
+       removed expanders for some modes used by altivec but not by svp64.
+       Reintroduced them, and added floating-point svp64 mov patterns.
+       Split out of the main patch a preparation patch that could be
+       submitted upstream right away, for it just prepares for register
+       renumbering.  Fixed conditional register usage that, when svp64
+       was not enabled, caused the LAST/MAX_* non-SVP64 registers to be
+       marked as fixed, which caused the frame pointer to not be
+       preserved across calls.  Fixed the *logue routines to account for
+       the register renumbering.  Fixed the svp64 add expander to use a
+       correct expansion of <VI_unit>, covering V2DI at power8.  With
+       that, we're down to a single C regression, when not enabling
+       svp64.  While the expected behavior is for the compiler to
+       optimize gcc.target/powerpc/dform-3.c's gpr function so that p->c
+       is (re)loaded to e.g. r10&r11 with a vsx_movv2df_64bit, because
+       the MEM cost for its reload is negative, whereas the
+       svp64-modified compiler keeps both such instructions, first
+       loading to a VSX reg, then splitting it into a pair of GPRs.  It
+       is a performance bug, but the generated code works.  Trying a
+       bootstrap!  Stage2 wouldn't build because of /* within comments in
+       rs6000-modes.def; adjusted the commented-out entries I'd put in to
+       avoid that.  The memory move costs were off because of the use of
+       literal regno 32 when computing the costs for FPR classes.
+       Regstrapped the prepping patch successfully, then went back to the
+       patch that introduces svp64 support, still disabled.  -msvp64 is
+       still slightly broken, but without enabling it, we may be down to
+       no regressions; testing should confirm.  (14:37)
+
+2021-01-30
+
+       * GCC: Fixed the boundaries of the loops that disable SVP64
+       registers when SVP64 is not enabled.  Fixed macros used for
+       parameter and return value assignment to reflect the new FP
+       numbers.  Required at least one register operand for the svp64
+       vector mov pattern.  Added emission of the altivec insn when not
+       using svp64.
+       Introduced a first svp64 reload change for preferred_reload, to
+       avoid trying to reload constants into altivec registers.  A lot
+       more work will be needed for svp64 reloads.  A non-svp64 native
+       compiler builds stage1, but the compiler is still pretty broken,
+       with thousands of regressions.  An svp64 one builds stage1 and fails
+       in libgcc, with a bunch of asm failures because of (unsupported)
+       sv.* opcodes, and one reload failure in decContext, that I started
+       investigating.  (8:22)
+
+2021-01-27
+
+       * µW (1:10)
+
+2021-01-26
+
+       * VCoffee (1:41)
+
+2021-01-24
+
+       * GCC: Introduced vector modes, registers, classes, constraints,
+       renumbered and remapped registers, went over literals referring to
+       register numbers, and started implementation of move/load/store
+       and add for the V*DI integral types.  Still have to test that the
+       compiler still works after the renumbering.  The new insns are not
+       generated yet, I haven't made the new registers usable for
+       anything yet.  (12:13)
+
+2021-01-22
+
+       * 578: Specifying and debating the task with Luke and, later,
+       Jacob.  Difficulties in conveying the requirements and overcoming
+       the complexities involved in figuring out how to parse each asm
+       operand in Python, underspecification of the input language,
+       disagreement as to the complexity and the amount of work required
+       to duplicate existing binutils functionality in Python, and then
+       duplicate this work one more time into binutils later, led Luke to
+       take it upon himself.
+       * 579: Talked to Jacob a bit about potential implementation
+       strategies.  The need to build an immediate constant to use as the
+       operand to .long/svp64 makes for plenty of complexity, even in
+       C++.  I'm again unhappy with a plan that involves so much
+       intentional waste of effort.  I'm also very surprised with the
+       estimated amount of work involved in this task, compared with
+       578, that is a much bigger one with all the rewriting of an asm
+       parser, and likely more rewriting as the extended asm syntax
+       evolves.  And thus pretty much a full workday ends up wasted,
+       most of it complaining about planning to waste work.  (8:29)
+
+2021-01-19
+
+       * Virtual Coffee (1:39)
+
+2021-01-13
+
+       * Microwatts meeting (1:08)
+
+2021-01-07
+
+       * 572: New, split out of 570, on what .[sv], elwidth, subvl
+       affect in load/store ops: the address [vector] or the in-memory
+       [vector]?
+
+2021-01-06
+
+       * 570: New.  It's not specified whether the sub-dword bytes
+       selected by elwidth get byte-reversed into LE before or after the
+       selection.  The specs say we convert loaded words to LE as quickly
+       as possible, so that all internal operations are LE, but this
+       would lead to reversal of sub-register vector elements when
+       loading, even when using svp64 loads with the correct elwidth_src.
+       * 569: New.  Also concerned about how to get bit arrays properly
+       loaded into predicate registers so that the *bits* are reversed to
+       match LE requirements.
+       * 568: New.  After getting clarification from Jacob about setvl's
+       behavior: VL gets set to MIN(VL, MAXVL), you can count on its not
+       being a smaller value.  This is documented only in pseudocode, it
+       could be made more self-evident.  (3:13)
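
A minimal Python model of the rule noted under 568, offered only as a sketch.  It assumes the requested length and MAXVL are plain non-negative integers; `setvl` and `strip_mine` are hypothetical helper names for illustration, not the instruction's actual operands or the spec's pseudocode.

```python
def setvl(requested_vl: int, maxvl: int) -> int:
    """Model of the rule: VL becomes MIN(requested VL, MAXVL)."""
    if requested_vl < 0 or maxvl < 0:
        raise ValueError("lengths must be non-negative")
    return min(requested_vl, maxvl)


def strip_mine(n: int, maxvl: int) -> list:
    """Split n elements into (start, vl) chunks, one setvl per chunk."""
    if maxvl < 1:
        raise ValueError("maxvl must be at least 1 to make progress")
    chunks = []
    start = 0
    while start < n:
        # Because VL is exactly the minimum, never something smaller,
        # every chunk but the last is known to be maxvl elements long.
        vl = setvl(n - start, maxvl)
        chunks.append((start, vl))
        start += vl
    return chunks
```

With these definitions, `strip_mine(10, 4)` gives `[(0, 4), (4, 4), (8, 2)]`: only the final chunk is shorter than MAXVL, which is the guarantee a compiler would want to rely on.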
+
+2021-01-05
+
+       * 567: Cesar filed it for me; I clarified it a bit further.
+
+2021-01-04
+
+       * 560: Tried to show I understand the effects of loads and
+       byte-swapping loads in both endiannesses, and restated my
+       suggestion of iteration order matching the natural memory layout
+       of arrays/vectors.  (1:46)
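
A small Python sketch of the concern discussed under 560 and 570, under stated assumptions: four 16-bit elements in one 64-bit register, and sub-register iteration that always starts at the least-significant end.  The function names are hypothetical, and this models only the ordering question, not any actual svp64 load.

```python
def load_dword(elements, byteorder):
    """Store 16-bit elements in array order, then load them as one dword."""
    memory = b"".join(e.to_bytes(2, byteorder) for e in elements)
    return int.from_bytes(memory, byteorder)


def iterate_from_lsb(reg, count=4, width=16):
    """Walk sub-register elements starting at the least-significant end."""
    mask = (1 << width) - 1
    return [(reg >> (width * i)) & mask for i in range(count)]


array = [1, 2, 3, 4]
le_order = iterate_from_lsb(load_dword(array, "little"))
be_order = iterate_from_lsb(load_dword(array, "big"))
```

Here `le_order` is `[1, 2, 3, 4]`, matching the memory layout of the array, while `be_order` is `[4, 3, 2, 1]`: with a whole-dword big-endian load, LSB-first iteration visits the elements in reverse of their order in memory.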
+
+2021-01-03
+
+       * 560: Pointed out the circular reasoning in assuming LE in
+       showing it works for LE and BE, stated the problem with BE and how
+       the current BE status is incompatible with both PPC vectors and
+       with how svp64 vectors are said to be expected to work.
+       Recommended ruling BE out entirely for now: if the approach is to
+       not look into the problems, this will result in broken,
+       self-inconsistent specs that we'll either have to discontinue or
+       carry indefinitely.
+       * 558: Looked at the riscv implementation, particularly commit
+       4922a0bed80f8fa1b7d507eee6f94fb9c34bfc32, the testcases in
+       299ed9a4eaa569a5fc2b2291911ebf55318e85e4, and the reduction of
+       redundant setvli in e71a47e3cd553cec24afbc752df2dc2214dd3850, and
+       5fa22b2687b1f6ca1558fb487fc07e83ffe95707 that enables vl to not be
+       a power of two.
+       * 560: Wrote up about significance, ordering, endianness and such
+       conventions.  (6:21)
+
+2020-12-30
+
+       * 559: Luke split out the issue of whether we should have
+       automatic detection and reversal of direction of vectors, so that
+       they always behave as if parallel, even if implemented as
+       sequential.  Jacob pointed out that reversal is not enough for
+       some 3-operand cases.
+       * SVP64: Second review call.
+       * 562: Filed, on elwidth encoding.
+       * 558: Raised the need for the compiler to be able to save and
+       restore VL, if it's exposed separately from maxvl; also brought up
+       calling conventions.
+       * 560: Commented on potential endianness issue: identity of
+       register as scalar and of first element of vector starting at that
+       register.  More questions on issues that arise in big endian mode,
+       and compatibility we may wish to aim for.  Some difficulties in
+       getting as much as a conversation going on endianness-influenced
+       sub-register iteration order; presented a simple scenario that
+       demonstrates the fundamental programming problems that will arise
+       out of favoring LE as we seem to.
+       * 558: Explained why disregarding things the compiler will do on
+       its own and arguing it shouldn't do that doesn't make the initial
+       project simpler, but harder, and also more fragile and likely to
+       be throw-away code in the end.  Argued in favor of seeing
+       where we want to get to in the end, and then mapping out what it
+       takes to get features we want for the first stage so that it's a
+       step in the general direction of the end goal.  (6:43)
+
+2020-12-28
+
+       * 558: Commented on vector modes, insns, regalloc, scheduling,
+       auto vectorization, intrinsics, and the possibilities of vector
+       length and component modes as parameters to template insns and
+       intrinsics, and of mechanical generation thereof.  (2:22)
+
+2020-12-26
+
+       * SVP64: Reviewed overview and proposed encoding, posted more
+       questions.  (2:30)
+
+2020-12-25
+
+       * Email backlog.
+       * SVP64: More studying, more making sense.  Asked about
+       parallelism vs dependencies.  (3:02)
+
+       * 550: Implemented the first cut at svp64 prefix in the assembler,
+       namely, a 32-bit pseudo-insn that takes a 24-bit immediate
+       operand, encoding it as an insn with EXT01 as the major opcode,
+       MSB0 bits 7 and 9 also set, and the top two bits of the immediate
+       shuffled into bits 6 and 8.  Added patch to bugzilla and to the
+       wiki.  Updated status.  (1:41)
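
A minimal Python sketch of the prefix layout described in the 550 entry.  It assumes MSB0 numbering over a 32-bit word (bit 0 is the most significant), that the most significant immediate bit lands in bit 6 and the next one in bit 8, and that the remaining 22 bits fill bits 10 to 31; the entry does not spell out those last two details.  The function names are hypothetical, and this is not the assembler patch itself.

```python
EXT01 = 1  # major opcode value, held in MSB0 bits 0..5


def msb0(bit: int) -> int:
    """Return a mask for MSB0 bit number `bit` of a 32-bit word."""
    return 1 << (31 - bit)


def encode_svp64_prefix(imm24: int) -> int:
    """Pack a 24-bit immediate into a 32-bit EXT01 prefix word."""
    if not 0 <= imm24 < (1 << 24):
        raise ValueError("immediate must fit in 24 bits")
    word = EXT01 << 26            # major opcode in bits 0..5
    word |= msb0(7) | msb0(9)     # bits 7 and 9 are always set
    if imm24 & (1 << 23):         # top immediate bit -> bit 6 (assumed order)
        word |= msb0(6)
    if imm24 & (1 << 22):         # next immediate bit -> bit 8 (assumed order)
        word |= msb0(8)
    word |= imm24 & 0x3FFFFF      # low 22 bits -> bits 10..31 (assumed)
    return word
```

Under these assumptions, `f"{encode_svp64_prefix(0):#010x}"` is `0x05400000`, which is just the major opcode plus the two fixed bits.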
+
+2020-12-23
+
+       * SVP64: Review meeting.
+       * 555: Reduce flag/s for fma.  Commented on the possibilities.
+       (1:26)
+
+2020-12-20
+
+       * 532: Implemented logic for mode-switching 32-bit insns with 6
+       bits for the opcode, a 16-bit embedded compressed insn, and 10
+       bits corresponding to subsequent insns, to tell whether or not
+       each of them is compressed.  This nearly doubled the compression
+       rate, using one such mode-switching insn per 3 compressed insns.
+       (1:48)
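
A Python sketch of the bit budget described in the 532 entry: 6 opcode bits, a 16-bit embedded compressed insn, and 10 per-insn flags, adding up to 32 bits.  The field order (opcode on top, flags at the bottom, first following insn in the most significant flag bit) is purely an assumption for illustration, as are the function names; the entry does not give the actual layout.

```python
def pack_mode_switch(opcode: int, cinsn: int, compressed_flags) -> int:
    """Pack the three fields into one 32-bit word (assumed field order)."""
    if not 0 <= opcode < (1 << 6):
        raise ValueError("opcode must fit in 6 bits")
    if not 0 <= cinsn < (1 << 16):
        raise ValueError("compressed insn must fit in 16 bits")
    if len(compressed_flags) > 10:
        raise ValueError("at most 10 subsequent insns can be described")
    flags = 0
    for i, is_compressed in enumerate(compressed_flags):
        if is_compressed:
            flags |= 1 << (9 - i)   # flag i describes the i-th following insn
    return (opcode << 26) | (cinsn << 10) | flags


def unpack_mode_switch(word: int):
    """Split a 32-bit word back into (opcode, cinsn, list of 10 flags)."""
    opcode = (word >> 26) & 0x3F
    cinsn = (word >> 10) & 0xFFFF
    flags = [bool(word & (1 << (9 - i))) for i in range(10)]
    return opcode, cinsn, flags
```

The point of the sketch is only that one such word carries a compressed insn and the mode of the next ten insns, so no separate mode switch is needed in between.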
+
+2020-12-14
+
+       * 532: Reported on compression ratio findings and analyses.
+       (1:06)
+
  2020-12-13
  
         * 532: Questioned some bullets under 16-imm opcodes.  Implemented