+2021-02-20
+
+ * GCC: Lowering DWARF_FRAME_REGISTERS, once I rebuilt the
+ compiler, libgcc, and the test program, avoided the problem. That
+ didn't make much sense, so I reversed that change and got back to
+ debugging. The signal frame seemed to be unwound correctly, but
+ instead of using the linux-unwind fallback frame stuff, that I'd
+ messed with a week before, I noticed it was using frame info from
+ the __kernel_sigtramp_rt64 entry point in the kernel-supplied vdso.
+ Though I'm pretty sure that changing that file got me some
+ different results the week before, with vdso it couldn't possibly
+ be where things went wrong. So I proceeded to unwind frames
+ until we hit the caller of the infinitely-recursive function, and
+ found we got to the end of the stack before reaching it. Huh? A
+ GDB backtrace also hit the same problem. Oh, maybe there was
+ something wrong with the frame info for those early calls in the
+ thread. But the backtrace stopped only at the third or fourth
+ recursive call. That seemed fishy, so I started the program over,
+ and checked the stack trace at the point of the signal delivery,
+ and found it was fine. I stepped into the signal handler, and
+ into the exception raising machinery, and it was still fine. Only
+ after we started the unwinding did it get corrupted. At first I
+ suspected something going wrong because of out-of-range accesses
+ to the regs array, recompiled compiler and library and program
+ just to be sure, and still the same issue. Finally, it
+ occurred to me to check where the alternate stack, on which
+ the stack overflow signal was handled, was located, and found it to
+ be running into the other end of the task's stack. Turns out the Ada
+ runtime, when starting a task, allocates an alt stack to handle
+ stack overflows out of the stack itself. With the larger register
+ file, unwinding was taking up more of the alt stack space,
+ overflowing it and thus overwriting part of the task's call stack,
+ corrupting it to the point that the unwinder could no longer reach
+ the exception handler in the task setup code, supposed to catch an
+ escaping exception for the task parent to analyze/reraise.
+ Growing the alt stack size in the Ada runtime fixes the problem,
+ but since this explains why lowering DWARF_FRAME_REGISTERS avoided
+ the problem, I'm now happy to have it set to the lower value, at
+ least until call-saved SVP64 regs are needed. Adjusted other
+ references to ARG_POINTER_REGNUM in libgcc to use a fixed index.
+ Wrote a blog post about this, while regstrapping the fix.
+ https://www.fsfla.org/blogs/lxo/2021-02-20-longest-debugging-session.en.html
+ Success, no regressions. (9:09)
+
+2021-02-13
+
+ * GCC: Found libgcc/config/linux-unwind.h using GCC's internal
+ register numbers, and thus in need of renumbering as well. Alas,
+ the right fix didn't jump out at me. There's some confusion about
+ using mapped register numbers or not. Using the pristine
+ libgcc_eh.a to link the program built with the new compiler, using
+ newly-built libraries, it works, but with the new libgcc_eh.a, it
+ fails, whether using 291 or 99 or 67 for R_AR, which used to be
+ ARG_POINTER_REGNUM. Changing R_AR and rebuilding doesn't alter
+ anything within gcc/ada, so it's not the Ada runtime. I guess
+ I may have to go back to debugging, as it's not clear whether GCC
+ is losing track of the frames or not finding the handler that
+ would propagate the EH to the thread that activated the task.
+ Tried experimenting with overriding DWARF_FRAME_REGISTERS to its
+ original value. (6:13)
+
+2021-02-10
+
+ * MW (0:48)
+
+2021-02-09
+
+ * VC (1:59)
+
+2021-02-05
+
+ * GCC: Started investigating the remaining regressions, all in
+ Ada. They all turn out to be -fstack-check tests. (0:40)
+
+2021-01-31
+
+ * GCC: Started debugging regressions in the stage1 non-svp64
+ compiler. Noticed that the renaming of mov to @altivec_mov
+ removed expanders for some modes used by altivec but not by svp64.
+ Reintroduced them, and added floating-point svp64 mov patterns.
+ Split out of the main patch a preparation patch that could be
+ submitted upstream right away, for it just prepares for register
+ renumbering. Fixed conditional register usage that, when svp64
+ was not enabled, caused the LAST/MAX_* non-SVP64 registers to be
+ marked as fixed, which caused the frame pointer to not be
+ preserved across calls. Fixed the *logue routines to account for
+ the register renumbering. Fixed the svp64 add expander to use a
+ correct expansion of <VI_unit>, covering V2DI at power8. With
+ that, we're down to a single C regression, when not enabling
+ svp64. The expected behavior is for the compiler to
+ optimize gcc.target/powerpc/dform-3.c's gpr function so that p->c
+ is (re)loaded to e.g. r10&r11 with a vsx_movv2df_64bit, because
+ the MEM cost for its reload is negative, whereas the
+ svp64-modified compiler keeps both instructions, first
+ loading to a VSX reg, then splitting it into a pair of GPRs. It
+ is a performance bug, but the generated code works. Trying a
+ bootstrap! Stage2 wouldn't build because of /* within comments in
+ rs6000-modes.def; adjusted the commented-out entries I'd put in to
+ avoid that. The memory move costs were off because of the use of
+ literal regno 32 when computing the costs for FPR classes.
+ Regstrapped the prepping patch successfully, then went back to the
+ patch that introduces svp64 support, still disabled. -msvp64 is
+ still slightly broken, but without enabling it, we may be down to no
+ regressions; testing should confirm. (14:37)
+
+2021-01-30
+
+ * GCC: Fixed the boundaries of the loops that disable SVP64
+ registers when SVP64 is not enabled. Fixed macros used for
+ parameter and return value assignment to reflect the new FP
+ numbers. Required at least one register operand for the svp64 vector
+ mov pattern. Added emission of altivec insns when not using svp64.
+ Introduced a first svp64 reload change for preferred_reload, to
+ avoid trying to reload constants into altivec registers. A lot
+ more work will be needed for svp64 reloads. A non-svp64 native
+ compiler builds stage1, but the compiler is still pretty broken,
+ with thousands of regressions. An svp64 one builds stage1 and fails
+ in libgcc, with a bunch of asm failures because of (unsupported)
+ sv.* opcodes, and one reload failure in decContext, that I started
+ investigating. (8:22)
+
+2021-01-27
+
+ * µW (1:10)
+
+2021-01-26
+
+ * VCoffee (1:41)
+
+2021-01-24
+
+ * GCC: Introduced vector modes, registers, classes, constraints,
+ renumbered and remapped registers, went over literals referring to
+ register numbers, and started implementation of move/load/store
+ and add for the V*DI integral types. Still have to test that the
+ compiler still works after the renumbering. The new insns are not
+ generated yet, I haven't made the new registers usable for
+ anything yet. (12:13)
+
+2021-01-22
+
+ * 578: Specifying and debating the task with Luke and, later,
+ Jacob. Difficulties in conveying the requirements and overcoming
+ the complexities involved in figuring out how to parse each asm
+ operand in Python, underspecification of the input language,
+ disagreement as to the complexity and the amount of work required
+ to duplicate existing binutils functionality in Python, and then
+ duplicate this work one more time into binutils later, led Luke to
+ take it upon himself.
+ * 579: Talked to Jacob a bit about potential implementation
+ strategies. The need to build an immediate constant to use as the
+ operand to .long/svp64 makes for plenty of complexity, even in
+ C++. I'm again unhappy with a plan that involves so much
+ intentional waste of effort. I'm also very surprised by the
+ estimated amount of work involved in this task, compared with
+ 578, that is a much bigger one with all the rewriting of an asm
+ parser, and likely more rewriting as the extended asm syntax
+ evolves. And thus pretty much a full workday ends up wasted,
+ most of it complaining about planning to waste work. (8:29)
+
+2021-01-19
+
+ * Virtual Coffee (1:39)
+
+2021-01-13
+
+ * Microwatts meeting (1:08)
+
+2021-01-07
+
+ * 572: New, split out of 570, on what .[sv], elwidth, subvl
+ affect in load/store ops: the address [vector] or the in-memory
+ [vector]?
+
+2021-01-06
+
+ * 570: New. It's not specified whether elwidth-selected
+ sub-dword bytes get byte-reversed into LE before or after the
+ selection. The specs say we convert loaded words to LE as quickly
+ as possible, so that all internal operations are LE, but this
+ would lead to reversal of sub-register vector elements when
+ loading, even when using svp64 loads with the correct elwidth_src.
+ * 569: New. Also concerned about how to get bit arrays properly
+ loaded into predicate registers so that the *bits* are reversed to
+ match LE requirements.
+ * 568: New. After getting clarification from Jacob about setvl's
+ behavior: VL gets set to MIN(VL, MAXVL); you can count on its not
+ being a smaller value. This is documented only in pseudocode; it
+ could be made more self-evident. (3:13)
+
+2021-01-05
+
+ * 567: Cesar filed it for me; I clarified it a bit further.
+
+2021-01-04
+
+ * 560: Tried to show I understand the effects of loads and
+ byte-swapping loads in both endiannesses, and restated my
+ suggestion of iteration order matching the natural memory layout
+ of arrays/vectors. (1:46)
+
2021-01-03
* 560: Pointed out the circular reasoning in assuming LE in