error/internal-error printing local variable during "bt full".
One of our users reported an internal error using the "bt full"
command. In their situation, reproducing involved the following
scenario:
(gdb) frame 1
(gdb) bt full
#0 0xf7783430 in __kernel_vsyscall ()
No symbol table info available.
#1 0xf5550aeb in waitpid () at ../sysdeps/unix/syscall-template.S:81
No locals.
[...]
#6 0x0fe83139 in xxxx (arg=...)
[...some locals printed, and then...]
<S17b> =
[...]/dwarf2loc.c:364: internal-error: dwarf_expr_frame_base: Assertion
`framefunc != NULL' failed.
As shown above, the error happens while GDB is trying to print the value
of <S17b>, which is a local string internally generated by the compiler.
For that, it finds that the array lives in memory, and therefore tries
to create a struct value for it via:
case DWARF_VALUE_MEMORY:
{
CORE_ADDR address = dwarf_expr_fetch_address (ctx, 0);
[...]
retval = value_at_lazy (type, address + byte_offset);
Unfortunately for us, TYPE happens to be an array whose bounds
are dynamic. More precisely, the bounds of our arrays are described
in the debugging info as being...
<4><
2c1985e>: Abbrev Number: 33 (DW_TAG_subrange_type)
<
2c1985f> DW_AT_type : <0x2c1989c>
<
2c19863> DW_AT_lower_bound : <0x2c19835>
<
2c19867> DW_AT_upper_bound : <0x2c19841>
... which are references to a pair of local variables. For instance,
the lower bound is a reference to the following DIE
<3><
2c19835>: Abbrev Number: 32 (DW_TAG_variable)
<
2c19836> DW_AT_name : [...]
<
2c1983a> DW_AT_type : <0x2c198b4>
<
2c1983e> DW_AT_artificial : 1
<
2c1983e> DW_AT_location : 2 byte block: 91 58 (DW_OP_fbreg: -40)
As a result of the above, value_at_lazy indirectly triggers
a resolution of TYPE (via value_from_contents_and_address),
which means a resolution of TYPE's bounds, and as seen in
the DW_AT_location attribute above for our bounds, computing
the bound's location requires the frame (its location expression
uses DW_OP_fbreg).
Unfortunately for us, value_at_lazy does not get passed a frame,
we've lost the relevant frame when we try to resolve the array's
bounds. Instead, resolve_dynamic_range gets calls dwarf2_evaluate_property
with NULL as the frame:
static struct type *
resolve_dynamic_range (struct type *dyn_range_type,
struct property_addr_info *addr_stack)
{
[...]
if (dwarf2_evaluate_property (prop, NULL, addr_stack, &value))
^^^^
... which then handles this by using the selected frame instead:
if (frame == NULL && has_stack_frames ())
frame = get_selected_frame (NULL);
In our case, the selected frame happens to be frame #1, which is
a frame where we have a minimal amount of debugging info, and in
particular, no debug info for the function itself. And because of that,
when we try to determine the frame's base...
static void
dwarf_expr_frame_base (void *baton, const gdb_byte **start,
size_t * length)
{
struct dwarf_expr_baton *debaton = (struct dwarf_expr_baton *) baton;
const struct block *bl = get_frame_block (debaton->frame, NULL);
[...]
framefunc = block_linkage_function (bl);
... framefunc ends up being NULL, which triggers the assert
in that same function:
gdb_assert (framefunc != NULL);
This patches avoids the issue by temporarily setting the selected_frame
before printing the locals of each frames.
This patch also adds a small testcase, which reproduces the same
issue, but with a slightly different outcome:
(gdb) bt full
#0 0x000000000040049a in opaque_routine ()
No symbol table info available.
#1 0x0000000000400532 in main () at wrong_frame_bt_full-main.c:20
my_table_size = 3
my_table = <error reading variable my_table (frame address is not available.)>
With this patch, the output becomes:
(gdb) bt full
[...]
my_table = {0, 1, 2}
gdb/ChangeLog:
* stack.c (print_frame_local_vars): Temporarily set the selected
frame to FRAME while printing the frame's local variables.
gdb/testsuite/ChangeLog:
* gdb.base/wrong_frame_bt_full-main.c: New file.
* gdb.base/wrong_frame_bt_full-opaque.c: New file.
* gdb.base/wrong_frame_bt_full.exp: New file.