PR gdb/16575: stale breakpoint instructions in the code cache
In non-stop mode, or rather, breakpoints always-inserted mode, the
code cache can easily end up with stale breakpoint instructions:
All it takes is filling a cache line when breakpoints already exist in
that memory region, and then delete the breakpoint.
Vis. (from the new test):
(gdb) set breakpoint always-inserted on
(gdb) b 23
Breakpoint 2 at 0x400540: file ../../../src/gdb/testsuite/gdb.base/breakpoint-shadow.c, line 23.
(gdb) b 24
Breakpoint 3 at 0x400547: file ../../../src/gdb/testsuite/gdb.base/breakpoint-shadow.c, line 24.
disass main
Dump of assembler code for function main:
0x000000000040053c <+0>: push %rbp
0x000000000040053d <+1>: mov %rsp,%rbp
=> 0x0000000000400540 <+4>: movl $0x1,-0x4(%rbp)
0x0000000000400547 <+11>: movl $0x2,-0x4(%rbp)
0x000000000040054e <+18>: mov $0x0,%eax
0x0000000000400553 <+23>: pop %rbp
0x0000000000400554 <+24>: retq
End of assembler dump.
So far so good. Now flush the code cache:
(gdb) set code-cache off
(gdb) set code-cache on
Requesting a disassembly works as expected, breakpoint shadowing is
applied:
(gdb) disass main
Dump of assembler code for function main:
0x000000000040053c <+0>: push %rbp
0x000000000040053d <+1>: mov %rsp,%rbp
=> 0x0000000000400540 <+4>: movl $0x1,-0x4(%rbp)
0x0000000000400547 <+11>: movl $0x2,-0x4(%rbp)
0x000000000040054e <+18>: mov $0x0,%eax
0x0000000000400553 <+23>: pop %rbp
0x0000000000400554 <+24>: retq
End of assembler dump.
However, now delete the breakpoints:
(gdb) delete
Delete all breakpoints? (y or n) y
And disassembly shows the old breakpoint instructions:
(gdb) disass main
Dump of assembler code for function main:
0x000000000040053c <+0>: push %rbp
0x000000000040053d <+1>: mov %rsp,%rbp
=> 0x0000000000400540 <+4>: int3
0x0000000000400541 <+5>: rex.RB cld
0x0000000000400543 <+7>: add %eax,(%rax)
0x0000000000400545 <+9>: add %al,(%rax)
0x0000000000400547 <+11>: int3
0x0000000000400548 <+12>: rex.RB cld
0x000000000040054a <+14>: add (%rax),%al
0x000000000040054c <+16>: add %al,(%rax)
0x000000000040054e <+18>: mov $0x0,%eax
0x0000000000400553 <+23>: pop %rbp
0x0000000000400554 <+24>: retq
End of assembler dump.
Those breakpoint instructions are no longer installed in target memory
they're stale in the code cache. Easily confirmed by just disabling
the code cache:
(gdb) set code-cache off
(gdb) disass main
Dump of assembler code for function main:
0x000000000040053c <+0>: push %rbp
0x000000000040053d <+1>: mov %rsp,%rbp
=> 0x0000000000400540 <+4>: movl $0x1,-0x4(%rbp)
0x0000000000400547 <+11>: movl $0x2,-0x4(%rbp)
0x000000000040054e <+18>: mov $0x0,%eax
0x0000000000400553 <+23>: pop %rbp
0x0000000000400554 <+24>: retq
End of assembler dump.
I stumbled upon this when writing a patch to infrun.c, that made
handle_inferior_event & co fill in the cache before breakpoints were
removed from the target. Recall that wait_for_inferior flushes the
dcache for every event. So in that case, always-inserted mode was not
necessary to trigger this. It's just a convenient way to expose the
issue.
The dcache works at the raw memory level. We need to update it
whenever memory is written, no matter what kind of target memory
object was originally passed down by the caller. The issue is that
the dcache update code isn't reached when a caller explicitly writes
raw memory. Breakpoint insertion/removal is one such case --
mem-break.c uses target_write_read_memory/target_write_raw_memory.
The fix is to move the dcache update code from memory_xfer_partial_1
to raw_memory_xfer_partial so that it's always reachable.
When we do that, we can actually simplify a series of things.
memory_xfer_partial_1 no longer needs to handle writes for any kind of
memory object, and therefore dcache_xfer_memory no longer needs to
handle writes either. So the latter (dcache_xfer_memory) and its
callees can be simplified to only care about reads. While we're
touching dcache_xfer_memory's prototype, might as well rename it to
reflect that fact that it only handles reads, and make it follow the
new target_xfer_status/xfered_len style. This made me notice that
dcache_xfer_memory loses the real error status if a memory read fails:
we could have failed to read due to TARGET_XFER_E_UNAVAILABLE, for
instance, but we always return TARGET_XFER_E_IO, hence the FIXME note.
I felt that fixing that fell out of the scope of this patch.
Currently dcache_xfer_memory handles the case of a write failing. The
whole cache line is invalidated when that happens. However,
dcache_update, the sole mechanism for handling writes that will remain
after the patch, does not presently handle that scenario. That's a
bug. The patch makes it handle that, by passing down the
target_xfer_status status from the caller, so that it can better
decide what to do itself. While I was changing the function's
prototype, I constified the myaddr parameter, getting rid of the need
for the cast as seen in its existing caller.
Tested on x86_64 Fedora 17, native and gdbserver.
gdb/
2014-03-05 Pedro Alves <palves@redhat.com>
PR gdb/16575
* dcache.c (dcache_poke_byte): Constify ptr parameter. Return
void. Update comment.
(dcache_xfer_memory): Delete.
(dcache_read_memory_partial): New, based on the read bits of
dcache_xfer_memory.
(dcache_update): Add status parameter. Use ULONGEST for len, and
adjust. Discard cache lines if the reason for the update was
error.
* dcache.h (dcache_xfer_memory): Delete declaration.
(dcache_read_memory_partial): New declaration.
(dcache_update): Update prototype.
* target.c (raw_memory_xfer_partial): Update the dcache here.
(memory_xfer_partial_1): Don't handle dcache writes here.
gdb/testsuite/
2014-03-05 Pedro Alves <palves@redhat.com>
PR gdb/16575
* gdb.base/breakpoint-shadow.exp (compare_disassembly): New
procedure.
(top level): Adjust to use it. Add tests that exercise breakpoint
interaction with the code-cache.