PR gdb/15713 - errors from i386_linux_resume lead to lock-up
linux_nat_resume is not considering that linux_ops->to_resume may throw:
/* Mark LWP as not stopped to prevent it from being continued by
linux_nat_resume_callback. */
lp->stopped = 0;
if (resume_many)
iterate_over_lwps (ptid, linux_nat_resume_callback, NULL);
If something within linux_nat_resume_callback throws, GDB leaves the
lwp_info as if the inferior was resumed, while it actually wasn't.
A couple examples, there are possibly others:
- i386_linux_resume calls target_read which calls QUIT.
- if the actual ptrace resumption fails in inf_ptrace_resume,
perror_with_name is called.
If the user tries to kill the inferior at this point (or quit, which
offers to kill), GDB locks up trying to stop the lwp -- if it is
already stopped no new waitpid event gets generated for it.
Fix this by setting the stopped flag earlier, as soon as we collect a
stop event with waitpid, and clearing it always only after resuming
the lwp successfully.
Tested on x86_64 Fedora 20. Confirmed the lock-up disappears using a
local hack that forces an error in inf_ptrace_resume.
Also fixes a little "set debug lin-lwp" annoyance. Currently we always see:
Continuing.
LLR: Preparing to resume process 6802, 0, inferior_ptid Thread 0x7ffff7fc7740 (LWP 6802)
^^^^^^^^
RC: Resuming sibling Thread 0x7ffff77c5700 (LWP 6807), 0, resume
RC: Resuming sibling Thread 0x7ffff7fc6700 (LWP 6806), 0, resume
RC: Not resuming sibling Thread 0x7ffff7fc7740 (LWP 6802) (not stopped)
^^^^^^^^^^^^^^^^^^^^^^^
LLR: PTRACE_CONT process 6802, 0 (resume event thread)
This patch gets rid of the "Not resuming sibling" line.
2014-05-29 Pedro Alves <palves@redhat.com>
PR gdb/15713
* linux-nat.c (linux_nat_resume_callback): Rename the second
parameter to 'except'. Skip LP if it points to EXCEPT.
(linux_nat_resume): Don't mark the event lwp as not stopped
before resuming sibling lwps. Instead ask
linux_nat_resume_callback to skip the event lwp. Mark it as not
stopped after actually resuming it.
(linux_handle_syscall_trap): Mark the lwp as not stopped after
resuming it.
(wait_lwp): Mark the lwp as stopped here.
(stop_wait_callback): Mark the lwp as not stopped right after
resuming it. Don't mark lwps as stopped here.
(linux_nat_filter_event): Mark the lwp as stopped earlier.
(linux_nat_wait_1): Don't mark dead lwps as stopped here.