When running GDB tests under Valgrind, various tests are failing due
to invalid memory access.
Here is the stack trace reported by Valgrind, for gdb.base/freebpcmd.exp :
==18658== Invalid read of size 8
==18658== at 0x7F9107: is_main (signalmodule.c:195)
==18658== by 0x7F9107: PyOS_InterruptOccurred (signalmodule.c:1730)
==18658== by 0x3696E2: check_quit_flag() (extension.c:829)
==18658== by 0x36980B: restore_active_ext_lang(active_ext_lang_state*) (extension.c:782)
==18658== by 0x48F617: gdbpy_enter::~gdbpy_enter() (python.c:235)
==18658== by 0x47BB71: add_thread_object(thread_info*) (object.h:470)
==18658== by 0x53A84D: operator() (std_function.h:687)
==18658== by 0x53A84D: notify (observable.h:106)
==18658== by 0x53A84D: add_thread_silent(ptid_t) (thread.c:311)
==18658== by 0x3CD954: inf_ptrace_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&
, char**, int) (inf-ptrace.c:139)
==18658== by 0x3FE644: linux_nat_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,
char**, int) (linux-nat.c:1094)
==18658== by 0x3D5727: run_command_1(char const*, int, run_how) (infcmd.c:633)
==18658== by 0x2C05D1: cmd_func(cmd_list_element*, char const*, int) (cli-decode.c:1948)
==18658== by 0x53F29F: execute_command(char const*, int) (top.c:639)
==18658== by 0x3638EB: command_handler(char const*) (event-top.c:586)
==18658== by 0x36468C: command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) (event-top.c:771)
==18658== by 0x36407C: gdb_rl_callback_handler(char*) (event-top.c:217)
==18658== by 0x5B2A1F: rl_callback_read_char (callback.c:281)
==18658== by 0x36346D: gdb_rl_callback_read_char_wrapper_noexcept() (event-top.c:175)
==18658== by 0x363F70: gdb_rl_callback_read_char_wrapper(void*) (event-top.c:192)
==18658== by 0x3633AF: stdin_event_handler(int, void*) (event-top.c:514)
==18658== by 0x362504: gdb_wait_for_event (event-loop.c:857)
==18658== by 0x362504: gdb_wait_for_event(int) (event-loop.c:744)
==18658== by 0x362676: gdb_do_one_event() [clone .part.11] (event-loop.c:321)
==18658== by 0x3627AD: gdb_do_one_event (event-loop.c:303)
==18658== by 0x3627AD: start_event_loop() (event-loop.c:370)
==18658== by 0x41D35A: captured_command_loop() (main.c:381)
==18658== by 0x41F2A4: captured_main (main.c:1224)
==18658== by 0x41F2A4: gdb_main(captured_main_args*) (main.c:1239)
==18658== by 0x227D0A: main (gdb.c:32)
==18658== Address 0x10 is not stack'd, malloc'd or (recently) free'd
The problem seems to be created by gdbpy_enter::~gdbpy_enter () releasing the GIL lock
too early:
~gdbpy_enter () does:
...
PyGILState_Release (m_state);
python_gdbarch = m_gdbarch;
python_language = m_language;
restore_active_ext_lang (m_previous_active);
}
So, it releases the GIL lock, does 2 assignments and then leads to the following
call sequence:
restore_active_ext_lang => check_quit_flag => python.c gdbpy_check_quit_flag
=> PyOS_InterruptOccurred => is_main.
is_main code is:
static int
is_main(_PyRuntimeState *runtime)
{
unsigned long thread = PyThread_get_thread_ident();
PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
return (thread == runtime->main_thread
&& interp == runtime->interpreters.main);
}
The macros and functions to access the thread state are documented as:
/* Variable and macro for in-line access to current thread
and interpreter state */
#define _PyRuntimeState_GetThreadState(runtime) \
((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))
/* Get the current Python thread state.
Efficient macro reading directly the 'gilstate.tstate_current' atomic
variable. The macro is unsafe: it does not check for error and it can
return NULL.
The caller must hold the GIL.
See also PyThreadState_Get() and PyThreadState_GET(). */
#define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)
So, we see that GDB releases the GIL and then potentially calls
_PyRuntimeState_GetThreadState that needs the GIL.
It is not very clear why the problem is only observed when running under
Valgrind. Probably caused by the slowdown due to Valgrind and/or to the 'single
thread' scheduling by Valgrind.
This patch fixes the crashes by releasing the GIT lock later.
2019-11-26 Philippe Waroquiers <philippe.waroquiers@skynet.be>
* python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after
restore_active_ext_lang, as GIL is needed for (indirectly)
called PyOS_InterruptOccurred.
+2019-11-26 Philippe Waroquiers <philippe.waroquiers@skynet.be>
+
+ * python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after
+ restore_active_ext_lang, as GIL is needed for (indirectly)
+ called PyOS_InterruptOccurred.
+
2019-11-26 Simon Marchi <simon.marchi@efficios.com>
* sparc-nat.c (sparc_xfer_wcookie): Sync declaration with
m_error->restore ();
- PyGILState_Release (m_state);
python_gdbarch = m_gdbarch;
python_language = m_language;
restore_active_ext_lang (m_previous_active);
+ PyGILState_Release (m_state);
}
/* Set the quit flag. */