From 8ed5b76ea2de370265382dab1d538a919e2603ab Mon Sep 17 00:00:00 2001 From: Joel Brobecker Date: Tue, 30 Apr 2019 15:59:17 -0500 Subject: [PATCH] (Windows) fix thr != nullptr assert failure in delete_thread_1 We have observed that GDB would randomly trip the following assertion failure when debugging on Windows. When allowing the program to run until the inferior exits, we occasionally see: (gdb) cont Continuing. [Thread 48192.0xd100 exited with code 1] [Thread 48192.0x10ad8 exited with code 1] [Thread 48192.0x36e28 exited with code 0] [Thread 48192.0x52be4 exited with code 0] [Thread 48192.0x5aa40 exited with code 0] ../../src/gdb/thread.c:453: internal-error: void delete_thread_1(thread_inf o*, bool): Assertion `thr != nullptr' failed. Running the same scenario with some additional traces enabled... (gdb) set verbose (gdb) set debugevents ... allows us to understand what the issue is. To understand, we need to first look at the events received when starting the program, and in particular which threads got created how. First, we get a CREATE_PROCESS_DEBUG_EVENT for tid=0x442a8: gdb: kernel event for pid=317536 tid=0x442a8 code=CREATE_PROCESS_DEBUG_EVENT) Shortly after, we get some CREATE_THREAD_DEBUG_EVENT events, one of them being for tid=0x4010c: gdb: kernel event for pid=317536 tid=0x4010c code=CREATE_THREAD_DEBUG_EVENT) Fast forward a bit of debugging, and we do a "cont" as above, at which point the programs reaches the end, and the system reports "exit" events. The first interesting one is the following: gdb: kernel event for pid=317536 tid=0x442a8 code=EXIT_THREAD_DEBUG_EVENT) This is reporting a thread-exit event for a thread whose tid is the TID of what we call the "main thread". That's the thread that was created when we received the CREATE_PROCESS_DEBUG_EVENT notification, and whose TID is actually stored in a global variable named main_thread_id. This is not something we expected, as the assumption we made was that the main thread would exit last, and we would be notified of it via an EXIT_PROCESS_DEBUG_EVENT. But apparently, this is not always true, at least on Windows Server 2012 and 2016 where this issue has been observed happening randomly. The consequence of the above notification is that we call windows_delete_thread for that thread, which removes it from our list of known threads. And a little bit later, then we then get the EXIT_PROCESS_DEBUG_EVENT, and we can see that the associated tid is not the main_thread_id, but rather the tid of one of the threads that was created during the lifetime of the program, in this case tid=0x4010c: gdb: kernel event for pid=317536 tid=0x4010c code=EXIT_PROCESS_DEBUG_EVENT) And the debug trace printed right after shows why we're crashing: [Deleting Thread 317536.0x442a8] We are trying to delete the thread whose tid=0x442a8, which is the main_thread_id! As we have already deleted that thread before, the search for it returns a nullptr, which then trips the assertion check in delete_thread_1. This commit fixes this issue. It ignores the open question of what to do with the main_thread_id global, particularly after that thread has been removed from our list of threads. This will be dealt with as a separate patch, to allow cherry-picking this patch into a release branch. For now, we fix the code so as to avoid this crash. gdb/ChangeLog: * windows-nat.c (get_windows_debug_event) : Use current_event.dwThreadId instead of main_thread_id. --- gdb/ChangeLog | 5 +++++ gdb/windows-nat.c | 4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/gdb/ChangeLog b/gdb/ChangeLog index c09d4d45440..1b4b83d13af 100644 --- a/gdb/ChangeLog +++ b/gdb/ChangeLog @@ -1,3 +1,8 @@ +2019-04-30 Joel Brobecker + + * windows-nat.c (get_windows_debug_event) : + Use current_event.dwThreadId instead of main_thread_id. + 2019-04-30 Tom Tromey * ada-lang.c (ada_lookup_simple_minsyms): New function. diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c index 50094187bd1..9f3242cc405 100644 --- a/gdb/windows-nat.c +++ b/gdb/windows-nat.c @@ -1637,11 +1637,11 @@ get_windows_debug_event (struct target_ops *ops, else if (saw_create == 1) { windows_delete_thread (ptid_t (current_event.dwProcessId, 0, - main_thread_id), + current_event.dwThreadId), 0, true /* main_thread_p */); ourstatus->kind = TARGET_WAITKIND_EXITED; ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode; - thread_id = main_thread_id; + thread_id = current_event.dwThreadId; } break; -- 2.30.2