From 087969169836f802a09b1cd0502d2f22d7a8f7dc Mon Sep 17 00:00:00 2001
From: Andrew Burgess
Date: Tue, 23 May 2023 11:25:21 +0100
Subject: [PATCH] gdb: handle core files with .reg/0 section names

The previous commit added the test gdb.arch/core-file-pid0.exp which
tests GDB's ability to load a core file containing threads with an
lwpid of 0, which is something GDB can encounter when loading a
vmcore file -- a core file generated by the Linux kernel.  The threads
with an lwpid of 0 represent idle cores.

While the previous commit added the test, which confirms GDB doesn't
crash when confronted with such a core file, there are still some
problems with GDB's handling of these core files.

These problems all originate from the fact that the core file (once
opened by bfd) contains multiple sections called .reg/0.  These
sections all represent different threads (cpu cores in the original
vmcore dump), but GDB gets confused and thinks all of these .reg/0
sections are referencing the same thread.

Here is a GDB session on an x86-64 machine which loads the core file
from the gdb.arch/core-file-pid0.exp test.  This core file contains
two threads, both of which have a pid of 0:

  $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
  (gdb) core-file /tmp/x86_64-pid0-core.core
  [New process 1]
  [New process 1]
  Failed to read a valid object file image from memory.
  Core was generated by `./segv-mt'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  The current thread has terminated
  (gdb) info threads
    Id   Target Id                         Frame
    2    process 1                         0x00000000004017c2 in ?? ()

  The current thread has terminated.  See `help thread'.
  (gdb) maintenance info sections
  Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
   [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
   [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/0 HAS_CONTENTS
   [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
   [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/0 HAS_CONTENTS
   [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
   [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
   [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/0 HAS_CONTENTS
   [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
   [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/0 HAS_CONTENTS
   [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
   [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/0 HAS_CONTENTS
   [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
   [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/0 HAS_CONTENTS
   [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/0 HAS_CONTENTS
   [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/0 HAS_CONTENTS
   [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
   [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
   [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
   [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
   [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
   [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
   [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
   [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
   [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
   [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
   [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
   [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
  (gdb)

Notice
when the core file is first loaded we see two lines like:

  [New process 1]

And GDB reports:

  The current thread has terminated

Which isn't what we'd expect from a core file -- the core file should
only contain threads that are live at the point of the crash, one of
which should be the current thread.  The above message is reported
because GDB has deleted what we think is the current thread!

And in the 'info threads' output we are only seeing a single thread;
again, this is because GDB has deleted one of the threads.

Finally, the 'maintenance info sections' output shows the cause of all
our problems: two sections named .reg/0.  When GDB sees the first of
these it creates a new thread.  But, when we see the second .reg/0 GDB
tries to create another new thread, but this thread has the same
ptid_t as the first thread, so GDB deletes the first thread and
creates the second thread in its place.

Because both these threads are created with an lwpid of 0 GDB reports
these as 'New process NN' rather than 'New LWP NN' which is what we
would normally expect.

The previous commit includes a little more of the history of GDB
support in this area, but these problems were discussed on the mailing
list a while ago in this thread:

  https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/

In this commit I propose a solution to these problems.  What I propose
is that GDB should spot when we have .reg/0 sections and, when these
are found, should rename these sections using some unique non-zero
lwpid.

Note in the above output we also have sections like .reg2/0 and
.reg-xstate/0.  These are additional register sets; this commit also
renumbers these sections in line with their .reg section.

The user is warned that some section renumbering has been performed.
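
To illustrate the renumbering scheme, here is a minimal standalone
sketch.  It is not the code in this patch: section names are modelled
as plain strings rather than BFD sections, and renumber_idle_sections
is a hypothetical helper that exists only for this example.  It
assumes, as described above, that each .reg/NN section marks the start
of a thread's group of register sections.

```cpp
#include <cstdlib>
#include <string>
#include <unordered_set>
#include <vector>

/* Hypothetical helper, not part of GDB.  Given section names in core
   file order, return the names with every "/0" group renumbered to an
   unused, non-zero lwpid.  */
std::vector<std::string>
renumber_idle_sections (const std::vector<std::string> &names)
{
  const std::string reg_prefix = ".reg/";
  auto is_reg = [&] (const std::string &name)
    { return name.compare (0, reg_prefix.size (), reg_prefix) == 0; };
  auto lwpid_of = [&] (const std::string &name)
    { return std::atoi (name.c_str () + reg_prefix.size ()); };

  /* Pass 1: record every lwpid already used by a .reg/NN section.  */
  std::unordered_set<int> used;
  for (const std::string &name : names)
    if (is_reg (name))
      used.insert (lwpid_of (name));

  /* Pass 2: each .reg/0 starts a new thread and takes the next unused
     lwpid; later "/0" sections in the same group inherit it.  */
  std::vector<std::string> result;
  int next = 1;
  int current = 0;   /* 0 means the current group is left alone.  */
  for (const std::string &name : names)
    {
      if (is_reg (name))
        {
          if (lwpid_of (name) == 0)
            {
              while (used.count (next) != 0)
                next++;
              current = next++;
            }
          else
            current = 0;
        }

      bool ends_in_slash_zero
        = (name.size () >= 2
           && name.compare (name.size () - 2, 2, "/0") == 0);
      if (current != 0 && ends_in_slash_zero)
        result.push_back (name.substr (0, name.size () - 1)
                          + std::to_string (current));
      else
        result.push_back (name);
    }
  return result;
}
```

In this sketch the un-numbered aliases such as .reg are left untouched
because they do not end in "/0", and a group that already has a real
pid resets the replacement number so its sections keep their names.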

GDB takes care to ensure that the new numbers assigned are unique and
don't clash with any of the pids that might already be in use.
Remember, in a real vmcore file, 0 is used to indicate an idle core,
while non-idle cores will have the pid of whichever process was
running on that core, so we don't want GDB to assign an lwpid that
clashes with an actual pid that is in use in the core file.

After this commit here's the updated GDB session output:

  $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
  (gdb) core-file /tmp/x86_64-pid0-core.core
  warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2
  [New LWP 1]
  [New LWP 2]
  Failed to read a valid object file image from memory.
  Core was generated by `./segv-mt'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x00000000004017c2 in ?? ()
  [Current thread is 1 (LWP 1)]
  (gdb) info threads
    Id   Target Id         Frame
  * 1    LWP 1             0x00000000004017c2 in ?? ()
    2    LWP 2             0x000000000040dda5 in ?? ()
  (gdb) maintenance info sections
  Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
   [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
   [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/1 HAS_CONTENTS
   [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
   [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/1 HAS_CONTENTS
   [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
   [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
   [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/1 HAS_CONTENTS
   [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
   [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/1 HAS_CONTENTS
   [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
   [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/1 HAS_CONTENTS
   [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
   [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/2 HAS_CONTENTS
   [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/2 HAS_CONTENTS
   [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/2 HAS_CONTENTS
   [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
   [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
   [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
   [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
   [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
   [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
   [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
   [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
   [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
   [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
   [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
   [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
  (gdb)

Notice
the new warning which is issued when the core file is being loaded.
The threads are announced as '[New LWP NN]', and we see two threads in
the 'info threads' output.  The 'maintenance info sections' output
shows the result of the section renaming.

The gdb.arch/core-file-pid0.exp test has been updated to check for the
improved GDB output.

Reviewed-By: Kevin Buettner
---
 gdb/corelow.c                             | 150 ++++++++++++++++++++++
 gdb/testsuite/gdb.arch/core-file-pid0.exp |  12 +-
 2 files changed, 161 insertions(+), 1 deletion(-)

diff --git a/gdb/corelow.c b/gdb/corelow.c
index e706427772a..46bb1077b6d 100644
--- a/gdb/corelow.c
+++ b/gdb/corelow.c
@@ -407,6 +407,153 @@ core_file_command (const char *filename, int from_tty)
     core_target_open (filename, from_tty);
 }
 
+/* A vmcore file is a core file created by the Linux kernel at the point of
+   a crash.  Each thread in the core file represents a real CPU core, and
+   the lwpid for each thread is the pid of the process that was running on
+   that core at the moment of the crash.
+
+   However, not every CPU core will have been running a process, some cores
+   will be idle.  For these idle cores the kernel writes an lwpid of 0.  And
+   of course, multiple cores might be idle, so there could be multiple
+   threads with an lwpid of 0.
+
+   The problem is GDB doesn't really like threads with an lwpid of 0; GDB
+   presents such a thread as a process rather than a thread.  And GDB
+   certainly doesn't like multiple threads having the same lwpid, each time
+   a new thread is seen with the same lwpid the earlier thread (with the
+   same lwpid) will be deleted.
+
+   This function addresses both of these problems by assigning a fake lwpid
+   to any thread with an lwpid of 0.
+
+   GDB finds the lwpid information by looking at the bfd section names
+   which include the lwpid, e.g. .reg/NN where NN is the lwpid.  This
+   function looks through all the section names looking for sections named
+   .reg/NN.
+   If any sections are found where NN == 0, then we assign a new
+   unique value of NN.  Then, in a second pass, any sections ending /0 are
+   assigned their new number.
+
+   Remember, a core file may contain multiple register sections for
+   different register sets, but the sets are always grouped by thread, so
+   we can figure out which registers should be assigned the same new
+   lwpid.  For example, consider a core file containing:
+
+     .reg/0, .reg2/0, .reg/0, .reg2/0
+
+   This represents two threads, each thread contains a .reg and .reg2
+   register set.  The .reg represents the start of each thread.  After
+   renaming the sections will now look like this:
+
+     .reg/1, .reg2/1, .reg/2, .reg2/2
+
+   After calling this function the rest of the core file handling code can
+   treat this core file just like any other core file.  */
+
+static void
+rename_vmcore_idle_reg_sections (bfd *abfd, inferior *inf)
+{
+  /* Map from the bfd section to its lwpid (the /NN number).  */
+  std::vector<std::pair<asection *, int>> sections_and_lwpids;
+
+  /* The set of all /NN numbers found.  Needed so we can easily find unused
+     numbers in the case that we need to rename some sections.  */
+  std::unordered_set<int> all_lwpids;
+
+  /* A count of how many sections called .reg/0 we have found.  */
+  unsigned zero_lwpid_count = 0;
+
+  /* Look for all the .reg sections.  Record the section object and the
+     lwpid which is extracted from the section name.  Spot if any have an
+     lwpid of zero.  */
+  for (asection *sect : gdb_bfd_sections (core_bfd))
+    {
+      if (startswith (bfd_section_name (sect), ".reg/"))
+        {
+          int lwpid = atoi (bfd_section_name (sect) + 5);
+          sections_and_lwpids.emplace_back (sect, lwpid);
+          all_lwpids.insert (lwpid);
+          if (lwpid == 0)
+            zero_lwpid_count++;
+        }
+    }
+
+  /* If every ".reg/NN" section has a non-zero lwpid then we don't need to
+     do any renaming.  */
+  if (zero_lwpid_count == 0)
+    return;
+
+  /* Assign a new number to any .reg sections with an lwpid of 0.  */
+  int new_lwpid = 1;
+  for (auto &sect_and_lwpid : sections_and_lwpids)
+    if (sect_and_lwpid.second == 0)
+      {
+        while (all_lwpids.find (new_lwpid) != all_lwpids.end ())
+          new_lwpid++;
+        sect_and_lwpid.second = new_lwpid;
+        new_lwpid++;
+      }
+
+  /* Now update the names of any sections with an lwpid of 0.  This is
+     more than just the .reg sections we originally found.  */
+  std::string replacement_lwpid_str;
+  auto iter = sections_and_lwpids.begin ();
+  int replacement_lwpid = 0;
+  for (asection *sect : gdb_bfd_sections (core_bfd))
+    {
+      if (iter != sections_and_lwpids.end () && sect == iter->first)
+        {
+          gdb_assert (startswith (bfd_section_name (sect), ".reg/"));
+
+          int lwpid = atoi (bfd_section_name (sect) + 5);
+          if (lwpid == iter->second)
+            {
+              /* This section was not given a new number.  */
+              gdb_assert (lwpid != 0);
+              replacement_lwpid = 0;
+            }
+          else
+            {
+              replacement_lwpid = iter->second;
+              ptid_t ptid (inf->pid, replacement_lwpid);
+              if (!replacement_lwpid_str.empty ())
+                replacement_lwpid_str += ", ";
+              replacement_lwpid_str += target_pid_to_str (ptid);
+            }
+
+          iter++;
+        }
+
+      if (replacement_lwpid != 0)
+        {
+          const char *name = bfd_section_name (sect);
+          size_t len = strlen (name);
+
+          if (strncmp (name + len - 2, "/0", 2) == 0)
+            {
+              /* This section needs a new name.  */
+              std::string name_str
+                = string_printf ("%.*s/%d",
+                                 static_cast<int> (len - 2),
+                                 name, replacement_lwpid);
+              char *name_buf
+                = static_cast<char *> (bfd_alloc (abfd, name_str.size () + 1));
+              if (name_buf == nullptr)
+                error (_("failed to allocate space for section name '%s'"),
+                       name_str.c_str ());
+              memcpy (name_buf, name_str.c_str (), name_str.size () + 1);
+              bfd_rename_section (sect, name_buf);
+            }
+        }
+    }
+
+  if (zero_lwpid_count == 1)
+    warning (_("found thread with pid 0, assigned replacement Target Id: %s"),
+             replacement_lwpid_str.c_str ());
+  else
+    warning (_("found threads with pid 0, assigned replacement Target Ids: %s"),
+             replacement_lwpid_str.c_str ());
+}
+
 /* Locate (and load) an executable file (and symbols) given the core file
    BFD ABFD.  */
 
@@ -542,6 +689,9 @@ core_target_open (const char *arg, int from_tty)
   inferior_appeared (inf, pid);
   inf->fake_pid_p = fake_pid_p;
 
+  /* Rename any .reg/0 sections, giving them each a fake lwpid.  */
+  rename_vmcore_idle_reg_sections (core_bfd, inf);
+
   /* Build up thread list from BFD sections, and possibly set the
      current thread to the .reg/NN section matching the .reg
      section.  */
diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
index b960dfe095b..6e91111b44b 100644
--- a/gdb/testsuite/gdb.arch/core-file-pid0.exp
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -57,7 +57,17 @@ clean_restart
 # and incorrectly deletes what should be the current thread.
 gdb_test "core-file ${corefile}" \
     [multi_line \
+         "warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2" \
+         ".*" \
          "Core was generated by \[^\r\n\]+\\." \
          "Program terminated with signal (?:11|SIGSEGV), Segmentation fault\\." \
-         "The current thread has terminated"] \
+         "#0\\s+$hex in \[^\r\n\]+" \
+         "\\\[Current thread is 1 \\(LWP 1\\)\\\]"] \
     "check core file termination reason"
+
+# And check GDB has found both threads.
+gdb_test "info threads" \
+    [multi_line \
+         "\\* 1\\s+LWP 1\\s+$hex in \[^\r\n\]+" \
+         "  2\\s+LWP 2\\s+$hex in \[^\r\n\]+"] \
+    "check both threads are visible"
-- 
2.30.2