gdb/solib-rocm: Detect SO for unsupported AMDGPU device
author Lancelot SIX <lancelot.six@amd.com>
Wed, 26 Jul 2023 10:30:56 +0000 (11:30 +0100)
committer Lancelot Six <lancelot.six@amd.com>
Thu, 24 Aug 2023 19:33:04 +0000 (19:33 +0000)
It is possible to debug a process that uses unsupported AMDGPU devices.
In such a scenario, we can still use librocm-dbgapi.so to attach to the
process and complete the runtime activation sequence.

However, when listing shared objects loaded on the AMDGPU devices, we
might list SOs loaded on the unsupported devices.  If such an SO is
seen, one of two things can happen.

First, if the arch of this device is unknown to BFD,
'gdbarch_find_by_info (gdbarch_info info)' will return the gdbarch
matching default_bfd_arch.  As a result,
rocm_solib_relocate_section_addresses will delegate the relocation
operation to svr4_so_ops.relocate_section_addresses, but this makes no
sense: this code object was not loaded by the system loader.
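The first failure mode can be pictured with a toy model.  This is not
GDB code: every toy_* name is hypothetical, and "i386:x86-64" merely
stands in for whatever default_bfd_arch is on the host.  When BFD does
not recognize the machine, the architecture chosen is not amdgcn, so the
dispatch described above hands the request to the SVR4 handler.

```cpp
#include <cassert>
#include <string>

// Which relocation routine ends up handling a section (toy model).
enum class relocator { svr4, rocm };

// Hypothetical stand-in for the architecture gdbarch_find_by_info picks:
// amdgcn when BFD knows the machine, otherwise the default BFD
// architecture ("i386:x86-64" is an assumed host default).
std::string
toy_select_arch (bool bfd_knows_mach)
{
  return bfd_knows_mach ? "amdgcn" : "i386:x86-64";
}

// Hypothetical stand-in for the dispatch in
// rocm_solib_relocate_section_addresses: anything that is not an AMDGPU
// architecture is delegated to the SVR4 code.
relocator
toy_pick_relocator (const std::string &arch_name)
{
  assert (!arch_name.empty ());
  return arch_name == "amdgcn" ? relocator::rocm : relocator::svr4;
}
```

With an unknown machine, toy_pick_relocator (toy_select_arch (false))
yields relocator::svr4, even though the system loader never saw the code
object.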

The second case is if BFD knows the micro-architecture of the device,
but dbgapi does not support it.  In such a case, gdbarch_info_fill will
successfully identify an amdgcn architecture (bfd_arch_amdgcn).  From
there, gdbarch_find_by_info calls amdgpu_gdbarch_init, which will fail to
query arch-specific details from dbgapi and subsequently fail to
initialize the gdbarch object.  As a result, gdbarch_find_by_info
returns nullptr, which later causes "gdb_assert (gdbarch != nullptr)"
assertion failures.
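A similarly simplified model of the second failure mode, under the same
caveats (hypothetical names, arbitrary machine numbers): architecture
lookup only succeeds when the dbgapi stand-in can describe the machine,
and a later consumer that assumes success trips its assertion instead of
reporting a useful error.

```cpp
#include <cassert>
#include <set>

// Toy replacement for a gdbarch object.
struct toy_gdbarch
{
  unsigned mach;
};

// Machines the hypothetical dbgapi stand-in can describe (arbitrary).
const std::set<unsigned> toy_dbgapi_known = { 0x2c, 0x3f };

// Models gdbarch_find_by_info for a machine BFD already maps to amdgcn:
// initialization needs details from dbgapi, so it returns nullptr when
// dbgapi does not know the machine.
const toy_gdbarch *
toy_find_by_info (unsigned char mach)
{
  static toy_gdbarch arches[256];
  if (toy_dbgapi_known.count (mach) == 0)
    return nullptr;
  arches[mach].mach = mach;
  return &arches[mach];
}

// A later consumer that, like code guarded by
// gdb_assert (gdbarch != nullptr), assumes the lookup succeeded.
unsigned
toy_consume (const toy_gdbarch *gdbarch)
{
  assert (gdbarch != nullptr);
  return gdbarch->mach;
}
```

Calling toy_consume (toy_find_by_info (0x40)) aborts in this model,
which is the kind of failure the check proposed in this patch turns into
a clean error when the shared object is opened.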

This patch proposes to add a check in rocm_solib_bfd_open to ensure that
the architecture associated with the code object being opened is fully
supported by both BFD and amd-dbgapi, and to error out otherwise.
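A sketch of the decision the new check makes, assuming two sets stand in
for bfd_lookup_arch and amd_dbgapi_get_architecture (all toy_* and
verdict names are hypothetical).  Only the low byte of e_flags selects
the processor; the 0x0ff mask is assumed to match EF_AMDGPU_MACH from
include/elf/amdgpu.h, so higher bits carrying other flags do not affect
the lookup.  The three failure verdicts correspond to the three error
messages in the patch, which differ in where a human-readable name can
come from.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Assumed to equal EF_AMDGPU_MACH: selects the machine number in e_flags.
constexpr std::uint32_t toy_ef_amdgpu_mach = 0x0ff;

enum class verdict { ok, unknown_to_both, unknown_to_dbgapi, unknown_to_bfd };

// Hypothetical model of the check added to rocm_solib_bfd_open.
verdict
classify (std::uint32_t e_flags, const std::set<unsigned> &bfd_known,
	  const std::set<unsigned> &dbgapi_known)
{
  const unsigned char mach = e_flags & toy_ef_amdgpu_mach;
  const bool bfd_knows = bfd_known.count (mach) != 0;
  const bool dbgapi_knows = dbgapi_known.count (mach) != 0;

  if (bfd_knows && dbgapi_knows)
    return verdict::ok;
  if (!bfd_knows && !dbgapi_knows)
    return verdict::unknown_to_both;   // Only the raw number can be shown.
  if (!dbgapi_knows)
    return verdict::unknown_to_dbgapi; // BFD's printable_name is available.
  assert (!bfd_knows);
  return verdict::unknown_to_bfd;      // The name comes from amd-dbgapi.
}
```

For example, e_flags 0x33f reduces to machine 0x3f, so it is accepted
when both tables list 0x3f.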

Change-Id: Ica97ab7cba45e4944b77d3080c54c1038aaeda54
Approved-By: Pedro Alves <pedro@palves.net>
gdb/solib-rocm.c

index 882920a3711f6d766096c40daf2c86a0b62f9239..56c210e9fa5586830e7a0d3f2c20905b4240937f 100644
@@ -663,6 +663,56 @@ rocm_solib_bfd_open (const char *pathname)
     error (_("`%s': ELF file HSA OS ABI version is not supported (%d)."),
           bfd_get_filename (abfd.get ()), osabiversion);
 
+  /* For GDB to be able to use this solib, the exact AMDGPU processor type
+     must be supported by both BFD and the amd-dbgapi library.  */
+  const unsigned char gfx_arch
+    = elf_elfheader (abfd)->e_flags & EF_AMDGPU_MACH;
+  const bfd_arch_info_type *bfd_arch_info
+    = bfd_lookup_arch (bfd_arch_amdgcn, gfx_arch);
+
+  amd_dbgapi_architecture_id_t architecture_id;
+  amd_dbgapi_status_t dbgapi_query_arch
+    = amd_dbgapi_get_architecture (gfx_arch, &architecture_id);
+
+  if (dbgapi_query_arch != AMD_DBGAPI_STATUS_SUCCESS
+      || bfd_arch_info == nullptr)
+    {
+      if (dbgapi_query_arch != AMD_DBGAPI_STATUS_SUCCESS
+         && bfd_arch_info == nullptr)
+       {
+         /* Neither of the libraries knows about this arch, so we cannot
+            provide a human readable name for it.  */
+         error (_("'%s': AMDGCN architecture %#02x is not supported."),
+                bfd_get_filename (abfd.get ()), gfx_arch);
+       }
+      else if (dbgapi_query_arch != AMD_DBGAPI_STATUS_SUCCESS)
+       {
+         gdb_assert (bfd_arch_info != nullptr);
+         error (_("'%s': AMDGCN architecture %s not supported by "
+                  "amd-dbgapi."),
+                bfd_get_filename (abfd.get ()),
+                bfd_arch_info->printable_name);
+       }
+      else
+       {
+         gdb_assert (dbgapi_query_arch == AMD_DBGAPI_STATUS_SUCCESS);
+         char *arch_name;
+         if (amd_dbgapi_architecture_get_info
+             (architecture_id, AMD_DBGAPI_ARCHITECTURE_INFO_NAME,
+              sizeof (arch_name), &arch_name) != AMD_DBGAPI_STATUS_SUCCESS)
+           error (_("amd_dbgapi_architecture_get_info call failed for arch "
+                    "%#02x."), gfx_arch);
+         gdb::unique_xmalloc_ptr<char> arch_name_cleaner (arch_name);
+
+         error (_("'%s': AMDGCN architecture %s not supported."),
+                bfd_get_filename (abfd.get ()),
+                arch_name);
+       }
+    }
+
+  gdb_assert (gdbarch_from_bfd (abfd.get ()) != nullptr);
+  gdb_assert (is_amdgpu_arch (gdbarch_from_bfd (abfd.get ())));
+
   return abfd;
 }
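In the last branch, the name returned by amd_dbgapi_architecture_get_info
is handed to gdb::unique_xmalloc_ptr just before calling error ().
Because error () throws a C++ exception, the smart pointer frees the
string during unwinding, so nothing leaks.  A self-contained sketch of
that pattern, using std::unique_ptr with a hypothetical free-based
deleter and std::runtime_error in place of GDB's own exception type (all
toy_* names are hypothetical):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Counts releases so the cleanup can be observed (illustration only).
int toy_free_count = 0;

// Deleter in the spirit of the xfree-based one behind
// gdb::unique_xmalloc_ptr.
struct toy_free_deleter
{
  void operator() (char *p) const
  {
    std::free (p);
    ++toy_free_count;
  }
};

using toy_malloc_ptr = std::unique_ptr<char, toy_free_deleter>;

// Stand-in for a query returning a malloc'ed string owned by the caller.
char *
toy_get_arch_name ()
{
  const char name[] = "gfx-hypothetical";
  char *p = static_cast<char *> (std::malloc (sizeof name));
  if (p != nullptr)
    std::memcpy (p, name, sizeof name);
  return p;
}

// Mirrors the last branch: take ownership, then report the error.  The
// message is built before the throw, while arch_name is still valid.
void
toy_report_unsupported ()
{
  char *arch_name = toy_get_arch_name ();
  assert (arch_name != nullptr);
  toy_malloc_ptr cleaner (arch_name);
  throw std::runtime_error (std::string ("AMDGCN architecture ")
			    + arch_name + " not supported.");
}

// Runs the reporter and returns the error text, as a caller would see it.
std::string
toy_catch_message ()
{
  try
    {
      toy_report_unsupported ();
    }
  catch (const std::runtime_error &e)
    {
      return e.what ();
    }
  return "";
}
```

After toy_catch_message () returns, toy_free_count is 1: the string was
released by the deleter while the exception propagated.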