Avoid unaligned pointer reads in PEP .idata section
This is something I discovered when working on aarch64, though it's
relevant to x86_64 too.
The PE32+ imports are located in the .idata section, which starts off
with a 20-byte structure for each DLL, containing offsets into the rest
of the section. This is the Import Directory Table in
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format, which
is a concatenation of the .idata$2 sections. This is then followed by an
20 zero bytes generated by the linker script, which calls this .idata$3.
After this comes the .idata$4 entries for each function, which the
loader overwrites with the function pointers. Because there's no padding
between .idata$3 and .idata$4, this means that if there's an even number
of DLLs, the function pointers won't be aligned on an 8-byte boundary.
Misaligned reads are slower on x86_64, but this is more important on
aarch64, as the e.g. `ldr x0, [x0, :lo12:__imp__func]` the compiler
might generate requires __imp__func (the .idata$4 entry) to be aligned
to 8 bytes. Without this you get IMAGE_REL_ARM64_PAGEOFFSET_12L overflow
errors.