From: Jan Beulich Date: Wed, 23 Mar 2022 07:41:54 +0000 (+0100) Subject: gas: retain whitespace between strings X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=b3446f947bd16a0e2a211343d076c36e4de68a2c;p=binutils-gdb.git gas: retain whitespace between strings Macro arguments may be separated by commas or just whitespace. Macro arguments may also be quoted (where one level of quotes is removed in the course of determining the values for the respective formal parameters). Furthermore this quote removal knows _two_ somewhat odd escaping mechanisms: One, apparently in existence forever, is that a pair of quotes counts as the escaping of a quote, with the pair being transformed to a single quote in the course of quote removal. The other (introduced by c06ae4f232e6) looks more usual on the surface in that it deals with \" sequences, but it _retains_ the escaping \. Hence only the former mechanism is suitable when the value to be used by the macro body is to contain a quote. Yet this results in ambiguity of what "a""b" is intended to mean; elsewhere (e.g. for .ascii) it represents two successive string literals. However, in any event is the above different from "a" "b": I don't think this can be viewed the same as "a""b" when processing macro arguments. Change the scrubber to retain such whitespace, by making the processing of strings more similar to that of symbols. And indeed this appears to make sense when taking into account that for quite a while gas has been supporting quoted symbol names. Taking a more general view, however, the change doesn't go quite far enough. There are further cases where significant whitespace is removed by the scrubber. The new testcase enumerates a few in its ".if 0" section. I'm afraid the only way that I see to deal with this would be to significantly simplify the scrubber, such that it wouldn't do much more than collapse sequences of unquoted whitespace into a single blank. To be honest problems in this area aren't really surprising when seeing that there's hardly any checking of .macro use throughout the testsuite (and in particular in the [relatively] generic tests under all/). --- diff --git a/gas/app.c b/gas/app.c index dd96b410bad..a3ad146625c 100644 --- a/gas/app.c +++ b/gas/app.c @@ -1088,10 +1088,10 @@ do_scrub_chars (size_t (*get) (char *, size_t), char *tostart, size_t tolen) /* PUT didn't jump out. We could just break, but we know what will happen, so optimize a bit. */ ch = GET (); - old_state = 3; + old_state = 9; } - else if (state == 9) - old_state = 3; + else if (state == 3) + old_state = 9; else old_state = state; state = 5; diff --git a/gas/doc/as.texi b/gas/doc/as.texi index 55b3156f1fa..9a7db3fb81f 100644 --- a/gas/doc/as.texi +++ b/gas/doc/as.texi @@ -6186,6 +6186,27 @@ Note: this problem of correctly identifying string parameters to pseudo ops also applies to the identifiers used in @code{.irp} (@pxref{Irp}) and @code{.irpc} (@pxref{Irpc}) as well. +Another issue can occur with the actual arguments passed during macro +invocation: Multiple arguments can be separated by blanks or commas. To have +arguments actually contain blanks or commas (or potentially other non-alpha- +numeric characters), individual arguments will need to be enclosed in either +parentheses @code{()}, square brackets @code{[]}, or double quote @code{"} +characters. The latter may be the only viable option in certain situations, +as only double quotes are actually stripped while establishing arguments. It +may be important to be aware of two escaping models used when processing such +quoted argument strings: For one two adjacent double quotes represent a single +double quote in the resulting argument, going along the lines of the stripping +of the enclosing quotes. But then double quotes can also be escaped by a +backslash @code{\}, but this backslash will not be retained in the resulting +actual argument as then seen / used while expanding the macro. + +As a consequence to the first of these escaping mechanisms two string literals +intended to be representing separate macro arguments need to be separated by +white space (or, better yet, by a comma). To state it differently, such +adjacent string literals - even if separated only by a blank - will not be +concatenated when determining macro arguments, even if they're only separated +by white space. This is unlike certain other pseudo ops, e.g. @code{.ascii}. + @item .endm @cindex @code{endm} directive Mark the end of a macro definition. diff --git a/gas/testsuite/gas/all/gas.exp b/gas/testsuite/gas/all/gas.exp index a6c64407687..ae7957b2ee0 100644 --- a/gas/testsuite/gas/all/gas.exp +++ b/gas/testsuite/gas/all/gas.exp @@ -461,6 +461,8 @@ if [is_elf_format] { run_dump_test quoted-sym-names +run_list_test macro "-alm" + run_list_test pr20312 load_lib gas-dg.exp diff --git a/gas/testsuite/gas/all/macro.l b/gas/testsuite/gas/all/macro.l new file mode 100644 index 00000000000..75fe338f132 --- /dev/null +++ b/gas/testsuite/gas/all/macro.l @@ -0,0 +1,25 @@ +# This should match the output of gas -alm macro.s. + +.*: Assembler messages: +.*:10: Error: too many positional arguments +.*macro.s.* + + +[ ]*[1-9][0-9]*[ ]+\.macro[ ]+m[ ]+arg1,[ ]*arg2[ ]* +#... +[ ]*[1-9][0-9]*[ ]+\.endm[ ]* +[ ]*[1-9][0-9]*[ ]+ +[ ]*[1-9][0-9]*[ ]+m[ ]+1,[ ]*2 +[ ]*[1-9][0-9]*[ ]+.... 0+10*[ ]+> .byte 1 +[ ]*[1-9][0-9]*[ ]+.... 0+20*[ ]+> .byte 2 +[ ]*[1-9][0-9]*[ ]+m[ ]+3[ ]+4 +[ ]*[1-9][0-9]*[ ]+.... 0+30*[ ]+> .byte 3 +[ ]*[1-9][0-9]*[ ]+.... 0+40*[ ]+> .byte 4 +[ ]*[1-9][0-9]*[ ]+m[ ]+"5",[ ]*"6" +[ ]*[1-9][0-9]*[ ]+.... 0+50*[ ]+> .byte 5 +[ ]*[1-9][0-9]*[ ]+.... 0+60*[ ]+> .byte 6 +[ ]*[1-9][0-9]*[ ]+m[ ]+"7"[ ]+"8" +[ ]*[1-9][0-9]*[ ]+.... 0+70*[ ]+> .byte 7 +[ ]*[1-9][0-9]*[ ]+.... 0+80*[ ]+> .byte 8 +[ ]*[1-9][0-9]*[ ]+m[ ]+""[ ]+""[ ]+"" +#pass diff --git a/gas/testsuite/gas/all/macro.s b/gas/testsuite/gas/all/macro.s new file mode 100644 index 00000000000..9e70f3067b7 --- /dev/null +++ b/gas/testsuite/gas/all/macro.s @@ -0,0 +1,16 @@ + .macro m arg1, arg2 + .byte \arg1 + .byte \arg2 + .endm + + m 1, 2 + m 3 4 + m "5", "6" + m "7" "8" + m "" "" "" + + .if 0 + m 1 +2 + m (3) +4 + m (5) (6) + .endif