1 \input texinfo
2 @c @tex
3 @c \special{twoside}
4 @c @end tex
5 @setfilename as
6 @synindex ky cp
7 @ifinfo
8 This file documents the GNU Assembler "as".
9
10 Copyright (C) 1991 Free Software Foundation, Inc.
11
12 Permission is granted to make and distribute verbatim copies of
13 this manual provided the copyright notice and this permission notice
14 are preserved on all copies.
15
16 @ignore
17 Permission is granted to process this file through TeX and print the
18 results, provided the printed document carries copying permission
19 notice identical to this one except for the removal of this paragraph
20 (this paragraph not being relevant to the printed manual).
21
22 @end ignore
23 Permission is granted to copy and distribute modified versions of this
24 manual under the conditions for verbatim copying, provided also that the
25 section entitled ``GNU General Public License'' is included exactly as
26 in the original, and provided that the entire resulting derived work is
27 distributed under the terms of a permission notice identical to this
28 one.
29
30 Permission is granted to copy and distribute translations of this manual
31 into another language, under the above conditions for modified versions,
32 except that the section entitled ``GNU General Public License'' may be
33 included in a translation approved by the author instead of in the
34 original English.
35 @end ifinfo
36
37 @setchapternewpage odd
38 @settitle as (680x0)
39 @titlepage
40 @title{as}
41 @subtitle{The GNU Assembler}
42 @c if m680x0
43 @subtitle{(Motorola 680x0 version)}
44 @c fi m680x0
45 @sp 1
46 @subtitle January 1991
47 @sp 13
48 The Free Software Foundation Inc. thanks The Nice Computer
49 Company of Australia for loaning Dean Elsner to write the
50 first (Vax) version of @code{as} for Project GNU.
51 The proprietors, management and staff of TNCCA thank FSF for
52 distracting the boss while they got some work
53 done.
54 @sp 3
55 @author{Dean Elsner, Jay Fenlason & friends}
56 @author{revised by Roland Pesch for Cygnus Support}
57 @c pesch@cygnus.com
58 @page
59 @tex
60 \def\$#1${{#1}} % Kluge: collect RCS revision info without $...$
61 \xdef\manvers{\$Revision$} % For use in headers, footers too
62 {\parskip=0pt
63 \hfill Cygnus Support\par
64 \hfill \manvers\par
65 \hfill \TeX{}info \texinfoversion\par
66 }
67 @end tex
68
69 @vskip 0pt plus 1filll
70 Copyright @copyright{} 1991 Free Software Foundation, Inc.
71
72 Permission is granted to make and distribute verbatim copies of
73 this manual provided the copyright notice and this permission notice
74 are preserved on all copies.
75
76 Permission is granted to copy and distribute modified versions of this
77 manual under the conditions for verbatim copying, provided also that the
78 section entitled ``GNU General Public License'' is included exactly as
79 in the original, and provided that the entire resulting derived work is
80 distributed under the terms of a permission notice identical to this
81 one.
82
83 Permission is granted to copy and distribute translations of this manual
84 into another language, under the above conditions for modified versions,
85 except that the section entitled ``GNU General Public License'' may be
86 included in a translation approved by the author instead of in the
87 original English.
88 @end titlepage
89 @page
90
91 @node top, Syntax, top, top
92 @chapter Overview
93
94 @menu
95 * Syntax:: The (machine independent) syntax that assembly language
96 files must follow. The machine dependent syntax
97 can be found in the machine dependent section of
98 the manual for the machine that you are using.
99 * Segments:: How to use segments and subsegments, and how the
100 assembler and linker will relocate things.
101 * Symbols:: How to set up and manipulate symbols.
102 * Expressions:: And how the assembler deals with them.
103 * Pseudo Ops:: The assorted machine directives that tell the
104 assembler exactly what to do with its input.
105 * Machine Dependent:: Information specific to each machine.
106 @ignore @c pesch@cygnus.com---see comments at nodes ignored
107 * Maintenance:: Keeping the assembler running.
108 * Retargeting:: Teaching the assembler about new machines.
109 @end ignore
110 * License:: The GNU General Public License gives you permission
111 to redistribute GNU "as" on certain terms; and also
112 explains that there is no warranty.
113 @end menu
114
115 This manual is a user guide to the GNU assembler @code{as}.
116 @c pesch@cygnus.com:
117 @c The following should be conditional on machine config
118 @c if 680x0
119 This version of the manual describes @code{as} configured to generate
120 code for Motorola 680x0 architectures.
121 @c fi 680x0
122
123 @section Command-line Synopsis
124
125 @example
126 as [ -f ] [ -k ] [ -L ] [ -o @var{objfile} ] [ -R ] [ -v ] [ -W ]
127 @c if 680x0
128 [ -l ] [ -mc68000 | -mc68010 | -mc68020 ]
129 @c fi 680x0
130 [ -- | @var{files} @dots{} ]
131 @end example
132
133 @table @code
134 @item -f
135 ``fast''---skip preprocessing (assume source is compiler output)
136
137 @item -k
138 Issue warnings when difference tables altered for long displacements
139
140 @item -L
141 Keep (in symbol table) local symbols, starting with @samp{L}
142
143 @item -o @var{objfile}
144 Name the object-file output from @code{as}
145
146 @item -R
147 Fold data segment into text segment
148
149 @item -W
150 Suppress warning messages
151
152 @c if 680x0
153 @item -l
154 Shorten references to undefined symbols, to one word instead of two
155
156 @item -mc68000 | -mc68010 | -mc68020
157 Specify what processor in the 68000 family is the target (default 68020)
158 @c fi 680x0
159
160 @item -- | @var{files} @dots{}
161 Source files to assemble, or standard input
162 @end table
163
164 @section Structure of this Manual
165 This document is intended to describe what you need to know to use GNU
166 @code{as}. We cover the syntax expected in source files, including
167 notation for symbols, constants, and expressions; the directives that
168 @code{as} understands; and of course how to invoke @code{as}.
169
170 @c if 680x0
171 We also cover special features in the 68000 configuration of @code{as},
172 including pseudo-operations.
173 @c fi 680x0
174
175 @ignore
176 This document also describes some of the
177 machine-dependent features of various flavors of the assembler.
178 This document also describes how the assembler works internally, and
179 provides some information that may be useful to people attempting to
180 port the assembler to another machine.
181 @end ignore
182
183 On the other hand, this manual is @emph{not} intended as an introduction
184 to assembly language programming---let alone programming in general! In
185 a similar vein, we make no attempt to introduce the machine
186 architecture; we do @emph{not} describe the instruction set, standard
187 mnemonics, registers or addressing modes that are standard to a
188 particular architecture. You may want to consult the manufacturer's
189 machine-architecture manual for this information.
190
191 @c I think this is premature---pesch@cygnus.com, 17jan1991
192 @ignore
193 Throughout this document, we assume that you are running @dfn{GNU},
194 the portable operating system from the @dfn{Free Software
195 Foundation, Inc.}. This restricts our attention to certain kinds of
196 computer (in particular, the kinds of computers that GNU can run on);
197 once this assumption is granted examples and definitions need less
198 qualification.
199
200 @code{as} is part of a team of programs that turn a high-level
201 human-readable series of instructions into a low-level
202 computer-readable series of instructions. Different versions of
203 @code{as} are used for different kinds of computer. In particular,
204 at the moment, @code{as} only works for the DEC Vax, the Motorola
205 680x0, the Intel 80386, the Sparc, and the National Semiconductor
206 32032/32532.
207 @end ignore
208
209 @section Terminology
210 @ignore
211 @c if all-architectures
212 GNU and @code{as} assume the computer that will run the programs it
213 assembles will obey these rules.
214
215 A (memory) @dfn{address} is 32 bits. The lowest address is zero.
216 @c fi all-architectures
217 @end ignore
218
219 Certain terms used in computing vary slightly in meaning according to
220 context. This is how we use some of them in this manual:
221
222 The @dfn{contents} of any memory address is one @dfn{byte} of
223 exactly 8 bits.
224
225 A @dfn{word} is 16 bits stored in two bytes of memory. The addresses
226 of these bytes differ by exactly 1.
227 @ignore
228 @c if all-architectures
229 Notice that the interpretation of
230 the bits in a word and of how to address a word depends on which
231 particular computer you are assembling for.
232 @c fi all-architectures
233 @end ignore
234
235 A @dfn{long word}, or @dfn{long}, is 32 bits stored in four contiguous
236 bytes of memory.
237 @ignore
238 @c if all-architectures
239 Again the interpretation and addressing of those bits is
240 machine dependent. For example, National Semiconductor 32x32 computers say
241 @emph{double word} where we say @emph{long}.
242 @c fi all-architectures
243 @end ignore
244
245 @ignore
246 @c if all-architectures
247 Numeric quantities are usually @emph{unsigned} or @emph{2's complement}.
248 @c fi all-architectures
249 @end ignore
250 Bytes, words and longs may store numbers. @code{as} manipulates
251 integer expressions as 32-bit numbers in 2's complement format.
252 When asked to store an integer in a byte or word, the lowest order
253 bits are stored.
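As a small sketch of that truncation (the value is chosen only for
illustration, and some versions of @code{as} may warn about the lost
bits), these two statements should store the same byte:
@example
 .byte 0x1234
 .byte 0x34
@end example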
254 @ignore
255 @c if all-architectures
256 The order of bytes in a word or long in memory is
257 determined by what kind of computer will run the assembled program.
258 We won't mention this important caveat again.
259 @c fi all-architectures
260 @end ignore
261
262 The meaning of these terms has changed over time. Although ``byte''
263 used to mean any length of contiguous bits, ``byte'' now pervasively
264 means exactly 8 contiguous bits. A ``word'' of 16 bits made sense
265 for 16-bit computers. Even on 32-bit computers, ``word'' still
266 means 16 bits---to machine language programmers. To many other
267 programmers ``word'' means 32 bits; if your habits differ from our
268 convention, you may need to pay special attention to this usage.
269 @ignore
270 @c if 32x32
271 Similarly ``long'' means 32 bits: from ``long word''. National
272 Semiconductor 32x32 machine language calls a 32-bit number a ``double
273 word''.
274 @c fi 32x32
275 @end ignore
276
277 The following table shows the terms used with GNU @code{as} for units of
278 memory, and contrasts them with normal usage in some other contexts.
279
280 @iftex
281 @sp 1
282 @end iftex
283 @center @emph{Names for integers of different sizes: some conventions}
284 @ifinfo
285 @example
286
287
288 length   as      GNU C        680x0        vax          32x32
289 (bits)
290
291     8    byte    char         byte         byte         byte
292    16    word    short (int)  word         word         word
293    32    long    long (int)   long(-word)  long(-word)  double-word
294    64    quad                              quad(-word)
295   128    octa                              octa-word
296
297 @end example
298 @end ifinfo
299 @tex
300 \halign{\tt\hfil #\quad&\rm #\hfil\quad&\rm #\hfil\quad&\rm
301 #\hfil\quad&\rm #\hfil\quad&\rm #\hfil\quad\cr
302 {\it length}\cr
303 {\it (bits)}&{\bf as}&{\bf GNU C}&{\bf 680x0}&{\bf vax}&{\bf 32x32}\cr
304 \noalign{\hrule}
305 8 &byte &char &byte &byte &byte \cr
306 16 &word &short (int)&word &word &word \cr
307 32 &long &long (int) &long(-word)&long(-word)&double-word\cr
308 64 &quad & & &quad(-word)\cr
309 128 &octa & & &octa-word\cr
310 }
311 @end tex
312
313 @section as, the GNU Assembler
314 @code{as} is primarily intended to assemble the output of the GNU C
315 compiler @code{gcc} for use by the linker @code{ld}. Nevertheless,
316 @code{as} tries to assemble correctly everything that the native
317 assembler would; any exceptions are documented explicitly
318 (@pxref{Machine Dependent}). This doesn't necessarily mean @code{as}
319 will use the same syntax as another assembler for the same architecture;
320 for example, we know of several incompatible versions of 680x0 assembly
321 language syntax.
322
323 GNU @code{as} is really a family of assemblers. If you use (or have
324 used) GNU @code{as} on another architecture, you should find a fairly
325 similar environment. Each version has much in common with the others,
326 including object file formats, most assembler directives (often called
327 @dfn{pseudo-ops}) and assembler syntax.
328
329 Unlike older assemblers, @code{as} tries to assemble a source program in
330 one pass of the source file. This has a subtle impact on the @kbd{.org}
331 directive (@pxref{Org}).
332
333 @section Command Line Options
334 @example
335 as [ options @dots{} ] [ file1 @dots{} ]
336 @end example
337
338 After the program name @code{as}, the command line may contain
339 options and file names. Options may be in any order, and may be
340 before, after, or between file names. The order of file names is
341 significant.
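For instance, assuming two hypothetical source files @file{a.s} and
@file{b.s}, these two command lines should be equivalent, since only the
relative order of the file names matters:
@example
as -L -o out.o a.s b.s
as a.s -o out.o b.s -L
@end example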
342
343 @subsection Options
344
345 @file{--} (two hyphens) by itself names the standard input file
346 explicitly, as one of the files for @code{as} to assemble.
347
348 Except for @samp{--} any command line argument that begins with a
349 hyphen (@samp{-}) is an option. Each option changes the behavior of
350 @code{as}. No option changes the way another option works. An
351 option is a @samp{-} followed by one or more letters; the case of
352 the letter is important. No option (letter) should be used twice on
353 the same command line. (Nobody has decided what two copies of the
354 same option should mean.) All options are optional.
355
356 Some options expect exactly one file name to follow them. The file
357 name may either immediately follow the option's letter (compatible
358 with older assemblers) or it may be the next command argument (GNU
359 standard). These two command lines are equivalent:
360
361 @example
362 as -o my-object-file.o mumble
363 as -omy-object-file.o mumble
364 @end example
365
366 @section Input Files
367
368 We use the phrase @dfn{source program}, abbreviated @dfn{source}, to
369 describe the program input to one run of @code{as}. The program may
370 be in one or more files; how the source is partitioned into files
371 doesn't change the meaning of the source.
372
373 The source program is a catenation of the text in all the files, in the
374 order specified.
375
376 Each time you run @code{as} it assembles exactly one source
377 program. The source program is made up of one or more files.
378 (The standard input is also a file.)
379
380 You give @code{as} a command line that has zero or more input file
381 names. The input files are read (from left file name to right). A
382 command line argument (in any position) that has no special meaning
383 is taken to be an input file name.
384
385 If @code{as} is given no file names it attempts to read one input file
386 from @code{as}'s standard input, which is normally your terminal. You
387 may have to type @key{ctl-D} to tell @code{as} there is no more program
388 to assemble.
389
390 Use @samp{--} if you need to explicitly name the standard input file
391 in your command line.
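As an illustration (the file names are hypothetical), the following
command should assemble @file{head.s}, then whatever arrives on the
standard input, then @file{tail.s}, all as one source program:
@example
as -o prog.o head.s -- tail.s
@end example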
392
393 If the source is empty, @code{as} will produce a small, empty object
394 file.
395
396 @subsection Input Filenames and Line-numbers
397 There are two ways of locating a line in the input file (or files) and both
398 are used in reporting error messages. One way refers to a line
399 number in a physical file; the other refers to a line number in a
400 ``logical'' file.
401
402 @dfn{Physical files} are those files named in the command line given
403 to @code{as}.
404
405 @dfn{Logical files} are simply names declared explicitly by assembler
406 directives; they bear no relation to physical files. Logical file names
407 help error messages reflect the original source file, when @code{as}
408 source is itself synthesized from other files. @xref{File}.
409
410 @section Output (Object) File
411 Every time you run @code{as} it produces an output file, which is
412 your assembly language program translated into numbers. This file
413 is the object file, named @code{a.out} unless you tell @code{as} to
414 give it another name by using the @code{-o} option. Conventionally,
415 object file names end with @file{.o}. The default name of
416 @file{a.out} is used for historical reasons: older assemblers were
417 capable of assembling self-contained programs directly into a
418 runnable program.
419 @c This may still work, but hasn't been tested.
420
421 The object file is meant for input to the linker @code{ld}. It contains
422 assembled program code, information to help @code{ld} to integrate
423 the assembled program into a runnable file and (optionally) symbolic
424 information for the debugger.
425
426 @comment link above to some info file(s) like the description of a.out.
427 @comment don't forget to describe GNU info as well as Unix lossage.
428
429 @section Error and Warning Messages
430
431 @code{as} may write warnings and error messages to the standard
432 error file (usually your terminal). This should not happen when
433 @code{as} is run automatically by a compiler. Error messages are
434 meant for those few people who still write in assembly language.
435
436 Warnings report an assumption made so that @code{as} could keep
437 assembling a flawed program.
438
439 Errors report a grave problem that stops the assembly.
440
441 Warning messages have the format
442 @example
443 file_name:line_number:Warning Message Text
444 @end example
445 If a logical file name has been given (@pxref{File}) it is used for
446 the filename; otherwise the name of the current input file is used.
447 If a logical line number was given (@pxref{Line}) then it is used to
448 calculate the number printed; otherwise the actual line in the
449 current source file is printed. The message text is intended to be
450 self-explanatory (in the grand Unix tradition).
451
452 Error messages have the format
453 @example
454 file_name:line_number:FATAL:Error Message Text
455 @end example
456 The file name and line number are derived as for warning
457 messages. The actual message text may be rather less explanatory
458 because many of these errors aren't supposed to happen.
459
460 @section Options
461 @subsection Work Faster: -f
462 @samp{-f} should only be used when assembling programs written by a
463 (trusted) compiler. @samp{-f} stops the assembler from pre-processing
464 the input file(s) before assembling them. @emph{Warning:} if the files
465 actually need to be pre-processed (if they contain comments, for
466 example), @code{as} will not work correctly if @samp{-f} is used.
467
468 @subsection Warn if difference tables altered: -k
469 @code{as} sometimes alters the code emitted for directives of the form
470 @samp{.word @var{sym1}-@var{sym2}}; @pxref{Word}.
471 You can use the @samp{-k} option if you want a warning issued when this
472 is done.
473
474 @subsection Include Local Labels: -L
475 For historical reasons, labels beginning with @samp{L} (upper case only)
476 are called @dfn{local labels}. Normally you don't see such labels when
477 debugging, because they are intended for the use of programs (like
478 compilers) that compose assembler programs, not for your notice.
479 By default both @code{as} and @code{ld} discard such labels, so they are
480 not available when you debug.
481
482 This option tells @code{as} to retain those @samp{L@dots{}} symbols
483 in the object file. Usually if you do this you also tell the linker
484 @code{ld} to preserve symbols whose names begin with @samp{L}.
485
486 @subsection Name the Object File: -o
487 There is always one object file output when you run @code{as}. By
488 default it has the name @file{a.out}. You use this option (which
489 takes exactly one filename) to give the object file a different name.
490
491 Whatever the object file is called, @code{as} will overwrite any
492 existing file of the same name.
493
494 @subsection Fold Data Segment into Text Segment: -R
495 @code{-R} tells @code{as} to write the object file as if all
496 data-segment data lives in the text segment. This is only done at
497 the very last moment: your binary data are the same, but data
498 segment parts are relocated differently. The data segment part of
499 your object file is zero bytes long because all its bytes are
500 appended to the text segment. (@xref{Segments}.)
501
502 When you specify @code{-R} it would be possible to generate shorter
503 address displacements (because we don't have to cross between text and
504 data segment). We don't do this simply for compatibility with older
505 versions of @code{as}. @code{-R} may work this way in future.
506
507 @subsection Suppress Warnings: -W
508 @code{as} should never give a warning or error message when
509 assembling compiler output. But programs written by people often
510 cause @code{as} to give a warning that a particular assumption was
511 made. All such warnings are directed to the standard error file.
512 If you use this option, no warnings are issued. This option only
513 affects the warning messages: it does not change any particular of how
514 @code{as} assembles your file. Errors, which stop the assembly, are
515 still reported.
516
517 @node Syntax, Segments, top, top
518 @chapter Syntax
519 This chapter describes the machine-independent syntax allowed in a
520 source file. @code{as} syntax is similar to what many other assemblers
521 use; it is inspired by the BSD 4.2 assembler, except that @code{as} does not
522 assemble Vax bit-fields.
523
524 @section The Pre-processor
525 The pre-processor adjusts and removes extra whitespace. It leaves
526 one space or tab before the keywords on a line, and turns any other
527 whitespace on the line into a single space.
528
529 The pre-processor removes all comments, replacing them with a single
530 space (for /* @dots{} */ comments), or an appropriate number of
531 newlines.
532
533 The pre-processor converts character constants into the appropriate
534 numeric values.
535
536 This means that excess whitespace, comments, and character constants
537 cannot be used in the portions of the input text that are not
538 pre-processed.
539
540 If the first line of an input file is @code{#NO_APP} or the
541 @samp{-f} option is given, the input file will not be
542 pre-processed. Within such an input file, parts of the file can be
543 pre-processed by putting a line that says @code{#APP} before the
544 text that should be pre-processed, and putting a line that says
545 @code{#NO_APP} after it. This feature is mainly intended to support
546 @code{asm} statements in compilers whose output normally does not need to
547 be pre-processed.
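A sketch of such a file, using only directives that appear elsewhere in
this manual (the values are arbitrary), might look like this:
@example
#NO_APP
 .byte 1
#APP
 .byte   2,   3     /* extra blanks and comments are allowed here */
#NO_APP
 .byte 4
@end example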
548
549 @section Whitespace
550 @dfn{Whitespace} is one or more blanks or tabs, in any order.
551 Whitespace is used to separate symbols, and to make programs neater
552 for people to read. Unless within character constants
553 (@pxref{Characters}), any whitespace means the same as exactly one
554 space.
555
556 @section Comments
557 There are two ways of rendering comments to @code{as}. In both
558 cases the comment is equivalent to one space.
559
560 Anything from @samp{/*} through the next @samp{*/} is a comment.
561 This means you may not nest these comments.
562
563 @example
564 /*
565 The only way to include a newline ('\n') in a comment
566 is to use this sort of comment.
567 */
568
569 /* This sort of comment does not nest. */
570 @end example
571
572 Anything from the @dfn{line comment} character to the next newline
573 is considered a comment and is ignored. The line comment character is
574 @c if vax
575 @c @samp{#} on the Vax
576 @c @fi vax
577 @c if 680x0
578 @samp{|} on the 680x0. @xref{Machine Dependent}.
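For example, in this configuration one might write (a sketch; the
values are arbitrary):
@example
 .byte 1   | the rest of this line is a comment
 .byte 2   /* this form of comment ends at the closing marker */
@end example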
579 @c fi 680x0
580 @ignore
581 @if all-arch
582 On some machines there are two different
583 line comment characters. One will only begin a comment if it is the
584 first non-whitespace character on a line, while the other will
585 always begin a comment.
586 @fi all-arch
587 @end ignore
588
589 To be compatible with past assemblers a special interpretation is
590 given to lines that begin with @samp{#}. Following the @samp{#} an
591 absolute expression (@pxref{Expressions}) is expected: this will be
592 the logical line number of the @b{next} line. Then a string
593 (@pxref{Strings}) is allowed: if present it is a new logical file
594 name. The rest of the line, if any, should be whitespace.
595
596 If the first non-whitespace characters on the line are not numeric,
597 the line is ignored. (Just like a comment.)
598 @example
599 # This is an ordinary comment.
600 # 42-6 "new_file_name" # New logical file name
601 # This is logical line # 36.
602 @end example
603 This feature is deprecated, and may disappear from future versions
604 of @code{as}.
605
606 @section Symbols
607 A @dfn{symbol} is one or more characters chosen from the set of all
608 letters (both upper and lower case), digits and the three characters
609 @samp{_.$}. No symbol may begin with a digit. Case is
610 significant. There is no length limit: all characters are
611 significant. Symbols are delimited by characters not in that set,
612 or by begin/end-of-file. (@xref{Symbols}.)
613
614 @section Statements
615 A @dfn{statement} ends at a newline character (@samp{\n}) or at a
616 semicolon (@samp{;}). The newline or semicolon is considered part
617 of the preceding statement. Newlines and semicolons within
618 character constants are an exception: they don't end statements.
619 It is an error to end any statement with end-of-file: the last
620 character of any input file should be a newline.
621
622 You may write a statement on more than one line if you put a
623 backslash (@kbd{\}) immediately in front of any newlines within the
624 statement. When @code{as} reads a backslashed newline both
625 characters are ignored. You can even put backslashed newlines in
626 the middle of symbol names without changing the meaning of your
627 source program.
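As a brief illustration (the values are arbitrary), the first line
below holds two statements, and the next two lines together form a
single statement:
@example
 .byte 1; .byte 2
 .byte 3, \
 4
@end example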
628
629 An empty statement is allowed, and may include whitespace. It is ignored.
630
631 A statement begins with zero or more labels, optionally followed by a
632 @dfn{key symbol} which determines what kind of statement it is. The key
633 symbol determines the syntax of the rest of the statement. If the
634 symbol begins with a dot (@t{.}) then the statement is an assembler
635 directive: typically valid for any computer. If the symbol begins with
636 a letter the statement is an assembly language @dfn{instruction}: it
637 will assemble into a machine language instruction. Different versions
638 of @code{as} for different computers will recognize different
639 instructions. In fact, the same symbol may represent a different
640 instruction in a different computer's assembly language.
641
642 A label is a symbol immediately followed by a colon (@code{:}).
643 Whitespace before a label or after a colon is permitted, but you may not
644 have whitespace between a label's symbol and its colon. @xref{Labels}.
645
646 @example
647 label: .directive followed by something
648 another$label: # This is an empty statement.
649 instruction operand_1, operand_2, @dots{}
650 @end example
651
652 @section Constants
653 A constant is a number, written so that its value is known by
654 inspection, without knowing any context. Like this:
655 @example
656 .byte 74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
657 .ascii "Ring the bell\7" # A string constant.
658 .octa 0x123456789abcdef0123456789ABCDEF0 # A bignum.
659 .float 0f-314159265358979323846264338327\
660 95028841971.693993751E-40 # - pi, a flonum.
661 @end example
662
663 @node Characters, Strings, , Syntax
664 @subsection Character Constants
665 There are two kinds of character constants. A @dfn{character} stands
666 for one character in one byte and its value may be used in
667 numeric expressions. String constants (properly called string
668 @emph{literals}) are potentially many bytes and their values may not be
669 used in arithmetic expressions.
670
671 @node Strings, , Characters, Syntax
672 @subsubsection Strings
673 A @dfn{string} is written between double-quotes. It may contain
674 double-quotes or null characters. The way to get special characters
675 into a string is to @dfn{escape} these characters: precede them with
676 a backslash (@code{\}) character. For example @samp{\\} represents
677 one backslash: the first @code{\} is an escape which tells
678 @code{as} to interpret the second character literally as a backslash
679 (which prevents @code{as} from recognizing the second @code{\} as an
680 escape character). The complete list of escapes follows.
681
682 @table @kbd
683 @item \EOF
684 A @kbd{\} followed by end-of-file: erroneous. It is treated just
685 like an end-of-file without a preceding backslash.
686 @c @item \a
687 @c Mnemonic for ACKnowledge; for ASCII this is octal code 007.
688 @item \b
689 Mnemonic for backspace; for ASCII this is octal code 010.
690 @c @item \e
691 @c Mnemonic for EOText; for ASCII this is octal code 004.
692 @item \f
693 Mnemonic for FormFeed; for ASCII this is octal code 014.
694 @item \n
695 Mnemonic for newline; for ASCII this is octal code 012.
696 @c @item \p
697 @c Mnemonic for prefix; for ASCII this is octal code 033, usually known as @code{escape}.
698 @item \r
699 Mnemonic for carriage-Return; for ASCII this is octal code 015.
700 @c @item \s
701 @c Mnemonic for space; for ASCII this is octal code 040. Included for compliance with
702 @c other assemblers.
703 @item \t
704 Mnemonic for horizontal Tab; for ASCII this is octal code 011.
705 @c @item \v
706 @c Mnemonic for Vertical tab; for ASCII this is octal code 013.
707 @c @item \x @var{digit} @var{digit} @var{digit}
708 @c A hexadecimal character code. The numeric code is 3 hexadecimal digits.
709 @item \ @var{digit} @var{digit} @var{digit}
710 An octal character code. The numeric code is 3 octal digits.
711 For compatibility with other Unix systems, 8 and 9 are accepted as digits:
712 for example, @code{\008} has the value 010, and @code{\009} the value 011.
713 @item \\
714 Represents one @samp{\} character.
715 @c @item \'
716 @c Represents one @samp{'} (accent acute) character.
717 @c This is needed in single character literals
718 @c (@xref{Characters}.) to represent
719 @c a @samp{'}.
720 @item \"
721 Represents one @samp{"} character. Needed in strings to represent
722 this character, because an unescaped @samp{"} would end the string.
723 @item \ @var{anything-else}
724 Any other character when escaped by @kbd{\} will give a warning, but
725 assemble as if the @samp{\} was not present. The idea is that if
726 you used an escape sequence you clearly didn't want the literal
727 interpretation of the following character. However @code{as} has no
728 other interpretation, so @code{as} knows it is giving you the wrong
729 code and warns you of the fact.
730 @end table
731
732 Which characters are escapable, and what those escapes represent,
733 varies widely among assemblers. The current set is what we think
734 BSD 4.2 @code{as} recognizes, and is a subset of what most C
735 compilers recognize. If you are in doubt, don't use an escape
736 sequence.
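The following sketch exercises several of the escapes listed above (the
text itself is arbitrary):
@example
 .ascii "column one\tcolumn two\n"
 .ascii "a \"quoted\" word and one backslash: \\"
 .ascii "\101\102\103"
@end example
With ASCII, the last string should be the same three bytes as
@code{"ABC"}, since octal 101 is 65.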
737
738 @subsubsection Characters
739 A single character may be written as a single quote immediately
740 followed by that character. The same escapes apply to characters as
741 to strings. So if you want to write the character backslash, you
742 must write @kbd{'\\} where the first @code{\} escapes the second
743 @code{\}. As you can see, the quote is an acute accent, not a
744 grave accent. A newline (or semicolon @samp{;}) immediately
745 following an acute accent is taken as a literal character and does
746 not count as the end of a statement. The value of a character
747 constant in a numeric expression is the machine's byte-wide code for
748 that character. @code{as} assumes your character code is ASCII: @kbd{'A}
749 means 65, @kbd{'B} means 66, and so on.
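
As a brief illustration of the rules above (a sketch only; the
@code{.byte} directive is described under Assembler Directives):
@example
        .byte 'A        /* 65, the ASCII code for A */
        .byte 'B        /* 66 */
        .byte '\\       /* one backslash character, ASCII 92 */
@end example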
750
751 @subsection Number Constants
752 @code{as} distinguishes 3 flavors of numbers according to how they
753 are stored in the target machine. @emph{Integers} are numbers that
754 would fit into an @code{int} in the C language. @emph{Bignums} are
integers, but they are stored in more than 32 bits. @emph{Flonums}
756 are floating point numbers, described below.
757
758 @subsubsection Integers
759 An octal integer is @samp{0} followed by zero or more of the octal
760 digits (@samp{01234567}).
761
762 A decimal integer starts with a non-zero digit followed by zero or
763 more digits (@samp{0123456789}).
764
765 A hexadecimal integer is @samp{0x} or @samp{0X} followed by one or
766 more hexadecimal digits chosen from @samp{0123456789abcdefABCDEF}.
767
768 Integers have the usual values. To denote a negative integer, use
769 the unary operator @samp{-} discussed under expressions
770 (@xref{Unops}.).
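
For instance, each of the following statements stores the same value,
fifteen (an illustrative sketch using the @code{.long} directive):
@example
        .long 017       /* octal */
        .long 15        /* decimal */
        .long 0xf       /* hexadecimal */
        .long 0XF       /* hexadecimal, upper case */
@end example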
771
772 @subsubsection Bignums
773 A @dfn{bignum} has the same syntax and semantics as an integer
774 except that the number (or its negative) takes more than 32 bits to
775 represent in binary. The distinction is made because in some places
776 integers are permitted while bignums are not.
777
778 @subsubsection Flonums
779 A @dfn{flonum} represents a floating point number. The translation
780 is complex: a decimal floating point number from the text is
781 converted by @code{as} to a generic binary floating point number of
782 more than sufficient precision. This generic floating point number
783 is converted to the particular computer's floating point format(s)
784 by a portion of @code{as} specialized to that computer.
785
786 A flonum is written by writing (in order)
787 @itemize @bullet
788 @item
789 The digit @samp{0}.
790 @item
791 A letter, to tell @code{as} the rest of the number is a flonum.
792 @kbd{e}
793 is recommended. Case is not important.
794 (Any otherwise illegal letter will work here,
795 but that might be changed. Vax BSD 4.2 assembler
796 seems to allow any of @samp{defghDEFGH}.)
797 @item
798 An optional sign: either @samp{+} or @samp{-}.
799 @item
800 An optional @dfn{integer part}: zero or more decimal digits.
801 @item
802 An optional @dfn{fraction part}: @samp{.} followed by zero
803 or more decimal digits.
804 @item
805 An optional exponent, consisting of:
806 @itemize @bullet
807 @item
808 A letter; the exact significance varies according to
809 the computer that executes the program. @code{as}
810 accepts any letter for now. Case is not important.
811 @item
812 Optional sign: either @samp{+} or @samp{-}.
813 @item
814 One or more decimal digits.
815 @end itemize
816 @end itemize
817
818 At least one of @var{integer part} or @var{fraction part} must be
819 present. The floating point number has the usual base-10 value.
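
The following sketch shows a few flonums built from the parts listed
above; @code{.double} is used only as a convenient place to put them:
@example
        .double 0e1.5           /* integer part and fraction part */
        .double 0e-.25          /* sign and fraction part only */
        .double 0e12            /* integer part only */
        .double 0e6.02e23       /* with exponent: 6.02 times 10 to the 23rd */
        .double 0e1e-3          /* integer part and signed exponent: 0.001 */
@end example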
820
821 @code{as} does all processing using integers. Flonums are computed
822 independently of any floating point hardware in the computer running
823 @code{as}.
824
825 @node Segments, Symbols, Syntax, top
826 @chapter Segments and Relocation
827 Roughly, a segment is a range of addresses, with no gaps; all data
828 ``in'' those addresses is treated the same for some particular purpose.
829 For example there may be a ``read only'' segment.
830
831 The linker @code{ld} reads many object files (partial programs) and
832 combines their contents to form a runnable program. When @code{as}
833 emits an object file, the partial program is assumed to start at address
834 0. @code{ld} will assign the final addresses the partial program
835 occupies, so that different partial programs don't overlap. This is
836 actually an over-simplification, but it will suffice to explain how
837 @code{as} uses segments.
838
839 @code{ld} moves blocks of bytes of your program to their run-time
840 addresses. These blocks slide to their run-time addresses as rigid
841 units; their length does not change and neither does the order of bytes
842 within them. Such a rigid unit is called a @emph{segment}. Assigning
843 run-time addresses to segments is called @dfn{relocation}. It includes
844 the task of adjusting mentions of object-file addresses so they refer to
845 the proper run-time addresses.
846
847 An object file written by @code{as} has three segments, any of which
848 may be empty. These are named @emph{text}, @emph{data} and @emph{bss}
849 segments. Within the object file, the text segment starts at
850 address 0, the data segment follows, and the bss segment follows the
851 data segment.
852
853 To let @code{ld} know which data will change when the segments are
854 relocated, and how to change that data, @code{as} also writes to the
855 object file details of the relocation needed. To perform relocation
856 @code{ld} must know, each time an address in the object
857 file is mentioned:
858 @itemize @bullet
859 @item
860 Where in the object file is the beginning of this reference to
861 an address?
862 @item
863 How long (in bytes) is this reference?
864 @item
865 Which segment does the address refer to?
866 What is the numeric value of (@var{address} @t{-}
867 @var{start-address of segment})?
868 @item
869 Is the reference to an address ``Program-counter relative''?
870 @end itemize
871
872 In fact, every address @code{as} ever uses is expressed as
873 (@var{segment} @t{+} @var{offset into segment}). Further, every
874 expression @code{as} computes is of this segmented nature.
875 @dfn{Absolute expression} means an expression with segment ``absolute''
876 (@pxref{ld Segments}). A @dfn{pass1 expression} means an expression with
877 segment ``pass1'' (@pxref{as Segments}). In this manual we use the
878 notation @{@var{segname} @var{N}@} to mean ``offset @var{N} into segment
879 @var{segname}''.
880
881 Apart from text, data and bss segments you need to know about the
882 @dfn{absolute} segment. When @code{ld} mixes partial programs,
883 addresses in the absolute segment remain unchanged. That is, address
884 @{absolute 0@} is ``relocated'' to run-time address 0 by @code{ld}.
885 Although two partial programs' data segments will not overlap addresses
886 after linking, @b{by definition} their absolute segments will overlap.
887 Address @{absolute 239@} in one partial program will always be the same
888 address when the program is running as address @{absolute 239@} in any
889 other partial program.
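
A small sketch of the distinction, using hypothetical symbol names and
assuming assembly is in the text segment:
@example
io_port = 239           /* @{absolute 239@}: ld never changes this value */
start:  .long start     /* a text address: ld adjusts it during relocation */
        .long io_port   /* always assembles the constant 239 */
@end example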
890
891 The idea of segments is extended to the @dfn{undefined} segment. Any
892 address whose segment is unknown at assembly time is by definition
893 rendered @{undefined @var{U}@}---where @var{U} will be filled in later.
894 Since numbers are always defined, the only way to generate an undefined
895 address is to mention an undefined symbol. A reference to a named
896 common block would be such a symbol: its value is unknown at assembly
897 time so it has segment @emph{undefined}.
898
By analogy the word @emph{segment} is used to describe groups of segments in
the linked program. @code{ld} puts all partial programs' text
segments in contiguous addresses in the linked program. It is
customary to refer to the @emph{text segment} of a program, meaning all
the addresses of all partial programs' text segments. Likewise for
904 data and bss segments.
905
906 @section Segments
907 Some segments are manipulated by @code{ld}; others are invented for
908 use of @code{as} and have no meaning except during assembly.
909
910 @node ld Segments, , ,
911 @subsection ld Segments
912 @code{ld} deals with just 5 kinds of segments, summarized below.
913
914 @table @b
915
916 @item text segment
917 @itemx data segment
918 These segments hold your program. @code{as} and @code{ld} treat them as
919 separate but equal segments. Anything you can say of one segment is
920 true of the other. When the program is running however it is customary
921 for the text segment to be unalterable, and often shared among
922 processes: it will contain instructions, constants and the like. The
923 data segment of a running program is usually alterable: for example, C
924 variables would be stored in the data segment.
925
926 @item bss segment
927 This segment contains zeroed bytes when your program begins running. It
is used to hold uninitialized variables or common storage. The length of
each partial program's bss segment is important, but because it starts
out containing zeroed bytes there is no need to store explicit zero
bytes in the object file. The bss segment was invented to eliminate
932 those explicit zeros from object files.
933
934 @item absolute segment
935 Address 0 of this segment is always ``relocated'' to runtime address 0.
936 This is useful if you want to refer to an address that @code{ld} must
937 not change when relocating. In this sense we speak of absolute
938 addresses being ``unrelocatable'': they don't change during relocation.
939
940 @item undefined segment
941 This ``segment'' is a catch-all for address references to objects not in
942 the preceding segments.
943 @c FIXME: ref to some other doc on obj-file formats could go here.
944
945 @end table
946
947 An idealized example of the 3 relocatable segments follows. Memory
948 addresses are on the horizontal axis.
949
950 @example
                      +-----+----+--+
partial program # 1:  |ttttt|dddd|00|
                      +-----+----+--+

                      text   data bss
                      seg.   seg. seg.

                      +---+---+---+
partial program # 2:  |TTT|DDD|000|
                      +---+---+---+

                      +--+---+-----+--+----+---+-----+~~
linked program:       |  |TTT|ttttt|  |dddd|DDD|00000|
                      +--+---+-----+--+----+---+-----+~~

    addresses:        0 @dots{}
967 @end example
968
969 @node as Segments, , ,
970 @subsection as Internal Segments
971 These segments are invented for the internal use of @code{as}. They
972 have no meaning at run-time. You don't need to know about these
973 segments except that they might be mentioned in @code{as}' warning
974 messages. These segments are invented to permit the value of every
975 expression in your assembly language program to be a segmented
976 address.
977
978 @table @b
979 @item absent segment
980 An expression was expected and none was
981 found.
982
983 @item goof segment
984 An internal assembler logic error has been
985 found. This means there is a bug in the assembler.
986
987 @item grand segment
988 A @dfn{grand number} is a bignum or a flonum, but not an integer. If a
989 number can't be written as a C @code{int} constant, it is a grand
990 number. @code{as} has to remember that a flonum or a bignum does not
991 fit into 32 bits, and cannot be an argument (@pxref{Argument}) in an
992 expression: this is done by making a flonum or bignum be in segment
993 ``grand''. This is purely for internal @code{as} convenience; grand
994 segment behaves similarly to absolute segment.
995
996 @item pass1 segment
997 The expression was impossible to evaluate in the first pass. The
998 assembler will attempt a second pass (second reading of the source) to
999 evaluate the expression. Your expression mentioned an undefined symbol
1000 in a way that defies the one-pass (segment + offset in segment) assembly
1001 process. No compiler need emit such an expression.
1002
1003 The second pass is currently not implemented. @code{as} will abort with
1004 an error message if one is required.
1005
1006 @item difference segment
1007 As an assist to the C compiler, expressions of the forms
1008 @example
@var{(undefined symbol)} - @var{(expression)}
@var{(something)} - @var{(undefined symbol)}
@var{(undefined symbol)} - @var{(undefined symbol)}
1012 @end example
1013 are permitted, and belong to the ``difference'' segment. @code{as}
1014 re-evaluates such expressions after the source file has been read and
1015 the symbol table built. If by that time there are no undefined symbols
1016 in the expression then the expression assumes a new segment. The
1017 intention is to permit statements like
1018 @samp{.word label - base_of_table}
1019 to be assembled in one pass where both @code{label} and
1020 @code{base_of_table} are undefined. This is useful for compiling C and
1021 Algol switch statements, Pascal case statements, FORTRAN computed goto
1022 statements and the like.
1023 @end table
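
The difference segment can be illustrated with a sketch of a jump
table whose entries are emitted before the labels they mention (all
names are hypothetical):
@example
        .word case_a - cases    /* both symbols are still undefined here */
        .word case_b - cases
cases:
case_a: .byte 1
case_b: .byte 2
@end example
Once the whole file has been read, both differences have operands in
the same segment and become absolute: 0 and 1.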
1024
1025 @section Sub-Segments
1026 Assembled bytes fall into two segments: text and data. Because you
1027 may have groups of text or data that you want to end up near to each
other in the object file, @code{as} allows you to use
1029 @dfn{subsegments}. Within each segment, there can be numbered
1030 subsegments with values from 0 to 8192. Objects assembled into the
1031 same subsegment will be grouped with other objects in the same
1032 subsegment when they are all put into the object file. For example,
1033 a compiler might want to store constants in the text segment, but
1034 might not want to have them interspersed with the program being
assembled. In this case, the compiler could issue a @code{.text 0}
before each section of code being output, and a @code{.text 1} before
1037 each group of constants being output.
1038
Subsegments are optional. If you don't use subsegments, everything
1040 will be stored in subsegment number zero.
1041
1042 Each subsegment is zero-padded up to a multiple of four bytes.
1043 (Subsegments may be padded a different amount on different flavors
1044 of @code{as}.) Subsegments appear in your object file in numeric
1045 order, lowest numbered to highest. (All this to be compatible with
1046 other people's assemblers.) The object file, @code{ld} @emph{etc.}
1047 have no concept of subsegments. They just see all your text
1048 subsegments as a text segment, and all your data subsegments as a
1049 data segment.
1050
1051 To specify which subsegment you want subsequent statements assembled
1052 into, use a @samp{.text @var{expression}} or a @samp{.data
1053 @var{expression}} statement. @var{Expression} should be an absolute
1054 expression. (@xref{Expressions}.) If you just say @samp{.text}
1055 then @samp{.text 0} is assumed. Likewise @samp{.data} means
1056 @samp{.data 0}. Assembly begins in @code{text 0}.
1057 For instance:
1058 @example
.text 0     # The default subsegment is text 0 anyway.
.ascii "This lives in the first text subsegment. *"
.text 1
.ascii "But this lives in the second text subsegment."
.data 0
.ascii "This lives in the data segment,"
.ascii "in the first data subsegment."
.text 0
.ascii "This lives in the first text segment,"
.ascii "immediately following the asterisk (*)."
1069 @end example
1070
1071 Each segment has a @dfn{location counter} incremented by one for
1072 every byte assembled into that segment. Because subsegments are
1073 merely a convenience restricted to @code{as} there is no concept of
1074 a subsegment location counter. There is no way to directly
1075 manipulate a location counter. The location counter of the segment
1076 that statements are being assembled into is said to be the
1077 @dfn{active} location counter.
1078
1079 @section Bss Segment
1080 The @code{bss} segment is used for local common variable storage.
1081 You may allocate address space in the @code{bss} segment, but you may
1082 not dictate data to load into it before your program executes. When
1083 your program starts running, all the contents of the @code{bss}
1084 segment are zeroed bytes.
1085
1086 Addresses in the bss segment are allocated with special directives;
1087 you may not assemble anything directly into the bss segment. Hence
1088 there are no bss subsegments. @xref{Comm}; @pxref{Lcomm}.
1089
1090 @node Symbols, Expressions, Segments, top
1091 @chapter Symbols
1092 Symbols are a central concept: the programmer uses symbols to name
1093 things, the linker uses symbols to link, and the debugger uses symbols
1094 to debug.
1095
1096 @code{as} does not place symbols in the object file in the same order
1097 they were declared. This may break some debuggers.
1098
1099 @node Labels, , , Symbols
1100 @section Labels
1101 A @dfn{label} is written as a symbol immediately followed by a colon
1102 (@samp{:}). The symbol then represents the current value of the
1103 active location counter, and is, for example, a suitable instruction
1104 operand. You are warned if you use the same symbol to represent two
1105 different locations: the first definition overrides any other
1106 definitions.
1107
1108 @section Giving Symbols Other Values
1109 A symbol can be given an arbitrary value by writing a symbol followed
1110 by an equals sign (@samp{=}) followed by an expression
1111 (@pxref{Expressions}). This is equivalent to using the @code{.set}
1112 directive. (@xref{Set}.)
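
For example, the two statements below are intended to have the same
effect; the symbol name is hypothetical, and the operand form of
@code{.set} shown here is the usual @samp{.set @var{symbol},
@var{expression}}:
@example
limit = 100
        .set limit, 100
@end example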
1113
1114 @section Symbol Names
1115 Symbol names begin with a letter or with one of @samp{$._}. That
1116 character may be followed by any string of digits, letters,
1117 underscores and dollar signs. Case of letters is significant:
1118 @code{foo} is a different symbol name than @code{Foo}.
1119
1120 Each symbol has exactly one name. Each name in an assembly language
1121 program refers to exactly one symbol. You may use that symbol name any
1122 number of times in a program.
1123
1124 @subsection Local Symbol Names
1125
1126 Local symbols help compilers and programmers use names temporarily.
1127 There are ten @dfn{local} symbol names, which are re-used throughout
1128 the program. Their names are @samp{0} @samp{1} @dots{} @samp{9}.
1129 To define a local symbol, write a label of the form
1130 @var{digit}@t{:}. To refer to the most recent previous definition
1131 of that symbol write @var{digit}@t{b}, using the same digit as when
1132 you defined the label. To refer to the next definition of a local
1133 label, write @var{digit}@t{f} where @var{digit} gives you a choice
1134 of 10 forward references. The @samp{b} stands for ``backwards'' and
1135 the @samp{f} stands for ``forwards''.
1136
1137 Local symbols are not used by the current GNU C compiler.
1138
1139 There is no restriction on how you can use these labels, but
1140 remember that at any point in the assembly you can refer to at most
1141 10 prior local labels and to at most 10 forward local labels.
1142
1143 Local symbol names are only a notation device. They are immediately
1144 transformed into more conventional symbol names before the assembler
1145 uses them. The symbol names stored in the symbol table, appearing in
1146 error messages and optionally emitted to the object file have these
1147 parts:
1148
1149 @table @code
1150 @item L
1151 All local labels begin with @samp{L}. Normally both @code{as} and
1152 @code{ld} forget symbols that start with @samp{L}. These labels are
1153 used for symbols you are never intended to see. If you give the
1154 @samp{-L} option then @code{as} will retain these symbols in the
1155 object file. By instructing @code{ld} to also retain these symbols,
1156 you may use them in debugging.
1157
1158 @item @var{digit}
1159 If the label is written @samp{0:} then the digit is @samp{0}.
1160 If the label is written @samp{1:} then the digit is @samp{1}.
1161 And so on up through @samp{9:}.
1162
1163 @item @ctrl{A}
1164 This unusual character is included so you don't accidentally invent
1165 a symbol of the same name. The character has ASCII value
1166 @samp{\001}.
1167
1168 @item @emph{ordinal number}
1169 This is a serial number to keep the labels distinct. The first
@samp{0:} gets the number @samp{1}; the 15th @samp{0:} gets the
number @samp{15}; @emph{etc.} Likewise for the other labels @samp{1:}
1172 through @samp{9:}.
1173 @end table
1174
1175 For instance, the first @code{1:} is named @code{L1@ctrl{A}1}, the 44th
1176 @code{3:} is named @code{L3@ctrl{A}44}.
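
A short sketch of how these pieces fit together; the names in the
comments follow the scheme above:
@example
1:      .byte 0         /* first 1: is stored as L1@ctrl{A}1 */
        .long 1b        /* refers back to the first 1: */
        .long 1f        /* refers forward to the second 1: */
1:      .byte 0         /* second 1: is stored as L1@ctrl{A}2 */
        .long 1b        /* now refers to the second 1: */
@end example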
1177
1178 @section The Special Dot Symbol
1179
1180 The special symbol @code{.} refers to the current address that
@code{as} is assembling into. Thus, the statement @samp{melvin:
.long .} will cause @code{melvin} to contain its own address.
Assigning a value to @code{.} is treated the same as a @code{.org}
directive. Thus, the statement @samp{.=.+4} is the same as saying
@samp{.space 4}.
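
One common use, sketched here with hypothetical symbol names, is to
compute the length of a block of data by subtracting a label from
@code{.}:
@example
msg:    .ascii "hello"
msglen = . - msg        /* both in the same segment, so absolute: 5 */
@end example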
1186
1187 @section Symbol Attributes
1188 Every symbol has these attributes: Value, Type, Descriptor, and ``Other''.
1189 @c if internals
1190 @c The detailed definitions are in <a.out.h>.
1191 @c fi internals
1192
1193 If you use a symbol without defining it, @code{as} assumes zero for
1194 all these attributes, and probably won't warn you. This makes the
1195 symbol an externally defined symbol, which is generally what you
1196 would want.
1197
1198 @subsection Value
1199 The value of a symbol is (usually) 32 bits, the size of one GNU C
1200 @code{int}. For a symbol which labels a location in the
@code{text}, @code{data}, @code{bss} or @code{absolute} segments the
value is the number of addresses from the start of that segment to
the label. Naturally for @code{text}, @code{data} and @code{bss}
1204 segments the value of a symbol changes as @code{ld} changes segment
1205 base addresses during linking. @code{absolute} symbols' values do
1206 not change during linking: that is why they are called absolute.
1207
1208 The value of an undefined symbol is treated in a special way. If it
1209 is 0 then the symbol is not defined in this assembler source
1210 program, and @code{ld} will try to determine its value from other
1211 programs it is linked with. You make this kind of symbol simply by
1212 mentioning a symbol name without defining it. A non-zero value
1213 represents a @code{.comm} common declaration. The value is how much
1214 common storage to reserve, in bytes (@emph{i.e.} addresses). The
1215 symbol refers to the first address of the allocated storage.
1216
1217 @subsection Type
1218 The type attribute of a symbol is 8 bits encoded in a devious way.
1219 We kept this coding standard for compatibility with older operating
1220 systems.
1221
1222 @example
1223
             7     6     5     4     3     2     1     0   bit numbers
          +-----+-----+-----+-----+-----+-----+-----+-----+
          |                 |                       |     |
          |   N_STAB bits   |      N_TYPE bits      |N_EXT|
          |                 |                       | bit |
          +-----+-----+-----+-----+-----+-----+-----+-----+

                             n_type byte
1232 @end example
1233
1234 @subsubsection N_EXT bit
1235 This bit is set if @code{ld} might need to use the symbol's type bits
1236 and value. If this bit is off, then @code{ld} can ignore the
1237 symbol while linking. It is set in two cases. If the symbol is
1238 undefined, then @code{ld} is expected to find the symbol's value
1239 elsewhere in another program module. Otherwise the symbol has the
1240 value given, but this symbol name and value are revealed to any other
1241 programs linked in the same executable program. This second use of
1242 the @code{N_EXT} bit is most often done by a @code{.globl} statement.
1243
1244 @subsubsection N_TYPE bits
1245 These establish the symbol's ``type'', which is mainly a relocation
1246 concept. Common values are detailed in the manual describing the
1247 executable file format.
1248
1249 @subsubsection N_STAB bits
1250 Common values for these bits are described in the manual on the
1251 executable file format.
1252
1253 @subsection Descriptor
1254 This is an arbitrary 16-bit value. You may establish a symbol's
1255 descriptor value by using a @code{.desc} statement (@pxref{Desc}).
1256 A descriptor value means nothing to @code{as}.
1257
1258 @subsection Other
1259 This is an arbitrary 8-bit value. It means nothing to @code{as}.
1260
1261 @node Expressions, Pseudo Ops, Symbols, top
1262 @chapter Expressions
1263 An @dfn{expression} specifies an address or numeric value.
1264 Whitespace may precede and/or follow an expression.
1265
1266 @section Empty Expressions
1267 An empty expression has no value: it is just whitespace or null.
1268 Wherever an absolute expression is required, you may omit the
1269 expression and @code{as} will assume a value of (absolute) 0. This
1270 is compatible with other assemblers.
1271
1272 @section Integer Expressions
1273 An @dfn{integer expression} is one or more @emph{arguments} delimited
1274 by @emph{operators}.
1275
1276 @node Argument, Unops, , Expressions
1277 @subsection Arguments
1278
1279 @dfn{Arguments} are symbols, numbers or subexpressions. In other
1280 contexts arguments are sometimes called ``arithmetic operands''. In
1281 this manual, to avoid confusing them with the ``instruction operands'' of
1282 the machine language, we use the term ``argument'' to refer to parts of
1283 expressions only, and the word ``operand'' to refer only to machine
1284 instruction operands.
1285
1286 Symbols are evaluated to yield @{@var{segment} @var{value}@} where
1287 @var{segment} is one of @b{text}, @b{data}, @b{bss}, @b{absolute},
1288 or @b{undefined}. @var{value} is a signed, 2's complement 32 bit
1289 integer.
1290
1291 Numbers are usually integers.
1292
1293 A number can be a flonum or bignum. In this case, you are warned
1294 that only the low order 32 bits are used, and @code{as} pretends
1295 these 32 bits are an integer. You may write integer-manipulating
1296 instructions that act on exotic constants, compatible with other
1297 assemblers.
1298
1299 Subexpressions are a left parenthesis (@t{(}) followed by an integer
1300 expression followed by a right parenthesis (@t{)}), or a unary
1301 operator followed by an argument.
1302
1303 @subsection Operators
1304 @dfn{Operators} are arithmetic functions, like @t{+} or @t{%}. Unary
1305 operators are followed by an argument. Binary operators appear
1306 between their arguments. Operators may be preceded and/or followed by
1307 whitespace.
1308
@node Unops, , Argument, Expressions
@subsection Unary Operators
1311 @code{as} has the following @dfn{unary operators}. They each take
1312 one argument, which must be absolute.
1313 @table @t
1314 @item -
1315 Hyphen. @dfn{Negation}. Two's complement negation.
1316 @item ~
1317 Tilde. @dfn{Complementation}. Bitwise not.
1318 @end table
1319
1320 @subsection Binary Operators
1321
1322 @dfn{Binary operators} are infix. Operators have precedence, but
1323 operators with equal precedence are performed left to right.
1324 Apart from @code{+} or @code{-}, both arguments must be absolute, and
1325 the result is absolute.
1326
1327 @enumerate
1328
1329 @item
1330 Highest Precedence
1331 @table @code
1332 @item *
1333 @dfn{Multiplication}.
1334 @item /
@dfn{Division}. Truncation is the same as the C operator @samp{/}.
@item %
@dfn{Remainder}.
@item <
@itemx <<
@dfn{Shift Left}. Same as the C operator @samp{<<}.
@item >
@itemx >>
@dfn{Shift Right}. Same as the C operator @samp{>>}.
1344 @end table
1345
1346 @item
1347 Intermediate precedence
1348 @table @code
1349 @item |
1350 @dfn{Bitwise Inclusive Or}.
1351 @item &
1352 @dfn{Bitwise And}.
1353 @item ^
1354 @dfn{Bitwise Exclusive Or}.
1355 @item !
1356 @dfn{Bitwise Or Not}.
1357 @end table
1358
1359 @item
1360 Lowest Precedence
1361 @table @code
1362 @item +
1363 @dfn{Addition}. If either argument is absolute, the result
1364 has the segment of the other argument.
1365 If either argument is pass1 or undefined, the result is pass1.
1366 Otherwise @code{+} is illegal.
1367 @item -
1368 @dfn{Subtraction}. If the right argument is absolute, the
1369 result has the segment of the left argument.
1370 If either argument is pass1 the result is pass1.
1371 If either argument is undefined the result is difference segment.
1372 If both arguments are in the same segment, the result is absolute---provided
1373 that segment is one of @b{text}, @b{data} or @b{bss}.
1374 Otherwise @code{-} is illegal.
1375 @end table
1376 @end enumerate
1377
1378 The sense of the rule for @code{+} is that it's only meaningful to add
1379 the @emph{offsets} in an address; you can only have a defined segment in
1380 one of the two arguments.
1381
1382 Similarly, you can't subtract quantities from two different segments.
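
The precedence rules can be seen in a small sketch; the values in the
comments follow from the table above. Note that the rules differ from
those of C, so parentheses are advisable whenever there is doubt. The
labels are hypothetical:
@example
        .long 1 + 2 * 3         /* 7: multiplication first */
        .long 1 << 2 + 1        /* 5: shift binds tighter than +, unlike C */
        .long 1 + 1 | 1         /* 2: | binds tighter than +, unlike C */
        .long (1 + 1) | 1       /* 3: parentheses force the addition first */
first:  .byte 0, 0, 0
last:   .long last - first      /* 3: same segment, so the result is absolute */
@end example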
1383
1384 @node Pseudo Ops, Machine Dependent, Expressions, top
1385 @chapter Assembler Directives
1386 @menu
1387 * Abort:: The Abort directive causes as to abort
1388 * Align:: Pad the location counter to a power of 2
1389 * Ascii:: Fill memory with bytes of ASCII characters
1390 * Asciz:: Fill memory with bytes of ASCII characters followed
1391 by a null.
1392 * Byte:: Fill memory with 8-bit integers
1393 * Comm:: Reserve public space in the BSS segment
1394 * Data:: Change to the data segment
1395 * Desc:: Set the n_desc of a symbol
1396 * Double:: Fill memory with double-precision floating-point numbers
1397 * File:: Set the logical file name
1398 * Fill:: Fill memory with repeated values
1399 * Float:: Fill memory with single-precision floating-point numbers
1400 * Global:: Make a symbol visible to the linker
1401 * Int:: Fill memory with 32-bit integers
1402 * Lcomm:: Reserve private space in the BSS segment
1403 * Line:: Set the logical line number
1404 * Long:: Fill memory with 32-bit integers
1405 * Lsym:: Create a local symbol
1406 * Octa:: Fill memory with 128-bit integers
1407 * Org:: Change the location counter
1408 * Quad:: Fill memory with 64-bit integers
1409 * Set:: Set the value of a symbol
1410 * Short:: Fill memory with 16-bit integers
1411 * Space:: Fill memory with a repeated value
1412 * Stab:: Store debugging information
1413 * Text:: Change to the text segment
1414 * Word:: Fill memory with 16-bit integers
1415 @end menu
1416
1417 All assembler directives have names that begin with a period (@samp{.}).
1418 The rest of the name is letters: their case does not matter.
1419
1420 @node Abort, Align, Pseudo Ops, Pseudo Ops
1421 @section .abort
1422 This directive stops the assembly immediately. It is for
1423 compatibility with other assemblers. The original idea was that the
1424 assembler program would be piped into the assembler. If the sender
of a program quit, it could use this directive to tell @code{as} to
1426 quit also. One day @code{.abort} will not be supported.
1427
1428 @node Align, Ascii, Abort, Pseudo Ops
1429 @section .align @var{absolute-expression} , @var{absolute-expression}
1430 Pad the location counter (in the current subsegment) to a word,
1431 longword or whatever boundary. The first expression is the number
1432 of low-order zero bits the location counter will have after
1433 advancement. For example @samp{.align 3} will advance the location
counter until it is a multiple of 8. If the location counter is
1435 already a multiple of 8, no change is needed.
1436
1437 The second expression gives the value to be stored in the padding
1438 bytes. It (and the comma) may be omitted. If it is omitted, the
1439 padding bytes are zero.
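
For example, assuming assembly starts at offset 0 of the current
subsegment (a sketch following the description above):
@example
        .byte 1                 /* location counter is now 1 */
        .align 2                /* three zero bytes of padding: now 4 */
        .long 0                 /* now 8 */
        .byte 2                 /* now 9 */
        .align 3, 0xff          /* seven 0xff bytes of padding: now 16 */
        .align 3                /* already a multiple of 8: no change */
@end example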
1440
1441 @node Ascii, Asciz, Align, Pseudo Ops
1442 @section .ascii @var{strings}
1443 @code{.ascii} expects zero or more string literals (@pxref{Strings})
1444 separated by commas. It assembles each string (with no automatic
1445 trailing zero byte) into consecutive addresses.
1446
1447 @node Asciz, Byte, Ascii, Pseudo Ops
1448 @section .asciz @var{strings}
1449 @code{.asciz} is just like @code{.ascii}, but each string is followed by a zero byte.
1450 The ``z'' in @samp{.asciz} stands for ``zero''.
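
For example (a sketch; the byte counts follow from the descriptions
above):
@example
        .ascii "abc"            /* 3 bytes: a b c */
        .asciz "abc"            /* 4 bytes: a b c and a zero byte */
        .ascii "ab", "c"        /* 3 bytes, the same as the first line */
@end example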
1451
1452 @node Byte, Comm, Asciz, Pseudo Ops
1453 @section .byte @var{expressions}
1454
1455 @code{.byte} expects zero or more expressions, separated by commas.
1456 Each expression is assembled into the next byte.
1457
1458 @node Comm, Data, Byte, Pseudo Ops
1459 @section .comm @var{symbol} , @var{length}
1460 @code{.comm} declares a named common area in the bss segment. Normally
1461 @code{ld} reserves memory addresses for it during linking, so no partial
1462 program defines the location of the symbol. Use @code{.comm} to tell
1463 @code{ld} that it must be at least @var{length} bytes long. @code{ld}
1464 will allocate space for each @code{.comm} symbol that is at least as
1465 long as the longest @code{.comm} request in any of the partial programs
1466 linked. @var{length} is an absolute expression.
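
As a sketch, suppose two partial programs each declare the same
(hypothetical) common symbol with different lengths:
@example
        .comm shared_buf, 64    /* in the first partial program */
        .comm shared_buf, 256   /* in the second partial program */
@end example
After linking, @code{ld} will have reserved at least 256 bytes for
@code{shared_buf}, the longest request.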
1467
1468 @node Data, Desc, Comm, Pseudo Ops
1469 @section .data @var{subsegment}
1470 @code{.data} tells @code{as} to assemble the following statements onto the
1471 end of the data subsegment numbered @var{subsegment} (which is an
1472 absolute expression). If @var{subsegment} is omitted, it defaults
1473 to zero.
1474
1475 @node Desc, Double, Data, Pseudo Ops
1476 @section .desc @var{symbol}, @var{absolute-expression}
1477 This directive sets @code{n_desc} of the symbol to the low 16 bits of
1478 @var{absolute-expression}.
1479
1480 @node Double, File, Desc, Pseudo Ops
1481 @section .double @var{flonums}
1482 @code{.double} expects zero or more flonums, separated by commas. It assembles
1483 floating point numbers. The exact kind of floating point numbers
1484 emitted depends on how @code{as} is configured. @xref{Machine Dependent}.
1485
1486 @node File, Fill, Double, Pseudo Ops
1487 @section .file @var{string}
1488 @code{.file} tells @code{as} that we are about to start a new logical
1489 file. @var{String} is the new file name. An empty file name
1490 is permitted, but you must still give the quotes: @code{""}. This
1491 statement may go away in future: it is only recognized to
1492 be compatible with old @code{as} programs.
1493
1494 @node Fill, Float, File, Pseudo Ops
1495 @section .fill @var{repeat} , @var{size} , @var{value}
1496 @var{repeat}, @var{size} and @var{value} are absolute expressions.
1497 This emits @var{repeat} copies of @var{size} bytes. @var{Repeat}
1498 may be zero or more. @var{Size} may be zero or more, but if it is
1499 more than 8, then it is deemed to have the value 8, compatible with
1500 other people's assemblers. The contents of each repetition
1501 are taken from an 8-byte number. The highest order 4 bytes are
1502 zero. The lowest order 4 bytes are @var{value} rendered in the
1503 byte-order of an integer on the computer @code{as} is assembling for.
1504 Each @var{size} bytes in a repetition is taken from the lowest order
1505 @var{size} bytes of this number. Again, this bizarre behavior is
1506 compatible with other people's assemblers.
1507
1508 @var{Size} and @var{value} are optional.
1509 If the second comma and @var{value} are absent, @var{value} is
1510 assumed zero. If the first comma and following tokens are absent,
1511 @var{size} is assumed to be 1.
1512
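The defaults can be seen in a short sketch such as the following (the
values are arbitrary, and the comments assume the 680x0 comment
character @samp{|}):

@example
        .fill   8               | 8 repetitions of 1 byte, each zero
        .fill   4, 2            | 4 repetitions of 2 bytes, each zero
        .fill   3, 4, 0x4e71    | 3 repetitions of 4 bytes, each holding 0x4e71
@end example
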
1513 @node Float, Global, Fill, Pseudo Ops
1514 @section .float @var{flonums}
1515 This directive assembles zero or more flonums, separated by commas.
1516 The exact kind of floating point numbers emitted depends on how
1517 @code{as} is configured. @xref{Machine Dependent}.
1518
1519 @node Global, Int, Float, Pseudo Ops
1520 @section .global @var{symbol}
1521 @code{.global} makes the symbol visible to @code{ld}. If you define
1522 @var{symbol} in your partial program, its value is made available to
1523 other partial programs that are linked with it. Otherwise,
1524 @var{symbol} will take its attributes from a symbol of the same name
1525 from another partial program it is linked with.
1526
1527 This is done by setting the @code{N_EXT} bit
1528 of that symbol's @code{n_type} to 1.
1529
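For example, a partial program might define and export a counter like
this; the name @code{_counter} is hypothetical:

@example
        .global _counter        | make _counter visible to ld
        .data
_counter:
        .long   0
@end example

Another partial program that merely refers to @code{_counter} would
then have the reference resolved when the two are linked.
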
1530 @node Int, Lcomm, Global, Pseudo Ops
1531 @section .int @var{expressions}
1532 Expect zero or more @var{expressions}, of any segment, separated by
1533 commas. For each expression, emit a 32-bit number that will, at run
1534 time, be the value of that expression. The byte order of the
1535 expression depends on what kind of computer will run the program.
1536
1537 @node Lcomm, Line, Int, Pseudo Ops
1538 @section .lcomm @var{symbol} , @var{length}
1539 Reserve @var{length} (an absolute expression) bytes for a local
1540 common denoted by @var{symbol}. The segment and value of @var{symbol} are
1541 those of the new local common. The addresses are allocated in the
1542 @code{bss} segment, so at run-time the bytes will start off zeroed.
1543 @var{Symbol} is not declared global (@pxref{Global}), so is normally
1544 not visible to @code{ld}.
1545
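A minimal sketch, using a hypothetical symbol name:

@example
        .lcomm  scratch, 64     | 64 zeroed bytes in bss, local to this file
@end example
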
1546 @node Line, Long, Lcomm, Pseudo Ops
1547 @section .line @var{logical line number}
1548 @code{.line} tells @code{as} to change the logical line number.
1549 @var{logical line number} is an absolute expression. The next line
1550 will have that logical line number. So any other statements on the
1551 current line (after a @code{;}) will be reported as on logical line
1552 number @var{logical line number} - 1. One day this directive will
1553 be unsupported: it is used only for compatibility with existing
1554 assembler programs.
1555
1556 @node Long, Lsym, Line, Pseudo Ops
1557 @section .long @var{expressions}
1558 @code{.long} is the same as @samp{.int}, @pxref{Int}.
1559
1560 @node Lsym, Octa, Long, Pseudo Ops
1561 @section .lsym @var{symbol}, @var{expression}
1562 @code{.lsym} creates a new symbol named @var{symbol}, but does not put it in
1563 the hash table, ensuring it cannot be referenced by name during the
1564 rest of the assembly. This sets the attributes of the symbol to be
1565 the same as the expression value:
1566 @table @code
1567 @item n_other = n_desc = 0
1568 @itemx n_type = @r{(segment of @var{expression})}
1569 @itemx N_EXT = 0
1570 @itemx n_value = @var{expression}
1571 @end table
1572
1573 @node Octa, Org, Lsym, Pseudo Ops
1574 @section .octa @var{bignums}
1575 This directive expects zero or more bignums, separated by commas. For each
1576 bignum, it emits a 16-byte (@b{octa}-word) integer.
1577
1578 @node Org, Quad, Octa, Pseudo Ops
1579 @section .org @var{new-lc} , @var{fill}
1580
1581 @code{.org} will advance the location counter of the current segment to
1582 @var{new-lc}. @var{new-lc} is either an absolute expression or an
1583 expression with the same segment as the current subsegment. That is,
1584 you can't use @code{.org} to cross segments: if @var{new-lc} has the
1585 wrong segment, the @code{.org} directive is ignored. To be compatible
1586 with former assemblers, if the segment of @var{new-lc} is absolute,
1587 @code{as} will issue a warning, then pretend the segment of @var{new-lc}
1588 is the same as the current subsegment.
1589
1590 @code{.org} may only increase the location counter, or leave it
1591 unchanged; you cannot use @code{.org} to move the location counter
1592 backwards.
1593
1594 Because @code{as} tries to assemble programs in one pass, @var{new-lc}
1595 must be defined. If you really detest this restriction we eagerly await
1596 a chance to share your improved assembler.
1597
1598 Beware that the origin is relative to the start of the segment, not
1599 to the start of the subsegment. This is compatible with other
1600 people's assemblers.
1601
1602 When the location counter (of the current subsegment) is advanced, the
1603 intervening bytes are filled with @var{fill} which should be an
1604 absolute expression. If the comma and @var{fill} are omitted,
1605 @var{fill} defaults to zero.
1606
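As a sketch, the following advances the location counter using an
expression in the current (text) segment. The label names are
hypothetical, and the byte count in the comment assumes that nothing
else is assembled between @code{base} and the @code{.org}:

@example
        .text
base:   .long   0               | 4 bytes
        .org    base+16, 0xff   | advance to base+16; the 12 skipped bytes are 0xff
next:
@end example
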
1607 @node Quad, Set, Org, Pseudo Ops
1608 @section .quad @var{bignums}
1609 @code{.quad} expects zero or more bignums, separated by commas. For each
1610 bignum, it emits an 8-byte (@b{quad}-word) integer. If the bignum
1611 won't fit in a quad-word, it prints a warning message; and just
1612 takes the lowest order 8 bytes of the bignum.
1613
1614 @node Set, Short, Quad, Pseudo Ops
1615 @section .set @var{symbol}, @var{expression}
1616
1617 This directive sets the value of @var{symbol} to @var{expression}. This
1618 will change @code{n_value} and @code{n_type} to conform to
1619 @var{expression}. If @code{N_EXT} is set, it remains set.
1620
1621 You may @code{.set} a symbol many times in the same assembly.
1622 If the expression's segment is unknowable during pass 1, a second
1623 pass over the source program will be forced. The second pass is
1624 currently not implemented. @code{as} will abort with an error
1625 message if one is required.
1626
1627 If you @code{.set} a global symbol, the value stored in the object
1628 file is the last value stored into it.
1629
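For instance, a symbol can serve as an assembly-time constant and be
given a new value later in the same file; @code{bufsize} is a
hypothetical name:

@example
        .set    bufsize, 128
        .space  bufsize         | emits 128 bytes
        .set    bufsize, 256    | later statements see the value 256
@end example
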
1630 @node Short, Space, Set, Pseudo Ops
1631 @section .short @var{expressions}
1632 @c if not sparc
1633 @code{.short} is the same as @samp{.word}. @xref{Word}.
1634 @c fi not sparc
1635 @c if sparc
1636 @c On the sparc, this expects zero or more @var{expressions}, and emits
1637 @c a 16 bit number for each.
1638 @c fi sparc
1639
1640 @node Space, Stab, Short, Pseudo Ops
1641 @section .space @var{size} , @var{fill}
1642 This directive emits @var{size} bytes, each of value @var{fill}. Both
1643 @var{size} and @var{fill} are absolute expressions. If the comma
1644 and @var{fill} are omitted, @var{fill} is assumed to be zero.
1645
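For example (values chosen arbitrarily):

@example
        .space  16              | 16 bytes of zero
        .space  4, 0x20         | 4 bytes, each 0x20
@end example
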
1646 @node Stab, Text, Space, Pseudo Ops
1647 @section .stabd, .stabn, .stabs
1648 There are three directives that begin @samp{.stab}.
1649 All emit symbols, for use by symbolic debuggers.
1650 The symbols are not entered in @code{as}' hash table: they
1651 cannot be referenced elsewhere in the source file.
1652 Up to five fields are required:
1653 @table @var
1654 @item string
1655 This is the symbol's name. It may contain any character except @samp{\000},
1656 so is more general than ordinary symbol names. Some debuggers used to
1657 code arbitrarily complex structures into symbol names using this field.
1658 @item type
1659 An absolute expression. The symbol's @code{n_type} is set to the low 8
1660 bits of this expression.
1661 Any bit pattern is permitted, but @code{ld} and debuggers will choke on
1662 silly bit patterns.
1663 @item other
1664 An absolute expression.
1665 The symbol's @code{n_other} is set to the low 8 bits of this expression.
1666 @item desc
1667 An absolute expression.
1668 The symbol's @code{n_desc} is set to the low 16 bits of this expression.
1669 @item value
1670 An absolute expression which becomes the symbol's @code{n_value}.
1671 @end table
1672
1673 If a warning is detected while reading a @code{.stab@var{X}}
1674 statement, the symbol has probably already been created and you will
1675 get a half-formed symbol in your object file. This is compatible
1676 with earlier assemblers!
1677
1678 @table @code
1679 @item .stabd @var{type} , @var{other} , @var{desc}
1680
1681 The ``name'' of the symbol generated is not even an empty string.
1682 It is a null pointer, for compatibility. Older assemblers used a
1683 null pointer so they didn't waste space in object files with empty
1684 strings.
1685
1686 The symbol's @code{n_value} is set to the location counter,
1687 relocatably. When your program is linked, the value of this symbol
1688 will be where the location counter was when the @code{.stabd} was
1689 assembled.
1690
1691 @item .stabn @var{type} , @var{other} , @var{desc} , @var{value}
1692
1693 The name of the symbol is set to the empty string @code{""}.
1694
1695 @item .stabs @var{string} , @var{type} , @var{other} , @var{desc} , @var{value}
1696
1697 All five fields are specified.
1698 @end table
1699
1700 @node Text, Word, Stab, Pseudo Ops
1701 @section .text @var{subsegment}
1702 Tells @code{as} to assemble the following statements onto the end of
1703 the text subsegment numbered @var{subsegment}, which is an absolute
1704 expression. If @var{subsegment} is omitted, subsegment number zero
1705 is used.
1706
1707 @node Word, , Text, Pseudo Ops
1708 @section .word @var{expressions}
1709 @c if sparc
1710 @c On the Sparc, this produces 32-bit numbers instead of 16-bit ones.
1711 @c fi sparc
1712 This directive expects zero or more @var{expressions}, of any segment,
1713 separated by commas. For each expression, @code{as} emits a 16-bit number.
1714 @ignore
1715 @c if all-arch
1716 The byte order
1717 of the expression depends on what kind of computer will run the
1718 program.
1719 @c fi all-arch
1720 @end ignore
1721
1722 @subsection Special Treatment to support Compilers
1723
1724 In order to assemble compiler output into something that will work,
1725 @code{as} will occasionally do strange things to @samp{.word} directives.
1726 Directives of the form @samp{.word sym1-sym2} are often emitted by
1727 compilers as part of jump tables. Therefore, when @code{as} assembles a
1728 directive of the form @samp{.word sym1-sym2}, and the difference between
1729 @code{sym1} and @code{sym2} does not fit in 16 bits, @code{as} will
1730 create a @dfn{secondary jump table}, immediately before the next label.
1731 This secondary jump table will be preceded by a short-jump to the
1732 first byte after the secondary table. This short-jump prevents the flow
1733 of control from accidentally falling into the new table. Inside the
1734 table will be a long-jump to @code{sym2}. The original @samp{.word}
1735 will contain @code{sym1} minus the address of the long-jump to
1736 @code{sym2}.
1737
1738 If there were several occurrences of @samp{.word sym1-sym2} before the
1739 secondary jump table, all of them will be adjusted. If there was a
1740 @samp{.word sym3-sym4} that also did not fit in sixteen bits, a
1741 long-jump to @code{sym4} will be included in the secondary jump table,
1742 and the @code{.word} directives will be adjusted to contain @code{sym3}
1743 minus the address of the long-jump to @code{sym4}; and so on, for as many
1744 entries in the original jump table as necessary.
1745
1746 @ignore
1747 @c if internals
1748 @emph{This feature may be disabled by compiling @code{as} with the
1749 @samp{-DWORKING_DOT_WORD} option.} This feature is likely to confuse
1750 assembly language programmers.
1751 @c fi internals
1752 @end ignore
1753
1754
1755 @section Deprecated Directives
1756 One day these directives won't work.
1757 They are included for compatibility with older assemblers.
1758 @table @t
1759 @item .abort
1760 @item .file
1761 @item .line
1762 @end table
1763
1764 @node Machine Dependent, License, Pseudo Ops, top
1765 @chapter Machine Dependent Features:
1766 @c if 680x0
1767 Motorola 680x0 @refill
1768 @c fi 680x0
1769 @c pesch@cygnus.com: This version of the manual is specifically hacked
1770 @c for 68K gas. We should have a config method of
1771 @c automating this; in the meantime, use ignore
1772 @c for the other architectures (or for their stubs)
1773 @ignore
1774 @section Vax
1775 @subsection Options
1776
1777 The Vax version of @code{as} accepts any of the following options,
1778 gives a warning message that the option was ignored and proceeds.
1779 These options are for compatibility with scripts designed for other
1780 people's assemblers.
1781
1782 @table @asis
1783 @item @kbd{-D} (Debug)
1784 @itemx @kbd{-S} (Symbol Table)
1785 @itemx @kbd{-T} (Token Trace)
1786 These are obsolete options used to debug old assemblers.
1787
1788 @item @kbd{-d} (Displacement size for JUMPs)
1789 This option expects a number following the @kbd{-d}. Like options
1790 that expect filenames, the number may immediately follow the
1791 @kbd{-d} (old standard) or constitute the whole of the command line
1792 argument that follows @kbd{-d} (GNU standard).
1793
1794 @item @kbd{-V} (Virtualize Interpass Temporary File)
1795 Some other assemblers use a temporary file. This option
1796 commanded them to keep the information in active memory rather
1797 than in a disk file. @code{as} always does this, so this
1798 option is redundant.
1799
1800 @item @kbd{-J} (JUMPify Longer Branches)
1801 Many 32-bit computers permit a variety of branch instructions
1802 to do the same job. Some of these instructions are short (and
1803 fast) but have a limited range; others are long (and slow) but
1804 can branch anywhere in virtual memory. Often there are 3
1805 flavors of branch: short, medium and long. Some other
1806 assemblers would emit short and medium branches, unless told by
1807 this option to emit short and long branches.
1808
1809 @item @kbd{-t} (Temporary File Directory)
1810 Some other assemblers may use a temporary file, and this option
1811 takes a filename being the directory to site the temporary
1812 file. @code{as} does not use a temporary disk file, so this
1813 option makes no difference. @kbd{-t} needs exactly one
1814 filename.
1815 @end table
1816
1817 The Vax version of the assembler accepts two options when
1818 compiled for VMS. They are @kbd{-h}, and @kbd{-+}. The
1819 @kbd{-h} option prevents @code{as} from modifying the
1820 symbol-table entries for symbols that contain lowercase
1821 characters (I think). The @kbd{-+} option causes @code{as} to
1822 print warning messages if the FILENAME part of the object file,
1823 or any symbol name is larger than 31 characters. The @kbd{-+}
1824 option also inserts some code following the @samp{_main}
1825 symbol so that the object file will be compatible with Vax-11
1826 "C".
1827
1828 @subsection Floating Point
1829 Conversion of flonums to floating point is correct, and
1830 compatible with previous assemblers. Rounding is
1831 towards zero if the remainder is exactly half the least significant bit.
1832
1833 @code{D}, @code{F}, @code{G} and @code{H} floating point formats
1834 are understood.
1835
1836 Immediate floating literals (@emph{e.g.} @samp{S`$6.9})
1837 are rendered correctly. Again, rounding is towards zero in the
1838 boundary case.
1839
1840 The @code{.float} directive produces @code{f} format numbers.
1841 The @code{.double} directive produces @code{d} format numbers.
1842
1843 @subsection Machine Directives
1844 The Vax version of the assembler supports four directives for
1845 generating Vax floating point constants. They are described in the
1846 table below.
1847
1848 @table @code
1849 @item .dfloat
1850 This expects zero or more flonums, separated by commas, and
1851 assembles Vax @code{d} format 64-bit floating point constants.
1852
1853 @item .ffloat
1854 This expects zero or more flonums, separated by commas, and
1855 assembles Vax @code{f} format 32-bit floating point constants.
1856
1857 @item .gfloat
1858 This expects zero or more flonums, separated by commas, and
1859 assembles Vax @code{g} format 64-bit floating point constants.
1860
1861 @item .hfloat
1862 This expects zero or more flonums, separated by commas, and
1863 assembles Vax @code{h} format 128-bit floating point constants.
1864
1865 @end table
1866
1867 @subsection Opcodes
1868 All DEC mnemonics are supported. Beware that @code{case@dots{}}
1869 instructions have exactly 3 operands. The dispatch table that
1870 follows the @code{case@dots{}} instruction should be made with
1871 @code{.word} statements. This is compatible with all unix
1872 assemblers we know of.
1873
1874 @subsection Branch Improvement
1875 Certain pseudo opcodes are permitted. They are for branch
1876 instructions. They expand to the shortest branch instruction that
1877 will reach the target. Generally these mnemonics are made by
1878 substituting @samp{j} for @samp{b} at the start of a DEC mnemonic.
1879 This feature is included both for compatibility and to help
1880 compilers. If you don't need this feature, don't use these
1881 opcodes. Here are the mnemonics, and the code they can expand into.
1882
1883 @table @code
1884 @item jbsb
1885 @samp{Jsb} is already an instruction mnemonic, so we chose @samp{jbsb}.
1886 @table @asis
1887 @item (byte displacement)
1888 @kbd{bsbb @dots{}}
1889 @item (word displacement)
1890 @kbd{bsbw @dots{}}
1891 @item (long displacement)
1892 @kbd{jsb @dots{}}
1893 @end table
1894 @item jbr
1895 @itemx jr
1896 Unconditional branch.
1897 @table @asis
1898 @item (byte displacement)
1899 @kbd{brb @dots{}}
1900 @item (word displacement)
1901 @kbd{brw @dots{}}
1902 @item (long displacement)
1903 @kbd{jmp @dots{}}
1904 @end table
1905 @item j@var{COND}
1906 @var{COND} may be any one of the conditional branches
1907 @code{neq nequ eql eqlu gtr geq lss gtru lequ vc vs gequ cc lssu cs}.
1908 @var{COND} may also be one of the bit tests
1909 @code{bs bc bss bcs bsc bcc bssi bcci lbs lbc}.
1910 @var{NOTCOND} is the opposite condition to @var{COND}.
1911 @table @asis
1912 @item (byte displacement)
1913 @kbd{b@var{COND} @dots{}}
1914 @item (word displacement)
1915 @kbd{b@var{NOTCOND} foo ; brw @dots{} ; foo:}
1916 @item (long displacement)
1917 @kbd{b@var{NOTCOND} foo ; jmp @dots{} ; foo:}
1918 @end table
1919 @item jacb@var{X}
1920 @var{X} may be one of @code{b d f g h l w}.
1921 @table @asis
1922 @item (word displacement)
1923 @kbd{@var{OPCODE} @dots{}}
1924 @item (long displacement)
1925 @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @dots{} ; bar:}
1926 @end table
1927 @item jaob@var{YYY}
1928 @var{YYY} may be one of @code{lss leq}.
1929 @item jsob@var{ZZZ}
1930 @var{ZZZ} may be one of @code{geq gtr}.
1931 @table @asis
1932 @item (byte displacement)
1933 @kbd{@var{OPCODE} @dots{}}
1934 @item (word displacement)
1935 @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: brw @var{destination} ; bar:}
1936 @item (long displacement)
1937 @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @var{destination} ; bar: }
1938 @end table
1939 @item aobleq
1940 @itemx aoblss
1941 @itemx sobgeq
1942 @itemx sobgtr
1943 @table @asis
1944 @item (byte displacement)
1945 @kbd{@var{OPCODE} @dots{}}
1946 @item (word displacement)
1947 @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: brw @var{destination} ; bar:}
1948 @item (long displacement)
1949 @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @var{destination} ; bar:}
1950 @end table
1951 @end table
1952
1953 @subsection operands
1954 The immediate character is @samp{$} for Unix compatibility, not
1955 @samp{#} as DEC writes it.
1956
1957 The indirect character is @samp{*} for Unix compatibility, not
1958 @samp{@@} as DEC writes it.
1959
1960 The displacement sizing character is @samp{`} (an accent grave) for
1961 Unix compatibility, not @samp{^} as DEC writes it. The letter
1962 preceding @samp{`} may have either case. @samp{G} is not
1963 understood, but all other letters (@code{b i l s w}) are understood.
1964
1965 Register names understood are @code{r0 r1 r2 @dots{} r15 ap fp sp
1966 pc}. Any case of letters will do.
1967
1968 For instance
1969 @example
1970 tstb *w`$4(r5)
1971 @end example
1972
1973 Any expression is permitted in an operand. Operands are comma
1974 separated.
1975
1976 @c There is some bug to do with recognizing expressions
1977 @c in operands, but I forget what it is. It is
1978 @c a syntax clash because () is used as an address mode
1979 @c and to encapsulate sub-expressions.
1980 @subsection Not Supported
1981 Vax bit fields can not be assembled with @code{as}. Someone
1982 can add the required code if they really need it.
1983 @end ignore
1984
1985 @c if 680x0
1986 @section Options
1987 The 680x0 version of @code{as} has two machine dependent options.
1988 One shortens undefined references from 32 to 16 bits, while the
1989 other is used to tell @code{as} what kind of machine it is
1990 assembling for.
1991
1992 You can use the @kbd{-l} option to shorten the size of references to
1993 undefined symbols. If the @kbd{-l} option is not given, references to
1994 undefined symbols will be a full long (32 bits) wide. (Since @code{as}
1995 cannot know where these symbols will end up, @code{as} can only allocate
1996 space for the linker to fill in later. Since @code{as} doesn't know how
1997 far away these symbols will be, it allocates as much space as it can.)
1998 If this option is given, the references will only be one word wide (16
1999 bits). This may be useful if you want the object file to be as small as
2000 possible, and you know that the relevant symbols will be less than 17
2001 bits away.
2002
2003 The 680x0 version of @code{as} is most frequently used to assemble
2004 programs for the Motorola MC68020 microprocessor. Occasionally it is
2005 used to assemble programs for the mostly similar, but slightly different
2006 MC68000 or MC68010 microprocessors. You can give @code{as} the options
2007 @samp{-m68000}, @samp{-mc68000}, @samp{-m68010}, @samp{-mc68010},
2008 @samp{-m68020}, and @samp{-mc68020} to tell it what processor is the
2009 target.
2010
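As an illustration, the following command line combines both options;
the file names are hypothetical, and @samp{-o} is assumed to name the
output object file as usual:

@example
as -m68000 -l -o prog.o prog.s
@end example
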
2011 @section Syntax
2012
2013 The 680x0 version of @code{as} uses syntax similar to the Sun assembler.
2014 Size modifiers are appended directly to the end of the opcode without an
2015 intervening period. For example, write @samp{movl} rather than
2016 @samp{move.l}.
2017
2018 @c pesch@cygnus.com: Vintage Release c1.37 isn't compiled with
2019 @c SUN_ASM_SYNTAX.
2020 @ignore
2021 If @code{as} is compiled with SUN_ASM_SYNTAX defined, it will also allow
2022 Sun-style local labels of the form @samp{1$} through @samp{9$}.
2023 @end ignore
2024
2025 In the following table @dfn{apc} stands for any of the address
2026 registers (@samp{a0} through @samp{a7}), nothing (@samp{}), the
2027 Program Counter (@samp{pc}), or the zero-address relative to the
2028 program counter (@samp{zpc}).
2029
2030 The following addressing modes are understood:
2031 @table @dfn
2032 @item Immediate
2033 @samp{#@var{digits}}
2034
2035 @item Data Register
2036 @samp{d0} through @samp{d7}
2037
2038 @item Address Register
2039 @samp{a0} through @samp{a7}
2040
2041 @item Address Register Indirect
2042 @samp{a0@@} through @samp{a7@@}
2043
2044 @item Address Register Postincrement
2045 @samp{a0@@+} through @samp{a7@@+}
2046
2047 @item Address Register Predecrement
2048 @samp{a0@@-} through @samp{a7@@-}
2049
2050 @item Indirect Plus Offset
2051 @samp{@var{apc}@@(@var{digits})}
2052
2053 @item Index
2054 @samp{@var{apc}@@(@var{digits},@var{register}:@var{size}:@var{scale})}
2055 or @samp{@var{apc}@@(@var{register}:@var{size}:@var{scale})}
2056
2057 @item Postindex
2058 @samp{@var{apc}@@(@var{digits})@@(@var{digits},@var{register}:@var{size}:@var{scale})}
2059 or @samp{@var{apc}@@(@var{digits})@@(@var{register}:@var{size}:@var{scale})}
2060
2061 @item Preindex
2062 @samp{@var{apc}@@(@var{digits},@var{register}:@var{size}:@var{scale})@@(@var{digits})}
2063 or @samp{@var{apc}@@(@var{register}:@var{size}:@var{scale})@@(@var{digits})}
2064
2065 @item Memory Indirect
2066 @samp{@var{apc}@@(@var{digits})@@(@var{digits})}
2067
2068 @item Absolute
2069 @samp{@var{symbol}}, or @samp{@var{digits}}
2070 @ignore
2071 @c pesch@cygnus.com: gnu, rich concur the following needs careful
2072 @c research before documenting.
2073 , or either of the above followed
2074 by @samp{:b}, @samp{:w}, or @samp{:l}.
2075 @end ignore
2076 @end table
2077
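The following sketch shows several of these modes in use with
@samp{movl}. The registers and offsets are arbitrary, and the index
form assumes a 68020 target:

@example
        movl    #10,d0                  | Immediate to Data Register
        movl    a0@@,d1                  | Address Register Indirect
        movl    a0@@+,d1                 | Address Register Postincrement
        movl    d1,a1@@-                 | Address Register Predecrement
        movl    a6@@(8),d0               | Indirect Plus Offset
        movl    a0@@(4,d1:l:2),d0        | Index: offset 4, register d1, size l, scale 2
@end example
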
2078 @section Floating Point
2079 The floating point code is not too well tested, and may have
2080 subtle bugs in it.
2081
2082 Packed decimal (P) format floating literals are not supported.
2083 Feel free to add the code!
2084
2085 The floating point formats generated by directives are these.
2086 @table @code
2087 @item .float
2088 @code{Single} precision floating point constants.
2089 @item .double
2090 @code{Double} precision floating point constants.
2091 @end table
2092
2093 There is no directive to produce regions of memory holding
2094 extended precision numbers; however, they can be used as
2095 immediate operands to floating-point instructions. Adding a
2096 directive to create extended precision numbers would not be
2097 hard, but it has not yet seemed necessary.
2098
2099 @section Machine Directives
2100 In order to be compatible with the Sun assembler the 680x0 assembler
2101 understands the following directives.
2102 @table @code
2103 @item .data1
2104 This directive is identical to a @code{.data 1} directive.
2105 @item .data2
2106 This directive is identical to a @code{.data 2} directive.
2107 @item .even
2108 This directive is identical to a @code{.align 1} directive.
2109 @c Is this true? does it work???
2110 @item .skip
2111 This directive is identical to a @code{.space} directive.
2112 @end table
2113
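A short sketch of these directives, with the equivalent standard
directive noted in each comment (the equivalences are simply those
listed above, not independently verified behavior):

@example
        .data1                  | same as .data 1
        .byte   1
        .even                   | same as .align 1: pad to a 2-byte boundary
        .skip   6               | same as .space 6
@end example
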
2114 @section Opcodes
2115 @c pesch@cygnus.com: I don't see any point in the following
2116 @c paragraph. Bugs are bugs; how does saying this
2117 @c help anyone?
2118 @ignore
2119 Danger: Several bugs have been found in the opcode table (and
2120 fixed). More bugs may exist. Be careful when using obscure
2121 instructions.
2122 @end ignore
2123
2124 @subsection Branch Improvement
2125
2126 Certain pseudo opcodes are permitted for branch instructions.
2127 They expand to the shortest branch instruction that will reach the
2128 target. Generally these mnemonics are made by substituting @samp{j} for
2129 @samp{b} at the start of a Motorola mnemonic.
2130
2131 The following table summarizes the pseudo-operations. A @code{*} flags
2132 cases that are more fully described after the table:
2133
2134 @example
2135                           Displacement
2136           +---------------------------------------------------------
2137           |                68020   68000/10
2138 Pseudo-Op |BYTE    WORD    LONG    LONG      non-PC relative
2139           +---------------------------------------------------------
2140      jbsr |bsrs    bsr     bsrl    jsr       jsr
2141       jra |bras    bra     bral    jmp       jmp
2142 *     jXX |bXXs    bXX     bXXl    bNXs;jmpl bNXs;jmp
2143 *    dbXX |dbXX    dbXX        dbXX; bra; jmpl
2144 *    fjXX |fbXXw   fbXXw   fbXXl             fbNXw;jmp
2145
2146 XX: condition
2147 NX: negative of condition XX
2148
2149 @end example
2150 @center{@code{*}---see full description below}
2151
2152 @table @code
2153 @item jbsr
2154 @itemx jra
2155 These are the simplest jump pseudo-operations; they always map to one
2156 particular machine instruction, depending on the displacement to the
2157 branch target.
2158
2159 @item j@var{XX}
2160 Here, @samp{j@var{XX}} stands for an entire family of pseudo-operations,
2161 where @var{XX} is a conditional branch or condition-code test. The full
2162 list of pseudo-ops in this family is:
2163 @example
2164 jhi jls jcc jcs jne jeq jvc
2165 jvs jpl jmi jge jlt jgt jle
2166 @end example
2167
2168 For the cases of non-PC relative displacements and long displacements on
2169 the 68000 or 68010, @code{as} will issue a longer code fragment in terms of
2170 @var{NX}, the opposite condition to @var{XX}:
2171 @example
2172 j@var{XX} foo
2173 @end example
2174 gives
2175 @example
2176 b@var{NX}s oof
2177 jmp foo
2178 oof:
2179 @end example
2180
2181 @item db@var{XX}
2182 The full family of pseudo-operations covered here is
2183 @example
2184 dbhi dbls dbcc dbcs dbne dbeq dbvc
2185 dbvs dbpl dbmi dbge dblt dbgt dble
2186 dbf dbra dbt
2187 @end example
2188
2189 Other than for word and byte displacements, when the source reads
2190 @samp{db@var{XX} foo}, @code{as} will emit
2191 @example
2192 db@var{XX} oo1
2193 bra oo2
2194 oo1:jmpl foo
2195 oo2:
2196 @end example
2197
2198 @item fj@var{XX}
2199 This family includes
2200 @example
2201 fjne fjeq fjge fjlt fjgt fjle fjf
2202 fjt fjgl fjgle fjnge fjngl fjngle fjngt
2203 fjnle fjnlt fjoge fjogl fjogt fjole fjolt
2204 fjor fjseq fjsf fjsne fjst fjueq fjuge
2205 fjugt fjule fjult fjun
2206 @end example
2207
2208 For branch targets that are not PC relative, @code{as} emits
2209 @example
2210 fb@var{NX} oof
2211 jmp foo
2212 oof:
2213 @end example
2214 when it encounters @samp{fj@var{XX} foo}.
2215
2216 @end table
2217
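In source code, the pseudo-operations are written just like ordinary
branches. In this sketch the labels are hypothetical, and the comments
list the instructions @code{as} may choose among, following the table
above:

@example
loop:   subql   #1,d0
        jne     loop            | bnes or bne, or a longer sequence if loop is far away
        jbsr    _helper         | bsrs, bsr, bsrl or jsr, depending on the displacement
@end example
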
2218 @subsection Special Characters
2219 The immediate character is @samp{#} for Sun compatibility. The
2220 line-comment character is @samp{|}. If a @samp{#} appears at the
2221 beginning of a line, it is treated as a comment unless it looks like
2222 @samp{# line file}, in which case it is treated normally.
2223 @c fi 680x0
2224
2225 @c pesch@cygnus.com: see remarks at ignore for vax.
2226 @ignore
2227 @section 32x32
2228 @section Options
2229 The 32x32 version of @code{as} accepts a @kbd{-m32032} option to
2230 specify that it is compiling for a 32032 processor, or a
2231 @kbd{-m32532} to specify that it is compiling for a 32532 processor.
2232 The default (if neither is specified) is chosen when the assembler
2233 is compiled.
2234
2235 @subsection Syntax
2236 I don't know anything about the 32x32 syntax assembled by
2237 @code{as}. Someone who understands the processor (I've never seen
2238 one) and the possible syntaxes should write this section.
2239
2240 @subsection Floating Point
2241 The 32x32 uses IEEE floating point numbers, but @code{as} will only
2242 create single or double precision values. I don't know if the 32x32
2243 understands extended precision numbers.
2244
2245 @subsection Machine Directives
2246 The 32x32 has no machine dependent directives.
2247
2248 @section Sparc
2249 @subsection Options
2250 The sparc has no machine dependent options.
2251
2252 @subsection Syntax
2253 I don't know anything about Sparc syntax. Someone who does
2254 will have to write this section.
2255
2256 @subsection Floating Point
2257 The Sparc uses IEEE floating-point numbers.
2258
2259 @subsection Machine Directives
2260 The Sparc version of @code{as} supports the following additional
2261 machine directives:
2262
2263 @table @code
2264 @item .common
2265 This must be followed by a symbol name, a positive number, and
2266 @code{"bss"}. This behaves somewhat like @code{.comm}, but the
2267 syntax is different.
2268
2269 @item .global
2270 This is functionally identical to @code{.globl}.
2271
2272 @item .half
2273 This is functionally identical to @code{.short}.
2274
2275 @item .proc
2276 This directive is ignored. Any text following it on the same
2277 line is also ignored.
2278
2279 @item .reserve
2280 This must be followed by a symbol name, a positive number, and
2281 @code{"bss"}. This behaves somewhat like @code{.lcomm}, but the
2282 syntax is different.
2283
2284 @item .seg
2285 This must be followed by @code{"text"}, @code{"data"}, or
2286 @code{"data1"}. It behaves like @code{.text}, @code{.data}, or
2287 @code{.data 1}.
2288
2289 @item .skip
2290 This is functionally identical to the @code{.space} directive.
2291
2292 @item .word
2293 On the Sparc, the @code{.word} directive produces 32 bit values,
2294 instead of the 16 bit values it produces on every other machine.
2295
2296 @end table
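
The directives above might be combined as in the following sketch. The
symbol names are hypothetical, and the comma-separated argument form
for @code{.common} and @code{.reserve} and the use of @samp{!} to start
a comment are assumptions about Sparc syntax that this section does not
itself confirm.
@example
        .seg    "data"                  ! like .data
        .global counter                 ! same as .globl counter
counter:
        .word   1                       ! 32 bits on the Sparc
        .half   2                       ! same as .short 2
        .skip   4                       ! same as .space 4
        .common shared_buf, 64, "bss"   ! somewhat like .comm
        .reserve local_buf, 64, "bss"   ! somewhat like .lcomm
        .seg    "text"                  ! like .text
@end example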
2297
2298 @section Intel 80386
2299 @subsection Options
2300 The 80386 has no machine dependent options.
2301
2302 @subsection AT&T Syntax versus Intel Syntax
2303 In order to maintain compatibility with the output of @code{GCC},
2304 @code{as} supports AT&T System V/386 assembler syntax. This is quite
2305 different from Intel syntax. We mention these differences because
2306 almost all 80386 documents use only Intel syntax. Notable differences
2307 between the two syntaxes are:
2308 @itemize @bullet
2309 @item
2310 AT&T immediate operands are preceded by @samp{$}; Intel immediate
2311 operands are undelimited (Intel @samp{push 4} is AT&T @samp{pushl $4}).
2312 AT&T register operands are preceded by @samp{%}; Intel register operands
2313 are undelimited. AT&T absolute (as opposed to PC relative) jump/call
2314 operands are prefixed by @samp{*}; they are undelimited in Intel syntax.
2315
2316 @item
2317 AT&T and Intel syntax use the opposite order for source and destination
2318 operands. Intel @samp{add eax, 4} is AT&T @samp{addl $4, %eax}. The
2319 @samp{source, dest} convention is maintained for compatibility with
2320 previous Unix assemblers.
2321
2322 @item
2323 In AT&T syntax the size of memory operands is determined from the last
2324 character of the opcode name. Opcode suffixes of @samp{b}, @samp{w},
2325 and @samp{l} specify byte (8-bit), word (16-bit), and long (32-bit)
2326 memory references. Intel syntax accomplishes this by prefixing memory
2327 operands (@emph{not} the opcodes themselves) with @samp{byte ptr},
2328 @samp{word ptr}, and @samp{dword ptr}. Thus, Intel @samp{mov al, byte
2329 ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T syntax.
2330
2331 @item
2332 Immediate form long jumps and calls are
2333 @samp{lcall/ljmp $@var{segment}, $@var{offset}} in AT&T syntax; the
2334 Intel syntax is
2335 @samp{call/jmp far @var{segment}:@var{offset}}. Also, the far return
2336 instruction
2337 is @samp{lret $@var{stack-adjust}} in AT&T syntax; Intel syntax is
2338 @samp{ret far @var{stack-adjust}}.
2339
2340 @item
2341 The AT&T assembler does not provide support for multiple segment
2342 programs. Unix style systems expect all programs to be single segments.
2343 @end itemize
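
The differences listed above can be summarized in a short sketch, with
the AT&T form on the left and the corresponding Intel form in the
comment. The symbol @samp{foo} and the segment, offset, and stack-adjust
values are arbitrary placeholders chosen for illustration.
@example
pushl $4                # Intel: push 4
addl  $4, %eax          # Intel: add eax, 4
movb  foo, %al          # Intel: mov al, byte ptr foo
lcall $0x10, $0x1000    # Intel: call far 0x10:0x1000
lret  $8                # Intel: ret far 8
@end example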
2344
2345 @subsection Opcode Naming
2346 Opcode names are suffixed with one character modifiers which specify the
2347 size of operands. The letters @samp{b}, @samp{w}, and @samp{l} specify
2348 byte, word, and long operands. If no suffix is specified by an
2349 instruction and it contains no memory operands then @code{as} tries to
2350 fill in the missing suffix based on the destination register operand
2351 (the last one by convention). Thus, @samp{mov %ax, %bx} is equivalent
2352 to @samp{movw %ax, %bx}; also, @samp{mov $1, %bx} is equivalent to
2353 @samp{movw $1, %bx}. Note that this is incompatible with the AT&T Unix
2354 assembler which assumes that a missing opcode suffix implies long
2355 operand size. (This incompatibility does not affect compiler output
2356 since compilers always explicitly specify the opcode suffix.)
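
For illustration, the suffix inference described above would treat the
following lines as shown in the comments. This is a sketch, and
@samp{foo} is a hypothetical symbol.
@example
mov  %ax, %bx           # assembled as movw %ax, %bx
mov  $1, %bx            # assembled as movw $1, %bx
mov  %eax, %ebx         # assembled as movl %eax, %ebx
movl $1, foo            # memory operand: write the suffix explicitly
@end example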
2357
2358 Almost all opcodes have the same names in AT&T and Intel format. There
2359 are a few exceptions. The sign extend and zero extend instructions need
2360 two sizes to specify them. They need a size to sign/zero extend
2361 @emph{from} and a size to sign/zero extend @emph{to}. This is accomplished
2362 by using two opcode suffixes in AT&T syntax. Base names for sign extend
2363 and zero extend are @samp{movs@dots{}} and @samp{movz@dots{}} in AT&T
2364 syntax (@samp{movsx} and @samp{movzx} in Intel syntax). The opcode
2365 suffixes are tacked on to this base name, the @emph{from} suffix before
2366 the @emph{to} suffix. Thus, @samp{movsbl %al, %edx} is AT&T syntax for
2367 ``move sign extend @emph{from} %al @emph{to} %edx.'' Possible suffixes,
2368 thus, are @samp{bl} (from byte to long), @samp{bw} (from byte to word),
2369 and @samp{wl} (from word to long).
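
A few instances of the two-suffix naming, one per suffix pair, may make
the rule easier to see. The register choices are arbitrary.
@example
movsbl %al, %edx        # sign extend byte to long (Intel: movsx edx, al)
movsbw %al, %dx         # sign extend byte to word (Intel: movsx dx, al)
movswl %ax, %ecx        # sign extend word to long (Intel: movsx ecx, ax)
movzbl %bl, %eax        # zero extend byte to long (Intel: movzx eax, bl)
movzwl %cx, %ebx        # zero extend word to long (Intel: movzx ebx, cx)
@end example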
2370
2371 The Intel syntax conversion instructions
2372 @itemize @bullet
2373 @item
2374 @samp{cbw} --- sign-extend byte in @samp{%al} to word in @samp{%ax},
2375 @item
2376 @samp{cwde} --- sign-extend word in @samp{%ax} to long in @samp{%eax},
2377 @item
2378 @samp{cwd} --- sign-extend word in @samp{%ax} to long in @samp{%dx:%ax},
2379 @item
2380 @samp{cdq} --- sign-extend dword in @samp{%eax} to quad in @samp{%edx:%eax},
2381 @end itemize
2382 are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, and @samp{cltd} in
2383 AT&T naming. @code{as} accepts either naming for these instructions.
2384
2385 Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
2386 AT&T syntax, but are @samp{call far} and @samp{jmp far} in Intel
2387 convention.
2388
2389 @subsection Register Naming
2390 Register operands are always prefixed with @samp{%}. The 80386 registers
2391 consist of
2392 @itemize @bullet
2393 @item
2394 the 8 32-bit registers @samp{%eax} (the accumulator), @samp{%ebx},
2395 @samp{%ecx}, @samp{%edx}, @samp{%edi}, @samp{%esi}, @samp{%ebp} (the
2396 frame pointer), and @samp{%esp} (the stack pointer).
2397
2398 @item
2399 the 8 16-bit low-ends of these: @samp{%ax}, @samp{%bx}, @samp{%cx},
2400 @samp{%dx}, @samp{%di}, @samp{%si}, @samp{%bp}, and @samp{%sp}.
2401
2402 @item
2403 the 8 8-bit registers: @samp{%ah}, @samp{%al}, @samp{%bh},
2404 @samp{%bl}, @samp{%ch}, @samp{%cl}, @samp{%dh}, and @samp{%dl} (These
2405 are the high-bytes and low-bytes of @samp{%ax}, @samp{%bx},
2406 @samp{%cx}, and @samp{%dx})
2407
2408 @item
2409 the 6 segment registers @samp{%cs} (code segment), @samp{%ds}
2410 (data segment), @samp{%ss} (stack segment), @samp{%es}, @samp{%fs},
2411 and @samp{%gs}.
2412
2413 @item
2414 the 3 processor control registers @samp{%cr0}, @samp{%cr2}, and
2415 @samp{%cr3}.
2416
2417 @item
2418 the 6 debug registers @samp{%db0}, @samp{%db1}, @samp{%db2},
2419 @samp{%db3}, @samp{%db6}, and @samp{%db7}.
2420
2421 @item
2422 the 2 test registers @samp{%tr6} and @samp{%tr7}.
2423
2424 @item
2425 the 8 floating point stack registers @samp{%st} or equivalently
2426 @samp{%st(0)}, @samp{%st(1)}, @samp{%st(2)}, @samp{%st(3)},
2427 @samp{%st(4)}, @samp{%st(5)}, @samp{%st(6)}, and @samp{%st(7)}.
2428 @end itemize
2429
2430 @subsection Opcode Prefixes
2431 Opcode prefixes are used to modify the following opcode. They are used
2432 to repeat string instructions, to provide segment overrides, to perform
2433 bus lock operations, and to give operand and address size (16-bit
2434 operands are specified in an instruction by prefixing what would
2435 normally be 32-bit operands with an ``operand size'' opcode prefix).
2436 Opcode prefixes are usually given as single-line instructions with no
2437 operands, and must directly precede the instruction they act upon. For
2438 example, the @samp{scas} (scan string) instruction is repeated with:
2439 @example
2440 repne
2441 scas
2442 @end example
2443
2444 Here is a list of opcode prefixes:
2445 @itemize @bullet
2446 @item
2447 Segment override prefixes @samp{cs}, @samp{ds}, @samp{ss}, @samp{es},
2448 @samp{fs}, @samp{gs}. These are automatically added when you use
2449 the @var{segment}:@var{memory-operand} form for memory references.
2450
2451 @item
2452 Operand/Address size prefixes @samp{data16} and @samp{addr16}
2453 change 32-bit operands/addresses into 16-bit operands/addresses. Note
2454 that 16-bit addressing modes (i.e. 8086 and 80286 addressing modes)
2455 are not supported (yet).
2456
2457 @item
2458 The bus lock prefix @samp{lock} inhibits interrupts during
2459 execution of the instruction it precedes. (This is only valid with
2460 certain instructions; see an 80386 manual for details).
2461
2462 @item
2463 The wait for coprocessor prefix @samp{wait} waits for the
2464 coprocessor to complete the current instruction. This should never be
2465 needed for the 80386/80387 combination.
2466
2467 @item
2468 The @samp{rep}, @samp{repe}, and @samp{repne} prefixes are added
2469 to string instructions to make them repeat @samp{%ecx} times.
2470 @end itemize
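
The following sketch shows some of these prefixes in use, written on
their own lines as described above. @samp{counter} is a hypothetical
symbol naming a long in memory.
@example
rep
movsl                   # repeat movsl %ecx times
lock
incl counter            # bus-locked increment of the long at counter
movl %es:4(%ebx), %eax  # es override prefix is added automatically
@end example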
2471
2472 @subsection Memory References
2473 An Intel syntax indirect memory reference of the form
2474 @example
2475 @var{segment}:[@var{base} + @var{index}*@var{scale} + @var{disp}]
2476 @end example
2477 is translated into the AT&T syntax
2478 @example
2479 @var{segment}:@var{disp}(@var{base}, @var{index}, @var{scale})
2480 @end example
2481 where @var{base} and @var{index} are the optional 32-bit base and
2482 index registers, @var{disp} is the optional displacement, and
2483 @var{scale}, taking the values 1, 2, 4, and 8, multiplies @var{index}
2484 to calculate the address of the operand. If no @var{scale} is
2485 specified, @var{scale} is taken to be 1. @var{segment} specifies the
2486 optional segment register for the memory operand, and may override the
2487 default segment register (see an 80386 manual for segment register
2488 defaults). Note that segment overrides in AT&T syntax @emph{must}
2489 be preceded by a @samp{%}. If you specify a segment override which
2490 coincides with the default segment register, @code{as} will @emph{not}
2491 output any segment register override prefixes to assemble the given
2492 instruction. Thus, segment overrides can be specified to emphasize which
2493 segment register is used for a given memory operand.
2494
2495 Here are some examples of Intel and AT&T style memory references:
2496 @table @asis
2497
2498 @item AT&T: @samp{-4(%ebp)}, Intel: @samp{[ebp - 4]}
2499 @var{base} is @samp{%ebp}; @var{disp} is @samp{-4}. @var{segment} is
2500 missing, and the default segment is used (@samp{%ss} for addressing with
2501 @samp{%ebp} as the base register). @var{index}, @var{scale} are both missing.
2502
2503 @item AT&T: @samp{foo(,%eax,4)}, Intel: @samp{[foo + eax*4]}
2504 @var{index} is @samp{%eax} (scaled by a @var{scale} 4); @var{disp} is
2505 @samp{foo}. All other fields are missing. The segment register here
2506 defaults to @samp{%ds}.
2507
2508 @item AT&T: @samp{foo(,1)}, Intel: @samp{[foo]}
2509 This uses the value pointed to by @samp{foo} as a memory operand.
2510 Note that @var{base} and @var{index} are both missing, but there is only
2511 @emph{one} @samp{,}. This is a syntactic exception.
2512
2513 @item AT&T: @samp{%gs:foo}, Intel: @samp{gs:foo}
2514 This selects the contents of the variable @samp{foo} with segment
2515 register @var{segment} being @samp{%gs}.
2516
2517 @end table
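
Placed inside complete instructions, memory references of these forms
might look as follows. This is only a sketch: @samp{foo} is a
hypothetical symbol and the destination registers are arbitrary.
@example
movl -4(%ebp), %eax             # Intel: mov eax, [ebp - 4]
movl foo(,%eax,4), %edx         # Intel: mov edx, [foo + eax*4]
movl 8(%ebx,%esi,2), %ecx       # Intel: mov ecx, [ebx + esi*2 + 8]
movl %gs:foo, %eax              # Intel: mov eax, gs:foo
@end example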
2518
2519 Absolute (as opposed to PC relative) call and jump operands must be
2520 prefixed with @samp{*}. If no @samp{*} is specified, @code{as} will
2521 always choose PC relative addressing for jump/call labels.
2522
2523 Any instruction that has a memory operand @emph{must} specify its size (byte,
2524 word, or long) with an opcode suffix (@samp{b}, @samp{w}, or @samp{l},
2525 respectively).
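
To illustrate both rules, here is a small sketch in which @samp{foo} is
a hypothetical label.
@example
jmp   foo               # no *: PC relative jump to the label foo
jmp   *%eax             # absolute jump to the address held in %eax
call  *foo              # absolute call through the long stored at foo
incl  (%ebx)            # memory operand, size given by the l suffix
movb  $0, (%edi)        # memory operand, size given by the b suffix
@end example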
2526
2527 @subsection Handling of Jump Instructions
2528 Jump instructions are always optimized to use the smallest possible
2529 displacements. This is accomplished by using byte (8-bit) displacement
2530 jumps whenever the target is sufficiently close. If a byte displacement
2531 is insufficient a long (32-bit) displacement is used. We do not support
2532 word (16-bit) displacement jumps (i.e. prefixing the jump instruction
2533 with the @samp{addr16} opcode prefix), since the 80386 insists upon masking
2534 @samp{%eip} to 16 bits after the word displacement is added.
2535
2536 Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz},
2537 @samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in
2538 byte displacements, so that it is possible that use of these
2539 instructions (@code{GCC} does not use them) will cause the assembler to
2540 print an error message (and generate incorrect code). The AT&T 80386
2541 assembler tries to get around this problem by expanding @samp{jcxz foo} to
2542 @example
2543 jcxz cx_zero
2544 jmp cx_nonzero
2545 cx_zero: jmp foo
2546 cx_nonzero:
2547 @end example
2548
2549 @subsection Floating Point
2550 All 80387 floating point types except packed BCD are supported.
2551 (BCD support may be added without much difficulty). These data
2552 types are 16-, 32-, and 64- bit integers, and single (32-bit),
2553 double (64-bit), and extended (80-bit) precision floating point.
2554 Each supported type has an opcode suffix and a constructor
2555 associated with it. Opcode suffixes specify operands' data
2556 types. Constructors build these data types into memory.
2557
2558 @itemize @bullet
2559 @item
2560 Floating point constructors are @samp{.float} or @samp{.single},
2561 @samp{.double}, and @samp{.tfloat} for 32-, 64-, and 80-bit formats.
2562 These correspond to opcode suffixes @samp{s}, @samp{l}, and @samp{t}.
2563 @samp{t} stands for temporary real; the 80387 only supports
2564 this format via the @samp{fldt} (load temporary real to stack top) and
2565 @samp{fstpt} (store temporary real and pop stack) instructions.
2566
2567 @item
2568 Integer constructors are @samp{.word}, @samp{.long} or @samp{.int}, and
2569 @samp{.quad} for the 16-, 32-, and 64-bit integer formats. The corresponding
2570 opcode suffixes are @samp{s} (single), @samp{l} (long), and @samp{q}
2571 (quad). As with the temporary real format the 64-bit @samp{q} format is
2572 only present in the @samp{fildq} (load quad integer to stack top) and
2573 @samp{fistpq} (store quad integer and pop stack) instructions.
2574 @end itemize
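
The pairing of constructors and opcode suffixes can be sketched as
follows. The labels and values are hypothetical; each load uses the
suffix that matches the constructor which built its operand.
@example
        .data
f_s:    .float  1.5     # 32-bit single, suffix s
f_d:    .double 1.5     # 64-bit double, suffix l
f_t:    .tfloat 1.5     # 80-bit temporary real, suffix t
i_w:    .word   7       # 16-bit integer, suffix s
i_l:    .long   7       # 32-bit integer, suffix l
i_q:    .quad   7       # 64-bit integer, suffix q
        .text
        flds    f_s     # load single to stack top
        fldl    f_d     # load double to stack top
        fldt    f_t     # load temporary real to stack top
        filds   i_w     # load 16-bit integer to stack top
        fildl   i_l     # load 32-bit integer to stack top
        fildq   i_q     # load quad integer to stack top
@end example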
2575
2576 Register to register operations do not require opcode suffixes,
2577 so that @samp{fst %st, %st(1)} is equivalent to @samp{fstl %st, %st(1)}.
2578
2579 Since the 80387 automatically synchronizes with the 80386, @samp{fwait}
2580 instructions are almost never needed (this is not the case for the
2581 80286/80287 and 8086/8087 combinations). Therefore, @code{as} suppresses
2582 the @samp{fwait} instruction whenever it is implicitly selected by one
2583 of the @samp{fn@dots{}} instructions. For example, @samp{fsave} and
2584 @samp{fnsave} are treated identically. In general, all the @samp{fn@dots{}}
2585 instructions are made equivalent to @samp{f@dots{}} instructions. If
2586 @samp{fwait} is desired it must be explicitly coded.
2587
2588 @subsection Notes
2589 There is some trickery concerning the @samp{mul} and @samp{imul}
2590 instructions that deserves mention. The 16-, 32-, and 64-bit expanding
2591 multiplies (base opcode @samp{0xf6}; extension 4 for @samp{mul} and 5
2592 for @samp{imul}) can be output only in the one operand form. Thus,
2593 @samp{imul %ebx, %eax} does @emph{not} select the expanding multiply;
2594 the expanding multiply would clobber the @samp{%edx} register, and this
2595 would confuse @code{GCC} output. Use @samp{imul %ebx} to get the
2596 64-bit product in @samp{%edx:%eax}.
2597
2598 We have added a two operand form of @samp{imul} when the first operand
2599 is an immediate mode expression and the second operand is a register.
2600 This is just a shorthand, so that multiplying @samp{%eax} by 69, for
2601 example, can be done with @samp{imul $69, %eax} rather than @samp{imul
2602 $69, %eax, %eax}.
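
The forms of @samp{imul} discussed in these notes can be set side by
side as a sketch. The comments restate the behavior described above.
@example
imul %ebx               # expanding: 64-bit product in %edx:%eax
imul %ebx, %eax         # not expanding: %edx is left alone
imul $69, %eax          # shorthand for the line below
imul $69, %eax, %eax    # multiply %eax by 69, result in %eax
@end example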
2603 @end ignore
2604 @c pesch@cygnus.com: we also ignore the following chapters, but for
2605 @c a different reason---internals are changing
2606 @c rapidly. These may need to be moved to another
2607 @c book anyhow, if we adopt the model of user/modifier
2608 @c books.
2609 @ignore
2610 @node Maintenance, Retargeting, Machine Dependent, top
2611 @chapter Maintaining the Assembler
2612 [[this chapter is still being built]]
2613
2614 @section Design
2615 We had these goals, in descending priority:
2616 @table @b
2617 @item Accuracy.
2618 For every program composed by a compiler, @code{as} should emit
2619 ``correct'' code. This leaves some latitude in choosing addressing
2620 modes, order of @code{relocation_info} structures in the object
2621 file, @emph{etc}.
2622
2623 @item Speed, for usual case.
2624 By far the most common use of @code{as} will be assembling compiler
2625 emissions.
2626
2627 @item Upward compatibility for existing assembler code.
2628 Well @dots{} we don't support Vax bit fields but everything else
2629 seems to be upward compatible.
2630
2631 @item Readability.
2632 The code should be maintainable with few surprises. (JF: ha!)
2633
2634 @end table
2635
2636 We assumed that disk I/O was slow and expensive while memory was
2637 fast and access to memory was cheap. We expect the in-memory data
2638 structures to be less than 10 times the size of the emitted object
2639 file. (Contrast this with the C compiler where in-memory structures
2640 might be 100 times object file size!)
2641 This suggests:
2642 @itemize @bullet
2643 @item
2644 Try to read the source file from disk only one time. For other
2645 reasons, we keep large chunks of the source file in memory during
2646 assembly so this is not a problem. Also the assembly algorithm
2647 should only scan the source text once if the compiler composed the
2648 text according to a few simple rules.
2649 @item
2650 Emit the object code bytes only once. Don't store values and then
2651 backpatch later.
2652 @item
2653 Build the object file in memory and do direct writes to disk of
2654 large buffers.
2655 @end itemize
2656
2657 RMS suggested a one-pass algorithm which seems to work well. By not
2658 parsing text during a second pass considerable time is saved on
2659 large programs (@emph{e.g.} the sort of C program @code{yacc} would
2660 emit).
2661
2662 It happened that the data structures needed to emit relocation
2663 information to the object file were neatly subsumed into the data
2664 structures that do backpatching of addresses after pass 1.
2665
2666 Many of the functions began life as re-usable modules, loosely
2667 connected. RMS changed this to gain speed. For example, input
2668 parsing routines which used to work on pre-sanitized strings now
2669 must parse raw data. Hence they have to import knowledge of the
2670 assembler's comment conventions @emph{etc}.
2671
2672 @section Deprecated Feature(?)s
2673 We have stopped supporting some features:
2674 @itemize @bullet
2675 @item
2676 @code{.org} statements must have @b{defined} expressions.
2677 @item
2678 Vax Bit fields (@kbd{:} operator) are entirely unsupported.
2679 @end itemize
2680
2681 It might be a good idea to not support these features in a future release:
2682 @itemize @bullet
2683 @item
2684 @kbd{#} should begin a comment, even in column 1.
2685 @item
2686 Why support the logical line & file concept any more?
2687 @item
2688 Subsegments are a good candidate for flushing.
2689 Depends on which compilers need them I guess.
2690 @end itemize
2691
2692 @section Bugs, Ideas, Further Work
2693 Clearly the major improvement is DON'T USE A TEXT-READING
2694 ASSEMBLER for the back end of a compiler. It is much faster to
2695 interpret binary gobbledygook from a compiler's tables than to
2696 ask the compiler to write out human-readable code just so the
2697 assembler can parse it back to binary.
2698
2699 Assuming you use @code{as} for human written programs: here are
2700 some ideas:
2701 @itemize @bullet
2702 @item
2703 Document (here) @code{APP}.
2704 @item
2705 Take advantage of knowing no spaces except after opcode
2706 to speed up @code{as}. (Modify @code{app.c} to flush useless spaces:
2707 only keep space/tabs at begin of line or between 2
2708 symbols.)
2709 @item
2710 Put pointers in this documentation to @file{a.out} documentation.
2711 @item
2712 Split the assembler into parts so it can gobble direct binary
2713 from @emph{e.g.} @code{cc}. It is silly for @code{cc} to compose text
2714 just so @code{as} can parse it back to binary.
2715 @item
2716 Rewrite hash functions: I want a more modular, faster library.
2717 @item
2718 Clean up LOTS of code.
2719 @item
2720 Include all the non-@file{.c} files in the maintenance chapter.
2721 @item
2722 Document flonums.
2723 @item
2724 Implement flonum short literals.
2725 @item
2726 Change all talk of expression operands to expression quantities,
2727 or perhaps to expression arguments.
2728 @item
2729 Implement pass 2.
2730 @item
2731 Whenever a @code{.text} or @code{.data} statement is seen, we close
2732 off the current frag with an imaginary @code{.fill 0}. This is
2733 because we only have one obstack for frags, and we can't grow new
2734 frags for a new subsegment, then go back to the old subsegment and
2735 append bytes to the old frag. All this nonsense goes away if we
2736 give each subsegment its own obstack. It makes code simpler in
2737 about 10 places, but nobody has bothered to do it because C compiler
2738 output rarely changes subsegments (compared to ending frags with
2739 relaxable addresses, which is common).
2740 @end itemize
2741
2742 @section Sources
2743 @c The following files in the @file{as} directory
2744 @c are symbolic links to other files, of
2745 @c the same name, in a different directory.
2746 @c @itemize @bullet
2747 @c @item
2748 @c @file{atof_generic.c}
2749 @c @item
2750 @c @file{atof_vax.c}
2751 @c @item
2752 @c @file{flonum_const.c}
2753 @c @item
2754 @c @file{flonum_copy.c}
2755 @c @item
2756 @c @file{flonum_get.c}
2757 @c @item
2758 @c @file{flonum_multip.c}
2759 @c @item
2760 @c @file{flonum_normal.c}
2761 @c @item
2762 @c @file{flonum_print.c}
2763 @c @end itemize
2764
2765 Here is a list of the source files in the @file{as} directory.
2766
2767 @table @file
2768 @item app.c
2769 This contains the pre-processing phase, which deletes comments,
2770 handles whitespace, etc. This was recently re-written, since app
2771 used to be a separate program, but RMS wanted it to be inline.
2772
2773 @item append.c
2774 This is a subroutine to append a string to another string returning a
2775 pointer just after the last @code{char} appended. (JF: All these
2776 little routines should probably all be put in one file.)
2777
2778 @item as.c
2779 Here you will find the main program of the assembler @code{as}.
2780
2781 @item expr.c
2782 This is a branch office of @file{read.c}. This understands
2783 expressions, arguments. Inside @code{as}, arguments are called
2784 (expression) @emph{operands}. This is confusing, because we also talk
2785 (elsewhere) about instruction @emph{operands}. Also, expression
2786 operands are called @emph{quantities} explicitly to avoid confusion
2787 with instruction operands. What a mess.
2788
2789 @item frags.c
2790 This implements the @b{frag} concept. Without frags, finding the
2791 right size for branch instructions would be a lot harder.
2792
2793 @item hash.c
2794 This contains the symbol table, opcode table @emph{etc.} hashing
2795 functions.
2796
2797 @item hex_value.c
2798 This is a table of values of digits, for use in @code{atoi()} type
2799 functions. Could probably be flushed by using calls to @code{strtol()}, or
2800 something similar.
2801
2802 @item input-file.c
2803 This contains operating system dependent source file reading
2804 routines. Since error messages often say where we are in reading
2805 the source file, they live here too. Since @code{as} is intended to
2806 run under GNU and Unix only, this might be worth flushing. Anyway,
2807 almost all C compilers support stdio.
2808
2809 @item input-scrub.c
2810 This deals with calling the pre-processor (if needed) and feeding the
2811 chunks back to the rest of the assembler the right way.
2812
2813 @item messages.c
2814 This contains operating system independent parts of fatal and
2815 warning message reporting. See @file{append.c} above.
2816
2817 @item output-file.c
2818 This contains operating system dependent functions that write an
2819 object file for @code{as}. See @file{input-file.c} above.
2820
2821 @item read.c
2822 This implements all the directives of @code{as}. This also deals
2823 with passing input lines to the machine dependent part of the
2824 assembler.
2825
2826 @item strstr.c
2827 This is a C library function that isn't in most C libraries yet.
2828 See @file{append.c} above.
2829
2830 @item subsegs.c
2831 This implements subsegments.
2832
2833 @item symbols.c
2834 This implements symbols.
2835
2836 @item write.c
2837 This contains the code to perform relaxation, and to write out
2838 the object file. It is mostly operating system independent, but
2839 different OSes have different object file formats in any case.
2840
2841 @item xmalloc.c
2842 This implements @code{malloc()} or bust. See @file{append.c} above.
2843
2844 @item xrealloc.c
2845 This implements @code{realloc()} or bust. See @file{append.c} above.
2846
2847 @item atof-generic.c
2848 The following files were taken from a machine-independent subroutine
2849 library for manipulating floating point numbers and very large
2850 integers.
2851
2852 @file{atof-generic.c} turns a string into a flonum internal format
2853 floating-point number.
2854
2855 @item flonum-const.c
2856 This contains some potentially useful floating point numbers in
2857 flonum format.
2858
2859 @item flonum-copy.c
2860 This copies a flonum.
2861
2862 @item flonum-multip.c
2863 This multiplies two flonums together.
2864
2865 @item bignum-copy.c
2866 This copies a bignum.
2867
2868 @end table
2869
2870 Here is a table of all the machine-specific files (this includes
2871 both source and header files). Typically, there is a
2872 @var{machine}.c file, a @var{machine}-opcode.h file, and an
2873 atof-@var{machine}.c file. The @var{machine}-opcode.h file should
2874 be identical to the one used by GDB (which uses it for disassembly.)
2875
2876 @table @file
2877
2878 @item atof-ieee.c
2879 This contains code to turn a flonum into an IEEE literal constant.
2880 This is used by the 680x0, 32x32, sparc, and i386 versions of @code{as}.
2881
2882 @item i386-opcode.h
2883 This is the opcode-table for the i386 version of the assembler.
2884
2885 @item i386.c
2886 This contains all the code for the i386 version of the assembler.
2887
2888 @item i386.h
2889 This defines constants and macros used by the i386 version of the assembler.
2890
2891 @item m-generic.h
2892 Generic 68020 header file. To be linked to m68k.h on a
2893 non-sun3, non-hpux system.
2894
2895 @item m-sun2.h
2896 68010 header file for Sun2 workstations. Not well tested. To be linked
2897 to m68k.h on a sun2. (See also @samp{-DSUN_ASM_SYNTAX} in the
2898 @file{Makefile}.)
2899
2900 @item m-sun3.h
2901 68020 header file for Sun3 workstations. To be linked to m68k.h before
2902 compiling on a Sun3 system. (See also @samp{-DSUN_ASM_SYNTAX} in the
2903 @file{Makefile}.)
2904
2905 @item m-hpux.h
2906 68020 header file for a HPUX (system 5?) box. Which box, which
2907 version of HPUX, etc? I don't know.
2908
2909 @item m68k.h
2910 A hard- or symbolic- link to one of @file{m-generic.h},
2911 @file{m-hpux.h} or @file{m-sun3.h} depending on which kind of
2912 680x0 you are assembling for. (See also @samp{-DSUN_ASM_SYNTAX} in the
2913 @file{Makefile}.)
2914
2915 @item m68k-opcode.h
2916 Opcode table for 68020. This is now a link to the opcode table
2917 in the @code{GDB} source directory.
2918
2919 @item m68k.c
2920 All the mc680x0 code, in one huge, slow-to-compile file.
2921
2922 @item ns32k.c
2923 This contains the code for the ns32032/ns32532 version of the
2924 assembler.
2925
2926 @item ns32k-opcode.h
2927 This contains the opcode table for the ns32032/ns32532 version
2928 of the assembler.
2929
2930 @item vax-inst.h
2931 Vax specific file for describing Vax operands and other Vax-ish things.
2932
2933 @item vax-opcode.h
2934 Vax opcode table.
2935
2936 @item vax.c
2937 Vax specific parts of @code{as}. Also includes the former files
2938 @file{vax-ins-parse.c}, @file{vax-reg-parse.c} and @file{vip-op.c}.
2939
2940 @item atof-vax.c
2941 Turns a flonum into a Vax constant.
2942
2943 @item vms.c
2944 This file contains the special code needed to put out a VMS
2945 style object file for the Vax.
2946
2947 @end table
2948
2949 Here is a list of the header files in the source directory.
2950 (Warning: This section may not be very accurate. I didn't
2951 write the header files; I just report them.) Also note that I
2952 think many of these header files could be cleaned up or
2953 eliminated.
2954
2955 @table @file
2956
2957 @item a.out.h
2958 This describes the structures used to create the binary header data
2959 inside the object file. Perhaps we should use the one in
2960 @file{/usr/include}?
2961
2962 @item as.h
2963 This defines all the globally useful things, and pulls in <stdio.h>
2964 and <assert.h>.
2965
2966 @item bignum.h
2967 This defines macros useful for dealing with bignums.
2968
2969 @item expr.h
2970 Structure and macros for dealing with @code{expression()}.
2971
2972 @item flonum.h
2973 This defines the structure for dealing with floating point
2974 numbers. It #includes @file{bignum.h}.
2975
2976 @item frags.h
2977 This contains a macro for appending a byte to the current frag.
2978
2979 @item hash.h
2980 Structures and function definitions for the hashing functions.
2981
2982 @item input-file.h
2983 Function headers for the input-file.c functions.
2984
2985 @item md.h
2986 Structures and function headers for things defined in the
2987 machine dependent part of the assembler.
2988
2989 @item obstack.h
2990 This is the GNU systemwide include file for manipulating obstacks.
2991 Since nobody is running under real GNU yet, we include this file.
2992
2993 @item read.h
2994 Macros and function headers for reading in source files.
2995
2996 @item struct-symbol.h
2997 Structure definition and macros for dealing with the gas
2998 internal form of a symbol.
2999
3000 @item subsegs.h
3001 Structure definition for dealing with the numbered subsegments
3002 of the text and data segments.
3003
3004 @item symbols.h
3005 Macros and function headers for dealing with symbols.
3006
3007 @item write.h
3008 Structure for doing segment fixups.
3009 @end table
3010
3011 @comment ~subsection Test Directory
3012 @comment (Note: The test directory seems to have disappeared somewhere
3013 @comment along the line. If you want it, you'll probably have to find a
3014 @comment REALLY OLD dump tape~dots{})
3015 @comment
3016 @comment The ~file{test/} directory is used for regression testing.
3017 @comment After you modify ~code{as}, you can get a quick go/nogo
3018 @comment confidence test by running the new ~code{as} over the source
3019 @comment files in this directory. You use a shell script ~file{test/do}.
3020 @comment
3021 @comment The tests in this suite are evolving. They are not comprehensive.
3022 @comment They have, however, caught hundreds of bugs early in the debugging
3023 @comment cycle of ~code{as}. Most test statements in this suite were naturally
3024 @comment selected: they were used to demonstrate actual ~code{as} bugs rather
3025 @comment than being written ~i{a priori}.
3026 @comment
3027 @comment Another testing suggestion: over 30 bugs have been found simply by
3028 @comment running examples from this manual through ~code{as}.
3029 @comment Some examples in this manual are selected
3030 @comment to distinguish boundary conditions; they are good for testing ~code{as}.
3031 @comment
3032 @comment ~subsubsection Regression Testing
3033 @comment Each regression test involves assembling a file and comparing the
3034 @comment actual output of ~code{as} to ``known good'' output files. Both
3035 @comment the object file and the error/warning message file (stderr) are
3036 @comment inspected. Optionally ~code{as}' exit status may be checked.
3037 @comment Discrepancies are reported. Each discrepancy means either that
3038 @comment you broke some part of ~code{as} or that the ``known good'' files
3039 @comment are now out of date and should be changed to reflect the new
3040 @comment definition of ``good''.
3041 @comment
3042 @comment Each regression test lives in its own directory, in a tree
3043 @comment rooted in the directory ~file{test/}. Each such directory
3044 @comment has a name ending in ~file{.ret}, where `ret' stands for
3045 @comment REgression Test. The ~file{.ret} ending allows ~code{find
3046 @comment (1)} to find all regression tests in the tree, without
3047 @comment needing to list them explicitly.
3048 @comment
3049 @comment Any ~file{.ret} directory must contain a file called
3050 @comment ~file{input} which is the source file to assemble. During
3051 @comment testing an object file ~file{output} is created, as well as
3052 @comment a file ~file{stdouterr} which contains the output to both
3053 @comment stdout and stderr. If there is a file ~file{output.good} in
3054 @comment the directory, and if ~file{output} contains exactly the
3055 @comment same data as ~file{output.good}, the file ~file{output} is
3056 @comment deleted. Likewise ~file{stdouterr} is removed if it exactly
3057 @comment matches a file ~file{stdouterr.good}. If file
3058 @comment ~file{status.good} is present, containing a decimal number
3059 @comment before a newline, the exit status of ~code{as} is compared
3060 @comment to this number. If the status numbers are not equal, a file
3061 @comment ~file{status} is written to the directory, containing the
3062 @comment actual status as a decimal number followed by newline.
3063 @comment
3064 @comment Should any of the ~file{*.good} files fail to match their corresponding
3065 @comment actual files, this is noted by a 1-line message on the screen during
3066 @comment the regression test, and you can use ~code{find (1)} to find any
3067 @comment files named ~file{status}, ~file{output} or ~file{stdouterr}.
3068 @comment
3069 @node Retargeting, License, Maintenance, top
3070 @chapter Teaching the Assembler about a New Machine
3071
3072 This chapter describes the steps required in order to make the
3073 assembler work with another machine's assembly language. This
3074 chapter is not complete, and only describes the steps in the
3075 broadest terms. You should look at the source for the
3076 currently supported machines in order to discover some of the
3077 details that aren't mentioned here.
3078
3079 You should create a new file called @file{@var{machine}.c}, and
3080 add the appropriate lines to the file @file{Makefile} so that
3081 you can compile your new version of the assembler. This should
3082 be straightforward; simply add lines similar to the ones there
3083 for the four current versions of the assembler.
3084
3085 If you want to be compatible with GDB (and the current
3086 machine-dependent versions of the assembler), you should create
3087 a file called @file{@var{machine}-opcode.h} which should
3088 contain all the information about the names of the machine
3089 instructions, their opcodes, and what addressing modes they
3090 support. If you do this right, the assembler and GDB can share
3091 this file, and you'll only have to write it once. Note that
3092 while you're writing @code{as}, you may want to use an
3093 independent program (if you have access to one) to make sure
3094 that @code{as} is emitting the correct bytes. Since @code{as}
3095 and @code{GDB} share the opcode table, an incorrect opcode
3096 table entry may make invalid bytes look OK when you disassemble
3097 them with @code{GDB}.
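
As a rough illustration of the kind of information such a file might
hold, here is a minimal sketch.  The structure name
@code{machine_opcode}, its fields, the @code{MODE_} flags, the lookup
function @code{find_opcode} and both table entries are hypothetical;
none of them is taken from a real @file{@var{machine}-opcode.h}, and
the opcode values are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical addressing-mode flags; a real table defines its own. */
#define MODE_REG 0x1            /* register operand */
#define MODE_IMM 0x2            /* immediate operand */
#define MODE_MEM 0x4            /* memory operand */

/* Hypothetical table entry: mnemonic, opcode bits, permitted modes. */
struct machine_opcode
{
  const char *name;
  unsigned long opcode;
  unsigned int modes;
};

/* Illustrative entries only; the opcode values are invented. */
static const struct machine_opcode opcode_table[] =
{
  { "add", 0x10, MODE_REG | MODE_IMM | MODE_MEM },
  { "jmp", 0x20, MODE_MEM },
};

/* Linear lookup by mnemonic; returns NULL if the name is unknown. */
const struct machine_opcode *
find_opcode (const char *name)
{
  size_t i;

  assert (name != NULL);
  for (i = 0; i < sizeof opcode_table / sizeof opcode_table[0]; i++)
    if (strcmp (opcode_table[i].name, name) == 0)
      return &opcode_table[i];
  return NULL;
}
```

Keeping the table as plain data is what would let an assembler and a
disassembler read the same file.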
3098
3099 @section Functions You will Have to Write
3100
3101 Your file @file{@var{machine}.c} should contain definitions for
3102 the following functions and variables. It will need to include
3103 some header files in order to use some of the structures
3104 defined in the machine-independent part of the assembler. The
3105 needed header files are mentioned in the descriptions of the
3106 functions that will need them.
3107
3108 @table @code
3109
3110 @item long omagic;
3111 This long integer holds the value to place at the beginning of
3112 the @file{a.out} file. It is usually @samp{OMAGIC}, except on
3113 machines that store additional information in the magic-number.
3114
3115 @item char comment_chars[];
3116 This character array holds the values of the characters that
3117 start a comment anywhere in a line. Comments are stripped off
3118 automatically by the machine independent part of the
3119 assembler. Note that @samp{/*} will always start a
3120 comment, and that only @samp{*/} will end a comment started by
3121 @samp{/*}.
3122
3123 @item char line_comment_chars[];
3124 This character array holds the values of the chars that start a
3125 comment only if they are the first (non-whitespace) character
3126 on a line. If the character @samp{#} does not appear in this
3127 list, you may get unexpected results. (Various
3128 machine-independent parts of the assembler treat the comments
3129 @samp{#APP} and @samp{#NO_APP} specially, and assume that lines
3130 that start with @samp{#} are comments.)
3131
3132 @item char EXP_CHARS[];
3133 This character array holds the letters that can separate the
3134 mantissa and the exponent of a floating point number. Typical
3135 values are @samp{e} and @samp{E}.
3136
3137 @item char FLT_CHARS[];
3138 This character array holds the letters that--when they appear
3139 immediately after a leading zero--indicate that a number is a
3140 floating-point number. (This is similar to how @samp{0x} indicates
3141 that a hexadecimal number follows.)
3142
3143 @item pseudo_typeS md_pseudo_table[];
3144 (@var{pseudo_typeS} is defined in @file{md.h})
3145 This array contains a list of the machine-dependent directives
3146 the assembler must support. It contains the name of each
3147 pseudo op (without the leading @samp{.}), a pointer to a
3148 function to be called when that directive is encountered, and
3149 an integer argument to be passed to that function.
3150
3151 @item void md_begin(void)
3152 This function is called as part of the assembler's
3153 initialization. It should do any initialization required by
3154 any of your other routines.
3155
3156 @item int md_parse_option(char **optionPTR, int *argcPTR, char ***argvPTR)
3157 This routine is called once for each option on the command line
3158 that the machine-independent part of @code{as} does not
3159 understand. This function should return non-zero if the option
3160 pointed to by @var{optionPTR} is a valid option. If it is not
3161 a valid option, this routine should return zero. The variables
3162 @var{argcPTR} and @var{argvPTR} are provided in case the option
3163 requires a filename or something similar as an argument. If
3164 the option is multi-character, @var{optionPTR} should be
3165 advanced past the end of the option, otherwise every letter in
3166 the option will be treated as a separate single-character
3167 option.
3168
3169 @item void md_assemble(char *string)
3170 This routine is called for every machine-dependent
3171 non-directive line in the source file. It does all the real
3172 work involved in reading the opcode, parsing the operands,
3173 etc. @var{string} is a pointer to a null-terminated string
3174 that comprises the input line, with all excess whitespace and
3175 comments removed.
3176
3177 @item void md_number_to_chars(char *outputPTR,long value,int nbytes)
3178 This routine is called to turn a C long int, short int, or char
3179 into the series of bytes that represents that number on the
3180 target machine. @var{outputPTR} points to an array where the
3181 result should be stored; @var{value} is the value to store; and
3182 @var{nbytes} is the number of bytes in @var{value} that should be
3183 stored.
3184
3185 @item void md_number_to_imm(char *outputPTR,long value,int nbytes)
3186 This routine is called to turn a C long int, short int, or char
3187 into the series of bytes that represent an immediate value on
3188 the target machine. It is identical to the function @code{md_number_to_chars},
3189 except on NS32K machines.@refill
3190
3191 @item void md_number_to_disp(char *outputPTR,long value,int nbytes)
3192 This routine is called to turn a C long int, short int, or char
3193 into the series of bytes that represent a displacement value on
3194 the target machine. It is identical to the function @code{md_number_to_chars},
3195 except on NS32K machines.@refill
3196
3197 @item void md_number_to_field(char *outputPTR,long value,int nbytes)
3198 This routine is identical to @code{md_number_to_chars},
3199 except on NS32K machines.
3200
3201 @item void md_ri_to_chars(struct relocation_info *riPTR,ri)
3202 (@code{struct relocation_info} is defined in @file{a.out.h})
3203 This routine emits the relocation info in @var{ri}
3204 in the appropriate bit-pattern for the target machine.
3205 The result should be stored in the location pointed
3206 to by @var{riPTR}. This routine may be a no-op unless you are
3207 attempting to do cross-assembly.
3208
3209 @item char *md_atof(char type,char *outputPTR,int *sizePTR)
3210 This routine turns a series of digits into the appropriate
3211 internal representation for a floating-point number.
3212 @var{type} is a character from @var{FLT_CHARS[]} that describes
3213 what kind of floating point number is wanted; @var{outputPTR}
3214 is a pointer to an array that the result should be stored in;
3215 and @var{sizePTR} is a pointer to an integer where the size (in
3216 bytes) of the result should be stored. This routine should
3217 return an error message, or an empty string (not (char *)0) for
3218 success.
3219
3220 @item int md_short_jump_size;
3221 This variable holds the (maximum) size in bytes of a short (16
3222 bit or so) jump created by @code{md_create_short_jump()}. This
3223 variable is used as part of the broken-word feature, and isn't
3224 needed if the assembler is compiled with
3225 @samp{-DWORKING_DOT_WORD}.
3226
3227 @item int md_long_jump_size;
3228 This variable holds the (maximum) size in bytes of a long (32
3229 bit or so) jump created by @code{md_create_long_jump()}. This
3230 variable is used as part of the broken-word feature, and isn't
3231 needed if the assembler is compiled with
3232 @samp{-DWORKING_DOT_WORD}.
3233
3234 @item void md_create_short_jump(char *resultPTR,long from_addr,
3235 @code{long to_addr,fragS *frag,symbolS *to_symbol)}
3236 This function emits a jump from @var{from_addr} to @var{to_addr} in
3237 the array of bytes pointed to by @var{resultPTR}. If this creates a
3238 type of jump that must be relocated, this function should call
3239 @code{fix_new()} with @var{frag} and @var{to_symbol}. The jump
3240 emitted by this function may be smaller than @var{md_short_jump_size},
3241 but it must never create a larger one.
3242 (If it creates a smaller jump, the extra bytes of memory will not be
3243 used.) This function is used as part of the broken-word feature,
3244 and isn't needed if the assembler is compiled with
3245 @samp{-DWORKING_DOT_WORD}.@refill
3246
3247 @item void md_create_long_jump(char *ptr,long from_addr,
3248 @code{long to_addr,fragS *frag,symbolS *to_symbol)}
3249 This function is similar to the previous function,
3250 @code{md_create_short_jump()}, except that it creates a long
3251 jump instead of a short one. This function is used as part of
3252 the broken-word feature, and isn't needed if the assembler is
3253 compiled with @samp{-DWORKING_DOT_WORD}.
3254
3255 @item int md_estimate_size_before_relax(fragS *fragPTR,int segment_type)
3256 This function does the initial setting up for relaxation. This
3257 includes forcing references to still-undefined symbols to the
3258 appropriate addressing modes.
3259
3260 @item relax_typeS md_relax_table[];
3261 (@var{relax_typeS} is defined in @file{md.h})
3262 This array describes the various machine dependent states a
3263 frag may be in before relaxation. You will need one group of
3264 entries for each type of addressing mode you intend to relax.
3265
3266 @item void md_convert_frag(fragS *fragPTR)
3267 (@var{fragS} is defined in @file{as.h})
3268 This routine does the required cleanup after relaxation.
3269 Relaxation has changed the type of the frag to a type that can
3270 reach its destination. This function should adjust the opcode
3271 of the frag to use the appropriate addressing mode.
3272 @var{fragPTR} points to the frag to clean up.
3273
3274 @item void md_end(void)
3275 This function is called just before the assembler exits. It
3276 need not free up memory unless the operating system doesn't do
3277 it automatically on exit. (In which case you'll also have to
3278 track down all the other places where the assembler allocates
3279 space but never frees it.)
3280
3281 @end table
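
As an example of one of the simpler routines in the list above, here
is a minimal sketch of @code{md_number_to_chars}.  It assumes a
big-endian target (most significant byte first, as on the 680x0); a
little-endian target would store the bytes in the opposite order.  It
is not the code of any existing port.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of md_number_to_chars for a big-endian target (an assumption;
   a little-endian target would store the low-order byte first). */
void
md_number_to_chars (char *outputPTR, long value, int nbytes)
{
  /* Work on an unsigned copy so that shifting is well defined. */
  unsigned long bits = (unsigned long) value;

  assert (outputPTR != NULL);
  assert (nbytes >= 0 && nbytes <= (int) sizeof (long));

  /* Fill from the last byte backwards, so that the most significant
     of the requested bytes ends up at outputPTR[0]. */
  while (nbytes-- > 0)
    {
      outputPTR[nbytes] = (char) (bits & 0xff);
      bits >>= 8;
    }
}
```

With this sketch, storing the value 0x1234 in two bytes yields the
byte 0x12 followed by the byte 0x34.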
3282
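The directive table @code{md_pseudo_table} can be pictured with the
following sketch.  The field names of @code{pseudo_typeS}, the
null-name terminator, the directives @samp{half} and @samp{quad}, the
handler @code{s_fake_cons} and the helper @code{dispatch_pseudo_op}
are all assumptions made for illustration; the real
@code{pseudo_typeS} is the one defined in @file{md.h}, and the real
lookup is done by the machine-independent part of the assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout, following the description of md_pseudo_table;
   the real pseudo_typeS in md.h may differ. */
typedef struct
{
  const char *poc_name;         /* directive name, without the "." */
  void (*poc_handler) (int);    /* function called for the directive */
  int poc_val;                  /* argument passed to the handler */
} pseudo_typeS;

int last_size;                  /* records the handler argument, for the demo */

/* Hypothetical handler shared by two directives. */
static void
s_fake_cons (int nbytes)
{
  last_size = nbytes;
}

/* Hypothetical directives; the null name marks the end of the table. */
const pseudo_typeS md_pseudo_table[] =
{
  { "half", s_fake_cons, 2 },
  { "quad", s_fake_cons, 8 },
  { NULL, NULL, 0 }
};

/* Look NAME up and call its handler; returns 0 if NAME is unknown. */
int
dispatch_pseudo_op (const char *name)
{
  const pseudo_typeS *p;

  assert (name != NULL);
  for (p = md_pseudo_table; p->poc_name != NULL; p++)
    if (strcmp (p->poc_name, name) == 0)
      {
        p->poc_handler (p->poc_val);
        return 1;
      }
  return 0;
}
```

The integer argument is what allows one handler to serve several
directives that differ only in, say, operand size.
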
3283 @section External Variables You will Need to Use
3284
3285 You will need to refer to or change the following external variables
3286 from within the machine-dependent part of the assembler.
3287
3288 @table @code
3289 @item extern char flagseen[];
3290 This array holds non-zero values in locations corresponding to
3291 the options that were on the command line. Thus, if the
3292 assembler was called with @samp{-W}, @var{flagseen['W']} would
3293 be non-zero.
3294
3295 @item extern fragS *frag_now;
3296 This pointer points to the current frag--the frag that bytes
3297 are currently being added to. If nothing else, you will need
3298 to pass it as an argument to various machine-independent
3299 functions. It is maintained automatically by the
3300 frag-manipulating functions; you should never have to change it
3301 yourself.
3302
3303 @item extern LITTLENUM_TYPE generic_bignum[];
3304 (@var{LITTLENUM_TYPE} is defined in @file{bignum.h}.)
3305 This is where @dfn{bignums}--numbers larger than 32 bits--are
3306 returned when they are encountered in an expression. You will
3307 need to use this if you need to implement directives (or
3308 anything else) that must deal with these large numbers.
3309 @code{Bignums} are of @code{segT} @code{SEG_BIG} (defined in
3310 @file{as.h}), and have a positive @code{X_add_number}. The
3311 @code{X_add_number} of a @code{bignum} is the number of
3312 @code{LITTLENUMS} in @var{generic_bignum} that the number takes
3313 up.
3314
3315 @item extern FLONUM_TYPE generic_floating_point_number;
3316 (@var{FLONUM_TYPE} is defined in @file{flonum.h}.)
3317 This is where @dfn{flonums}--floating-point numbers within
3318 expressions--are returned. @code{Flonums} are of @code{segT}
3319 @code{SEG_BIG}, and have a negative @code{X_add_number}.
3320 @code{Flonums} are returned in a generic format. You will have
3321 to write a routine to turn this generic format into the
3322 appropriate floating-point format for your machine.
3323
3324 @item extern int need_pass_2;
3325 If this variable is non-zero, the assembler has encountered an
3326 expression that cannot be assembled in a single pass. Since
3327 the second pass isn't implemented, this flag means that the
3328 assembler is punting, and is only looking for additional syntax
3329 errors. (Or something like that.)
3330
3331 @item extern segT now_seg;
3332 This variable holds the value of the segment the assembler is
3333 currently assembling into.
3334
3335 @end table
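
The rule given above for telling bignums from flonums can be written
down as a small sketch.  The types @code{segT} and @code{expressionS}
shown here are cut-down stand-ins for the real definitions in
@file{as.h} and @file{expr.h}, and @code{big_kind} and
@code{classify_big} are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real definitions in as.h and expr.h; only the
   parts this sketch needs are shown, and the enumerator values are
   arbitrary. */
typedef enum { SEG_ABSOLUTE, SEG_BIG } segT;

typedef struct
{
  long X_add_number;
} expressionS;

/* Hypothetical result type for this sketch. */
enum big_kind { BIG_NONE, BIG_BIGNUM, BIG_FLONUM };

/* Decide where the value of an expression was returned, given the
   segment type reported for it. */
enum big_kind
classify_big (segT seg, const expressionS *exp)
{
  assert (exp != NULL);

  if (seg != SEG_BIG)
    return BIG_NONE;            /* an ordinary expression */
  if (exp->X_add_number > 0)
    return BIG_BIGNUM;          /* X_add_number LITTLENUMs in generic_bignum */
  if (exp->X_add_number < 0)
    return BIG_FLONUM;          /* value is in generic_floating_point_number */
  return BIG_NONE;              /* zero is not covered by the rule above */
}
```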
3336
3337 @section External Functions You Will Need
3338
3339 You will find the following external functions useful (or
3340 indispensable) when you're writing the machine-dependent part
3341 of the assembler.
3342
3343 @table @code
3344
3345 @item char *frag_more(int bytes)
3346 This function allocates @var{bytes} more bytes in the current
3347 frag (or starts a new frag, if it can't expand the current frag
3348 any more) for you to store some object-file bytes in. It
3349 returns a pointer to the bytes, ready for you to store data in.
3350
3351 @item void fix_new(fragS *frag, int where, short size, symbolS *add_symbol, symbolS *sub_symbol, long offset, int pcrel)
3352 This function stores a relocation fixup to be acted on later.
3353 @var{frag} points to the frag the relocation belongs in;
3354 @var{where} is the location within the frag where the relocation begins;
3355 @var{size} is the size of the relocation, and is usually 1 (a single byte),
3356 2 (sixteen bits), or 4 (a longword).
3357 The value @var{add_symbol} @minus{} @var{sub_symbol} + @var{offset} is added to the byte(s)
3358 at @var{frag->fr_literal[where]}. If @var{pcrel} is non-zero, the address of the
3359 location is subtracted from the result. A relocation entry is also added
3360 to the @file{a.out} file. @var{add_symbol} and @var{sub_symbol} may be
3361 NULL, and @var{offset} may be zero.@refill
3362
3363 @item char *frag_var(relax_stateT type, int max_chars, int var,
3364 @code{relax_substateT subtype, symbolS *symbol, char *opcode)}
3365 This function creates a machine-dependent frag of type @var{type}
3366 (usually @code{rs_machine_dependent}).
3367 @var{max_chars} is the maximum size in bytes that the frag may grow by;
3368 @var{var} is the current size of the variable end of the frag;
3369 @var{subtype} is the sub-type of the frag. The sub-type is used to index into
3370 @var{md_relax_table[]} during relaxation.
3371 @var{symbol} is the symbol whose value should be used when relaxing this frag.
3372 @var{opcode} points to a byte whose value may have to be modified if the
3373 addressing mode used by this frag changes. It typically points into the
3374 @var{fr_literal[]} of the previous frag, and is used to point to a location
3375 that @code{md_convert_frag()} may have to change.@refill
3376
3377 @item void frag_wane(fragS *fragPTR)
3378 This function is useful from within @code{md_convert_frag}. It
3379 changes a frag to type @code{rs_fill}, and sets the variable-sized
3380 piece of the frag to zero. The frag will never change in size
3381 again.
3382
3383 @item segT expression(expressionS *retval)
3384 (@var{segT} is defined in @file{as.h}; @var{expressionS} is defined in @file{expr.h})
3385 This function parses the string pointed to by the external char
3386 pointer @var{input_line_pointer}, and returns the segment-type
3387 of the expression. It also stores the results in the
3388 @var{expressionS} pointed to by @var{retval}.
3389 @var{input_line_pointer} is advanced to point past the end of
3390 the expression. (@var{input_line_pointer} is used by other
3391 parts of the assembler. If you modify it, be sure to restore
3392 it to its original value.)
3393
3394 @item as_warn(char *message,@dots{})
3395 If warning messages are disabled, this function does nothing.
3396 Otherwise, it prints out the current file name, and the current
3397 line number, then uses @code{fprintf} to print the
3398 @var{message} and any arguments it was passed.
3399
3400 @item as_bad(char *message,@dots{})
3401 This function should be called when @code{as} encounters
3402 conditions that are bad enough that @code{as} should not
3403 produce an object file, but should continue reading input and
3404 printing warning and bad error messages.
3405
3406 @item as_fatal(char *message,@dots{})
3407 This function prints out the current file name and line number,
3408 prints the word @samp{FATAL:}, then uses @code{fprintf} to
3409 print the @var{message} and any arguments it was passed. Then
3410 the assembler exits. This function should only be used for
3411 serious, unrecoverable errors.
3412
3413 @item void float_const(int float_type)
3414 This function reads floating-point constants from the current
3415 input line, and calls @code{md_atof} to assemble them. It is
3416 useful as the function to call for the directives
3417 @samp{.single}, @samp{.double}, @samp{.float}, etc.
3418 @var{float_type} must be a character from @var{FLT_CHARS}.
3419
3420 @item void demand_empty_rest_of_line(void);
3421 This function can be used by machine-dependent directives to
3422 make sure the rest of the input line is empty. It prints a
3423 warning message if there are additional characters on the line.
3424
3425 @item long int get_absolute_expression(void)
3426 This function can be used by machine-dependent directives to
3427 read an absolute number from the current input line. It
3428 returns the result. If it isn't given an absolute expression,
3429 it prints a warning message and returns zero.
3430
3431 @end table
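
The arithmetic described for @code{fix_new} can be sketched as
follows.  This only computes the value a fixup would add; the real
@code{fix_new} stores the fixup to be acted on later.  The type
@code{sketch_symbol} (a symbol reduced to an already-known value) and
the function @code{fixup_value} are hypothetical, and a missing
symbol is assumed to contribute zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for symbolS: just an already-known value. */
struct sketch_symbol
{
  long value;
};

/* A missing (NULL) symbol is assumed to contribute nothing. */
static long
symbol_value (const struct sketch_symbol *symbol)
{
  return symbol != NULL ? symbol->value : 0;
}

/* Value to add at the fixup location: add_symbol - sub_symbol + offset,
   less the address of the location itself when the fixup is
   pc-relative. */
long
fixup_value (const struct sketch_symbol *add_symbol,
             const struct sketch_symbol *sub_symbol,
             long offset, int pcrel, long location)
{
  long value;

  assert (location >= 0);       /* addresses are taken to be non-negative */

  value = symbol_value (add_symbol) - symbol_value (sub_symbol) + offset;
  if (pcrel)
    value -= location;
  return value;
}
```

For a symbol whose value is 0x100, an offset of 4 and a location of
0x80, the sketch gives 0x104 for an absolute fixup and 0x84 for a
pc-relative one.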
3432
3433
3434 @section The Concept of Frags
3435
3436 This assembler works to optimize the size of certain addressing
3437 modes (e.g. branch instructions). This means the size of many
3438 pieces of object code cannot be determined until after assembly
3439 is finished. (It also means that the addresses of symbols cannot be
3440 determined until assembly is finished.) To cope with this,
3441 @code{as} stores the output bytes as @dfn{frags}.
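
To see why sizes are not known in advance, consider this sketch of
choosing the shortest branch that can reach its target.  The three
branch forms, their reaches and their lengths are invented for
illustration and do not describe any particular machine; the names
@code{branch_form}, @code{smallest_branch_form} and
@code{branch_length} are likewise hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical branch encodings, shortest first. */
enum branch_form { BRANCH_BYTE, BRANCH_WORD, BRANCH_LONG };

/* Reach and total instruction length of each form (made-up numbers). */
struct form_info
{
  long min_disp;
  long max_disp;
  int length;
};

static const struct form_info forms[] =
{
  { -128L, 127L, 2 },           /* BRANCH_BYTE */
  { -32768L, 32767L, 4 },       /* BRANCH_WORD */
  { LONG_MIN, LONG_MAX, 6 },    /* BRANCH_LONG reaches anything */
};

/* Pick the shortest form whose reach covers DISPLACEMENT. */
enum branch_form
smallest_branch_form (long displacement)
{
  int i;

  for (i = BRANCH_BYTE; i < BRANCH_LONG; i++)
    if (displacement >= forms[i].min_disp
        && displacement <= forms[i].max_disp)
      return (enum branch_form) i;
  return BRANCH_LONG;
}

/* Number of bytes the chosen form occupies. */
int
branch_length (enum branch_form form)
{
  assert ((int) form >= BRANCH_BYTE && (int) form <= BRANCH_LONG);
  return forms[form].length;
}
```

The displacement depends on the sizes of all the code between the
branch and its target, and those sizes depend in turn on the forms
chosen for other branches, so the choice cannot be made one
instruction at a time.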
3442
3443 Here is the definition of a frag (from @file{as.h}):
3444 @example
3445 struct frag
3446 @{
3447 long int fr_fix;
3448 long int fr_var;
3449 relax_stateT fr_type;
3450 relax_substateT fr_substate;
3451 unsigned long fr_address;
3452 long int fr_offset;
3453 struct symbol *fr_symbol;
3454 char *fr_opcode;
3455 struct frag *fr_next;
3456 char fr_literal[];
3457 @};
3458 @end example
3459
3460 @table @var
3461 @item fr_fix
3462 is the size of the fixed-size piece of the frag.
3463
3464 @item fr_var
3465 is the maximum (?) size of the variable-sized piece of the frag.
3466
3467 @item fr_type
3468 is the type of the frag.
3469 Current types are:
3470 @code{rs_fill},
3471 @code{rs_align},
3472 @code{rs_org}, and
3473 @code{rs_machine_dependent}.
3474
3475 @item fr_substate
3476 This stores the type of machine-dependent frag this is (what
3477 kind of addressing mode is being used, and what size is being
3478 tried, will fit, etc.).
3479
3480 @item fr_address
3481 @var{fr_address} is only valid after relaxation is finished.
3482 Before relaxation, the only way to store an address is (pointer
3483 to frag containing the address) plus (offset into the frag).
3484
3485 @item fr_offset
3486 This contains a number, whose meaning depends on the type of
3487 the frag.
3488 For machine-dependent frags, this contains the offset from
3489 @var{fr_symbol} that the frag wants to go to. Thus, for branch
3490 instructions it is usually zero (unless the instruction was
3491 @samp{jba foo+12} or something like that).
3492
3493 @item fr_symbol
3494 For machine-dependent frags, this points to the symbol the frag
3495 needs to reach.
3496
3497 @item fr_opcode
3498 This points to the location in the frag (or in a previous frag)
3499 of the opcode for the instruction that caused this to be a frag.
3500 @var{fr_opcode} is needed if the actual opcode must be changed
3501 in order to use a different form of the addressing mode.
3502 (For example, if a conditional branch only comes in size tiny,
3503 a large-size branch could be implemented by reversing the sense
3504 of the test, and turning it into a tiny branch over a large jump.
3505 This would require changing the opcode.)
3506
3507 @var{fr_literal} is a variable-size array that contains the
3508 actual object bytes. A frag consists of a fixed size piece of
3509 object data, (which may be zero bytes long), followed by a
3510 piece of object data whose size may not have been determined
3511 yet. Other information includes the type of the frag (which
3512 controls how it is relaxed),
3513
3514 @item fr_next
3515 This is the next frag in the singly-linked list. This is
3516 usually only needed by the machine-independent part of
3517 @code{as}.
3518
3519 @end table
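
The relationship between @var{fr_address}, @var{fr_fix} and
@var{fr_next} can be illustrated with a minimal sketch.  It uses a
cut-down copy of @code{struct frag} holding only three of the fields,
and it assumes that relaxation is finished and that every frag
occupies exactly @var{fr_fix} bytes (its variable-sized piece being
zero).  The functions @code{assign_frag_addresses} and
@code{frag_offset_to_address} are hypothetical and are not part of
@code{as}.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down copy of struct frag: only the fields this sketch uses. */
struct frag
{
  long fr_fix;                  /* size of the fixed-size piece */
  unsigned long fr_address;     /* valid only after relaxation */
  struct frag *fr_next;         /* next frag in the chain */
};

/* Walk the chain, giving each frag its address; returns the address
   just past the last frag.  Assumes each frag is fr_fix bytes long. */
unsigned long
assign_frag_addresses (struct frag *first, unsigned long start)
{
  struct frag *f;
  unsigned long address = start;

  for (f = first; f != NULL; f = f->fr_next)
    {
      assert (f->fr_fix >= 0);
      f->fr_address = address;
      address += (unsigned long) f->fr_fix;
    }
  return address;
}

/* Turn a (frag, offset) pair into an address once addresses are known. */
unsigned long
frag_offset_to_address (const struct frag *f, long offset)
{
  assert (f != NULL && offset >= 0);
  return f->fr_address + (unsigned long) offset;
}
```

Until a walk like this has been done, a location can only be named by
the pair (frag, offset), which is why symbols are recorded that way
during assembly.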
3520 @end ignore
3521
3522 @node License, , Machine Dependent, Top
3523 @unnumbered GNU GENERAL PUBLIC LICENSE
3524 @center Version 1, February 1989
3525
3526 @display
3527 Copyright @copyright{} 1989 Free Software Foundation, Inc.
3528 675 Mass Ave, Cambridge, MA 02139, USA
3529
3530 Everyone is permitted to copy and distribute verbatim copies
3531 of this license document, but changing it is not allowed.
3532 @end display
3533
3534 @unnumberedsec Preamble
3535
3536 The license agreements of most software companies try to keep users
3537 at the mercy of those companies. By contrast, our General Public
3538 License is intended to guarantee your freedom to share and change free
3539 software---to make sure the software is free for all its users. The
3540 General Public License applies to the Free Software Foundation's
3541 software and to any other program whose authors commit to using it.
3542 You can use it for your programs, too.
3543
3544 When we speak of free software, we are referring to freedom, not
3545 price. Specifically, the General Public License is designed to make
3546 sure that you have the freedom to give away or sell copies of free
3547 software, that you receive source code or can get it if you want it,
3548 that you can change the software or use pieces of it in new free
3549 programs; and that you know you can do these things.
3550
3551 To protect your rights, we need to make restrictions that forbid
3552 anyone to deny you these rights or to ask you to surrender the rights.
3553 These restrictions translate to certain responsibilities for you if you
3554 distribute copies of the software, or if you modify it.
3555
3556 For example, if you distribute copies of a such a program, whether
3557 gratis or for a fee, you must give the recipients all the rights that
3558 you have. You must make sure that they, too, receive or can get the
3559 source code. And you must tell them their rights.
3560
3561 We protect your rights with two steps: (1) copyright the software, and
3562 (2) offer you this license which gives you legal permission to copy,
3563 distribute and/or modify the software.
3564
3565 Also, for each author's protection and ours, we want to make certain
3566 that everyone understands that there is no warranty for this free
3567 software. If the software is modified by someone else and passed on, we
3568 want its recipients to know that what they have is not the original, so
3569 that any problems introduced by others will not reflect on the original
3570 authors' reputations.
3571
3572 The precise terms and conditions for copying, distribution and
3573 modification follow.
3574
3575 @iftex
3576 @unnumberedsec TERMS AND CONDITIONS
3577 @end iftex
3578 @ifinfo
3579 @center TERMS AND CONDITIONS
3580 @end ifinfo
3581
3582 @enumerate
3583 @item
3584 This License Agreement applies to any program or other work which
3585 contains a notice placed by the copyright holder saying it may be
3586 distributed under the terms of this General Public License. The
3587 ``Program'', below, refers to any such program or work, and a ``work based
3588 on the Program'' means either the Program or any work containing the
3589 Program or a portion of it, either verbatim or with modifications. Each
3590 licensee is addressed as ``you''.
3591
3592 @item
3593 You may copy and distribute verbatim copies of the Program's source
3594 code as you receive it, in any medium, provided that you conspicuously and
3595 appropriately publish on each copy an appropriate copyright notice and
3596 disclaimer of warranty; keep intact all the notices that refer to this
3597 General Public License and to the absence of any warranty; and give any
3598 other recipients of the Program a copy of this General Public License
3599 along with the Program. You may charge a fee for the physical act of
3600 transferring a copy.
3601
3602 @item
3603 You may modify your copy or copies of the Program or any portion of
3604 it, and copy and distribute such modifications under the terms of Paragraph
3605 1 above, provided that you also do the following:
3606
3607 @itemize @bullet
3608 @item
3609 cause the modified files to carry prominent notices stating that
3610 you changed the files and the date of any change; and
3611
3612 @item
3613 cause the whole of any work that you distribute or publish, that
3614 in whole or in part contains the Program or any part thereof, either
3615 with or without modifications, to be licensed at no charge to all
3616 third parties under the terms of this General Public License (except
3617 that you may choose to grant warranty protection to some or all
3618 third parties, at your option).
3619
3620 @item
3621 If the modified program normally reads commands interactively when
3622 run, you must cause it, when started running for such interactive use
3623 in the simplest and most usual way, to print or display an
3624 announcement including an appropriate copyright notice and a notice
3625 that there is no warranty (or else, saying that you provide a
3626 warranty) and that users may redistribute the program under these
3627 conditions, and telling the user how to view a copy of this General
3628 Public License.
3629
3630 @item
3631 You may charge a fee for the physical act of transferring a
3632 copy, and you may at your option offer warranty protection in
3633 exchange for a fee.
3634 @end itemize
3635
3636 Mere aggregation of another independent work with the Program (or its
3637 derivative) on a volume of a storage or distribution medium does not bring
3638 the other work under the scope of these terms.
3639
3640 @item
3641 You may copy and distribute the Program (or a portion or derivative of
3642 it, under Paragraph 2) in object code or executable form under the terms of
3643 Paragraphs 1 and 2 above provided that you also do one of the following:
3644
3645 @itemize @bullet
3646 @item
3647 accompany it with the complete corresponding machine-readable
3648 source code, which must be distributed under the terms of
3649 Paragraphs 1 and 2 above; or,
3650
3651 @item
3652 accompany it with a written offer, valid for at least three
3653 years, to give any third party free (except for a nominal charge
3654 for the cost of distribution) a complete machine-readable copy of the
3655 corresponding source code, to be distributed under the terms of
3656 Paragraphs 1 and 2 above; or,
3657
3658 @item
3659 accompany it with the information you received as to where the
3660 corresponding source code may be obtained. (This alternative is
3661 allowed only for noncommercial distribution and only if you
3662 received the program in object code or executable form alone.)
3663 @end itemize
3664
3665 Source code for a work means the preferred form of the work for making
3666 modifications to it. For an executable file, complete source code means
3667 all the source code for all modules it contains; but, as a special
3668 exception, it need not include source code for modules which are standard
3669 libraries that accompany the operating system on which the executable
3670 file runs, or for standard header files or definitions files that
3671 accompany that operating system.
3672
3673 @item
You may not copy, modify, sublicense, distribute or transfer the
Program except as expressly provided under this General Public License.
Any attempt otherwise to copy, modify, sublicense, distribute or transfer
the Program is void, and will automatically terminate your rights to use
the Program under this License. However, parties who have received
copies, or rights to use copies, from you under this General Public
License will not have their licenses terminated so long as such parties
remain in full compliance.

@item
By copying, distributing or modifying the Program (or any work based
on the Program) you indicate your acceptance of this license to do so,
and all its terms and conditions.

@item
Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the original
licensor to copy, distribute or modify the Program subject to these
terms and conditions. You may not impose any further restrictions on the
recipients' exercise of the rights granted herein.

@item
The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of the license which applies to it and ``any
later version'', you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
the license, you may choose any version ever published by the Free Software
Foundation.

@item
If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

@iftex
@heading NO WARRANTY
@end iftex
@ifinfo
@center NO WARRANTY
@end ifinfo

@item
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

@item
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL
ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES
ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT
LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES
SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE
WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
@end enumerate

@iftex
@heading END OF TERMS AND CONDITIONS
@end iftex
@ifinfo
@center END OF TERMS AND CONDITIONS
@end ifinfo

@page
@unnumberedsec Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to humanity, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.

To do so, attach the following notices to the program. It is safest to
attach them to the start of each source file to most effectively convey
the exclusion of warranty; and each file should have at least the
``copyright'' line and a pointer to where the full notice is found.

@smallexample
@var{one line to give the program's name and a brief idea of what it does.}
Copyright (C) 19@var{yy} @var{name of author}

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 1, or (at your option)
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
@end smallexample

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

@smallexample
Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
@end smallexample

The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, the
commands you use may be called something other than `show w' and `show
c'; they could even be mouse-clicks or menu items---whatever suits your
program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a ``copyright disclaimer'' for the program, if
necessary. Here is a sample; alter the names:

@example
Yoyodyne, Inc., hereby disclaims all copyright interest in the
program `Gnomovision' (a program to direct compilers to make passes
at assemblers) written by James Hacker.

@var{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice
@end example

That's all there is to it!


@summarycontents
@contents
@bye