@page
@node Top, Conventions,, (DIR)
-@chapter Cpplib - the core of the GNU C Preprocessor
+@chapter Cpplib---the core of the GNU C Preprocessor
The GNU C preprocessor in GCC 3.0 has been completely rewritten. It is
now implemented as a library, cpplib, so it can be easily shared between
@cindex interface
@cindex header files
-cpplib has two interfaces - one is exposed internally only, and the
+cpplib has two interfaces---one is exposed internally only, and the
other is for both internal and external use.
The convention is that functions and types that are exposed to multiple
files internally are prefixed with @samp{_cpp_}, and are to be found in
-the file @samp{cpphash.h}. Functions and types exposed to external
-clients are in @samp{cpplib.h}, and prefixed with @samp{cpp_}. For
+the file @file{cpphash.h}. Functions and types exposed to external
+clients are in @file{cpplib.h}, and prefixed with @samp{cpp_}. For
historical reasons this is no longer quite true, but we should strive to
stick to it.
-We are striving to reduce the information exposed in cpplib.h to the
+We are striving to reduce the information exposed in @file{cpplib.h} to the
bare minimum necessary, and then to keep it there. This makes clear
exactly what external clients are entitled to assume, and allows us to
change internals in the future without worrying whether library clients
@cindex lexer
@cindex tokens
-The lexer is contained in the file @samp{cpplex.c}. We want to have a
+The lexer is contained in the file @file{cpplex.c}. We want to have a
lexer that is single-pass, for efficiency reasons. We would also like
the lexer to only step forwards through the input files, and not step
back. This will make future changes to support different character
the trigraph @samp{??/} to introduce an escaped newline.
Escaped newlines are tedious because theoretically they can occur
-anywhere - between the @samp{+} and @samp{=} of the @samp{+=} token,
+anywhere---between the @samp{+} and @samp{=} of the @samp{+=} token,
within the characters of an identifier, and even between the @samp{*}
and @samp{/} that terminate a comment.  Moreover, you cannot be sure
-there is just one - there might be an arbitrarily long sequence of them.
+there is just one---there might be an arbitrarily long sequence of them.
So the routine @samp{parse_identifier}, which lexes an identifier, cannot
assume that it can scan forwards until the first non-identifier
which returns the first character after any intervening newlines.
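
The following is a minimal sketch of the idea, not the code in cpplib.  The names @code{skip_escaped_newlines_sketch} and @code{scan_identifier_sketch} are hypothetical, and the sketch makes simplifying assumptions: it works on a NUL-terminated buffer, ignores the @samp{??/} trigraph, and never steps backwards.  It shows why the identifier scanner has to check for escaped newlines before every character.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return a pointer to the first character at or
   after P that is not part of a backslash-newline sequence.  Any number
   of consecutive escaped newlines is skipped.  Trigraphs are not
   handled in this sketch.  */
static const char *
skip_escaped_newlines_sketch (const char *p)
{
  while (p[0] == '\\' && p[1] == '\n')
    p += 2;
  return p;
}

/* Hypothetical identifier scanner: copy the identifier starting at SRC
   into OUT (capacity CAP, including the terminating NUL), ignoring any
   escaped newlines that split it.  Returns the number of characters
   stored.  */
static size_t
scan_identifier_sketch (const char *src, char *out, size_t cap)
{
  size_t n = 0;

  if (cap == 0)
    return 0;

  for (;;)
    {
      unsigned char c;

      /* An escaped newline may precede every single character.  */
      src = skip_escaped_newlines_sketch (src);
      c = (unsigned char) *src;
      if (!(isalnum (c) || c == '_') || n + 1 >= cap)
        break;
      out[n++] = (char) c;
      src++;
    }

  out[n] = '\0';
  return n;
}
```

Given the input @samp{fo}, two escaped newlines, then @samp{o+bar}, the scanner stores the three characters of @samp{foo} and stops at the @samp{+}.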
The lexer needs to keep track of the correct column position,
-including counting tabs as specified by the @samp{-ftabstop=} option.
+including counting tabs as specified by the @option{-ftabstop=} option.
This should be done even within comments; C-style comments can appear in
the middle of a line, and we want to report diagnostics in the correct
position for text appearing after the end of the comment.
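
As an illustration of the tab handling, here is a small sketch under stated assumptions: columns are counted from zero, the tab stop width is passed in as a plain parameter, and the function names are hypothetical rather than taken from cpplib.  A tab advances the column to the next multiple of the tab stop; every other character advances it by one, including characters inside a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the zero-based column COL at which
   character C sits, return the column of the character after it.
   TABSTOP is the distance between tab stops.  */
static unsigned int
advance_column_sketch (unsigned int col, char c, unsigned int tabstop)
{
  assert (tabstop > 0);
  if (c == '\t')
    return col + tabstop - (col % tabstop);
  return col + 1;
}

/* Hypothetical helper: return the zero-based column of LINE[OFFSET],
   counting every preceding character, comment text included.  */
static unsigned int
column_of_sketch (const char *line, size_t offset, unsigned int tabstop)
{
  unsigned int col = 0;
  size_t i;

  for (i = 0; i < offset; i++)
    col = advance_column_sketch (col, line[i], tabstop);
  return col;
}
```

With a tab stop of 8, the character following a seven-character comment and one tab is at column 8, which is where a diagnostic for it should point.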
backwards in the input stream. Currently @samp{skip_escaped_newlines}
does step back, though with care it should be possible to adjust it so
that this does not happen. For example, one tricky issue is if we meet
-a trigraph, but the command line option @samp{-trigraphs} is not in
-force but @samp{-Wtrigraphs} is, we need to warn about it but then
+a trigraph, but the command line option @option{-trigraphs} is not in
+force but @option{-Wtrigraphs} is, we need to warn about it but then
buffer it and continue to treat it as 3 separate characters.
@node Whitespace, Hash Nodes, Lexer, Top
argument are all flagged @samp{AVOID_LPASTE} by the macro expander.
If a token flagged in this way does not have a @samp{PREV_WHITE} flag,
-and the routine @var{cpp_avoid_paste} determines that it might be
+and the routine @code{cpp_avoid_paste} determines that it might be
misinterpreted by the lexer if a space is not inserted between it and
the immediately preceding token, then stand-alone CPP's output routines
will insert a space between them. To avoid excessive spacing,
-@var{cpp_avoid_paste} tries hard to only request a space if one is
+@code{cpp_avoid_paste} tries hard to only request a space if one is
likely to be necessary, but for reasons of efficiency it is slightly
conservative and might recommend a space where one is not strictly
needed.
newlines appearing in the macro's arguments are interpreted as a single
space, with the result that the macro's replacement appears in full on
the same line that the macro name appeared in the source file. This is
-particularly important for stringification of arguments - newlines
+particularly important for stringification of arguments---newlines
embedded in the arguments must appear in the string as spaces.
@end itemize
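
The effect on stringified arguments can be sketched as follows.  This is an illustration only: @code{collapse_whitespace_sketch} is a hypothetical name, and the sketch assumes the argument contains no string or character literals, whose internal whitespace would have to be preserved.  Each run of spaces, tabs and newlines becomes a single space, and leading and trailing whitespace is dropped.

```c
#include <stddef.h>

/* Hypothetical helper: copy SRC into DST (capacity CAP, including the
   terminating NUL), replacing each run of spaces, tabs and newlines
   between two other characters with a single space.  Leading and
   trailing whitespace is dropped.  Returns the number of characters
   stored.  */
static size_t
collapse_whitespace_sketch (const char *src, char *dst, size_t cap)
{
  size_t n = 0;
  int pending_space = 0;

  if (cap == 0)
    return 0;

  for (; *src != '\0'; src++)
    {
      if (*src == ' ' || *src == '\t' || *src == '\n')
        {
          /* Remember the whitespace; emit it only if more text follows.  */
          pending_space = 1;
          continue;
        }
      if (pending_space && n > 0 && n + 1 < cap)
        dst[n++] = ' ';
      pending_space = 0;
      if (n + 1 >= cap)
        break;
      dst[n++] = *src;
    }

  dst[n] = '\0';
  return n;
}
```

An argument written as @samp{a +} on one line and @samp{b} on the next therefore yields the text @samp{a + b}.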
-The source file location is maintained in the @var{lineno} member of the
-@var{cpp_buffer} structure, and the column number inferred from the
-current position in the buffer relative to the @var{line_base} buffer
+The source file location is maintained in the @code{lineno} member of the
+@code{cpp_buffer} structure, and the column number inferred from the
+current position in the buffer relative to the @code{line_base} buffer
variable, which is updated with every newline whether escaped or not.
TODO: Finish this.
@cindex assertions
@cindex named operators
-When cpplib encounters an "identifier", it generates a hash code for it
-and stores it in the hash table. By "identifier" we mean tokens with
+When cpplib encounters an ``identifier'', it generates a hash code for it
+and stores it in the hash table. By ``identifier'' we mean tokens with
type @samp{CPP_NAME}; this includes identifiers in the usual C sense, as
well as keywords, directive names, macro names and so on. For example,
-all of "pragma", "int", "foo" and "__GNUC__" are identifiers and hashed
+all of @samp{pragma}, @samp{int}, @samp{foo} and @samp{__GNUC__} are identifiers and hashed
when lexed.
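
To make the idea concrete, here is a minimal sketch of interning identifiers in a hash table.  Everything in it is an assumption made for illustration: the node layout, the fixed table size, the chaining scheme and the multiply-by-33 hash are not cpplib's, and all names ending in @code{_sketch} are hypothetical.  The point it shows is that looking up the same spelling twice yields the same node, so information attached to a node is shared by every occurrence of that identifier.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE_SKETCH 64u

/* Hypothetical node: one per distinct identifier spelling.  */
struct node_sketch
{
  struct node_sketch *next;  /* Next node in the same bucket.  */
  size_t len;                /* Length of NAME, excluding the NUL.  */
  char *name;                /* NUL-terminated copy of the spelling.  */
};

static struct node_sketch *table_sketch[TABLE_SIZE_SKETCH];

/* Illustrative hash only; not the function cpplib uses.  */
static unsigned int
hash_sketch (const char *s, size_t len)
{
  unsigned int h = 5381u;
  size_t i;

  for (i = 0; i < len; i++)
    h = h * 33u + (unsigned char) s[i];
  return h;
}

/* Return the node for the LEN-character identifier S, creating it on
   first sight.  Returns NULL if memory runs out.  */
static struct node_sketch *
lookup_sketch (const char *s, size_t len)
{
  unsigned int idx = hash_sketch (s, len) % TABLE_SIZE_SKETCH;
  struct node_sketch *node;

  for (node = table_sketch[idx]; node != NULL; node = node->next)
    if (node->len == len && memcmp (node->name, s, len) == 0)
      return node;

  node = malloc (sizeof *node);
  if (node == NULL)
    return NULL;
  node->name = malloc (len + 1);
  if (node->name == NULL)
    {
      free (node);
      return NULL;
    }
  memcpy (node->name, s, len);
  node->name[len] = '\0';
  node->len = len;
  node->next = table_sketch[idx];
  table_sketch[idx] = node;
  return node;
}
```

In this sketch, @samp{pragma}, @samp{int}, @samp{foo} and @samp{__GNUC__} would each get exactly one node, however many times they are lexed.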
Each node in the hash table contains various information about the
@item Void
-Everything else falls into this category - an identifier that is not
+Everything else falls into this category---an identifier that is not
currently a macro, or a macro that has since been undefined with
@code{#undef}.
@cindex files
Fairly obviously, the file handling code of cpplib resides in the file
-@samp{cppfiles.c}. It takes care of the details of file searching,
+@file{cppfiles.c}. It takes care of the details of file searching,
opening, reading and caching, for both the main source file and all the
headers it recursively includes.