* cppinternals.texi: Update for file handling.

author Neil Booth <neil@daikokuya.demon.co.uk>

Mon, 12 Mar 2001 23:53:35 +0000 (23:53 +0000)

committer Neil Booth <neil@gcc.gnu.org>

Mon, 12 Mar 2001 23:53:35 +0000 (23:53 +0000)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 912cd5f3cb1772b25d0079e1d27ee08256dad1ac..2e3fa88eed7bcb85e5ad3ab73510fa15848549b7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2001-03-12  Neil Booth  <neil@daikokuya.demon.co.uk>
+
+       * cppinternals.texi: Update for file handling.
+
  2001-03-12  Jeffrey Oldham  <oldham@codesourcery.com>
  
         * emit-rtl.c (remove_unnecessary_notes): Reverse Richard Kenner's
diff --git a/gcc/cppinternals.texi b/gcc/cppinternals.texi
index 54560b76cef187ac8742f5f25abb0f10d501e8d7..c60a9a8d32bf4ed742657abf74fbbd1b81f19e5d 100644
--- a/gcc/cppinternals.texi
+++ b/gcc/cppinternals.texi
@@ -184,7 +184,7 @@ problem.
  
  Another place where state flags are used to change behaviour is whilst
  parsing header names.  Normally, a @samp{<} would be lexed as a single
-token.  After a @samp{#include} directive, though, it should be lexed
+token.  After a @code{#include} directive, though, it should be lexed
  as a single token as far as the nearest @samp{>} character.  Note that
  we don't allow the terminators of header names to be escaped; the first
  @samp{"} or @samp{>} terminates the header name.
@@ -311,7 +311,7 @@ time, each identifier falls into exactly one of three categories:
  @item Macros
  
  These have been declared to be macros, either on the command line or
-with @samp{#define}.  A few, such as @samp{__TIME__} are builtins
+with @code{#define}.  A few, such as @samp{__TIME__} are builtins
  entered in the hash table during initialisation.  The hash node for a
  normal macro points to a structure with more information about the
  macro, such as whether it is function-like, how many arguments it takes,
@@ -321,7 +321,7 @@ contain an enum indicating which of the various builtin macros it is.
  @item Assertions
  
  Assertions are in a separate namespace to macros.  To enforce this, cpp
-actually prepends a @samp{#} character before hashing and entering it in
+actually prepends a @code{#} character before hashing and entering it in
  the hash table.  An assertion's node points to a chain of answers to
  that assertion.
  
@@ -329,21 +329,21 @@ that assertion.
  
  Everything else falls into this category - an identifier that is not
  currently a macro, or a macro that has since been undefined with
-@samp{#undef}.
+@code{#undef}.
  
  When preprocessing C++, this category also includes the named operators,
  such as @samp{xor}.  In expressions these behave like the operators they
  represent, but in contexts where the spelling of a token matters they
  are spelt differently.  This spelling distinction is relevant when they
-are operands of the stringizing and pasting macro operators @samp{#} and
-@samp{##}.  Named operator hash nodes are flagged, both to catch the
+are operands of the stringizing and pasting macro operators @code{#} and
+@code{##}.  Named operator hash nodes are flagged, both to catch the
  spelling distinction and to prevent them from being defined as macros.
  @end itemize
  
  The same identifiers share the same hash node.  Since each identifier
  token, after lexing, contains a pointer to its hash node, this is used
  to provide rapid lookup of various information.  For example, when
-parsing a @samp{#define} statement, CPP flags each argument's identifier
+parsing a @code{#define} statement, CPP flags each argument's identifier
  hash node with the index of that argument.  This makes duplicated
  argument checking an O(1) operation for each argument.  Similarly, for
  each identifier in the macro's expansion, lookup to see if it is an
@@ -353,11 +353,74 @@ enum stored in its hash node, so that directive lookup is also O(1).
  
  @node Macro Expansion, Files, Hash Nodes, Top
  @unnumbered Macro Expansion Algorithm
-@printindex cp
  
  @node Files, Index, Macro Expansion, Top
  @unnumbered File Handling
-@printindex cp
+@cindex files
+
+Fairly obviously, the file handling code of cpplib resides in the file
+@samp{cppfiles.c}.  It takes care of the details of file searching,
+opening, reading and caching, for both the main source file and all the
+headers it recursively includes.
+
+The basic strategy is to minimize the number of system calls.  On many
+systems, the basic @code{open ()} and @code{fstat ()} system calls can
+be quite expensive.  For every @code{#include}-d file, we need to try
+all the directories in the search path until we find a match.  Some
+projects, such as glibc, pass twenty or thirty include paths on the
+command line, so this can rapidly become time-consuming.
+
+For a header file we have not encountered before we have little choice
+but to do this.  However, it is often the case that the same headers are
+repeatedly included, and in these cases we try to avoid repeating the
+filesystem queries whilst searching for the correct file.
+
+For each file we try to open, we store the constructed path in a splay
+tree.  This path first undergoes simplification by the function
+@code{_cpp_simplify_pathname}.  For example,
+@samp{/usr/include/bits/../foo.h} is simplified to
+@samp{/usr/include/foo.h} before we enter it in the splay tree and try
+to @code{open ()} the file.  CPP will then find subsequent uses of
+@samp{foo.h}, even as @samp{/usr/include/foo.h}, in the splay tree and
+save system calls.
+
+Further, it is likely the file contents have also been cached, saving a
+@code{read ()} system call.  We don't bother caching the contents of
+header files that are re-inclusion protected, and whose re-inclusion
+macro is defined when we leave the header file for the first time.  If
+the host supports it, we try to map suitably large files into memory,
+rather than reading them in directly.
+
+The include paths are internally stored on a null-terminated
+singly-linked list, starting with the @code{"header.h"} directory search
+chain, which then links into the @code{<header.h>} directory chain.
+
+Files included with the @code{<foo.h>} syntax start the lookup directly
+in the second half of this chain.  However, files included with the
+@code{"foo.h"} syntax start at the beginning of the chain, but with one
+extra directory prepended.  This is the directory of the current file;
+the one containing the @code{#include} directive.  Prepending this
+directory on a per-file basis is handled by the function
+@code{search_from}.
+
+Note that a header included with a directory component, such as
+@code{#include "mydir/foo.h"} and opened as
+@samp{/usr/local/include/mydir/foo.h}, will have the complete path minus
+the basename @samp{foo.h} as the current directory.
+
+Enough information is stored in the splay tree that CPP can immediately
+tell whether it can skip the header file because of the multiple include
+optimisation, whether the file didn't exist or couldn't be opened for
+some reason, or whether the header was flagged not to be re-used, as it
+is with the obsolete @code{#import} directive.
+
+For the benefit of MS-DOS filesystems with an 8.3 filename limitation,
+CPP offers the ability to treat various include file names as aliases
+for the real header files with shorter names.  The map from one to the
+other is found in a special file called @samp{header.gcc}, stored in the
+command line (or system) include directories to which the mapping
+applies.  This may be higher up the directory tree than the full path to
+the file minus the base name.
  
  @node Index,, Files, Top
  @unnumbered Index
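
Note on the search order described in the new File Handling text: the C sketch below illustrates how the two directory chains and the per-file current directory could fit together. It is not the code in cppfiles.c. The type struct search_dir and the helpers search_start and dir_of are hypothetical names invented for this illustration; the patch itself names only search_from and _cpp_simplify_pathname.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list node: one include directory.  The "header.h"
   chain is assumed to link into the <header.h> chain, and the whole
   list ends with a NULL next pointer.  */
struct search_dir
{
  const char *name;
  struct search_dir *next;
};

/* Copy the directory part of PATH (everything before the last '/')
   into BUF, truncating to SIZE - 1 characters.  A path with no
   directory component yields an empty string; a file directly under
   the root yields "/".  */
static void
dir_of (const char *path, char *buf, size_t size)
{
  const char *slash = strrchr (path, '/');
  size_t len = slash ? (size_t) (slash - path) : 0;

  assert (size > 0);
  if (slash == path)
    len = 1;			/* Keep the root directory itself.  */
  if (len >= size)
    len = size - 1;
  memcpy (buf, path, len);
  buf[len] = '\0';
}

/* Return the first directory to try for an include.  Angle-bracket
   includes start at BRACKET_CHAIN.  Quoted includes start at CUR_DIR,
   the directory of the including file, which is linked in front of
   QUOTE_CHAIN for this lookup.  */
static struct search_dir *
search_start (struct search_dir *quote_chain,
	      struct search_dir *bracket_chain,
	      struct search_dir *cur_dir, int angle_brackets)
{
  if (angle_brackets)
    return bracket_chain;

  assert (cur_dir != NULL);
  cur_dir->next = quote_chain;
  return cur_dir;
}
```

Under these assumptions, with a quote chain of include followed by /usr/include, an #include "foo.h" appearing inside /usr/local/include/mydir/foo.h would try /usr/local/include/mydir first, then include, then /usr/include, whereas #include <foo.h> would try only /usr/include.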