From e9ae2d45ea1658dcfc254ec04ed22670f909b78b Mon Sep 17 00:00:00 2001 From: Nathan Sidwell Date: Mon, 14 Dec 2020 13:15:17 -0800 Subject: [PATCH] doc: Document C++ 20 modules And here is the user-facing documentation. gcc/ * doc/cppopts.texi: Document new cpp opt. * doc/invoke.texi: Add C++20 module option & documentation. --- gcc/doc/cppopts.texi | 4 + gcc/doc/invoke.texi | 435 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 434 insertions(+), 5 deletions(-) diff --git a/gcc/doc/cppopts.texi b/gcc/doc/cppopts.texi index 7f1849d841f..e5ece92487b 100644 --- a/gcc/doc/cppopts.texi +++ b/gcc/doc/cppopts.texi @@ -139,6 +139,10 @@ this useless. This feature is used in automatic updating of makefiles. +@item -Mno-modules +@opindex Mno-modules +Disable dependency generation for compiled module interfaces. + @item -MP @opindex MP This option instructs CPP to add a phony target for each dependency diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index b06ebbad847..2cebe7ab319 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -172,6 +172,7 @@ listing and explanation of the binary and decimal byte size prefixes. * Spec Files:: How to pass switches to sub-processes. * Environment Variables:: Env vars that affect GCC. * Precompiled Headers:: Compiling a header once, and using it many times. +* C++ Modules:: Experimental C++20 module system. @end menu @c man begin OPTIONS @@ -219,7 +220,13 @@ in the following sections. 
-fno-gnu-keywords @gol -fno-implicit-templates @gol -fno-implicit-inline-templates @gol --fno-implement-inlines -fms-extensions @gol +-fno-implement-inlines @gol +-fmodule-header@r{[}=@var{kind}@r{]} -fmodule-only -fmodules-ts @gol +-fmodule-implicit-inline @gol +-fno-module-lazy @gol +-fmodule-mapper=@var{specification} @gol +-fmodule-version-ignore @gol +-fms-extensions @gol -fnew-inheriting-ctors @gol -fnew-ttp-matching @gol -fno-nonansi-builtins -fnothrow-opt -fno-operator-names @gol @@ -233,15 +240,18 @@ in the following sections. -fvisibility-inlines-hidden @gol -fvisibility-ms-compat @gol -fext-numeric-literals @gol +-flang-info-include-translate@r{[}=@var{name}@r{]} @gol +-flang-info-include-translate-not @gol -Wabi-tag -Wcatch-value -Wcatch-value=@var{n} @gol -Wno-class-conversion -Wclass-memaccess @gol -Wcomma-subscript -Wconditionally-supported @gol -Wno-conversion-null -Wctad-maybe-unsupported @gol -Wctor-dtor-privacy -Wno-delete-incomplete @gol --Wdelete-non-virtual-dtor -Wdeprecated-copy -Wdeprecated-copy-dtor @gol +-Wdelete-non-virtual-dtor -Wdeprecated-copy -Wdeprecated-copy-dtor @gol -Wno-deprecated-enum-enum-conversion -Wno-deprecated-enum-float-conversion @gol -Weffc++ -Wno-exceptions -Wextra-semi -Wno-inaccessible-base @gol -Wno-inherited-variadic-ctor -Wno-init-list-lifetime @gol +-Winvalid-imported-macros @gol -Wno-invalid-offsetof -Wno-literal-suffix @gol -Wno-mismatched-new-delete -Wmismatched-tags @gol -Wmultiple-inheritance -Wnamespaces -Wnarrowing @gol @@ -600,7 +610,7 @@ Objective-C and Objective-C++ Dialects}. -fpreprocessed -ftabstop=@var{width} -ftrack-macro-expansion @gol -fwide-exec-charset=@var{charset} -fworking-directory @gol -H -imacros @var{file} -include @var{file} @gol --M -MD -MF -MG -MM -MMD -MP -MQ -MT @gol +-M -MD -MF -MG -MM -MMD -MP -MQ -MT -Mno-modules @gol -no-integrated-cpp -P -pthread -remap @gol -traditional -traditional-cpp -trigraphs @gol -U@var{macro} -undef @gol @@ -1572,7 +1582,7 @@ name suffix). 
This option applies to all following input files until the next @option{-x} option. Possible values for @var{language} are: @smallexample c c-header cpp-output -c++ c++-header c++-cpp-output +c++ c++-header c++-system-header c++-user-header c++-cpp-output objective-c objective-c-header objective-c-cpp-output objective-c++ objective-c++-header objective-c++-cpp-output assembler assembler-with-cpp @@ -3057,6 +3067,52 @@ To save space, do not emit out-of-line copies of inline functions controlled by @code{#pragma implementation}. This causes linker errors if these functions are not inlined everywhere they are called. +@item -fmodules-ts +@itemx -fno-modules-ts +@opindex fmodules-ts +@opindex fno-modules-ts +Enable support for C++20 modules (@pxref{C++ Modules}). The +@option{-fno-modules-ts} option is usually not needed, as that is the +default. Even though this is a C++20 feature, it is not currently +implicitly enabled by selecting that standard version. + +@item -fmodule-header +@itemx -fmodule-header=user +@itemx -fmodule-header=system +@opindex fmodule-header +Compile a header file to create an importable header unit. + +@item -fmodule-implicit-inline +@opindex fmodule-implicit-inline +Member functions defined in their class definitions are not implicitly +inline for modular code. This differs from traditional C++ +behavior, for good reasons. However, it may cause difficulty +during code porting. This option makes such function definitions +implicitly inline. It does however generate an ABI incompatibility, +so you must use it everywhere or nowhere. (Such definitions outside +of a named module remain implicitly inline, regardless.) + +@item -fno-module-lazy +@opindex fno-module-lazy +@opindex fmodule-lazy +Disable lazy module importing and module mapper creation. 
+ +@item -fmodule-mapper=@r{[}@var{hostname}@r{]}:@var{port}@r{[}?@var{ident}@r{]} +@itemx -fmodule-mapper=|@var{program}@r{[}?@var{ident}@r{]} @var{args...} +@itemx -fmodule-mapper==@var{socket}@r{[}?@var{ident}@r{]} +@itemx -fmodule-mapper=<>@r{[}@var{inout}@r{]}@r{[}?@var{ident}@r{]} +@itemx -fmodule-mapper=<@var{in}>@var{out}@r{[}?@var{ident}@r{]} +@itemx -fmodule-mapper=@var{file}@r{[}?@var{ident}@r{]} +@vindex CXX_MODULE_MAPPER @r{environment variable} +@opindex fmodule-mapper +An oracle to query for module name to filename mappings. If +unspecified the @env{CXX_MODULE_MAPPER} environment variable is used, +and if that is unset, an in-process default is provided. + +@item -fmodule-only +@opindex fmodule-only +Only emit the Compiled Module Interface, inhibiting any object file. + @item -fms-extensions @opindex fms-extensions Disable Wpedantic warnings about constructs used in MFC, such as implicit @@ -3304,6 +3360,14 @@ for ISO C++11 onwards (@option{-std=c++11}, ...). Do not search for header files in the standard directories specific to C++, but do still search the other standard directories. (This option is used when building the C++ library.) + +@item -flang-info-include-translate +@itemx -flang-info-include-translate-not +@itemx -flang-info-include-translate=@var{header} +@opindex flang-info-include-translate +@opindex flang-info-include-translate-not +Diagnose include translation events. + @end table In addition, these warning options have meanings only for C++ programs: @@ -3461,6 +3525,14 @@ the variable declaration statement. @end itemize +@item -Winvalid-imported-macros +@opindex Winvalid-imported-macros +@opindex Wno-invalid-imported-macros +Verify all imported macro definitions are valid at the end of +compilation. This is not enabled by default, as it requires +additional processing to determine. It may be useful when preparing +sets of header-units to ensure consistent macros. 
+ @item -Wno-literal-suffix @r{(C++ and Objective-C++ only)} @opindex Wliteral-suffix @opindex Wno-literal-suffix @@ -16966,6 +17038,11 @@ By default, the dump will contain messages about successful optimizations (equivalent to @option{-optimized}) together with low-level details about the analysis. +@item -fdump-lang +@opindex fdump-lang +Dump language-specific information. The file name is made by appending +@file{.lang} to the source file name. + @item -fdump-lang-all @itemx -fdump-lang-@var{switch} @itemx -fdump-lang-@var{switch}-@var{options} @@ -16986,6 +17063,14 @@ Enable all language-specific dumps. Dump class hierarchy information. Virtual table information is emitted unless '@option{slim}' is specified. This option is applicable to C++ only. +@item module +Dump module information. Options @option{lineno} (locations), +@option{graph} (reachability), @option{blocks} (clusters), +@option{uid} (serialization), @option{alias} (mergeable), +@option{asmname} (Elrond), @option{eh} (mapper) & @option{vops} +(macros) may provide additional information. This option is +applicable to C++ only. + @item raw Dump the raw internal tree data. This option is applicable to C++ only. @@ -32188,7 +32273,7 @@ usage: @item @code{sanitize} The @code{sanitize} spec function takes no arguments. It returns non-NULL if -any address, thread or undefined behaviour sanitizers are active. +any address, thread or undefined behavior sanitizers are active. @smallexample %@{%:sanitize(address):-funwind-tables@} @@ -32748,3 +32833,343 @@ precompiled header, the actual behavior is a mixture of the behavior for the options. For instance, if you use @option{-g} to generate the precompiled header but not when using it, you may or may not get debugging information for routines in the precompiled header. + +@node C++ Modules +@section C++ Modules +@cindex speed of compilation + +Modules are a C++ 20 language feature. 
As the name suggests, they +provide a modular compilation system, intended to deliver both +faster builds and better library isolation. The ``Merging Modules'' +paper, @uref{https://wg21.link/p1103}, provides the easiest to read set +of changes to the standard, although it does not capture later +changes. That specification is now part of C++20, +@uref{git@@github.com:cplusplus/draft.git}, and is considered complete +(although there may be defect reports to come). + @emph{G++'s modules support is not complete.} Other than bugs, the +known missing pieces are: + @table @emph + @item Private Module Fragment +The Private Module Fragment is recognized, but an error is emitted. + @item Partition definition visibility rules +Entities may be defined in implementation partitions, and those +definitions are not available outside of the module. This restriction is not +implemented, so the definitions are available to extra-module use. + @item Textual merging of reachable GM entities +Entities may be multiply defined across different header-units. +These must be de-duplicated, and this is implemented across imports, +or when an import redefines a textually-defined entity. However, the +reverse is not implemented---textually redefining an entity that has +been defined in an imported header-unit. A redefinition error is +emitted. + @item Translation-Unit local referencing rules +Papers p1815 (@uref{https://wg21.link/p1815}) and p2003 +(@uref{https://wg21.link/p2003}) add limitations on which entities an +exported region may reference (for instance, the entities an exported +template definition may reference). These are not fully implemented. + @item Language-linkage module attachment +Declarations with explicit language linkage (@code{extern "C"} or +@code{extern "C++"}) are attached to the global module, even when in +the purview of a named module. This is not implemented. Such +declarations are currently attached to the module, if any, in which they are +declared. 
+ +@end table + +Modular compilation is @emph{not} enabled with just the +@option{-std=c++20} option. You must explicitly enable it with the +@option{-fmodules-ts} option. It is independent of the language +version selected, although in pre-C++20 versions, it is of course an +extension. + +No new source file suffixes are required or supported. If you wish to +use a non-standard suffix (@xref{Overall Options}), you also need +to provide a @option{-x c++} option too.@footnote{Some users like to +distinguish module interface files with a new suffix, such as naming +the source @code{module.cppm}, which involves +teaching all tools about the new suffix. A different scheme, such as +naming @code{module-m.cpp} would be less invasive.} + +Compiling a module interface unit produces an additional output (to +the assembly or object file), called a Compiled Module Interface +(CMI). This encodes the exported declarations of the module. +Importing a module reads in the CMI. The import graph is a Directed +Acyclic Graph (DAG). You must build imports before the importer. + +Header files may themselves be compiled to header units, which are a +transitional ability aiming at faster compilation. The +@option{-fmodule-header} option is used to enable this, and implies +the @option{-fmodules-ts} option. These CMIs are named by the fully +resolved underlying header file, and thus may be a complete pathname +containing subdirectories. If the header file is found at an absolute +pathname, the CMI location is still relative to a CMI root directory. + +As header files often have no suffix, you commonly have to specify a +@option{-x} option to tell the compiler the source is a header file. +You may use @option{-x c++-header}, @option{-x c++-user-header} or +@option{-x c++-system-header}. When used in conjunction with +@option{-fmodules-ts}, these all imply an appropriate +@option{-fmodule-header} option. 
The latter two variants use the +user or system include path to search for the file specified. This +allows you to, for instance, compile standard library header files as +header units, without needing to know exactly where they are +installed. Specifying the language as one of these variants also +inhibits output of the object file, as header files have no associated +object file. + The @option{-fmodule-only} option disables generation of the +associated object file when compiling a module interface. Only the CMI +is generated. This option is implied when using the +@option{-fmodule-header} option. + The @option{-flang-info-include-translate} and +@option{-flang-info-include-translate-not} options note whether +include translation occurs or not. With no argument, the first will +note all include translation. The second will note all +non-translations of include files not known to intentionally be +textual. With an argument, queries about include translation of a +header file with that particular trailing pathname are noted. You +may repeat this form to cover several different header files. This +option may be helpful in determining whether include translation is +happening---when it works correctly, compilation behaves as if it were not +there at all. + The @option{-Winvalid-imported-macros} option causes all imported macros +to be resolved at the end of compilation. Without this, imported +macros are only resolved when expanded or (re)defined. This option +detects conflicting import definitions for all macros. + @xref{C++ Module Mapper}, for details of the @option{-fmodule-mapper} +family of options. 
+ @menu +* C++ Module Mapper:: Module Mapper +* C++ Module Preprocessing:: Module Preprocessing +* C++ Compiled Module Interface:: Compiled Module Interface +@end menu + +@node C++ Module Mapper +@subsection Module Mapper +@cindex C++ Module Mapper + +A module mapper is a server or file that the compiler queries to +determine the mapping between module names and CMI files. It is also +used to build CMIs on demand. @emph{Mapper functionality is in its +infancy and is intended for experimentation with build system +interactions.} + +You can specify a mapper with the @option{-fmodule-mapper=@var{val}} +option or @env{CXX_MODULE_MAPPER} environment variable. The value may +have one of the following forms: + +@table @gcctabopt + +@item @r{[}@var{hostname}@r{]}:@var{port}@r{[}?@var{ident}@r{]} +An optional hostname and a numeric port number to connect to. If the +hostname is omitted, the loopback address is used. If the hostname +corresponds to multiple IPv6 addresses, these are tried in turn, until +one is successful. If your host lacks IPv6, this form is +non-functional. If you must use IPv4, use +@option{-fmodule-mapper='|ncat @var{ipv4host} @var{port}'}. + +@item =@var{socket}@r{[}?@var{ident}@r{]} +A local domain socket. If your host lacks local domain sockets, this +form is non-functional. + +@item |@var{program}@r{[}?@var{ident}@r{]} @r{[}@var{args...}@r{]} +A program to spawn, and communicate with on its stdin/stdout streams. +Your @env{PATH} environment variable is searched for the program. +Arguments are separated by space characters (it is not possible for +one of the arguments delivered to the program to contain a space). An +exception is if @var{program} begins with @@. In that case +@var{program} (sans @@) is looked for in the compiler's internal +binary directory. Thus the sample mapper-server can be specified +with @code{@@g++-mapper-server}. 
+ @item <>@r{[}?@var{ident}@r{]} +@itemx <>@var{inout}@r{[}?@var{ident}@r{]} +@itemx <@var{in}>@var{out}@r{[}?@var{ident}@r{]} +Named pipes or file descriptors to communicate over. The first form, +@option{<>}, communicates over stdin and stdout. The other forms +allow you to specify a file descriptor or name a pipe. A numeric value +is interpreted as a file descriptor, otherwise a named pipe is opened. +The second form specifies a bidirectional pipe and the last form +allows specifying two independent pipes. Using file descriptors +directly in this manner is fragile in general, as it can require the +cooperation of intermediate processes. In particular, using stdin and +stdout is fraught with danger as other compiler options might also +cause the compiler to read stdin or write stdout, and it can have +unfortunate interactions with signal delivery from the terminal. + +@item @var{file}@r{[}?@var{ident}@r{]} +A mapping file consisting of space-separated module-name, filename +pairs, one per line. Only the mappings for the direct imports and any +module export name need be provided. If other mappings are provided, +they override those stored in any imported CMI files. A repository +root may be specified in the mapping file by using @samp{$root} as the +module name in the first active line. + +@end table + +As shown, an optional @var{ident} may suffix the first word of the +option, indicated by a @samp{?} prefix. The value is used in the +initial handshake with the module server, or to specify a prefix on +mapping file lines. In the server case, the main source file name is +used if no @var{ident} is specified. In the file case, all non-blank +lines are significant, unless a value is specified, in which case only +lines beginning with @var{ident} are significant. The @var{ident} +must be separated by whitespace from the module name. Be aware that +@samp{<}, @samp{>}, @samp{?}, and @samp{|} characters are often +significant to the shell, and therefore may need quoting. 
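As an illustration of the mapping-file rules above, the following is a minimal Python sketch, not GCC's actual parser. The function name @code{parse_mapping_file} is hypothetical, and the handling of malformed lines and of a @samp{$root} entry that is not on the first active line is an assumption made for this example; the text above does not specify either case.

```python
def parse_mapping_file(text, ident=None):
    """Parse mapping-file text into (root, {module_name: cmi_filename}).

    Follows only the rules stated in the documentation text; the real
    parser may differ, for example in how it reports errors.
    """
    root = None
    mapping = {}
    first_active = True
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue  # blank lines are never significant
        if ident is not None:
            # With an ident, only lines whose first word is that ident
            # are significant; the ident is then stripped.
            if words[0] != ident:
                continue
            words = words[1:]
        if len(words) != 2:
            continue  # assumption: silently skip malformed lines
        name, filename = words
        if name == "$root":
            if first_active:
                root = filename
            # assumption: a "$root" entry on a later line is ignored
        else:
            mapping[name] = filename
        first_active = False
    return root, mapping


if __name__ == "__main__":
    example = "$root gcm.cache\nfoo foo.gcm\n\nfoo:part1 foo-part1.gcm\n"
    print(parse_mapping_file(example))
```

Passing @code{ident} mimics the @samp{?@var{ident}} suffix: one file can then hold mappings for several compilations, each selecting only its own prefixed lines.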
+ The mapper is connected to or loaded lazily, when the first module +mapping is required. The networking protocols are only supported on +hosts that provide networking. If no mapper is specified, a default is +provided. + A project-specific mapper is expected to be provided by the build +system that invokes the compiler. It is not expected that a +general-purpose server is provided for all compilations. As such, the +server will know the build configuration, the compiler it invoked, and +the environment (such as working directory) in which that is +operating. As it may parallelize builds, several compilations may +connect to the same socket. + The default mapper generates CMI files in a @samp{gcm.cache} +directory. CMI files have a @samp{.gcm} suffix. The module unit name +is used directly to provide the basename. Header units construct a +relative path using the underlying header file name. If the path is +already relative, a @samp{,} directory is prepended. Internal +@samp{..} components are translated to @samp{,,}. No attempt is made +to canonicalize these filenames beyond that done by the preprocessor's +include search algorithm, as in general it is ambiguous when symbolic +links are present. + The mapper protocol was published as ``A Module Mapper'', +@uref{https://wg21.link/p1184}. The implementation is provided by +@command{libcody}, @uref{https://www.github.com/urnathan/libcody}, +which specifies the canonical protocol definition. A proof of concept +server implementation embedded in @command{make} was described in +``Make Me A Module'', @uref{https://wg21.link/p1602}. + @node C++ Module Preprocessing +@subsection Module Preprocessing +@cindex C++ Module Preprocessing + Modules affect preprocessing because of header units and include +translation. Some uses of the preprocessor as a separate step either +do not produce a correct output, or require CMIs to be available. + Header units import macros. 
These macros can affect later conditional +inclusion, which therefore can cascade to differing import sets. When +preprocessing, it is necessary to load the CMI. If a header unit is +unavailable, the preprocessor issues a warning and continues (when +not just preprocessing, an error is emitted). Detecting such imports +requires preprocessor tokenization of the input stream to phase 4 +(macro expansion). + Include translation converts @code{#include}, @code{#include_next} and +@code{#import} directives to internal @code{import} declarations. +Whether a particular directive is translated is controlled by the +module mapper. Header unit names are canonicalized during +preprocessing. + Dependency information can be emitted for macro import, extending the +functionality of the @option{-MD} and @option{-MMD} options. Detection of +import declarations also requires phase 4 preprocessing, and thus +requires full preprocessing (or compilation). + The @option{-M}, @option{-MM} and @option{-E -fdirectives-only} options halt +preprocessing before phase 4. + The @option{-save-temps} option uses @option{-fdirectives-only} for +preprocessing, and preserves the macro definitions in the preprocessed +output. Usually you also want to use this option when explicitly +preprocessing a header-unit, or consuming such preprocessed output: + @smallexample +g++ -fmodules-ts -E -fdirectives-only my-header.hh -o my-header.ii +g++ -x c++-header -fmodules-ts -fpreprocessed -fdirectives-only my-header.ii +@end smallexample + @node C++ Compiled Module Interface +@subsection Compiled Module Interface +@cindex C++ Compiled Module Interface + CMIs are an additional artifact when compiling named module +interfaces, partitions or header units. These are read when +importing. CMI contents are implementation-specific, and in GCC's +case tied to the compiler version. Consider them a rebuildable cache +artifact, not a distributable object. 
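The default mapper's CMI naming scheme, described under Module Mapper, can be approximated with the short Python sketch below. It is an illustration only, not the compiler's code. The function name @code{default_cmi_path} and its parameters are hypothetical, and the rule that a partition's @samp{:} becomes @samp{-} is inferred from the sample @code{.gnu.c++.README} output in this section (@samp{foo:part1} stored as @file{foo-part1.gcm}) rather than stated as a rule.

```python
import posixpath


def default_cmi_path(name, is_header_unit=False, repo="gcm.cache"):
    """Approximate where the default mapper would place a CMI.

    `name` is a module name (such as "foo" or "foo:part1") or, for a
    header unit, the resolved header file name.
    """
    if is_header_unit:
        if posixpath.isabs(name):
            # An absolute header path is still placed under the CMI root.
            rel = name.lstrip("/")
        else:
            # A relative header path gets a "," directory prepended.
            rel = name[2:] if name.startswith("./") else name
            rel = posixpath.join(",", rel)
        # Internal ".." components are translated to ",,".
        rel = "/".join(",," if part == ".." else part
                       for part in rel.split("/"))
    else:
        # Inferred from the sample README output: "foo:part1" is stored
        # as "foo-part1.gcm".
        rel = name.replace(":", "-")
    return posixpath.join(repo, rel + ".gcm")


if __name__ == "__main__":
    print(default_cmi_path("foo"))
    print(default_cmi_path("foo:part1"))
    print(default_cmi_path("./my-header.hh", is_header_unit=True))
    print(default_cmi_path("/usr/include/stdio.h", is_header_unit=True))
```

A build system could use a helper of this kind to predict output names for dependency tracking, although asking the mapper remains the authoritative way to learn a CMI's location.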
+ When creating an output CMI, any missing directory components are +created in a manner that is safe for concurrent builds creating +multiple different CMIs within a common subdirectory tree. + CMI contents are written to a temporary file, which is then atomically +renamed. Observers either see old contents (if there is an +existing file), or complete new contents. They do not observe the +CMI during its creation. This is unlike object file writing, which +may be observed by an external process. + CMIs are read in lazily, if the host OS provides @code{mmap} +functionality. Generally blocks are read when name lookup or template +instantiation occurs. To inhibit this, the @option{-fno-module-lazy} +option may be used. + The @option{--param lazy-modules=@var{n}} parameter controls the limit +on the number of concurrently open module files during lazy loading. +Should more modules be imported, an LRU algorithm is used to determine +which files to close---until that file is needed again. This limit +may be exceeded with deep module dependency hierarchies. With large +code bases there may be more imports than the process limit of file +descriptors. By default, the limit is a few fewer than the per-process +file descriptor hard limit, if that is determinable.@footnote{Where +applicable the soft limit is incremented as needed towards the hard limit.} + GCC CMIs use ELF32 as an architecture-neutral encapsulation mechanism. +You may use @command{readelf} to inspect them, although section +contents are largely undecipherable. There is a section named +@code{.gnu.c++.README}, which contains human-readable text. Other +than the first line, each line consists of @code{@var{tag}: @var{value}} +tuples. 
+ +@smallexample +> @command{readelf -p.gnu.c++.README gcm.cache/foo.gcm} + +String dump of section '.gnu.c++.README': + [ 0] GNU C++ primary module interface + [ 21] compiler: 11.0.0 20201116 (experimental) [c++-modules revision 20201116-0454] + [ 6f] version: 2020/11/16-04:54 + [ 89] module: foo + [ 95] source: c_b.ii + [ a4] dialect: C++20/coroutines + [ be] cwd: /data/users/nathans/modules/obj/x86_64/gcc + [ ee] repository: gcm.cache + [ 104] buildtime: 2020/11/16 15:03:21 UTC + [ 127] localtime: 2020/11/16 07:03:21 PST + [ 14a] export: foo:part1 foo-part1.gcm +@end smallexample + +Amongst other things, this lists the source that was built, C++ +dialect used and imports of the module.@footnote{The precise contents +of this output may change.} The timestamp is the same value as that +provided by the @code{__DATE__} & @code{__TIME__} macros, and may be +explicitly specified with the environment variable +@code{SOURCE_DATE_EPOCH}. @xref{Environment Variables} for further +details. + +A set of related CMIs may be copied, provided the relative pathnames +are preserved. + +The @code{.gnu.c++.README} contents do not affect CMI integrity, and +it may be removed or altered. The section numbering of the sections +whose names do not begin with @code{.gnu.c++.}, or are not the string +section is significant and must not be altered. -- 2.30.2