From ad82183b0e8843ad9c406978474fc41b9d733538 Mon Sep 17 00:00:00 2001
From: Phil Edwards
Date: Wed, 30 Aug 2000 20:18:12 +0000
Subject: [PATCH] codecvt.html: Behind-the-scenes ASCII->HTML tweaks for certain browsers.

2000-08-30 Phil Edwards

 * docs/22_locale/codecvt.html: Behind-the-scenes ASCII->HTML tweaks for certain browsers.

From-SVN: r36067
---
 libstdc++-v3/ChangeLog | 5 ++
 libstdc++-v3/docs/22_locale/codecvt.html | 60 ++++++++++++------------
 2 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 7fce4b7cfce..3debd0e253f 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2000-08-30 Phil Edwards
+
+ * docs/22_locale/codecvt.html: Behind-the-scenes ASCII->HTML
+ tweaks for certain browsers.
+
 2000-08-28 Benjamin Kosnik

 * docs/22_locale/codecvt.html: Add more bits, format.
diff --git a/libstdc++-v3/docs/22_locale/codecvt.html b/libstdc++-v3/docs/22_locale/codecvt.html
index de14677786d..9289d7dd634 100644
--- a/libstdc++-v3/docs/22_locale/codecvt.html
+++ b/libstdc++-v3/docs/22_locale/codecvt.html
@@ -17,7 +17,7 @@ The standard class codecvt attempts to address conversions between
 different character encoding schemes. In particular, the standard attempts to detail conversions between the implementation-defined wide characters (hereafter referred to as wchar_t) and the standard type
-char that is so beloved in classic "C" (which can now be referred to
+char that is so beloved in classic "C" (which can now be referred to
 as narrow characters.) This document attempts to describe how the GNU libstdc++-v3 implementation deals with the conversion between wide and narrow characters, and also presents a framework for dealing with the
@@ -42,7 +42,7 @@ The text around the codecvt definition gives some clues:
--1- The class codecvt is for use when
+-1- The class codecvt<internT,externT,stateT> is for use when
 converting from one codeset to another, such as from wide characters to multibyte characters, between wide character encodings such as Unicode and EUC.
@@ -68,11 +68,11 @@ Ah ha! Another clue...
 -3- The instantiations required in the Table ??
-(lib.locale.category), namely codecvt and
-codecvt, convert the implementation-defined
-native character set. codecvt implements a
+(lib.locale.category), namely codecvt<wchar_t,char,mbstate_t> and
+codecvt<char,char,mbstate_t>, convert the implementation-defined
+native character set. codecvt<char,char,mbstate_t> implements a
 degenerate conversion; it does not convert at
-all. codecvt converts between the native
+all. codecvt<wchar_t,char,mbstate_t> converts between the native
 character sets for tiny and wide characters. Instantiations on mbstate_t perform conversion between encodings known to the library implementor. Other encodings can be converted by specializing on a
@@ -100,7 +100,7 @@ mcsrtombs and wcsrtombs in particular.

 2. Some thoughts on what would be useful
 Probably the most frequently asked question about code conversion is:
-"So dudes, what's the deal with Unicode strings?" The dude part is
+"So dudes, what's the deal with Unicode strings?" The dude part is
 optional, but apparently the usefulness of Unicode strings is pretty widely appreciated. Sadly, this specific encoding (And other useful encodings like UTF8, UCS4, ISO 8859-10, etc etc etc) are not mentioned
@@ -168,7 +168,8 @@ UTF-16, UTF8, UTF16).

 For iconv-based implementations, string literals for each of the
-encodings (ie. "UCS-2" and "UTF-8") are necessary, although for other,
+encodings (ie. "UCS-2" and "UTF-8") are necessary,
+although for other,
 non-iconv implementations a table of enumerated values or some other mechanism may be required.
@@ -178,13 +179,13 @@ mechanism may be required.

  • Some encodings are require explicit endian-ness. As such, some kind of endian marker or other byte-order marker will be necessary. See
-     "Footnotes for C/C++ developers" in Haible for more information on
+     "Footnotes for C/C++ developers" in Haible for more information on
      UCS-2/Unicode endian issues. (Summary: big endian seems most likely, however implementations, most notably Microsoft, vary.)
  • Types representing the conversion state, for conversions involving
-     the machinery in the "C" library, or the conversion descriptor, for
+     the machinery in the "C" library, or the conversion descriptor, for
      conversions using iconv (such as the type iconv_t.) Note that the conversion descriptor encodes more information than a simple encoding state type.
@@ -207,14 +208,14 @@ mechanism may be required.

-3. Problems with "C" code conversions : thread safety, global locales,
- termination.
+3. Problems with "C" code conversions : thread safety, global
+locales, termination.

 In addition, multi-threaded and multi-locale environments also impact the design and requirements for code conversions. In particular, they
-affect the required specialization codecvt
-when implemented using standard "C" functions.
+affect the required specialization codecvt<wchar_t, char, mbstate_t>
+when implemented using standard "C" functions.

 Three problems arise, one big, one of medium importance, and one small.
@@ -233,7 +234,7 @@ incorrect. Yikes!

 The last, and fundamental problem, is the assumption of a global
-locale for all the "C" functions referenced above. For something like
+locale for all the "C" functions referenced above. For something like
 C++ iostreams (where codecvt is explicitly used) the notion of multiple locales is fundamental. In practice, most users may not run into this limitation. However, as a quality of implementation issue,
@@ -243,7 +244,7 @@ correct results. In short, libstdc++-v3 is trying to offer, as an
 option, a high-quality implementation, damn the additional complexity!

-For the required specialization codecvt ,
+For the required specialization codecvt<wchar_t, char, mbstate_t> ,
 conversions are made between the internal character set (always UCS4 on GNU/Linux) and whatever the currently selected locale for the LC_CTYPE category implements.
@@ -256,7 +257,7 @@ The two required specializations are implemented as follows:

-codecvt<char, char, mbstate_t>
+codecvt<char, char, mbstate_t>

 This is a degenerate (ie, does nothing) specialization. Implementing
@@ -264,7 +265,7 @@ this was a piece of cake.

-codecvt<char, wchar_t, mbstate_t>
+codecvt<char, wchar_t, mbstate_t>

 This specialization, by specifying all the template parameters, pretty
@@ -353,7 +354,7 @@ ready to convert and will return true.

-__enc_traits(const __enc_traits&)
+__enc_traits(const __enc_traits&)

 As iconv allocates memory and sets up conversion descriptors, the copy
@@ -363,8 +364,8 @@ themselves.

 Definitions for all the required codecvt member functions are provided
-for this specialization, and usage of codecvt is consistent with other
+for this specialization, and usage of codecvt<internal character type,
+external character type, __enc_traits> is consistent with other
 codecvt usage.

@@ -379,7 +380,7 @@ a. conversions involving string literals
   typedef unicode_t int_type;
   typedef char ext_type;
   typedef __enc_traits enc_type;
-  typedef codecvt unicode_codecvt;
+  typedef codecvt<int_type, ext_type, enc_type> unicode_codecvt;
 
   const ext_type* e_lit = "black pearl jasmine tea";
   int size = strlen(e_lit);
@@ -399,8 +400,8 @@ a. conversions involving string literals
   // construct a locale object with the specialized facet.
   locale loc(locale::classic(), new unicode_codecvt);
   // sanity check the constructed locale has the specialized facet.
-  VERIFY( has_facet(loc) );
-  const unicode_codecvt& cvt = use_facet(loc);
+  VERIFY( has_facet<unicode_codecvt>(loc) );
+  const unicode_codecvt& cvt = use_facet<unicode_codecvt>(loc);
   // convert between const char* and unicode strings
   unicode_codecvt::state_type state01("UNICODE", "ISO_8859-1");
   initialize_state(state01);
@@ -454,7 +455,8 @@ codecvt_wchar_t_char.cc
 standards-conformant manner?

  •
-     how to synchronize the "C" and "C++" conversion information?
+     how to synchronize the "C" and "C++"
+     conversion information?
  • wchar_t/char internal buffers and conversions between
@@ -475,17 +477,17 @@ specialization hints, language clarification, and wchar_t fixes.
 8. Bibliography / Referenced Documents
-Drepper, Ulrich, GNU libc (glibc) 2.2 manual. In particular, Chapters "6. Character Set Handling" and "7 Locales and Internationalization"
+Drepper, Ulrich, GNU libc (glibc) 2.2 manual. In particular, Chapters "6. Character Set Handling" and "7 Locales and Internationalization"

    Drepper, Ulrich, Numerous, late-night email correspondence

-Feather, Clive, "A brief description of Normative Addendum 1," in particular the parts on Extended Character Sets
+Feather, Clive, "A brief description of Normative Addendum 1," in particular the parts on Extended Character Sets
 http://www.lysator.liu.se/c/na1.html

-Haible, Bruno, "The Unicode HOWTO" v0.18, 4 August 2000
+Haible, Bruno, "The Unicode HOWTO" v0.18, 4 August 2000
 ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO.html

@@ -495,7 +497,7 @@ ISO/IEC 14882:1998 Programming languages - C++
 ISO/IEC 9899:1999 Programming languages - C

-Khun, Markus, "UTF-8 and Unicode FAQ for Unix/Linux"
+Khun, Markus, "UTF-8 and Unicode FAQ for Unix/Linux"
 http://www.cl.cam.ac.uk/~mgk25/unicode.html

-- 
2.30.2