From: Benjamin Kosnik
Date: Fri, 25 Aug 2000 08:52:56 +0000 (+0000)
Subject: howto.html: Add notes on codecvt implementation.
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=e403cf292201bc8a8471c7a841edc3f50ec76bd6;p=gcc.git

howto.html: Add notes on codecvt implementation.

2000-08-24 Benjamin Kosnik

	* docs/22_locale/howto.html: Add notes on codecvt implementation.
	* docs/22_locale/codecvt.html: New file. In progress.

From-SVN: r35975
---

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index bd61c592ffd..0ea096b1f08 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2000-08-24 Benjamin Kosnik
+
+	* docs/22_locale/howto.html: Add notes on codecvt implementation.
+	* docs/22_locale/codecvt.html: New file. In progress.
+
 2000-08-24 Benjamin Kosnik

 	* acconfig.h: Revert.
diff --git a/libstdc++-v3/docs/22_locale/codecvt.html b/libstdc++-v3/docs/22_locale/codecvt.html
new file mode 100644
index 00000000000..58895df19e8
--- /dev/null
+++ b/libstdc++-v3/docs/22_locale/codecvt.html
@@ -0,0 +1,112 @@
+
+AbiWord Document
+
+

Notes on the codecvt implementation.

+

prepared by Benjamin Kosnik (bkoz@redhat.com) on August 25, 2000

+

+

+

1. Abstract

+

Around page 425 of the C++ Standard, this charming heading comes into view:

+

+

22.2.1.5 - Template class codecvt [lib.locale.codecvt]

+

+

The standard class codecvt attempts to address conversions between different character encoding schemes. In particular, the standard attempts to detail conversions between the implementation-defined wide characters (hereafter referred to as wchar_t) and the standard type char that is so beloved in classic "C" (now referred to as narrow characters).

+

This document attempts to describe how the GNU libstdc++-v3 implementation deals with the conversion between wide and narrow characters, and also presents a framework for dealing with the huge number of other encodings that iconv can convert, including Unicode and UTF8. Design issues and requirements are addressed, and examples of correct usage for both the required specializations for wide and narrow characters and the implementation-provided extended functionality are given.

+

+

2. Intro: what the standard says

+

+

2. Some thoughts on what would be useful

+

+

Probably the most frequently asked question about code conversion is: "So dudes, what's the deal with Unicode strings?" The dude part is optional, but apparently the usefulness of Unicode strings is pretty widely appreciated. Sadly, this specific encoding (and other useful encodings like UTF8, UCS4, ISO 8859-10, etc etc etc) is not mentioned in the C++ standard.

+

+

In particular, the simple implementation detail of wchar_t's size seems to repeatedly confound people. Many systems use a two-byte, unsigned integral type to represent wide characters, and use an internal encoding of Unicode or UCS2. (See AIX, Microsoft NT, Java, others.) Other systems use a four-byte integral type to represent wide characters, and use an internal encoding of UCS4. (GNU/Linux systems using glibc, in particular.) The C programming language (and thus C++) does not specify a size for the type wchar_t.

+

+

Thus, portable C++ code cannot assume a byte size (or endianness) either.
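Rather than assuming a width, code can ask the implementation. The following is a minimal sketch, not part of libstdc++; the helper names wide_char_bits and wchar_holds_code_point are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cwchar>    // WCHAR_MAX

// Width of wchar_t, in bits, on the platform compiling this code.
// 16 and 32 are both common answers.
std::size_t wide_char_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if a single wchar_t can represent the given code point value.
// Hypothetical helper: it compares against WCHAR_MAX instead of
// hard-coding a two-byte or four-byte assumption.
bool wchar_holds_code_point(unsigned long code_point)
{
    assert(WCHAR_MAX > 0);
    return code_point <= static_cast<unsigned long>(WCHAR_MAX);
}
```

On a system with a two-byte wchar_t, wchar_holds_code_point(0x10FFFF) would be expected to return false, which is exactly the case portable code has to plan for.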

+

+

Getting back to the frequently asked question: What about Unicode strings?

+

+

The text around the codecvt definition gives some clues:

+

+

-1- The class codecvt<internT,externT,stateT> is for use when converting from one

+

codeset to another, such as from wide characters to multibyte characters, between wide

+

character encodings such as Unicode and EUC.

+

+

Hmm. So, in some unspecified way, Unicode encodings and translations between other character sets should be handled by this class.

+

+

-2- The stateT argument selects the pair of codesets being mapped between.

+

+

Ah ha! Another clue...

+

+

-3- The instantiations required in the Table ?? (lib.locale.category), namely

+

codecvt<wchar_t,char,mbstate_t> and codecvt<char,char,mbstate_t>, convert the

+

implementation-defined native character set. codecvt<char,char,mbstate_t> implements

+

a degenerate conversion; it does not convert at all. codecvt<wchar_t,char,mbstate_t>

+

converts between the native character sets for tiny and wide characters. Instantiations on

+

mbstate_t perform conversion between encodings known to the library implementor.

+

Other encodings can be converted by specializing on a user-defined stateT type. The

+

stateT object can contain any state that is useful to communicate to or from the

+

specialized do_convert member.
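The two required instantiations can be exercised with nothing beyond the standard library. The sketch below assumes the classic "C" locale and plain ASCII input; widen_with_codecvt and narrow_facet_is_noconv are hypothetical helper names, not library functions.

```cpp
#include <cassert>
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;
typedef std::codecvt<char, char, std::mbstate_t>    narrow_codecvt;

// Convert a narrow string to wide characters using the codecvt facet
// found in `loc`. Returns an empty string unless the facet reports ok.
std::wstring widen_with_codecvt(const std::string& narrow,
                                const std::locale& loc = std::locale::classic())
{
    if (narrow.empty())
        return std::wstring();

    const wide_codecvt& cvt = std::use_facet<wide_codecvt>(loc);
    std::mbstate_t state = std::mbstate_t();   // initial conversion state

    // A multibyte sequence never yields more wide characters than bytes.
    std::wstring wide(narrow.size(), L'\0');
    const char* from_next = 0;
    wchar_t* to_next = 0;

    std::codecvt_base::result r =
        cvt.in(state,
               narrow.data(), narrow.data() + narrow.size(), from_next,
               &wide[0], &wide[0] + wide.size(), to_next);
    if (r != std::codecvt_base::ok)
        return std::wstring();

    wide.resize(to_next - &wide[0]);
    return wide;
}

// The char-to-char instantiation is the degenerate case: it reports
// that no conversion is ever needed.
bool narrow_facet_is_noconv()
{
    const narrow_codecvt& cvt =
        std::use_facet<narrow_codecvt>(std::locale::classic());
    return cvt.always_noconv();
}
```

Note that the call goes through the public member in, which forwards to the virtual do_in; the facet has no member named do_convert.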

+

+

At this point, the initial design of the library becomes clear:

+

+

3. How to accomplish this: partial specialization with an iconv wrapper class, __enc_traits.
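To make the idea of a state type that "selects the pair of codesets being mapped between" concrete, here is a rough sketch of a small class that owns an iconv descriptor. It is an assumption-laden illustration only: encoding_pair and to_internal are hypothetical names, this is not the interface of __enc_traits, and it relies on the POSIX iconv functions with the glibc-style char** input parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>   // POSIX iconv_open/iconv/iconv_close, not standard C++
#include <string>

// Hypothetical state type: names the pair of codesets being mapped
// between and owns the iconv conversion descriptor for that pair.
class encoding_pair
{
public:
    // iconv_open takes the destination encoding first, then the source.
    encoding_pair(const char* internal, const char* external)
        : cd_(iconv_open(internal, external)) {}

    ~encoding_pair() { if (good()) iconv_close(cd_); }

    bool good() const { return cd_ != (iconv_t)-1; }

    // Convert bytes in the external encoding to the internal encoding.
    // Returns false if the descriptor is invalid or iconv reports an error.
    bool to_internal(const std::string& in, std::string& out)
    {
        if (!good())
            return false;
        iconv(cd_, 0, 0, 0, 0);                  // reset to the initial state

        // Assumed generous enough for common encodings; if it is not,
        // iconv fails with E2BIG and this function returns false.
        std::string buf(in.size() * 4 + 4, '\0');
        char* src = const_cast<char*>(in.data());
        std::size_t src_left = in.size();
        char* dst = &buf[0];
        std::size_t dst_left = buf.size();

        if (iconv(cd_, &src, &src_left, &dst, &dst_left) == (std::size_t)-1)
            return false;
        buf.resize(buf.size() - dst_left);
        out = buf;
        return true;
    }

private:
    encoding_pair(const encoding_pair&);             // non-copyable
    encoding_pair& operator=(const encoding_pair&);
    iconv_t cd_;
};
```

A caller might construct encoding_pair("UTF-8", "ISO-8859-1") and pass Latin-1 bytes to to_internal. A codecvt partial specialization on such a state type could then forward do_in and do_out to the descriptor, which is the general shape the heading above describes.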

+

+

+

4. Design

+

a. goals.

+

b. drawbacks

+

c. things that are sketchy

+

+

+

5. Examples

+

a. conversions involving string literals

+

b. conversions involving std::string

+

c. conversions involving std::filebuf and std::ostream

+

+

+

6. Acknowledgments

+

Ulrich Drepper for the iconv suggestions and patient question answering, Jason Merrill for the template partial specialization hints and wchar_t fixes, etc etc etc.

+

+

+

7. Bibliography / Referenced Documents

+

ISO/IEC 14882:1998 Programming languages - C++

+

+

ISO/IEC 9899:1999 Programming languages - C

+

+

glibc-2.2 docs

+

+

System Interface Definitions, Issue 6 (IEEE Std. 1003.1-200x)

+

The Open Group/The Institute of Electrical and Electronics Engineers, Inc.

+

http://www.opennc.org/austin/docreg.html

+

+

Appendix D, The C++ Programming Language, Special Edition, Bjarne Stroustrup, Addison Wesley, Inc. 2000

+

+

Standard C++ IOStreams and Locales, Advanced Programmer's Guide and Reference, Angelika Langer and Klaus Kreft, Addison Wesley Longman, Inc. 2000

+

+

Numerous, late-night email correspondence with Ulrich Drepper (drepper@redhat.com).

+

+

+
+
+
diff --git a/libstdc++-v3/docs/22_locale/howto.html b/libstdc++-v3/docs/22_locale/howto.html
index 62d0ce88964..17295292d10 100644
--- a/libstdc++-v3/docs/22_locale/howto.html
+++ b/libstdc++-v3/docs/22_locale/howto.html
@@ -9,7 +9,7 @@
 libstdc++-v3 HOWTO: Chapter 22
-
+
@@ -25,7 +25,7 @@

Contents


@@ -45,9 +45,10 @@


-

Topic

-

More stuff will have to wait until somebody with locale
- experience can share it...
+

Notes on the codecvt implementation

+

This document turned out to be larger than anticipated. As
+ such, it gets its own page, which can be found
+ here.

Return to top of page or to the FAQ.
@@ -63,7 +64,7 @@
 Comments and suggestions are welcome, and may be sent to Phil Edwards or Gabriel Dos Reis.
-
$Id: howto.html,v 1.1 2000/04/21 20:33:31 bkoz Exp $ +
$Id: howto.html,v 1.2 2000/07/11 21:45:07 pme Exp $