From: Phil Edwards
$Id: howto.html,v 1.3 2000/07/11 21:45:07 pme Exp $
+
$Id: howto.html,v 1.4 2000/07/19 20:20:51 pme Exp $
Things are not as bad as they seem. In - this + this message, Joe Buck points out a few very important things:
The solution is surprisingly easy. The original answer pages
- on the GotW website have been removed into cold storage, in
- preparation for a published book of GotW notes. Before being
+ on the GotW website were removed into cold storage, in
+ preparation for
+ a
+ published book of GotW notes. Before being
put on the web, of course, it was posted on Usenet, and that posting containing the answer is available here.
@@ -170,7 +172,7 @@ on why case-insensitive comparisons are not as easy as they seem, and why creating a class is the wrong way to go about it in production code. (The GotW answer mentions one of the principal
- difficulties; this article mentions more.)
+ difficulties; his article mentions more.)
Basically, this is "easy" only if you ignore some things, things which may be too important to your program to ignore. (I chose
@@ -178,6 +180,11 @@ that nobody ever called me on it...) The GotW question and answer remain useful instructional tools, however.
+Added September 2000: James Kanze provided a link to a
+ Unicode
+ Technical Report discussing case handling, which provides some
+ very good information.
+
Return to top of page or to the FAQ.
@@ -204,9 +211,9 @@ a more general (but less readable) form of it for parsing command strings and the like. If you compiled and ran this code using it:
- std::list<string> ls;
+ std::list<string> ls;
stringtok (ls, " this \t is\t\n a test ");
- for (std::list<string>::const_iterator i = ls.begin();
+ for (std::list<string>::const_iterator i = ls.begin();
i != ls.end(); ++i)
{
std::cerr << ':' << (*i) << ":\n";
@@ -226,8 +233,9 @@
Another version of stringtok is given
here, suggested by Chris King and tweaked by Petr Prikryl,
and this one uses the
- transformation functions given below. If you are comfortable with
- reading the new function names, this version is recommended as an example.
+ transformation functions mentioned below. If you are comfortable
+ with reading the new function names, this version is recommended
+ as an example.
Return to top of page or
to the FAQ.
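For readers who want to see the shape of such a tokenizer, here is a minimal sketch. It is not the HOWTO's own stringtok; the name stringtok_sketch, its signature, and the default delimiter set are assumptions made for illustration. It relies only on std::string's find_first_not_of and find_first_of members.

```cpp
#include <list>
#include <string>

// Hypothetical helper, not the HOWTO's stringtok: splits `in` on any
// character found in `delims` and appends the non-empty tokens to `out`.
template <typename Container>
void stringtok_sketch(Container& out, const std::string& in,
                      const char* delims = " \t\n")
{
    std::string::size_type pos = 0;
    while (true) {
        // Skip leading delimiters; stop if only delimiters remain.
        const std::string::size_type start = in.find_first_not_of(delims, pos);
        if (start == std::string::npos)
            break;
        // The token runs up to the next delimiter or the end of the string.
        const std::string::size_type stop = in.find_first_of(delims, start);
        if (stop == std::string::npos) {
            out.push_back(in.substr(start));
            break;
        }
        out.push_back(in.substr(start, stop - start));
        pos = stop + 1;
    }
}
```

Any container with push_back should work as the first argument, so a std::list of std::string can be filled the same way the example call earlier in this section fills ls.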
@@ -240,30 +248,45 @@
to all upper case." The word transformations is especially
apt, because the standard template function
transform<> is used.
+
+ This code will go through some iterations (no pun). Here's the
+ simplistic version usually seen on Usenet:
- #include <string>
- #include <algorithm>
- #include <cctype> // old <ctype.h>
- std::string s ("Some Kind Of Initial Input Goes Here");
-
- // Change everything into upper case
- std::transform (s.begin(), s.end(), s.begin(), toupper);
-
- // Change everything into lower case
- std::transform (s.begin(), s.end(), s.begin(), tolower);
-
- // Change everything back into upper case, but store the
- // result in a different string
- std::string capital_s;
- capital_s.reserve(s.size());
- std::transform (s.begin(), s.end(), capital_s.begin(), tolower);
+ #include <string>
+ #include <algorithm>
+ #include <cctype> // old <ctype.h>
+
+ std::string s ("Some Kind Of Initial Input Goes Here");
+
+ // Change everything into upper case
+ std::transform (s.begin(), s.end(), s.begin(), toupper);
+
+ // Change everything into lower case
+ std::transform (s.begin(), s.end(), s.begin(), tolower);
+
+ // Change everything back into upper case, but store the
+ // result in a different string
+ std::string capital_s;
+ capital_s.resize(s.size());
+ std::transform (s.begin(), s.end(), capital_s.begin(), toupper);
Note that these calls all involve
the global C locale through the use of the C functions
toupper/tolower. This is absolutely guaranteed to work --
- but only if you're using English text (bummer). A much better and
- more portable solution is to use a facet for a particular locale
- and call its conversion functions. (These are discussed more in
- Chapter 22.)
+ but only if the string contains only characters
+ from the basic source character set, and there are only
+ 96 of those. Which means that not even all English text can be
+ represented (certain British spellings, proper names, and so forth).
+ So, if all your input forevermore consists of only those 96
+ characters (hahahahahaha), then you're done.
+
+ At minimum, you can write +
+The correct method is to use a facet for a particular locale
+ and call its conversion functions. These are discussed more in
+ Chapter 22; the specific part is
+ here, which shows the
+ final version of this code. (Thanks to James Kanze for assistance
+ and suggestions on all of this.)
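As a rough illustration of calling a facet's conversion function directly, the sketch below upper-cases a string in place through the ctype<char> facet. The helper name upcase_with_facet is an assumption, and this is not the Chapter 22 code referred to above; it only shows the range form of ctype<char>::toupper.

```cpp
#include <locale>
#include <string>

// Illustrative helper (the name is an assumption): upper-cases `s` in
// place using the ctype<char> facet of `loc`.
inline void upcase_with_facet(std::string& s, const std::locale& loc)
{
    if (s.empty())
        return;
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    // The two-pointer overload converts every character in [low, high).
    ct.toupper(&s[0], &s[0] + s.size());
}
```

Passing std::locale::classic() gives the "C" locale behaviour; passing another locale object lets that locale's facet decide the mapping.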
Another common operation is trimming off excess whitespace. Much
like transformations, this task is trivial with the use of string's
@@ -297,7 +320,7 @@
Comments and suggestions are welcome, and may be sent to
Phil Edwards or
Gabriel Dos Reis.
-
$Id: howto.html,v 1.2 2000/07/07 21:13:28 pme Exp $
+
$Id: howto.html,v 1.3 2000/07/11 21:45:07 pme Exp $
He also writes:
+
+ Please note that I still consider this detailed description of
+ locales beyond the needs of most C++ programmers. It is written
+ with experienced programmers in mind and novices will do best to
+ avoid it.
+
+
Return to top of page or to the FAQ.
@@ -92,6 +101,114 @@ functionality are given.
to the FAQ.
+A very common question on newsgroups and mailing lists is, "How
+ do I do <foo> to a character string?" where <foo> is
+ a task such as changing all the letters to uppercase, to lowercase,
+ testing for digits, etc. A skilled and conscientious programmer
+ will follow the question with another, "And how do I make the
+ code portable?"
+
+(Poor innocent programmer, you have no idea the depths of trouble
+ you are getting yourself into. 'Twould be best for your sanity if
+ you dropped the whole idea and took up basket weaving instead. No?
+ Fine, you asked for it...)
+
+The task of changing the case of a letter or classifying a character
+ as numeric, graphical, etc, all depends on the cultural context of the
+ program at runtime. So, first you must take the portability question
+ into account. Once you have localized the program to a particular
+ natural language, only then can you perform the specific task.
+ Unfortunately, specializing a function for a human language is not
+ as simple as declaring
+ extern "Danish" int tolower (int);.
+
+The C++ code to do all this proceeds in the same way. First, a locale
+ is created. Then member functions of that locale are called to
+ perform minor tasks. Continuing the example from Chapter 21, we wish
+ to use the following convenience functions:
+
+ namespace std {
+ template <class charT>
+ charT
+ toupper (charT c, const locale& loc);
+ template <class charT>
+ charT
+ tolower (charT c, const locale& loc);
+ }
+ Each of these functions extracts the appropriate "facet"
+ (ctype<charT>) from the locale loc and calls the matching member
+ function of that facet, passing c as its argument. The resulting
+ character is returned.
+
+ For the C/POSIX locale, the results are the same as calling the
+ classic C toupper/tolower function that was used in previous
+ examples. For other locales, the code should Do The Right Thing.
+
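The relationship described above can be sketched as follows. The helper names upper_in and agrees_with_c_toupper are hypothetical, and the comparison assumes the program's global C locale is still the default "C" locale.

```cpp
#include <cctype>
#include <locale>

// Hypothetical helper: the direct facet call that std::toupper(c, loc)
// is specified to perform.
inline char upper_in(char c, const std::locale& loc)
{
    return std::use_facet<std::ctype<char> >(loc).toupper(c);
}

// Hypothetical check: for the classic "C" locale, the facet result should
// match the C library toupper (assuming the global C locale is unchanged).
inline bool agrees_with_c_toupper(char c)
{
    // The cast avoids passing a negative value to the C function.
    const int via_c = std::toupper(static_cast<unsigned char>(c));
    return upper_in(c, std::locale::classic()) == static_cast<char>(via_c);
}
```

Calling std::toupper(c, loc) from the locale header should give the same result as upper_in(c, loc); the facet call is spelled out here only to show what happens underneath.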
+Of course, these functions take a second argument, and the
+ transformation algorithm's operator argument can only take a single
+ parameter. So we write simple wrapper structs to handle that.
+
+The next-to-final version of the code started in Chapter 21 looks like:
+
+ #include <iterator> // for back_inserter
+ #include <locale>
+ #include <string>
+ #include <algorithm>
+ #include <cctype> // old <ctype.h>
+
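+ struct Toupper
+ {
+ Toupper (std::locale const& l) : loc(l) {;}
+ char operator() (char c) const { return std::toupper(c,loc); }
+ private:
+ // Stored by value: main() passes a temporary std::locale, and a
+ // reference member would dangle once the constructor returns.
+ std::locale loc;
+ };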
+
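+ struct Tolower
+ {
+ Tolower (std::locale const& l) : loc(l) {;}
+ char operator() (char c) const { return std::tolower(c,loc); }
+ private:
+ // Stored by value: main() passes a temporary std::locale, and a
+ // reference member would dangle once the constructor returns.
+ std::locale loc;
+ };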
+
+ int main ()
+ {
+ std::string s ("Some Kind Of Initial Input Goes Here");
+ Toupper up ( std::locale("C") );
+ Tolower down ( std::locale("C") );
+
+ // Change everything into upper case
+ std::transform (s.begin(), s.end(), s.begin(),
+ up
+ );
+
+ // Change everything into lower case
+ std::transform (s.begin(), s.end(), s.begin(),
+ down
+ );
+
+ // Change everything back into upper case, but store the
+ // result in a different string
+ std::string capital_s;
+ std::transform (s.begin(), s.end(), std::back_inserter(capital_s),
+ up
+ );
+ }
+
+ The final version of the code uses bind2nd to eliminate
+ the wrapper structs, but the resulting code is tricky. I have not
+ shown it here because no compilers currently available to me will
+ handle it.
+
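For comparison, on a compiler that supports C++11 or later (an assumption; this is well beyond what the text above had available), a lambda can bind the locale argument without either bind2nd or a wrapper struct. This is not the bind2nd version mentioned above, and the helper name upper_copy is made up for the sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: returns an upper-cased copy of `s` for locale `loc`.
inline std::string upper_copy(const std::string& s, const std::locale& loc)
{
    std::string result;
    result.reserve(s.size());
    // The lambda supplies `loc` as the second argument, which is the job
    // the wrapper struct (or bind2nd) would otherwise do.
    std::transform(s.begin(), s.end(), std::back_inserter(result),
                   [&loc](char c) { return std::toupper(c, loc); });
    return result;
}
```

The lambda captures loc by reference, which is safe here because it is only used while upper_copy is running and its parameter is still alive.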
+Return to top of page or
+ to the FAQ.
+
+
+
@@ -101,7 +218,7 @@ functionality are given.
Comments and suggestions are welcome, and may be sent to
Phil Edwards or
Gabriel Dos Reis.
-We'd also like to thank the folks who have contributed time and energy in testing libstdc++-v3, especially those sending in testsuite
- evaluations:
+ evaluations and documentation corrections: