Chapter 7. Strings

String Classes

Simple Transformations

Here are Standard, simple, and portable ways to perform common transformations on a string instance, such as "convert to all upper case." The word transformations
@@ -64,7 +64,7 @@
are overloaded names (declared in <cctype> and <locale>) so the template-arguments for transform<> cannot be deduced, as explained in this message. At minimum, you can write short wrappers like
@@ -89,9 +89,9 @@
str.erase(notwhite+1);

Obviously, the calls to find could be inserted directly into the calls to erase, in case your compiler does not optimize named temporaries out of existence.
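
As a rough illustration of the two ideas above (a non-overloaded wrapper for transform, and trimming with find plus erase), here is a minimal sketch. The names to_upper_char, make_upper, and trim_spaces are hypothetical and do not come from the original text. The sketch assumes the default "C" locale and treats only the space character as whitespace.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical wrapper: a single, non-overloaded function that
// std::transform can take without any template-argument ambiguity.
// The cast to unsigned char avoids undefined behavior for negative chars.
inline char to_upper_char(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Upper-case a string in place.
inline void make_upper(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(), to_upper_char);
}

// Remove leading and trailing spaces in place.
// A string made only of spaces becomes empty.
inline void trim_spaces(std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos) {
        s.clear();
        return;
    }
    const std::string::size_type notwhite = s.find_last_not_of(' ');
    s.erase(notwhite + 1);  // drop the trailing spaces first
    s.erase(0, first);      // then the leading ones
}
```

Calling make_upper on a string holding "hello" leaves it holding "HELLO", and trim_spaces on "  padded  " leaves "padded". A lower-case wrapper would follow the same pattern with std::tolower.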

Case Sensitivity

The well-known-and-if-it-isn't-well-known-it-ought-to-be Guru of the Week discussions held on Usenet covered this topic in January of 1998. Briefly, the challenge was, “write a 'ci_string' class which is identical to the standard 'string' class, but is
@@ -108,10 +108,10 @@
assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
assert( strcmp( s.c_str(), "abcde" ) != 0 );

The solution is surprisingly easy. The original answer was posted on Usenet, and a revised version appears in Herb Sutter's book Exceptional C++ and on his website as GotW 29.

See? Told you it was easy!
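
For readers without the book at hand, the general shape of a traits-based answer can be sketched as follows. This is an illustration written for this section, not a reproduction of the published solution. The helper fold is a hypothetical name, and the sketch assumes single-byte characters in the "C" locale, so it ignores the locale problems that make real case-insensitive comparison hard.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of a character traits class whose comparisons ignore case.
// Everything not overridden here is inherited from std::char_traits<char>.
struct ci_char_traits : public std::char_traits<char>
{
    // Hypothetical helper: fold a character to upper case for comparison.
    static int fold(char c)
    {
        return std::toupper(static_cast<unsigned char>(c));
    }

    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }

    // Three-way comparison of the first n characters, ignoring case.
    static int compare(const char* a, const char* b, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }

    // Find the first character among the first n that matches c, ignoring case.
    static const char* find(const char* s, std::size_t n, char c)
    {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return 0;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this in place, a ci_string holding "AbCdE" compares equal to "abcde", while its c_str() still returns the characters exactly as stored. Note that ci_string is a distinct type from std::string, so the two do not mix in assignments or stream insertions without extra conversion code.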

Added June 2000: The May 2000 issue of C++ Report contains a fascinating article by Matt Austern (yes, the Matt Austern) on why case-insensitive comparisons are not as easy as they seem, and why creating a class is the wrong way to go
@@ -123,10 +123,10 @@
that nobody ever called me on it...) The GotW question and answer remain useful instructional tools, however.

Added September 2000: James Kanze provided a link to a Unicode Technical Report discussing case handling, which provides some very good information.

Arbitrary Character Types

The std::basic_string is tantalizingly general, in that it is parameterized on the type of the characters which it holds. In theory, you could whip up a Unicode character class and instantiate
@@ -169,18 +169,18 @@
works and can be specialized even for int and other built-in types.

If you want to use your own special character class, then you have a lot of work to do, especially if you wish to use i18n features (facets require traits information but don't have a traits argument).

Another example of how to specialize char_traits was given on the mailing list and at a later date was put into the file include/ext/pod_char_traits.h. We agree that the way it's used with basic_string (scroll down to main()) doesn't look nice, but that's because the nice-looking first attempt turned out not to be conforming C++, due to the rule that CharT must be a POD. (See how tricky this is?)

Tokenizing

The Standard C (and C++) function strtok() leaves a lot to be desired in terms of user-friendliness. It's unintuitive, it destroys the character string on which it operates, and it requires
@@ -256,7 +256,7 @@ stringtok(Container &container, string const &in,
tokenizing as well. Build an istringstream from the input text, and then use std::getline with varying delimiters (the three-argument signature) to extract tokens into a string.
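
The istringstream approach might look like the following minimal sketch. The function name split_on is hypothetical, and for brevity it uses one delimiter for the whole input rather than varying it from call to call.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text on a single delimiter character.
// The input string is left untouched, unlike with strtok().
inline std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string token;
    // Three-argument getline: read up to (and discard) the next delim.
    while (std::getline(in, token, delim))
        tokens.push_back(token);
    return tokens;
}
```

Adjacent delimiters yield an empty token between them, so split_on("a,b,,c", ',') produces four tokens. A delimiter at the very end of the input does not produce a final empty token, because getline then reaches end-of-file without extracting anything. To vary the delimiter, call std::getline directly on the same stream with a different third argument each time.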

Shrink to Fit

From GCC 3.4 calling s.reserve(res) on a string s with res < s.capacity() will reduce the string's capacity to std::max(s.size(), res).
@@ -269,10 +269,10 @@ stringtok(Container &container, string const &in,
(see this FAQ entry) but the regular copy constructor cannot be used because libstdc++'s string is Copy-On-Write.

In C++11 mode you can call s.shrink_to_fit() to achieve the same effect as s.reserve(s.size()).
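
A small sketch of both routes follows. The function names shrink_cxx11 and shrink_by_swap are hypothetical. The second function is one way to copy without going through the regular copy constructor, offered here as an assumption about what such code could look like rather than as the library's recommended idiom.

```cpp
#include <string>

// C++11 and later: ask the string to release unused capacity.
// The Standard makes this a non-binding request, so the capacity
// is not guaranteed to end up equal to size().
inline void shrink_cxx11(std::string& s)
{
    s.shrink_to_fit();
}

// Pre-C++11 sketch: build a fresh string from the raw character data
// and swap it in. Constructing from (pointer, length) forces a real copy
// of the characters, even in an implementation where copies share a buffer.
inline void shrink_by_swap(std::string& s)
{
    std::string(s.data(), s.size()).swap(s);
}
```

Neither function changes the contents of the string. Only the amount of reserved storage may differ afterwards, and the exact capacity remains implementation-defined.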

CString (MFC)

A common lament seen in various newsgroups deals with the Standard string class as opposed to the Microsoft Foundation Class called CString. Often programmers realize that a standard portable
@@ -280,9 +280,9 @@ stringtok(Container &container, string const &in,
their application from a Win32 platform, they discover that they are relying on special functions offered by the CString class.

Things are not as bad as they seem. In this message, Joe Buck points out a few very important things:

- The Standard string supports all the operations that CString does, with three exceptions.
- Two of those exceptions (whitespace trimming and case conversion) are trivial to implement. In fact, we do so
@@ -340,7 +340,7 @@ stringtok(Container &container, string const &in,
performance is O(n).

Joe Buck also pointed out some other things to keep in mind when comparing CString and the Standard string class:

- CString permits access to its internal representation; coders who exploited that may have problems moving to string.
- Microsoft ships the source to CString (in the files MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation
@@ -360,7 +360,7 @@ stringtok(Container &container, string const &in,
libstdc++ string, the SGI string, and the SGI rope, and this is all before any allocator or traits customizations! (More choices than you can shake a stick at -- want fries with that?)
