TITLE: Monolithic classes (Source: comp.lang.c++.moderated, 8 Aug 2002) ---------------------------------------------------------------------------- ----- Clarification: the code in a C++ tip does not imply that I think the code exhibits good programming practice or is of production quality, only that there is worthwhile technical content. For past tips, see http://cpptips.hyperformix.com -Allan "Need a technical writer? See our corporate sponsor at http://www.prospring.net" ---------------------------------------------------------------------------- ----- SUTTER: Herb Sutter (hsutter@gotw.ca) ------------------------------------------------------------------- Guru of the Week problems and solutions are posted regularly on news:comp.lang.c++.moderated. For past problems and solutions see the GotW archive at www.GotW.ca. (c) 2002 H.P.Sutter News archives may keep copies of this article. ------------------------------------------------------------------- ______________________________________________________________________ GotW #84: Monoliths "Unstrung" Difficulty: 3 / 10 ______________________________________________________________________ [Wow, at nearly 7,200 words this is the longest GotW ever! Whew. -hps] Avoid Unduly Monolithic Designs ------------------------------- >1. What is a "monolithic" class, and why is it bad? Explain. The word "monolithic" is used to describe software that is a single big, indivisible piece, like a monolith. The word "monolith" comes from the words "mono" (one) and "lith" (stone) whose vivid image of a single solid and heavy stone well illustrates the heavyweight and indivisible nature of such code. A single heavyweight facility that thinks it does everything is often a dead end. After all, often a single big heavyweight facility doesn't really do more -- it often does less, because the more complex it is, the narrower its application and relevance is likely to be. In particular, a class might fall into the monolith trap by trying to offer its functionality through member functions instead of nonmember functions, even when nonmember nonfriend functions would be possible and at least as good. This has at least two drawbacks: -- (Major) It isolates potentially independent functionality inside a class. The operation in question might otherwise be nice to use with other types, but because it's hardwired into a particular class that won't be possible, whereas if it were exposed as a nonmember function template it could be more widely usable. -- (Minor) It can discourage extending the class with new functionality. "But wait!" someone might say. "It doesn't matter whether the class's existing interface leans toward member or nonmember functions, because I can equally well extend it with my own new nonmember functions either way!" That's technically true, and misses the semantic point: If the class's built-in functionality is offered mainly (or only) via member functions and therefore present that as the class's natural idiom, then the extended nonmember functions cannot follow the original natural idiom and will always remain visibly second-class johnny-come-latelies. That the class presents its functions as members is a semantic statement to its users, who will be used to the member syntax that extenders of the class cannot use. (Do we really want to go out of our way to make our users commonly have to look up which functions are members and which ones aren't? That already happens often enough even in better designs.) Where it's practical, break generic components down into pieces. Guideline: Prefer "one class (or function), one responsibility." Guideline: Where possible, prefer writing functions as nonmember nonfriends. The rest of this article will go further toward justifying the latter guideline. The Basics of Strings --------------------- >2. List all the member functions of std::basic_string. It's, ahem, a fairly long list. Counting constructors, there are no fewer than 103 member functions. Really. If that's not monolithic, it's hard to imagine what would be. Imagine yourself in an underground cavern, at the edge of a subterranean lake. There is a submerged channel that leads to the next air-filled cavern. You prepare to dive into the black water and swim through the flooded tunnel... Take a deep breath, and repeat after ISO/IEC 14882:1998(E) [1]: namespace std { template, class Allocator = allocator > class basic_string { // ... some typedefs ... explicit basic_string(const Allocator& a = Allocator()); basic_string(const basic_string& str, size_type pos = 0, size_type n = npos, const Allocator& a = Allocator()); basic_string(const charT* s, size_type n, const Allocator& a = Allocator()); basic_string(const charT* s, const Allocator& a = Allocator()); basic_string(size_type n, charT c, const Allocator& a = Allocator()); template basic_string(InputIterator begin, InputIterator end, const Allocator& a = Allocator()); ~basic_string(); basic_string& operator=(const basic_string& str); basic_string& operator=(const charT* s); basic_string& operator=(charT c); iterator begin(); const_iterator begin() const; iterator end(); const_iterator end() const; reverse_iterator rbegin(); const_reverse_iterator rbegin() const; reverse_iterator rend(); const_reverse_iterator rend() const; size_type size() const; size_type length() const; size_type max_size() const; void resize(size_type n, charT c); void resize(size_type n); size_type capacity() const; void reserve(size_type res_arg = 0); void clear(); bool empty() const; const_reference operator[](size_type pos) const; reference operator[](size_type pos); const_reference at(size_type n) const; reference at(size_type n); basic_string& operator+=(const basic_string& str); basic_string& operator+=(const charT* s); basic_string& operator+=(charT c); basic_string& append(const basic_string& str); basic_string& append(const basic_string& str, size_type pos, size_type n); basic_string& append(const charT* s, size_type n); basic_string& append(const charT* s); basic_string& append(size_type n, charT c); template basic_string& append(InputIterator first, InputIterator last); void push_back(const charT); basic_string& assign(const basic_string&); basic_string& assign(const basic_string& str, size_type pos, size_type n); basic_string& assign(const charT* s, size_type n); basic_string& assign(const charT* s); basic_string& assign(size_type n, charT c); template basic_string& assign(InputIterator first, InputIterator last); basic_string& insert(size_type pos1, const basic_string& str); basic_string& insert(size_type pos1, const basic_string& str, size_type pos2, size_type n); basic_string& insert(size_type pos, const charT* s, size_type n); basic_string& insert(size_type pos, const charT* s); basic_string& insert(size_type pos, size_type n, charT c); iterator insert(iterator p, charT c); void insert(iterator p, size_type n, charT c); template void insert(iterator p, InputIterator first, InputIterator last); You break surface at a small air pocket and gasp. Don't give up -- we're halfway there! Another deep breath, and: basic_string& erase(size_type pos = 0, size_type n = npos); iterator erase(iterator position); iterator erase(iterator first, iterator last); basic_string& replace(size_type pos1, size_type n1, const basic_string& str); basic_string& replace(size_type pos1, size_type n1, const basic_string& str, size_type pos2, size_type n2); basic_string& replace(size_type pos, size_type n1, const charT* s, size_type n2); basic_string& replace(size_type pos, size_type n1, const charT* s); basic_string& replace(size_type pos, size_type n1, size_type n2, charT c); basic_string& replace(iterator i1, iterator i2, const basic_string& str); basic_string& replace(iterator i1, iterator i2, const charT* s, size_type n); basic_string& replace(iterator i1, iterator i2, const charT* s); basic_string& replace(iterator i1, iterator i2, size_type n, charT c); template basic_string& replace(iterator i1, iterator i2, InputIterator j1, InputIterator j2); size_type copy(charT* s, size_type n, size_type pos = 0) const; void swap(basic_string&); const charT* c_str() const; // explicit const charT* data() const; allocator_type get_allocator() const; size_type find (const basic_string& str, size_type pos = 0) const; size_type find (const charT* s, size_type pos, size_type n) const; size_type find (const charT* s, size_type pos = 0) const; size_type find (charT c, size_type pos = 0) const; size_type rfind(const basic_string& str, size_type pos = npos) const; size_type rfind(const charT* s, size_type pos, size_type n) const; size_type rfind(const charT* s, size_type pos = npos) const; size_type rfind(charT c, size_type pos = npos) const; size_type find_first_of(const basic_string& str, size_type pos = 0) const; size_type find_first_of(const charT* s, size_type pos, size_type n) const; size_type find_first_of(const charT* s, size_type pos = 0) const; size_type find_first_of(charT c, size_type pos = 0) const; size_type find_last_of (const basic_string& str, size_type pos = npos) const; size_type find_last_of (const charT* s, size_type pos, size_type n) const; size_type find_last_of (const charT* s, size_type pos = npos) const; size_type find_last_of (charT c, size_type pos = npos) const; size_type find_first_not_of(const basic_string& str, size_type pos = 0) const; size_type find_first_not_of(const charT* s, size_type pos, size_type n) const; size_type find_first_not_of(const charT* s, size_type pos = 0) const; size_type find_first_not_of(charT c, size_type pos = 0) const; size_type find_last_not_of (const basic_string& str, size_type pos = npos) const; size_type find_last_not_of (const charT* s, size_type pos, size_type n) const; size_type find_last_not_of (const charT* s, size_type pos = npos) const; size_type find_last_not_of (charT c, size_type pos = npos) const; basic_string substr(size_type pos = 0, size_type n = npos) const; int compare(const basic_string& str) const; int compare(size_type pos1, size_type n1, const basic_string& str) const; int compare(size_type pos1, size_type n1, const basic_string& str, size_type pos2, size_type n2) const; int compare(const charT* s) const; int compare(size_type pos1, size_type n1, const charT* s, size_type n2 = npos) const; }; } Whew -- 103 member functions or member function templates! The rocky tunnel expands and we break through a new lake's surface just in time. Somewhat waterlogged but better persons for the experience, we look around inside the new cavern to see if the exercise has let us discover anything interesting. Was the recitation worth it? With the drudge work now behind us, let's look back at what we've just traversed and use the information to analyze Question 3: Membership Has Its Rewards -- and Its Costs ------------------------------------------- >3. Which ones should be members, and which should not? Why? Recall the advice from #1: Where it's practical, break generic components down into pieces. Guideline: Prefer "one class (or function), one responsibility." Guideline: Where possible, prefer writing functions as nonmember nonfriends. For example, if you write a string class and make searching, pattern-matching, and tokenizing available as member functions, you've hardwired those facilities so that they can't be used with any other kind of sequence. (If this frank preamble is giving you an uncomfortable feeling about basic_string, well and good.) On the other hand, a facility that accomplishes the same goal but is composed of several parts that are also independently usable is often a better design. In this example, it's often best to separate the algorithm from the container, which is what the STL does most of the time. I [2,3] and Scott Meyers [4] have written before on why some nonmember functions are a legitimate part of a type's interface, and why nonmember nonfriends should be preferred (among other reasons, to improve encapsulation). For example, as Scott wrote in his opening words of the cited article: I'll start with the punchline: If you're writing a function that can be implemented as either a member or as a non-friend non-member, you should prefer to implement it as a non-member function. That decision increases class encapsulation. When you think encapsulation, you should think non-member functions. [4] So when we consider the functions that will operate on a basic_string (or any other class type), we want to make them nonmember nonfriends if reasonably possible. Hence, here are some questions to ask about the members of basic_string in particular: -- Always make it a member if it has to be one: Which operations must be members, because the C++ language just says so (e.g., constructors) or because of functional reasons (e.g., they must be virtual)? If they have to be, then oh well, they just have to be; case closed. -- Prefer to make it a member if it needs access to internals: Which operations need access to internal data we would otherwise have to grant via friendship? These should normally be members. (There are some rare exceptions such as operations needing conversions on their left-hand arguments and some like operator<<() whose signatures don't allow the *this reference to be their first parameters; even these can normally be nonfriends implemented in terms of (possibly virtual) members, but sometimes doing that is merely an exercise in contortionism and they're best and naturally expressed as friends.) -- In all other cases, prefer to make it a nonmember nonfriend: Which operations can work well as nonmember nonfriends? These can, and therefore normally should, be nonmembers. This should be the default case to strive for. It's important to ask these and related questions because we want to put as many functions into the last bucket as possible. Operations That Must Be Members ------------------------------- As a first pass, let's sift out those operations that just have to be members. There are some obvious ones at the beginning of the list. constructors (6) destructor assignment operators (3) [] operators (2) Clearly the above functions must be members. It's impossible to write a constructor, destructor, assignment operator, or [] operator in any other way! 12 functions down, 91 to go... Operations That Should Be Members --------------------------------- We asked: Which operations need access to internal data we would otherwise have to grant via friendship? Clearly these have a reason to be members, and normally ought to be. This list includes the following, some of which provide indirect access to (e.g., begin()) or change (e.g., reserve()) the internal state of the string: begin (2) end (2) rbegin (2) rend (2) size max_size capacity reserve swap c_str data get_allocator The above ought to be members not only because they're tightly bound to basic_string, but they also happen to form the public interface that nonmember nonfriend functions will need to use. Sure, you could implement these as nonmember friends, but why? There are a few more that I'm going to add to this list as fundamental string operations: insert (three-parameter version) erase (1 -- the "iter, iter" version) replace (2 -- the "iter, iter, num, char" and templated versions) We'll return to the question of insert(), erase(), and replace() a little later. For replace() in particular, it's important to be able to choose well and make the most flexible and fundamental version(s) into member(s). Into the Fray: Possibly-Contentious Operations That Can Be Nonmember Nonfriends ----------------------------------------------- First in this section, allow me to perform a stunning impersonation of a lightning rod by pointing out that all of the following functions have something fundamental in common, to wit: Each one could obviously as easily -- and as efficiently -- be a nonmember nonfriend. at (2) clear empty length Sure they can be nonmember nonfriends, that's obvious, no sweat. Of course, the above functions also happen to have something else pretty fundamental in common: They're mentioned in the standard library's container and sequence requirements, as member functions. Hence the lightning-rod effect... "Yeah, wait!" I can already hear some standardiste-minded people saying, heading in my direction and resembling the beginnings of a brewing lynch mob. "Not so fast! Don't you know that basic_string is designed to meet the C++ Standard's container and sequence requirements, and those requirements require or suggest that some of those functions be members? So quit misleading the readers! They're members -- live with it!" Indeed, and true, but for the sake of this discussion I'm going to waive those requirements with a dismissive wave of a hand and a quick escape across some back yards past small barking dogs and various clothes drying on the line. Having left my pursuers far enough behind to resume reasoned discourse, here's the point: For once, the question I'm considering here isn't what the container requirements say. It's which functions can without loss of efficiency be made nonmember nonfriends, and whether there's any additional benefit to be gained from doing so. If those benefits exist for something the container requirements say must be a member, well, why not point out that the container requirements could be improved while we're at it? And so we shall... Take empty() as a case in point. Can we implement it as a nonmember nonfriend? Sure... the standard itself requires the following behavior of basic_string::empty(), in the C++ Standard, subclause 21.3.3, paragraph 14: Returns: size() == 0. Well, now, that's pretty easy to write as a nonmember without loss of efficiency: template bool empty( const basic_string& s ) { return s.size() == 0; } Notice that, while we can make size() a member and implement a nonmember empty() in terms of it, we could not do the reverse. In several cases here there's a group of related functions, and perhaps more than one could be nominated as a member and the others implemented in terms of that one as nonmembers. Which function should we nominate to be the member? My advice is to choose the most flexible one that doesn't force a loss of efficiency -- that will be the enabling flexible foundation on which the others can be built. In this case, we choose size() as the member because its result can always be cached (indeed, the standard encourages that it be cached because size() "should" run in constant time), in which case an empty() implemented only in terms of size() is no less efficient than anything we could do with full direct access to the string's internal data structures. For another case in point, what about at()? The same reasoning applies. For both the const and non-const versions, the standard requires the following: Throws: out_of_range if pos >= size(). Returns: operator[]( pos ). That's easy to provide as a nonmember, too. Each is just a two-line function template, albeit a bit syntactically tedious because of all those template parameters and nested type names -- and remember all your typenames! Here's the code: template typename basic_string::const_reference at( const basic_string& s, typename basic_string::size_type pos ) { if( pos >= s.size() ) throw out_of_range( "don't do that" ); return s[pos]; } template typename basic_string::reference at( basic_string& s, typename basic_string::size_type pos ) { if( pos >= s.size() ) throw out_of_range( "I said, don't do that" ); return s[pos]; } What about clear()? Easy, that's the same as erase(begin(),end()). No fuss, no muss. Exercise for the reader, and all that. What about length()? Easy, again -- it's defined to give the same result as size(). What's more, note that the other containers don't have length(), and it's there in the basic_string interface as a sort of "string thing", but by making it a nonmember suddenly we can consisently say "length()" about any container. Not too useful in this case because it's just a synonym for size(), I grant you, but a noteworthy point in the principle it illustrates -- making algorithms nonmembers immediately also makes them more widely useful and usable. In summary, let's consider the benefits and drawbacks of providing functions like at() and empty() as members vs. nonmembers. First, just because we can write these members as nonmembers (and nonfriends) without loss of efficiency, why should we do so? What are the actual or potential benefits? Here are several: -- Simplicity. Making them nonmembers lets us write and maintain less code. We can write empty() just once and be done with it forever. Why write it many times as basic_string::empty() and vector::empty() and list::empty() and so forth, including writing them over and over again for each new STL-compliant container that comes along in third-party or internal libraries and even in future versions of the C++ standard? Write it once, be done, move on. -- Consistency. It avoids gratuitous incompatibilities between the member algorithms of different containers, and between the member and nonmember versions of algorithms (some existing inconsistencies between members and nonmembers are pointed out in [5]). If customized behavior is needed, then specialization or overloading of the nonmember function templates should be able to accommodate it. -- Encapsulation. It improves encapsulation (as argued by Meyers strongly and at length in [4]). Having said that, what are the potential drawbacks? Here are two, although in my opinion they are outweighed by the above advantages: -- Consistency. You could argue that keeping things like empty() as members follows the principle of least surprise -- similar functions are members, and in other class libraries things like IsEmpty() functions are commonly members, after all. I think this argument is valid, but weakened when we notice that this wouldn't be surprising at all if people were in the habit of following Meyers' advice in the first place, routinely writing functions as nonmembers whenever reasonably possible. So the question really comes down to whether we ought to trade away real benefits in order to follow a tradition, or to change the tradition to get real benefits. (Of course, if we didn't already have size(), then implementing empty() in particular as a nonmember nonfriend would not be possible. The class's public member functions do have to provide the necessary and sufficient functionality already.) I think this argument is weak because writing them as nonmembers yields a greater consistency, as noted above, than the questionable inconsistency being claimed here. -- Namespace pollution. Because empty() is such a common name, putting it at namespace scope risks namespace pollution -- after all, will every function named empty() want to have exactly these semantics? This argument is also valid to a point, but weakened in two ways: First, by noting that encouraging consistent semantics for functions is a Good Thing; and second, by noticing that overload resolution has turned out to be a very good tool for disambiguation, and namespace pollution has never been as big a problem as some have made it out to be in the past. Really, by putting all the names in one place and sharing implementations, we're not polluting that one namespace as much as we're sanitizing the functions by gathering them up and helping to avoid the gratuitous and needless incompatibilities that Meyers mentions (see above). With perhaps these more contentious choices out of the way, then, let's continue with other operations that can be nonmember nonfriends. Some, like those listed above, are mentioned in the container requirements as members; again, here we're considering not what the standard says we must do, but rather what given a blank page we might choose to do in designing these as members vs. nonmember nonfriends. More Operations That Can Be Nonmember Nonfriends ------------------------------------------------ Further, all of the remaining functions can be implemented as nonmember nonfriends: resize (2) assign (6) += (3) append (6) push_back insert (7 -- all but the three-parameter version) erase (2 -- all but the "iter, iter" version) replace (8 -- all but the "iter, iter, num, char" and templated versions) copy substr compare (5) find (4) rfind (4) find_first_of (4) first_last_of (4) find_first_not_of (4) find_last_not_of (4) Consider first resize(): void resize(size_type n, charT c); void resize(size_type n); Can each resize() be a nonmember nonfriend? Sure it can, because it can be implemented in terms of basic_string's public interface without loss of efficiency. Indeed, the standard's own functional specifications express both versions in terms of the functions we've already considered above. For example: template void resize( basic_string& s, typename Allocator::size_type n, charT c) { if( n > s.max_size() ) throw length_error( "won't fit" ); if( n <= s.size() ) { basic_string temp( s, 0, n ); s.swap( temp ); } else { s.append( n - s.size(), c ); } } template void resize( basic_string& s, typename Allocator::size_type n ) { resize( s, n, charT() ); } Next, assign(): We have six, count 'em, six forms of assign(). Fortunately, this case is simple: most of them are already specified in terms of each other, and all can be implemented in terms of a constructor and operator=() combination. Next, operator+=(), append(), and push_back(): What about all those pesky flavors of appending operations, namely the three forms of operator+=(), the six-count'em-six forms of append(), and the lone push_back()? Just the similarity ought to alert us to the fact that probably at most one needs to be a member; after all, they're doing about the same thing, even if the details differ slightly, for example appending a character in one case, a string in another case, and an iterator range in still another case. Indeed, as it turns out, all of them can likewise be implemented as nonmember nonfriends without loss of efficiency: -- Clearly operator+=() can be implemented in terms of append(), because that's how it's specified in the C++ standard. -- Equally clearly, five of the six versions of append() can be nonmember nonfriends because they are specified in terms of the three-parameter version of append(), and that in turn can be implemented in terms of insert(), all of which quite closes the append() category. -- Determining the status of push_back() takes only slightly more work, because its operational semantics aren't specified in the basic_string section, but only in the container requirements section, and there we find the specification that a.push_back(x) is just a synonym for a.insert(a.end(),x). What's the linchpin holding all the members of this group together as valid nonmember nonfriends? It's insert(), hence insert()'s status as a good choice to be the member that does the work and encapsulates the append-related access to the string's internals in one place, instead of spreading the internal access all over the place in a dozen different functions. Next, insert(): For those of you who might think that six-count'em-six versions of assign() and six-count'em-six versions of append() might have been a little much, those were just the warmup... now we consider the eight-count'em-eight versions of insert(). Above, I've already nominated the three-parameter version of insert() as a member, and now it's time to justify why. First, as noted above, insert() is a more general operation than append(), and having a member insert() allowed all the append() operations to be nonmember nonfriends; if we didn't have at least one insert() as a member, then at least one of the append()s would have had to be a member, and so I chose to nominate the more fundamental and flexible operation. But we have eight-count'em-eight flavors of insert() -- which one (or more) ought to be the member(s)? Five of the eight forms of insert() are already specified in terms of the three-parameter form, and the others can also be implemented efficiently in terms of the three-parameter form, so we only need that one form to be a member. The others can all be nonmember nonfriends. For those of you who might think that the either-count'em-eight versions of insert() take the cake, well, that was warmup too. In a moment we'll consider the ten-count'em-ten forms of replace(). Before we attempt those, though, let's take a short break to tackle an easier function first, because it turns out that erase() is instructive in building up to dealing with replace()... Coffee Break (Sort Of): Erasing erase() --------------------------------------- Next erase(): After talking about the total 30-count'em-30 flavors of assign(), append(), insert(), and replace() -- and having dealt with 20 of the 30 already above -- you will be relieved to know that there are only three forms of erase(), and that only two of them belong in this section. After what we just went through for the others, this is like knocking off for a well-deserved coffee break... The troika of erase() members is a little interesting. At least one of these erase() functions must be a member (or friend), there being no other way to erase efficiently using the other already-mentioned member functions alone. There are actually two "families" of erase() functions: erase( pos, length ) basic_string& erase(size_type pos = 0, size_type n = npos); erase( iter, ... ) iterator erase(iterator position); iterator erase(iterator first, iterator last); First, notice that the two families' return types are not consistent: the first version returns a reference to the string, whereas the other two return iterators pointing immediately after the erased character or range. Second, notice that the two families' argument types are not consistent: the first version takes an offset and length, whereas the other two take an iterator or iterator range; fortunately, we can convert from iterators to offsets via pos = iter - begin() and from offsets to iterators via iter = begin() + pos. (Aside: The standard does not require, but an implementer can choose, that basic_string objects store their data in a contiguous charT array buffer in memory. If they do, then the conversion from iterators to positional offsets and vice versa clearly need incur no overhead. (I would argue that even segmented storage schemes could provide for very efficient conversion back and forth between offsets and iterators using only the container's and iterator's public interfaces.) This allows the first two forms to be expressed in terms of the third (again, remember your typenames and qualifications!): template basic_string& erase( basic_string& s, typename Allocator::size_type pos = 0, typename Allocator::size_type n = basic_string::npos ) { if( pos > s.size() ) throw out_of_range( "yes, we have no bananas" ); typename basic_string::iterator first = s.begin()+pos, last = n == basic_string::npos ? s.end() : first + n; s.erase( first, last ); return s; } template typename basic_string::iterator erase( basic_string& s, typename basic_string::iterator position ) { return s.erase( position, s.end() ); } OK, coffee break's over... Back to Work: Replacing replace() --------------------------------- Next, replace(): Truth be told, the ten-count'em-ten replace() members are less interesting than they are tedious and exhausting. At least one of these replace() functions must be a member (or friend), there being no other way to replace efficiently using the other already-mentioned member functions alone. In particular, note that you can't efficiently implement replace() in terms erase() followed by insert() or vice versa because both ways would require more character shuffling, and possibly buffer reallocation. Note again that we have two families of replace() functions: replace( pos, length, ... ) basic_string& replace(size_type pos1, size_type n1, // #1 const basic_string& str); basic_string& replace(size_type pos1, size_type n1, // #2 const basic_string& str, size_type pos2, size_type n2); basic_string& replace(size_type pos, size_type n1, const charT* s, // #3 size_type n2); basic_string& replace(size_type pos, size_type n1, const charT* s); // #4 basic_string& replace(size_type pos, size_type n1, size_type n2, // #5 charT c); replace( iter, iter, ... ) basic_string& replace(iterator i1, iterator i2, // #6 const basic_string& str); basic_string& replace(iterator i1, iterator i2, const charT* s, // #7 size_type n); basic_string& replace(iterator i1, iterator i2, const charT* s); // #8 basic_string& replace(iterator i1, iterator i2, // #9 size_type n, charT c); template // #10 basic_string& replace(iterator i1, iterator i2, InputIterator j1, InputIterator j2); This time, the two families' return types are consistent; that's a small pleasure. But, as with erase(), the two families' argument types are not consistent: one family is based on an offset and length, whereas the other is based on an iterator range. As with erase(), because we can convert between iterators and positions, we can easily implement one family in terms of the other. When considering which must be members, we want to choose the most flexible and fundamental version(s) as members and implement the rest in terms of those. Here are a few pitfalls one might encounter while doing this analysis, and how one might avoid them. Consider first the first family: -- One function (#2)? One might notice that the standard specifies all of the first family in terms of the #2 version. Unfortunately, some of the passthroughs would needlessly construct temporary basic_string objects, so we can't get by with #2 alone even for just the first family. The standard specifies the observable behavior, but the operational specification isn't necessarily the best way to actually implement the a given function. -- Two functions (#3 and #5)? One might notice that all but #5 in the first family can be implemented efficiently in terms of #3, but then #5 would still need to be special-cased to avoid needlessly creating a temporary string object (or its equivalent). Consider second the second family: -- One function (#6)? One might notice that the standard specifies all of the second family in terms of #6. Unfortunately, again, some of the passthroughs would needlessly construct temporary basic_string objects, so we can't get by with #6 alone even for just the second family. -- Three functions (#7, #9, #10)? One might notice that most of the functions in the second family can be implemented efficiently in terms of #7, except for #9 (for the same reason that made #5 the outcast in the first family, namely that there was no existing buffer with the correct contents to point to) and #10 (which cannot assume that iterators are pointers, or even for that matter basic_string::iterators!). -- Two functions (#9, #10)! One might then immediately notice that all but #9 in the second family can be implemented efficiently in terms of #10, including #7. In fact, assuming string contiguity and position/iterator convertability as we've already assumed, we could probably even handle all the members of the first family... aha! That's it. So it appears that the best we can do is two member functions upon which we can then build everything else as nonmember nonfriends. The members are the "iter, iter, num, char" and templated versions. The nonmembers are everything else. (Exercise for the reader: For each of the other eight versions, write sample implementations as efficient nonmember nonfriends.) Note that #10 well illustrates the power of templates -- this one function can be used to implement all but two of the others without any loss of efficiency, and to implement the remaining ones with what would probably be only a minor loss of efficiency (constructing a temporary basic_string containing n copies of a character). Time for another quick coffee break... Coffee Break #2: Copying copy() and substr() -------------------------------------------- Oh, copy(), schmopy(). Note that copy() is a somewhat unusual beast, and that its interface is inconsistent with the std::copy() algorithm. Note again the signature: size_type copy(charT* s, size_type n, size_type pos = 0) const; The function is const; it does not modify the string. Rather, what the string object has to do is copy part of itself (up to n characters, starting at position pos) and dump it into the target buffer (note, I deliberately did not say "C-style string"), s, which is required to be big enough -- if it's not, oh well, then the program will scribble onward into whatever memory happens to follow the string and get stuck somewhere in the Undefined Behavior swamp. And, better still, basic_string::copy() does not, repeat not, append a null object to the target buffer, which is what makes s not a C-style string (besides, charT doesn't need to be char; this function will copy into a buffer of whatever kind of characters the string itself is made of). It is also what makes copy() a dangerous function. Guideline: Never use functions that write to range-unchecked buffers (e.g., strcpy(), sprintf(), basic_string::copy()). They are not only crashes waiting to happen, but a clear and present security hazard -- buffer overrun attacks continue to be a perennially popular pastime for hackers and malware writers. All of the required work could be done pretty much as simply, and a lot more flexibly, just by using plain old std::copy(): string s = "0123456789"; char* buf1 = new char[5]; s.copy( buf1, 0, 5 ); // ok: buf will contain the chars // '0', '1', '2', '3', '4' copy( s.begin(), s.begin()+5, buf1 ); // ok: buf will contain the chars // '0', '1', '2', '3', '4' int* buf2 = new int[5]; s.copy( buf2, 0, 5 ); // error: first parameter is not char* copy( s.begin(), s.begin()+5, buf2 ); // ok: buf2 will contain the values // corresponding to '0', '1', '2', // '3', '4' (e.g., ASCII values) Incidentally, the above code also shows how basic_string::copy() can trivially be written as a nonmember nonfriend, most trivially in terms of the copy() algorithm -- another exercise for the reader, but do remember to handle the n == npos special case correctly. While we're taking a breather, let's knock off another simple one at the same time: substr(). Recall its signature: basic_string substr(size_type pos = 0, size_type n = npos) const; We'll neglect for the moment to comment on the irksome fact that this function chooses to take its parameters in the order "position, length", whereas the copy() we just considered takes the very same parameters in the order "length, position". Nor will we mention that, besides being aesthetically inconsistent, trying to remember which function takes the parameters in which order makes for an easy trap for users of basic_string to stumble into because both parameters also happen to be of the same type (size_type), worse luck, so that if the users get it wrong their code will continue to happily compile without any err