TITLE: Monolithic classes

(Source: comp.lang.c++.moderated, 8 Aug 2002)

----------------------------------------------------------------------------
-----
Clarification: the code in a C++ tip does not imply that I think the code 
  exhibits good programming practice or is of production quality, only that 
  there is worthwhile technical content. For past tips, see 
  http://cpptips.hyperformix.com   -Allan

"Need a technical writer?  See our corporate sponsor at
http://www.prospring.net"  
----------------------------------------------------------------------------
-----
 
SUTTER: Herb Sutter (hsutter@gotw.ca)

  -------------------------------------------------------------------
    Guru of the Week problems and solutions are posted regularly on
     news:comp.lang.c++.moderated. For past problems and solutions
       see the GotW archive at www.GotW.ca. (c) 2002 H.P.Sutter
             News archives may keep copies of this article.
  -------------------------------------------------------------------

______________________________________________________________________

GotW #84:   Monoliths "Unstrung"

Difficulty: 3 / 10
______________________________________________________________________


[Wow, at nearly 7,200 words this is the longest GotW ever! Whew. -hps]


Avoid Unduly Monolithic Designs
-------------------------------

 >1. What is a "monolithic" class, and why is it bad? Explain.

The word "monolithic" is used to describe software that is a single
big, indivisible piece, like a monolith.  The word "monolith" comes
from the words "mono" (one) and "lith" (stone) whose vivid image of a
single solid and heavy stone well illustrates the heavyweight and
indivisible nature of such code.

A single heavyweight facility that thinks it does everything is often
a dead end.  After all, often a single big heavyweight facility
doesn't really do more -- it often does less, because the more complex
it is, the narrower its application and relevance is likely to be.

In particular, a class might fall into the monolith trap by trying to
offer its functionality through member functions instead of nonmember
functions, even when nonmember nonfriend functions would be possible
and at least as good.  This has at least two drawbacks:

    -- (Major) It isolates potentially independent functionality inside a
       class.  The operation in question might otherwise be nice to use
       with other types, but because it's hardwired into a particular
       class that won't be possible, whereas if it were exposed as a
       nonmember function template it could be more widely usable.

    -- (Minor) It can discourage extending the class with new
       functionality.  "But wait!" someone might say.  "It doesn't
       matter whether the class's existing interface leans toward
       member or nonmember functions, because I can equally well extend
       it with my own new nonmember functions either way!"  That's
       technically true, and misses the semantic point:  If the class's
       built-in functionality is offered mainly (or only) via member
       functions and therefore present that as the class's natural
       idiom, then the extended nonmember functions cannot follow the
       original natural idiom and will always remain visibly
       second-class johnny-come-latelies.  That the class presents its
       functions as members is a semantic statement to its users, who
       will be used to the member syntax that extenders of the class
       cannot use.  (Do we really want to go out of our way to make our
       users commonly have to look up which functions are members and
       which ones aren't?  That already happens often enough even in
       better designs.)

Where it's practical, break generic components down into pieces.

    Guideline:  Prefer "one class (or function), one responsibility."

    Guideline:  Where possible, prefer writing functions as nonmember
                nonfriends.

The rest of this article will go further toward justifying the latter
guideline.


The Basics of Strings
---------------------

 >2. List all the member functions of std::basic_string.

It's, ahem, a fairly long list.  Counting constructors, there are no
fewer than 103 member functions.  Really.  If that's not monolithic,
it's hard to imagine what would be.

Imagine yourself in an underground cavern, at the edge of a
subterranean lake.  There is a submerged channel that leads to the
next air-filled cavern.  You prepare to dive into the black water and
swim through the flooded tunnel...

Take a deep breath, and repeat after ISO/IEC 14882:1998(E) [1]:

     namespace std {
       template<class charT, class traits = char_traits<charT>,
                class Allocator = allocator<charT> >
       class basic_string {

         // ... some typedefs ...

         explicit basic_string(const Allocator& a = Allocator());
         basic_string(const basic_string& str, size_type pos = 0,
                      size_type n = npos, const Allocator& a = Allocator());
         basic_string(const charT* s,
                      size_type n, const Allocator& a = Allocator());
         basic_string(const charT* s, const Allocator& a = Allocator());
         basic_string(size_type n, charT c, const Allocator& a =
Allocator());
         template<class InputIterator>
           basic_string(InputIterator begin, InputIterator end,
                        const Allocator& a = Allocator());
        ~basic_string();
         basic_string& operator=(const basic_string& str);
         basic_string& operator=(const charT* s);
         basic_string& operator=(charT c);

         iterator       begin();
         const_iterator begin() const;
         iterator       end();
         const_iterator end() const;

         reverse_iterator       rbegin();
         const_reverse_iterator rbegin() const;
         reverse_iterator       rend();
         const_reverse_iterator rend() const;

         size_type size() const;
         size_type length() const;
         size_type max_size() const;
         void resize(size_type n, charT c);
         void resize(size_type n);
         size_type capacity() const;
         void reserve(size_type res_arg = 0);
         void clear();
         bool empty() const;

         const_reference operator[](size_type pos) const;
         reference       operator[](size_type pos);
         const_reference at(size_type n) const;
         reference       at(size_type n);

         basic_string& operator+=(const basic_string& str);
         basic_string& operator+=(const charT* s);
         basic_string& operator+=(charT c);
         basic_string& append(const basic_string& str);
         basic_string& append(const basic_string& str, size_type pos,
                              size_type n);
         basic_string& append(const charT* s, size_type n);
         basic_string& append(const charT* s);
         basic_string& append(size_type n, charT c);
         template<class InputIterator>
           basic_string& append(InputIterator first, InputIterator last);
         void push_back(const charT);

         basic_string& assign(const basic_string&);
         basic_string& assign(const basic_string& str, size_type pos,
                              size_type n);
         basic_string& assign(const charT* s, size_type n);
         basic_string& assign(const charT* s);
         basic_string& assign(size_type n, charT c);
         template<class InputIterator>
           basic_string& assign(InputIterator first, InputIterator last);

         basic_string& insert(size_type pos1, const basic_string& str);
         basic_string& insert(size_type pos1, const basic_string& str,
                              size_type pos2, size_type n);
         basic_string& insert(size_type pos, const charT* s, size_type n);
         basic_string& insert(size_type pos, const charT* s);
         basic_string& insert(size_type pos, size_type n, charT c);
         iterator insert(iterator p, charT c);
         void     insert(iterator p, size_type n, charT c);
         template<class InputIterator>
           void insert(iterator p, InputIterator first, InputIterator last);

You break surface at a small air pocket and gasp. Don't give up --
we're halfway there! Another deep breath, and:

         basic_string& erase(size_type pos = 0, size_type n = npos);
         iterator erase(iterator position);
         iterator erase(iterator first, iterator last);

         basic_string& replace(size_type pos1, size_type n1,
                               const basic_string& str);
         basic_string& replace(size_type pos1, size_type n1,
                               const basic_string& str,
                               size_type pos2, size_type n2);
         basic_string& replace(size_type pos, size_type n1, const charT* s,
                               size_type n2);
         basic_string& replace(size_type pos, size_type n1, const charT* s);
         basic_string& replace(size_type pos, size_type n1, size_type n2,
                               charT c);

         basic_string& replace(iterator i1, iterator i2,
                               const basic_string& str);
         basic_string& replace(iterator i1, iterator i2, const charT* s,
                               size_type n);
         basic_string& replace(iterator i1, iterator i2, const charT* s);
         basic_string& replace(iterator i1, iterator i2,
                               size_type n, charT c);
         template<class InputIterator>
           basic_string& replace(iterator i1, iterator i2,
                                 InputIterator j1, InputIterator j2);

         size_type copy(charT* s, size_type n, size_type pos = 0) const;
         void swap(basic_string<charT,traits,Allocator>&);

         const charT* c_str() const;         //  explicit
         const charT* data() const;
         allocator_type get_allocator() const;

         size_type find (const basic_string& str, size_type pos = 0) const;
         size_type find (const charT* s, size_type pos, size_type n) const;
         size_type find (const charT* s, size_type pos = 0) const;
         size_type find (charT c, size_type pos = 0) const;
         size_type rfind(const basic_string& str, size_type pos = npos)
const;
         size_type rfind(const charT* s, size_type pos, size_type n) const;
         size_type rfind(const charT* s, size_type pos = npos) const;
         size_type rfind(charT c, size_type pos = npos) const;

         size_type find_first_of(const basic_string& str,
                                 size_type pos = 0) const;
         size_type find_first_of(const charT* s,
                                 size_type pos, size_type n) const;
         size_type find_first_of(const charT* s, size_type pos = 0) const;
         size_type find_first_of(charT c, size_type pos = 0) const;
         size_type find_last_of (const basic_string& str,
                                 size_type pos = npos) const;
         size_type find_last_of (const charT* s,
                                 size_type pos, size_type n) const;
         size_type find_last_of (const charT* s, size_type pos = npos)
const;
         size_type find_last_of (charT c, size_type pos = npos) const;

         size_type find_first_not_of(const basic_string& str,
                                     size_type pos = 0) const;
         size_type find_first_not_of(const charT* s, size_type pos,
                                     size_type n) const;
         size_type find_first_not_of(const charT* s, size_type pos = 0)
const;
         size_type find_first_not_of(charT c, size_type pos = 0) const;
         size_type find_last_not_of (const basic_string& str,
                                     size_type pos = npos) const;
         size_type find_last_not_of (const charT* s, size_type pos,
                                     size_type n) const;
         size_type find_last_not_of (const charT* s,
                                     size_type pos = npos) const;
         size_type find_last_not_of (charT c, size_type pos = npos) const;

         basic_string substr(size_type pos = 0, size_type n = npos) const;
         int compare(const basic_string& str) const;
         int compare(size_type pos1, size_type n1,
                     const basic_string& str) const;
         int compare(size_type pos1, size_type n1,
                     const basic_string& str,
                     size_type pos2, size_type n2) const;
         int compare(const charT* s) const;
         int compare(size_type pos1, size_type n1,
                     const charT* s, size_type n2 = npos) const;
       };
     }

Whew -- 103 member functions or member function templates!  The rocky
tunnel expands and we break through a new lake's surface just in time.
Somewhat waterlogged but better persons for the experience, we look
around inside the new cavern to see if the exercise has let us
discover anything interesting.

Was the recitation worth it?  With the drudge work now behind us,
let's look back at what we've just traversed and use the information
to analyze Question 3:


Membership Has Its Rewards -- and Its Costs
-------------------------------------------

 >3. Which ones should be members, and which should not? Why?

Recall the advice from #1:  Where it's practical, break generic
components down into pieces.

    Guideline:  Prefer "one class (or function), one responsibility."

    Guideline:  Where possible, prefer writing functions as nonmember
                nonfriends.

For example, if you write a string class and make searching,
pattern-matching, and tokenizing available as member functions, you've
hardwired those facilities so that they can't be used with any other
kind of sequence.  (If this frank preamble is giving you an
uncomfortable feeling about basic_string, well and good.)  On the
other hand, a facility that accomplishes the same goal but is composed
of several parts that are also independently usable is often a better
design.  In this example, it's often best to separate the algorithm
from the container, which is what the STL does most of the time.

I [2,3] and Scott Meyers [4] have written before on why some nonmember
functions are a legitimate part of a type's interface, and why
nonmember nonfriends should be preferred (among other reasons, to
improve encapsulation).  For example, as Scott wrote in his opening
words of the cited article:

    I'll start with the punchline:  If you're writing a function that
    can be implemented as either a member or as a non-friend non-member,
    you should prefer to implement it as a non-member function.  That
    decision increases class encapsulation.  When you think
    encapsulation, you should think non-member functions.  [4]

So when we consider the functions that will operate on a basic_string
(or any other class type), we want to make them nonmember nonfriends
if reasonably possible.  Hence, here are some questions to ask about
the members of basic_string in particular:

    -- Always make it a member if it has to be one:  Which operations
       must be members, because the C++ language just says so (e.g.,
       constructors) or because of functional reasons (e.g., they must
       be virtual)?  If they have to be, then oh well, they just have
       to be; case closed.

    -- Prefer to make it a member if it needs access to internals:
       Which operations need access to internal data we would otherwise
       have to grant via friendship?  These should normally be members.
       (There are some rare exceptions such as operations needing
       conversions on their left-hand arguments and some like
       operator<<() whose signatures don't allow the *this reference to
       be their first parameters; even these can normally be nonfriends
       implemented in terms of (possibly virtual) members, but
       sometimes doing that is merely an exercise in contortionism and
       they're best and naturally expressed as friends.)

    -- In all other cases, prefer to make it a nonmember nonfriend:
       Which operations can work well as nonmember nonfriends?  These
       can, and therefore normally should, be nonmembers.  This should
       be the default case to strive for.

It's important to ask these and related questions because we want to
put as many functions into the last bucket as possible.


Operations That Must Be Members
-------------------------------

As a first pass, let's sift out those operations that just have to be
members.  There are some obvious ones at the beginning of the list.

    constructors (6)
    destructor
    assignment operators (3)
    [] operators (2)

Clearly the above functions must be members.  It's impossible to write
a constructor, destructor, assignment operator, or [] operator in any
other way!

12 functions down, 91 to go...


Operations That Should Be Members
---------------------------------

We asked:  Which operations need access to internal data we would
otherwise have to grant via friendship?  Clearly these have a reason
to be members, and normally ought to be.  This list includes the
following, some of which provide indirect access to (e.g., begin()) or
change (e.g., reserve()) the internal state of the string:

    begin (2)
    end (2)
    rbegin (2)
    rend (2)
    size
    max_size
    capacity
    reserve
    swap
    c_str
    data
    get_allocator

The above ought to be members not only because they're tightly bound
to basic_string, but they also happen to form the public interface
that nonmember nonfriend functions will need to use.  Sure, you could
implement these as nonmember friends, but why?

There are a few more that I'm going to add to this list as fundamental
string operations:

    insert (three-parameter version)
    erase (1 -- the "iter, iter" version)
    replace (2 -- the "iter, iter, num, char" and templated versions)

We'll return to the question of insert(), erase(), and replace() a
little later.  For replace() in particular, it's important to be able
to choose well and make the most flexible and fundamental version(s)
into member(s).


Into the Fray: Possibly-Contentious Operations
                That Can Be Nonmember Nonfriends
-----------------------------------------------

First in this section, allow me to perform a stunning impersonation of
a lightning rod by pointing out that all of the following functions
have something fundamental in common, to wit:  Each one could
obviously as easily -- and as efficiently -- be a nonmember nonfriend.

    at (2)
    clear
    empty
    length

Sure they can be nonmember nonfriends, that's obvious, no sweat.  Of
course, the above functions also happen to have something else pretty
fundamental in common:  They're mentioned in the standard library's
container and sequence requirements, as member functions.  Hence the
lightning-rod effect...

"Yeah, wait!"  I can already hear some standardiste-minded people
saying, heading in my direction and resembling the beginnings of a
brewing lynch mob.  "Not so fast!  Don't you know that basic_string is
designed to meet the C++ Standard's container and sequence
requirements, and those requirements require or suggest that some of
those functions be members?  So quit misleading the readers!  They're
members -- live with it!"  Indeed, and true, but for the sake of this
discussion I'm going to waive those requirements with a dismissive
wave of a hand and a quick escape across some back yards past small
barking dogs and various clothes drying on the line.

Having left my pursuers far enough behind to resume reasoned
discourse, here's the point:  For once, the question I'm considering
here isn't what the container requirements say.  It's which functions
can without loss of efficiency be made nonmember nonfriends, and
whether there's any additional benefit to be gained from doing so.  If
those benefits exist for something the container requirements say must
be a member, well, why not point out that the container requirements
could be improved while we're at it?  And so we shall...

Take empty() as a case in point.  Can we implement it as a nonmember
nonfriend?  Sure... the standard itself requires the following
behavior of basic_string::empty(), in the C++ Standard, subclause
21.3.3, paragraph 14:

    Returns: size() == 0.

Well, now, that's pretty easy to write as a nonmember without loss of
efficiency:

    template<class charT, class traits, class Allocator>
    bool empty( const basic_string<charT, traits, Allocator>& s )
    {
      return s.size() == 0;
    }

Notice that, while we can make size() a member and implement a
nonmember empty() in terms of it, we could not do the reverse.  In
several cases here there's a group of related functions, and perhaps
more than one could be nominated as a member and the others
implemented in terms of that one as nonmembers.  Which function should
we nominate to be the member?  My advice is to choose the most
flexible one that doesn't force a loss of efficiency -- that will be
the enabling flexible foundation on which the others can be built.  In
this case, we choose size() as the member because its result can
always be cached (indeed, the standard encourages that it be cached
because size() "should" run in constant time), in which case an
empty() implemented only in terms of size() is no less efficient than
anything we could do with full direct access to the string's internal
data structures.

For another case in point, what about at()?  The same reasoning
applies.  For both the const and non-const versions, the standard
requires the following:

    Throws: out_of_range if pos >= size().

    Returns: operator[]( pos ).

That's easy to provide as a nonmember, too.  Each is just a two-line
function template, albeit a bit syntactically tedious because of all
those template parameters and nested type names -- and remember all
your typenames!  Here's the code:

    template<class charT, class traits, class Allocator>
    typename basic_string<charT, traits, Allocator>::const_reference
    at( const basic_string<charT, traits, Allocator>& s,
        typename basic_string<charT, traits, Allocator>::size_type pos )
    {
      if( pos >= s.size() ) throw out_of_range( "don't do that" );
      return s[pos];
    }

    template<class charT, class traits, class Allocator>
    typename basic_string<charT, traits, Allocator>::reference
    at( basic_string<charT, traits, Allocator>& s,
        typename basic_string<charT, traits, Allocator>::size_type pos )
    {
      if( pos >= s.size() ) throw out_of_range( "I said, don't do that" );
      return s[pos];
    }

What about clear()?  Easy, that's the same as erase(begin(),end()).
No fuss, no muss.  Exercise for the reader, and all that.

What about length()?  Easy, again -- it's defined to give the same
result as size().  What's more, note that the other containers don't
have length(), and it's there in the basic_string interface as a sort
of "string thing", but by making it a nonmember suddenly we can
consisently say "length()" about any container.  Not too useful in
this case because it's just a synonym for size(), I grant you, but a
noteworthy point in the principle it illustrates -- making algorithms
nonmembers immediately also makes them more widely useful and usable.

In summary, let's consider the benefits and drawbacks of providing
functions like at() and empty() as members vs. nonmembers.  First,
just because we can write these members as nonmembers (and nonfriends)
without loss of efficiency, why should we do so?  What are the actual
or potential benefits?  Here are several:

    -- Simplicity.  Making them nonmembers lets us write and maintain
       less code.  We can write empty() just once and be done with it
       forever.  Why write it many times as basic_string::empty() and
       vector::empty() and list::empty() and so forth, including
       writing them over and over again for each new STL-compliant
       container that comes along in third-party or internal libraries
       and even in future versions of the C++ standard?  Write it once,
       be done, move on.

    -- Consistency.  It avoids gratuitous incompatibilities between the
       member algorithms of different containers, and between the
       member and nonmember versions of algorithms (some existing
       inconsistencies between members and nonmembers are pointed out
       in [5]).  If customized behavior is needed, then specialization
       or overloading of the nonmember function templates should be
       able to accommodate it.

    -- Encapsulation.  It improves encapsulation (as argued by Meyers
       strongly and at length in [4]).

Having said that, what are the potential drawbacks?  Here are two,
although in my opinion they are outweighed by the above advantages:

    -- Consistency.  You could argue that keeping things like empty()
       as members follows the principle of least surprise -- similar
       functions are members, and in other class libraries things like
       IsEmpty() functions are commonly members, after all.  I think
       this argument is valid, but weakened when we notice that this
       wouldn't be surprising at all if people were in the habit of
       following Meyers' advice in the first place, routinely writing
       functions as nonmembers whenever reasonably possible.  So the
       question really comes down to whether we ought to trade away
       real benefits in order to follow a tradition, or to change the
       tradition to get real benefits.  (Of course, if we didn't
       already have size(), then implementing empty() in particular as
       a nonmember nonfriend would not be possible.  The class's public
       member functions do have to provide the necessary and sufficient
       functionality already.)  I think this argument is weak because
       writing them as nonmembers yields a greater consistency, as
       noted above, than the questionable inconsistency being claimed
       here.

    -- Namespace pollution.  Because empty() is such a common name,
       putting it at namespace scope risks namespace pollution -- after
       all, will every function named empty() want to have exactly
       these semantics?  This argument is also valid to a point, but
       weakened in two ways:  First, by noting that encouraging
       consistent semantics for functions is a Good Thing; and second,
       by noticing that overload resolution has turned out to be a very
       good tool for disambiguation, and namespace pollution has never
       been as big a problem as some have made it out to be in the
       past.  Really, by putting all the names in one place and sharing
       implementations, we're not polluting that one namespace as much
       as we're sanitizing the functions by gathering them up and
       helping to avoid the gratuitous and needless incompatibilities
       that Meyers mentions (see above).  With perhaps these more
       contentious choices out of the way, then, let's continue with
       other operations that can be nonmember nonfriends.  Some, like
       those listed above, are mentioned in the container requirements
       as members; again, here we're considering not what the standard
       says we must do, but rather what given a blank page we might
       choose to do in designing these as members vs. nonmember
       nonfriends.


More Operations That Can Be Nonmember Nonfriends
------------------------------------------------

Further, all of the remaining functions can be implemented as
nonmember nonfriends:

    resize (2)
    assign (6)
    += (3)
    append (6)
    push_back
    insert (7 -- all but the three-parameter version)
    erase (2 -- all but the "iter, iter" version)
    replace (8 -- all but the "iter, iter, num, char" and templated
versions)
    copy
    substr
    compare (5)
    find (4)
    rfind (4)
    find_first_of (4)
    first_last_of (4)
    find_first_not_of (4)
    find_last_not_of (4)

Consider first resize():

        void resize(size_type n, charT c);
        void resize(size_type n);

Can each resize() be a nonmember nonfriend?  Sure it can, because it
can be implemented in terms of basic_string's public interface without
loss of efficiency.  Indeed, the standard's own functional
specifications express both versions in terms of the functions we've
already considered above.  For example:

    template<class charT, class traits, class Allocator>
    void resize( basic_string<charT, traits, Allocator>& s,
                 typename Allocator::size_type n, charT c)
    {
      if( n > s.max_size() ) throw length_error( "won't fit" );
      if( n <= s.size() )
      {
        basic_string<charT, traits, Allocator> temp( s, 0, n );
        s.swap( temp );
      }
      else
      {
        s.append( n - s.size(), c );
      }
    }

    template<class charT, class traits, class Allocator>
    void resize( basic_string<charT, traits, Allocator>& s,
                 typename Allocator::size_type n )
    {
      resize( s, n, charT() );
    }

Next, assign():  We have six, count 'em, six forms of assign().
Fortunately, this case is simple:  most of them are already specified
in terms of each other, and all can be implemented in terms of a
constructor and operator=() combination.

Next, operator+=(), append(), and push_back():  What about all those
pesky flavors of appending operations, namely the three forms of
operator+=(), the six-count'em-six forms of append(), and the lone
push_back()?  Just the similarity ought to alert us to the fact that
probably at most one needs to be a member; after all, they're doing
about the same thing, even if the details differ slightly, for example
appending a character in one case, a string in another case, and an
iterator range in still another case.  Indeed, as it turns out, all of
them can likewise be implemented as nonmember nonfriends without loss
of efficiency:

    -- Clearly operator+=() can be implemented in terms of append(),
       because that's how it's specified in the C++ standard.

    -- Equally clearly, five of the six versions of append() can be
       nonmember nonfriends because they are specified in terms of the
       three-parameter version of append(), and that in turn can be
       implemented in terms of insert(), all of which quite closes the
       append() category.

    -- Determining the status of push_back() takes only slightly more
       work, because its operational semantics aren't specified in the
       basic_string section, but only in the container requirements
       section, and there we find the specification that a.push_back(x)
       is just a synonym for a.insert(a.end(),x).

What's the linchpin holding all the members of this group together as
valid nonmember nonfriends?  It's insert(), hence insert()'s status as
a good choice to be the member that does the work and encapsulates the
append-related access to the string's internals in one place, instead
of spreading the internal access all over the place in a dozen
different functions.

Next, insert():  For those of you who might think that
six-count'em-six versions of assign() and six-count'em-six versions of
append() might have been a little much, those were just the warmup...
now we consider the eight-count'em-eight versions of insert().  Above,
I've already nominated the three-parameter version of insert() as a
member, and now it's time to justify why.  First, as noted above,
insert() is a more general operation than append(), and having a
member insert() allowed all the append() operations to be nonmember
nonfriends; if we didn't have at least one insert() as a member, then
at least one of the append()s would have had to be a member, and so I
chose to nominate the more fundamental and flexible operation.  But we
have eight-count'em-eight flavors of insert() -- which one (or more)
ought to be the member(s)?  Five of the eight forms of insert() are
already specified in terms of the three-parameter form, and the others
can also be implemented efficiently in terms of the three-parameter
form, so we only need that one form to be a member.  The others can
all be nonmember nonfriends.

For those of you who might think that the either-count'em-eight
versions of insert() take the cake, well, that was warmup too.  In a
moment we'll consider the ten-count'em-ten forms of replace().  Before
we attempt those, though, let's take a short break to tackle an easier
function first, because it turns out that erase() is instructive in
building up to dealing with replace()...


Coffee Break (Sort Of): Erasing erase()
---------------------------------------

Next erase():  After talking about the total 30-count'em-30 flavors of
assign(), append(), insert(), and replace() -- and having dealt with
20 of the 30 already above -- you will be relieved to know that there
are only three forms of erase(), and that only two of them belong in
this section.  After what we just went through for the others, this is
like knocking off for a well-deserved coffee break...

The troika of erase() members is a little interesting.  At least one
of these erase() functions must be a member (or friend), there being
no other way to erase efficiently using the other already-mentioned
member functions alone.  There are actually two "families" of erase()
functions:

    erase( pos, length )
        basic_string& erase(size_type pos = 0, size_type n = npos);

    erase( iter, ... )
        iterator erase(iterator position);
        iterator erase(iterator first, iterator last);

First, notice that the two families' return types are not consistent:
the first version returns a reference to the string, whereas the other
two return iterators pointing immediately after the erased character
or range.  Second, notice that the two families' argument types are
not consistent:  the first version takes an offset and length, whereas
the other two take an iterator or iterator range; fortunately, we can
convert from iterators to offsets via pos = iter - begin() and from
offsets to iterators via iter = begin() + pos.

(Aside:  The standard does not require, but an implementer can choose,
that basic_string objects store their data in a contiguous charT array
buffer in memory.  If they do, then the conversion from iterators to
positional offsets and vice versa clearly need incur no overhead.  (I
would argue that even segmented storage schemes could provide for very
efficient conversion back and forth between offsets and iterators
using only the container's and iterator's public interfaces.)

This allows the first two forms to be expressed in terms of the third
(again, remember your typenames and qualifications!):

    template<class charT, class traits, class Allocator>
    basic_string<charT, traits, Allocator>&
    erase( basic_string<charT, traits, Allocator>& s,
           typename Allocator::size_type pos = 0,
           typename Allocator::size_type n =
             basic_string<charT, traits, Allocator>::npos )
    {
      if( pos > s.size() ) throw out_of_range( "yes, we have no bananas" );
      typename basic_string<charT, traits, Allocator>::iterator
        first = s.begin()+pos,
        last = n == basic_string<charT, traits, Allocator>::npos
               ? s.end() : first + n;
      s.erase( first, last );
      return s;
    }

    template<class charT, class traits, class Allocator>
    typename basic_string<charT, traits, Allocator>::iterator
    erase( basic_string<charT, traits, Allocator>& s,
           typename basic_string<charT, traits, Allocator>::iterator
position )
    {
      return s.erase( position, s.end() );
    }

OK, coffee break's over...


Back to Work: Replacing replace()
---------------------------------

Next, replace():  Truth be told, the ten-count'em-ten replace()
members are less interesting than they are tedious and exhausting.

At least one of these replace() functions must be a member (or
friend), there being no other way to replace efficiently using the
other already-mentioned member functions alone.  In particular, note
that you can't efficiently implement replace() in terms erase()
followed by insert() or vice versa because both ways would require
more character shuffling, and possibly buffer reallocation.

Note again that we have two families of replace() functions:

    replace( pos, length, ... )
        basic_string& replace(size_type pos1, size_type n1,
// #1
      const basic_string& str);
        basic_string& replace(size_type pos1, size_type n1,
// #2
      const basic_string& str,
      size_type pos2, size_type n2);
        basic_string& replace(size_type pos, size_type n1, const charT* s,
// #3
      size_type n2);
        basic_string& replace(size_type pos, size_type n1, const charT* s);
// #4
        basic_string& replace(size_type pos, size_type n1, size_type n2,
// #5
      charT c);

    replace( iter, iter, ... )
        basic_string& replace(iterator i1, iterator i2,
// #6
      const basic_string& str);
        basic_string& replace(iterator i1, iterator i2, const charT* s,
// #7
      size_type n);
        basic_string& replace(iterator i1, iterator i2, const charT* s);
// #8
        basic_string& replace(iterator i1, iterator i2,
// #9
      size_type n, charT c);
        template<class InputIterator>
// #10
          basic_string& replace(iterator i1, iterator i2,
        InputIterator j1, InputIterator j2);

This time, the two families' return types are consistent; that's a
small pleasure.  But, as with erase(), the two families' argument
types are not consistent:  one family is based on an offset and
length, whereas the other is based on an iterator range.  As with
erase(), because we can convert between iterators and positions, we
can easily implement one family in terms of the other.

When considering which must be members, we want to choose the most
flexible and fundamental version(s) as members and implement the rest
in terms of those.  Here are a few pitfalls one might encounter while
doing this analysis, and how one might avoid them.  Consider first the
first family:

    -- One function (#2)?  One might notice that the standard specifies
       all of the first family in terms of the #2 version.
       Unfortunately, some of the passthroughs would needlessly
       construct temporary basic_string objects, so we can't get by
       with #2 alone even for just the first family.  The standard
       specifies the observable behavior, but the operational
       specification isn't necessarily the best way to actually
       implement the a given function.

    -- Two functions (#3 and #5)?  One might notice that all but #5 in
       the first family can be implemented efficiently in terms of #3,
       but then #5 would still need to be special-cased to avoid
       needlessly creating a temporary string object (or its
       equivalent).

Consider second the second family:

    -- One function (#6)?  One might notice that the standard specifies
       all of the second family in terms of #6.  Unfortunately, again,
       some of the passthroughs would needlessly construct temporary
       basic_string objects, so we can't get by with #6 alone even for
       just the second family.

    -- Three functions (#7, #9, #10)?  One might notice that most of
       the functions in the second family can be implemented
       efficiently in terms of #7, except for #9 (for the same reason
       that made #5 the outcast in the first family, namely that there
       was no existing buffer with the correct contents to point to)
       and #10 (which cannot assume that iterators are pointers, or
       even for that matter basic_string::iterators!).

    -- Two functions (#9, #10)!  One might then immediately notice that
       all but #9 in the second family can be implemented efficiently
       in terms of #10, including #7.  In fact, assuming string
       contiguity and position/iterator convertability as we've already
       assumed, we could probably even handle all the members of the
       first family... aha!  That's it.

So it appears that the best we can do is two member functions upon
which we can then build everything else as nonmember nonfriends.  The
members are the "iter, iter, num, char" and templated versions.  The
nonmembers are everything else.  (Exercise for the reader:  For each
of the other eight versions, write sample implementations as efficient
nonmember nonfriends.)

Note that #10 well illustrates the power of templates -- this one
function can be used to implement all but two of the others without
any loss of efficiency, and to implement the remaining ones with what
would probably be only a minor loss of efficiency (constructing a
temporary basic_string containing n copies of a character).

Time for another quick coffee break...


Coffee Break #2: Copying copy() and substr()
--------------------------------------------

Oh, copy(), schmopy().  Note that copy() is a somewhat unusual beast,
and that its interface is inconsistent with the std::copy() algorithm.
Note again the signature:

        size_type copy(charT* s, size_type n, size_type pos = 0) const;

The function is const; it does not modify the string.  Rather, what
the string object has to do is copy part of itself (up to n
characters, starting at position pos) and dump it into the target
buffer (note, I deliberately did not say "C-style string"), s, which
is required to be big enough -- if it's not, oh well, then the program
will scribble onward into whatever memory happens to follow the string
and get stuck somewhere in the Undefined Behavior swamp.  And, better
still, basic_string::copy() does not, repeat not, append a null object
to the target buffer, which is what makes s not a C-style string
(besides, charT doesn't need to be char; this function will copy into
a buffer of whatever kind of characters the string itself is made of).
It is also what makes copy() a dangerous function.

    Guideline:  Never use functions that write to range-unchecked
       buffers (e.g., strcpy(), sprintf(), basic_string::copy()).  They
       are not only crashes waiting to happen, but a clear and present
       security hazard -- buffer overrun attacks continue to be a
       perennially popular pastime for hackers and malware writers.

All of the required work could be done pretty much as simply, and a
lot more flexibly, just by using plain old std::copy():

    string s = "0123456789";

    char* buf1 = new char[5];
    s.copy( buf1, 0, 5 );                 // ok: buf will contain the chars
                                          //     '0', '1', '2', '3', '4'
    copy( s.begin(), s.begin()+5, buf1 ); // ok: buf will contain the chars
                                          //     '0', '1', '2', '3', '4'

    int* buf2 = new int[5];
    s.copy( buf2, 0, 5 );                 // error: first parameter is not
char*
    copy( s.begin(), s.begin()+5, buf2 ); // ok: buf2 will contain the
values
                                          //     corresponding to '0', '1',
'2',
                                          //     '3', '4' (e.g., ASCII
values)

Incidentally, the above code also shows how basic_string::copy() can
trivially be written as a nonmember nonfriend, most trivially in terms
of the copy() algorithm -- another exercise for the reader, but do
remember to handle the n == npos special case correctly.

While we're taking a breather, let's knock off another simple one at
the same time:  substr().  Recall its signature:

        basic_string substr(size_type pos = 0, size_type n = npos) const;

We'll neglect for the moment to comment on the irksome fact that this
function chooses to take its parameters in the order "position,
length", whereas the copy() we just considered takes the very same
parameters in the order "length, position".  Nor will we mention that,
besides being aesthetically inconsistent, trying to remember which
function takes the parameters in which order makes for an easy trap
for users of basic_string to stumble into because both parameters also
happen to be of the same type (size_type), worse luck, so that if the
users get it wrong their code will continue to happily compile without
any err