Operators for Large Data Structures in C++ (Revisited)

Let’s say you’ve got a non-trivial data structure; a bit map, a 4x4 matrix, or a fixed array of some sort. You want to be able to treat it as though it were a simple value and provide arithmetic operator overloading, but without tearing your hair out figuring how to avoid passing by value.

Looking back

Francis Glassborow, decrying textbook methods, provided one way in EXE April 94, and in belated response I wrote a letter (EXE September 94) defending my recollection of the textbook version. While reflecting that perhaps ‘elegance’ is a word programmers should use at their peril, I decided that I might as well turn a hint within my letter into an article, describing how to take the ‘textbook’ method further. But first, a few points about the example I supplied with my original letter.

1) The following line of code would break my assignment operator. However, my habit of resetting freed pointers proves to be a wise one, as the code, in taking a reference through a null pointer, should always cause a trappable run-time error, rather than usually appearing to work (only sometimes crashing).

CSimple a; a=a;

This is because my assignment operator deletes storage of the LHS before copying the RHS. When they refer to the same storage then the RHS is effectively invalid before it can be copied. The moral of this is to copy before delete.

Incidentally, in referring to one of my textbooks, you may understand why I made this mistake. Stanley B. Lippman on page 254 of his book, C++ Primer, also deletes the LHS before copying the RHS, in his ‘String::operator=’ member function.

2) The assignment operator as I implemented it may not be what is desired. This is because new storage is created for the object’s new value, rather than copying the new value into the existing storage. This would tend to preclude the implementation of a CLarge& conversion operator (because the reference would not remain valid for very long).

When developing a simple container class, it is therefore important to consider whether the contained value must occupy the same storage for the lifetime of the container. If this is the case then you should use the contained classes’ assignment operator.

3) Some compilers optimise the passing of huge data structures by value anyway. They do this in a similar way, by passing by reference and only creating a new copy when an attempt to modify it is made (or its address is taken, say). This is one of the reasons why the volatile specifier is provided, to let the compiler know that a copy must always be immediately made, as the variable risks being changed by another process at any time.

But, we’re getting into that thorny dilemma of who’s responsible for efficiency - the programmer or compiler? My own thoughts are that the compiler should not be an excuse for clumsy code. The programmer should never need to know how the compiler turns high-level code into low-level code, and therefore should write according to implicit behaviour of the language’s manipulation of data structures. You never know when you may have to use a compiler (or assembler) that does not provide particular optimisations.

So, I wouldn’t be lazy and ignore the apparent code improvements that can be made, even if it provides negligible run-time performance benefits. The important thing is that a compiler should not concentrate on providing performance through optimising lousy source, but on efficient coding of efficient source.

4) There is a little typographical error in the example: p_pLarge should be p_Large. Well, I’d hate to disprove the rule that says no code is bug free. You’ll be glad to hear that I’ve actually compiled my examples in this article.

Further advances

In my letter I suggested “For the more advanced, you can introduce an instance counter to avoid replicating unchanged instances of the large object. Each copy increments the counter, each destruction decrements the counter, causing deletion on zero. Obviously any need to change the large object can only go ahead if it's the only instance, otherwise a copy must be made.”

Well, if you’ve already done your homework you can compare your efforts with mine, because I now present my solution, which should save all the armchair programmers out there a bit of effort.

Most large data structures, especially those of variable length, can only be feasibly handled by dynamic allocation. A general purpose string class, for example, screams out for this approach - any other would soon end in tears. Developing a string class is a classic exercise that every programmer should have attempted, because it automatically encourages the correct approach to similar problems such as developing a matrix class. Classes developed to contain such data structures tend to consist of pointers to the dynamically allocated storage, dimensional details and supplementary information. Typically, there is such overhead in manipulating the data that object passing overheads are not very significant.

If you’ve written a string class, it’s unlikely you’d have any difficulty in developing a class to handle large data structures, but if you haven’t, or you’d like to check you wouldn’t have missed anything, read on.

In figures 1 and 2 we have the large class, appropriately named CLarge (only consisting of a single integer in order to reduce example code size). It would be most appropriate if this were a significantly complex, fixed length data structure. I am presuming that CLarge is an external class, or one that you do not have the source code to. Otherwise, you may find it easier to do things differently from me.

In figure 3 we have a test program that uses the CSimple class as a container of the CLarge object. Now, to avoid replicating unchanged instances of the CLarge object, I have developed the functionality of the CSimple class and introduced a new class CLargeInstance - see figures 4 and 5.

CLargeInstance is derived from CLarge to provide a reference counter. Now if CLarge were being developed from scratch you could do the derivation the other way around. You could even merge the two, which would save having to duplicate the constructors as has to be done in CLargeInstance.

CLargeInstance allows the tracking of multiple references to a single instance. The first time an instance is required, it is created, the second and subsequent requirements refer to the original. However, the management of the references is performed by the CSimple class. CLargeInstance simply provides member functions to access the reference counter. These are: Reference(), which increments the counter; Dereference(), which decrements the counter, returning True upon zero; and References(), which returns the value of the counter. There is a check in the destructor that the object has no references. Note that the constructors initialise the reference count to zero, this is in case the compiler wishes to make its own instantiation for any purpose. Any time we create a CLargeInstance we will always be taking a reference, but you just never know when your over zealous compiler will create one as a temporary. I know some of you may think that the constructors should therefore be non-public and CSimple made a friend, but I have found such an approach too much hassle.

CSimple is where the real fun starts. It’s listed in figures 6 and 7. Again, you have to duplicate the constructors of the original CLarge class, though if none of your code needs to handle CLarge objects, then you don’t need to provide the respective constructors or conversions.

In the constructors and other member functions, every reference to a CLargeInstance object, whether an existing instance or a new one, uses the Refer() member function. Refer() takes a reference of the supplied instance, which replaces the current reference, which is then dereferenced and the instance deleted if necessary. Note that I have taken heed of my own lesson that I pointed out earlier, that you take a reference before you dereference, just in case they are to the same object. Otherwise, you might delete what you were about to refer to.

I have been fairly liberal with the asserts. This is because things can get rather hairy, and you don’t want memory leaks or references to freed storage. It’s as well to check that every creation of a CLarge gets destroyed eventually (by putting diagnostic messages in the constructors and destructors).

The next member function worthy of note is the assignment operator. Since it is no longer possible to guarantee that the same storage is used for the CLarge object for the lifetime of the CSimple object, as was the case in my letter’s example, there is little point in using the CLarge assignment operator. This is unless the CLarge assignment involves more than just the duplication of the value.

All modifying operators should first call Modify() to ensure that a new instance is created if the current one has more than one reference. The LI() member function simplifies the replication of the CLarge operators. Note that it is probably most sensible if the operators that create a new instance are implemented in the CSimple class, and the operators that modify an instance are implemented in the CLarge class and duplicated in the CSimple class. Thus binary addition is performed by CSimple, by using the reflexive addition of CSimple (being a duplicate of the operation actually performed by CLarge).

If you do still need to convert between CSimple and CLarge, you must not (except with extreme caution) provide public references to CSimple’s CLarge object instance. Although it may appear unnecessarily inefficient, it is safest to convert CSimple into a CLarge value. If you use the LI() member function remember that it provides a non-const reference and that you should be careful of supplying it when calling any other non-local function. Modify() must be called if there’s any risk that the current instance could be changed. Indeed, it doesn’t hurt and may be wise to call Modify() before every modification, just in case another reference is taken between one modification and the next.

Note that references to CSimple objects are just as valid as with any other object.

Although, I used MS Visual C/C++ version 1.0 to compile the test program, you should find my code fairly easy on any compiler, the only possible problems being whether you have the same standard libraries: IOStream.h, Assert.h, and Limits.h. By the way, the output you get from the program should be “[2]+[2]=[4]”. You will naturally be interested in trying it with far more challenging expressions, and inserting traces in appropriate places.

As long as you understand how CSimple works, deriving from it should present no problems.

Again, I’ll leave something for ‘the more advanced’. It might be possible to reduce the number of instances created for holding intermediate values in a long calculation such as z=a+b+c+d. You could add an attribute to the CSimple class such that you would recognise when an intermediate object is used in a calculation and reuse its storage. Now such a technique converges to Francis Glassborow’s, and you’re bordering into unravelling expressions from within operators. If you must do such esoteric things then I’d recommend an operator queue, where rather than perform the operations, you queue them. When the value is required, the queue is processed, enabling expression optimisation and reuse of intermediate value storage. The disadvantage is that you increase the lifetime of the values involved in the expression, and incur an overhead of a queue within the container. The easiest thing is to ensure that your expressions are simple and reduce intermediate values. Thus you’d write “z=a; z+=b; z+=c; z+=d;”.

Figure 1

/* "LARGE.HPP": CLarge class, being a class containing large amounts of data */

#if !defined(LARGE_HPP)

#define LARGE_HPP

#include<IOStream.h>

class CLarge

{

private:

int m_nBig; /* This could be a 100x100 matrix of 'double', say */

public:

CLarge(int=0);

CLarge& operator+=(const CLarge&); /* Concentrate upon providing reflexive operators and implement binary ones in the CSimple class */

friend ostream& operator<<(ostream&,const CLarge&);

};

#endif

Figure 2

/* "LARGE.CPP": Definitions of members of CLarge */

#include "Large.hpp"

CLarge::CLarge(int p_n)

: m_nBig(p_n)

{

}

CLarge& CLarge::operator+=(const CLarge& p_Large)

{ m_nBig+=p_Large.m_nBig;

return *this;

}

ostream& operator<<(ostream& p_os,const CLarge& p_Large)

{ return p_os<<"["<<p_Large.m_nBig<<"]";

}

Figure 3

/* "BIGOP.CPP": Program to test Simple class */

#include <IOStream.h>

#include "Simple.hpp"

int main()

{ CSimple t_Sa(2);

CSimple t_Sb(t_Sa);

cout<<t_Sa<<"+"<<t_Sb<<"="<<t_Sa+t_Sb<<endl;

return 0;

}

Figure 4

/* "LARGEI.HPP": CLargeInstance class, being a derivation of CLarge that provides a reference counted instance */

#if !defined(LARGEI_HPP)

#define LARGEI_HPP

#include "Large.hpp"

class CLargeInstance: public CLarge

{

private:

unsigned long m_ulReferences; /* This allows up to ULONG_MAX references, typically 4 Billion */

public:

CLargeInstance(const CLargeInstance&);

CLargeInstance(int=0); /* Duplicate CLarge's constructors */

CLargeInstance(const CLarge&);

~CLargeInstance();

CLargeInstance* Reference();

int Dereference();

unsigned long References() const;

};

#endif

Figure 5

/* "LARGEI.CPP": Definitions of members of CLargeInstance */

#include "LargeI.hpp"

#include <Assert.h>

#include <Limits.h>

#include <IOStream.h>

CLargeInstance::CLargeInstance(const CLargeInstance& p_LI)

: CLarge(p_LI)

, m_ulReferences(0UL)

{

}

CLargeInstance::CLargeInstance(int p_n)

: CLarge(p_n)

, m_ulReferences(0UL)

{

}

CLargeInstance::CLargeInstance(const CLarge& p_Large)

: CLarge(p_Large)

, m_ulReferences(0UL)

{

}

CLargeInstance::~CLargeInstance()

{ assert(!m_ulReferences);

}

CLargeInstance* CLargeInstance::Reference()

{ assert(m_ulReferences<ULONG_MAX); /* Else overflow */

++m_ulReferences;

return this;

}

int CLargeInstance::Dereference()

{ assert(m_ulReferences>0); /* Else too many dereferences */

return !--m_ulReferences;

}

unsigned long CLargeInstance::References() const

{ assert(m_ulReferences>0); /* If 0, should have been deleted */

return m_ulReferences;

}

Figure 6

/* "SIMPLE.HPP": CSimple class provides simple container to be used for holding CLarge object values */

#if !defined(SIMPLE_HPP)

#define SIMPLE_HPP

#include <IOStream.h>

#include "Large.hpp"

#include "LargeI.hpp"

class CSimple

{

private:

CLargeInstance* m_pLI;

void Refer(CLargeInstance* =(CLargeInstance*)0);

protected:

CLargeInstance& LI() const; /* NB This may still provide non-const access to instance */

void Modify(); /* Call before modifying instance */

public:

CSimple(const CSimple&);

CSimple(int=0); /* Duplicate CLarge's two constructors */

CSimple(const CLarge&);

~CSimple();

CSimple& operator=(const CSimple&);

CSimple& operator+=(const CSimple&);

operator CLarge() const; /* Can't provide reference (const or not) as instance location can easily change */

friend ostream& operator<<(ostream&,const CSimple&);

};

extern CSimple operator+(const CSimple&,const CSimple&); /* Doesn't need friendship */

#endif

Figure 7

/* "SIMPLE.CPP": Definitions of members of CSimple */

#include "Simple.hpp"

#include <Assert.h>

void CSimple::Refer(CLargeInstance* p_pLI) /* Change current reference */

{ CLargeInstance* t_pLI=m_pLI; /* Copy pointer to current instance */

m_pLI=p_pLI?p_pLI->Reference():(CLargeInstance*)0; /* Make new reference to supplied instance (if any) */

if (t_pLI && t_pLI->Dereference()) /* If current instance, delete it if necessary */

delete t_pLI;

}

CLargeInstance& CSimple::LI() const /* Convenient access for implementing operators */

{ assert(m_pLI);

return *m_pLI;

}

void CSimple::Modify()

{ if (LI().References()>1UL) /* If not the only reference to current instance */

Refer(new CLargeInstance(LI())); /* Refer to new instance copy via copy constructor */

}

CSimple::CSimple(const CSimple& p_Simple)

: m_pLI((CLargeInstance*)0) /* No current reference */

{ assert(p_Simple.m_pLI);

Refer(p_Simple.m_pLI); /* Refer to existing instance via additional reference */

assert(m_pLI);

}

CSimple::CSimple(int p_n)

: m_pLI((CLargeInstance*)0) /* No current reference */

{ Refer(new CLargeInstance(p_n)); /* Refer to new instance via int constructor */

assert(m_pLI);

}

CSimple::CSimple(const CLarge& p_Large)

: m_pLI((CLargeInstance*)0) /* No current reference */

{ Refer(new CLargeInstance(p_Large)); /* Refer to new instance via copy constructor */

assert(m_pLI);

}

CSimple::~CSimple()

{ assert(m_pLI);

Refer(); /* Make no reference */

assert(!m_pLI);

}

CSimple& CSimple::operator=(const CSimple& p_Simple)

{ assert(m_pLI);

assert(p_Simple.m_pLI);

Refer(p_Simple.m_pLI); /* Derefer, and refer to existing instance via additional reference */

return *this; /* alternatively use: Modify(); LI()=p_Simple.LI(); */

}

CSimple& CSimple::operator+=(const CSimple& p_Simple)

{ Modify(); /* About to modify current instance */

LI()+=p_Simple.LI(); /* Add using CLarge's += operator */

return *this;

}

CSimple::operator CLarge() const

{ return CLarge(LI()); /* Create value copy */

}

ostream& operator<<(ostream& p_os,const CSimple& p_Simple)

{ return p_os<<p_Simple.LI();

}

CSimple operator+(const CSimple& p_SimpleLeft,const CSimple& p_SimpleRight)

{ CSimple t_SimpleSum(p_SimpleLeft); /* Construct temporary copy of simple left */

return t_SimpleSum+=p_SimpleRight; /* Add in simple right */

} /* Compiler optimises out unnecessary instances */