Archive

Posts Tagged ‘C++11’

Stack, Heap, Pool

September 14, 2015 23 comments

Update 10/7/15: Because this post lacked a context-setting introduction, I wrote a prequel for it after the fact. Please read the prequel before reading this post.

Three Ways

The figure below shows three ways of allocating memory from within a C++ application: stack, heap, custom written pool.

MemAllocation

 

Comparing Performance

The code below measures the performance of the three allocation methods. It computes and prints the time to allocate 1000, 1MB objects from: the stack, the C++ free store, and a custom-written memory pool (the implementation of Pool appears at the end of this post).

#include <chrono>
#include <iostream>
#include <array>
#include "Pool.h"

class BigObject {

public:

  //Each BigObject occupies 1MB of RAM
  static const int32_t NUM_BYTES{1000 *1024};

  void reset() {} //Required by the Pool class template

  void setFirstChar(char c) {byteArray[0] = c;}

private:

  std::array<char, NUM_BYTES> byteArray{};

};

//Compare the performance of three types of memory allocation:
//Stack, free store (heap), custom-written pool
int main() {

  using namespace std::chrono; //Reduce verbosity of subsequent code
  constexpr int32_t NUM_ALLOCS{1000};
  void processObj(BigObject* obj);

  //Time heap allocations
  auto ts = system_clock::now();
  for(int32_t i=0; i<NUM_ALLOCS; ++i) {
    BigObject* obj{new BigObject{}};
    //Do stuff.......
    processObj(obj);
  }
  auto te = system_clock::now();
  std::cout << "Heap Allocations took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds"
            << std::endl;

  //Time stack allocations
  ts = system_clock::now();
  for(int32_t i=0; i<NUM_ALLOCS; ++i) {
    BigObject obj{};
    //Do stuff.......
    processObj(&obj);
  } //obj is auto-destroyed here
  te = system_clock::now();
  std::cout << "Stack Allocations took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds"
            <<  std::endl;

  //Pool construction time
  ts = system_clock::now();
  Pool<BigObject, NUM_ALLOCS> objPool;
  te = system_clock::now();
  std::cout << "Pool construction took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds"
            <<  std::endl;

  //Time pool allocations
  ts = system_clock::now();
  for(int32_t i=0; i<NUM_ALLOCS; ++i) {
    BigObject* poolObj{objPool.acquire()};
    //Do Stuff.......
    processObj(poolObj);
  }
  te = system_clock::now();
  std::cout << "Pool Allocations took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds";

}

void processObj(BigObject* obj) {

  obj->setFirstChar('a');

}

The results for a typical program run (gcc 4.9.2, cygwin/Win10, compiler) are shown as follows:

DebugRelease

As expected, stack allocation is much faster than heap allocation (3X in this case) because there is no such thing as a “fragmented” stack. The interesting result is how much faster custom pool allocation is than either of the two “standard” out of the box methods. Since each object is 1MB in size, and all of the memory managed by the pool is allocated only once (during construction of the pool object), dynamically acquiring and releasing pointers to the pre-constructed pool objects during runtime is blazingly fast compared to creating a lumbering 1MB object every time the user code needs one.

After you peruse the Pool implementation that follows, I’d love to hear your comments on this post. I’d also appreciate it if you could try to replicate the results I got.

The Pool Class Template – std::array Implementation

Here is my implementation of the Pool class used in the above test code. Note that the Pool class does not use any dynamically resizable STL containers in its implementation. It uses the smarter C++ implementation of the plain ole statically allocated C array. Some safety-critical domains require that no dynamic memory allocation is performed post-initialization – all memory must be allocated either at compile time or upon program initialization.

//Pool.h
#ifndef POOL_H_
#define POOL_H_

#include <array>
#include <algorithm>
#include <memory>

// The OBJ template argument supplied by the user
// must have a publicly defined default ctor and a publicly defined
// OBJ::reset() member function that cleans the object innards so that it
// can be reused again (if needed).
template<typename OBJ, int32_t NUM_OBJS>
class Pool {

public:

  //RAII: Fill up the _available array with newly allocated objects.
  //Since all objects are available for the user to "acquire"
  //on initialization, fill up the _inUse array will nullptrs.
  Pool(){

    //Fill the _available array with default-constructed objects
    std::for_each(std::begin(_available), std::end(_available),
        [](OBJ*& mem){mem = new OBJ{};});

    //Since no objects are currently in use, fill the _inUse array 
    //with nullptrs
    std::for_each(std::begin(_inUse), std::end(_inUse),
        [](OBJ*& mem){mem = nullptr;});

  }

  //Allows a user to acquire an object to manipulate as she wishes
  OBJ* acquire() {

    //Try to find an available object
    auto iter = std::find_if(std::begin(_available), std::end(_available),
        [](OBJ* mem){return (mem not_eq nullptr);});

    //Is the pool exhausted?
    if(iter == std::end(_available)) {

      return nullptr; //Replace this with a "throw" if you need to.

    }
    else {

      //Whoo hoo! An object is available for usage by the user
      OBJ* mem{*iter};

      //Mark it as in-use in the _available array
      *iter = nullptr;

      //Find a spot for it in the _inUse list
      iter = std::find(std::begin(_inUse), std::end(_inUse), nullptr);
  
      //Insert it into the _inUse list. We don't
      //have to check for iter == std::end(_inUse)
      //because we are fully in control!
      *iter = mem;

      //Finally, return the object to the user
      return mem;

    }

  }

  //Returns an object to the pool
  void release(OBJ* obj) {

    //Ensure that the object is indeed in-use. If
    //the user gives us an address that we don't own,
    //we'll have a leak and bad things can happen when we
    //delete the memory during Pool destruction.
    auto iter = std::find(std::begin(_inUse), std::end(_inUse), obj);

    //Simply return control back to the user
    //if we didn't find the object in our
    //in-use list
    if(iter == std::end(_inUse)) {

        return;

    }
    else {

      //Clear out the innards of the object being returned
      //so that the next user doesn't have to "remember" to do it.
      obj->reset();

      //temporarily hold onto the object while we mark it
      //as available in the _inUse array
       OBJ* mem{*iter};

       //Mark it!
       *iter = nullptr;

       //Find a spot for it in the _available list
       iter = std::find(std::begin(_available), std::end(_available), nullptr);

       //Insert the object's address into the _available array
       *iter = mem;
    }

  }

  ~Pool() {

    //Cleanup after ourself: RAII
    std::for_each(std::begin(_available), std::end(_available),
        std::default_delete<OBJ>());

    std::for_each(std::begin(_inUse), std::end(_inUse),
        std::default_delete<OBJ>());

  }

private:

  //Keeps track of all objects available to users
  std::array<OBJ*, NUM_OBJS> _available;

  //Keeps track of all objects in-use by users
  std::array<OBJ*, NUM_OBJS> _inUse;

};

#endif /* POOL_H_ */

The Pool Class Template – std::multiset Implementation

For grins, I also implemented the Pool class below using std::multiset to keep track of the available and in-use objects. For large numbers of objects, the acquire()/release() calls will most likely be faster than their peers in the std::array implementation, but the implementation violates the “no post-intialization dynamically allocated memory allowed” requirement – if you have one.

Since the implementation does perform dynamic memory allocation during runtime, it would probably make more sense to pass the size of the pool via the constructor rather than as a template argument. If so, then functionality can be added to allow users to resize the pool during runtime if needed.

//Pool.h
#ifndef POOL_H_
#define POOL_H_

#include <algorithm>
#include <set>
#include <memory>

// The OBJ template argument supplied by the user
// must have a publicly defined default ctor and a publicly defined
// OBJ::reset() member function that cleans the object innards so that it
// can be reused again (if needed).
template<typename OBJ, int32_t NUM_OBJS>
class Pool {

public:

  //RAII: Fill up the _available set with newly allocated objects.
  //Since all objects are available for the user to "acquire"
  //on initialization, fill up the _inUse set will nullptrs.
  Pool(){

    for(int32_t i=0; i<NUM_OBJS; ++i) {

      _available.insert(new OBJ{});
      _inUse.insert(nullptr);

    }

  }

  //Allows a user to acquire an object to manipulate as she wishes
  OBJ* acquire() {

    //Try to find an available object
    auto iter = std::find_if(std::begin(_available), std::end(_available),
        [](OBJ* mem){return (mem not_eq nullptr);});

    //Is the pool exhausted?
    if(iter == std::end(_available)) {

      return nullptr; //Replace this with a "throw" if you need to.

    }
    else {

      //Whoo hoo! An object is available for usage by the user
      OBJ* mem{*iter};

      //Mark it as in-use in the _available set
      _available.erase(iter);
      _available.insert(nullptr);

      //Find a spot for it in the _inUse set
      iter = std::find(std::begin(_inUse), std::end(_inUse), nullptr);

      //Insert it into the _inUse set. We don't
      //have to check for iter == std::end(_inUse)
      //because we are fully in control!
      _inUse.erase(iter);
      _inUse.insert(mem);

      //Finally, return the object to the user
      return mem;

    }

  }

  //Returns an object to the pool
  void release(OBJ* obj) {

    //Ensure that the object is indeed in-use. If
    //the user gives us an address that we don't own,
    //we'll have a leak and bad things can happen when we
    //delete the memory during Pool destruction.
    auto iter = std::find(std::begin(_inUse), std::end(_inUse), obj);

    //Simply return control back to the user
    //if we didn't find the object in our
    //in-use set
    if(iter == std::end(_inUse)) {

        return;

    }
    else {

      //Clear out the innards of the object being returned
      //so that the next user doesn't have to "remember" to do it.
      obj->reset();

      //temporarily hold onto the object while we mark it
      //as available in the _inUse array
       OBJ* mem{*iter};

       //Mark it!
       _inUse.erase(iter);
       _inUse.insert(nullptr);

       //Find a spot for it in the _available list
       iter = std::find(std::begin(_available), std::end(_available), nullptr);

       //Insert the object's address into the _available array
       _available.erase(iter);
       _available.insert(mem);
    }

  }

  ~Pool() {

    //Cleanup after ourself: RAII
    std::for_each(std::begin(_available), std::end(_available),
        std::default_delete<OBJ>());

    std::for_each(std::begin(_inUse), std::end(_inUse),
        std::default_delete<OBJ>());

  }

private:

  //Keeps track of all objects available to users
  std::multiset<OBJ*> _available;

  //Keeps track of all objects in-use by users
  std::multiset<OBJ*> _inUse;

};

#endif /* POOL_H_ */

Unit Testing

In case you were wondering, I used Phil Nash’s lightweight Catch unit testing framework to verify/validate the behavior of both Pool class implementations:

TEST_CASE( "Allocations/Deallocations", "[Pool]" ) {

    constexpr int32_t NUM_ALLOCS{2};
    Pool<BigObject, NUM_ALLOCS> objPool;

    BigObject* obj1{objPool.acquire()};
    REQUIRE(obj1 not_eq nullptr);

    BigObject* obj2 = objPool.acquire();
    REQUIRE(obj2 not_eq nullptr);

    objPool.release(obj1);
    BigObject* obj3 = objPool.acquire();
    REQUIRE(obj3 == obj1);

    BigObject* obj4 = objPool.acquire();
    REQUIRE(obj4 == nullptr);

    objPool.release(obj2);
    BigObject* obj5 = objPool.acquire();
    REQUIRE(obj5 == obj2);

}

==============================================
All tests passed (5 assertions in 1 test case)

C++11 Features

For fun, I listed the C++11 features I used in the source code:

    * auto
    * std::array<T, int>
    * std::begin(), std::end()
    * nullptr
    * lambda functions
    * Brace initialization { }
    * std::chrono::system_clock, std::chrono::duration_cast<T>
    * std::default_delete<T>

Did I miss any features/classes? How would the C++03 implementations stack up against these C++11 implementations in terms of readability/understandability?

Categories: C++11 Tags: ,

Easier To Use, And More Expressive

August 20, 2015 15 comments

One of the goals for each evolutionary increment in C++ is to decrease the probability of an average programmer from making mistakes by supplanting “old style” features/idioms with new, easier to use, and more expressive alternatives. The following code sample attempts to show an example of this evolution from C++98/03 to C++11 to C++14.

VecVecClear

In C++98/03, there were two ways of clearing out the set of inner vectors in the vector-of-vectors-of-doubles data structure encapsulated by MyClass. One could use a plain ole for-loop or the std::for_each() STL algorithm coupled with a remotely defined function object (ClearVecFunctor). I suspect that with the exception of clever language lawyers, blue collar programmers (like me) used the simple for-loop option because of its reduced verbosity and compactness of expression.

With the arrival of C++11 on the scene, two more options became available to programmers: the range-for loop, and the std::for_each() algorithm combined with an inline-defined lambda function. The range-for loop eliminated the chance of  “off-by-one” errors and the lambda function eliminated the inconvenience of having to write a remotely located functor class.

The ratification of the C++14 standard brought yet another convenient option to the table: the polymorphic lambda. By using auto in the lambda argument list, the programmer is relieved of the obligation to explicitly write out the full type name of the argument.

This example is just one of many evolutionary improvements incorporated into the language. Hopefully, C++17 will introduce many more.

Note: The code compiles with no warnings under gcc 4.9.2. However, as you can see in the image from the bug placed on line 41, the Eclipse CDT indexer has not caught up yet with the C++14 specification. Because auto is used in place of the explicit type name in the lambda argument list, the indexer cannot resolve the std::vector::clear() member function.

8/26/15 Update – As a result of reader code reviews provided in the comments section of this post, I’ve updated the code as follows:

innervec

8/29/15 Update – A Fix to the Line 27 begin-begin bug:

innervec

Categories: C++, C++11, C++14 Tags: , ,

The Move That Wasn’t

June 30, 2015 3 comments

While trying to duplicate the results I measured in my “Time To Get Moving!” post, an astute viewer posted this  comment:

move comment

For your convenience, I re-list the code in question here for your inspection:


#include <iostream>
#include <vector>
#include <utility>
#include <chrono>
using namespace std;

struct Msg{
  vector<double> vDoubs;
  Msg(const int NUM_ELS) : vDoubs(NUM_ELS){
  }
};

int main() {
//Construct a big Msg object!
const int NUM_ELS = 10000000;
Msg msg1{NUM_ELS};

//reduce subsequent code verbosity
using std::chrono::steady_clock;
using std::chrono::duration_cast;
using std::chrono::microseconds;

//Measure the performance of
//the "free" copy operations
auto tStart = steady_clock::now();
Msg msg2{msg1}; //copy ctor
msg1 = msg2; //copy assignment
auto tElapsed = steady_clock::now() - tStart;
cout << "Copy ops took "
     << duration_cast<microseconds>(tElapsed).count()
     << " microseconds\n";

//Measure the performance of
//the "free" move operations
tStart = steady_clock::now();
Msg msg3{std::move(msg1)}; //move ctor
msg1 = std::move(msg3); //move assignment
tElapsed = steady_clock::now() - tStart;
cout << "Move ops took "
     << duration_cast<microseconds>(tElapsed).count()
     << " microseconds\n";

cout << "Size of moved-from object = "
     << msg3.vDoubs.size();
} //"free" dtor is executed here for msg1, msg2, msg3

Sure enough, I duplicated what my friend Gyula discovered about the “move” behavior of the VS2013 C++ compiler:

VS2013 move

move response

Intrigued by the finding, I dove deeper into the anomaly and dug up these statements in the fourth edition of Bjarne Stroustrup’s “The C++ Programming Language“:

default ops

Since the Msg structure in the code listing does not have any copy or move operations manually defined within its definition, the compiler is required to generate them by default. However, since Bjarne doesn’t state that a compiler is required to execute moves when the programmer explicitly directs it to with std::move() expressions, maybe the compiler isn’t required to do so. However, common sense dictates that it should. Hopefully, the next update to the VS2013 compiler will do the right thing – like GCC (and Clang?) currently does.

Time To Get Moving!

June 15, 2015 8 comments

Prior to C++11, for every user-defined type we wrote in C++03, all standards-conforming C++ compilers gave us:

  • a “free” copy constructor
  • a “free” copy assignment operator
  • a “free” default constructor
  • a “free” destructor

The caveat is that we only got them for free if we didn’t manually override the compiler and write them ourselves. And unless we defined reference or pointer members inside of our type, we didn’t have to manually write them.

Starting from C++11 on, we not only get those operations for free for our user-defined types, we also get these turbo-boosters:

  • a “free” move constructor
  • a “free” move assignment operator

In addition, all of the C++ standard library containers have been “move enabled“.

When I first learned how move semantics worked and why this new core language feature dramatically improved program performance over copying, I started wondering about user-defined types that wrapped move-enabled, standard library types. For example,  check out this simple user-defined Msg structure that encapsulates a move-enabled std::vector.

MsgStruct

Logic would dictate that since I get “move” operations from the compiler for free with the Msg type as written, if I manually “moved” a Msg object in some application code, the compiler would “move” the vDoubs member under the covers along with it – for free.

Until now, I didn’t test out that deduction because I heard my bruh Herb Sutter say in a video talk that deep moves came for free with user-defined types as long as each class member in the hierarchical composition is also move-enabled. However, in a more recent video, I saw an excellent C++ teacher explicitly write a move constructor for a class similar to the Msg struct above:

ManualMoveCtor

D’oh! So now I was confused – and determined to figure out was was going on. Here is the program that I wrote to not only verify that manually written “move” operations are not required for the Msg struct, but to also measure the performance difference between moving and copying:

MoveCopyPerf

First, the program built cleanly as expected because the compiler provided the free “move” operations for the Msg struct. Second, the following, 5-run, output results proved that the compiler did indeed perform the deep, under the covers, “move” that my man Herb promised it would do. If the deep move wasn’t executed, there would have been no noticeable difference in performance between the move and copy operations.

copymoveperf

From the eye-popping performance difference shown in the results, we should conclude that it’s time to start replacing copy operations in our code with “move” operations wherever it makes sense. The only thing to watch out for when moving objects from one place to another is that in the scope of the code that performs the move, the internal state of the moved-from object is not needed or used by the code following the move. The following code snippet, which prints out 0, highlights this behavior.

postmovestate

Whence Actors?


While sketching out the following drawing, I was thinking about the migration from sequential to concurrent programming driven by the rise of multicore machines. Can you find anything wrong with it?

Concurrency Abstractions

Right out of the box, C++ provided Objects (via classes) and procedures (via free standing functions). With C++11, standardized support for threads and tasks finally arrived to the language in library form. I can’t wait for Actors to appear in….. ?

cpp actors

Categories: C++11, C++14, C++17 Tags: , , ,

Make Simple Tasks Simple

February 5, 2015 2 comments

In the slide deck that Bjarne Stroustrup presented for his “Make Simple Tasks SimpleCppCon keynote speech, he walked through a progression of refactorings that reduced an (arguably) difficult-to-read, multi-line, C++98 code segment (that searches a container for an element matching a specific pattern) into an expressive, C++11 one liner:

Simplify

Some people may think the time spent refactoring the original code was a pre-mature optimization and, thus, a waste of time. But that wasn’t the point. The intent of the example was to illustrate several features of C++11 that were specifically added to the language to allow sufficiently proficient programmers to “make simple tasks simple (to program)“.

Note: After the fact, I discovered that “MemVec” should be replaced everywhere with “v” (or vice versa), but I was too lazy to fix the pic :). Can you find any other bugs in the code?

Not Just Expert Friendly

October 9, 2014 Leave a comment

Check out this snippet from one of Bjarne Stroustrup’s CppCon keynote slides:

expert friendly

Now take a glance at one of Herb Sutter’s CppCon talk slides:

Complexity Anonymous

Bjarne’s talk was  titled “Make Simple Tasks Simple” and Herb’s talk was tiled “Back To The Basics!: Modern C++ Style“. Since I abhor unessential complexity, I absolutely love the fact that these two dedicated gentlemen are spearheading the effort to evolve C++ in two directions simultaneously: increasing both expert-friendliness AND novice-friendliness.

By counterbalancing the introduction of advanced features like variadic templates and forwarding references with simpler features like range-for loops, nullptr, and brace-initialization, I think Bjarne and Herb (and perhaps other community members I don’t know about) are marvelously succeeding at the monumental task of herding cats. To understand what I mean, take a look at another one of Herb’s slides:

cats in a dot

Do you see that teeny tiny dot at the end of the big arrow down on the lower right edge of the circle? Well, I don’t come anywhere close to qualifying for membership with the cats inside that dot… and I’d speculate that most advanced feature proposals and idiom ideas, whether they are understandable/teachable to mere mortals or not, originate from the really smart cats within that dot.

Unteachable

By gently but doggedly communicating the need for lowering the barriers to entry for potentially new C++ users while still navigating the language forward into unchartered waters, I’m grateful to Herb and Bjarne. Because of these men, the ISO C++ committee actually works – and it is indeed amazing for any committee to “work“.

%d bloggers like this: