CAUTION! DANGER ZONE ahead. Beware of misinformation on the open internet. Contents of the site are mere opinions and are not always facts!

Sunday, December 27, 2015

Concurrency and Multithreading


Concurrency means running code/ accomplishing tasks simultaneously within a process, and it is achieved with threads. Prior to C++11, concurrency was not supported by the language and needed platform-specific APIs; such code could not be ported to different platforms.

One of the prominent improvements in the C++11 standard is the introduction of concurrency/ multi-threading support built into the language.

Before jumping into the language specifics, let us take this opportunity to see how threads are useful.

This cartoon depicts why we need threads and how they are helpful. The store clerk takes orders from customers and leaves notes about the orders so that the cook gets them done. Imagine the cook playing the roles of both the cook and the clerk: one person has to take an order and then get it done. Meanwhile, any other customer coming to the shop has to wait until the first order is delivered, or may even leave! Since the clerk and the cook work CONCURRENTLY, one can take orders while the other gets them ready. Depending on the number of customers coming to the store, the number of clerks and cooks can be increased to attend to all of them. The clerks and cooks here represent threads, and the goal is to not keep the customers waiting. So, threads are units of a process that work concurrently to get the job done.

The extensions C++11 has added to the standard library to support multi-threading are <thread>, <mutex>, <condition_variable> and <future>.

1. <thread>: An example thread that says Hello!
#include <iostream>
#include <string>
#include <thread>

void Hello(std::string who)
{
    std::cout << "Hello " << who << "!" << std::endl;
}

int main()
{
    std::string who("world");
    std::thread myThread(Hello, who); // create myThread to run Hello with the parameter who
    // While myThread is busy saying Hello, I shall finish other tasks.
    myThread.join(); // wait for myThread to complete
    std::cout << "Thread finished saying Hello!";
    return 0;
}

main spawns a new thread with std::thread and runs the function Hello with the parameter who. Both the main thread and myThread now run concurrently. The main thread then waits, via join, to make sure myThread finishes execution.

Now that the tasks run concurrently, performance and scalability improve greatly. But this gain hits a bottleneck when the threads need to co-ordinate, or SYNCHRONIZE, for various reasons, one of the main ones being contention for a shared, scarce resource such as memory, files, database connections etc.

Consider two threads that read from and write to the same memory location. If access to the memory is not synchronized between the threads, the data can be corrupted. This is called a data race. Likewise, when threads run in a non-deterministic order and the outcome depends on that order, the software can produce unexpected behavior. This is called a race condition.

C++11 has <mutex> for synchronizing the threads.

2. <mutex>: A mutex - MUTual EXclusion - allows only one thread to access a shared resource at a given time. For simplicity, let us synchronize a global variable between the main thread and the worker thread.
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

std::mutex protector; // mutex to be locked by the threads that contend for who
std::string who; // shared data/ memory
void Hello()
{
    protector.lock(); // lock the data so that main is not using it
    std::cout << "Hello " << who << "!" << std::endl;
    protector.unlock(); // unlock the data so that main can continue using it
}

int main()
{
    protector.lock(); // lock the data so that the Hello thread is not using it
    std::thread myThread(Hello); // create myThread to run Hello
    who = "world";
    protector.unlock(); // unlock the data so that the Hello thread can continue using it
    // While myThread is busy saying Hello, I shall finish other tasks.
    myThread.join(); // wait for myThread to complete
    std::cout << "Thread finished saying Hello!";
    return 0;
}

When two threads use the same data, it must be assured that only one thread accesses the data at a time for defined behavior. In the above code, the main thread locks the mutex protector so that no other thread can use the data. It releases the mutex when it is no longer using the data. Meanwhile, the Hello thread tries to lock the mutex and, since it is held by the main thread, waits until it is freed by the main thread and then acquires the lock. This way, mutexes assure data integrity between the threads.

Mutexes can also be used to resolve race conditions by protecting the entire code sections that need to be executed in a definite order.

Here is an analogy to understand mutex better.

A public restroom is used by many; it is a shared resource. So, there must be a synchronization mechanism that lets others know when it is being used by someone. The physical lock indicates the status of the restroom and keeps others out. Similarly, when two threads are used, they are synchronized using a mutex and a lock. The thread that is using the resource locks the corresponding mutex, and other threads keep waiting until it is released by that thread.

Hence, mutexes help in synchronizing the threads.

Care must be taken to avoid deadlocks. A deadlock is a state in which a thread is stuck waiting forever for a resource held indefinitely by another thread. It can happen, for instance, when two threads each acquire a resource required by the other and then wait for the resource the other holds.


Most common measures taken to prevent such deadlocks are,

  • The threads acquire resources in the same order, irrespective of the order of their usage. 
  • If mutexes are used to avoid data races, they must be held for as short a time as possible.
  • Whenever possible, design to acquire only one mutex at a given time. 

What does not require synchronization is thread-local storage: data local to a particular thread. C++11 has introduced an additional storage class specifier, thread_local, which creates one instance of the variable per thread, living for the lifetime of that thread.

The <mutex> header also defines classes, such as std::lock_guard and std::unique_lock, for scope-based locking of mutexes.

3. <condition_variable>: When multiple threads are used, sometimes one or more threads need to wait for a particular thread to finish its job. This can be demonstrated with the simple producer-consumer problem. The producer notifies the consumer whenever it produces data to be consumed; thus, the consumer waits until there is data produced. Likewise, the producer waits to be notified by the consumer before producing more data. Condition variables are used for this purpose. The waiting threads are blocked until notified/ signalled. Once the condition variable is signalled, the thread resumes after re-acquiring the lock.

Sequence of operations performed by the waiting threads,
- Acquire the mutex.
- Wait to be notified (the mutex is released while waiting).
- Check the notification variable and continue execution when woken up by the notifier.
- Release the mutex.

Sequence of operations performed by the notifier,
- Perform the operations that other threads are waiting for.
- Acquire the mutex.
- Modify the notification variable.
- [optional] Release the mutex.
- Notify the waiting threads.

Now, why do we need a notification variable at all? Because of spurious wake-ups: a waiting thread can be woken up even if it was not signalled.

A contrived example to see how to use condition variables. (Note that wait requires a std::unique_lock<std::mutex>, not a raw mutex.)
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

std::mutex protector;
std::string who; // shared data
std::condition_variable conditionCheck;
bool isComplete = false;

void worker_thread()
{
    // Wait until main() finishes its task.
    std::unique_lock<std::mutex> lock(protector);
    // Beware of spurious wake-ups. Check the notification variable to make sure the condition variable was signalled.
    while(!isComplete)  // If it is a spurious wake-up, keep waiting till main signals.
         conditionCheck.wait(lock); // give up the lock and wait for main to signal

    // the lock is automatically re-acquired when wait returns
    std::cout << "Hello " << who << "!" << std::endl;
} // lock released here

int main()
{
    std::thread myThread(worker_thread);

    protector.lock();
    who = "world";
    isComplete = true;
    protector.unlock();

    conditionCheck.notify_one();

    myThread.join(); // wait for myThread to complete
    std::cout << "Thread finished saying Hello!";
    return 0;
}

4. <future>: When one thread (say A) returns a value to be used by another thread (say B), it can be done safely with a future and a promise. A sets the value in a promise and B reads the value from the corresponding future.
#include <future>
#include <thread>

void setVal(std::promise<int>& prom) {
  prom.set_value(100);
}

int main()
{
     // Make a promise.
     std::promise<int> prom;

     // Promise to be fulfilled in the future.
     std::future<int> fut = prom.get_future();

     // My thread will fulfil the promise in the future.
     std::thread myThread(setVal, std::ref(prom)); // pass by reference so that it can set the value in the future obtained

     // Meanwhile, do other stuff.

     // Get the value set by the promise, or wait till it is set.
     int val = fut.get(); // the future is not even a reference type, yet it sees the value set by the promise: both refer to the same shared state

     myThread.join();
     return 0;
}

The future can be retrieved from the promise only once: the shared state is moved into the future instance when it is first retrieved. Use a std::shared_future if multiple threads need the value from the future.

The standard library provides a much cleaner API that hides the thread creation and the promise: std::async. Note that std::async delivers the function's return value through the future, so the worker simply returns a value (a value-returning getVal is used here, since setVal above fills a promise instead):
// Execute getVal asynchronously and return a future for the return value.
int getVal() { return 100; }

std::future<int> fut = std::async(std::launch::async, getVal); // the future's type is deduced from getVal's return type (std::result_of)

Sunday, July 5, 2015

Save l-values in condition checks

Do you remember that the value on the right side of an assignment operator is an r-value and the one on the left side is the l-value? Quite obvious!

Flip the status quo to stay safe!
Sometimes, conditional expressions are mistakenly/ unintentionally written as assignment expressions.

if(value = getValue() ) // a bug! - can cost you hours of debugging.
{
       // I'm lucky to be executed almost always!
}

if(value = VALUE) // the same bug: assignment instead of comparison
{

}

Quick Tip: Make sure the l-values lie on the right-hand side of conditional checks.

if(getValue() = value) // error: cannot assign to an r-value
{

}

if(VALUE = value) // error: cannot assign to an r-value
{

}

Sunday, June 21, 2015

Lambdas for DRY'er code

Lambdas are inline anonymous functions/ functors introduced in C++11. They are extensively used for higher order programming, which helps keep the code DRY (Don't Repeat Yourself) through abstraction, e.g., abstracting the mechanics of iterating over a collection.

Lambdas and closures, long supported by many dynamic programming languages for higher order/ functional programming, are now supported by C++ as of C++11. They help in passing and returning functions like data.

Let us consider abstracting the collection iteration mechanics. The STL provides many types of containers for storing collections of data, with iterators to access the data. The iterators are used to iterate over all the elements in the container to perform operations on every element.

Example:
#include <iostream>
#include <vector>

// A collection (vector) of 10 integers with value 1
std::vector<int> integer_collection(10, 1);

for(std::vector<int>::const_iterator it = integer_collection.begin(); it != integer_collection.end(); ++it)
{
      std::cout << *it << std::endl;
}
With higher order programming, the above code can be DRY'ed into the code below, wherein all the iteration mechanics are abstracted into the for_each function (from <algorithm>). The third parameter passed to for_each is a lambda, which for_each calls for every element in the vector. The lambda accepts an integer and performs the required operation.
std::for_each(integer_collection.begin(), integer_collection.end(), [](int &n){ 
      std::cout << n << std::endl;
});

Defining functions that take lambdas as parameters:
In the above example, what could the prototype of for_each be? i.e., what is the type of the lambda?

1. The lambda's type can be a generic template parameter.
template<typename L>
void function(L lambda) // pass the right lambda!
{
     lambda(10); // beware: the compiler will complain if the lambda doesn't accept an int parameter
}

2. Lambdas without capture variables can be converted to function pointers.
void function(void (*funptr)(int)) // a lambda that takes an int parameter and returns void
{
     funptr(10); // call the lambda
}

3. The lambda's type can be deduced with the auto keyword. (Note that in C++11 this works only for local variables; auto as a function parameter type, as below, becomes valid only with C++20's abbreviated function templates.)
void function(auto lambda) // requires C++20
{
     lambda(10);
}

4. The lambda's type can be erased with std::function, defined in the <functional> header since C++11.
void function(std::function<void(int)> lambda) // a lambda that takes an int parameter and returns void
{
     lambda(10);
}

A lambda can not only be used as a functor object that is passed to other functions; closures can also be constructed, i.e., the function can access variables in the enclosing scope.

Simplified Syntax:
[capture variable list - optional](parameter list - optional) { function body }


A simple example:
{
       int a, b;
       func([a, &b] (int x, double y) {
              // Call me back with an int and a double
       });
}

The function func takes a lambda parameter which captures the variable a by value and b by reference, and takes the parameters x and y.

It should be noted that the advantages gained with lambdas can be achieved with function objects too. Thus, lambdas are syntactic sugar that eliminates the need to write single-use function objects elsewhere in the code, keeping the code DRY and making functional-style programs elegant.

Monday, April 13, 2015

RAII- Resource Acquisition Is Initialization

In C++, there is no automatic memory management/ garbage collection, so unused memory must be deleted manually or it leaks. Similarly, other resources that are not released when no longer used cause resource leaks.

RAII is a commonly used OOP idiom to carefully manage resources. It is a scope based resource management style where the resource is acquired at the beginning of the scope and released at the end of the scope. The scope can be a block, a function or a class. The resources can be memory, files, mutexes etc

With RAII, the resources are acquired in the constructor and released in the destructor of a resource management class, so that objects of this class can be declared in the required scope and the resources are reclaimed via the destructor when the objects go out of scope.

Here is an example of managing memory with RAII. Do note that every class can do this management on its own; this example demonstrates RAII along with the single responsibility principle, so a class is written exclusively to manage the memory.

//Example for a POD type. Parameters for object construction and the member access operator (->) are not considered.

template<typename T>
class CManagedMemory
{
     T *m_memory;
public: // members of a class are private by default; the interface must be public
     CManagedMemory()
     {
          m_memory = new T(); // no more wild pointers!
     }
     // used for r-value access
     operator T()
     {
          return *m_memory;
     }
  
     // used for l-value access
     T operator=(T rVal)
     {
          *m_memory = rVal;
           return rVal;
     }
   
     ~CManagedMemory()
     {
          delete m_memory;
          m_memory = NULL; // no more dangling pointers!
     }
};
The benefits of RAII are,

1. There are no resource leaks.

2. The resource variable is initialized to a valid state as soon as it is created and reset as soon as it is deleted.
    For example, a pointer which is not initialized - a wild pointer - holds a garbage value (pointing to a random address), and a pointer which has been deleted - a dangling pointer - points to reusable memory; both cause unexpected behavior. With RAII, the pointer is initialized when the managed memory object is declared.
    Without RAII,
 int *ptr; // wild pointer: holds a garbage address
 delete ptr; // undefined behavior: deleting an uninitialized pointer

3. Custom-defined resource sharing, by declaring the custom copy constructor and assignment operator as either private or public, as appropriate.

4. The resource is released automatically as soon as it is no longer needed: when the managed resource object goes out of scope, its destructor releases the resource.
    Without RAII,
{
     int *ptr = new int();
}// memory leak. ptr should be deleted before the end of the block.
    With RAII,
{
     CManagedMemory<int> ptr;
}// memory deleted automatically

5. The resource is reclaimed even on exceptions. (In C++, dividing by zero is undefined behavior, not an exception, so mayThrow() here stands for any operation that may throw.)
     Without RAII,
{
     int *ptr = new int();
     mayThrow(); // if this throws, the statements below are not executed
     delete (ptr); // never reached on exception: memory leak
}
    With RAII,
{
     CManagedMemory<int> ptr;
     mayThrow(); // may throw
}// memory deleted even when the exception is thrown

6. Code cleanliness. The code to reclaim the resource is in one place, in the destructor. The code need not be written in every procedure call or every branch.

C++11 introduces smart pointers for managing memory with RAII: unique_ptr (exclusive ownership), shared_ptr (shared ownership) and weak_ptr (a non-owning observer of a shared_ptr).

Sunday, February 8, 2015

Understanding data encapsulation

What is data encapsulation?

Data encapsulation is a fundamental OOP concept by which the data and the logic (a function) that uses/ manipulates it are bound together into objects.

How is data encapsulation useful?

Let us consider a problem which data encapsulation addresses. Consider the 'C' language, which is (roughly) a subset of C++ and does not support OOP/ data encapsulation. The C standard library provides the strtok function below, which tokenizes the string "str" into sub-strings separated by the delimiter "delim".

char *strtok(char *str, const char *delim)

The disadvantage of this function is that it cannot tokenize two strings simultaneously. This is because tokenization requires multiple calls, and each subsequent call depends on the previous one (strtok keeps internal static state). Also, the corresponding delimiter for each of the two strings must be supplied manually on every call.

This link demonstrates the above problem. First token of str1 is printed and then the first token of str2. The next call tokenizes str2 and not str1 and prints the second token for str2.

So, how is data encapsulation helpful?

C++ is built on OOP paradigm where the complete functionality of a software is composed of objects with logically related sub functionalities. The objects consist of data and functions - data encapsulation.

This link demonstrates how the above problem is not seen in C++. The two strings are encapsulated into two different objects and the state of tokenization for each string is maintained separately in different objects. The delimiters are also bound to the tokenizers of each string.

This is one example of how data encapsulation or OOP is helpful.

The STL (C++ Standard Template Library) leverages this OOP concept to provide an enormous set of algorithms and containers.

Friday, January 16, 2015

CRTP

CRTP: Curiously Recurring Template Pattern.

Let us consider a scenario which requires the pattern. I would like to design an abstract class to be implemented by sub-classes. Along with my APIs, I have added some common members that every subclass must have - say, an id. And I would like to have a static member in each of my sub-classes.

Approach: I added a static member to my abstract class, declared it in the class header and defined it in the class cpp file.

Problems:
1. All the translation units implementing the abstract class must also include the class cpp file for compilation, or link against the static library for the abstract interface.

2. The static member is shared across all the instances of all the sub-class types, but I would like to share one static member amongst all the instances of each sub-class type.

CRTP to the rescue...

The abstract class is templated on the sub-class type. One static member is created for each instantiation of the abstract class by a sub-class. Thus, the abstract class defines one static member per sub-class.

template<typename _subClass>
class AbstractClass
{

    static int m_staticMember;
    //...    

};

template<typename _subClass>
int AbstractClass<_subClass>::m_staticMember;