Posts Tagged ‘threads’

Beginner, Intermediate, Expert

September 3, 2012 Leave a comment

Check out this insightful quote from Couchbase CTO Damien Katz:

Threads are hard because, despite being extremely careful, it’s ridiculously easy to code in hard to find, undeterministic, data races and/or deadlocks. That’s why I always model my multi-threaded programs (using the MML, of course) to some extent before I dive into code:

Note that even though I created and evolved (using paygo, of course) the above, one page “agile model” for a program I wrote, I still ended up with an infrequently occurring  data race that took months, yes months, to freakin’ find. The culprit ended up being a data race on the (supposedly) thread-safe DB2 data structure accessed by the AT4, AT6, and AT7 threads. D’oh!

Two Plus Months

June 30, 2012 1 comment

Race conditions are one of the worst plagues of concurrent code: They can cause disastrous effects all the way up to undefined behavior and random code execution, yet they’re hard to discover reliably during testing, hard to reproduce when they do occur, and the icing on the cake is that we have immature and inadequate race detection and prevention tool support available today. – Herb Sutter (

With this opening paragraph in mind, observe the figure below. If you don’t lock-protect a stateful object that’s accessed by more than one thread, you’re guaranteed to fall into the dastardly trap that Herb describes. D’oh!

Now, look at the two object figure below. Unless you protect each of the two objects in the execution path with a lock, you’re hosed!

To improve performance at the expense of higher risk, you can use one lock for the two object example like on the left side of this graphic:

Alas, if you do choose to use one lock in a two object configuration like the example above, you better be sure that you don’t come in through the side with another thread to use the thread-unsafe object2. You also better be sure that a future maintainer of your code doesn’t do the same. But wait… How can you ensure that a maintainer won’t do that? You can’t. So stick with the more conservative, lower performance, one-lock-per-object approach.

Don’t ask me why I wrote this post cuz I ain’t answering. Well, Ok, ask. I wrote this post because I was burned by the left-hand side of the second graphic in this post. It took quite awhile, actually two plus months, to finally localize and squash the bugger in production code. As usual, Herb was right.

And please, don’t tell me that lock-free programming is the answer:

…replacing locks wholesale by writing your own lock-free code is not the answer. Lock-free code has two major drawbacks. First, it’s not broadly useful for solving typical problems—lots of basic data structures, even doubly linked lists, still have no known lock-free implementations. Second, it’s hard even for experts. It’s easy to write lock-free code that appears to work, but it’s very difficult to write lock-free code that is correct and performs well. Even good magazines and refereed journals have published a substantial amount of lock-free code that was actually broken in subtle ways and needed correction. – Herb Sutter (Dr. Dobbs).

Persistent Discomfort

November 3, 2011 Leave a comment

As part of the infrastructure of the distributed, multi-process, multi-threaded system that my team is developing, a parameterized, mutex protected, inter-thread message queue class has been written and dropped into a general purpose library. To unburden application component developers from having to do it, the library-based queue class manages a reusable pool of message buffers that functionally “flow” from one thread to the next.

On the “push” side of the queue, usage is as follows:

  • Thread acquires a handle to the next empty Message buffer
  • Thread fills Message buffer
  • Thread returns handle to the queue (push)

On the “pop” side of the queue, usage is as follows:

  • Thread acquires a handle to the next full Message buffer (pop)
  • Thread processes the Message content
  • Thread returns handle to the queue

So far, so good, right? I thought so too – at the beginning of the project. But as I’ve moved forward during the development of my application component, I’ve been experiencing a growing and persistent discomfort. D’oh!

Using the figure below, I’m gonna share the cause of my “inner thread” discomfort with you.

In order to functionally process an input message and propagate it forward, the inner thread must do the following work:

  • Acquire a handle to the next input Message buffer from queue 1 (pop)
  • Acquire a handle to the next empty output Message buffer from queue 2
  • Utilize the content of the Message from queue 1 to compute/fill in the Message to queue 2
  • Return the handle of the input message to queue 1
  • Return the handle of the output message to queue 2 (push)

For small messages and/or when the messages are of different types, I don’t see much wrong with this inter-thread message passing approach. However, when the messages are big and of the same type, my discomfort surfaces. In this case (as we shall see), the “utilize” bullet amounts to an unnecessary copy. The more “inner” threads there are in the pipeline, the more performance degradation there is from unnecessary copies.

So, how can the copies be eliminated and system performance increased? One way, as the figure below shows, is to move message buffer management responsibility out of the local queue class and into a global, shared message pool class.

In this memory-less queue design, the two pipeline end point threads explicitly assume the responsibility of acquiring and releasing the Message buffer handles from the mutex protected, shared message pool. The first thread “acquires” and the last thread “releases” message buffer handles. Each inner thread, i, in the pipeline performs the following work:

  • Pop the handle to the next input Message buffer from queue i-1
  • Process the message
  • Push the Message buffer handle to queue i

The key to avoiding unessential inner thread copies is that the messages must be intentionally designed to be of the same type.

As soon as I get some schedule breathing room (which may be never), I’m gonna refactor my application infrastructure design and rewrite the code to implement the memoryless queue + global message pool approach. That is, unless someone points out a fatal flaw in my reasoning and/or presents a superior inter-thread message communication pattern.

Design Disclosure

October 31, 2011 Leave a comment

Recently, I had to publicly disclose the design of the multi-threaded CSCI (Computer Software Configuration Item) that I’m currently bringing to life with a small group of colleagues. The figure below shows the entities (packages, CSCs (Computer Software Components), Classes) and the inter-entity relationship schema that I used to create the CSCI design artifacts.

As the figure illustrates, the emerging CSCI contains M packages and K top level CSCs . Each package contains from 1 to N 2nd level CSCs that associate with each other via the “communicates” (via messages) relationship. Each CSC is of the “passive” or “active” class type, where “active” means that the CSC executes within its own thread of control.

Using the schema, I presented the structural and behavioral aspects of the design as a set of views:

Like any “real” design effort (and unlike the standard sequential design-code-test mindset of “authorities“), I covertly used the incremental and iterative PAYGO methodology (design-a-little, code-a-little, test-a-little, document-a-little) in various mini sequences that, shhhh – don’t tell any rational thinker, just “felt right” in the moment.

As we speak, the effort is still underway and, of course, the software is 90% done. Whoo Hoo! Only 10 more percent to go.

Inter-Thread Message Communication

December 13, 2010 Leave a comment

When the asynchronously executing threads in a multi-threaded application process need to communicate with each other using same-address-space messages, a thread-safe way of providing a message passing service is required. The figure below shows a black, um, yellow box model of the functionality that needs to be designed, coded, and tested for solving the problem. The little ovals with arrow thingies on them represent asynchronously executing threads of code either on the same cpu or on  separate  cpus. The little lock thingy represents a mutex that protects the mechanism internals from data corruption via simultaneous, uncontrolled access by more than one thread (I hate when that happens!).

There are at least two ways to implement a thread-safe, inter-thread message passing service; by passing copies of the message objects themselves or by passing (smaller) pointers to message objects. As the figures in the models below illustrate, the design and user API for the pass-by-objects approach is simpler than the pass-by-pointers approach.  The tradeoff is that the performance of the pass-by-objects approach degrades as the message size gets larger. In addition, passing by pointer allows dynamic polymorphism to be utilized.

Option 1: Pass-By-Objects

Option 2: Pass-By-Pointers (a.k.a references)

Since, in option 2, the memory that holds the message content is managed by the mutex-protected message passing mechanism and not shared by the clients themselves, the clients must acquire a pointer to a message memory buffer before either filling (writer) or processing (reader) the payload. Thus, the mutex must be locked/unlocked twice; once to “pop” the pointer and a second time to “push” the pointer.

An alternative to the double queue design in option 2 is to require the clients to manage the message buffers themselves via the aid of a memory pool. The disadvantage of pushing the message memory management out of the queuing mechanism and up into the application layer is that it introduces a long distance coupling between the Writer and Reader – which may be written by different people. If the reader programmer forgets to release a pointer back to the pool after processing a message, a memory leak will occur (I hate when that happens!). By encapsulating the lock/unlock actions within the message passing mechanism written by one person, the chances of introducing a memory leak are reduced and the reader and writer threads remain decoupled.

A third inter-thread message passing design option is to employ a (much trickier to implement) lockless mechanism. Better yet, the use of a programming language that natively supports inter-thread message passing under the covers unburdens application programmers with the subtleties of inter-thread synchronization.

Tasks, Threads, And Processes

Tasks, threads, and processes. What’s the difference between them?  Most, if not all Real-Time Operating Systems (RTOS) are intended to run stand-alone applications that require small memory footprints and low latency responsiveness to external events. They’re designed around the concept of a single, unified, system-wide memory address space that is shared among a set of prioritized application “tasks” that execute under the control of the RTOS. Context switching between tasks is fast in that no Memory Management Unit (MMU) hardware control registers need to be saved and retrieved to implement each switch. The trade-off for this increased speed is that when a single task crashes (divide by zero, or bus error, or illegal instruction, anyone?), the whole application gets hosed. It’s reboot city.

Modern desktop and server operating systems are intended for running multiple, totally independent, applications for multiple users. Thus, they divide the system memory space into separate “process” spaces so that when one process crashes, the remaining processes are unaffected. Hardware support in the form of an MMU is required to pull off process isolation and independence. The trade-off for this increased reliability is slower context switching times and responsiveness.

The figure below graphically summarizes what the text descriptions above have attempted to communicate. All modern, multi-process operating systems (e.g. Linux, Unix, Solaris, Windows) also support multi-threading within a process. A thread is the equivalent of an RTOS task in that if a thread crashes, the application process within which it is running can come tumbling down. Threads provide the option for increased modularity and separation of concerns within a process at the expense of another layer of context switching (beyond the layer of process-to-process context switching) and further decreased responsiveness.

Threads And Processes

%d bloggers like this: