Archive

Posts Tagged ‘multicore’

A Fascinating Conversation

October 22, 2014 2 comments

“A Conversation with Bjarne Stroustrup, Carl Hewitt, and Dave Ungar” is the most fascinating technical video I’ve seen in years. The primary focus of the discussion was how to write applications that efficiently leverage multicore processors. Because of the diversity of views, the video is so engrossing that I watched it three times. I also downloaded the MP3 podcast of the discussion and saved it to a USB stick for the drive to/from work.

When pure physics started limiting the enormous CPU clock speed gains being achieved in the 90s, vertical scaling came to an abrupt halt and horizontal scaling kicked into gear. The number of cores per processor started going up, and it still is today.

Cores vs clock speed

There was much discussion over the difference between a “lock” and “synchronization“. As the figure below illustrates, main memory is physically shared between the cores in a multicore processor. Thus, even if your programming language shields this fact from you by supporting high level, task-based, message passing instead of low-level cooperative threading with locks, some physical form of under-the-hood memory synchronization must be performed in order for the cores to communicate with each other through shared memory without data races.

SMP HW Arch

Here is my layman’s take on each participant’s view of the solution to the multicore efficiency issue:

  • Hewitt: We need new, revolutionary, actor-based programming languages that abandon the traditional, sequential Von Neumann model. The current crop of languages just don’t cut it as the number of cores per processor keeps steadily increasing.
  • Ungar: We need incoherent, unsynchronized hardware memory architectures with background cache error correction. Build a reliable system out of unreliable parts.
  • Stroustrup: Revolutions happen much less than people seem to think. We need to build up and experiment with efficient concurrency abstractions in layered libraries that increasingly hide locks and core-to-memory synchronization details from programmers (the C++ threads-to-tasks-to-“next?” approach).

TTA

Monolithic Redesign

On my latest assignment, I have to reverse engineer and understand a large monolithic block of computationally intense, single-threaded product code that’s been feature-enhanced and bug-fixed many times, by many people, over many years (sound familiar?). In addition to the piled on new features, a boatload of nested if-else structures has been injected into the multi-K SLOC code base over the years to handle special and weird cases observed and reported in from the field.

As you can guess, it’s no one’s fault that the code is a tangled mess. It’s because the second law of thermodynamics has been in action doing its dirty work destroying the system without being periodically harnessed by scheduled acts of husbandry. A handful of maintenance developers imbued with a sense of personal responsibility and ownership have tried their best to refactor the beast into submission; under the radar.

Because the code is CPU intensive and single threaded, it’s not scalable to higher input data rates and its viability has hit “the wall”. Thus, besides refactoring the existing functionality into a more maintainable design, I have to simultaneously morph the mess into a multi-threaded structure that can transparently leverage the increased CPU power supplied by multicore hardware.

Note that redesigning for distributed flexibility and higher throughput doesn’t come for free. Essential complexity is added and additional latency is incurred because each input sample must traverse the 3 thread pipeline.

Piece of cake, no? Since lowly “programmers” are interchangeable, anyone could do it, right? I love this job and I’m having a blast!

%d bloggers like this: