Taming the Parallel Beast

Many programmers seem to think parallelism is hard. A quick Internet search will yield numerous blogs commenting on the difficulty of writing parallel programs (or parallelizing existing serial code). There do seem to be many challenges for novices. Here’s a representative list:

  • Finding the parallelism. This can be difficult because when we tune code for serial performance, we often use memory in ways that limit the available parallelism. Simple fixes for serial performance often complicate the original algorithm and hide the parallelism that is present.
  • Avoiding the bugs. Certainly, there is a class of bugs such as data races, deadlocks, and other synchronization problems that affect parallel programs, and which serial programs don’t have. And in some senses they are worse, because timing-sensitive bugs are often hard to reproduce -- especially in a debugger.
  • Tuning performance. Serial programmers have to worry about granularity, throughput, cache size, memory bandwidth, and memory locality. But for parallel programs, the programmer also has to consider the parallel overheads and unique problems, like false sharing of cache lines.
  • Ensuring future proofing. Serial programmers don’t worry whether the code they are writing will run well on next year’s processors -- it’s the job of the processor companies to maintain upward compatibility. But parallel programmers need to think about how their code will run on a wide range of machines, including machines with two, four, or even more processors. Software that is tuned for today’s quad-core processors may still be running unchanged on future 16-, 32- or even 64-core machines.
  • Using modern programming methods. Object-oriented programming makes it much less obvious where the program is spending its time.
  • Other reasons that parallel programming is considered hard include the complexity of the effort, insufficient help for developers unfamiliar with the techniques, and a lack of tools for dealing with parallel code. When adding parallelism to existing code, it can also be difficult to make all the changes needed to add parallelism all at once, and to ensure that there is enough testing to eliminate timing-sensitive bugs.

Use Serial Modeling to Evolve Serial Code to Parallel
The key to success in introducing parallelism is to rely on a well-proven programming method called serial modeling. Using serial modeling tools and technique, programmers can achieve parallelization with enhanced performance and without synchronization issues. The essence of the method involves consistently checking and resolving problems, and beginning early in the process to slowly evolve the code from pure serial, to serial but capable of being run in parallel, to truly parallel.

The first step is to measure where the application spends time -- effort spent in hot areas will be effective, while effort spent elsewhere is wasted. The next step is to use a serial modeling tool to evaluate opportunities for potential parallelization and determining what would happen if this code ran in parallel. This kind of tool observes the execution of the program, and uses the serial behavior to predict the performance and bugs that might occur if the program actually executed in parallel.

Checking for problems early in the evolution process, while a program is still serial, ensures that you don’t waste time on parallelization efforts that are doomed because of poor performance. You can then model parallelizations that resolve the performance issues or, if no alternatives are practical, focus your efforts on more profitable locations.

The tool can also model the correctness of the theoretical parallel program, and detect race conditions and other synchronization errors while still running the serial program. Although the program still runs serially, it is easy to debug and test, and it computes the same results. The programmer can change the program to resolve the potential races, and after each change, the program remains a serial program (with annotations) and can be tested and debugged using normal processes.

When the program has fully evolved, the result is a correct serial program with annotations describing a parallelization with known good performance and no synchronization issues. The final step in the process is to convert those annotations to parallel code. After conversion, the parallel program can undergo final tuning and debugging with the other tools. The beast has been tamed.

Photo: @iStockphoto.com/angelhell

Which Comes First: Parallel Languages or Patterns?

On the shuttle to the UPCRC (Universal Parallel Computation Research Center) Annual Summit meeting on the Microsoft campus in Redmond, Wash., I was listening in on a discussion about parallel programming patterns. Being a parallel programmer, I was interested in what people (and these were some of the experts in the field) had to say about parallel programming patterns, how they are evolving and how they will impact future parallel coders.

The discussion turned to whether patterns would affect programming languages directly or remain something that would be constructed from statements of the language. I think I’m in the former camp. Here’s why.

For those of us that were programming when Elvis was still alive, think back to writing with assembly language. For the most part, there were instructions for Load, Store, Add, Compare, Jump, plus some variations on these and other miscellaneous instructions. To implement a counting/indexing loop, you would use something like the following:

Initialize counter
LOOP: test end condition,
goto EXIT if done
Loop Body
increment counter
goto LOOP
EXIT: next statement

This is a programming pattern. Surprised? With the proper conditional testing and jumping (goto) instructions within the programming language, this pattern can be implemented in any imperative language. Since this pattern proved to be so useful and pervasive in the computations being written, programming language designers added syntax to “automate” the steps above. For example, the for-loop in C:

for (i = 0; i < N; ++i) {
Loop Body

Once we had threads and the supporting libraries to create and manage threads, parallel coding in shared memory was feasible, but at a pretty crude level since the programmer had to be sure the code handled everything explicitly. For example, dividing the loop iterations among threads can be done with each thread executing code that looks something like this:

start = (N/num_threads) * (myid)
end = (N/num_threads) * (myid + 1)
if (myid == LAST) end = N
for (i = start; i < end; ++i) {
Loop Body

Parallel programming patterns will be abstractions that can be “crudely” implemented in current languages and parallel libraries, like the pseudocode above. New languages (or language extensions) will make programming parallel patterns easier and less error prone. From the example above, OpenMP has the syntax to do this, but it only takes a single line added to the serial code:

#pragma omp for
for (i = 0; i < N; ++i) {
Loop Body

From the evidence above, I think future parallel programming languages or language extensions supporting parallelism will be influenced by the parallel programming patterns we define and use today. And nothing will remain static. During his UPCRC presentation, Design Patterns’ Ralph Johnson remarked that some of the original patterns saw early use, but this use has slacked off. Two reasons he noted for this was that some of the patterns couldn’t easily be implemented in Java and modern OO languages had better ways to accomplish the same tasks -- most likely these new languages found inspiration from the patterns and their usage.

For an answer to the question posed in the title, it boils down (no pun intended) to the old chicken-and-egg paradox. There were algorithms (patterns) to do computations before there were computers; prior to that, those algorithms were modifications of previous algorithms influenced by the tools available. Looking forward, though, we’re still in the relative stages of infancy for programming, let alone parallel programming. Clearly, the next generation of parallel programming languages or libraries or extensions bolted onto serial languages will be influenced by the patterns we use now for specifying parallel computations.