Why Is Parallel Programming So Hard?
Most people first learn how to program serially. Later they learn about multitasking and threading. But even with the advent of multicore processors, most programmers are still intimidated by the prospect of true parallel programming.
To help serial programmers make the transition to parallel, we talked to distinguished IBM engineer Paul E. McKenney. (McKenney maintains RCU in the Linux kernel and has written the detailed guidebook Is Parallel Programming Hard, And, If So, What Can You Do About It?) Here’s what he had to say.
Q: What makes parallel programming harder than serial programming? How much of this is simply a new mindset one has to adopt?
McKenney: Strange though it may seem, although parallel programming is indeed harder than sequential programming, it is not that much harder. Perhaps the people complaining about parallel programming have forgotten about parallelism in everyday life: Drivers deal naturally with many other cars; sports team members deal with other players, referees and sometimes spectators; and schoolteachers deal with large numbers of (sometimes unruly) children. It is not parallel programming that is hard, but rather programming itself.
Nevertheless, parallelism can pose difficult problems for longtime sequential programmers, just as Git can for longtime users of other revision control systems. These problems include design and coding habits that are inappropriate for parallel programming, as well as sequential APIs that are problematic for parallel programs.
This turns out to be a problem both in theory and in practice. For example, consider a collection API whose addition and deletion primitives return the exact number of items in the collection. This has simple and efficient sequential implementations but is problematic in parallel. In contrast, addition and deletion primitives that do not return the exact number of items in the collection have simple and efficient parallel implementations.
As a result, much of the difficulty in moving from sequential to parallel programming is in fact adopting a new mindset. After all, if parallel programming really is mind-crushingly difficult, why are there so many successful parallel open-source projects?
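The collection API contrast McKenney describes can be illustrated with a minimal Python sketch. The class names ExactCountSet and ShardedSet, the approximate_size method and the shard count are hypothetical, chosen only for illustration; they do not come from the interview or from any particular library. The first class returns the exact size from add(), which forces every caller through one lock. The second returns nothing from add(), so each shard can be updated under its own lock.

```python
import threading


class ExactCountSet:
    """Sequential-style API: add() returns the exact number of items.

    Every caller must serialize on one lock so the returned count is exact.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._items = set()

    def add(self, item):
        with self._lock:
            self._items.add(item)
            return len(self._items)


class ShardedSet:
    """Parallel-friendly API: add() does not report the collection size.

    Items are spread over independent shards, each with its own lock, so
    adds that land on different shards do not contend with each other.
    """

    def __init__(self, num_shards=8):
        self._shards = [(threading.Lock(), set()) for _ in range(num_shards)]

    def add(self, item):
        lock, items = self._shards[hash(item) % len(self._shards)]
        with lock:
            items.add(item)

    def approximate_size(self):
        # Sums per-shard sizes one shard at a time; the total can be stale
        # if other threads are still adding while this loop runs.
        total = 0
        for lock, items in self._shards:
            with lock:
                total += len(items)
        return total
```

Once all writers have finished, approximate_size() matches the true size; while they are still running it is only an estimate, which is the price of dropping the exact-count requirement from add().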
Q: What does a serial programmer have to rethink when approaching parallel programming? Enterprise programmers know how to multitask and thread, but most don’t have a clue how to tap into the power of multicore/parallel programming.
McKenney: The two biggest learning opportunities are a) partitioning problems to allow efficient parallel solutions, and b) using the right tool for the job. Sometimes people dismiss partitioned problems as “embarrassingly parallel,” but these problems are exactly the ones for which parallelism is the most effective. Parallelism is first and foremost a performance optimization, and therefore has its area of applicability. For a given problem, parallelism might well be the right tool for the job, but other performance optimizations might be better. Use the right tool for the job!
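As a rough illustration of what partitioning a problem can look like, here is a minimal Python sketch. The function names (partition, sum_of_squares, parallel_sum_of_squares) and the choice of workload are hypothetical and are not taken from the interview. Each worker gets its own chunk and shares nothing with the others; the only coordination is combining the partial results at the end. A thread pool keeps the sketch self-contained; in CPython, a CPU-bound workload like this one would likely need a process pool to see an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_parts):
    """Split data into num_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), num_parts)
    chunks, start = [], 0
    for i in range(num_parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done on one partition; touches no shared state."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    """Process each partition independently, then combine partial sums."""
    chunks = partition(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Because the per-chunk work is independent, the same design carries over to other tools or languages with little change, which is the sense in which such problems get called “embarrassingly parallel.”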
Q: What can best help serial programmers retool themselves into parallel programmers?
McKenney: All tools and languages, parallel or not, are domain-specific. So look at the tools and languages used by parallel programmers in application domains that interest you the most. There are a lot of parallel open-source projects out there, and so there is no shortage of existing practice to learn from. If you are not interested in a specific application domain, focus on tools used by a vibrant parallel open-source project. For example, the project I participate in has been called out as having made great progress in parallelism by some prominent academic researchers.
But design is more important than specific tools or languages. If your design fully partitions the problem, then parallelization will be easy in pretty much any tool or language. In contrast, no tool or language will help a flawed design. Finally, if your problem is inherently unpartitionable, perhaps you should be looking at other performance optimizations.
Back in the 1970s and 1980s, the Great Programming Crisis was at least as severe as the current Great Multicore Programming Crisis. The primary solutions were not high-minded tools or languages, but rather the lowly spreadsheet, word processor, presentation manager, and relational database. These four lowly tools transformed the computer from an esoteric curiosity into something that almost no one can live without. But the high-minded tools and languages of the time are long forgotten, and rightly so. The same thing will happen with parallelism: The average person doesn’t care about fancy parallel languages and synchronization techniques; they just want their smartphone’s battery to last longer.
Of course, I have my favorite languages, tools and techniques, mostly from the perspective of someone hacking kernels or the lower levels of parallel system utilities. I have had great fun with parallel programming over the past two decades, and I hope your readers have at least as much fun in the decades to come!