Security Issues for Multicore Processors

If hackers love one thing, it’s a big pool of potential targets, which is why Android and Windows platforms are attacked far more often than BlackBerry and Mac OS X. So, it’s no surprise that as the installed base of multicore processors has grown, they’ve become a potential target.

That vulnerability is slowly expanding to mobile devices. Within three years, nearly 75 percent of smartphones and tablets will have a multicore processor, predicts In-Stat a research firm. That’s another reason why CIOs, IT managers and enterprise developers need to develop strategies for mitigating multicore-enabled attacks.

Cambridge University researcher Robert Watson has been studying multicore security attacks such as system call wrappers for several years. He recently spoke with Intelligence in Software about multicore vulnerabilities and what the IT industry is doing to close the processor back door.

Q: As the installed base of multicore processors grows in PC and mobile devices, are they becoming a more attractive target for hackers?

Robert Watson: One of the most important transitions in computer security over the last two decades has been the professionalization, on a large scale, of hacking. Online fraud and mass-market hacking see the same pressures that more stock-and-trade online businesses do: How to reach the largest audience, reduce costs and to reuse, and wherever possible, automate solutions. This means going for the low-hanging fruit and targeting the commodity platforms. Windows and Android are a case in point, but we certainly shouldn't assume that Apple's iOS and RIM's Blackberry aren't targets. They are major market players, as well.

Multicore attacks come into play in local privilege escalation. They may not be how an attacker gets their first byte of code on the phone -- that might be conventional buffer overflows in network protocols and file formats, or perhaps simply asking the user to buy malware in an online application store.

Multicore attacks instead kick in when users try to escape from sandboxing on devices, typically targeted at operating system kernel concurrency vulnerabilities. In many ways, it's quite exciting that vendors like Apple, Nokia and Google have adopted a "sandboxed by default" model on the phone. They took advantage of a change in platform to require application developers to change models. This has made the mobile device market a dramatically better place than the cesspool of desktop computing devices. However, from the attacker perspective, it's an obstacle to be overcome, and multicore attacks are a very good way to do that.

Q: What are the primary vulnerabilities in multicore designs? What are the major types of attacks?

R.W.: When teaching undergraduates about local and distributed systems programming, the term "concurrency" comes up a lot. Concurrency refers to the appearance, and in some cases the reality, of multiple things going on at once. The application developer has to deal with the possibility that two messages arrive concurrently, that a file is changed by two programs concurrently, etc.

Reasoning about possible interleavings of events turns out to be remarkably difficult to do. It's one of the things that makes CPU and OS design so difficult. When programmers reason about concurrency wrong, then applications can behave unpredictably. They can crash, data can be corrupted and in the security context, this can lead to incorrect implementation of sandboxing.

In our system call wrapper work, we showed that by exploiting concurrency bugs, attackers could bypass a variety of security techniques, from sandboxing to intrusion detection. Others have since shown that these attacks work against almost all mass-market antivirus packages, allowing viruses to go undetected. Similar techniques have been used to exploit OS bugs in systems such as Linux, in which incorrect reasoning about concurrency allows an application running with user privileges to gain system privileges.

It is precisely these sorts of attacks that we are worried about: the ability to escape from sandboxing and attack the broader mobile platform, stealing or modifying data, gaining unauthorized access to networks, etc.

Q: What can enterprise developers, CIOs and IT managers do to mitigate those threats? And is there anything that vendors such as chipset manufacturers could or should do to help make multicore processors more secure?

R.W.: Concurrency is inherent in the design of complex systems software, and multicore has brought this to the forefront in the design of end-user devices. It isn't just a security problem, though. Incorrect reasoning about concurrency leads to failures of a variety of systems. Our research and development communities need to continue to focus on how to make concurrency more accessible.

Enterprise developers need to be specifically trained in reasoning about concurrency, a topic omitted from the educations of many senior developers because they were trained before the widespread adoption of concurrent programming styles, and often taught badly even for more junior developers. Perhaps the most important thing to do here is to avoid concurrency wherever possible. It is tempting to adopt concurrent programming styles because that is the way things are going. Developers should resist!

There are places where concurrency can't be avoided, especially in the design of high-performance OS kernels, and there, concurrency and security must be considered hand-in-hand. In fact, concerns about concurrency and security, such as those raised in our system call wrapper work, have directly influenced the design of OS sandboxing used in most commercially available OSs.

For CIOs and IT managers,  concurrency attacks, like hackers, are just another weapon in the arsenal that they need to be aware of. Concurrency isn't going away. We use multicore machines everywhere, and the whole point of networking is to facilitate concurrency.

What they can do is put pressures on their vendors to consider the implications of concurrency maturely and on their local developers to do the same. Where companies produce software-based products, commercial scanning tools such as Coverity's Prevent are increasingly aware of concurrency, and these should be deployed as appropriate. For software developers, we shouldn't forget training:

First, to avoid risky software constructs, and second, to know how to use them correctly when they must be used.

We should take some comfort in knowing that hardware and software researchers are deeply concerned with the problems of concurrency, and that this is an active area of research. But there are no quick fixes since the limitations here are as much to do with our ability to comprehend concurrency as with the nature of the technologies themselves.

Read more about eliminating concurrency and threading errors from our sponsor.

Who’s in Charge of Multicore?

Like just about every other technology, multicore processors have an industry organization to help create best practices and guidelines. For developers, following The Multicore Association (MCA) can be a convenient way to keep up with what processor manufacturers, OS vendors, universities and other ecosystem members are planning a year or two out. MCA president Markus Levy recently spoke with Intelligence in Software about the organization’s current initiatives.

Q: What’s MCA’s role in the industry? For example, do you work with other trade groups and standards bodies?

Markus Levy: Multicore is a huge topic with many varieties of processors, issues and benefits. It boils down to the target market and, specifically, the target application to determine whether to use a homogeneous symmetrical multiprocessing processor or a highly integrated system-on-a-chip with many heterogeneous processing elements. The Multicore Association is and will be biting off a chunk of this to enable portability and ease of use.

With this in mind, we primarily aim to develop application program interfaces (APIs) to allow processor and operating system vendors and programmers to develop multicore-related products using open specifications. It’s important to note that The Multicore Association doesn’t currently work with other trade groups or standards bodies. We never intend to compete and/or develop redundant specifications.

Q: Developers work with whatever hardware vendors provide at any given time. In this case, that’s multicore processors. How can keeping up with MCA help developers understand what kind of hardware and software might be available to them in a year, three years or five years down the road? For example, what are some current association initiatives that they should keep an eye on?

M.L.: Good question. Notice that the MCA membership comprises a mixture of processor vendors, OS vendors and system developers. The main benefit for processor vendors is to support current and future generations of multicore processors. In other words, it makes it easier for their customers to write their application code once and only require minor changes as they move to another generation processor.

The OS vendors are also utilizing the MCA standards to enhance their offerings, and customers are requesting it. System developers are actually using our open standards to create their own proprietary implementations that are optimized for their needs.

We currently have a Tools Infrastructure Working Group (TIWG) that is defining a common data format and creating standards-based mechanisms to share data across diverse and non-interoperable development tools, specifically related to the interfaces between profilers and analysis/visualization tools. Actually, in this regard, the TIWG is also collaborating with the CE Linux Forum on a reference implementation for a de-facto trace data format standard that TIWG will define.

Our most popular specification to date is the Multicore Communications API (MCAPI), and the working group is currently defining and developing refinements and enhancements for version 2.0. MCAPI is now implemented by most OS vendors, quite a few university projects, as well as versions developed by system companies.

We’ve recently formed a Multicore Task Management API (MTAPI) working group that is focused on dynamic scheduling and mapping tasks to processor cores to help optimize throughput on multicore systems. MTAPI will provide an API that allows parallel embedded software to be designed in a straightforward way, abstracting the hardware details and letting the software developer focus on the parallel solution of the problem. This is already turning out to be quite popular with fairly extensive member involvement. It’s an important piece of the puzzle that many developers will be able to utilize in the next one to two years.

Q: Multicore processors have a lot of obvious advantages, particularly performance. But are there any challenges? The need for more parallelism seems like one. What are some others? And how is The Multicore Association working to address those challenges?

M.L.: There are many challenges of using multicore processors. Parallelizing code is just one aspect. Again, it also depends on the type of multicore processor and how it is being used. While the MTAPI is focused on the parallel solution, the MCAPI is critical to enable core-to-core communications, and our Multicore Resource management API (MRAPI) specification is focused on dealing with on-chip resources that are shared by two or more cores, such as shared memory and I/O.

Q: The MCA website has a lot of resources, such as webinars and a discussion group. Would you recommend those as a good way for developers to keep up with MCA activities?

M.L.: The webinars provide good background information on the projects we have completed so far, so I recommend these as starting points. The discussion group is very inactive. Better ways to stay up on activities include:

  • Subscribing to the MCA newsletter. This comes out every four to eight weeks, depending on whether or not there is news to be reported.
  • We also have special cases where nonmembers can attend meetings as guests. They can contact me for assistance.
  • Attend the Multicore Expo once a year, where members go into depth on the specifications. Plus, other industry folks present on various multicore technologies.

Photo Credit:

Maxing out Multicore

From smartphones to data centers, multicore processors are becoming the norm. The extra processing power is good news for developers, but there’s a catch: The real-world performance gains are often held back by old ways of thinking about coding, a caveat summed up in the oft-overlooked Amdahl’s Law.

More than two years ago, University of Wisconsin Professor Mark Hill gave a presentation about how Amdahl’s Law affects multicore performance. He recently spoke with Intelligence in Software about why parallelism is key to unlocking multicore’s benefits and why the computing world could use a new Moore’s Law.

Q: In your presentation, you mentioned a conversation you had in a bar with IBM researcher Thomas Puzak. He said that everybody knows Amdahl’s

Law but quickly forgets it. Why?

Mark Hill: You learn the math of this, but often when it comes to real life, you forget how harsh Amdahl’s Law really is. If you’re 99 percent parallel and 1 percent serial, and you have 256 cores, how much faster do you think you can go? You’d think you get a speed-up of maybe 250 out of 256. (A speed-up for 250 means that one is computing at 250 times the rate of one core.)

But the answer is a speed-up of 72. That 1 percent has already cost you that much. That’s the kind of thing I mean. People’s intuition often is more optimistic than if they did a calculation with Amdahl’s Law.

Q: Hence your point about why there’s a growing need for dramatic increases in parallelism.

M.H.: Correct. It also ties in with the fact that I don’t think you’re going to take old software and get dramatic parallelism because it’s going to get that sequential component down.

Let’s say you get the sequential component down to 35 percent. Then your

speed-up is limited to three -- at most three times faster than a single core. Nice, but hardly what you want. So in my opinion, dramatic gains in parallelism are going to have to happen due to new software that’s written for this person.

Q: For developers, what are the challenges to writing parallel-centric software, for lack of a better term? Is it mainly a change in mindset?

M.H.: It’s a pretty huge hurdle. There have been people writing parallel software in niche domains such as supercomputers, but most developers don’t have experience with it. If you think of software as a numerical recipe -- you do this, you do that -- parallel computing is like a bunch of numerical recipes operating at the same time. That can be conceptually a lot more difficult. It can be easier if you have a large dataset and say, “Let’s do approximately the same thing on each element of this large dataset.” That’s not so mind-blowing.

Part of the problem is that the literature can be a little biased because you can more easily publish results that show things working fantastically than working poorly. If you read these papers, you might think things are working pretty well, but people select things they want to publish with as opposed to the problems need to be done.

Q: You’ve talked about the need for a new Moore’s Law, where parallelism doubles every two years.

M.H.: It used to be that you could design a piece of software, and if it ran like a pig, you could say, “Well, it’s not a problem because processors are going to get twice as fast in two years.” Going forward, that’s going to be true only if you get parallelism that keeps increasing. That’s not going to happen by luck. You’re going to have to plan for it.

Q: Maybe there’s an analogy with oil: When gas prices skyrocket, as they did in the 1970s and again today, automakers start looking for ways to wring every mile they can out of a gallon. In the computing world, enterprises want data centers that aren’t electricity hogs, and smartphones that can last an entire workday before they need a charge.

M.H.: Electricity is increasingly becoming the limiting factor in machines. In this new era, there’s going to be a lot more pressure for the code to be more efficient. People say, “That doesn’t matter anymore because computers are so fast.” Yeah, but if my code is twice as efficient as your code, that’s going to matter because if somebody’s battery lasts twice as long, it’s a very good thing.

To download the slides from Hill’s presentation, visit .

Photo Credit:

Smartphones and Tablets Go Multicore: Now What?

Within three years, nearly 75 percent of smartphones and tablets will have a multicore processor, predicts In-Stat, a research firm. This trend is no surprise, considering how mobile devices are increasingly handling multiple tasks simultaneously, such as serving as a videoconferencing endpoint while downloading email in the background and running an antimalware scan.

Today’s commercially available dual-core mobile devices include Apple’s iPad 2, HTC’s EVO 3D and Samsung’s Galaxy Tab, and some vendors have announced quad-core processors that will begin shipping next year. Operating systems are the other half of the multicore equation. Android and iOS, which are No. 1 and No. 2  in terms of U.S. market share, already support multicore processors.

“IOS and Android are, at their core, based on multithreaded operating systems,” says Geoff Kratz, chief scientist at FarWest Software, a consultancy that specializes in system design. “Android is basically Linux, and iOS is based on Mac OS, which in turn is based on BSD UNIX. That means that, out of the box, these systems are built to support multicore/multi-CPU systems and multithreaded apps.”

For enterprise developers, what all this means is that it’s time to get up-to-speed on programming for mobile multicore devices. That process starts with understanding multicore’s benefits -- and why they don’t always apply.

Divide and Conquer

Performance is arguably multicore’s biggest and most obvious benefit. But that benefit doesn’t apply across the board; not all apps use multithreading. So when developing an app, an important first step is to determine what can be done in parallel.

“This type of programming can be tricky at first, but once you get used to it and thinking about it, it becomes straightforward,” says Kratz. “For multithreaded programming, the important concepts to understand are queues (for inter-thread communications) and the various types of locks you can use to protect memory (like mutexes, spin locks and read-write locks).”

It’s also important to protect shared data.

“Most programmers new to multithreaded programming assume things like a simple assignment to a variable (e.g., setting a “long” variable to “1”) is atomic, but unfortunately it often isn’t,” says Kratz. “That means that one thread could be setting a variable to some value, and another thread catches it halfway through the write, getting back a nonsense value.”

It’s here that hardware fragmentation can compound the problem. For example, not all platforms handle atomic assignments. So an app might run flawlessly on an enterprise’s installed base of mobile devices, only to have problems arise when non-atomic hardware is added to the mix.

“Using mutexes (which allow exclusive access to a variable) or read/write locks (which allow multiple threads to read the data, but lock everyone out when a thread wants to write) are the two most common approaches, and mutexes are by far the most common,” says Kratz. “For Android programmers, this is easily done using the ‘synchronized’ construct in the Java language, protecting some of the code paths so only one thread at a time can use it.”

For iOS, options include using POSIX mutexes, the NSLock class or the @synchronized directive.

“Ideally, a programmer should minimize the amount of data shared between threads to the absolute bare minimum,” says Kratz. “It helps with performance and, more importantly, makes the app simpler and easier to understand -- and less liable to errors as a result.”

No Free Power-lunch

As the installed base of mobile multicore devices grows, it doesn’t mean developers can now ignore power consumption. Just the opposite: If multicore enables things that convince more enterprises to increase their usage of smartphones and tablets, then they’re also going to expect these devices to be able to run for an entire workday between charges. So use the extra horsepower efficiently, such as by managing queues.

“You want to use a queue construct that allows anything reading from the queue to wait efficiently for the next item on the queue,” says Kratz. “If there is nothing on the queue, then the threads should basically go idle and use no CPU. If the chosen queue forces you to poll and repeatedly go back and read the queue to see if there is data on it, then that results in CPU effort for no gain, and all you’ve done is use up power and generate heat. If a programmer has no choice but to poll, then I would recommend adding a small sleep between polling attempts where possible, to keep the CPU load down a bit.”

When it comes to power management, many best practices from the single-core world still apply to multicore. For example, a device’s radios -- not just cellular, but GPS and Wi-Fi too -- are among the biggest power-draws. So even though multicore means an app can sit in the background and use a radio -- such as a navigation app constantly updating nearby restaurants and ATMs -- consider whether that’s the most efficient use of battery resources.

“For some apps, like a turn-by-turn navigation app, it makes sense that it wants the highest-resolution location as frequently as possible,” says Kratz. “But for some apps, a more coarse location and far less frequent updates may be sufficient and will help preserve battery.

“There may be times where an app will adapt its use of the GPS, depending on if the device is plugged into power or not. In this case, if you are plugged into an external power source, then the app can dial up the resolution and update frequency. When the device is unplugged, the app could then dial it back, conserving power and still working in a useful fashion.”

Photo Credit:

Taming the Parallel Beast

Many programmers seem to think parallelism is hard. A quick Internet search will yield numerous blogs commenting on the difficulty of writing parallel programs (or parallelizing existing serial code). There do seem to be many challenges for novices. Here’s a representative list:

  • Finding the parallelism. This can be difficult because when we tune code for serial performance, we often use memory in ways that limit the available parallelism. Simple fixes for serial performance often complicate the original algorithm and hide the parallelism that is present.
  • Avoiding the bugs. Certainly, there is a class of bugs such as data races, deadlocks, and other synchronization problems that affect parallel programs, and which serial programs don’t have. And in some senses they are worse, because timing-sensitive bugs are often hard to reproduce -- especially in a debugger.
  • Tuning performance. Serial programmers have to worry about granularity, throughput, cache size, memory bandwidth, and memory locality. But for parallel programs, the programmer also has to consider the parallel overheads and unique problems, like false sharing of cache lines.
  • Ensuring future proofing. Serial programmers don’t worry whether the code they are writing will run well on next year’s processors -- it’s the job of the processor companies to maintain upward compatibility. But parallel programmers need to think about how their code will run on a wide range of machines, including machines with two, four, or even more processors. Software that is tuned for today’s quad-core processors may still be running unchanged on future 16-, 32- or even 64-core machines.
  • Using modern programming methods. Object-oriented programming makes it much less obvious where the program is spending its time.
  • Other reasons that parallel programming is considered hard include the complexity of the effort, insufficient help for developers unfamiliar with the techniques, and a lack of tools for dealing with parallel code. When adding parallelism to existing code, it can also be difficult to make all the changes needed to add parallelism all at once, and to ensure that there is enough testing to eliminate timing-sensitive bugs.

Use Serial Modeling to Evolve Serial Code to Parallel
The key to success in introducing parallelism is to rely on a well-proven programming method called serial modeling. Using serial modeling tools and technique, programmers can achieve parallelization with enhanced performance and without synchronization issues. The essence of the method involves consistently checking and resolving problems, and beginning early in the process to slowly evolve the code from pure serial, to serial but capable of being run in parallel, to truly parallel.

The first step is to measure where the application spends time -- effort spent in hot areas will be effective, while effort spent elsewhere is wasted. The next step is to use a serial modeling tool to evaluate opportunities for potential parallelization and determining what would happen if this code ran in parallel. This kind of tool observes the execution of the program, and uses the serial behavior to predict the performance and bugs that might occur if the program actually executed in parallel.

Checking for problems early in the evolution process, while a program is still serial, ensures that you don’t waste time on parallelization efforts that are doomed because of poor performance. You can then model parallelizations that resolve the performance issues or, if no alternatives are practical, focus your efforts on more profitable locations.

The tool can also model the correctness of the theoretical parallel program, and detect race conditions and other synchronization errors while still running the serial program. Although the program still runs serially, it is easy to debug and test, and it computes the same results. The programmer can change the program to resolve the potential races, and after each change, the program remains a serial program (with annotations) and can be tested and debugged using normal processes.

When the program has fully evolved, the result is a correct serial program with annotations describing a parallelization with known good performance and no synchronization issues. The final step in the process is to convert those annotations to parallel code. After conversion, the parallel program can undergo final tuning and debugging with the other tools. The beast has been tamed.