Vendors Scramble to Fix Meltdown and Spectre Vulnerabilities

Two recently uncovered security problems that affect nearly every CPU on the planet have forced companies to issue fixes that could seriously impact performance. While Intel has taken the brunt of bad press, chips supplied by AMD, IBM, and ARM vendors are also affected.

The security vulnerabilities, known as Meltdown and Spectre, became public last week, forcing CPU makers, operating system developers, and cloud providers to rush a series of patches to market. These patches include processor firmware updates, OS software workarounds, and web browser fixes. The problems were uncovered in June 2017 by researchers at Google Project Zero (GPZ), who alerted Intel, AMD, and ARM. Last week, however, the news leaked into the public domain, raising the prospect of hackers using the information to exploit these vulnerabilities.

The problems stem from incomplete separation of kernel (OS) and user (application) address spaces and from the processor’s use of speculative execution. In more technical terms, the GPZ team breaks the issues down as follows:

Variant 1: bounds check bypass (Spectre)

Variant 2: branch target injection (Spectre)

Variant 3: rogue data cache load (Meltdown)

In the case of Meltdown, the vulnerability has so far been found only in Intel x86 processors and one ARM core, the Cortex-A75. It allows a malicious application to read kernel memory and thus gain access to all sorts of privileged data belonging to other applications or to the OS itself. A Linux patch known as KAISER (merged into the mainline kernel as kernel page-table isolation, or KPTI) is now available as a workaround, but the performance hit can be as much as 30 percent for applications that make many system calls, that is, applications that call into the operating system frequently.
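Because the cost of this kind of patch is paid on every transition into and out of the kernel, one rough way to see how much it matters on a given machine is to time a cheap system call before and after the update. The Python sketch below assumes a Unix-like system on which `os.getppid()` enters the kernel on every call, as it does on Linux; `ns_per_call` is a hypothetical helper name. The per-call figure includes interpreter overhead, so only the before/after difference on the same machine is meaningful.

```python
import os
import time


def ns_per_call(fn, iterations=100_000):
    """Average wall-clock nanoseconds per call of fn, including Python's own call overhead."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


# os.getppid() is assumed to make a real system call each time, so running this on an
# unpatched and then a patched kernel shows roughly how much each kernel entry got slower.
print(f"getppid: {ns_per_call(os.getppid):.0f} ns per call")
```

Running the same script before and after patching, on an otherwise idle node, keeps the interpreter overhead constant so that it cancels out of the comparison.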

The Spectre vulnerability exploits a more generic weakness in the way modern processors employ speculative execution. It allows malicious software to induce other applications to speculatively execute instruction sequences they would never run under correct program flow, and then to recover confidential data held in memory from the traces that execution leaves behind in the processor’s caches. The problem is not only more widespread, affecting x86, ARM, and Power processors, but will also be more difficult to fix.
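The bounds check bypass variant (Variant 1) is easiest to see in a toy model. The sketch below is a pure simulation written for illustration, not a working exploit: Python does not speculate, so `ToyCache`, `victim`, `leak_byte`, and the `mispredict` flag are hypothetical stand-ins for a real cache and for a branch predictor that an attacker has first trained with in-bounds inputs. On real hardware the "is this line cached?" check is a timing measurement rather than a set lookup.

```python
STRIDE = 4096  # one probe "line" per possible byte value, spaced far apart

# Toy address space: an 8-byte public array followed by bytes the victim never means to expose.
MEMORY = bytearray(b"public!!" + b"SECRET")
ARRAY1_SIZE = 8


class ToyCache:
    """Remembers which probe lines were touched; stands in for real CPU cache state."""

    def __init__(self):
        self.hot_lines = set()

    def touch(self, address):
        self.hot_lines.add(address // STRIDE)

    def is_hot(self, line):
        # A real attack infers this by timing a load from the probe array.
        return line in self.hot_lines


def victim(x, cache, mispredict):
    """Models the pattern: if (x < array1_size) y = array2[array1[x] * 4096];"""
    if x < ARRAY1_SIZE or mispredict:
        # With mispredict=True the body runs "speculatively" for an out-of-bounds x.
        # The CPU later discards the result, but the cache footprint remains.
        cache.touch(MEMORY[x] * STRIDE)


def leak_byte(x, mispredict=True):
    """Recover MEMORY[x] purely from which probe line the victim left hot."""
    cache = ToyCache()
    victim(x, cache, mispredict)
    for guess in range(256):
        if cache.is_hot(guess):
            return guess
    return None  # nothing leaked


recovered = bytes(leak_byte(ARRAY1_SIZE + i) for i in range(6))
print(recovered)  # b'SECRET'
```

The victim never returns an out-of-bounds value; the secret escapes only through which cache line was warmed, which is why software that is architecturally correct can still leak.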

In fact, some believe that most processors will have to be redesigned in this area to completely eliminate the Spectre vulnerability. Other than disabling speculative execution entirely (which would devastate application performance), there is currently no bulletproof workaround to this problem. The saving grace here is that exploiting Spectre is extremely difficult.

Here is a compilation of responses by major vendors:

Intel has already issued updates for most of its processors and expects to issue patches covering more than 90 percent of its processors by the end of the week. Note that Intel is not claiming to have solved all the problems, only saying the company “will continue to work with its partners and others to address these issues.”

AMD maintains its processors are not susceptible to the Meltdown issue, and says software updates are already available to mitigate at least one of the Spectre issues (Variant 1). Reiterating Intel’s stance, the company says that “[t]otal protection from all possible attacks remains an elusive goal and this latest example shows how effective industry collaboration can be.”

ARM has documented a detailed list of all pertinent software updates that address these two issues across its core architectures. For the more wide-ranging Spectre vulnerability, the company points to various software mitigations involving OS patches, compiler changes, and firmware updates.

IBM says it is readying firmware patches for its POWER7+, POWER8, and POWER9 platforms, to be available January 9. Linux OS patches will “start to become available” on January 9 as well, while the process for AIX will begin on February 12. The company notes that the firmware patches should be deployed in concert with the OS patches.
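The Variant 1 software mitigations that AMD and ARM refer to generally aim to ensure that even a mispredicted bounds check cannot steer a load outside an array. One representative technique, not a description of any particular vendor’s update, is branch-free index masking of the kind used by the Linux kernel’s `array_index_mask_nospec()` helper. The sketch below reproduces that bit trick in Python purely to show the arithmetic (Python itself does not speculate); `to_signed64` and `clamp_index` are hypothetical helper names, and indices and sizes are assumed to be below 2**63.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Reinterpret the low 64 bits of value as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def array_index_mask_nospec(index, size):
    """All ones (-1) if index < size, else 0, computed without a branch.

    Mirrors the generic C form: ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1)
    """
    combined = (index | ((size - 1 - index) & MASK64)) & MASK64  # unsigned 64-bit wraparound
    return to_signed64(~combined) >> 63  # arithmetic shift smears the sign bit across the word


def clamp_index(index, size):
    """Out-of-bounds indices collapse to 0, so even a speculative load stays in bounds."""
    return index & array_index_mask_nospec(index, size)


print([clamp_index(i, 16) for i in (0, 15, 16, 1000)])  # [0, 15, 0, 0]
```

Because the mask is computed with arithmetic rather than a conditional jump, there is no branch for the predictor to guess wrong, which is the point of the technique.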

It’s not yet known how much these changes will degrade performance. Estimates range from less than 1 percent to more than 50 percent, depending on application behavior and processor type. The patches affect both clients and servers, but it’s on the server side that users are most concerned about application slowdowns.

That’s apt to be especially true if you’re running an HPC shop. Red Hat estimates only a 2 to 5 percent performance hit for HPC applications, but that figure is based on Linpack and SPEC CPU2006, two extremely compute-intensive benchmarks that make little use of the OS. For real applications that make heavy use of I/O or other system resources, the penalty is likely to be higher.
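The gap between benchmark and real-world numbers follows from where the cost lands: a Meltdown patch such as KAISER adds a roughly fixed cost to every kernel transition, so the slowdown scales with how often an application calls into the OS. The back-of-the-envelope model below makes that explicit. The function name, the workload rates, and the 500 ns per-call penalty are illustrative assumptions, not measured or vendor-published figures, and the model ignores indirect effects such as extra TLB misses.

```python
def estimated_slowdown(syscalls_per_sec, extra_ns_per_syscall):
    """Fractional runtime increase if each system call gets extra_ns_per_syscall slower.

    syscalls_per_sec is measured against the unpatched runtime; indirect effects
    (e.g. extra TLB misses after each kernel transition) are deliberately ignored.
    """
    return syscalls_per_sec * extra_ns_per_syscall * 1e-9


# Hypothetical workloads and a hypothetical 500 ns penalty per system call.
compute_bound = estimated_slowdown(syscalls_per_sec=1_000, extra_ns_per_syscall=500)
io_heavy = estimated_slowdown(syscalls_per_sec=200_000, extra_ns_per_syscall=500)
print(f"compute-bound: {compute_bound:.2%}  I/O-heavy: {io_heavy:.2%}")
# compute-bound: 0.05%  I/O-heavy: 10.00%
```

Under the same per-call penalty, a workload making 200 times as many system calls pays 200 times the overhead, which is why compute-bound benchmarks can understate the effect on I/O-heavy codes.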

Ellexus has put out a white paper saying that I/O-intensive HPC workloads could suffer a performance penalty of as much as 30 percent under the KAISER patch. The company suggests that profiling applications to optimize their interaction with the OS is key to mitigating this slowdown (note: Ellexus sells I/O profiling tools).
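One common way to reduce that OS interaction, offered here as an illustration rather than as Ellexus’s own recommendation, is to batch many small writes into fewer large ones so that fewer of them become system calls. The Python sketch below counts how many `write` calls actually reach the OS for 1,000 small records, with and without an `io.BufferedWriter`; `CountingRaw` and `count_os_writes` are hypothetical names, and the 64 KiB buffer size is an arbitrary choice.

```python
import io
import os


class CountingRaw(io.RawIOBase):
    """Raw stream that forwards every write straight to an OS file descriptor and counts them."""

    def __init__(self, fd):
        super().__init__()
        self.fd = fd
        self.write_calls = 0

    def writable(self):
        return True

    def write(self, data):
        self.write_calls += 1
        return os.write(self.fd, data)  # each os.write() is one write system call


def count_os_writes(records, buffered):
    """Write records to the null device and report how many OS-level writes that took."""
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        raw = CountingRaw(fd)
        if buffered:
            # BufferedWriter accumulates small writes and hands the OS large chunks.
            with io.BufferedWriter(raw, buffer_size=64 * 1024) as out:
                for record in records:
                    out.write(record)
        else:
            for record in records:
                raw.write(record)
        return raw.write_calls
    finally:
        os.close(fd)


records = [b"x" * 63 + b"\n"] * 1000  # 1,000 records of 64 bytes each
print(count_os_writes(records, buffered=False), count_os_writes(records, buffered=True))
```

With buffering, the same data crosses into the kernel in a handful of calls instead of one per record, so any fixed per-call penalty is paid far less often.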

However, it’s worth noting that for HPC facilities isolated from the outside world, or even for bare-metal cloud setups behind highly trustworthy firewalls, the protection these performance-degrading patches offer might not be worth the cost. That assumes these users can be sure that their own applications, system software stack, and runtime tools don’t contain rogue software.

More than likely, most HPC users will want to play it safe, as the University of Michigan’s Advanced Research Computing Technology Services group has. The group decided to install the existing Meltdown and Spectre fixes across its systems and will keep doing so as new patches become available. That’s probably going to be the response at most HPC facilities, save for those running a very limited set of applications under the most secure conditions.
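Sites that take this route need a way to confirm what is actually in effect on each node. Linux kernels that include the reporting interface expose a one-line status per vulnerability under `/sys/devices/system/cpu/vulnerabilities/` (files such as `meltdown`, `spectre_v1`, and `spectre_v2`); kernels without it simply lack the directory. A minimal Python sketch that collects those lines, with `mitigation_status` as a hypothetical helper name:

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability name: kernel status line}, or {} if the directory is absent."""
    if not vuln_dir.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    status = mitigation_status()
    if not status:
        print("No vulnerability status directory (non-Linux system or older kernel).")
    for name, state in status.items():
        # Typical values look like "Mitigation: PTI", "Vulnerable", or "Not affected".
        print(f"{name:<20} {state}")
```

Run across a cluster, a check like this shows which nodes still report "Vulnerable" after each round of patching.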

The good news is that vendors and users all want the same thing, and given the visibility of the problem, one can expect the affected companies to deploy solutions, or at least workarounds, rapidly until more capable hardware comes along. The bad news is that these hardware vulnerabilities existed for years before anyone found them. One can only hope that further research does not uncover similar weaknesses elsewhere in modern processor architectures.
