White paper: Meltdown and Spectre make measuring I/O vital

Meltdown and Spectre are quickly becoming household names, and not just in the HPC space. These severe design flaws in modern microprocessors, Intel's most prominently, could allow sensitive data to be stolen, and they will have a huge impact on the semiconductor industry and beyond.

Read details of the “mega-gaffe” in this article published on The Register. In short, security was compromised in favour of performance, and that trade-off has now come back to bite us.

What happens next

These mistakes will have wide-reaching effects. Putting the actual flaws aside, the changes being made to the Linux kernel to separate user and kernel space more securely add overhead to every switch between the two, and that overhead has a measurable impact on the performance of shared file systems and I/O-intensive applications. The effect is most noticeable in I/O-heavy workloads, where the performance penalty could reach 10-30%.
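To see why I/O-heavy workloads sit at the top of that range, a rough back-of-the-envelope model helps. The sketch below is illustrative only: the function names, the idea of a single "kernel fraction" and the cost factor are assumptions made for this example, not measurements from the white paper.

```python
def patched_runtime(baseline_s, kernel_fraction, transition_cost_factor):
    """Estimate runtime once user/kernel transitions become more expensive.

    baseline_s: runtime before patching, in seconds
    kernel_fraction: assumed share of baseline runtime spent on
        user/kernel transitions (0.0 to 1.0)
    transition_cost_factor: assumed cost multiplier for that share
        (1.0 means no change)
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    unaffected = baseline_s * (1.0 - kernel_fraction)
    affected = baseline_s * kernel_fraction * transition_cost_factor
    return unaffected + affected


def slowdown_percent(kernel_fraction, transition_cost_factor):
    """Percentage increase in runtime under the same simple model."""
    return (patched_runtime(1.0, kernel_fraction, transition_cost_factor) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical workloads: compute-bound versus I/O-bound.
    for label, fraction in [("compute-bound", 0.02), ("I/O-bound", 0.30)]:
        print(label, round(slowdown_percent(fraction, 2.0), 1), "% slower")
```

Under this model, a job that spends 30% of its time crossing into the kernel slows down by 30% if each crossing doubles in cost, while a job that spends 2% there barely notices. Reducing the number of system calls an application makes shrinks the first term directly.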

Can you afford to lose up to a third of your compute real estate? Systems that were previously just about coping with I/O-heavy workloads could now be in real trouble. The knock-on effect of the Meltdown and Spectre revelations will be felt far more widely than core design flows.

You don’t have to put up with poor performance in order to improve security, however. The most obvious way to mitigate performance losses is to profile I/O and identify ways to optimise applications’ I/O performance.

We have put together a white paper detailing the bugs, their impact on performance and what you can do now to mitigate it.

  Download the paper in full.

Profile application I/O to rescue lost performance

By using our tool suites, Breeze and Mistral, to analyse workflows, it is possible to identify changes that will help to eliminate bad I/O and regain the performance lost to these security patches.

Step one is to identify the workflows that will benefit most from optimisation. In some cases, the candidates will be obvious – a workflow that clearly stresses the file system every time it is run, for example, or one that runs for significantly longer than a typical task. In others it may be necessary to perform an initial high-level analysis of each job.
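As a minimal sketch of the "runs significantly longer than a typical task" test, the snippet below flags jobs whose runtime exceeds a multiple of the median. The function name, the job names and the factor of three are hypothetical choices for illustration; a scheduler's accounting logs would be the likely source of the runtimes.

```python
from statistics import median


def flag_long_running(runtimes_s, factor=3.0):
    """Return the names of jobs that ran longer than factor x the median.

    runtimes_s: mapping of job name to runtime in seconds
    factor: assumed threshold; tune it to the site's workload mix
    """
    if not runtimes_s:
        return []
    typical = median(runtimes_s.values())
    return sorted(name for name, t in runtimes_s.items() if t > factor * typical)


if __name__ == "__main__":
    # Hypothetical runtimes for four jobs, in seconds.
    jobs = {"synth_a": 100, "synth_b": 110, "lint": 95, "regress": 900}
    print(flag_long_running(jobs))
```

Runtime alone will not catch a short job that hammers the file system, which is why the remaining candidates need the high-level I/O analysis described next.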

We do this using Mistral, our I/O profiling tool, which is lightweight enough to run at scale. In this case Mistral would be set up to record relatively detailed information on the type of I/O that workflows perform over time. It would look at factors such as how many metadata operations are being performed, the number of small I/O operations and so on.
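The kind of summary described here can be sketched in a few lines. This is not Mistral's data format or API: the `(operation, bytes)` record layout, the operation names and the 32 KiB "small I/O" threshold are all assumptions made for the example.

```python
from collections import Counter

# Assumed classification of operations; a real profiler may group them differently.
METADATA_OPS = {"open", "close", "stat", "access", "unlink", "rename", "mkdir"}
DATA_OPS = {"read", "write"}
SMALL_IO_BYTES = 32 * 1024  # hypothetical threshold for "small" I/O


def summarise(records):
    """Count metadata operations, reads, writes and small I/O in a trace.

    records: iterable of (operation, size_in_bytes) tuples
    """
    summary = Counter()
    for op, size in records:
        if op in METADATA_OPS:
            summary["metadata"] += 1
        elif op in DATA_OPS:
            summary[op] += 1
            if size < SMALL_IO_BYTES:
                summary["small_io"] += 1
    return summary


if __name__ == "__main__":
    # Hypothetical trace fragment.
    trace = [("open", 0), ("stat", 0), ("read", 4096),
             ("write", 1048576), ("write", 512), ("close", 0)]
    print(dict(summarise(trace)))
```

A workflow where metadata and small I/O counts dwarf the number of large reads and writes is a strong candidate for the detailed analysis in the next step.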

Once the candidate workflows have been identified, they can be analysed in detail with Breeze. As a first step, the Breeze trace can be run through our Healthcheck tool, which identifies common issues such as an application that has a high ratio of file opens to writes, or a badly configured $PATH that causes the file system to be trawled every time a workflow uses “grep”.
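To make those two example issues concrete, here is a small standalone sketch. It is not Healthcheck output or its implementation: both function names are hypothetical, and the PATH check simply walks the directories in order, which approximates the lookups made each time a command is resolved.

```python
import os


def opens_per_write(ops):
    """Ratio of open calls to write calls in a list of operation names."""
    opens = sum(1 for op in ops if op == "open")
    writes = sum(1 for op in ops if op == "write")
    if writes == 0:
        return float("inf") if opens else 0.0
    return opens / writes


def path_probes(command, path_dirs):
    """Count the directories checked before an executable is found.

    Returns (probes, full_path); full_path is None if nothing matched.
    """
    probes = 0
    for directory in path_dirs:
        probes += 1
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return probes, candidate
    return probes, None


if __name__ == "__main__":
    dirs = os.environ.get("PATH", "").split(os.pathsep)
    print(path_probes("grep", dirs))
```

Every probe before the match is a failed lookup on the file system. If the early entries in $PATH sit on shared storage, a script that calls “grep” thousands of times repeats those failed lookups thousands of times, so moving local directories to the front of $PATH is a cheap fix.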

 

Once the common issues have been addressed, the trace can be examined in detail in Breeze to identify workflow-specific behaviours using the tool’s I/O Stats and Event views.

Get in touch if you’d like to learn more about how the I/O profiling tools from Ellexus can improve system performance by up to 30%.