Load balancing HPC shared storage

Solving the noisy neighbour problem

In a compute cluster with shared storage, a small number of jobs can overload the network or file system. This degrades the performance of every job on the cluster and can even bring the cluster down completely. This is known as the noisy neighbour problem.

Our products combat the noisy neighbour problem by detecting rogue jobs and throttling I/O at the application level.
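To make the idea of application-level throttling concrete, here is a minimal sketch of one common rate-limiting technique, a token bucket, applied to writes. It is an illustration only and does not describe how Mistral or any Ellexus product is implemented. The names TokenBucket, delay_for and throttled_write are hypothetical, as are the rate and capacity parameters.

```python
import time


class TokenBucket:
    """Caps throughput at `rate` bytes/second, allowing bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so the logic can be tested
        self.last = clock()

    def delay_for(self, nbytes):
        """Charge nbytes to the bucket; return the seconds the caller should wait first."""
        now = self.clock()
        # Refill for the time elapsed since the last call, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # Negative balance: wait long enough for the refill to cover the debt.
        return -self.tokens / self.rate


def throttled_write(f, data, bucket, sleep=time.sleep):
    """Write data to f, sleeping first if the job has exceeded its I/O budget."""
    wait = bucket.delay_for(len(data))
    if wait > 0:
        sleep(wait)
    return f.write(data)
```

A job that stays within its budget is never delayed, while a job that writes faster than the configured rate is slowed just enough to bring it back under the limit, leaving bandwidth for its neighbours.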

Products to load balance shared storage

Mistral

Mistral runs in real time across an entire compute cluster, monitoring application I/O and I/O performance to identify rogue jobs and load balance shared storage.

Ellexus Healthcheck

Simple, immediate checks that look for dozens of harmful I/O patterns and report the top ten worst offenders, along with their impact on your application's performance and scalability.

Customer case study

ARM

Olly Stephens, Engineering Systems Architect at ARM, said of the project to develop Mistral with Ellexus:

We wanted to develop a system that will allow the infrastructure to protect itself somewhat against I/O behaviour that is considered a risk. In particular, we wanted the ability for aggressive use of the storage infrastructure to be automatically detected early and remedial steps taken quickly. Previously this activity was done by the HPC support staff, who were able to monitor and detect issues, but this was a slow and difficult process, primarily due to the lack of available information.

The data and system control provided by Mistral will allow the infrastructure to prevent risky I/O patterns and give us a lot more information to learn from.

Contact

Ellexus Ltd
St John’s Innovation Centre,
Cambridge CB4 0WS, UK
Tel: +44 (0)1223 42 16 46 UK
Email: info@ellexus.com