Video: Discovering rogue jobs with Mistral

How often are users of your compute cluster brought to a stop by someone else’s catastrophe? Which jobs could tip your shared storage over the edge?

Until now, it has been very difficult to identify which jobs are causing havoc across your compute cluster. This is why we developed Mistral.

Mistral is Ellexus’s lightweight I/O profiling tool. It is designed to run continuously in the background on a compute cluster, so you can easily identify and even throttle rogue jobs before they become a problem.

Watch our short video to see Mistral in action.

The tool is storage agnostic and works with popular schedulers such as PBS, Slurm, LSF and Univa Grid Engine. In this video, we’ve set up Mistral with the LSF scheduler on one of our clusters and used our plugin architecture to feed the data into Elasticsearch. We’ve then used a Grafana frontend to visualise the I/O.

If you would like to learn more about how to identify a rogue job and improve I/O performance across your cluster, get in touch.