One of the most versatile aspects of our I/O profiling tool, Mistral, is that the data it produces can be viewed in whatever way is most convenient for the user. We are pleased to announce that Mistral now supports Splunk dashboards.
Splunk is widely used by organisations in high performance computing to monitor the health of their systems. By feeding Mistral's I/O data into Splunk, we can provide an even clearer picture of how jobs are using storage.
Read on for information about how the data collected by Mistral can be displayed in various Splunk views.
This view shows an aggregate of all open, create and delete activity beneath a graph of measurements for jobs that are doing high levels of reads and writes. Mistral's thresholds can be set to zero to collect I/O data from every job, or set high to monitor only the jobs that are doing lots of I/O.
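To make the effect of a threshold concrete, here is a minimal sketch of the idea. It is not Mistral's configuration format or logic: it assumes a hypothetical per-job summary (the `JobIO` record and its field names are invented for illustration) and a single volume threshold in bytes. A threshold of zero keeps every job, while a high threshold keeps only the heavy I/O jobs.

```python
from dataclasses import dataclass


@dataclass
class JobIO:
    """Hypothetical per-job I/O summary; field names are illustrative only."""
    job_id: str
    bytes_read: int
    bytes_written: int


def jobs_above_threshold(jobs, threshold_bytes):
    """Keep jobs whose combined read + write volume meets the threshold.

    With threshold_bytes == 0 every job is kept (even jobs with no I/O);
    a large threshold keeps only the jobs doing lots of I/O.
    """
    return [j for j in jobs if j.bytes_read + j.bytes_written >= threshold_bytes]


jobs = [
    JobIO("8899", 10_000, 2_000),
    JobIO("8900", 0, 0),
    JobIO("8901", 5_000_000_000, 1_000_000_000),
]

print([j.job_id for j in jobs_above_threshold(jobs, 0)])              # ['8899', '8900', '8901']
print([j.job_id for j in jobs_above_threshold(jobs, 1_000_000_000)])  # ['8901']
```

The real tool may express thresholds differently (for example as rates over a time window); the sketch only shows why a zero threshold captures everything and a high one filters down to the heaviest jobs.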
In this image we are filtering by host, but you can filter by job, by user, by project or by workflow.
As well as a system overview, Mistral can provide information on individual jobs.
These graphs show metadata and seek data for job 8901. The job seems to have unusually bad I/O patterns: it performs a lot of metadata operations, and its seeks indicate a lot of random I/O.
Mistral can measure the I/O throughput of reads and writes of different sizes. In this dashboard we can see all writes, with small and large writes displayed separately. The table below lists the jobs that have written the most data, making it easy to see which jobs are causing spikes in I/O bandwidth.
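The split between small and large writes, and the ranking of jobs by data written, can be illustrated with a short sketch. It assumes a hypothetical stream of `(job_id, size_in_bytes)` tuples, one per write call, and a hypothetical 32 KiB cut-off (`SMALL_WRITE_LIMIT`); Mistral's actual size bands and data format are not described here.

```python
from collections import defaultdict

# Hypothetical boundary between "small" and "large" writes, in bytes.
SMALL_WRITE_LIMIT = 32 * 1024


def summarise_writes(writes):
    """Split per-job write volume into small and large writes, and rank jobs.

    writes: iterable of (job_id, size_in_bytes) tuples, one per write call.
    Returns (small_totals, large_totals, ranking), where ranking lists job IDs
    by total bytes written, largest first.
    """
    small = defaultdict(int)
    large = defaultdict(int)
    for job_id, size in writes:
        if size < SMALL_WRITE_LIMIT:
            small[job_id] += size
        else:
            large[job_id] += size

    jobs = set(small) | set(large)
    # Use .get() so the lookup does not insert zero entries into the defaultdicts.
    ranking = sorted(jobs, key=lambda j: small.get(j, 0) + large.get(j, 0), reverse=True)
    return dict(small), dict(large), ranking


writes = [("8901", 4096), ("8901", 4096), ("8902", 1_048_576), ("8903", 512)]
small, large, ranking = summarise_writes(writes)
print(ranking)  # ['8902', '8901', '8903']
```

Job 8901 issues only small writes, so it appears in the small-write totals but not the large ones, while job 8902 tops the ranking with a single large write.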
Our new traffic light report shows how much time is spent doing bad I/O and is another way of identifying problem jobs. By default, the report lists the worst offenders first.
Bad I/O includes failed I/O operations, files that are opened but never used, and small reads and writes. The report is an efficient way to identify which applications would benefit from I/O optimisation or should be run on faster storage.
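As a rough illustration of how such a report could rank jobs, the sketch below computes the share of each job's I/O time spent on the three kinds of bad I/O listed above and assigns a colour. Everything here is an assumption for illustration: the `JobTiming` fields, ranking by fraction rather than absolute seconds, and the amber/red cut-offs (10% and 30%) are hypothetical, not Mistral's actual rules.

```python
from dataclasses import dataclass


@dataclass
class JobTiming:
    """Hypothetical per-job timings, in seconds."""
    job_id: str
    total_io_seconds: float
    failed_io_seconds: float
    unused_open_seconds: float
    small_io_seconds: float

    @property
    def bad_fraction(self) -> float:
        """Share of I/O time spent on failed I/O, unused opens and small I/O."""
        bad = self.failed_io_seconds + self.unused_open_seconds + self.small_io_seconds
        if self.total_io_seconds <= 0:
            return 0.0
        return bad / self.total_io_seconds


def traffic_light(fraction, amber=0.1, red=0.3):
    """Map a bad-I/O fraction to a colour using hypothetical cut-offs."""
    if fraction >= red:
        return "red"
    if fraction >= amber:
        return "amber"
    return "green"


def report(jobs):
    """Return (job_id, colour) pairs, worst offenders first."""
    ranked = sorted(jobs, key=lambda j: j.bad_fraction, reverse=True)
    return [(j.job_id, traffic_light(j.bad_fraction)) for j in ranked]


jobs = [
    JobTiming("8899", 100.0, 0.0, 1.0, 2.0),  # 3% bad
    JobTiming("8901", 50.0, 5.0, 5.0, 10.0),  # 40% bad
    JobTiming("8902", 80.0, 4.0, 0.0, 8.0),   # 15% bad
]
print(report(jobs))  # [('8901', 'red'), ('8902', 'amber'), ('8899', 'green')]
```

Ranking by fraction favours jobs whose I/O is proportionally worst; ranking by absolute bad seconds instead would favour long-running jobs. Which is more useful depends on whether you are hunting for inefficient code or for the biggest load on the filers.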
Mistral can also measure I/O performance in terms of the total time spent doing I/O, the mean latency of I/O operations and the maximum latency. This lets you measure how I/O-bound applications are, as well as how the performance of the filers varies.
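Mistral computes these figures itself; the snippet below only illustrates what each metric means. It assumes a hypothetical list of per-operation latencies in milliseconds for one job, plus the job's wall-clock run time. The `io_bound_fraction` helper is an invented, simple measure of how I/O-bound a job is.

```python
from statistics import mean


def latency_summary(latencies_ms):
    """Total, mean and maximum latency for a list of per-operation latencies."""
    if not latencies_ms:
        return {"total_ms": 0, "mean_ms": 0, "max_ms": 0}
    return {
        "total_ms": sum(latencies_ms),
        "mean_ms": mean(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def io_bound_fraction(total_io_ms, wall_clock_ms):
    """Share of wall-clock time spent in I/O.

    Can exceed 1.0 if I/O from several processes or threads overlaps in time.
    """
    if wall_clock_ms <= 0:
        return 0.0
    return total_io_ms / wall_clock_ms


latencies = [2, 4, 500, 2]  # one slow operation among fast ones
summary = latency_summary(latencies)
print(summary)  # {'total_ms': 508, 'mean_ms': 127, 'max_ms': 500}
print(io_bound_fraction(summary["total_ms"], 1000))  # 0.508
```

The example also shows why the maximum is worth tracking alongside the mean: a single slow operation (here 500 ms) can indicate a struggling filer even when most operations are fast.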