Mistral is a lightweight monitoring tool for HPC and scientific computing that works at the application level. It monitors I/O, CPU and memory usage, helping you quickly locate rogue jobs and storage bottlenecks and keep track of what is running on your clusters day to day. Because storage is measured at the application level, you can see the impact each action has on the jobs being run.
At scale, across many thousands of jobs and compute hosts, Mistral can generate a large volume of time-series data logging I/O, CPU and memory usage. Mistral can push this data to a number of time-series databases and dashboarding solutions, including the ELK stack, InfluxDB and Splunk.
The best way to keep Mistral data manageable is to plan ahead and collect only the measurements you need. Our earlier blog explains how to maximise the return on investment from Mistral data by optimising what you collect:
Many of our customers use Elasticsearch with Grafana, so we have pulled together some resources for maintaining an Elasticsearch database as the volume of Mistral data grows. Sarah Hersh at Elastic has written a useful blog on rolling up old data in Elasticsearch. Rollups let you collect detailed per-job data for live triage by the operations team and then condense it into a smaller summary for long-term forecasting:
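As a rough illustration of what a rollup looks like, the sketch below builds a request body for Elasticsearch's create rollup job API (`PUT _rollup/job/<job_id>`). It is a minimal example under stated assumptions, not Mistral's own configuration: the index pattern `mistral-*`, the rollup index `mistral_rollup` and the field names `timestamp`, `hostname`, `cpu_time`, `bytes_read` and `bytes_written` are hypothetical placeholders, so substitute whatever your Mistral plugin actually writes. Note that in recent Elasticsearch releases rollups are deprecated in favour of downsampling, so check the documentation for your version.

```python
import json


def build_rollup_job(index_pattern: str, rollup_index: str,
                     interval: str = "1h", delay: str = "7d") -> dict:
    """Return a body for Elasticsearch's PUT _rollup/job/<job_id> request.

    Hypothetical field names (timestamp, hostname, cpu_time, bytes_read,
    bytes_written) are placeholders, not Mistral's actual schema.
    """
    return {
        "index_pattern": index_pattern,   # raw, per-job source indices
        "rollup_index": rollup_index,     # where the condensed data is stored
        # Quartz-style cron (seconds field first): run every 30 minutes.
        "cron": "0 */30 * * * ?",
        "page_size": 1000,
        "groups": {
            # Collapse raw samples into fixed time buckets. `delay` makes the
            # job wait before rolling up a bucket so late samples are included.
            "date_histogram": {
                "field": "timestamp",
                "fixed_interval": interval,
                "delay": delay,
            },
            # Keep a per-host breakdown but drop per-job detail.
            "terms": {"fields": ["hostname"]},
        },
        "metrics": [
            {"field": "cpu_time", "metrics": ["min", "max", "avg"]},
            {"field": "bytes_read", "metrics": ["sum", "max"]},
            {"field": "bytes_written", "metrics": ["sum", "max"]},
        ],
    }


if __name__ == "__main__":
    body = build_rollup_job("mistral-*", "mistral_rollup")
    # Print the JSON so it can be sent with curl or any HTTP client.
    print(json.dumps(body, indent=2))
```

Rolling up does not delete the raw indices; removing the detailed data once it has been summarised is typically handled separately, for example with an index lifecycle management policy.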
If you would like to see a Mistral demo, or need help tuning and configuring Mistral with Elasticsearch or another database, please get in touch.