The exascale challenge

Data is growing out of control. While the amount of information within our reach overflows, organisations involved in high performance computing (HPC) – or cyber-physical or low-power computing, for that matter – are not growing their data centres sensibly or affordably. As the race to reach exascale ploughs on, organisations are attempting to scale their architecture so rapidly that they cannot create sustainable workflows. What we need are intelligent solutions that manage this growth.

Most organisations in HPC are currently finding that their data clusters are growing much faster than anticipated. Take CERN, for example: the data surrounding the Large Hadron Collider is growing by 50 times a year and is expected to reach 400PB/year by 2023. On top of that, there is such a high churn of people coming and going, accessing the data hourly and daily, that if CERN does not keep its storage architecture fluid it will struggle to cope.
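To put that figure in context, a rough back-of-the-envelope calculation (assuming, unrealistically, that data arrives evenly over the year) turns an annual volume into a sustained ingest rate:

    # Rough arithmetic only: convert an annual data volume into a sustained
    # ingest rate, assuming data arrives evenly all year round.
    petabytes_per_year = 400
    seconds_per_year = 365 * 24 * 60 * 60        # ~31.5 million seconds
    bytes_per_year = petabytes_per_year * 10**15  # decimal petabytes

    rate_gb_per_s = bytes_per_year / seconds_per_year / 10**9
    print(f"~{rate_gb_per_s:.1f} GB/s sustained ingest")  # roughly 12.7 GB/s

In other words, 400PB/year means absorbing well over 12GB every second, around the clock, before a single byte is analysed.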

The bioinformatics industry, including those carrying out essential cancer research, faces the same challenge: mapping genomes takes a phenomenal amount of space, and that demand is only going to grow. Science aside, the list of industries relying on the efficient handling of data is endless: finance, oil and gas, post-production, chip design, weather forecasting and aerospace, to name a few.

More than just HPC

This isn’t just a problem for HPC; countless organisations that don’t identify as HPC also have rapidly growing amounts of data to handle – how will they cope? The internet of things is throwing more and more data into clusters; is this sustainable? At Ellexus, our I/O profiling tools have shown many organisations how untrained users or mistakes in applications can fundamentally slow down the performance of a data cluster. We have yet to come across a company that does not have I/O problems created by inefficient storage architecture.
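A minimal sketch, not Ellexus's tooling, illustrates the kind of mistake that profiling typically uncovers: the same payload written as a million tiny writes versus a handful of large ones. The file path and sizes below are purely illustrative.

    # Illustrative only: write the same 64 MB as one million 64-byte writes
    # versus eight 8 MB writes. Even on a local disk the small-write version
    # is noticeably slower; on shared network storage, where every operation
    # carries metadata and latency costs, the gap is far larger.
    import os, time

    def tiny_writes(path, total=64 * 1024 * 1024, chunk=64):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        payload = b"x" * chunk
        for _ in range(total // chunk):
            os.write(fd, payload)      # one syscall per 64 bytes
        os.close(fd)

    def big_writes(path, total=64 * 1024 * 1024, chunk=8 * 1024 * 1024):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        payload = b"x" * chunk
        for _ in range(total // chunk):
            os.write(fd, payload)      # one syscall per 8 MB
        os.close(fd)

    for fn in (tiny_writes, big_writes):
        start = time.perf_counter()
        fn("/tmp/io_demo.bin")         # hypothetical scratch path
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")

Multiply that inefficiency across thousands of jobs sharing one file system and the whole cluster slows down, regardless of how much hardware sits underneath it.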

A cluster that cannot cope is a difficult beast to handle. Inexperienced users can submit rogue jobs that bring the cluster down completely, halting everyone else's work and wasting millions in lost engineering and research time. Using the wrong type of storage, or assuming the hardware is the problem and spending even more money on bare metal, can be equally wasteful.

There are those who believe that moving storage to the cloud will save IT staffing and infrastructure costs while streamlining their architecture; this is not the case. You will need the same level of infrastructure for a cloud-based system as for an in-house one. You can reduce hardware costs, but only if you run your applications efficiently. You can reduce software costs by using open-source software stacks, but only if they meet your requirements and only if you have the staff and expertise to support them.

Partnership approach

If we are to handle data cluster growth effectively, the entire computing industry needs to come together to create new solutions. Handling growth will take cooperation across every layer of the storage stack, from the hardware through storage software and file systems right up to the software that analyses and monitors data use.

If we don’t come up with sensible, scalable solutions, we are never going to cure cancer or build the next Large Hadron Collider. We have a huge amount of power at our fingertips, which is growing daily, but if our fingers continue to get burnt by our own systems we will never move forward.