The AWS Summit London was an impressive display of cloud technologies. Among the third-party vendor solutions on show, the definite bias was towards monitoring and cost management. Knowing what you are running and what it is costing you is the biggest part of making the cloud a success, and this is as true in the general space as it is in high-performance computing (HPC).
The talk that grabbed our attention was “What would you do with a million cores?”. This was an overview of the latest HPC-specific technologies available on AWS, along with a case study from AstraZeneca on their plans to process 400,000 sequences by the end of 2019.
The first challenge that drove AstraZeneca to the cloud was that fixed on-premises resources could not scale with their changing workload demands. They also had to invest heavily in analytics and orchestration to ensure that they were able to tune their workloads and automatically scale their pipelines.
HPC on AWS is billed as a fundamental rethink of what is possible: instead of worrying about capex, capacity and technology, you can focus on innovation. I would argue that the cloud replaces many of these worries with concerns over opex, scalability and data management, but with the bonus of being able to iterate on and tune your architectures quickly to maximise the return on investment.
Arm has been a growing player in HPC for some time, and with the new EC2 A1 instances you can now try your workloads on Arm processors easily and cheaply. The Arm machines don’t cope very well with large-memory applications, which at first thought could rule out a lot of HPC applications. However, many HPC applications use a lot of memory because storage is slow and expensive and throughput is the main driver. In a world limited by cost and scalability, it may well be more efficient to run at larger scales, making use of the new FSx all-flash Lustre file systems, and trade memory consumption for scale to optimise costs.
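As a rough sketch, trying a workload on an A1 instance is a single AWS CLI call; the AMI ID and key-pair name below are placeholders you would replace with an arm64 AMI and your own key:

```shell
# Launch a single Arm-based a1.large instance for testing a workload.
# ami-0123456789abcdef0 and my-key-pair are placeholder values.
aws ec2 run-instances \
    --instance-type a1.large \
    --image-id ami-0123456789abcdef0 \
    --key-name my-key-pair \
    --count 1
```

Note that the AMI must be built for the arm64 architecture; an x86_64 image will not boot on an A1 instance.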
Although customers are not yet using the new FSx file systems in great numbers, they are likely to be the biggest draw for HPC in the cloud. The ability to quickly stand up a Lustre file system in the cloud that is populated with data on demand from the S3 object store, or via AWS Direct Connect, will make it much easier to migrate applications to the cloud. It will also ease the transition between traditional on-premises software architectures and newer data-flow-centric workflows that are better suited to a cloud environment. The sub-millisecond latencies that AWS is boasting for FSx for Lustre are certainly going to be attractive.
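For illustration, standing up an S3-backed FSx for Lustre file system from the AWS CLI looks roughly like this; the subnet ID and bucket name are placeholders:

```shell
# Create a 1.2 TiB (the minimum capacity) FSx for Lustre file system
# linked to an S3 bucket. File metadata is imported up front and object
# data is pulled in lazily as files are first read.
aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 1200 \
    --subnet-ids subnet-0123456789abcdef0 \
    --lustre-configuration ImportPath=s3://my-hpc-data
```

The lazy-loading behaviour is what makes the on-demand population attractive: compute nodes can start work as soon as the file system is created, rather than waiting for a full data copy.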