In response to an excellent article by Glenn Lockwood from the Lawrence Berkeley National Lab, Ellexus CEO Rosemary Francis contributed an article on POSIX I/O and exascale to The Next Platform.
Read an extract of the article below and find the full piece on The Next Platform.
First published on The Next Platform: When POSIX I/O meets exascale, do the old rules apply?
By Dr. Rosemary Francis
We’ve all grown up in a world of digital filing cabinets. POSIX I/O has enabled code portability and extraordinary advances in computation, but it is limited by its design and the way it mirrors the paper offices that it has replaced.
The POSIX API and its implementations assume that we know roughly where our data is, that accessing it is reasonably quick and that all versions of the data are the same. As we move to exascale, we need to let go of this model and embrace a sea of data and a very different way of handling it.
In computer science, it is often said that there are only two ideas in the discipline: abstraction and caching. While this holds for a lot of sophisticated solutions, those ideas break down very quickly when moving to exascale.
In the Next Platform piece “What’s So Bad about POSIX I/O?”, the author gives his expert take on exactly why POSIX I/O is holding back performance today, in particular how the semantics of the POSIX I/O standard impose a real limitation. To add to those points, it is indeed high time we came up with a new design. POSIX I/O was designed at a time when storage was local. It was simple to implement systems where applications had a consistent view of what was on disk. In a world of distributed systems and exascale compute, we can no longer rely on the assumption that all programs are looking at the same view of the data.
As systems grow, the time taken to sync data increases and long distributed lock times will become the biggest bottlenecks. We need to move to a world where data tiering and access times form part of the system design from the start.
The design of the I/O deserves as much engineering effort as the compute algorithms we are used to profiling. A highly optimised application can be dramatically slowed down by bad I/O design and silly mistakes, such as shared log files or small I/O operations. Often the application is fine, but the environment it is deployed into makes the wrong choices. We’ve seen startup scripts try to open every file in the home directory in order to set a licence option.
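To make the point about small I/O operations concrete, here is a minimal Python sketch, not taken from the original article. It writes the same records twice: once with one write call per record, and once after batching them into larger chunks. The function names (`write_small`, `write_batched`) and the chunk size are assumptions chosen for illustration, and the count is of `os.write` calls made by the script, not a measurement of real file system load.

```python
import os
import tempfile


def _write_all(fd, data):
    """Write all of data to fd, returning how many os.write calls it took."""
    view = memoryview(data)
    calls = 0
    while view:
        written = os.write(fd, view)
        view = view[written:]
        calls += 1
    return calls


def write_small(path, records):
    """Hypothetical 'small I/O' pattern: at least one write call per record."""
    calls = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for rec in records:
            calls += _write_all(fd, rec)
    finally:
        os.close(fd)
    return calls


def write_batched(path, records, chunk_size=1 << 20):
    """Hypothetical batched pattern: collect records, write in large chunks."""
    calls = 0
    buf = bytearray()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for rec in records:
            buf += rec
            if len(buf) >= chunk_size:
                # Copy to bytes so the bytearray can be cleared safely.
                calls += _write_all(fd, bytes(buf))
                buf.clear()
        if buf:  # flush whatever is left over
            calls += _write_all(fd, bytes(buf))
    finally:
        os.close(fd)
    return calls


if __name__ == "__main__":
    records = [b"x" * 64 for _ in range(10_000)]
    with tempfile.TemporaryDirectory() as tmp:
        small = write_small(os.path.join(tmp, "small.dat"), records)
        batched = write_batched(os.path.join(tmp, "batched.dat"), records)
    print(f"write calls, one per record: {small}")
    print(f"write calls, batched:        {batched}")
```

Both files end up with identical contents; only the number of requests differs. On a local disk the difference may be hard to notice, but on a shared parallel file system each request can involve network round trips and locking, which is where patterns like this tend to hurt.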