A set of academic articles recently published on post-exascale supercomputing paints a picture of an HPC landscape that will be fundamentally different from the one we now inhabit. But the writeups avoid one obvious conclusion.
The articles in question were published in Frontiers of Information Technology & Electronic Engineering, which devoted a special issue to the post-exascale topic in October. Specifically, the articles focus on “2020-2030 supercomputing systems that go beyond the existing exascale systems under construction.” The issue was organized by the Chinese Academy of Engineering, and many of the resulting perspective papers – 11 in total – were penned by academics from China’s IT research community.
The most forward-looking paper of the bunch describes how we will reach the zettascale supercomputing milestone using technologies developed during the upcoming exascale era. The 10 authors of the paper, all of whom are affiliated with China’s National University of Defense Technology (NUDT), define a zettascale supercomputer as one that attains a peak of one zettaflops (1,000 exaflops or, if you like, 1,000,000 petaflops) for 64-bit floating point operations. That definition is problematic for reasons we’ll get to in a moment. But for the time being, we’ll stick with that characterization.
In addition to the flops requirement, the authors also specify some other expected metrics for a zettascale system, namely:
Power consumption of 100 MW, thus a power efficiency of 10 teraflops/watt
Peak performance per node of 10 petaflops
Communication bandwidth between nodes of 1.6 terabits/second
I/O bandwidth of 10 to 100 petabytes/second
Storage capacity of one zettabyte
Floor space of 1000 square meters
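The listed targets hang together arithmetically. As a quick back-of-the-envelope check (a sketch; the implied node count is derived here, not stated by the authors):

```python
# Sanity check of the zettascale targets listed above.
PEAK_FLOPS = 1e21     # 1 zettaflops (64-bit)
POWER_W = 100e6       # 100 MW power budget
NODE_FLOPS = 10e15    # 10 petaflops peak per node

efficiency = PEAK_FLOPS / POWER_W    # flops per watt
nodes = PEAK_FLOPS / NODE_FLOPS      # node count implied by the peak targets

print(efficiency / 1e12)   # 10.0 -> matches the 10 teraflops/watt target
print(int(nodes))          # 100000 nodes at 10 petaflops each
```

So the authors' numbers imply a machine of roughly 100,000 nodes squeezed into 1000 square meters.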
Based on the performance curve of the number one supercomputer on the TOP500 list, the authors predict the first zettascale system will come online in 2035. A decade ago, that date looked a lot closer, but the flattening of the supercomputer performance curve in recent years is pushing these milestones out by years.
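The 2035 date can be sanity-checked with a simple growth calculation. As a sketch (the ~200-petaflops 2018 baseline is an assumption for illustration, roughly Summit-class, not a figure from the paper):

```python
# What sustained annual growth in peak performance would carry the
# top of the TOP500 list from ~200 petaflops in 2018 to 1 zettaflops
# by 2035? (Baseline and endpoint are illustrative assumptions.)
start_flops = 200e15    # ~200 petaflops, 2018
target_flops = 1e21     # 1 zettaflops
years = 2035 - 2018

annual_growth = (target_flops / start_flops) ** (1 / years)
print(round(annual_growth, 2))   # ~1.65, i.e. about 65 percent per year
```

A sustained ~65 percent annual improvement is a more modest pace than the list's historical curve, which is consistent with the authors' acknowledgment that the curve has flattened.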
Zettaflops aside, the more critical prediction is that after 2025, CMOS advancements, a la Moore’s Law, will reach their limit. That will mark an inflection point for the rise of new processor technologies that are not dependent on complementary metal-oxide semiconductors. That includes such things as optical computing, quantum computing, and biological computing. The authors believe these technologies will first appear in the role of accelerators or coprocessors.
Prior to the emergence of these more exotic technologies, there will be a concerted move toward heterogeneous computing architectures in general, where CMOS-based dedicated HPC accelerators are paired with general-purpose CPUs. As the authors point out, that trend is already well underway, in the form of GPUs, and, to a lesser extent, FPGAs. And although they don’t suggest that dedicated processors or coprocessors for machine learning and AI will necessarily be a part of this, they do make the case that processor designs in general will be forced to incorporate mixed precision arithmetic to target this application area for HPC work.
Another area that gets the authors’ attention is 3D integrated circuits. With Moore’s Law slowing, and eventually ending, stacking two-dimensional dies on top of one another offers an alternative way to increase transistor density, which simultaneously increases bandwidth, reduces latency, and improves energy efficiency. This is already being done with DRAM components in the form of high bandwidth memory (HBM) and the hybrid memory cube (HMC).
Using the extra dimension also enables processors and memory to be placed in close proximity to one another, which can go a long way toward solving the memory wall problem. Non-volatile memory can be incorporated into these stacks as well, which adds a hierarchical memory/storage capability to these structures. In addition, the authors believe interconnects can be integrated into these 3D devices, which further improves their performance and efficiency.
Memristors get a number of mentions as well, not just as a superior form of non-volatile memory, but as one that enables storage and computing to take place on the same media. The authors predict that this technology will “enter a practical stage” in the post-exascale era, at which time it could potentially replace traditional DRAM.
Zettascale computing will also need to up its game in the interconnect department. For this, the authors point to advancements in silicon photonics and opto-electrical technology more generally. They believe opto-electrical devices based on photonic crystals and carbon nanotubes will emerge during the next decade and form the basis for more scalable and more balanced supercomputers. The authors foresee system interconnect speeds of 400 gigabits/second and chip throughput in the hundreds of terabits/second.
Where the zettascale concept falls apart is the expected use of these machines for applications other than 64-bit numerical computing. In fact, a common theme throughout the special issue is that supercomputers will be splitting their time between traditional simulations and modeling, data analytics, and machine learning/AI.
Only the first of those three relies on 64-bit floating point operations, and even here there is increasing talk of using lesser-precision flops for these workloads. Data analytics and machine learning typically use lower precision floating point arithmetic, and in many cases can get by with fixed point or integer operations. However, they tend to be much more reliant on memory, storage, and interconnect performance than traditional HPC.
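The precision gap is easy to see concretely. As a sketch (not from the papers), round-tripping the same value through the IEEE 32-bit and 16-bit float formats discards significant digits that 64-bit arithmetic retains, a loss that is often tolerable for machine learning but not for traditional simulation:

```python
import math
import struct

# Store the same 64-bit value at lower precisions and read it back.
# struct format 'f' is IEEE binary32; 'e' is IEEE binary16 (half).
x = math.pi

as32 = struct.unpack('f', struct.pack('f', x))[0]   # round-trip via float32
as16 = struct.unpack('e', struct.pack('e', x))[0]   # round-trip via float16

print(x)      # 3.141592653589793  (float64: ~16 significant digits)
print(as32)   # 3.1415927410125732 (float32: ~7 significant digits)
print(as16)   # 3.140625           (float16: ~3 significant digits)
```

A chip counting its peak in 16-bit operations can therefore advertise a far larger "flops" number than the same silicon doing 64-bit work, which is exactly why the peta/exa/zetta labels get slippery.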
The fact that the 200-petaflops Summit supercomputer has already claimed its first “exascale” application using reduced precision math on its GPUs highlights the ambiguity of these peta/exa/zetta labels. For most analytics applications, these metrics are even less applicable. The emergence of quantum, optical, and biological computing would further muddy the waters.
The unstated assumption is that supercomputers will primarily be used to perform numerical simulations and modeling, with data analytics and machine learning as sidelines. Therefore, it could be argued that sticking with the focus on 64-bit flops makes some sense. But while the workload mix in 2018 may be heavily skewed toward simulations, by the time 2035 rolls around, that may no longer be the case.
Machine learning, in particular, could come to dominate most computing domains, including HPC (and even data analytics) over the next decade and a half. Today it is mostly used as an auxiliary step in traditional scientific computing, for both pre-processing and post-processing simulations, but in some cases, like drug discovery, it could conceivably replace simulations altogether. Regardless of how it’s employed, machine learning is reducing the computation that would normally fall to numerical modeling. And given the pace of advancement this field is undergoing, it’s certainly conceivable that by 2035, it will usurp a lot of 64-bit number crunching.
If that comes to pass, not only will the supercomputing hardware of 2035 look very different from that of today, but the workload profiles will be almost unrecognizable. In such a landscape, the zetta designation would make little sense. In fact, in an environment where applications rely on such a wide array of performance characteristics, the milestones themselves may fall by the wayside.