Today's data centers consume between 1% and 3% of all electricity worldwide. Over 80% (Reference 1) of this electricity is currently generated by burning fossil fuels, and electricity generation is the largest source of greenhouse gas emissions worldwide. Even so, data centers continue to expand as new services are constantly offered to consumers and organizations. Advanced computing technologies, including CPU-level generational enhancements, heterogeneous computing, and faster storage and networking, enable more complex analytics and simulations to be brought into mainstream workloads.
As AI moves from a research topic into corporate workflows, several new hardware architectures have appeared to support business goals. These designs accelerate AI training and inference significantly compared with traditional CPU-only approaches. Although such systems can deliver petaflop-range performance for specific applications, they still require a host CPU and generate tremendous amounts of heat.
The search for a COVID-19 vaccine and related research into the global pandemic have brought to the forefront the role that HPC technologies can play in critical health care research. United States National Labs have been involved in several research projects, lending their HPC expertise to finding a cure and understanding mutations. One example is the Lawrence Livermore National Lab (LLNL) Ruby cluster, which is being used for various research challenges. Ruby is being used for unclassified research that includes neutron imaging radiography and fusion research. Other work being done on Ruby is as varied as asteroid detection, moon formation, and high-fidelity fission processes. On the health care front, Ruby is being used to search for therapeutic drugs and designer antibodies to fight SARS-CoV-2. The system comprises Supermicro TwinPro® nodes in 26 racks for a total of 1,512 nodes using 3rd Gen Intel® Xeon® processors with built-in AI acceleration.
While large HPC systems can be built up from individual servers, the power consumption of a sizeable multi-rack installation can be significant. The environmental impact of such large data centers can be quantified by measuring consumption in megawatt-hours and translating it into the amount of CO2 released, based on the generation mix of the power plants supplying the electricity. Data center operators can make several choices to reduce the environmental impact of HPC (or enterprise) data centers.
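The translation from electricity consumed to CO2 released is a simple multiplication by a grid emission factor. The sketch below is a back-of-envelope illustration; the 475 kg CO2/MWh default is an assumed round figure in line with commonly cited global grid averages, and the actual factor varies widely by region and generation mix.

```python
# Back-of-envelope CO2 estimate for a data center's electricity use.
# The default emission factor is an assumed illustrative value; real
# factors depend on the regional grid's generation mix.

def co2_tonnes(energy_mwh: float, kg_co2_per_mwh: float = 475.0) -> float:
    """Convert consumed electricity (MWh) to tonnes of CO2 emitted."""
    return energy_mwh * kg_co2_per_mwh / 1000.0

# Example: a 1 MW facility running for one year (8,760 hours).
annual_mwh = 1.0 * 8760
print(f"{co2_tonnes(annual_mwh):,.0f} tonnes CO2/year")  # ~4,161 tonnes
```

A cleaner grid (a lower emission factor) reduces the result proportionally, which is why the power-purchasing choice discussed next matters so much.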
Renewable power purchasing – Many utility companies let customers choose the source of the electricity they buy. With about 80% of worldwide electricity generated from fossil fuels, this choice can have an enormous impact on the environmental effects of operating large data centers.
Efficient data center organization – Keeping the hot exhaust air from servers away from the cold air drawn over CPUs and other components results in more efficient cooling. The most effective way to keep the cold and hot air separate is to enclose either the hot or the cold aisles with containment solutions. By implementing distinct hot and cold aisles, data centers can raise the temperature of their inlet air and spend less energy on cooling.
Use liquid cooling when appropriate – For systems that run at high utilization and use high-TDP CPUs, liquid cooling may be an option. Because liquid molecules are packed far more densely than gas molecules, a liquid can carry on the order of 1,000 times more heat than air, so this cooling technique can result in cost savings over time due to less power used.
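The efficiency gap can be checked with a back-of-envelope comparison of volumetric heat capacity, i.e., how much heat each coolant carries per unit volume per degree of temperature rise. The property values below are standard room-temperature approximations, and the result supports the order-of-1,000x claim:

```python
# Rough comparison of heat carried per unit volume by water vs. air,
# per degree of temperature rise. Property values are standard
# room-temperature approximations.

air_density = 1.2        # kg/m^3
air_cp = 1005.0          # J/(kg*K), specific heat of air
water_density = 998.0    # kg/m^3
water_cp = 4184.0        # J/(kg*K), specific heat of water

air_vol_heat = air_density * air_cp        # J/(m^3*K)
water_vol_heat = water_density * water_cp  # J/(m^3*K)

ratio = water_vol_heat / air_vol_heat
print(f"Water carries ~{ratio:,.0f}x more heat per unit volume than air")
```

In practice the achievable advantage also depends on flow rates, pumping power, and heat exchanger design, but the raw thermal capacity gap is what makes liquid cooling attractive for dense, high-TDP racks.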
Investigate server power consumption across workload profiles – Although most HPC servers today use the same underlying CPUs and GPUs, differences in mechanical design affect power consumption. Advanced mechanical and thermal modeling techniques can optimize airflow over the components, allowing lower fan speeds while keeping CPUs and GPUs within their thermal limits.
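Lower fan speeds matter more than they might appear to, because by the fan affinity laws a fan's power draw scales roughly with the cube of its rotational speed. A minimal sketch, assuming an idealized fan that follows the cube law exactly:

```python
# Fan affinity law sketch: fan power scales roughly with the cube of
# rotational speed. Even a modest speed reduction from better airflow
# design cuts fan power sharply. Idealized model; real fans deviate.

def fan_power(base_power_w: float, speed_fraction: float) -> float:
    """Estimate fan power at a fraction of full speed (cube law)."""
    return base_power_w * speed_fraction ** 3

# Example: a 100 W fan slowed to 80% of full speed draws ~51 W,
# roughly half the power for a 20% speed reduction.
print(f"{fan_power(100.0, 0.8):.1f} W")
```

This is why airflow optimizations that shave even a few percent off required fan speed compound into meaningful facility-level savings.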
Select the right-sized system – With various form factors to choose from, containing different CPU, GPU, memory, and storage capabilities, choices should be based on workload and time-to-result requirements. Systems that share larger fans typically use less power, as larger fans do not have to work as hard as smaller fans to keep the servers cool.
Choose more efficient CPUs – Following recent announcements from the major CPU suppliers, system vendors offer new servers with the latest CPUs. Measured in performance per watt, these new systems run HPC applications faster due to higher clock rates, more cores, and faster bus speeds, performing more application work per watt of electricity consumed.
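Performance per watt is the useful metric here because a faster CPU that draws somewhat more power can still come out ahead. The figures below are purely hypothetical, chosen only to show the calculation:

```python
# Hypothetical performance-per-watt comparison between a previous- and
# a current-generation server. The GFLOPS and wattage numbers are
# illustrative assumptions, not measured values for any real system.

def perf_per_watt(gflops: float, watts: float) -> float:
    """Work delivered per watt of electricity consumed."""
    return gflops / watts

old_gen = perf_per_watt(gflops=2000.0, watts=500.0)  # 4.0 GFLOPS/W
new_gen = perf_per_watt(gflops=3500.0, watts=600.0)  # ~5.8 GFLOPS/W

improvement = (new_gen / old_gen - 1) * 100
print(f"~{improvement:.0f}% more work per watt")
```

In this illustration the newer system draws 20% more power but delivers 75% more throughput, netting roughly 46% more work per watt, which is the figure that translates directly into energy (and CO2) per job.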
Don't forget associated hardware – Persistent memory can significantly improve performance when used in App Direct mode, which lets developers use it as a cache for data: not as fast as DRAM, but an order of magnitude faster than storage devices.
Supermicro offers a wide range of servers for many applications. In the HPC domain, Supermicro offers the latest 3rd Gen Intel Xeon Scalable processors and 3rd Gen AMD EPYC™ processors in different form factors. With the densest CPU and GPU models on the market, Supermicro can deliver a higher total GHz per square foot of data center space. In addition, the environmental impact of Supermicro servers is reduced due to innovative mechanical designs. Liquid cooling is also an option for data centers that require the densest rack installations, providing cooling for very high-performance configurations.