Introduction

AI training and inference servers use accelerators and processors with high thermal design power (TDP). Air cooling these chips becomes less practical once heat sink dimensions, server airflow, and energy efficiency are considered, forcing a transition to liquid cooling. Liquid-cooled servers offer benefits including improved accelerator reliability and performance, increased energy efficiency, reduced water usage, and reduced sound levels.

There are two main categories of liquid cooling for AI servers: direct-to-chip and immersion. Their heat rejection ecosystems differ slightly, and we will cover those differences. Data center operators and IT managers unfamiliar with deploying liquid-cooled servers will need to answer a few questions:

• How do I get cold water in and hot water out?

• What is a CDU, and do I need one?

• What steps do I take to select an appropriate liquid cooling heat rejection architecture?

A liquid cooling ecosystem has three elements: heat capture within the server, CDU type, and the method of rejecting heat to the outdoors. A CDU (coolant distribution unit) is a system that isolates the IT fluid loop from the rest of the cooling system and is necessary to provide five key functions: temperature control, flow control, pressure control, fluid treatment, and heat exchange and isolation. There are six common liquid cooling architectures, each with advantages, disadvantages, and guidance on when to implement, as shown in Table 1.
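Two of the CDU functions listed above, temperature control and flow control, are tied together by a simple energy balance: the heat carried away equals mass flow times specific heat times the coolant temperature rise. The Python sketch below illustrates that relationship only. It is not taken from the whitepaper; the function name `required_flow_lpm`, the 100 kW load, and the 10 K temperature rise are hypothetical, and the fluid is assumed to be plain water (real IT loops often use a glycol mixture with different properties).

```python
# Rough energy-balance sketch: Q = m_dot * cp * delta_T.
# Assumption: the coolant is plain water near room temperature.
WATER_CP_J_PER_KG_K = 4186.0      # specific heat of water
WATER_DENSITY_KG_PER_M3 = 997.0   # density of water at about 25 C


def required_flow_lpm(heat_load_kw, delta_t_k,
                      cp=WATER_CP_J_PER_KG_K,
                      density=WATER_DENSITY_KG_PER_M3):
    """Return the coolant flow (litres per minute) needed to carry
    heat_load_kw with a supply-to-return temperature rise of delta_t_k."""
    if heat_load_kw < 0:
        raise ValueError("heat load must be non-negative")
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_load_kw * 1000.0 / (cp * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / density
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/min


if __name__ == "__main__":
    # Hypothetical rack: 100 kW of heat, 10 K coolant temperature rise.
    print(f"{required_flow_lpm(100, 10):.1f} L/min")
```

Under these assumptions, a 100 kW load with a 10 K rise needs roughly 144 L/min of water. Halving the allowed temperature rise doubles the required flow, which is one reason the two control functions are usually considered together.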

PRESENTED BY

Intelligent Data Centres Issue 85 | Page 21

NAVIGATING LIQUID COOLING ARCHITECTURES FOR DATA CENTERS WITH AI WORKLOADS:

ENERGY MANAGEMENT RESEARCH CENTER
