Intelligent Data Centres Issue 08 | Page 51

Alibaba unveils congestion control mechanism for ultra-high-speed data centres

Alibaba Group has announced a self-developed data traffic control mechanism named HPCC, standing for High Precision Congestion Control. The technology will provide data transmission with ultra-low latency, high bandwidth and high stability. This is an important milestone towards ultra-high-speed data centres that are vital to unlock the potential of AI and IoT.

Researchers from Alibaba have proven through testbed experiments and large-scale simulations that HPCC reacts faster to available bandwidth and congestion compared with other alternatives, while maintaining close-to-zero queues. In simulations under 50% traffic load, HPCC shortens flow completion times by up to 95%, causing little congestion even under large-scale incasts.

The paper discusses how the performance of modern data centre networks is essential to the quality of cloud services and how faster networking speed can significantly improve the experience of cloud users. Driven by the need for faster networks, the silicon industry has successfully increased the link speed in data centres from 1Gbps to 100Gbps in the past decade, a growth rate that continues to outpace Moore’s Law.

Faster hardware alone is not sufficient to lead to ultra-high-speed networking in data centres. An additional consideration is that a higher link speed can also be harmful to network stability, because congestion is more likely with faster senders that are able to transfer data on the network simultaneously. Congestion is also harmful to the network latency experienced by applications because of queuing delays and potential packet loss.

From years of experience operating large-scale and high-speed RDMA networks, Alibaba reports several inherent limitations in the existing congestion control solutions, including slow convergence, unavoidable packet queueing and complicated parameter tuning. The new HPCC is set to address this gap. The researchers found that the fundamental cause of such limitations in the existing solutions is the lack of fine-grained network load information in legacy networks. However, this status quo has recently altered with the availability of new In-band Network Telemetry (INT) features. In particular, new switching ASICs are able to obtain fine-grained network load information and use this to provide a type of congestion control mechanism that can deliver a steady and ultra-high-speed network.

Alibaba’s researchers have proposed HPCC as it has the potential to leverage INT to obtain link load information and control traffic with high levels of precision. By addressing challenges such as delayed INT information during congestion or overreaction to INT information, HPCC can quickly utilise free bandwidth while avoiding congestion and can maintain near-zero in-network queues for ultra-low latency.

HPCC has the added advantage of needing only three parameters to configure and is also easy to deploy in hardware.
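
To make the idea of telemetry-driven control more concrete, the following is a minimal Python sketch of how a sender might resize its sending window from per-link load reported by INT. It is a simplified illustration loosely modelled on the published HPCC design, not Alibaba's implementation. The record layout (LinkSample), the function names and the tunables (target_util, additive_bytes, max_window_bytes) are all assumptions introduced here, and the tunables are not claimed to be the three parameters the article mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSample:
    """Hypothetical per-hop telemetry record; field names are assumptions."""
    queue_bytes: float   # bytes waiting in the egress queue
    tx_rate_bps: float   # measured output rate of the port, bits per second
    capacity_bps: float  # link speed of the port, bits per second


def link_utilisation(sample: LinkSample, base_rtt_s: float) -> float:
    """Normalised load of one link.

    Adds the queue length, expressed as a fraction of one
    bandwidth-delay product, to the share of capacity in use.
    """
    bdp_bytes = sample.capacity_bps * base_rtt_s / 8
    return sample.queue_bytes / bdp_bytes + sample.tx_rate_bps / sample.capacity_bps


def next_window(window_bytes: float,
                path: List[LinkSample],
                base_rtt_s: float,
                max_window_bytes: float,
                target_util: float = 0.95,
                additive_bytes: float = 1500.0) -> float:
    """Scale the window so the most loaded link moves towards target_util."""
    worst = max(link_utilisation(s, base_rtt_s) for s in path)
    worst = max(worst, 1e-9)  # guard against division by zero on an idle path
    scaled = window_bytes / (worst / target_util) + additive_bytes
    return min(scaled, max_window_bytes)
```

In this sketch a path whose busiest link is below the target load causes the window to grow, a path with a standing queue causes it to shrink in a single step, and the small additive term lets competing flows drift towards an equal share. Keeping the target slightly below full utilisation is what would leave queues close to empty.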