EDITOR’S QUESTION
KEVIN DEIERLING,
VP OF MARKETING,
NVIDIA
Advanced AI applications are becoming commonplace across cloud, enterprise and edge, driving massive compute and data requirements and making data centre resiliency more critical than ever.
Data centre resilience is achieved by
adopting a cloud-native architecture,
where applications are broken down
into small, distributed microservices
which are assembled – or composed
– into scalable applications as needed
and on-demand. Such cloud-native applications are far more resilient than apps developed as giant monolithic code behemoths, because they are designed so that small, cooperating microservices can dynamically come and go. These microservices are implemented within containers, so they are easy to launch or update, and the application can quickly scale across hundreds or even thousands of nodes. Resilience to failure is a huge additional benefit of this cloud-native architecture, because the distributed application is designed to accommodate containers that come and go, whether intentionally or not. So failures of individual containers or entire servers are expected and accommodated by design, and failed microservices are quickly replaced by new containers running on different servers.
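As a rough illustration of this self-healing model, the sketch below declares a small replicated microservice using the official Kubernetes Python client. The service name, image and replica count are hypothetical; the point is the declarative pattern, in which the orchestrator itself restores any replica that disappears.

```python
# Minimal sketch: declaring a replicated microservice so the orchestrator
# replaces failed containers automatically. Assumes the official `kubernetes`
# Python client and an existing cluster; names and images are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig for cluster access

labels = {"app": "recommender"}  # hypothetical microservice name

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="recommender"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # desired state: three copies, spread across nodes
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="recommender",
                        image="example.com/recommender:1.0",  # hypothetical image
                    )
                ]
            ),
        ),
    ),
)

# If a container or its server fails, the control loop notices that fewer
# than three replicas are running and schedules a replacement elsewhere.
client.AppsV1Api().create_namespaced_deployment(
    namespace="default", body=deployment
)
```

Because the desired state is declared rather than scripted, no operator action is needed when a container or server dies; the platform converges back to three healthy copies on its own.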
Accelerated computing using GPUs
and an intelligent network are critical
elements needed to build this resilient,
distributed cloud-native data centre. A
good example is NVIDIA's accelerated computing GPUs for AI applications, which
deliver faster and more efficient natural
language processing, Big Data analytics,
task automation and recommendation
engines for both consumers and IT
staff. GPU-powered AI can recognise
anomalies or problematic trends in power
consumption, storage usage, network
traffic, hardware reliability, or response
time to let data centre professionals
prevent outages or resource shortages.
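As a loose sketch of that idea (not NVIDIA's actual software), the example below fits a stock isolation-forest model to hypothetical per-server telemetry and flags the outliers an operator would want to investigate before they become outages.

```python
# Minimal sketch: flagging anomalous data centre telemetry (power, traffic,
# latency). Uses scikit-learn's IsolationForest purely as an illustration;
# all telemetry values are fabricated for the example.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical per-server samples: [power_watts, net_gbps, response_ms]
normal = rng.normal(loc=[350.0, 40.0, 2.0],
                    scale=[20.0, 5.0, 0.3],
                    size=(500, 3))
spikes = np.array([[520.0, 95.0, 9.5],    # injected power/traffic spike
                   [340.0, 41.0, 25.0]])  # injected latency spike
telemetry = np.vstack([normal, spikes])

model = IsolationForest(contamination=0.01, random_state=0).fit(telemetry)
flags = model.predict(telemetry)  # -1 marks outliers, +1 marks normal points

for i in np.where(flags == -1)[0]:
    print(f"sample {i} looks anomalous: {telemetry[i]}")
```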
It can also recognise and stop security
threats or intrusions more quickly. The
AI acceleration is complemented by the
intelligent NVIDIA networking switches,
SmartNICs and Data Processing Units
(DPUs) from the Mellanox acquisition. The
SmartNICs offload SDN, virtualisation (for
networking containers), data movement
and encryption tasks from the CPUs. This
allows applications to run more quickly
while using fewer CPUs and servers, and
also simplifies connecting new or moved
containers with their microservices.
The DPUs provide security isolation, a distributed software-defined, hardware-accelerated data and control plane, and storage virtualisation to servers and containers, making it faster and easier to spin up or spin down microservices with all the needed security protections and just the right amount of shared storage.
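As a small sketch of that composability, the snippet below requests a fixed slice of shared storage for a microservice through the Kubernetes Python client. The claim name, size and DPU-backed storage class are all hypothetical; the pattern is simply that each microservice declares exactly the storage it needs and the platform provisions it.

```python
# Minimal sketch: requesting "just the right amount" of shared storage for a
# microservice as a PersistentVolumeClaim. Assumes the `kubernetes` Python
# client; the storage class name "dpu-accelerated" is hypothetical.
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="recommender-cache"),  # hypothetical
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],  # shared across containers
        storage_class_name="dpu-accelerated",  # hypothetical DPU-backed class
        resources=client.V1ResourceRequirements(
            requests={"storage": "10Gi"}  # only as much as this service needs
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```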
Additionally, intelligent, open-networking switches provide multiple high-bandwidth paths between servers to avoid bottlenecks or outages due to congestion or broken links. The
switches also provide programmable
fabric automation and smart telemetry
across the network, increasing resiliency
and simplifying the management of
composable microservices. This entire
accelerated AI computing stack and
cloud-native fabric are fully integrated
within a Kubernetes container
orchestration platform that is at the heart
of achieving resilience and scale in next-generation
data centres. ◊