EDITOR’S QUESTION
KEVIN DEIERLING,
VP OF MARKETING,
NVIDIA
Advanced AI applications are becoming commonplace across cloud, enterprise and edge, driving massive compute and data requirements and making data centre resiliency more critical than ever.
Data centre resilience is achieved by
adopting a cloud-native architecture,
where applications are broken down
into small, distributed microservices
which are assembled – or composed
– into scalable applications as needed
and on-demand. Such cloud-native applications are far more resilient than apps developed as giant monolithic code behemoths, because they are designed so that small, cooperating microservices can dynamically come and go. These microservices are implemented within containers, so they are easy to launch or update, and the application can quickly scale across hundreds or even thousands of nodes. Resilience to failure is a huge additional benefit of this cloud-native architecture, because the distributed application is designed to accommodate containers that come and go, whether intentionally or not. So failures of individual containers or entire servers are expected and accommodated by design, and failed microservices are quickly replaced by new containers running on different servers.
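As a rough illustration of this self-healing model, the sketch below declares a small replicated microservice using the official Kubernetes Python client. The service name, image and replica count are hypothetical; the point is the declarative pattern, in which the orchestrator itself restores any replica that disappears.

```python
# Minimal sketch: declaring a replicated microservice so the orchestrator
# replaces failed containers automatically. Assumes the official `kubernetes`
# Python client and an existing cluster; names and images are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig for cluster access

labels = {"app": "recommender"}  # hypothetical microservice name

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="recommender"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # desired state: three copies, spread across nodes
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="recommender",
                        image="example.com/recommender:1.0",  # hypothetical image
                    )
                ]
            ),
        ),
    ),
)

# If a container or its server fails, the control loop notices that fewer
# than three replicas are running and schedules a replacement elsewhere.
client.AppsV1Api().create_namespaced_deployment(
    namespace="default", body=deployment
)
```

Because the desired state is declared rather than scripted, no operator action is needed when a container or server dies; the platform converges back to three healthy copies on its own.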
Accelerated computing using GPUs
and an intelligent network are critical
elements needed to build this resilient,
distributed cloud-native data centre. A
good example is NVIDIA's accelerated computing GPUs for AI applications, which
deliver faster and more efficient natural
language processing, Big Data analytics,
task automation and recommendation
engines for both consumers and IT
staff. GPU-powered AI can recognise
anomalies or problematic trends in power
consumption, storage usage, network
traffic, hardware reliability, or response
time to let data centre professionals
prevent outages or resource shortages.
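As a loose sketch of that idea (not NVIDIA's actual software), the example below fits a stock isolation-forest model to hypothetical per-server telemetry and flags the outliers an operator would want to investigate before they become outages.

```python
# Minimal sketch: flagging anomalous data centre telemetry (power, traffic,
# latency). Uses scikit-learn's IsolationForest purely as an illustration;
# all telemetry values are fabricated for the example.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical per-server samples: [power_watts, net_gbps, response_ms]
normal = rng.normal(loc=[350.0, 40.0, 2.0],
                    scale=[20.0, 5.0, 0.3],
                    size=(500, 3))
spikes = np.array([[520.0, 95.0, 9.5],    # injected power/traffic spike
                   [340.0, 41.0, 25.0]])  # injected latency spike
telemetry = np.vstack([normal, spikes])

model = IsolationForest(contamination=0.01, random_state=0).fit(telemetry)
flags = model.predict(telemetry)  # -1 marks outliers, +1 marks normal points

for i in np.where(flags == -1)[0]:
    print(f"sample {i} looks anomalous: {telemetry[i]}")
```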
It can also recognise and stop security
threats or intrusions more quickly. The
AI acceleration is complemented by the
intelligent NVIDIA networking switches,
SmartNICs and Data Processing Units
(DPUs) from the Mellanox acquisition. The
SmartNICs offload SDN, virtualisation (for
networking containers), data movement
and encryption tasks from the CPUs. This
allows applications to run more quickly
while using fewer CPUs and servers, and
also simplifies connecting new or moved
containers with their microservices.
The DPUs provide security isolation, a distributed software-defined, hardware-accelerated data and control plane, and storage virtualisation to servers and containers, making it faster and easier to spin up or spin down microservices with all the needed security protections and just the right amount of shared storage.
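As a small sketch of that composability, the snippet below requests a fixed slice of shared storage for a microservice through the Kubernetes Python client. The claim name, size and DPU-backed storage class are all hypothetical; the pattern is simply that each microservice declares exactly the storage it needs and the platform provisions it.

```python
# Minimal sketch: requesting "just the right amount" of shared storage for a
# microservice as a PersistentVolumeClaim. Assumes the `kubernetes` Python
# client; the storage class name "dpu-accelerated" is hypothetical.
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="recommender-cache"),  # hypothetical
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],  # shared across containers
        storage_class_name="dpu-accelerated",  # hypothetical DPU-backed class
        resources=client.V1ResourceRequirements(
            requests={"storage": "10Gi"}  # only as much as this service needs
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```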
Additionally, intelligent, open-networking switches provide multiple high-bandwidth paths between servers to avoid bottlenecks or outages due to congestion or broken links. The
switches also provide programmable
fabric automation and smart telemetry
across the network, increasing resiliency
and simplifying the management of
composable microservices. This entire
accelerated AI computing stack and
cloud-native fabric are fully integrated
within a Kubernetes container
orchestration platform that is at the heart
of achieving resilience and scale in next-generation
data centres. ◊