FEATURE
James Petter, VP EMEA, Pure Storage

But acknowledging the importance of data and putting data to work are two separate things. To put the latter in perspective, a recent study conducted by Baidu showed its dataset needed to increase by a factor of 10 million in order to lower its language model's error rate from 4.5% to 3.4%. That's 10,000,000x more data for little more than one percentage point of progress.

All this research points to one thing – to innovate and survive in a business environment that is increasingly data-driven, organisations must design their IT infrastructure with data in mind and have complete, real-time access to that data.

Unfortunately, mainstream storage solutions were designed for the world of disk and have historically helped create silos of data. There are four classes of silos in the world of modern analytics – data warehouse, data lake, streaming analytics and AI clusters. A data warehouse requires massive throughput. Data lakes deliver scale-out architecture for storage. Streaming analytics goes beyond batched jobs in a data lake, requiring storage to deliver multi-dimensional performance regardless of data size (small or large) or I/O type (random or sequential). Finally, AI clusters, powered by tens of thousands of GPU cores, require storage to also be massively parallel, servicing thousands of clients and billions of objects without data bottlenecks.

MODERN INTELLIGENCE REQUIRES A DATA HUB – AN ARCHITECTURE DESIGNED NOT ONLY TO STORE DATA, BUT TO UNIFY, SHARE AND DELIVER DATA.

Issue 04
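As a back-of-the-envelope check on the Baidu figures quoted earlier, one can fit a power law to the two data points. The power-law form is an illustrative assumption on our part, not something the article claims the study reports:

```python
import math

# Illustrative assumption: error rate falls as a power law, err ∝ N^(-beta),
# in dataset size N. The figures quoted: error 4.5% -> 3.4% after the
# dataset grew by a factor of 10,000,000.
err_before, err_after = 4.5, 3.4
data_factor = 10_000_000

# Solve err_before / err_after = data_factor ** beta for the exponent.
beta = math.log(err_before / err_after) / math.log(data_factor)
print(round(beta, 4))  # ≈ 0.0174 – each 10x of data trims only ~4% off the error
```

At that exponent, halving the error again would require roughly 2**(1/beta), on the order of 10**17 times more data – which is the article's point about how data-hungry modern AI is.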
As a consequence, too much data today remains stuck in a complex sprawl of silos. Each silo is useful for its original task, but in a data-first world, silos are counterproductive: organisational data cannot do work for the business unless it is being actively managed.
Modern intelligence requires a data hub –
an architecture designed not only to store
data, but to unify, share and deliver data.
Unifying and sharing data means that the
same data can be accessed by multiple
applications at the same time with full
data integrity. Delivering data means each
application has the full performance of
data access that it requires, at the speed of
today’s business.
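What "unify and share with full integrity" means can be sketched in miniature – several applications (modelled here as threads) reading one shared dataset at the same time and verifying they all see identical bytes. The local temp file is a stand-in for illustration; a real data hub would serve this over file and object protocols:

```python
import hashlib
import os
import tempfile
import threading

# Stand-in shared dataset: 1 MiB of random bytes written to a temp file.
data = os.urandom(1 << 20)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

expected = hashlib.sha256(data).hexdigest()
results, lock = [], threading.Lock()

def application(app_id: int) -> None:
    # Every "application" reads the same data at the same time...
    with open(path, "rb") as shared:
        digest = hashlib.sha256(shared.read()).hexdigest()
    with lock:  # ...and records what it saw.
        results.append(digest)

threads = [threading.Thread(target=application, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Full integrity: all eight concurrent readers saw byte-identical data.
print(all(d == expected for d in results), len(results))
os.unlink(path)
```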
A data hub is a data-centric architecture for storage that powers data analytics and AI.
Its architecture is built on four
foundational elements:
• High throughput for both file and object storage. Backup and data warehouse appliances require massive throughput for mainstream, file-based workloads and cloud-native, object-based applications.
• True scale-out design. The power of a data lake is its native, scale-out architecture, which allows batch jobs to scale limitlessly as software – not the user – manages resiliency and performance.
• Multi-dimensional performance. Data is unpredictable and can arrive at any speed – therefore, organisations need a platform that can process any data type with any access pattern.
• Massively parallel. Within the computing industry there has been a drastic shift from serial to parallel technologies built to mimic the human brain, and storage must keep pace.
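The "massively parallel" requirement above can be sketched in miniature – many client requests fanned out across workers instead of queued serially. The in-memory object store below is an assumption for illustration, not a real storage API:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Stand-in object store: a thousand small objects, each fetched independently.
store = {f"obj-{i}": f"payload-{i}".encode() for i in range(1000)}

def get(key: str) -> str:
    """Simulated object GET: return a digest of the object's bytes."""
    return hashlib.sha256(store[key]).hexdigest()

# Fan the requests out across many workers rather than serving them one by
# one – the essence of a massively parallel storage path.
with ThreadPoolExecutor(max_workers=64) as pool:
    digests = list(pool.map(get, store))

print(len(digests))  # every client serviced, none queued behind the rest
```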
A true data hub must have all four
qualities as all are essential to unifying
www.intelligentdatacentres.com