Intelligent Data Centres Issue 04 | Page 36

FEATURE

James Petter, VP EMEA, Pure Storage

But acknowledging the importance of data and putting data to work are two separate things. To put the latter in perspective, a recent study conducted by Baidu showed its dataset needed to increase by a factor of 10 million in order to lower its language model's error rate from 4.5% to 3.4%. That's 10,000,000x more data for roughly one percentage point of progress.

All this research points to one thing: to innovate and survive in a business environment that is increasingly data-driven, organisations must design their IT infrastructure with data in mind and have complete, real-time access to that data.

Unfortunately, mainstream storage solutions were designed for the world of disk and have historically helped create silos of data. There are four classes of silos in the world of modern analytics: data warehouse, data lake, streaming analytics and AI clusters. A data warehouse requires massive throughput. Data lakes deliver scale-out architecture for storage. Streaming analytics go beyond batched jobs in a data lake, requiring storage to deliver multi-dimensional performance regardless of data size (small or large) or I/O type (random or sequential). Finally, AI clusters, powered by tens of thousands of GPU cores, require storage to be massively parallel, servicing thousands of clients and billions of objects without data bottlenecks.

As a consequence, too much data today remains stuck in a complex sprawl of silos. Each is useful for its original task, but in a data-first world, silos are counter-productive: they mean organisational data cannot do work for the business unless it is being actively managed. Modern intelligence requires a data hub – an architecture designed not only to store data, but to unify, share and deliver data.
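The scale of the Baidu result can be made concrete. As an illustrative sketch (not from the article), assume model error follows a power law in dataset size, error ∝ N^(−α); the quoted figures then imply a remarkably small exponent:

```python
import math

# Figures quoted in the article: error fell from 4.5% to 3.4%
# after the dataset grew by a factor of 10,000,000.
error_before = 4.5
error_after = 3.4
data_factor = 10_000_000

# Illustrative assumption (not stated in the article): a power law,
# error ∝ N^(-alpha), so error_before / error_after = data_factor ** alpha.
alpha = math.log(error_before / error_after) / math.log(data_factor)
print(f"implied scaling exponent alpha ≈ {alpha:.4f}")  # ≈ 0.017

# A tiny exponent: each 10x increase in data cuts the error by only
# about 4% in relative terms -- hence the enormous appetite for data.
factor_per_decade = 10 ** -alpha
print(f"error multiplier per 10x data ≈ {factor_per_decade:.3f}")
```

Under that assumed power law, seven successive tenfold increases in data are needed to reach the quoted improvement, which is exactly the article's point about how data-hungry modern AI is.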
Unifying and sharing data means that the same data can be accessed by multiple applications at the same time, with full data integrity. Delivering data means each application has the full performance of data access that it requires, at the speed of today's business.

A data hub is a data-centric storage architecture that powers data analytics and AI. It is built on four foundational elements:

• High throughput for both file and object storage. Backup and data warehouse appliances require massive throughput for mainstream, file-based workloads and cloud-native, object-based applications.

• True scale-out design. The power of a data lake is its native, scale-out architecture, which allows batch jobs to scale limitlessly as software – not the user – manages resiliency and performance.

• Multi-dimensional performance. Data is unpredictable and can arrive at any speed; organisations therefore need a platform that can process any data type with any access pattern.

• Massively parallel architecture. The computing industry has seen a drastic shift from serial to parallel technologies built to mimic the human brain, and storage must keep pace.

A true data hub must have all four qualities, as all are essential to unifying, sharing and delivering data.
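The "multi-dimensional performance" requirement – serving random and sequential I/O equally well – can be illustrated with a minimal sketch (our own, not from the article) that reads the same file with both access patterns:

```python
import os
import random
import tempfile
import time

# Illustrative sketch: time sequential vs random block reads of one file,
# the two access patterns a data hub is expected to serve equally well.
BLOCK = 4096
BLOCKS = 2048  # 8 MiB test file

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(BLOCK * BLOCKS))
    path = f.name

def read_blocks(offsets):
    """Read one BLOCK at each offset; return total bytes read."""
    total = 0
    with open(path, "rb") as fh:
        for off in offsets:
            fh.seek(off)
            total += len(fh.read(BLOCK))
    return total

sequential = [i * BLOCK for i in range(BLOCKS)]
shuffled = sequential[:]
random.shuffle(shuffled)

for name, offsets in [("sequential", sequential), ("random", shuffled)]:
    start = time.perf_counter()
    nbytes = read_blocks(offsets)
    elapsed = time.perf_counter() - start
    print(f"{name:10s}: {nbytes} bytes in {elapsed * 1000:.1f} ms")

os.remove(path)
```

On disk-era systems the random pattern is dramatically slower; the article's argument is that a data hub should narrow that gap so workloads need not be siloed by access pattern.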