interruption, and long-term goals stay on track. The same scenario now applies across AI development and integration projects, but on a much broader scale.
Durability also helps explain why so few AI projects reach production. Given that under a third of enterprises have integrated data silos well enough to support Gen AI, it's hardly surprising that only 48% of AI projects ever make it into production and that 65% of Chief Data Officers say this year's AI goals are unachievable, with almost all (98%) reporting major data-quality incidents. Failures of storage reliability, data quality and resilience are at the heart of this shortfall.
So where does this leave organisations that have staked so much on the success of their AI strategies?
At a foundational level, delivering AI tools that work as intended means meeting storage performance and reliability requirements that go beyond traditional levels. Hybrid systems that combine SSD speed with HDD capacity, and all-flash systems for latency-sensitive workloads, both have a role to play. What matters is not the choice of medium alone, but the measures taken to ensure durability: continuous monitoring, automated integrity checks and regular recovery testing.
Success also depends on embedding resilience into the core of AI operations. At a technology level, Multi-Level Erasure Coding (MLEC) provides greater fault tolerance than traditional RAID by protecting against multiple simultaneous failures. For those handling petabyte-scale datasets, combining MLEC with a hybrid architecture can provide an optimal balance. Where real-time access is critical, all-flash systems deliver the lowest latency, albeit at a higher cost.
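To see why, consider a simplified model, sketched in Python below. The figures are illustrative assumptions only: a 1% chance that a drive fails within a rebuild window, a 10+2 stripe of drives within each node and an 8+2 stripe across nodes. None of these reflect any specific vendor's parameters.

    from math import comb

    def p_exceeds(n, m, p):
        # Probability that more than m of n independent components fail,
        # each with failure probability p (simple binomial model).
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(m + 1, n + 1))

    p_drive = 0.01  # assumed per-drive failure probability (illustrative)

    # Single-level RAID-6-style stripe: 10 data + 2 parity drives;
    # data is lost if more than two drives fail together.
    single_level = p_exceeds(12, 2, p_drive)

    # Two-level (MLEC-style) coding: 10+2 across drives inside a node,
    # then 8+2 across nodes. A node is unrecoverable only if its local
    # stripe fails; data is lost only if more than two nodes fail at once.
    p_node = p_exceeds(12, 2, p_drive)
    two_level = p_exceeds(10, 2, p_node)

    print(f"single-level loss probability: {single_level:.1e}")  # ~2e-04
    print(f"two-level loss probability:   {two_level:.1e}")      # ~1e-09

Even on this crude model, layering a second parity level cuts the probability of loss by roughly five orders of magnitude, because failures must overwhelm both levels simultaneously.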
Operational measures are equally important. Automated data integrity checks can detect and isolate corruption before it enters the training pipeline. Regularly scheduled recovery drills, designed to simulate realistic fault conditions, ensure that restoration processes can be executed within the tight timeframes AI production demands. By aligning these measures with data governance and compliance frameworks, organisations can minimise both technical risk and regulatory exposure.
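As a minimal sketch of what such a check might look like in practice (the manifest format and file paths here are hypothetical, not a description of any particular product), files can be fingerprinted at ingest and re-verified before each training run:

    import hashlib
    import json
    import pathlib

    def sha256_of(path, chunk=1 << 20):
        # Stream the file in 1 MB chunks so large datasets
        # never need to fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.hexdigest()

    def verify_manifest(manifest_path):
        # The manifest maps file paths to the digests recorded at ingest;
        # any mismatch indicates silent corruption since then.
        manifest = json.loads(pathlib.Path(manifest_path).read_text())
        return [p for p, digest in manifest.items() if sha256_of(p) != digest]

    # Run before the training job starts and quarantine anything that fails:
    corrupt = verify_manifest("dataset_manifest.json")
    if corrupt:
        raise RuntimeError(f"{len(corrupt)} corrupt files, e.g. {corrupt[:3]}")

Scheduled alongside recovery drills, a check like this turns 'detect and isolate corruption' from a policy statement into an automated gate on the pipeline.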
Looking ahead, AI workloads will continue to scale, and so will the storage systems that support them. Ideally, architectures should be modular, enabling capacity or performance components to be added without wholesale replacement. Here, vendor-neutral solutions help to avoid lock-in, ensuring that infrastructure can adapt to new technologies such as higher-density storage media or more advanced fault-tolerance schemes.
Ken Claffey, CEO, VDURA
To minimise risk, scalability should always be planned with an eye on both data growth and workload evolution. This includes anticipating the arrival of more complex AI models and use cases, both of which may change performance priorities.
Without the right technologies in place, however, we'll undoubtedly see more headlines around the failure of AI investments to deliver.
Get it right and organisations can look forward to a win-win scenario where storage performance and reliability support our increasing reliance on AI.