In an ever-evolving insurance landscape where risk is becoming more complex and interconnected, risk analytics plays, and will continue to play, a critical role in creating and maintaining a competitive edge.

As a result, (re)insurers and brokers need to enhance their analytical capabilities by incorporating the latest science and analytical tools into their risk management workflows. To develop this next generation of risk analytics, a firm’s ability to store and access large volumes of proprietary data becomes critical.

Across many industries, organizations have started exploring data lake technologies to enhance or replace their on-premises data warehouse-based infrastructure and tools.

Data lakes help firms simplify analytics by bringing together large, distinct sources of data under a single storage architecture to extract new insights from structured (e.g., relational databases), semi-structured (e.g., claims forms), and unstructured (e.g., aerial photos) data. Data lakes also decouple storage from compute, so users and administrators can scale the underlying resources up or down based on demand.

However, studies from firms such as Gartner and McKinsey suggest that approximately 80% of data lake initiatives fail to deliver their promised value. Across industries, organizations have spent millions of dollars and multiple years implementing a data lake, only to abandon the project or struggle to make it fully operational.

Similar cases exist in the catastrophe risk modeling domain. One large broker spent 30 months integrating a data lake into its risk modeling workflows. Twelve months after the project concluded, its catastrophe modelers still had not been able to use the data lake effectively.

Reasons why data lake implementations fail

Let’s examine three of the most commonly cited reasons why data lake projects go wrong and fail to deliver value:

‘Data Swamp’ Syndrome: Data lake technology is very flexible in terms of the data formats and schemas it supports. While traditional data warehouses require data to be structured in relational databases, data lakes allow users to bring data in multiple formats and schemas.

This flexibility brings benefits but also data management challenges. What starts as an organized repository can degrade into an unnavigable swamp of poorly documented, inconsistently formatted data. Users lose trust in the data and struggle to locate the right data for their analyses. Without a single source of truth, different teams use different data sources to answer similar or related questions and end up reaching very different conclusions.

‘Technology-First’ Trap: Rather than addressing existing business challenges, many data lake projects are launched as IT-led technology initiatives that then go looking for problems to solve. These projects focus heavily on technology choices and architecture before the business use cases and data requirements have been identified and fully understood. Because most initiatives are company-wide, they are often misaligned with the needs of individual business units, resulting in suboptimal solutions.

‘Skills and Change Management’ Misalignments: Because many organizations treat data lake implementation as a technology initiative, they aspire to build a single data lake that can serve multiple disciplines within the organization.

Data lakes implemented this way can end up serving just a small portion of the workforce. Most users struggle to transfer their existing skills and workflows to the data lake. Business leaders can’t interact with the technology to take advantage of the insights derived from it. Users must switch between technologies and tools to fully leverage the data lake.

Moody’s Risk Data Lake on the Moody’s Intelligent Risk Platform™ is an applied data lake, purpose-built for risk management analytics and workflows and designed to address these challenges. Risk Data Lake is built not only on a superior technology stack but also on a superior data and analytics strategy, backed by governance and organizational alignment.

Rather than reinventing data lake technology, Risk Data Lake builds upon it. It takes the proven strengths of data lake architectures and augments them to serve the unique needs of risk management workflows, while addressing the common pitfalls that have caused many data lake projects to falter.

Moody’s Risk Data Lake: Ensuring data lake success

Let’s examine how the capabilities integrated within Risk Data Lake address these challenges and substantially enhance the likelihood of a successful deployment across the entire risk management user base.

Risk Data Catalog

Moody’s Risk Data Lake is integrated with Moody’s Intelligent Risk Platform (IRP) out of the box. All data stored in the unified data store within the IRP, including exposure data imported by users, model results, accumulation results, etc., is made available in the Risk Data Lake.

To prevent ‘data swamp’ syndrome, all data is systematically cataloged for efficient search and retrieval and presented as the Risk Data Catalog. The catalog also contains all the metadata from the IRP, linking exposure and model analyses so that users can incorporate the appropriate exposure and/or reference data while analyzing model results.
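To make the idea of a catalog that links model analyses back to their exposure data more concrete, here is a minimal Python sketch. It is an illustration only and does not represent the Risk Data Catalog API: the class names, field names, tags, and dataset names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One cataloged dataset plus the metadata used to find and link it."""
    name: str                               # hypothetical dataset name
    kind: str                               # "exposure" or "model_result"
    tags: frozenset = frozenset()           # free-form search tags (assumed)
    source_exposure: Optional[str] = None   # for results: exposure set analyzed


class RiskCatalogSketch:
    """Toy in-memory catalog; not the real product interface."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse duplicates so every name resolves to exactly one dataset.
        if entry.name in self._entries:
            raise ValueError(f"duplicate catalog entry: {entry.name}")
        self._entries[entry.name] = entry

    def search(self, kind=None, tag=None):
        # Return the names of entries matching the optional filters.
        return sorted(
            e.name
            for e in self._entries.values()
            if (kind is None or e.kind == kind)
            and (tag is None or tag in e.tags)
        )

    def exposure_for(self, result_name):
        # Follow the metadata link from a model result to its exposure data.
        result = self._entries[result_name]
        if result.source_exposure is None:
            raise LookupError(f"{result_name} has no linked exposure data")
        return self._entries[result.source_exposure]


catalog = RiskCatalogSketch()
catalog.register(
    CatalogEntry("fl_portfolio_2024", "exposure", frozenset({"florida"}))
)
catalog.register(
    CatalogEntry(
        "fl_wind_run_01",
        "model_result",
        frozenset({"florida", "windstorm"}),
        source_exposure="fl_portfolio_2024",
    )
)
```

With entries registered this way, `catalog.search(tag="florida")` finds both datasets, and `catalog.exposure_for("fl_wind_run_01")` returns the exposure entry the analysis was run against. The point of the sketch is that the link lives in the metadata, so users do not have to guess which exposure set belongs to which result.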

Risk Data Catalog comes in two flavors. For technical users familiar with Moody’s EDM (Exposure Data Module) and RDM (Results Data Module) schemas, the Risk Data Catalog presents data from the IRP in schemas aligned with EDM and RDM.

All the exposure data from the IRP is cataloged in tables such as portinfo, accgrp, policy, property, loccvg, etc. Similarly, model result data is cataloged in tables such as portep, policystd, locstats, and so on.
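As a rough illustration of why cataloging exposure and result tables side by side is useful, the sketch below joins a location-level result table to an exposure table using Python’s built-in sqlite3 module. Only the table names property and locstats come from the text above. The column names (locid, city, anlsid, loss), the sample rows, and the use of SQLite are assumptions made for this example and should not be read as the actual EDM or RDM schema or as how the Risk Data Lake is queried.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tiny, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Exposure side: one row per location (columns are hypothetical).
        CREATE TABLE property (locid INTEGER PRIMARY KEY, city TEXT);
        -- Result side: one row per location per analysis (hypothetical).
        CREATE TABLE locstats (locid INTEGER, anlsid INTEGER, loss REAL);

        INSERT INTO property VALUES (1, 'Miami'), (2, 'Tampa'), (3, 'Miami');
        INSERT INTO locstats VALUES
            (1, 10, 1000.0), (2, 10, 250.0), (3, 10, 500.0);
        """
    )
    return conn


def loss_by_city(conn, analysis_id):
    """Sum modeled loss per city for one analysis by joining on locid."""
    rows = conn.execute(
        """
        SELECT p.city, SUM(s.loss)
        FROM locstats AS s
        JOIN property AS p ON p.locid = s.locid
        WHERE s.anlsid = ?
        GROUP BY p.city
        ORDER BY p.city
        """,
        (analysis_id,),
    ).fetchall()
    return dict(rows)


conn = build_demo_db()
totals = loss_by_city(conn, 10)  # {'Miami': 1500.0, 'Tampa': 250.0}
```

Because both tables share a location key in this toy setup, a single join is enough to roll results up by an exposure attribute. An analysis ID with no rows simply yields an empty dictionary.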