Insurance

The majority of data lake projects fail to deliver their promised value: Risk Data Lake is designed to change that

Kirtan Dave

Senior Director, Product Management

In an ever-evolving insurance landscape where risks grow more complicated and interconnected, risk analytics plays, and will continue to play, a critical role in creating and maintaining a competitive edge.

As a result, (re)insurers and brokers need to enhance their analytical capabilities by incorporating the latest science and analytical tools into their risk management workflows. To develop this next generation of risk analytics, a firm's ability to store and access large volumes of proprietary data becomes critical.

Across many industries, organizations have started exploring data lake technologies to enhance or replace their on-premises data warehouse-based infrastructure and tools.

Data lakes help firms simplify analytics by bringing together large, distinct sources of data under a single storage architecture to extract new insights from structured (relational databases), semi-structured (e.g., claims forms), and unstructured (e.g., aerial photos) data. Data lakes decouple storage and compute, providing scalability for users and administrators to increase or decrease the underlying resource allocation based on demand.

However, multiple studies and research from firms such as Gartner and McKinsey suggest that approximately 80% of data lake initiatives fail to deliver their promised value. There are numerous cases from different industries where organizations spent millions of dollars and multiple years implementing a data lake, but either abandoned the project or struggled to make it fully operational.

Similar cases exist in the catastrophe risk modeling domain. A large broker spent 30 months implementing a data lake into their risk modeling workflows. Twelve months after the project concluded, their catastrophe modelers had not been able to utilize the data lake effectively.

 

Reasons why data lake implementations fail

Let’s examine three of the most widely quoted and common reasons why data lake projects go wrong and fail to deliver value:

‘Data Swamp’ Syndrome: Data lake technology is very flexible in terms of the data formats and schemas it supports. While traditional data warehouses require data to be structured in relational databases, data lakes allow users to bring data in multiple formats and schemas.

This flexibility provides both benefits and data management challenges. What starts as an organized repository degrades into an unnavigable swamp of poorly documented, inconsistently formatted data. Users lose trust in the data and struggle to locate the appropriate data for the analyses they are conducting. Without a single source of truth, various teams use different data sources when answering similar or related questions, and end up reaching very different conclusions.

‘Technology-First’ Trap: Rather than addressing existing business challenges, many data lake projects are led by IT teams as technology initiatives that go looking for problems to solve. These projects focus heavily on technology choices and architecture before identifying and fully understanding business use cases and data requirements. Because most initiatives are company-wide, they create misalignment with business needs, resulting in suboptimal solutions.

‘Skills and Change Management’ Misalignments: With many organizations treating data lake implementation as a technology initiative, they aspire to build a data lake that can serve multiple disciplines within the organization.

Data lakes implemented using this approach often end up serving just a small portion of the workforce. Most users struggle to transfer their existing skills and workflows to the data lake, business leaders can't interact with the technology to take advantage of the insights it produces, and users must switch between technologies and tools to fully leverage the data lake.

Moody’s Risk Data Lake on the Moody’s Intelligent Risk Platform™ is an applied data lake purpose-built for risk management analytics and workflows, designed to address these challenges. Risk Data Lake isn’t just built on a superior technology stack but also on a superior data and analytics strategy, with governance and organizational alignment.

Rather than reinventing data lake technology, Risk Data Lake builds upon it. By leveraging the existing strengths of data lake architectures, Risk Data Lake enhances core capabilities while addressing common pitfalls that have caused many data lake projects to falter. Instead of creating an entirely new technology, Risk Data Lake integrates proven solutions and augments them to serve the unique needs of risk management workflows.

 

Moody's Risk Data Lake: Ensuring data lake success

Let’s examine how the capabilities integrated within Risk Data Lake address these challenges and substantially enhance the likelihood of a successful deployment across the entire risk management user base.

 

Risk Data Catalog

Moody’s Risk Data Lake comes with built-in integration with Moody’s Intelligent Risk Platform (IRP). All data stored in the unified data store within IRP, including exposure data imported by users, model results, accumulation results, etc., is made available in the Risk Data Lake.

To prevent ‘data swamp’ syndrome, all data is cataloged systematically to facilitate efficient search and retrieval, presented in the form of a Risk Data Catalog. The catalog also contains all the metadata from the IRP, linking exposure and model analyses so that users can incorporate appropriate exposure and/or reference data while analyzing model results.

Risk Data Catalog comes in two flavors. For technical users familiar with Moody’s EDM (Exposure Data Module) and RDM (Results Data Module) schemas, the Risk Data Catalog presents data from the IRP in schemas aligned with EDM and RDM.

All the exposure data from IRP is cataloged in tables such as portinfo, accgrp, policy, property, loccvg, etc. Similarly, model result data is cataloged in tables such as portep, policystd, locstats, and so on.  
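To make the schema concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for the catalog. The `property` and `loccvg` table names echo the EDM-style schema above, but the columns and data are simplified illustrations, not the actual catalog layout.

```python
import sqlite3

# Hypothetical, simplified stand-ins for EDM-style exposure tables.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE property (locid INTEGER PRIMARY KEY, accgrpid INTEGER, county TEXT)")
cur.execute("CREATE TABLE loccvg (locid INTEGER, valueamt REAL)")  # one row per coverage

cur.executemany("INSERT INTO property VALUES (?, ?, ?)",
                [(1, 10, "Miami-Dade"), (2, 10, "Broward")])
cur.executemany("INSERT INTO loccvg VALUES (?, ?)",
                [(1, 5_000_000.0), (1, 1_000_000.0), (2, 2_500_000.0)])

# Join exposure tables to compute total insured value (TIV) per location.
rows = cur.execute("""
    SELECT p.locid, p.county, SUM(c.valueamt) AS tiv
    FROM property p JOIN loccvg c ON c.locid = p.locid
    GROUP BY p.locid, p.county
    ORDER BY p.locid
""").fetchall()
print(rows)  # [(1, 'Miami-Dade', 6000000.0), (2, 'Broward', 2500000.0)]
```

Technical users familiar with EDM and RDM conventions can write joins of this shape directly against the catalog, without first learning an unfamiliar schema.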


Figure 1: Risk Data Catalog functionality within Moody's Risk Data Lake

While technical users will want to write their own queries, joining different data tables such as model results and corresponding exposure to perform analytics, other users might prefer building dashboards using drag-and-drop workflows.

For users who do not want to worry about joining different data tables, making sure appropriate keys have been utilized, or whether joins are efficient and performant, Risk Data Catalog provides a collection of datasets ready to be visualized.

These datasets present the data as logical entities and perform different table joins behind the scenes. For example, a user might want to analyze location AAL (average annual loss) alongside exposure details. The ‘Location AAL’ dataset provides both the model results (AAL) and the corresponding exposure details, including TIV. Users will be able to select the dataset and quickly develop dashboards and visuals.
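The ‘joins behind the scenes’ idea can be sketched as a SQL view, again using an in-memory SQLite database as a hypothetical stand-in. The `locstats` and `loccvg` names echo the schemas mentioned earlier, while the columns and figures are invented for illustration.

```python
import sqlite3

# Simplified stand-ins: exposure coverage values and per-location model results.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE loccvg (locid INTEGER, valueamt REAL)")
con.execute("CREATE TABLE locstats (locid INTEGER, aal REAL)")
con.executemany("INSERT INTO loccvg VALUES (?, ?)", [(1, 6_000_000.0), (2, 2_500_000.0)])
con.executemany("INSERT INTO locstats VALUES (?, ?)", [(1, 42_000.0), (2, 9_500.0)])

# A 'Location AAL'-style dataset: the join is hidden inside the view,
# so consumers simply SELECT from one logical entity.
con.execute("""
    CREATE VIEW location_aal AS
    SELECT s.locid, s.aal, c.valueamt AS tiv, s.aal / c.valueamt AS loss_cost
    FROM locstats s JOIN loccvg c ON c.locid = s.locid
""")
for row in con.execute("SELECT locid, aal, tiv FROM location_aal ORDER BY locid"):
    print(row)
```

A dashboard builder pointed at such a view never has to think about join keys or join performance; that logic lives in the dataset definition.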


Figure 2: Risk Data Lake datasets

 

Built-in reporting engine

Once the data has been organized and cataloged, users can access the scalable infrastructure provided by the Risk Data Lake to analyze their data. Depending on the use case and the level of technical skills, users can select different tools for data analysis.

In some cases, users may prefer to develop dynamic dashboards or pixel-perfect reports via drag-and-drop workflows, rather than writing code to access and join different data sources; for this, Risk Data Lake provides an in-built reporting engine.

A dashboard designer can access ready-to-be-visualized datasets through the Risk Data Catalog to develop dashboards with multiple different visuals. These dashboards can then be published and consumed by a broader team, who can access them through any of the applications deployed in the Intelligent Risk Platform. Users don’t need to switch applications to access the powerful analytics developed through Risk Data Lake.


Figure 3: Risk Data Lake ready-to-use visuals

 

Programmable Notebooks

Technical users who prefer more advanced analytical tools can use programmable access capabilities in Risk Data Lake to develop advanced analytics. Users can query the Risk Data Catalog using SQL, a language most of the industry is very familiar with, and can also bring over code they have developed over time to execute in the Risk Data Lake. Because the Risk Data Catalog presents the data in an EDM/RDM-like schema, existing SQL scripts will work after a few modifications.

Users who want to utilize programmable languages such as Python and R can use the Programmable Notebook functionality within Risk Data Lake to access the same underlying Risk Data Catalog and develop analytics using these advanced tools.
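As a rough illustration of the notebook workflow, the sketch below assumes event loss records have already been queried from the catalog into plain Python structures; the portfolios, event rates, and losses are invented, and a real notebook would query the Risk Data Catalog directly.

```python
from collections import defaultdict

# Invented (portfolio, annual event rate, expected event loss) rows,
# as might come back from an event loss table query.
elt = [
    ("NA_Wind", 0.01, 1_200_000.0),
    ("NA_Wind", 0.002, 15_000_000.0),
    ("EU_Flood", 0.05, 300_000.0),
]

# AAL per portfolio = sum over events of annual rate x expected loss.
aal = defaultdict(float)
for portfolio, rate, loss in elt:
    aal[portfolio] += rate * loss

for name, value in sorted(aal.items()):
    print(f"{name}: {value:,.0f}")
```

The same calculation could be expressed in R or pushed down into SQL; the point is that notebooks let modelers reuse whichever idiom they already know against the same cataloged data.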

Risk Data Lake covers the entire spectrum of technical skill levels and use cases, from providing drag-and-drop reporting functionality to SQL and programmable access to underlying data. This is all in one application, using the same underlying data, and all cataloged with business context. This unique capability simplifies change management and increases adoption for a broader user base with varying degrees of technical skills.

 

Risk Library

One of the reasons organizations struggle with data lake adoption is that an empty data lake, presented as a blank slate, is extremely hard to operationalize. The Risk Data Lake, with its built-in integration with Intelligent Risk Platform, hydrates the lake with all the data from IRP and catalogs it for easy access.

In addition, the Risk Data Lake has a library of analytical assets that acts as a great jumping-off point for users. Assets include sample dashboards, SQL scripts, and Notebook code designed for specific risk management workflows and analyses.

For example, a common use case in risk management is comparing two model results side-by-side. Instead of the user building a dashboard from scratch, the Risk Data Lake provides a sample dashboard that the user can work with and modify as needed.
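The underlying calculation in such a comparison is straightforward; the sketch below contrasts AAL by region across two hypothetical model versions, with all names and figures invented for illustration.

```python
# Invented per-region AAL results from two model versions.
model_v23 = {"Florida": 4_100_000.0, "Texas": 2_800_000.0, "Carolinas": 1_900_000.0}
model_v24 = {"Florida": 4_700_000.0, "Texas": 2_600_000.0, "Carolinas": 1_950_000.0}

# Side-by-side table with percentage change between versions.
print(f"{'Region':<10}{'v23 AAL':>12}{'v24 AAL':>12}{'Change':>9}")
for region in model_v23:
    old, new = model_v23[region], model_v24[region]
    pct = (new - old) / old * 100
    print(f"{region:<10}{old:>12,.0f}{new:>12,.0f}{pct:>8.1f}%")
```

A packaged sample dashboard wraps this kind of comparison in visuals, so the user only swaps in their own two analyses rather than rebuilding the logic.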

Various teams within Moody's Insurance Solutions, such as Model Support, Model Product Management, and Analytical Services, have been working with Risk Data Lake for the last year and have developed a library of analytical assets.

These Risk Data Lake assets will cover use cases such as model validation, year-over-year comparison, market share analysis, event response, and more. With these assets, organizations can operationalize their Risk Data Lake in a matter of days, rather than years, and start extracting analytical value.


Figure 4: Risk Data Lake analytical asset library example

 

Conclusion: Applied data lakes

Within the risk management sector, insurers, reinsurers, and brokers are increasingly investigating data lake technology to enhance their analytical capabilities. Typically, these efforts originate at the organizational level, with risk management teams aiming to leverage the benefits; however, successful implementation and adoption of data lakes remain limited.

To address this, risk management teams require purpose-built data lakes that deliver organized and structured data aligned with their analytical processes, seamlessly integrate with workflow requirements, and support targeted use cases.

Moody’s Risk Data Lake offers the advantages of conventional data lakes while incorporating specialized features tailored for risk management, serving as a valuable complement to broader organizational data lake strategies.

Read Kirtan Dave's blog, 'Introducing Moody's Risk Data Lake: A new era for risk analytics' here.


LEARN MORE

Moody's insurance solutions

Our differentiated solutions bring together technology, data, analytics, and insights, helping insurers, reinsurers, and brokers address their most complex challenges and make better decisions with confidence, helping to close the insurance gap and drive performance.