Author: Iuliia Shustikova, Product Manager - Australia Bushfire HD Model, Moody's
Catastrophe model validation involves a well-understood set of analytical tasks: examining exceedance probability (EP) curves, comparing model output to historical loss experience, characterising event sets by region and physical attributes, and testing sensitivity to key assumptions.
The methods are well-established, and the main questions are known. What varies is how long it takes to get from question to answer.
For most teams, the bottleneck is not the analysis itself but the data handling around it. Model outputs need to be extracted, reformatted, loaded into local tools, processed, and charted.
When a run produces a new result, that cycle repeats. When a new question emerges mid-validation, the same steps run again. Across dozens of simulation runs, this overhead accumulates into weeks of elapsed time that have nothing to do with the quality of the analysis.
Moody's Risk Data Lake (RDL) and the new SQL query execution capabilities coming to market in June 2026 on Moody’s Intelligent Risk Platform™ (IRP) address this directly.
With the Moody’s RMS™ risk models running on the platform's cloud infrastructure, outputs land directly in structured, queryable tables, largely removing the data export step, local processing, and manual assembly. The analytical work stays the same. The overhead largely disappears.
This blog describes how the RDL helps with model validation, what the query framework covers, and what it means in practice for teams running model validation, using Moody’s RMS Australia Bushfire HD catastrophe model as an example.
What the Risk Data Lake provides
The Moody's Risk Data Lake provides some key ingredients for model validation. In this case, when you run the Australia Bushfire HD model on the IRP, three components are important to the exercise:
- Industry Exposure Database (IED): A standardized Australia-wide exposure set, ready to run without portfolio preparation. It is useful for an initial model assessment and can easily be replaced with your own IED or exposure portfolio.
- Event Information Table: Physical characteristics for every event in the stochastic catalogue—ignition location, wind speed, burn area, drought index, climate state, etc., all delivered as part of the Australia Bushfire HD model package on IRP and directly joinable to loss outputs.
- Risk Data Lake SQL Query Engine: The new SQL Engine for the RDL allows querying data such as EP curves, event losses, regional breakdowns, and analysis metadata organized into tables by the RDL catalog.
The key structural feature is a single Exposure Data Module (EDM) identifier that links every table. Set it once in an analysis, and the RDL metadata tables automatically resolve all associated analyses, model configurations, and results.
The same query works across any run. Change the identifier—to a different EDM, another analysis, or a different model version—and the entire framework reruns against the new data without modification.
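As a minimal sketch of that pattern, the snippet below uses Python's built-in sqlite3 module as a local stand-in for the RDL SQL engine. The table names, column names (analysis_meta, ep_curve, edm_id, analysis_id), and all values are hypothetical and do not reflect the actual RDL catalog schema. It only illustrates that the query text stays fixed while the identifier changes.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the RDL SQL engine.
# All table names, column names, and numbers are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analysis_meta (edm_id TEXT, analysis_id INTEGER, model_version TEXT);
CREATE TABLE ep_curve (analysis_id INTEGER, return_period INTEGER, loss REAL);
INSERT INTO analysis_meta VALUES
  ('EDM_A', 1, 'v1'), ('EDM_A', 2, 'v2'), ('EDM_B', 3, 'v1');
INSERT INTO ep_curve VALUES
  (1, 20, 100.0), (1, 250, 900.0),
  (2, 20, 120.0), (2, 250, 1100.0),
  (3, 20, 60.0),  (3, 250, 400.0);
""")

# One query, parameterised only by the identifier.
EP_QUERY = """
SELECT m.analysis_id, m.model_version, e.return_period, e.loss
FROM analysis_meta AS m
JOIN ep_curve AS e ON e.analysis_id = m.analysis_id
WHERE m.edm_id = ?
ORDER BY m.analysis_id, e.return_period
"""

def ep_rows(edm_id):
    """Return EP-curve rows for every analysis linked to one identifier."""
    return conn.execute(EP_QUERY, (edm_id,)).fetchall()

rows_a = ep_rows("EDM_A")  # two analyses resolved from one identifier
rows_b = ep_rows("EDM_B")  # same query, different identifier
```

Swapping the argument passed to ep_rows is the only change needed to point the same query at a different set of analyses.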
Why do validation teams run more simulations than they plan to?
A structured validation typically involves more runs than the initial plan suggests, for reasons such as:
- Testing a parameter change requires a baseline and a variant.
- Checking whether a finding holds at a different geographic resolution requires another cut.
- Confirming that a model component behaves as expected under an alternative exposure requires another run.
- Each finding tends to generate follow-up questions.
In a local workflow, without the Risk Data Lake, each new simulation means another export-process-chart cycle.
The data handling cost per run is roughly constant, and it adds up. Teams end up rationing their simulations, running fewer tests than they would ideally run, because the overhead of each additional run is real. Alternatively, validation stretches over months of iterations just to isolate specific metrics and understand how the model performs on the team's own book.
With the RDL, that overhead is close to zero. A new simulation is immediately queryable alongside every previous one. Comparing, for example, a model run with urban conflagration to one without it takes a single query. Checking how a parameter change shifts the EP curve takes minutes.
Figure 2: Moody's Risk Data Lake - Catalog Explorer
The practical impact is that teams run more iterations as they see analytics being produced much faster than before. They follow more leads, and catch more insights—not because they are working harder, but because the cost of each additional question has dropped.
The nine-question framework
Obviously, model validation covers more ground than nine questions. But there is a consistent core set of questions that come up in almost every engagement, regardless of portfolio, region, or regulatory context. The query framework automates the most common ones:
# | Question | What it tells you |
1 | Where is the risk concentrated? | Geographic annual average loss (AAL) distribution |
2 | How does urban conflagration (UC) scale with severity? | UC contribution by return period |
3 | Do the same regions drive risk at practical return periods? | Regional loss drivers at key return periods |
4 | What do the largest events look like physically? | Catastrophic event characteristics |
5 | When does risk peak? | Seasonality analysis |
6 | Does the model match observed reality? | Historical validation (OEP and AEP) |
7 | Where are the postcode-level concentrations? | Accumulation hotspots |
8 | How do bushfires and floods interact in the same book? | Multi-peril comparison |
9 | What does the loss distribution look like at a granular level? | Claim frequency/severity distribution |
Automating these recurring tasks frees up capacity for company-specific analysis that cannot be templated, such as understanding how the model interacts with a particular portfolio's geographic concentration, or how specific underwriting assumptions affect the modeled loss profile.
Additionally, this framework offers helpful guidance for validation teams building their capabilities and outlines a structure for how to approach validation and the questions they need to ask internally.
Where cloud-based analysis adds value
The most useful analyses are not single-table queries. They come from joining loss data with event characteristics, providing regional breakdowns, and running metadata iterations, with all analyses stored on the cloud.
A user can also add third-party data, such as claims, loss history, or hazard maps. A few examples:
1. Physical drivers of loss by state
Joining modeled event-level losses to the Event Information Table and grouping by state and physical attribute—wind speed band, drought index, El Niño–Southern Oscillation (ENSO) phase, etc.—shows which conditions drive risk in each region in Australia.
Victoria and New South Wales tend to show different profiles from Queensland. This kind of analysis is not visible in EP curves alone; it requires joining the loss table to the physical event catalog.
Figure 3: Moody's Risk Data Lake: AAL losses by U.S. state by sub-peril
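The join described in this first example can be sketched as follows, again using Python's sqlite3 module as a local stand-in for the RDL SQL engine. The tables event_loss and event_info, their columns, and every value are hypothetical placeholders rather than the real Event Information Table layout.

```python
import sqlite3

# Hypothetical event-level losses and physical event attributes.
events_db = sqlite3.connect(":memory:")
events_db.executescript("""
CREATE TABLE event_loss (event_id INTEGER, state TEXT, loss REAL);
CREATE TABLE event_info (event_id INTEGER, enso_phase TEXT, wind_speed_kmh REAL);
INSERT INTO event_info VALUES
  (1, 'El Nino', 70.0), (2, 'La Nina', 35.0), (3, 'El Nino', 55.0);
INSERT INTO event_loss VALUES
  (1, 'VIC', 500.0), (1, 'NSW', 200.0),
  (2, 'QLD', 50.0),
  (3, 'VIC', 100.0), (3, 'QLD', 30.0);
""")

# Join losses to physical attributes, then group by state and ENSO phase.
DRIVER_QUERY = """
SELECT l.state, i.enso_phase, SUM(l.loss) AS total_loss, COUNT(*) AS n_events
FROM event_loss AS l
JOIN event_info AS i ON i.event_id = l.event_id
GROUP BY l.state, i.enso_phase
ORDER BY l.state, total_loss DESC
"""

drivers = events_db.execute(DRIVER_QUERY).fetchall()
for state, phase, total_loss, n_events in drivers:
    print(f"{state:4s} {phase:8s} loss={total_loss:8.1f} events={n_events}")
```

Grouping by a wind speed band or drought index instead would only change the GROUP BY columns, not the join.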
2. Consistency of geographic risk across return periods
Joining regional loss rankings at the 1-in-20 and 1-in-250 year return periods shows whether the same areas appear at both. Stable rankings suggest the model's geographic signal is consistent. Significant shifts between attritional and extreme return periods may warrant further investigation and are a useful input to reinsurance structuring decisions.
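One way to quantify the stability of such rankings is a rank correlation. The sketch below is illustrative only: the regional losses are invented, and Spearman's coefficient is an assumption about how one might summarise the comparison, not a method prescribed by the framework.

```python
# Hypothetical regional losses at two return periods (arbitrary units).
loss_rp20 = {"VIC": 120.0, "NSW": 150.0, "QLD": 40.0, "SA": 60.0, "WA": 30.0}
loss_rp250 = {"VIC": 900.0, "NSW": 700.0, "QLD": 100.0, "SA": 450.0, "WA": 120.0}

def rank_regions(losses):
    """Map each region to its rank, where 1 is the largest loss."""
    ordered = sorted(losses, key=losses.get, reverse=True)
    return {region: i + 1 for i, region in enumerate(ordered)}

def spearman(ranks_a, ranks_b):
    """Spearman rank correlation; this closed form assumes no tied ranks."""
    n = len(ranks_a)
    d2 = sum((ranks_a[r] - ranks_b[r]) ** 2 for r in ranks_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

ranks_20 = rank_regions(loss_rp20)
ranks_250 = rank_regions(loss_rp250)

# Positive shift: the region ranks higher in the tail than at 1-in-20.
rank_shift = {r: ranks_20[r] - ranks_250[r] for r in ranks_20}
rho = spearman(ranks_20, ranks_250)
```

A value of rho close to 1 indicates that the same regions dominate at both return periods, while the per-region rank_shift shows where the ordering changes.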
3. Historical loss validation
Joining model Occurrence Exceedance Probability (OEP) output with normalized historical catastrophe data and plotting historical events against the modeled exceedance probability curve is one of the most fundamental validation checks. On the RDL, it is a single query.
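The logic behind this check can be sketched in a few lines of Python. The simulated annual maximum losses and the historical events below are invented for illustration, and the empirical estimator (rank divided by number of simulated years) is a common simplification, not necessarily the one used in the starter kit queries.

```python
# Hypothetical largest event loss in each of 10 simulated years.
annual_max_losses = [0.0, 5.0, 0.0, 12.0, 80.0, 3.0, 0.0, 40.0, 150.0, 7.0]

# Hypothetical normalized historical event losses.
historical_events = {"Event A": 45.0, "Event B": 10.0}

def oep_curve(annual_max):
    """Empirical OEP curve as (loss, exceedance probability), largest first."""
    n = len(annual_max)
    ordered = sorted(annual_max, reverse=True)
    return [(loss, (i + 1) / n) for i, loss in enumerate(ordered)]

def modeled_return_period(loss, annual_max):
    """Return period at which the simulated years reach or exceed a loss."""
    exceed = sum(1 for x in annual_max if x >= loss)
    return float("inf") if exceed == 0 else len(annual_max) / exceed

curve = oep_curve(annual_max_losses)
placed = {
    name: modeled_return_period(loss, annual_max_losses)
    for name, loss in historical_events.items()
}
```

Each historical event is thereby placed on the modeled curve, and the resulting return periods can be compared with how often losses of that size have actually been observed.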
4. Component contribution
Running the same exposure through multiple model configurations and comparing EP curves side by side—using a pivot across two analyses within the same identifier set—shows exactly how much each component contributes at every return period, and whether that contribution is proportional or concentrates in the tail.
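A pivot of this kind can be sketched with conditional aggregation. The example again uses Python's sqlite3 module as a stand-in; the ep_curve table, the analysis IDs 101 and 102, and the loss values are hypothetical. Taking the difference between two EP curves at a given return period is a simple approximation of a component's contribution and is an assumption of this sketch.

```python
import sqlite3

# Hypothetical EP points for a baseline run (101) and a variant run (102).
pivot_db = sqlite3.connect(":memory:")
pivot_db.executescript("""
CREATE TABLE ep_curve (analysis_id INTEGER, return_period INTEGER, loss REAL);
INSERT INTO ep_curve VALUES
  (101, 20, 100.0), (101, 100, 400.0), (101, 250, 800.0),
  (102, 20, 110.0), (102, 100, 500.0), (102, 250, 1200.0);
""")

# Pivot the two analyses into side-by-side columns per return period.
PIVOT_QUERY = """
SELECT return_period,
       SUM(CASE WHEN analysis_id = :base THEN loss END) AS base_loss,
       SUM(CASE WHEN analysis_id = :variant THEN loss END) AS variant_loss
FROM ep_curve
WHERE analysis_id IN (:base, :variant)
GROUP BY return_period
ORDER BY return_period
"""

pivot_rows = pivot_db.execute(PIVOT_QUERY, {"base": 101, "variant": 102}).fetchall()

# Absolute and relative difference attributable to the added component.
contribution = [(rp, v - b, (v - b) / v) for rp, b, v in pivot_rows]
```

In this invented data the relative difference grows from about 9% at 1-in-20 to about 33% at 1-in-250, which is the kind of tail concentration the comparison is meant to reveal.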
Time comparison
The differences in time reflect the available computing infrastructure, not analytical complexity. As described above, in the local workflow, each set of results needs to be downloaded separately, unarchived, merged with any additional third-party data (e.g., company-specific loss history), analyzed, and then displayed.
Depending on local infrastructure, such tasks may take significant amounts of time and require additional review loops to verify the data. For larger datasets such as Period Loss Tables (PLTs), event info tables, etc., limits on local memory capacity can slow data processing.
Instead, with the RDL model toolkit, the only user action required is to specify the analysis IDs to look up, and the entire computation is done in the cloud.
The table below shows the approximate elapsed time for common validation tasks in a traditional local workflow versus the RDL:
Task | Local workflow | On the RDL |
Geographic AAL analysis | 2–4 hours | ~5 minutes |
Component contribution across return periods | Half day | ~10 minutes |
Historical validation (OEP curve) | 1–2 days | ~30 minutes |
Event characterization by weather pattern | Half day | ~15 minutes |
Comparing sensitivity runs | 2–4 hours | ~5 minutes |
Full analysis rerun for a new simulation | 1–2 days | ~10 minutes |
Same analysis applied to a set of multiple portfolios | ~weeks | ~1 hour |
Total | Approximately 12x efficiency improvement with the RDL* | |
* The total time saved on automated tasks largely depends on the resources, experience, and infrastructure available for model validation teams; the example above represents one analyst taking approximately three months analysing multiple portfolios against Industry Exposure Databases using third-party data.
Why you should validate a model
There is a periodic discussion in the market about centralizing catastrophe model validation—having a third party run the model and return a verdict. The appeal is obvious, but the approach misses something important.
The value of validation is not only in the output. It is in what a team learns about how a model behaves on their specific portfolio:
- Which regions concentrate risk in their book?
- How do the model's assumptions interact with their exposure profile?
- Which events at which return periods are most relevant to their reinsurance structure?
Model validation is the point in time when various teams and functions come together: model evaluation or R&D, underwriting, modeling, pricing, claims, group risk management, etc.
Model evaluation can align all stakeholders on a common way forward, using an internally built view of risk to convert strategic plans into action.
The faster model validation can be executed, the easier it is to align the organization, because teams spend less time on data crunching and more time acting on the insights.
A standardised external verdict cannot capture company-specific context. It also cannot build the internal understanding that teams need to use model results confidently in pricing, reserving, and capital decisions.
Validation is most useful when it is done by the people who will act on the results—and the Risk Data Lake makes it practical for them to do so without weeks of data preparation and analysis iterations.
What's on the roadmap
The real value of automation in this context cannot be overstated, but this is just the beginning. As our technological capabilities grow, the catastrophe modeling industry stands to benefit enormously, and the RDL is positioned right at the centre of that shift.
The SQL framework, the analysis identifiers, the structured tables—that is exactly the architecture that helps with repeatable model validation work. The queries you write today will still run tomorrow. What changes is how fast you can get from question to insight, and how many people in your organisation can ask the question in the first place.
Getting started
If you are already on the Moody's Intelligent Risk Platform, the steps are straightforward:
- Run the Australia Bushfire HD model on the IED or your own exposure, and record the analysisIDs
- Open the Risk Data Lake Model Validation Starter Kit (Clients: SQL file scripts are available in the Moody's Support Center)
- Update the scripts by substituting your own analysisIDs
- Run All: Results and charts are ready within minutes
Every result traces back to an SQL query, which makes the analysis auditable and reproducible for regulatory or internal documentation purposes.
For a step-by-step guide to run the sample model validation queries in the Moody's Risk Data Lake SQL editor, clients can visit our Support Center.