
FCA report indicates effective synthetic data adoption hinges on robust MRM frameworks

The financial services industry is rapidly adopting synthetic data. This artificial data, designed to mimic the statistical properties of real data without compromising individual privacy, offers a powerful path to accelerated innovation. Banks can use it to test complex machine learning (ML) and artificial intelligence (AI) models faster, improve security, and responsibly share insights. However, innovation must be balanced with robust risk management.

The UK Financial Conduct Authority (FCA) recently published a report that sets out governance considerations for generating and using synthetic data for models in financial services. The report confirms that the responsible adoption of synthetic data is a critical Model Risk Management (MRM) issue. For financial institutions, this means that integrating synthetic-data-specific controls into existing MRM frameworks is essential for safe and sustainable growth.

Governing synthetic data as an MRM imperative

The FCA, through its Synthetic Data Expert Group (SDEG), has framed synthetic data governance around established AI ethics and MRM principles. This clarifies that the core responsibilities of MRM (validating models, managing their limitations, and ensuring clear accountability) extend fully to models that rely on synthetic data. To guide firms, the SDEG established nine key principles. The most critical for institutional MRM strategies, according to the report, include:

  • Accountability: Clear roles and responsibilities must be assigned for the entire synthetic data lifecycle, from generation to model deployment.

  • Safety and Suitability: The resulting model must be proven robust and reliable, and synthetic data should only be used where its quality demonstrably meets the required risk threshold.

  • Fairness: Rigorous testing is mandatory to ensure synthetic data does not introduce new bias, and does not amplify or fail to mitigate historical bias already present in the real-world dataset.

  • Transparency: Firms must maintain comprehensive documentation to allow auditors and risk teams to fully understand the generation methodology and its limitations.

  • Continuous Monitoring: Like any critical data input, the quality and integrity of synthetic data, and its impact on the resulting model, must be continually assessed.

Synthetic data generation risk

The FCA report emphasizes that the data generation phase is where foundational model risks are introduced. The SDEG clarifies that while synthetic data helps with privacy, institutions must actively manage the resulting governance and quality risks.

  • Establishing Auditability and Documentation: The report stresses that effective MRM demands an unbroken chain of evidence for all models built using synthetic data. If a model makes a critical decision, the institution must be able to demonstrate exactly how the artificial data was created from the real source data. The SDEG advises that building governance foundations requires firms to document the methodology, assumptions, and transformations applied during the generation process. It also requires firms to define clear organizational roles and responsibilities for oversight and formal sign-off on the generated data.

  • Managing Bias and Fidelity: The report explicitly highlights that synthetic data can either mitigate existing biases or, if poorly generated, entrench them and introduce new ones. The SDEG emphasizes that managing bias is a core part of synthetic data governance. This means firms must embed fairness validation into the data generation step to ensure the synthetic data accurately reflects the underlying risk characteristics without skewing outcomes based on sensitive attributes.
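
One way to picture the fairness validation step described above is a simple comparison of outcome rates across a sensitive attribute in the real and synthetic datasets. The Python sketch below is illustrative only: the metric (a demographic parity gap), the function names, the 0.02 tolerance, and the toy records are all assumptions made for this example and are not taken from the FCA report.

```python
from collections import defaultdict

def approval_rates(records):
    """Return {group: share of records with outcome == 1}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    return {group: positives[group] / totals[group] for group in totals}

def parity_gap(records):
    """Largest difference in approval rate between any two groups."""
    rates = approval_rates(records)
    return max(rates.values()) - min(rates.values())

def bias_check(real, synthetic, tolerance=0.02):
    """Flag synthetic data whose parity gap exceeds the real gap by more
    than `tolerance` (a hypothetical threshold chosen for illustration)."""
    real_gap = parity_gap(real)
    synthetic_gap = parity_gap(synthetic)
    return {
        "real_gap": real_gap,
        "synthetic_gap": synthetic_gap,
        "amplified": synthetic_gap > real_gap + tolerance,
    }

# Toy records: (sensitive_group, outcome), where outcome 1 means approved.
real = [("A", 1)] * 60 + [("A", 0)] * 40 + [("B", 1)] * 50 + [("B", 0)] * 50
synthetic = [("A", 1)] * 70 + [("A", 0)] * 30 + [("B", 1)] * 45 + [("B", 0)] * 55

result = bias_check(real, synthetic)
print(result)
```

In this toy case the synthetic generator widens the approval gap between the two groups, so the check raises a flag that could feed the documentation and sign-off process. A production control would likely use several fairness metrics and statistically justified thresholds.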

The validation mandate

Perhaps the most critical guidance in the FCA report concerns model validation. The SDEG is explicit: statistical similarity between real and synthetic data is not enough to confirm a model's fitness. Firms must prove their models will perform when they face real-world data.

The central validation technique proposed by the FCA report is the Train-Synthetic-Test-Real (TSTR) methodology. The SDEG mandates that the model should be Trained on the Synthetic data, but its final validation must be conducted using an independent holdout set of Real data. The primary concern is whether synthetic data produces misleading performance signals or behavioral drift once the model is deployed in live environments.

The report warns that if a model performs excellently on synthetic test data but fails against real-world inputs, it signals a significant flaw. This suggests the model has likely overfitted to artifacts of the generated data, rather than learning the true, generalizable patterns required for reliable financial services applications.
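
As a rough illustration of the TSTR check described above, the Python sketch below trains a deliberately simple classifier on synthetic data and compares its accuracy on a synthetic test set with its accuracy on a real holdout set. Everything in it is an assumption made for the example: the toy one-dimensional data, the midpoint classifier, the function names, and the 0.05 gap tolerance do not come from the FCA report.

```python
import random
from statistics import mean

def make_data(rng, n, shift):
    """Toy 1-D binary data: class 0 is centred at 0, class 1 at `shift`."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(shift if label else 0.0, 1.0)
        rows.append((x, label))
    return rows

def fit_threshold(rows):
    """Train a minimal classifier: the midpoint between the two class means."""
    mean_0 = mean(x for x, y in rows if y == 0)
    mean_1 = mean(x for x, y in rows if y == 1)
    return (mean_0 + mean_1) / 2

def accuracy(threshold, rows):
    """Share of rows where predicting class 1 for x > threshold is correct."""
    return mean(1.0 if (x > threshold) == bool(y) else 0.0 for x, y in rows)

def tstr_report(synthetic_train, synthetic_test, real_holdout, max_gap=0.05):
    """Train on synthetic data, then compare synthetic vs real test accuracy.
    `max_gap` is a hypothetical tolerance chosen for illustration."""
    threshold = fit_threshold(synthetic_train)
    acc_synthetic = accuracy(threshold, synthetic_test)
    acc_real = accuracy(threshold, real_holdout)
    gap = acc_synthetic - acc_real
    return {"acc_synthetic": acc_synthetic, "acc_real": acc_real,
            "gap": gap, "flag": gap > max_gap}

rng = random.Random(0)
real_holdout = make_data(rng, 5000, shift=2.0)  # stands in for real data

# A generator that reproduces the real class separation.
faithful = tstr_report(make_data(rng, 5000, 2.0),
                       make_data(rng, 5000, 2.0), real_holdout)

# A generator that exaggerates the class separation (an artifact).
distorted = tstr_report(make_data(rng, 5000, 4.0),
                        make_data(rng, 5000, 4.0), real_holdout)

print("faithful: ", faithful)
print("distorted:", distorted)
```

The distorted generator produces a model that looks strong on synthetic test data but loses accuracy on the real holdout, which is the warning sign the report describes. The faithful generator shows a small gap and raises no flag.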

A strategic path forward

The FCA report provides a clear strategic framework: the move toward synthetic data is underway, but its benefits can only be fully realized through robust, transparent governance. Confidence in synthetic data is built not just on technology, but on consistency, interdisciplinary collaboration, and the integration of advanced validation procedures like TSTR. By proactively updating their model risk management frameworks, financial institutions can pursue data innovation while maintaining their commitment to safety, compliance, and reliability.

Related link: FCA report on synthetic data (pdf)

