CreditView | Blog

Moody’s AI: How we’re turning promise into performance

How does Moody’s do AI?

Artificial intelligence has become a headline technology in nearly every industry, and finance is no exception. As one of the first movers in the financial services AI space, we’ve quickly learned to separate what’s viable from what’s just hype, and we’ve invested our time and resources into honing our AI platforms into practical tools that can help professionals make better, faster, and more informed decisions.

How we do AI depends, first and foremost, on the problem we’re trying to solve. Our approach blends best-in-class models with deep finance experience, rigorous evaluation, and a healthy dose of pragmatism. At Moody’s, we’re putting in the work to help make sure we’re building AI systems that align with how our customers operate on a day-to-day basis.

Model-agnostic by design

Our AI team is often asked about which large language model we use. The truth is we don’t commit to just one. Moody’s is deliberately model-agnostic.

That means we use the strongest models available from across the industry (OpenAI, Anthropic, Google, and more), and we match the models to the task they do best. Some are better at retrieval, others at summarization, others at speed. By staying agnostic, we can plug in the right engine for the right job.

This approach keeps us nimble. As models improve – and they’re improving constantly – we can adopt the strongest performers without being tied to a single provider. For our customers, that means the outputs are always coming from state-of-the-art technology, tuned for the task at hand.

For our team, developing Moody’s-grade AI is less like buying a single car and more like running a garage with the best fleet in the business. Whatever the terrain, we’ve got the right vehicle.
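Conceptually, matching models to tasks boils down to a routing layer. The sketch below is a deliberately minimal illustration of that idea; the task categories and model names are invented placeholders, not Moody’s actual configuration.

```python
# Minimal sketch of model-agnostic routing: each task type is mapped to
# whichever model currently benchmarks best for it. Names are illustrative.
TASK_ROUTES = {
    "retrieval": "model-a",       # strongest at finding relevant passages
    "summarization": "model-b",   # strongest at condensing long text
    "fast_qa": "model-c",         # optimized for low latency
}

def route(task: str, default: str = "model-a") -> str:
    """Return the model configured for a task, falling back to a default."""
    return TASK_ROUTES.get(task, default)
```

Because the table is data rather than code, swapping in a newly released model for a given task is a configuration change, not a rewrite.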

Doing what’s hard: retrieval and context engineering

One of the biggest challenges for AI in finance is data. Financial information lives in sprawling documents, dense filings, and reports that can stretch to hundreds of pages. Buried in those pages are the critical details: the needle in the haystack.

This is where retrieval and context engineering come in. Retrieval is about finding the right slice of information from a massive document or dataset. Context engineering is about feeding that slice to the model in a way that increases the likelihood of more accurate and relevant answers.

Here’s a fairly simple example. Suppose you ask the system to compare a company’s credit factors across a given number of years. We know those are found in a specific section of a Moody’s credit opinion. Rather than running a vague search across multiple documents, our systems can jump directly to the relevant sections. That’s Moody’s-grade context engineering in action. We’re using our years of deep domain knowledge to guide the AI toward the right part of the haystack, instead of letting it wander blindly.
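In code, "jumping to the right section" can be sketched as follows. This is a toy illustration under the assumption that documents are pre-segmented into named sections; the section names and sample text are hypothetical.

```python
# Toy illustration of section-targeted retrieval: instead of searching the
# whole document, pull only the section known to hold the answer, then feed
# that slice to the model. Document structure here is invented.
CREDIT_OPINION = {
    "summary": "Company X is a diversified manufacturer...",
    "credit_factors": "Leverage improved from 4.2x in 2021 to 3.1x in 2023...",
    "liquidity": "Liquidity remains adequate with undrawn facilities...",
}

def extract_section(document: dict, section: str) -> str:
    """Return only the slice of the document relevant to the question."""
    return document.get(section, "")

def build_prompt(question: str, context: str) -> str:
    """Give the model just the targeted excerpt, not the full document."""
    return f"Answer using only this excerpt:\n{context}\n\nQuestion: {question}"

prompt = build_prompt(
    "How did leverage change?",
    extract_section(CREDIT_OPINION, "credit_factors"),
)
```

The model never sees the unrelated sections, which shrinks the haystack before the search even begins.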

This combination of smart retrieval and domain-specific tailoring is part of what makes Moody’s AI different. We looked at some of the best AI models on the market, matched them to the tasks they do best, and then layered our own strategies around retrieval and context engineering on top to develop AI systems that are deliberately orchestrated for financial services.

Measuring what matters

In AI, accuracy isn’t as simple as “right or wrong.” Answers can be technically correct but irrelevant, or on-topic but misleading. In finance especially, nuance matters, so we evaluate our AI along several dimensions:

  • Relevance: Did the system attempt to answer the question at all?
  • Pertinence: Is the answer meaningful in context? In finance, we’ve found that sometimes the most useful output is a short analytical paragraph with context rather than a single data point.
  • Groundedness: Was the answer supported by the source material, even if reaching it required some inference?
  • Super-groundedness: Was the answer literally stated in the source, with no inference involved?

Imagine asking an AI system what the capital of Japan is. If it replies that Paris is the capital of Japan, that’s relevant in form, because it tried to answer the question directly, but it’s obviously wrong. If instead it says, “Japan is an island nation in East Asia with a population of about 125 million people,” that’s a different kind of failure. The statement is true, even useful background, but it doesn’t answer the question. In other words, it’s pertinent but not relevant.

Now let’s say the question is “What was Company X’s revenue in 2008?” and the model accurately answers that it was USD 50m. That answer is both relevant (it answered the question directly) and pertinent (the answer is meaningful). It should also be super-grounded, since that number would presumably have been pulled directly from the source text. However, if the question is “Was Company X profitable over the late 2000s?”, the AI may need to combine multiple figures and infer profitability. That’s grounded, because it can be supported by sources, but not super-grounded, because the answer is not a direct pull.
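The grounded versus super-grounded distinction can be illustrated with a toy heuristic. Real evaluation is far more involved; the substring-and-figure matching below is our own simplification for illustration, not Moody’s actual metric.

```python
import re

def groundedness_level(answer: str, source: str) -> str:
    """Classify an answer as super-grounded, grounded, or ungrounded
    using a deliberately crude heuristic."""
    # Super-grounded: the answer is literally stated in the source.
    if answer.lower() in source.lower():
        return "super-grounded"
    # Grounded: every figure cited in the answer appears in the source,
    # so the answer could be inferred from it.
    figures = re.findall(r"\d[\d,.]*", answer)
    if figures and all(f in source for f in figures):
        return "grounded"
    return "ungrounded"
```

On the revenue example above, “USD 50m” against a source stating “Revenue was USD 50m in 2008.” comes back super-grounded, while a profitability inference that combines figures from the source comes back merely grounded.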

By benchmarking across these categories, we can see where performance is strong and where it needs work. And importantly, we can measure whether changes, like adding new data sources, make the system better or worse.
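That kind of better-or-worse comparison can be sketched as a simple regression check. We assume here that each evaluation dimension yields a 0–1 score; the dimension names, scores, and tolerance are invented for illustration.

```python
# Sketch of a benchmark regression check: compare a candidate system's
# per-dimension scores against the current baseline and flag any dimension
# that got meaningfully worse. All values here are illustrative.
def regressions(baseline: dict, candidate: dict, tolerance: float = 0.01) -> list:
    """List the dimensions where the candidate scores worse than baseline."""
    return [dim for dim, score in baseline.items()
            if candidate.get(dim, 0.0) < score - tolerance]

baseline = {"relevance": 0.92, "groundedness": 0.88}
candidate = {"relevance": 0.95, "groundedness": 0.80}
```

A change that lifts relevance but drops groundedness would be flagged rather than shipped.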

Iteration: practice makes progress

AI at Moody’s is not a one-and-done project. Our engineering teams constantly run structured sprints to enhance retrieval methods, sharpen prompts, and fine-tune evaluation. Each update is measured against our benchmarks so we can demonstrate when accuracy is advancing and ensure it never slips.

Take the challenge of adding new data, for example. Expanding context is powerful, but it can also encourage models to “blabber”: producing overly long answers, losing precision, or drifting off-topic. Our evaluation framework is designed to catch these instances, enabling us to reassess our retrieval and context engineering strategies and enrich the system without losing precision.
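A crude version of such a blabber check might look like the sketch below. The word limit and term-overlap heuristic are assumptions made for illustration; a production framework would use much richer signals.

```python
# Toy check for "blabber": flag answers that run too long or that never
# mention any of the question's key terms. Thresholds are illustrative.
def flags_blabber(answer: str, max_words: int = 80, question_terms=None) -> bool:
    """Return True if the answer is overlong or drifts off-topic."""
    too_long = len(answer.split()) > max_words
    off_topic = bool(question_terms) and not any(
        term.lower() in answer.lower() for term in question_terms
    )
    return too_long or off_topic
```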

Another example is PDF parsing. It might sound mundane, but anyone in AI who has worked with complex documents knows how difficult it can be to extract clean, reliable data from existing material. At Moody’s, we’ve built robust pipelines that help ensure we can predictably pull the right information from the data source — no garbled text, no missing context.
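As a simplified illustration of the kind of cleanup such a pipeline performs, the sketch below normalizes a few common extraction artifacts: words hyphenated across line breaks, runs of stray whitespace, and excess blank lines. A real pipeline handles far more, including tables, headers, and multi-column layouts.

```python
import re

def clean_extracted_text(raw: str) -> str:
    """Normalize raw text extracted from a PDF page (illustrative cleanup only)."""
    text = re.sub(r"-\n(?=[a-z])", "", raw)   # rejoin words hyphenated at line breaks
    text = re.sub(r"[ \t]+", " ", text)       # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)    # collapse excess blank lines
    return text.strip()
```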

This disciplined, iterative approach helps keep Moody’s AI aligned with the evolving business goals of our customers. We are constantly strengthening our platforms with new data and methods, so we can maximize our ability to deliver clarity and precision consistently.

Why it matters

At the end of the day, financial professionals don’t care which LLM provider we use or how we configure retrieval settings. They care that when they ask a question, the answer is accurate, useful, and fast.

That’s why Moody’s approach to AI matters. We combine the best of today’s models with domain experience, meticulous context engineering, and rigorous evaluation. We constantly iterate to make sure performance improves, not degrades. And we do it all with one goal in mind: giving customers the insights they need to make better decisions.

Moody’s-grade AI is not about hype. It’s about auditability, accuracy, and accountability – and about helping enterprises turn possibility into real performance.


Learn more about Moody's Agentic solutions

Moody’s Agentic Solutions leverage advanced AI to bring automation and optimization to high-value processes like credit assessment, portfolio monitoring, KYC screening, and sales intelligence, powered by Moody’s comprehensive foundation of financial data and content.

 
