BIS Bulletin on Cognitive Limits of Large Language Models

The use cases of generative AI in the banking sector are evolving fast, with many institutions adopting the technology to enhance customer service and operational efficiency. However, this adoption has been accompanied by increasing regulatory scrutiny, as regulators seek to address the complexities and risks associated with generative AI.

One of regulators’ key concerns is the potential for biased decision-making by these models, which is also a growing area of research. A recent Bulletin from the Bank for International Settlements (BIS) discusses the cognitive limits of large language models (LLMs) such as the Generative Pre-trained Transformer (GPT) models. The authors, Fernando Perez-Cruz and Hyun Song Shin, explore whether these models truly understand the content they generate or merely replicate text encountered during training.

The authors acknowledge the impressive capabilities of LLMs, including generating computer code and images and solving complex mathematical problems. To probe the limits of those capabilities, they tested GPT-4 with a logic puzzle known as Cheryl’s birthday puzzle. The model solved the puzzle flawlessly when presented with the original wording but failed when small incidental details, such as the names of the people, the number of dates, or the order of the clues, were changed. The authors argue that this failure indicates a reliance on familiar wording rather than a true understanding of the underlying logic. They also highlight the model’s lack of awareness of its own ignorance.
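To illustrate why changes to incidental details should not matter to a reasoner that follows the logic, the puzzle can be solved by plain elimination. The sketch below is not taken from the Bulletin and is not the authors' test. It assumes the ten candidate dates of the widely circulated version of the puzzle and the usual three statements, and the function name `solve` and the relabelled month names are hypothetical.

```python
from collections import Counter

# Candidate dates from the widely circulated version of the puzzle
# (an assumption: the article itself does not list them).
ORIGINAL_DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]


def solve(dates):
    """Return the dates consistent with the puzzle's three statements."""
    dates = list(dates)
    day_counts = Counter(day for _, day in dates)
    month_counts = Counter(month for month, _ in dates)

    # Statement 1: the person told the month does not know the date and is
    # certain the person told the day does not know it either. The month must
    # hold more than one candidate and no day that is unique overall.
    months_with_unique_day = {m for m, d in dates if day_counts[d] == 1}
    step1 = [
        (m, d) for m, d in dates
        if month_counts[m] > 1 and m not in months_with_unique_day
    ]

    # Statement 2: the person told the day now knows the date, so the day
    # must be unique among the remaining candidates.
    day_counts_1 = Counter(d for _, d in step1)
    step2 = [(m, d) for m, d in step1 if day_counts_1[d] == 1]

    # Statement 3: the person told the month now knows too, so the month
    # must be unique among what is left.
    month_counts_2 = Counter(m for m, _ in step2)
    return [(m, d) for m, d in step2 if month_counts_2[m] == 1]


# Incidental changes: relabel the months and reverse the order of the list.
RELABEL = {"May": "Month A", "June": "Month B",
           "July": "Month C", "August": "Month D"}
RELABELLED_DATES = [(RELABEL[m], d) for m, d in reversed(ORIGINAL_DATES)]

print(solve(ORIGINAL_DATES))    # [('July', 16)]
print(solve(RELABELLED_DATES))  # [('Month C', 16)]
```

Because the elimination depends only on which months and days are shared between candidates, relabelling or reordering them leaves the answer structurally unchanged. This is the kind of invariance that, according to the authors, GPT-4 did not show.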

The authors note that central banks’ activities are well suited to the application of machine learning and artificial intelligence, given the ample availability of structured and unstructured data and the need for sophisticated analyses to support policy. Central banks have historically been early adopters of machine learning methods in statistics, macroeconomic analysis, and regulation and supervision. The authors stress that their findings do not detract from the tangible and rapid progress being made in these areas or in scientific applications of artificial intelligence. They conclude, however, that although LLMs have made considerable progress in applications such as data management, macroeconomic analysis, and regulation and supervision, caution should be exercised when deploying them in contexts that demand rigorous reasoning in economic analysis. While the economic impact of AI could be significant, the current generation of LLMs falls short of artificial general intelligence and cannot substitute for the rigorous reasoning that some core analytical activities require. Ultimately, the ability of LLMs to reason rigorously will determine which tasks and business processes are affected by their widespread deployment.

Related link:

BIS Bulletin
