Many players in the quantum computing industry proclaim leadership in some aspect of the technology, whether that applies to hardware, software, or algorithms. Publicly, these claims usually go unchallenged and are repeated by the media. Sometimes a new claim can overlap an existing one, however, resulting in a public dispute. As an outside party to such a dispute, how can one know which approach is better?
The first part of the answer is standardized benchmarks.
What are standardized benchmarks?
Standardized benchmarks are sets of tests that measure performance. An IEEE article titled "Demystifying Quantum Benchmarks" describes them as being:
- Randomized (unbiased)
- Clearly specified (unambiguous)
- Holistic (overall performance)
- Inclusive (broadly applicable)
Importantly, the results should be statistically significant. IEEE is working on P7131, the Standard for Quantum Computing Performance Metrics & Performance Benchmarking, which is intended to compare quantum computers, both hardware and software, not only to other quantum computers but also to classical computers.
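To make "statistically significant" concrete, here is a minimal sketch in Python of how a benchmark pass/fail decision might account for run-to-run noise. Everything in it (the run_trial stand-in, the threshold, the two-standard-error rule) is a hypothetical illustration, not part of the P7131 draft.

```python
import math
import random

def run_trial() -> float:
    """Stand-in for one benchmark execution; returns a score in [0, 1]."""
    return min(1.0, max(0.0, random.gauss(0.70, 0.05)))

def significant_pass(scores: list[float], threshold: float, z: float = 2.0) -> bool:
    """Pass only if the mean score exceeds the threshold by ~z standard
    errors, i.e., the result is unlikely to be a lucky fluctuation."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    stderr = math.sqrt(var / n)
    return mean - z * stderr > threshold

scores = [run_trial() for _ in range(100)]
print(significant_pass(scores, threshold=2 / 3))
```

The point of the two-standard-error margin is that a single lucky run shouldn't earn a passing grade; a claim should survive repetition.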
Who do you trust?
When a dispute goes public, benchmarks could give us a way to impartially determine which party is correct. Either one party has better numbers, or the numbers are roughly equal, or there is a split decision using multiple benchmarks. However, we have to ask where the numbers came from. A hardware benchmark such as Quantum Volume (QV), for example, is usually self-reported by the party that provides the quantum computer in question. We can imagine a dispute continuing over benchmarks because each party would assert their own claims while challenging the other side’s self-asserted claims.
Therefore, the second part of the answer is respected, independent benchmarking authorities.
A case study in Quantum Volume (QV)
“Quantum Volume in Practice: What Users Can Expect from NISQ Devices” by the team of Pelofske, Bärtschi, and Eidenbenz at CCS-3 Information Sciences, Los Alamos National Laboratory, independently measured the QV scores, as well as several other metrics, of 24 quantum computers from five different providers. The authors reported their methodology, as well as the highest QV scores they were able to achieve. From the abstract:
“Quantum volume (QV) has become the de-facto standard benchmark to quantify the capability of Noisy Intermediate-Scale Quantum (NISQ) devices. While QV values are often reported by NISQ providers for their systems, we perform our own series of QV calculations on 24 NISQ devices currently offered by IBM Q, IonQ, Rigetti, Oxford Quantum Circuits, and Quantinuum (formerly Honeywell). Our approach characterizes the performances that an advanced user of these NISQ devices can expect to achieve with a reasonable amount of optimization, but without white-box access to the device.”
QV scores can be thought of as indicating how large a square circuit, one with equal width and depth, a device can run successfully. In other words, a low score indicates that a quantum computer can only handle small, shallow circuits, whereas a higher score indicates that a quantum computer can handle wider, deeper circuits. A QV of 8, for example, is 2³, indicating that 3 layers of multi-qubit gates can be implemented across 3 qubits. A quantum computer with a QV of 16 can implement 4 such layers across 4 qubits, and so on.
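To illustrate, here is a minimal sketch of the scoring logic behind QV: the score is 2ⁿ for the largest square circuit size n at which the measured heavy-output probability exceeds 2/3, with every smaller size passing as well. The real protocol also requires a statistical confidence bound and many random model circuits per size; the numbers below are illustrative, not from any real device.

```python
def quantum_volume(heavy_output_probs: dict[int, float]) -> int:
    """heavy_output_probs maps n (circuit width = depth) to the heavy-output
    probability measured over many random model circuits of that size.
    The published protocol also demands a confidence bound, omitted here."""
    best = 0
    for n in sorted(heavy_output_probs):
        if n == best + 1 and heavy_output_probs[n] > 2 / 3:
            best = n      # this square size passes; try the next one
        else:
            break         # the first failure caps the achievable size
    return 2 ** best

measurements = {1: 0.85, 2: 0.80, 3: 0.74, 4: 0.65}  # illustrative values only
print(quantum_volume(measurements))  # -> 8, i.e. QV = 2**3
```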
The need for application-oriented benchmarks
Users of quantum computers care about applications. In the Noisy Intermediate-Scale Quantum (NISQ) era, QV tells us what size algorithms we can run, but it doesn’t tell us which algorithms we can run. Volumetric Benchmarking (VB), which is discussed in the Quantum Economic Development Consortium (QED-C) paper “Quantum Algorithm Exploration using Application-Oriented Performance Benchmarks,” addresses that limitation by visualizing the hardware we need to run the algorithms we’re interested in. VB is still hardware-centric, however, forecasting algorithm availability as hardware continues to evolve.
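The idea behind VB can be sketched with a toy model: probe a grid of circuit shapes (width × depth), record which shapes a device executes above some fidelity threshold, and read an algorithm's hardware requirements off the resulting region. The run_circuit stand-in and threshold below are hypothetical; the QED-C suite runs real application circuits at each shape.

```python
def run_circuit(width: int, depth: int) -> float:
    """Stand-in returning an estimated result fidelity for a test circuit;
    this toy noise model simply decays fidelity with total circuit size."""
    return 0.98 ** (width * depth)

THRESHOLD = 0.5  # hypothetical minimum acceptable fidelity
for depth in range(1, 9):
    row = "".join(
        "#" if run_circuit(width, depth) >= THRESHOLD else "."
        for width in range(1, 9)
    )
    print(f"depth {depth}: {row}")
# '#' marks circuit shapes the simulated device handles acceptably; an
# application's required width and depth can be read off the same grid.
```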
The first limitation of VB is that it doesn't take speed or cost into consideration. Given suitable hardware, you may be able to run your algorithm with the projected accuracy, but is it advantageous to do so compared with classical algorithms, or with other quantum or hybrid quantum-classical algorithms? That's a big question, and novel benchmarks are needed to answer it. The second limitation is that, beyond the results in the paper itself, VB takes the same do-it-yourself approach as QV, leaving room for the same kind of disputable self-reported claims.
Preliminary application-oriented benchmarks
The U.S. Defense Advanced Research Projects Agency (DARPA) has begun addressing these limitations. As announced in "DARPA Researchers Highlight Application Areas for Quantum Computing," DARPA's Quantum Benchmarking program has recently shared preprints for 20 applications, selected from a pool of 200 candidate applications covering chemistry, materials science, and non-linear differential equation problems. And while the preprints again include resource estimates for quantum computers, they also include estimates of computational advantage over supercomputers and high-performance computing (HPC) clusters.
The limitation of this initiative is that it compares quantum computing to classical computing. It does not currently compare quantum computing to quantum computing, should one party claim to have a better-performing algorithm than another. It is application-centric, concluding whether each application has the potential to benefit from quantum computing or is unlikely to. In fact, the program goes further with US2QC, which is separately determining whether the quantum computers that satisfy the resource estimates are actually feasible to build. Therefore, quantum computing has to be both advantageous and feasible to be beneficial, but the program isn't currently concerned with proving that one quantum algorithm is more advantageous than another, or that it will become feasible sooner than another.
Pushing the bar forward
Objective benchmarks have the potential to accelerate research and development. If everyone claims leadership, then no one is pushing anyone else. But if an objective leader can be identified, others have a concrete goal. They gain additional motivation to push forward and become recognized as the objective leader in at least one aspect of quantum technology.
This was evident several years ago when then-IBM Q, now IBM Quantum, began announcing QV scores for their superconducting quantum computers. Honeywell Quantum Solutions, now Quantinuum, suddenly jumped out into the lead with their ion trap quantum computers, and then IBM Q caught up. Honeywell jumped ahead again, and then IBM Q caught up again. Anyone using Quantum Twitter at the time could witness this back-and-forth competition. It was both informative and civil.
Conclusion
Although the authors of the QV paper are not members of an official benchmarking authority, we can see the value of independent scoring. We can further see the value of DARPA's objectivity, even though those particular benchmarks compare against classical computing resources. As long as a respected, independent party impartially determines the scores (QV), plots (VB), or other standardized benchmarks from IEEE, QED-C, and other organizations, disputes over technological “leadership,” commercial utility, “quantum advantage,” and other claims could be resolved much as they are in a court of law: both sides argue their cases, but a judge's ruling ultimately helps us decide for ourselves whom to believe.