Should All Mathematical Proofs Be Checked By A Computer?

Proofs, the bedrock of mathematics, occasionally have errors in them. Could computers stop this from happening, asks mathematician Emily Riehl.

Computer proof assistants can verify that mathematical proofs are correct

One miserable morning in 2017, in the third year of my tenure-track job as a mathematics professor, I woke up to a worrying email. It was from a colleague who questioned the proof of a key theorem in a highly cited paper I had co-authored. “I had always kind of assumed that this was probably not true in general, though I have no proof either way. Did I miss something?” he asked. The proof, he noted, appeared to rest on a tacit assumption that was not warranted.

Much to my alarm and embarrassment, I realised immediately that my colleague was correct. After an anxious week working to get to the bottom of my mistake, it turned out I was very lucky. The theorem was true; it just needed a new proof, which my co-authors and I supplied in a follow-up paper. But if the theorem had been false, the whole edifice of consequences “proven” using it would have come crashing down.

The essence of mathematics is the concept of proof: a combination of assumed axioms and logical inferences that demonstrates the truth of a mathematical statement. Other mathematicians can then attempt to follow the argument for themselves to identify any holes or convince themselves that the statement is indeed true. Scrutinised and patched up in this way, theorems originally proven by the ancient Greeks about the infinitude of primes or the geometry of planar triangles remain true today – and anyone can see the arguments for why this must be.

Proofs have meant that mathematics has largely avoided the replication crises pervading other sciences, where the results of landmark studies have not held up when the experiments were conducted again. But as my experience shows, mistakes in the literature still occur. Ideally, a false claim, like the one I made, would be caught by the peer review process, where a submitted paper is sent to an expert to “referee”. In practice, however, the peer review process in mathematics is less than perfect – not just because experts can make mistakes themselves, but also because they often do not check every step in a proof.

This is not laziness: theorems at the frontiers of mathematics can be dauntingly technical, so much so that it can take years or even decades to confirm the validity of a proof. The mathematician Vladimir Voevodsky, who received a Fields medal, the discipline’s highest honour, noted that “a technical argument by a trusted author, which is hard to check and looks similar to arguments known to be correct, is hardly ever checked in detail”. After several experiences in which mistakes in his proofs took over a decade to be resolved – a long time for a result to sit in logical limbo – Voevodsky suffered a crisis of confidence that led him to take the unusual step of abandoning his “curiosity-driven research” to develop a computer program that could verify the correctness of his work.

This kind of computer program is known as a proof assistant, though it might be better called a “proof checker”. It can verify that a string of text proves the stated theorem. The proof assistant knows the methods of logical reasoning and is equipped with a library of proofs of standard results. It will accept a proof only after verifying every step in the reasoning process, allowing none of the shortcuts that human experts often use.
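
To give a flavour of what this looks like, here is a minimal sketch in Lean, one of several proof assistants in current use (the syntax is Lean 4 with its mathlib library; the toy theorem and its name are chosen purely for illustration). The checker accepts the proof only because the library lemma it cites exactly closes the logical gap:

```lean
import Mathlib.Tactic

-- A toy theorem: addition of natural numbers is commutative.
-- Lean accepts the proof because the library lemma `Nat.add_comm`
-- proves precisely the stated goal; any mismatch, however small,
-- would cause the proof to be rejected.
theorem my_add_comm (a b : ℕ) : a + b = b + a :=
  Nat.add_comm a b
```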

For instance, a computer can verify that there are infinitely many prime numbers by validating the following proof, which is an adaptation of the Greek mathematician Euclid’s argument. The human mathematician first tells the computer exactly what is being claimed – in this case, that for any natural number N there is always some prime number p that is larger. The human then supplies the key formula, defining p to be the smallest prime factor of the number formed by multiplying together all the natural numbers up to N and adding 1, represented as N! + 1.
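
In Lean, for example, the claim and the choice of p might be set out as follows – a sketch in which `minFac` is mathlib’s name for the smallest prime factor of a number, `N !` its notation for the factorial, and `sorry` a placeholder for the proof the checker is still owed (the theorem name is chosen for illustration):

```lean
import Mathlib

open Nat

-- Claim: for every natural number N there is a prime p with N ≤ p.
theorem infinitude_of_primes (N : ℕ) : ∃ p, N ≤ p ∧ p.Prime :=
  -- Candidate: p is the smallest prime factor of N! + 1.
  let p := minFac (N ! + 1)
  -- `sorry` marks the missing argument; Lean flags the theorem
  -- as unverified until a genuine proof replaces it.
  sorry
```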

For the computer proof assistant to make sense of this, it needs a library that contains definitions of the basic arithmetic operations. It also needs proofs of theorems, like the fundamental theorem of arithmetic, which tells us that every natural number can be factored uniquely into a product of primes. The proof assistant then demands a proof that this prime number p is greater than N. This is argued by contradiction – a technique where following an assumption to its conclusion leads to something that cannot possibly be true, demonstrating that the original assumption was false. In this case, if p were less than or equal to N, it would be a factor of N!, and by its definition it is also a factor of N! + 1. It would therefore divide the difference of these two numbers, meaning p must be a factor of 1, which is absurd.
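
Completing the earlier sketch, the whole argument can then be handed to the computer. The proof below closely follows `Nat.exists_infinite_primes` from Lean’s mathlib library (exact lemma names can vary between library versions); each `have` line is a step the checker refuses to take on trust:

```lean
import Mathlib

open Nat

theorem infinitude_of_primes (N : ℕ) : ∃ p, N ≤ p ∧ p.Prime :=
  -- p is the smallest prime factor of N! + 1.
  let p := minFac (N ! + 1)
  -- N! + 1 is not 1 (since N! ≥ 1), so its smallest prime factor
  -- is a genuine prime.
  have f1 : N ! + 1 ≠ 1 := ne_of_gt (succ_lt_succ (factorial_pos N))
  have pp : p.Prime := minFac_prime f1
  -- By contradiction: assume h : p ≤ N. Then p divides N!, and it
  -- divides N! + 1 by definition, so it divides the difference, 1 –
  -- impossible for a prime. Hence N ≤ p.
  have np : N ≤ p :=
    le_of_not_ge fun h =>
      have h₁ : p ∣ N ! := dvd_factorial (minFac_pos _) h
      have h₂ : p ∣ 1 := (Nat.dvd_add_right h₁).mp (minFac_dvd _)
      pp.not_dvd_one h₂
  ⟨p, np, pp⟩
```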

Computer proof assistants can be used to verify proofs that are so long that human referees are unable to check every step. In 1998, for example, Samuel Ferguson and Thomas Hales announced a proof of Johannes Kepler’s 1611 conjecture that the most efficient way to pack spheres into three-dimensional space is the familiar “cannonball” packing. When their result was accepted for publication in 2005, it came with a caveat: the journal’s reviewers attested to “a strong degree of conviction of the essential correctness of this proof approach”, but they declined to certify that every step was correct.

Ferguson and Hales’s proof was based on a strategy proposed by László Fejes Tóth in 1953, which reduced the Kepler conjecture to an optimisation problem in a finite number of variables. Ferguson and Hales figured out how to subdivide this optimisation problem into a few thousand cases that could be solved by linear programming, which explains why human referees felt unable to vouch for the correctness of each calculation. In frustration, Hales launched a formalisation project, known as Flyspeck, in which a team of mathematicians and computer scientists meticulously verified every logical and computational step in the argument. The resulting 22-author paper was published in 2017 to as much fanfare as the original proof announcement.

Computer proof assistants can also be used to verify results in subfields that are so technical that only specialists understand the meaning of the central concepts. Fields medallist Peter Scholze spent a year working out the proof of a theorem that he wasn’t quite sure he believed and doubted anyone else would have the stamina to check. To be sure that his reasoning was correct before building further mathematics on a shaky foundation, Scholze posed a formalisation challenge in a December 2020 blog post entitled the “liquid tensor experiment”. The mathematics involved was so cutting edge that it took 60,000 lines of code to formalise the last five lines of the proof – and all the background results that those arguments relied upon – but the project was nevertheless completed, and the proof confirmed, in July 2022 by a team led by Johan Commelin.

Could computers just write the proofs themselves, without involving any human mathematicians? At present, large language models like ChatGPT can fluently generate mathematical prose and even output it in LaTeX, a typesetting program for mathematical writing. However, the logic of these “proofs” tends to be nonsense. Researchers at Google and elsewhere are looking to pair large language models with automatically generated formalised proofs to guarantee the correctness of the mathematical arguments, though initial efforts are hampered by sparse training sets – libraries of formalised proofs are much smaller than the collective mathematical output. But while machine capabilities are relatively limited today, auto-formalised maths is surely on its way.

In thinking about how the human mathematics community might wish to collaborate with computers in the future, we should return to the question of what a proof is for. It’s never been solely about separating true statements from false ones, but about understanding why the mathematical world is the way it is. While computers will undoubtedly help humans check their work and learn to think more clearly – it’s a much more exacting task to explain mathematics to a computer than it is to explain it to a kindergartener – understanding what to make of it all will always remain a fundamentally human endeavour.
