How Automated Theorem Provers Discovered New Mathematics

In 1996, the automated theorem prover EQP settled the Robbins algebra conjecture after sixty years of failed human attempts, suggesting that some mathematical truths may be accessible only through computational search.

Proof compression occurs when machines find dramatically shorter proofs of known theorems, exposing hidden redundancy in human mathematical organization and revealing structural connections that longer proofs obscured.

Interactive proof assistants like Lean and Isabelle have become creative partners for mathematicians, enabling verification of results too complex for human checking alone while surfacing gaps in published proofs.

The proofs machines produce are mathematically valid but often resist human comprehension, challenging traditional values that prioritize explanatory power alongside correctness.

These developments suggest that the future of mathematics lies in human-machine collaboration, where human insight provides direction and computational capability provides verification and tireless exploration.

In 1996, a computer program named EQP settled a mathematical question that had defeated human mathematicians for sixty years. The Robbins algebra conjecture, a problem in abstract algebra, fell not to brilliant insight or creative intuition but to systematic computational search. The proof the machine produced was correct and verifiable, yet it offered almost no insight to the humans who had posed the question.

This wasn't an isolated event. Automated theorem provers have become genuine contributors to mathematical knowledge, finding proofs that humans missed, discovering counterexamples that overturned cherished beliefs, and sometimes producing demonstrations so elegant they revealed hidden structure in familiar domains. The relationship between mathematician and machine has evolved from simple tool use to something closer to collaboration.

What does it mean when a computer discovers mathematics that surprises its creators? These systems don't operate by mimicking human reasoning—they explore proof spaces through strategies that would be tedious or impossible for biological minds. The results challenge our intuitions about mathematical creativity, the nature of proof, and whether understanding is necessary for discovery. The theorems are real. The question is what they tell us about the enterprise of mathematics itself.

Robbins Algebra Resolution: Sixty Years of Human Failure, Eight Days of Computation

Herbert Robbins proposed his conjecture in 1933: that commutativity, associativity, and one further equation (now called the Robbins equation) define the same class of structures as Boolean algebras. The claim seemed plausible, even natural, but proof eluded every mathematician who attempted it. Alfred Tarski, one of the twentieth century's greatest logicians, worked on the problem and failed. His students failed. Their students failed. The conjecture accumulated the peculiar status of problems that seem like they should be tractable but resist every approach.
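For reference, the conjecture can be written out explicitly. The notation below is an assumption chosen for this sketch (+ for the binary operation, n for the unary one); the article itself does not fix a notation. A Robbins algebra satisfies the first three equations, and the conjecture asks whether every such algebra also satisfies Huntington's equation, which together with commutativity and associativity is known to axiomatize Boolean algebras.

```latex
\begin{align*}
x + y &= y + x && \text{(commutativity)} \\
(x + y) + z &= x + (y + z) && \text{(associativity)} \\
n\bigl(n(x + y) + n(x + n(y))\bigr) &= x && \text{(Robbins equation)} \\[4pt]
n\bigl(n(x) + y\bigr) + n\bigl(n(x) + n(y)\bigr) &= x && \text{(Huntington equation)}
\end{align*}
```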

William McCune's EQP prover solved it in October 1996. The program used a technique called paramodulation, a form of equational reasoning that substitutes one side of a known equation for a matching subterm of another, searching for chains of such replacements that connect premises to conclusions. EQP ran for about eight days on a workstation, exploring millions of potential proof steps before finding a valid derivation.
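The flavor of searching for a chain of equational replacements can be sketched with a toy breadth-first search. This is not EQP's algorithm and not full paramodulation, which unifies terms and derives new equations; it only rewrites a single term using pattern matching. Every name here (`match`, `rewrites`, `find_derivation`, the `?x` variable convention) is an assumption made for illustration.

```python
from collections import deque

# Terms are nested tuples such as ("+", "a", "b"), or plain strings.
# Strings starting with "?" are pattern variables; other strings are constants.

def match(pattern, term, env):
    """Extend env so that pattern instantiated by env equals term, else None."""
    if isinstance(pattern, str):
        if pattern.startswith("?"):
            if pattern in env:
                return env if env[pattern] == term else None
            return {**env, pattern: term}
        return env if pattern == term else None
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for sub_pattern, sub_term in zip(pattern[1:], term[1:]):
        env = match(sub_pattern, sub_term, env)
        if env is None:
            return None
    return env

def substitute(pattern, env):
    """Replace the pattern variables in pattern by their bindings in env."""
    if isinstance(pattern, str):
        return env.get(pattern, pattern)
    return (pattern[0],) + tuple(substitute(p, env) for p in pattern[1:])

def rewrites(term, rules):
    """Yield every term obtained by one rule application at any position."""
    for lhs, rhs in rules:
        env = match(lhs, term, {})
        if env is not None:
            yield substitute(rhs, env)
    if not isinstance(term, str):
        for i in range(1, len(term)):
            for new_sub in rewrites(term[i], rules):
                yield term[:i] + (new_sub,) + term[i + 1:]

def find_derivation(start, goal, rules, max_steps=6):
    """Breadth-first search for a shortest rewrite chain from start to goal."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        term, path = queue.popleft()
        if term == goal:
            return path
        if len(path) > max_steps:
            continue
        for nxt in rewrites(term, rules):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

# Commutativity, plus associativity in both directions.
RULES = [
    (("+", "?x", "?y"), ("+", "?y", "?x")),
    (("+", ("+", "?x", "?y"), "?z"), ("+", "?x", ("+", "?y", "?z"))),
    (("+", "?x", ("+", "?y", "?z")), ("+", ("+", "?x", "?y"), "?z")),
]

start = ("+", "a", ("+", "b", "c"))   # a + (b + c)
goal = ("+", ("+", "c", "a"), "b")    # (c + a) + b
path = find_derivation(start, goal, RULES)
for step in path:
    print(step)
```

Even in this tiny example the number of reachable terms grows quickly with depth, which hints at why a real search over the Robbins axioms needed days of machine time.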

The proof itself spans several pages of dense equational manipulation. Human mathematicians who examined it could verify each step, but the overall strategy defies intuitive comprehension. There's no 'aha moment,' no clever lemma that illuminates why the theorem holds. The machine found a path through logical space that works, not a path that teaches.

This created genuine philosophical discomfort. Mathematics has traditionally valued proofs that explain, not merely proofs that convince. A proof should reveal why a theorem is true, exposing the structural relationships that make it necessary. EQP's proof does no such thing. It's a sequence of valid inferences that happens to terminate at the goal—brute force elevated to the status of mathematical knowledge.

Yet the theorem is now established. The conjecture became a theorem not through human understanding but through computational persistence. McCune's achievement suggested that some mathematical truths might be accessible only through mechanical search, residing in regions of proof space that human cognition cannot navigate. The Robbins algebra problem wasn't waiting for a brilliant idea; it was waiting for a sufficiently powerful search.

Takeaway
Some mathematical truths may be provable only through computational search, existing in proof spaces that human cognition cannot effectively navigate.

Proof Compression: When Machines Find the Hidden Shortcuts

Not all machine contributions involve solving open problems. Sometimes automated provers rediscover known theorems—and in doing so, reveal that the standard human proofs were dramatically inefficient. Proof compression occurs when a machine finds a proof significantly shorter than any previously known, exposing redundancy that human mathematicians had accepted as necessary.

The four-color theorem shows that even machine-assisted proofs can be compressed: the original 1976 computer-assisted proof by Appel and Haken was criticized for its length and opacity, and in 1996 Robertson, Sanders, Seymour, and Thomas published a substantially simpler one that still relied on computer checking. More striking are cases where classical theorems, established through proofs that filled textbook chapters, turn out to have demonstrations a fraction of the length.

Larry Wos and his colleagues documented numerous instances where automated reasoning systems found proofs of just a few steps for results that had previously required elaborate constructions. Some of these shortened proofs revealed structural connections invisible in the longer versions. The theorem prover hadn't just found a faster route; it had exposed a deeper relationship between concepts that the longer proof obscured.

This phenomenon suggests that human mathematical practice may systematically miss certain kinds of insights. We prove theorems by building on previous results, constructing towers of lemmas and intermediate claims. This scaffolding is pedagogically useful—it shows how ideas connect—but it can hide the fact that some theorems follow almost immediately from first principles, if you can find the right inference path.

The implications extend beyond efficiency. If machine provers routinely find dramatically shorter proofs, it suggests that mathematical knowledge as organized by humans may contain substantial hidden redundancy. Concepts we treat as fundamental might be derivable from simpler bases. Distinctions we consider important might collapse under the right analysis. Machines, unburdened by human intuitions about what should be difficult, explore proof space without our preconceptions.

Takeaway
Dramatically shorter machine proofs suggest that human mathematical organization may contain substantial hidden redundancy, with 'fundamental' concepts derivable from simpler bases.

Interactive Discovery: Proof Assistants as Creative Partners

The most productive human-machine collaborations don't involve fully automated provers but interactive proof assistants—systems like Coq, Lean, and Isabelle that verify human-guided reasoning while providing computational search capabilities. In this paradigm, the mathematician supplies strategy and insight while the machine handles verification and tactical exploration.
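A minimal Lean 4 sketch gives a sense of this division of labor. The theorem and its name `swap_conj` are hypothetical examples chosen here, not taken from any project mentioned in this article. The human picks the strategy (split the goal into two parts), and the system checks that each step is valid.

```lean
-- Hypothetical example: the name `swap_conj` is ours, for illustration only.
theorem swap_conj (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor      -- human-chosen strategy: prove each half of the conjunction
  · exact h.2      -- the first half, q, is the second component of h
  · exact h.1      -- the second half, p, is the first component of h
```

If either `exact` step supplied the wrong component, the proof assistant would reject the proof, which is the kind of rigor that catches unstated assumptions in larger formalizations.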

Kevin Buzzard's Lean formalization work exemplifies this partnership. His team has formalized substantial portions of modern mathematics, and the process regularly surfaces gaps in published proofs—unstated assumptions, steps that seemed obvious but required substantial argument, occasional errors that peer review missed. The machine doesn't discover new theorems in these cases, but it enforces a standard of rigor that improves human work.

More dramatically, proof assistants have enabled results that would be impractical without computational support. Thomas Hales' proof of the Kepler conjecture, a statement about optimal sphere packing, relied on extensive computation that was initially controversial. The subsequent formal verification in Isabelle and HOL Light (the Flyspeck project, completed in 2014) transformed skepticism into acceptance. The machine didn't find the proof, but it made the proof trustworthy in a way that human verification alone could not.

The collaborative model is expanding. Mathematicians increasingly use proof assistants not just to verify completed work but to explore possibilities during the creative process. The machine can quickly check whether proposed lemmas hold, search for counterexamples to conjectures, and suggest tactics when human intuition stalls. The boundary between mathematician and tool becomes fluid.
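Counterexample search of this kind can be illustrated with a small brute-force sketch. The conjecture tested here ("every commutative binary operation on a two-element set is associative") is a deliberately simple stand-in chosen for this example, and all function names are assumptions; real proof assistants use far more sophisticated model finders.

```python
from itertools import product

ELEMENTS = (0, 1)

def all_operations():
    """Yield every binary operation on ELEMENTS as a dict {(a, b): result}."""
    pairs = list(product(ELEMENTS, repeat=2))
    for outputs in product(ELEMENTS, repeat=len(pairs)):
        yield dict(zip(pairs, outputs))

def is_commutative(op):
    """Check a * b == b * a for all pairs."""
    return all(op[a, b] == op[b, a] for a, b in product(ELEMENTS, repeat=2))

def is_associative(op):
    """Check (a * b) * c == a * (b * c) for all triples."""
    return all(
        op[op[a, b], c] == op[a, op[b, c]]
        for a, b, c in product(ELEMENTS, repeat=3)
    )

def find_counterexample():
    """Return a commutative but non-associative operation, or None."""
    for op in all_operations():
        if is_commutative(op) and not is_associative(op):
            return op
    return None

counterexample = find_counterexample()
print(counterexample)
```

The search returns an operation table that refutes the conjecture, so a mathematician can discard the idea before investing effort in trying to prove it.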

This partnership points toward a future where mathematical creativity is augmented rather than replaced. The human provides direction, recognizing which questions matter and what kind of answer would be satisfying. The machine provides verification, search, and tireless exploration of cases. Neither alone could achieve what the combination accomplishes. The mathematics that emerges is genuinely collaborative—human insight shaped and extended by computational capability.

Takeaway
The most productive paradigm isn't automation or human effort alone, but genuine collaboration where humans provide direction and machines provide verification and tireless exploration.

Automated theorem provers have moved from curiosities to contributors. They've settled open conjectures, exposed hidden structure in established mathematics, and enabled results that human verification alone couldn't secure. The proofs they produce are real mathematics, whatever we think about their comprehensibility.

The deeper question isn't whether machines can do mathematics—they demonstrably can. It's what their success tells us about the nature of mathematical truth. If some theorems are accessible only through computational search, if human proofs routinely contain unnecessary complexity, if collaboration outperforms either human or machine working alone, then our traditional picture of mathematics as pure human reasoning requires revision.

The theorems don't care who proved them. They were true before EQP ran, and they'll remain true regardless of how we feel about mechanical proof. What machines have discovered about mathematical structure is genuine knowledge. The question of what it means for us remains open.