Proofs in Python

PyCon Singapore 2026 Edu Summit

Melvin Zhang

Computational Thinkerers

Computational Thinkerers

Spot the bug

Java’s binary search picks the middle index:

int low  = ...                // int is signed 32-bit
int high = ...
int mid  = (low + high) / 2;  // shipped in java.util.Arrays for ~9 years

Looks fine.1

What is the bug?2

Why proofs?

Java’s binary search had a test suite.

  • A test suite checks the cases you thought of
  • A proof checks all 2⁶⁴ cases at once

But first the proof makes you write the spec.

A Python bug from 2002 to 2015

Timsort in list.sort() was silently broken for 13 years.

  • Shipped in Python, Java, and Android
  • No test ever triggered it
  • de Gouw et al. found it while proving it correct1

Writing the proof forced the missing case into the open.

Back to the binary search bug

lo + hi overflows to a negative int.

from z3 import BitVec, ZeroExt, LShR, Implies, And, prove

lo, hi = BitVec("lo", 32), BitVec("hi", 32)
valid = And(lo >= 0, hi >= 0)                        # real array indices
true_mid = LShR(ZeroExt(1, lo) + ZeroExt(1, hi), 1)  # midpoint, no overflow

buggy = (lo + hi) / 2
prove(Implies(valid, true_mid == ZeroExt(1, buggy)))

fixed = (lo & hi) + LShR(lo ^ hi, 1)
prove(Implies(valid, true_mid == ZeroExt(1, fixed)))

Output

counterexample
[hi = 1874714396, lo = 2069860050]
proved

Replace a loop with an expression

\[ \sum_{k=0}^{n-1} (a + kd) = \frac{n(2a + (n - 1)d)}{2} \]

from sympy import symbols, summation, simplify

a, d = symbols("a d")
n, k = symbols("n k", positive=True, integer=True)

lhs = summation(a + k * d, (k, 0, n - 1))
rhs = n * (2 * a + (n - 1) * d) / 2

assert lhs.equals(rhs)

SymPy proves the general AP formula.

Proofs as functions

pyzar is a declarative proof checker inspired by Mizar/Isar

\[ A \land B \;\implies\; B \land A \]

Assume A∧B, take it apart, put it back together.

from proof import proof
from tactics import CONJ

@proof
def CONJ_COMM(p):
    p.goal("!a:bool b:bool. a /\\ b ==> b /\\ a")
    p.fix("a b")                             # consider some bool a and b
    h = p.assume("a /\\ b")                  # h is our assumption a /\ b
    ha, hb = p.split(h)                      # from a /\ b, get a and b
    p.thus("b /\\ a").by_thm(CONJ(hb, ha))   # recombine, swapped

Proofs in Python

Python can host different styles of proof:

  • bit-precise arithmetic with z3
  • symbolic algebra with SymPy
  • declarative step-by-step deduction with pyzar

Additional content

Verifying a full adder

A full adder has three Boolean inputs:

  • a
  • b
  • cin

It should return the same result as adding the three bits.

Full adder circuit

from z3 import Bools, Xor, And, Or, If, prove

a, b, cin = Bools("a b cin")

sum_bit = Xor(Xor(a, b), cin)
carry = Or(And(a, b), And(cin, Xor(a, b)))

Full adder specification

total = If(a, 1, 0) + If(b, 1, 0) + If(cin, 1, 0)

prove(If(sum_bit, 1, 0) == total % 2)
prove(If(carry, 1, 0) == total / 2)

Output

proved
proved

The two proof calls mean:

  • sum_bit is the low bit of a + b + cin.
  • carry is the high bit of a + b + cin.

A natural deduction proof

\[ A \land B \;\implies\; B \land A \]

mathesis builds a natural-deduction proof backwards.

from mathesis.grammars import BasicGrammar
from mathesis.deduction.natural_deduction import NDTree, rules as R

g = BasicGrammar()
nd = NDTree([g.parse("A∧B")], g.parse("B∧A"))   # premise ⊢ goal
nd.apply(nd[2], R.Conjunction.Intro())          # to prove B∧A, prove B and A  [∧I]
nd.apply(nd[4], R.Conjunction.Elim("right"))    # B follows from A∧B           [∧E]
nd.apply(nd[6], R.Conjunction.Elim("left"))     # A follows from A∧B           [∧E]
assert nd.is_valid()                            # every branch is justified

Proving the halting problem

No program can decide halting:

\[ \neg\,\exists H.\ \mathsf{halts\_decider}(H) \]

where halts_decider H is a program where

  • (H d) halts if d run forever
  • (H d) runs forever if d halts

The diagonal does the work

Construct a program d that evaluates to (H d)

\[ \begin{array}{rcll} d & \twoheadrightarrow & \mathsf{App}_t\,H\,d & \text{(diagonal)}\\ \mathsf{halts}\,d & = & \mathsf{halts}(\mathsf{App}_t\,H\,d) & \text{(reduction)}\\ \mathsf{halts}\,d & = & \neg\,\mathsf{halts}(\mathsf{App}_t\,H\,d) & \text{(decider)}\\ \hline \mathsf{halts}(\mathsf{App}_t\,H\,d) & = & \neg\,\mathsf{halts}(\mathsf{App}_t\,H\,d) & \text{($P=\neg P$)} \end{array} \]

Sketch of the pyzar proof

@proof
def HALTING_UNDECIDABLE(p):
    p.goal("~ (?H. halts_decider H)")
    with p.suppose("?H. halts_decider H") as h_ex:
        h_diag = p.have("?d. halts d = halts (App_t H d)").by(..)
        d = p.choose("d", from=h_diag)
        # d reduces to App_t H d, therefore
        # halts d = halts (App_t H d)
        ...
        H, _ = p.choose("H", from=h_ex)
        # H decides halting (flipped), therefore
        # halts d = ~halts (App_t H d)
        ...
        # contradiction