From f4b5bb033dc4430bbd31dcae8a55f988360bcec5 Mon Sep 17 00:00:00 2001
From: Sebastian Pipping <sebastian@pipping.org>
Date: Wed, 17 Sep 2025 23:14:02 +0200
Subject: [PATCH] lib: Document and regression-proof absence of integer
 overflow from expat_realloc

Matthew Fernandez (@Smattr) and I teamed up on whether function expat_realloc
could be vulnerable to integer overflow in line:

  mallocedPtr = parser->m_mem.realloc_fcn(mallocedPtr, sizeof(size_t) + size);
                                                                      ^
We ended up with a mathematical proof that, fortunately, the current code
already is safe from overflow.

The proof uses technique "proof by contradiction". Let's assume, there *was* a
risk of integer overflow. For a risk of overflow, these four conditions would
all need to be met, together:

(1) `SIZE_MAX < sizeof(size_t) + size`
    or we would not hit an overflow on `size_t`.

(2) `size > prevSize`
    or `expat_malloc` would have already not allocated earlier
    as `expat_realloc` relies on `expat_malloc` for the initial allocation.

(3) `rootParser->m_alloc_tracker.bytesAllocated >= sizeof(size_t) + prevSize`
    or the previous allocation would be gone already or have bypassed accounting.
    The code is not thread-safe in general, race conditions are off the table.

(4) `rootParser->m_alloc_tracker.bytesAllocated + (size - prevSize) <= SIZE_MAX`
    or `expat_heap_increase_tolerable` would have returned `false` and
    the overflow line would not be reached.

We encoded this for the Z3 Theorem Prover (https://github.com/Z3Prover/z3)
and ended up with this document:

  $ cat proof_v2.smt2
  ; Copyright (c) 2025 Matthew Fernandez <matthew.fernandez@gmail.com>
  ; Copyright (c) 2025 Sebastian Pipping <sebastian@pipping.org>
  ; Licensed under the MIT license

  ; (1), (2), (3), (4) form a contradiction

  ; define `SIZE_MAX`
  (declare-fun SIZE_MAX () (_ BitVec 64))
  (assert (= SIZE_MAX #xffffffffffffffff))

  ; define `sizeof(size_t)`
  (declare-fun sizeof_size_t () (_ BitVec 64))
  (assert (= sizeof_size_t #x0000000000000008))

  ; claim we have inputs `size`, `prevSize`, and `bytesAllocated`
  (declare-fun size () (_ BitVec 64))
  (declare-fun prevSize () (_ BitVec 64))
  (declare-fun bytesAllocated () (_ BitVec 64))

  ; assume `SIZE_MAX - sizeof(size_t) < size` (1)
  (assert (bvult (bvsub SIZE_MAX sizeof_size_t) size))

  ; assume `bytesAllocated >= sizeof(size_t) + prevSize` (3)
  (assert (bvuge bytesAllocated (bvadd sizeof_size_t prevSize)))

  ; assume `bytesAllocated - prevSize <= SIZE_MAX - size` (4)
  (assert (bvule (bvsub bytesAllocated prevSize) (bvsub SIZE_MAX size)))

  ; assume `SIZE_MAX - sizeof(size_t) >= prevSize` (anti-overflow for 3)
  (assert (bvuge (bvsub SIZE_MAX sizeof_size_t) prevSize))

  ; prove we have a contradiction
  (check-sat)

Note that we operate on fixed-size bit vectors here, and hence had
to transform the assertions to not allow integer overflow by themselves.

Z3 confirms the contradiction, and thus the absence of integer overflow:

  $ z3 -smt2 -model proof_v2.smt2
  unsat

Co-authored-by: Matthew Fernandez <matthew.fernandez@gmail.com>

CVE: CVE-2025-59375
Upstream-Status: Backport [https://github.com/libexpat/libexpat/commit/f4b5bb033dc4430bbd31dcae8a55f988360bcec5]
Signed-off-by: Peter Marko <peter.marko@siemens.com>
---
 lib/xmlparse.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/xmlparse.c b/lib/xmlparse.c
index de159493..24fd7b97 100644
--- a/lib/xmlparse.c
+++ b/lib/xmlparse.c
@@ -969,6 +969,10 @@ expat_realloc(XML_Parser parser, void *ptr, size_t size, int sourceLine) {
     }
   }
 
+  // NOTE: Integer overflow detection has already been done for us
+  //       by expat_heap_increase_tolerable(..) above
+  assert(SIZE_MAX - sizeof(size_t) >= size);
+
   // Actually allocate
   mallocedPtr = parser->m_mem.realloc_fcn(mallocedPtr, sizeof(size_t) + size);