When Numerical Code Silently Fails: An Illustrative Example

Last Updated: January 05, 2026

DISCUSSED: finding exception-handling failures with my tool EXCVATE, exception spoofing, taint tracking, SMT solvers, opaque compiler transformations, the idiosyncracies of the x86 MAXSS instruction

In this post, I’ll explore how my tool EXCVATE finds latent exception-handling failures in numerical code via an illustrative example. Along the way, we will see:

An interesting case of an exception-handling failure stemming from the idiosyncratic behavior of the x86 maxss instruction.
A high-level overview of the techniques I implemented in EXCVATE to find such failures.
How the opaque nature of different compiler transformations can have surprising and unintended consequences for exception handling in floating-point programs.

The following is adapted from the tutorial I bundled along with the source code available here. The full paper is available here

Setting the scene

As investment in HPC and AI grows, so too does the importance of ensuring the correctness of the numerical code underlying these technologies. High-profile failures of numerical software deployed in mission-critical scenarios highlight the disastrous costs of incorrectness. One understudied source of incorrectness is the inconsistent handling of the exceptions defined in the IEEE 754 Standard. This is especially relevant today given the landscape of modern computing: GPUs are ubiquitous and, spurred by the growing computational demands of AI, they ship with ever-evolving generations of under-documented hardware accelerators (e.g., tensor cores, matrix cores) that operate on exception-prone/low-precision data types and that prioritize performance over correctness. When exception handling bugs remain an undetected failure mode in deployed code, they can appear nondeterministically in production. This can lead to disastrous consequences. For example, during a 2020 showcase of self-driving car technology, a mishandled NaN value in one race car’s system led to a loss of steering, causing the car to smash into a wall.

Today, no existing tool is designed to detect failures in exception handling. To address this gap is to convert latent, nondeterministic runtime failures into immediate, actionable diagnostics. In doing so, debugging costs are dramatically reduced while shifting the detection of potentially catastrophic errors from deployment to development. This is what I designed EXCVATE to do.

A simple piece of numerical code hides a latent exception-handling failure

Consider the squared hinge loss, given by:

\[L(y,t) = max(0, 1-yt)^2\]

We’ll test the following two Fortran implementations:

module squared_hinge_losses
    implicit none

    contains

        real function squared_hinge_loss1(y, t)
            real :: y, t
            squared_hinge_loss1 = (max(0.0, 1-y*t))**2
        end function

        real function squared_hinge_loss2(y, t)
            real :: y, t
            squared_hinge_loss2 = (max(1-y*t, 0.0))**2
        end function

end module

Both of these functions calculate the same quantity with the only difference being in the order of the operands for the max intrinsic function; their results should be identical. However, as we will soon discover with EXCVATE, one of these implementations has a latent exception-handling failure when compiled with gfortran.

Using EXCVATE to find latent failures in exception handling

EXCVATE is made up of three components, depicted in the above workflow diagram: The Execution Selector, the Exception Spoofer, and the Input Generator. Each of these is implemented as a plugin for the PIN binary instrumentation framework. While walking through an analysis of the squared hinge loss example I introduced above, I will discuss the inputs to the overall EXCVATE workflow, each of the components, and how to interpret the results.

Step 0: EXCVATE’s inputs

EXCVATE requires two inputs:

One or more binaries that exercise the functions we are testing. In the case of BLAS libraries, these are the developer-written regression tests bundled with the source code. Here, the tutorial comes with a simple test binary that runs the squared hinge loss functions on a handful of simple inputs.
A set of function prototypes. These are simple text files describing the input and output parameters of the functions to be tested. For these squared hinge loss functions, they look like this:
```
y       r   32  in      1
t       r   32  in      1
result  r   32  return  1
```

This indicates to EXCVATE that there are two 32-bit real inputs we are naming y and t and one 32-bit real return value we are naming result.

Step 1: The Execution Selector collects a set of representative inputs for the functions under test

We invoke the Execution Selector on the command line like so, providing the path to the prototype files and the path to the creatively-named test binary, test:

execution_selector -f prototypes/gfortran/ -- ./test

We will see the stdout from the normal execution of the test binary followed by these summary statistics from EXCVATE:

    saved 1 __squared_hinge_losses_MOD_squared_hinge_loss2 executions for replay
    saved 1 __squared_hinge_losses_MOD_squared_hinge_loss1 executions for replay
    saved 2/8 executions for replay

Note that we saved 1 execution for each of the functions under test. This is because there is no control flow in either of these functions so a single execution using one of the test inputs suffices for each in order to cover all executable paths.

Step 2: The Exception Spoofer overwrites the output of x86 instructions with `NaN` to see if they are handled incorrectly

Design and Motivation

The strategy here is informed by two key points:

It can be tricky to devise function inputs that trigger NaN-generating exceptions in specific instructions.
Most NaN-generating exceptions are handled correctly, meaning that the generated NaN is propagated to the function’s output, thereby signaling that something went wrong.

Point (2) means that, when your goal is finding exception-handling failures, taking on the work involved in Point (1) to trigger these exceptions is often wasted effort. Instead, the Exception Spoofer does the following:

For each selected function execution:
	For each instruction execution that could trigger a NaN-generating exception
		Replay the function execution
		Spoof the exception by overwriting the instruction's output with a NaN
		Raise a warning if handled incorrectly

Seeing it in action

We invoke the Exception Spoofer on the command line like so:

exception_spoofer -f prototypes/gfortran/ -- ./test

And we see the following stdout:

** replaying function executions with nan overwrites
    __squared_hinge_losses_MOD_squared_hinge_loss2
               Spoofed Exceptions: 3
      Exception-Handling Failures: 0
                     Failure Rate: 0.0000%
                       Miss Count: 0
    __squared_hinge_losses_MOD_squared_hinge_loss1
               Spoofed Exceptions: 3
      Exception-Handling Failures: 2
                     Failure Rate: 66.6667%
                       Miss Count: 0

The Spoofed Exception count for each is 3 because there are three possible instructions executed that could cause a NaN-generating exception: one subtraction and two multiplications. It looks like the Exception Spoofer raised two warnings for potential exception-handling failures in squared_hinge_loss1.

Step 3: The Input Generator reifies spoofed exceptions with with a combination of taint tracking and SMT solving

Design and Motivation

Because the Exception Spoofer has issued warnings for two potential exception-handling failures in squared_hinge_loss1, let us run the third component of EXCVATE to find which, if any of them, constitute true bugs. For a true buggy case, EXCVATE must create an input that satisfies the following:

The input must trigger the spoofed NaN-generating exception
The input must preserve the pathological control flow that resulted in failed exception handling.

Point (1) follows from the fact that the exception spoofer may have spoofed an exception in an instruction where, based on previous operations constraining the values of its operands, a NaN-generating exception is actually impossible. The Input Generator must rule out such cases.

Point (2) follows from the fact that it is possible that inputs satisfying point (1) may have side effects on the control flow that end up causing proper handling. The Input Generator must rule out such cases.

The Input Generator accomplishes this by doing the following:

For each warning:
	Apply taint to each FP input
	Replay the execution with the specific spoof that caused the warning
	For each instruction executed with tainted operands
		Add a semantically-equivalent constraint to an SMT query
	Give query to an SMT solver
	If satisfiable:
		Run program with generated input

Seeing it in action

input_generator -f prototypes/gfortran/ -- ./test

We see the following stdout:

** attempting to generate inputs that reify exception-handling failures
    __squared_hinge_losses_MOD_squared_hinge_loss1
        4 SAT instances
        4 event traces generated

Step 4: Inspecting the Results

Let us look at the contents of one of the event traces (omitting a few columns for display purposes):

(in) y: -1.17549e-38 
(in) t: nan 
(out) result: 0

====================================

Disassembly                        Event   Taint Count    
movss xmm0, dword ptr [rax]        G---    1              
mulss xmm2, xmm0                   -P-r    2              
movss xmm0, dword ptr [rip+0x3cf]  --K-    1              
subss xmm0, xmm2                   -P-r    2              
maxss xmm0, xmm1                   --Kr    1                          

At the top, we see the inputs generated by the SMT solver and the output returned by the function; a NaN input is resulting in a zero output! To shed some light on what happened, we can inspect the event trace at the bottom which lists all executed instructions where exceptional values were found:

First, the movss instruction loaded the input NaN from memory into the xmm0 register. The G--- event code indicates that this was a “Generate” event as this was the first instruction to see the NaN.
The mulss instruction performed the multiplication y*t and stored the result into xmm2. The -P-r event code indicates that there was both a “read” event (there was an exceptional value in one of the read operands) and a “Propagate” event. The taint count indicates that after that instruction, EXCVATE was tracking two exceptional values: one in xmm2 and the other in xmm0.
The movss instruction moved the constant 1 into xmm0 to prepare for the subtraction. The --K- event code inticates that there was a “Kill” event which overwrote the NaN in xmm0 that was written there by the initial movss instruction. The decremented taint count reflects this.
The subss instruction performed the subtraction 1-y*t and stored the result into xmm0. The instruction propagated the NaN. The event code and taint count reflect this.
The maxss instruction performed the max operation. The --Kr event code indicates that there was both a “read” event and a “Kill” event which overwrote the NaN in xmm0 that was written there by the previous subss instruction. The decremented taint count reflects this.

This concludes the executed instructions that had exceptional values. While the non-zero taint count in the last row of the event trace highlights the fact there is still one NaN in the xmm2 register, the x86 calling convention dictates that floating-point function return values are passed in xmm0; thus, we can deduce that the 0 return value was written by the maxss instruction and no further instructions read from the xmm2 register.

Conclusions

From the documentation of the maxss instruction:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN, and OR.

While this default behavior is questionably correct (it does not adhere to the definition of maximum provided in the IEEE-754 2019 standard which mandates propagation in the case of any quiet NaN operands), this could be forgiven if either gfortran translated the Fortran max intrinsic into the IEEE-compliant sequence of instructions alluded to by the documentation (it does not), or at the very least if gfortran generated a maxss instruction whose operand order was consistent with what was written in the source code…

However, if the latter were true, we would expect squared_hinge_loss2 to have the exception-handling failure found by EXCVATE and not squared_hinge_loss1. To see this, let’s consider squared_hinge_loss1 for which the max operation in the source code has a constant as the first operand. Since the second operand is the only potential source for a NaN, we could assume from the maxss documentation that writing the source in this way ensures propagation. But we saw that the assembly generated by gfortran actually places the result of the subtraction (subss xmm0, xmm2) as the first operand to the max instruction (maxss xmm0, xmm1). The compiler reversed the order of the operands! Hence, squared_hinge_loss1 has an unexpected exception-handling failure.

Furthermore, since EXCVATE did not find a failure in squared_hinge_loss2, we can deduce that the operand reversal in the translation to assembly happened there as well, thus leading to an order of maxss operands that actually ensured propagation. Interesting!

Even more interesting: exception handling behavior can change from one compiler to another. If you substitute ifx for gfortran both functions will have the exception handling failure. It seems that ifx is determined to set the operand order so that, as written, there will always be an exception-handling failure in the code…

When Numerical Code Silently Fails: An Illustrative Example

Setting the scene

A simple piece of numerical code hides a latent exception-handling failure

Using EXCVATE to find latent failures in exception handling

Step 0: EXCVATE’s inputs

Step 1: The Execution Selector collects a set of representative inputs for the functions under test

Step 2: The Exception Spoofer overwrites the output of x86 instructions with NaN to see if they are handled incorrectly

Design and Motivation

Seeing it in action

Step 3: The Input Generator reifies spoofed exceptions with with a combination of taint tracking and SMT solving

Design and Motivation

Seeing it in action

Step 4: Inspecting the Results

Conclusions

Step 2: The Exception Spoofer overwrites the output of x86 instructions with `NaN` to see if they are handled incorrectly