(Draft) How to Compare Floating Point Numbers

Let’s say you’ve just run a long computation, and want to verify the output in a test. If it’s a floating point number, what’s the best way to do it?

TL;DR: How to Quickly Test for (Approximate) Equality

# Set this value as appropriate, depending on how precise you want the results.
RELATIVE_TOLERANCE = 1e-10
# Assumes double precision. Adjust accordingly, e.g., use 1e-37 if
# single-precision floats are used.
# May need to be raised if your computation shrinks its result before growing it.
ABSOLUTE_TOLERANCE = 1e-307 * (1 / RELATIVE_TOLERANCE)
def approximately_equal(n1, n2, absolute_tolerance=ABSOLUTE_TOLERANCE,
                        relative_tolerance=RELATIVE_TOLERANCE):
  diff = abs(n1 - n2)
  if diff < absolute_tolerance:
    return True
  larger = max(abs(n1), abs(n2))
  return diff / larger < relative_tolerance

To understand why this version is preferred, it’s best to go through earlier attempts and discuss why they do not work.

The Naive Implementation

def equal(n1, n2):
  return n1 == n2

The problem is that floating point arithmetic often incurs round-off error, because most decimal values have no exact binary representation. For example, .1 - (1 - .9) == 2.7755575615628914e-17 on my computer (the exact value can vary by architecture and compiler). Exact comparison of computed results will therefore often fail unexpectedly, and is generally not recommended.
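A minimal demonstration (the exact digits can vary, but the failed comparison itself is reliable on IEEE 754 doubles):

```python
# Neither 0.1 nor 0.2 has an exact binary representation, so their sum
# is not exactly 0.3.
a = 0.1 + 0.2
print(a == 0.3)   # False
print(a)          # 0.30000000000000004
```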

Absolute Tolerance

def approximately_equal(n1, n2, absolute_tolerance):
  return abs(n1 - n2) < absolute_tolerance

The problem is that no single value of absolute_tolerance works in all situations. Double precision floats carry only ~15-17 significant decimal digits, so beyond those, anything is possible. If the numbers are huge, like 1e100, round-off error can be on the order of 1e84, so the absolute tolerance cannot meaningfully be smaller than approximately 1e83. However, if you are comparing 0.5 to 0.1, such a large tolerance will tell you nothing about equality.
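A small sketch of the dilemma (the magnitudes are illustrative, assuming IEEE 754 doubles, where one unit in the last place near 1e100 is roughly 1.9e84):

```python
tiny_tol = 1e-9   # reasonable for numbers around 1.0
huge_tol = 1e84   # needed to absorb round-off around 1e100

# The huge tolerance declares clearly different small numbers "equal":
print(abs(0.5 - 0.1) < huge_tol)                      # True

# The tiny tolerance rejects huge numbers that agree to ~15 digits:
print(abs(1e100 - 1.000000000000001e100) < tiny_tol)  # False
```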

Relative Tolerance

def approximately_equal(n1, n2, relative_tolerance):
  return abs(n1 - n2) / abs(n1) < relative_tolerance

This algorithm is an improvement if your computation’s round-off error is bounded. However, it has a few issues. First, if n1 is zero, it will involve a division by zero. Second, it’s possible that swapping the order of the two arguments will result in a different answer, which violates the symmetric property of equality testing. An improvement would be:

def approximately_equal(n1, n2, relative_tolerance):
  larger = max(abs(n1), abs(n2))
  if larger == 0:
    return True
  return abs(n1 - n2) / larger < relative_tolerance
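To see the symmetry fix concretely, here are both variants side by side (restated as a runnable sketch; the names asymmetric and symmetric are mine, not from any library):

```python
def asymmetric(n1, n2, relative_tolerance):
  # Divides by n1 only, so the answer can depend on argument order.
  return abs(n1 - n2) / abs(n1) < relative_tolerance

def symmetric(n1, n2, relative_tolerance):
  # Divides by the larger magnitude, so argument order cannot matter.
  larger = max(abs(n1), abs(n2))
  if larger == 0:
    return True
  return abs(n1 - n2) / larger < relative_tolerance

print(asymmetric(4.0, 5.0, 0.22), asymmetric(5.0, 4.0, 0.22))  # False True
print(symmetric(4.0, 5.0, 0.22), symmetric(5.0, 4.0, 0.22))    # True True
```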

This version is improved, and works in most situations where the computation’s round-off error is bounded. However, it does not work if the two numbers are both very small. At the bottom of the double range, precision degrades until only a handful of significant digits remain; e.g., 7e-324 may be stored as ~4.9e-324 and 8e-324 as ~9.9e-324. Relatively, one stored value is twice the size of the other, but this gap is purely rounding inaccuracy, since the additional digits cannot be stored.
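The effect is easy to reproduce at the very bottom of the double range, where the spacing between representable (subnormal) values is about 4.9e-324 (assuming IEEE 754 doubles):

```python
# 7e-324 and 8e-324 truly differ by ~14%, but they straddle a rounding
# boundary: the nearest representable doubles are ~4.94e-324 and ~9.88e-324.
a = 7e-324   # stored as the smallest positive subnormal, ~4.94e-324
b = 8e-324   # stored as twice that, ~9.88e-324
print(b / a)                   # 2.0 -- the stored values differ by 2x
print(abs(a - b) / max(a, b))  # 0.5 -- fails any sensible relative tolerance
```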

And We Arrive at the TL;DR Version

The TL;DR version accounts for the loss of precision in numbers so small that they cannot be accurately represented as floating point values. Its absolute tolerance is computed by taking a value near the smallest representable (normal) double and multiplying it by the inverse of your relative tolerance. If the difference between the numbers is less than this value, we consider them virtually identical. This ensures that information loss from very small numbers does not cause us to incorrectly conclude that two approximately equal numbers are not equal. The threshold may need to be raised if your computation shrinks its numbers considerably before growing them again; e.g., if it drives a number down to the edge of the representable range and then multiplies it by a huge factor, much information has already been lost and the result’s accuracy will be greatly reduced.
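For comparison, Python’s standard library has shipped a combined check since 3.5: math.isclose, which tests abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol). Note that its abs_tol defaults to 0.0, so for comparisons near zero you must supply an absolute floor yourself:

```python
import math

print(math.isclose(0.1 + 0.2, 0.3))              # True: relative check passes
print(math.isclose(1e-12, 2e-12))                # False: abs_tol defaults to 0.0
print(math.isclose(1e-12, 2e-12, abs_tol=1e-9))  # True once a floor is set
```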

See Also

The Random ASCII Blog has a nice write-up about the pitfalls of floating point computation, along with information about the strengths and weaknesses of various comparison algorithms.