Improve the documentation comments around CSE's uses (i.e., exploitations) of NaNs #505

Open
spahrenk opened this issue Sep 9, 2024 · 1 comment

Comments

@spahrenk
spahrenk commented Sep 9, 2024

I (@nealkruis) am going to start documenting things here as a draft while I wrap my head around all the different 32-bit representations.


In CSE, most record data members (especially those set through the user input language) are stored as 32-bit values.
CSE exploits the IEEE 754 definition of NaN to encode payload information in record members. The payload indicates:

  • Whether the value has been set by the user
  • Whether the value is supposed to be autosized
  • Whether the value is defined by the user as an expression (and which expression it corresponds to)
  • Whether the value is a choice input (and which choice value it contains)

Here's a good primer on floating point bit patterns. Counting from the most significant bit, the 32 bits are divided into:

  • a sign (bit 1),
  • the exponent (bits 2-9), and
  • the mantissa/fraction (bits 10-32).
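
As a minimal sketch of that layout, the three fields can be pulled out of a `float` with shifts and masks. The names `FloatBits`, `to_bits`, and `split` are hypothetical helpers for illustration, not part of CSE, and the sketch assumes `float` is a 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type for illustration; not part of CSE.
struct FloatBits {
    std::uint32_t sign;      // bit 1 (most significant bit)
    std::uint32_t exponent;  // bits 2-9
    std::uint32_t fraction;  // bits 10-32
};

// Copy the float's bytes into an integer. memcpy avoids the undefined
// behavior of reading the float through an integer pointer.
inline std::uint32_t to_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Split the 32 bits into sign (1 bit), exponent (8 bits), fraction (23 bits).
inline FloatBits split(float value) {
    const std::uint32_t bits = to_bits(value);
    return FloatBits{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x007FFFFFu};
}
```

For example, `split(1.0f)` gives sign 0, exponent 127 (the biased form of 2^0), and fraction 0.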

This exploit relies on relatively consistent implementations across compilers. However, per cppreference.com:

In IEEE 754, the most common binary representation of floating-point numbers, any value with all bits of the exponent set and at least one bit of the fraction set represents a NaN. It is implementation-defined which values of the fraction represent quiet or signaling NaNs, and whether the sign bit is meaningful.

The only real risk in this approach is that a floating point operation yields an ordinary signaling or quiet NaN and CSE then interprets its payload as one of the meanings above. To prevent this, payload interpretations should be limited to bit patterns that common compiler implementations do not produce as signaling or quiet NaNs.
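
To make the risk concrete, here is a small sketch of the NaN test described in the cppreference quote: exponent all ones and a nonzero fraction. `to_bits` and `is_nan_pattern` are hypothetical names, and the sketch assumes a 32-bit IEEE 754 `float`. Any value that passes this test, including one produced by arithmetic such as `0.0f / 0.0f`, is a candidate for being misread as a payload.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: raw bits of a float (assumes 32-bit IEEE 754 float).
inline std::uint32_t to_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// True when all exponent bits are set and at least one fraction bit is set.
// The sign bit is ignored, so both signs count as NaN.
inline bool is_nan_pattern(std::uint32_t bits) {
    const bool exponent_all_ones = (bits & 0x7F800000u) == 0x7F800000u;
    const bool fraction_nonzero = (bits & 0x007FFFFFu) != 0u;
    return exponent_all_ones && fraction_nonzero;
}
```

The exact bits of a NaN produced by arithmetic are platform dependent, which is why the payload patterns need to steer clear of the common ones.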

Nomenclature:

  • 0: bit must be zero
  • 1: bit must be one
  • X: bit may be either zero or one
  • Z: the run of bits must contain at least one zero
  • N: the run of bits must contain at least one one
  • B: the run of bits must contain at least one zero and at least one one

Here are the rules (as far as I can tell):

  • 0 00000000 00000000000000000000000: 0
  • 1 00000000 00000000000000000000000: -0
  • 0 11111111 00000000000000000000000: inf
  • 1 11111111 00000000000000000000000: -inf
  • X ZZZZZZZZ XXXXXXXXXXXXXXXXXXXXXXX: Finite floating point number (normal, subnormal, or zero)
  • 0 11111111 10000000000000000000000: std::numeric_limits<float>::quiet_NaN() (on common implementations)
  • 0 11111111 01000000000000000000000: std::numeric_limits<float>::signaling_NaN() (on common implementations)
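
Because the last two patterns are implementation-defined, it may be worth checking them on each target compiler. The sketch below reads the raw bits of the two standard library constants; `quiet_nan_bits` and `signaling_nan_bits` are hypothetical helper names, and a 32-bit IEEE 754 `float` is assumed.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helpers for inspection only; not part of CSE.

// Raw bits of the standard library's quiet NaN on this implementation.
inline std::uint32_t quiet_nan_bits() {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    const float value = std::numeric_limits<float>::quiet_NaN();
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Raw bits of the standard library's signaling NaN on this implementation.
// Only meaningful when std::numeric_limits<float>::has_signaling_NaN is true.
inline std::uint32_t signaling_nan_bits() {
    const float value = std::numeric_limits<float>::signaling_NaN();
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

The two patterns listed above correspond to `0x7FC00000` and `0x7FA00000`, so comparing the returned values against those constants shows whether a given compiler agrees with the list.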

This leaves the following bit sets for CSE's "NANDLES":

  • X 11111111 1NNNNNNNNNNNNNNNNNNNNNN: Quiet NaNs
  • X 11111111 0XNNNNNNNNNNNNNNNNNNNNN: Signaling NaNs

NANDLES (current):

  • 1 11111111 00000000000000000000000: Unset (Note: this is also -inf)
  • 1 11111111 00000001111111111111111: Autosizing
  • 1 11111111 0000000BBBBBBBBBBBBBBBB: Expressions (bottom 16 bits = expression index)
  • 0 11111111 XXXXXXXXXXXXXXXXXXXXXXX: Choices (top 7 bits = choice index; Note: this overlaps with inf, the std quiet NaN, and the std signaling NaN)
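
One way to read the current layout is as a decision procedure over the raw 32 bits. The sketch below is only a transcription of the list above, not CSE's actual source; `Nandle`, `classify`, `expression_index`, and `choice_index` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical names; this mirrors the list above, not CSE's actual code.
enum class Nandle { NotNandle, Unset, Autosize, Expression, Choice };

inline Nandle classify(std::uint32_t bits) {
    // Exponent not all ones: an ordinary finite number.
    if ((bits & 0x7F800000u) != 0x7F800000u) return Nandle::NotNandle;
    const std::uint32_t fraction = bits & 0x007FFFFFu;
    // Sign 0: every pattern reads as a choice, including +inf and the std NaNs.
    if ((bits >> 31) == 0u) return Nandle::Choice;
    // Sign 1, fraction zero: unset (the same bits as -inf).
    if (fraction == 0u) return Nandle::Unset;
    // The remaining sign-1 patterns require the top 7 fraction bits to be zero.
    if ((fraction >> 16) != 0u) return Nandle::NotNandle;
    // Low 16 bits all ones: autosize. Otherwise they hold an expression index.
    return (fraction == 0xFFFFu) ? Nandle::Autosize : Nandle::Expression;
}

// Bottom 16 bits of an expression pattern.
inline std::uint32_t expression_index(std::uint32_t bits) { return bits & 0xFFFFu; }

// Top 7 fraction bits of a choice pattern.
inline std::uint32_t choice_index(std::uint32_t bits) { return (bits >> 16) & 0x7Fu; }
```

Under this reading, `classify(0x7F800000u)` (+inf) and `classify(0x7FC00000u)` (the common quiet NaN) both return `Nandle::Choice`, and `classify(0xFF800000u)` (-inf) returns `Nandle::Unset`, which are the overlaps noted in the list.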
@nealkruis
Unset should be changed to be different from -inf.

Choice needs to be restricted to one of the following patterns, which exclude inf and the std quiet and signaling NaN values:

  • 0 11111111 1NNNNNNNNNNNNNNNNNNNNNN
  • 0 11111111 0XNNNNNNNNNNNNNNNNNNNNN
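
As a rough sketch, the two candidate patterns can be written as bit tests. The function names are hypothetical, and the masks are derived directly from the notation in this issue: the quiet form requires the top fraction bit set plus at least one of the lower 22 bits, and the signaling form requires the top fraction bit clear plus at least one of the lowest 21 bits.

```cpp
#include <cstdint>

// Hypothetical checks for the two proposed choice patterns; not CSE code.

// 0 11111111 1NNNNNNNNNNNNNNNNNNNNNN
inline bool matches_quiet_choice_form(std::uint32_t bits) {
    const bool sign_zero_exp_ones = (bits & 0xFF800000u) == 0x7F800000u;
    const bool top_fraction_bit_set = (bits & 0x00400000u) != 0u;
    const bool lower_22_nonzero = (bits & 0x003FFFFFu) != 0u;
    return sign_zero_exp_ones && top_fraction_bit_set && lower_22_nonzero;
}

// 0 11111111 0XNNNNNNNNNNNNNNNNNNNNN
inline bool matches_signaling_choice_form(std::uint32_t bits) {
    const bool sign_zero_exp_ones = (bits & 0xFF800000u) == 0x7F800000u;
    const bool top_fraction_bit_clear = (bits & 0x00400000u) == 0u;
    const bool lowest_21_nonzero = (bits & 0x001FFFFFu) != 0u;
    return sign_zero_exp_ones && top_fraction_bit_clear && lowest_21_nonzero;
}
```

Neither check accepts `0x7F800000` (inf), `0x7FC00000` (the listed quiet NaN), or `0x7FA00000` (the listed signaling NaN), so either form would remove the overlaps in the current choice encoding.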
