Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse Error on Out of Range Binary Integers #1337

Conversation

olabusayoT
Copy link
Contributor

  • add checks for signed/unsigned in BinaryIntegerBaseParser and BinaryIntegerBaseUnparser
  • add tests for parse and unparse
  • update failing tests

DAFFODIL-2297

@@ -93,6 +93,27 @@ abstract class BinaryIntegerBaseUnparser(e: ElementRuntimeData, signed: Boolean)
dos.putLong(asLong(value), nBits, finfo)
}
}

override def unparse(state: UState): Unit = {
val nBits = getBitLength(state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to call getBitLength twice, once here and once when super is called. We should probably avoid that.

Also, it looks like we don't have any checks on the max size for numbers like we do in parse. Maybe we should just do all the min/max length checks in the main unparse function like we do with parse? Or maybe in BinaryIntegerBaseUnparser.putNumber?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the issue is putNumber doesn't have access to state, which we need for Unparse error and BinaryNumberBaseUnparser is used by non Binary Integer classes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is putNumber doesn't have access to state

When we call putNumber we actually do pass in the state, it is just upcast to a FormatInfo so putNumber doesn't realize it's actually a UState. But we can change the FormatInfo parameter to UState and it should all just work. I don't think there's a reason putNumber must take a FormatInfo instead of a UState.

BinaryNumberBaseUnparser is used by non Binary Integer classes

Could we put all the width logic in BinaryIntegerBaseUnparser.putNumber(). Then it will only apply to integer types, which I think is exactly what we want?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there is no access to state, I think the technique is to throw a thin-exception to someplace that catches it that has the state needed to report the error in context properly.

@@ -93,6 +93,27 @@ abstract class BinaryIntegerBaseUnparser(e: ElementRuntimeData, signed: Boolean)
dos.putLong(asLong(value), nBits, finfo)
}
}

override def unparse(state: UState): Unit = {
val nBits = getBitLength(state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is putNumber doesn't have access to state

When we call putNumber we actually do pass in the state, it is just upcast to a FormatInfo so putNumber doesn't realize it's actually a UState. But we can change the FormatInfo parameter to UState and it should all just work. I don't think there's a reason putNumber must take a FormatInfo instead of a UState.

BinaryNumberBaseUnparser is used by non Binary Integer classes

Could we put all the width logic in BinaryIntegerBaseUnparser.putNumber(). Then it will only apply to integer types, which I think is exactly what we want?

primType match {
case primNumeric: NodeInfo.PrimType.PrimNumeric =>
if (primNumeric.width.isDefined) {
val nBits = result.get
val width = primNumeric.width.get
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on renaming width to maxWidth and adding a new minWidth member in NodeInfo. Avoids some of the special logic around figuring out if stuff is signed or not. And then checking maxWidth and minWidth is very similar logic, just something like

if (primNumeric.minWdith.isDefined) {
   val minWidth = primNumeric.minWidth.get
   if(nBits < minWidth) {
     SDE(...)
   }
}

And then the minWidth can be reused in the parser/unparser logic.

Maybe we also add a signed member so we don't have to have the isSigned/isUnsigned stuff? And then parser/unparsers, which should already have access to the primType can just reference the signed member? We pass around isSigned in a bunch of places, wondering if maybe that can all be simplified.

val width = primNumeric.width.get
if (primNumeric.minWidth.isDefined) {
val isSigned = primNumeric.isSigned
val signedStr = if (isSigned) "signed" else "unsigned"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSigned and signedStr aren't used unless we fall into the nBits < minWidth block. Suggest we move this val into that block.

@@ -893,7 +892,7 @@ trait ElementBaseGrammarMixin
private lazy val packedSignCodes =
PackedSignCodes(binaryPackedSignCodes, binaryNumberCheckPolicy)

private def binaryIntegerValue(isSigned: Boolean) = {
private def binaryIntegerValue = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this just want to be a lazy val now that it isn't parameterized?

@@ -35,6 +35,10 @@ trait BinaryIntegerKnownLengthCodeGenerator extends BinaryValueCodeGenerator {
case n if n <= 64 => 64
case _ => e.SDE("Binary integer lengths longer than 64 bits are not supported.")
}
val signed = e.primType match {
case n: PrimType.PrimNumeric => n.isSigned
case _ => false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this function is only use for numerics, this probably wants to be an assert.impossible with code coverage disabed, or just do e.primType.asInstanceOf[PrimNumeric].isSigned

val state = finfo.asInstanceOf[UState]
if (primNumeric.minWidth.isDefined) {
val isSigned = primNumeric.isSigned
val signedStr = if (isSigned) "signed" else "unsigned"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, suggest moving vals into the block where they are used. Avoids unnecessary work (though in this case the work is pretty minimal).

minWidth,
nBits
)
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this return false correct? If we are only warning then should we still unparse something? For example, if nBits is 1 and we are signed but allowSignedIntegerLength1Bit, then we should output 1 bit. Seems like we want to warn but then drop through and still put a number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so I have a test that exercises it, and we go from an infoset containing -1 to unparsing 1, since it is only 1 bit. Is that ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's probably correct? If we think about the min/max values of a 2's complement value, the min value is given as -(2n-1) and the max value is (2n-1)-1, where n is the number of bits. That gives us a range of -1 to 0. So -1 unparsing to a single 1 bit seems reasonable. Hopefully our parse logic is consistent with that when this tunabled is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like we parse data that is -1 to an infoset containing 0, which I think we should be unparsing to 0 instead of 1 here since the max value is 0?

Copy link
Member

@stevedlawrence stevedlawrence Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(previous comment removed, it was wrong)

So it looks like, for parse
0x0 -> 0
0x1 -> 1

And for unparse:
0 -> 0x0
1 -> 0x1
-1 -> 0x1

Other infoset values do not throw an error, so I guess we just aren't doing proper range checking that the value fits in the number of bits?

I'm not sure 2's complement really makes sense with just a single bit, so I guess always parsing/unparse to 0 or 1 is just as logical as something else, and at least it's symmetric.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truncation of a number because it doesn't fit in the available space is supposed to be an unparse error.

case n: NodeInfo.PrimType.PrimNumeric => n.isSigned
case _ => false
}
}
protected def getBitLength(s: ParseOrUnparseState): Int

def parse(start: PState): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just below we have this line:

if (nBits == 0) return // zero length is used for outputValueCalc often.

The purpose of this bug was to make it so zero length numbers is an error. I don't understand why outputValueCalc would have zero length, but based on the bug it sounds like that shouldn't be allowed any more. Should that change to an unparse error?

Copy link
Contributor Author

@olabusayoT olabusayoT Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above this we have PackedDecimal that also returns if nBits is 0, should we also make that a PE? Also PackedDecimal doesn't pass along decimalSigned to the base, is it logical to do something like the below? If so, they I think we need to update the PackedDecimal classes to pass along decimalSigned

<xs:element name="num" type="xs:decimal"  dfdl:representation="binary" dfdl:binaryNumberRep="ibm4690Packed" dfdl:decimalSigned="yes" dfdl:binaryDecimalVirtualPoint="0"/>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you need to know if the decimal representation is expected to be signed or not. The error case of dfdl:decimalSigned="no" but the packed representation has a negative sign needs to be detected somehow.
One could choose different parse primitives based on dfdl:decimalSigned yes/no or pass a boolean isSigned to a common primitive.

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, nice improvements. I think my only comments is do reduce a bunch of copy/pasting, and I think we need different behavior/error messages for packed numbers (which we might already have?)

"Minimum length for %s binary integer is %d bit(s), number of bits %d out of range. " +
"An unsigned integer with length 1 bit could be used instead."
if (isSigned && state.tunable.allowSignedIntegerLength1Bit && nBits == 1) {
state.SDW(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any concerns about creating SDW's at parse time? Diagnostic overhead should be small, but imagine an array of a bunch of 1-bit length elements--that could create a lot of diagnostics. Also consider that this same warning was already output during compilation unless nbits was calculated at runtime. Maybe the compile time warning is sufficient and we should just be silent at runtime? Or maybe we need a flag so we only warn here only if we're in the BinaryIntegerKnownLengthUnparser?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of silent success at runtime if the flag is set to allow 1-bit signed (for compatibility), and parse-error if the flag is not set.

Comment on lines 313 to 332
val signedStr = if (isSigned) "a signed" else "an unsigned"
val outOfRangeFmtStr =
"Minimum length for %s binary decimal is %d bit(s), number of bits %d out of range. " +
"An unsigned decimal with length 1 bit could be used instead."
if (isSigned && state.tunable.allowSignedIntegerLength1Bit && nBits == 1) {
state.SDW(
WarnID.SignedBinaryIntegerLength1Bit,
outOfRangeFmtStr,
signedStr,
minWidth,
nBits
)
} else {
UE(
state,
outOfRangeFmtStr,
signedStr,
minWidth,
nBits
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this same chunk of code with very minor variations appears in multiple places. Can we add trait to avoid all this duplication?

PE(
start,
"Number of bits %d out of range for a binary decimal. " +
"An unsigned decimal with length 1 bit could be used instead.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this error message is correct. I believe packed numbers have different rules about minimum lengths. It's not the same as 2's complement.

Also, this message references "binary decimal", which I think implies 2's complement. Any error messages here should probably reference packed decimals, or leave it up to individual packed parsers to detect bitLength errors and give specific errors.

I think what was intended with the packed implementation is we just call toBigDecimal/toBigInteger below and the different packed implementations do their parsing/validation and throw an exception if it's not a valid length/value, and that exception is converted into a PE.

So I think all this 1-bit signed stuff done elsewhere doesn't even apply to packed decimals. Packed decimals should already be doing bit length checking specific to the packed format. If they handle the zero case correctly, we could probably even remove this check/PE. If not, we probably still need it, but should have an error specific to zero length packed decimals not allowed.

It might also be worth adding a test for some zero length packed things so we get coverage of this edge case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"binary decimal" means type xs:decimal dfdl:binaryNumberRep="binary" i.e., twos complement.

A fraction part is specified using dfdl:binaryDecimalVirtualPoint="3" for example to get 999.999 (3 digits after the decimal point)

So "binary decimal" behaves like a twos-complement signed integer vis a vis this 1-bit compatibility mode.

The packed representations are entirely different, and 1-bit stuff does not apply. They must be aligned 4-bit and a multiple of 4-bits length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like the indvidual packed parsers/unparser handle the nBits == 0 case, Before we were just returning when nBits was 0, should we revert the code, or just update the error to remove the suggestion of what could be used instead, and update the messaging to state nBits == 0 is not allowed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we may be depending on this zero-length behavior. See the comment in the code about zero length is used by outputValueCalc. But this doesn't really make sense to me. Some things, like strings and hexBinary, can be zero length, but many data types zero length makes no sense. There is defaulting, e.g., when we are trying to parse an integer and we get zero-length back we may provide a default value for the integer. But in the absence of defaulting, most simple types require at least 1 bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about that comment. Isn't OVC only used on unparsing? Also, none of our OVC tests failed when we changed the Return to a PE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, OVC is only for unparsing. If no test fail when making this a PE, then I would also delete the comment about OVC, which makes no sense if this is parser code anyway.

nBits
)
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above, I'm not sure this error makes sense for packed numbers and can possibly be removed?

@olabusayoT olabusayoT force-pushed the daf-2297-parseErrorOnOutOfRangeBinaryIntegers branch from 0c3e934 to f9ff30f Compare November 13, 2024 21:01
@olabusayoT olabusayoT requested a review from mbeckerle November 26, 2024 16:35
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be possible to remove the check for nbits == 0 for packed decimal representations (including IBM4690Packed) because a check for 4-bit multiple has to be elsewhere and that presumably will also rule out the zero case. At least one of these in the code does not have test coverage.

I suggest removing the code that does check for nbits == 0 for the packed numbers and re-test.

I also think removing all the passing of dfdl:decimalSigned as a boolean to the packed/ibm4690 primitives means you cannot detect a negative representation being parsed into an unsigned number.

@@ -137,6 +137,14 @@
</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element name="allowSignedIntegerLength1Bit" type="xs:boolean" default="true" minOccurs="0">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this change and tunable requires a release note in the next release suggesting that people update schemas to not depend on 1-bit signed binary integers.

protected def getBitLength(s: ParseOrUnparseState): Int

def parse(start: PState): Unit = {
val nBits = getBitLength(start)
if (nBits == 0) return // zero length is used for outputValueCalc often.
if (nBits == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere else we must be checking for packed length not a multiple of 4 bits, and having room for the packed sign nibble, etc. So I suspect we don't need to check for zero bits as a special case here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find where we check the packed length is a multiple of 4 bits. Unless you're referring to what we do in the function below?

def bcdFromBigIntegerLength(absBigIntAsString: String, minLengthInBits: Int): (Int, Int) = {
    val numDigits = absBigIntAsString.length
    // Need to have an even number of digits to fill out a complete byte
    val requiredBitLen = if (numDigits % 2 == 0) (numDigits * 4) else ((numDigits + 1) * 4)
    val bitLen = scala.math.max(minLengthInBits, requiredBitLen)
    val numBytes = (bitLen / 8)
    val leadingZeros = bitLen / 4 - numDigits
    (numBytes, leadingZeros)
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Op, nvm. I found it in org.apache.daffodil.core.grammar.ElementBaseGrammarMixin

 if ((binaryNumberKnownLengthInBits != -1) && (binaryNumberKnownLengthInBits % 4) != 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic doesn't preclude nBits == 0, and it only checks for when nBits is known. I think we should leave in the 0 bits check as it is now, because it covers when the nBits is known and when it is calculated. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right, we can't rely entirely compile time validation since lengths could be runtime calculate.

In fact, in addition to checking it's not zero length, we also need to check it's a multiple for 4 at runtime. Right now during runtime, we just ask for a byte array that is nbits long and the packed logic assumes the array is completely full, even though it could have partial fragment bytes.

So I think we do need to keep this 0 check, but also need to add a % 4 == 0 check.

if (nBits == 0) return // zero length is used for outputValueCalc often.
if (nBits == 0) {
PE(
start,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does need test coverage. It may be possible to remove this check entirely. It may not be reachable in that something else is checking for the zero case before this is called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually reachable. I've added the test. I don't think we ought to remove this check as we don't check for nBits == 0 anywhere else

case n: NodeInfo.PrimType.PrimNumeric => n.isSigned
case _ => false
}
}
protected def getBitLength(s: ParseOrUnparseState): Int

def parse(start: PState): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you need to know if the decimal representation is expected to be signed or not. The error case of dfdl:decimalSigned="no" but the packed representation has a negative sign needs to be detected somehow.
One could choose different parse primitives based on dfdl:decimalSigned yes/no or pass a boolean isSigned to a common primitive.

packedSignCodes: PackedSignCodes,
val lengthEv: Evaluatable[JLong],
val lengthUnits: LengthUnits
) extends PackedBinaryIntegerBaseParser(e, signed)
) extends PackedBinaryIntegerBaseParser(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to pass the signed boolean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using the signedness of the primitive type work? I think the only time xs:decimalSign should matter is with the primitive type is xs:decimal, which uses the PackedDecimal* parsers, not these PackedInteger* parsers. So if feels like we do need a signed boolean for PackedDecimal* parsers, but I think that's just an existing bug that could maybe be added in a separate PR, since this one is more about ranges of things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this signed boolean was replaced by the primitives signedness...as we were just passing in true/false depending on whether the primitive was signed or unsigned...the PAckedDecimalParsers never passed in decimalSigned, so that wasn't remove...as @stevedlawrence said I think that's a separate bug

Copy link
Contributor Author

@olabusayoT olabusayoT Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created DAFFODIL-2963 to track separate bug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, #1391 modifies packed integers to use PrimType.fromNumber, which validates signedness according to the primitive, and the signed check in this parser goes away.

Copy link
Contributor

@jadams-tresys jadams-tresys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I think all comments have been addressed and I confirmed that with the comibnation of these changes and the ones for PR-1391 that we can remove all the isSigned parameters being passed around and rely on the PrimitiveNumeric.fromNumber to do appropriate sign/type checking.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

jadams-tresys pushed a commit to jadams-tresys/incubator-daffodil that referenced this pull request Dec 18, 2024
@olabusayoT olabusayoT force-pushed the daf-2297-parseErrorOnOutOfRangeBinaryIntegers branch from ce2d163 to 57ba139 Compare January 8, 2025 18:56
- currently we don't check the minimum lengths for signed/unsigned binary ints and this causes issues when you try to set a binary int to 0 bits (RSDE), so per the DFDL workgroup we add in these checks.
- add width check to unparse code and associated test
- fix bug where we weren't returning after UnparseError
- add SDEs for checks for non-expression fixed lengths
- currently we pass in isSigned boolean from ElementBaseGrammarMixin, but we can use the new PrimType.PrimNumeric.isSigned member instead for simplicity. We also added the minWidth member to simplify checking the minimum width of PrimNumeric types
- add tunable allowSignedIntegerLength1Bit and update tests to use
- add tests for parse and unparse
- update failing tests
- make it a PE if nBits == 0 for PackedBinaryIntegerBaseParser
- add checkMinWidth and checkMaxWidth into BinaryNumberCheckWidth trait
- add tests for 0 bit length and multiple of 4 check
- add runtime multiple of 4 length check

Deprecation/Compatibility
Although still supported via the tunable allowSignedIntegerLength1Bit, it is recommended that schemas be updated to not depend on 1-bit signed binary integers. allowSignedIntegerLength1Bit=true may be deprecated and removed in the future. Binary integers with 0 bit length are no longer supported.

DAFFODIL-2297
@olabusayoT olabusayoT force-pushed the daf-2297-parseErrorOnOutOfRangeBinaryIntegers branch from 57ba139 to 2d53a72 Compare January 8, 2025 19:18
@olabusayoT olabusayoT merged commit e4cc525 into apache:main Jan 8, 2025
12 checks passed
@olabusayoT olabusayoT deleted the daf-2297-parseErrorOnOutOfRangeBinaryIntegers branch January 8, 2025 19:39
@olabusayoT
Copy link
Contributor Author

Note this PR causes regressions in dfdl-nitf, dfdl-vmf and vmf-pmmc-dfdl

jadams-tresys pushed a commit to jadams-tresys/incubator-daffodil that referenced this pull request Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants