-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validation of parsed data with assertions #435
Comments
You have suggested to check such relations using syntax in YAML. Having this syntax in YAML has both drawbacks and profits. The drawback is that we have to support it alongside with usual expressions. The profit is that it is yaml, any tools working with yaml would be able to parse it. So shouod we drop our current expr syntax and start using entirely yaml-based? |
What exactly is "our current expr syntax", and what do you mean by "dropping" it? We had no syntax proposal for validation until now, and I suspect what you imply is covered in - id: foo
type: u4
valid:
expr: _ % 2 == 0 |
If the field is supposed to contain an enum value, can I check that by specifying a key of |
@webbnh Probably for enums we can just do the following:
- id: foo
type: u4
enum: bar
valid:
enum-key: true |
What you have proposed can be expressed as a logical expression, assumming that we have some facilities. throw:
if: _ != expected
But we can go from another side. We can eliminate textual expressions and force the developer into using YAML AST. for example size:
and:
- '!=':
- a
- 0
- '>':
- b
- '_io.size' I know that it is monstrouous space-wasting and unusable, but I'm not sure ;) |
My point here is actually very simple: if we'll just do it as one expression — which we actually still can do, i.e. stuff like that is legal: valid:
expr: _ == 42 # the same as `eq: 42`
valid:
expr: _ >= 42 # the same as `min: 42` It's easy to do, but we lose declarativity this way. If it's only an expression, one can't just look at this validation spec and render it as a range input (for example, if a min/max boundaries are known), or as a combo box with a set of choices (if a set of choices is known). Basically, if we only have a validation expression, the only thing we can do (without reversing it and applying some symbolic solver) is checking every particular value for validity, and that's it. Declarative configuration offers more, for example, you can clearly get a list of valid values, or boundaries, or stuff like that. Proposed mechanism is also extensible. For example, if we'll want to add size limits of strings in future, we can do something like valid:
max-size: 42 and it could be rendered as string with limited input box, not asking a user to enter arbitrary string and then rejecting it for some mysterious reason. |
I haven't thought about using that info for metadata for GUI purposes. It's cool idea. In this case
IMHO is the best way to do it. If we have problems with performance we can try to cache solver results.
I guess we don't lose declarativity, since we still declare "this value must satisfy this logical expression" and all the needed info is there, but we lose some verbosity. What we win, is that we don't make an assumption about programmer using the proposed verbose syntax instead of exprs. So even if a programmer used exprs, if we have a system recovering these info from exprs, we would expect it working correctly, even retroactively, when a new extractor is added and it becomes working even on old specs without any modification |
Please take into account bit sized values like:
|
I went forward and do some proof-of-concept implementation in current compiler. Namely:
|
The current KSC (0.9-SNAPSHOT) generates incorrect code when Test code: meta:
id: valid_if_test
seq:
- id: optional
if: false
type: u1
valid: 42 produces (in JavaScript, but it happens in all languages): ValidIfTest.prototype._read = function() {
if (false) {
this.optional = this._io.readU1();
}
if (!(this.optional == 42)) {
throw new KaitaiStream.ValidationNotEqualError(42, this.optional, this._io, "/seq/0");
}
} |
Probably moving it inside |
Sure. |
There are two types of RIFF structs: with asserts (chunk and parent_chunk_data) and without asserts (chunk_generic and parent_chunk_data_generic). Those without asserts won't be needed when kaitai-io/kaitai_struct#435 (comment) is resolved.
Recent compiler updates fixed |
I forgot that the compiler doesn't yet perform type checking on `valid` expression (see kaitai-io/kaitai_struct#435 (comment)), so we need to pay attention ourselves.
Considering that we can assign
currently the compiler throws an error as it expects "string"? I assume that means KSY expressions so I tried Heck even the painful approach doesn't work:
KSC seems to evaluate |
I accidentally discovered that this was a thing. When will the relevant keywords be added to the docs and the YAML schema (the kaitai-struct-vscode plugin uses it). |
Which version of the VSCode extension do you have? I have v0.8.1 and it does not have the valid-expressions included. They are underlined red in my version. It seems to be outdated sinde it doesn't know the A year ago, I tried to contribute documentation about the |
Same. |
Adapted the discussion thread [1] to document the expected `valid` key behavior. [1]: kaitai-io/kaitai_struct#435
I almost forgot to mention that there is now a new subkey of seq:
- id: foo
type: u4
enum: animal
valid:
in-enum: true
enums:
animal:
4: dog
12: chicken
seq:
- id: foo
type: u4
enum: animal
valid:
any-of:
- animal::dog
- animal::chicken
enums:
animal:
4: dog
12: chicken ... but obviously more convenient to use and easier to maintain. Of course, this It's intended only for fields with the In case you were wondering,
|
Extracting easier part of #81, with a concrete proposal on implementation.
Context: we want to have some sort of way to state that certain attributes have some sort of "valid" values. Stating that has two implications:
The simplest existing implementation is
contents
: it is essentially a byte array, which validates that its contents are exactly as specified in ksy file. On parsing, we'll throw an error if byte array contents won't match the expected value. On generation, we can just write the expected value, and not bother the end-user with setting the attribute manually.Syntax proposal
Add
valid
key to the attribute spec. Inside it, there could be:eq: expected
states that attribute value must be equal to given value (i.e. the same as above)min: expected
states minimum valid value for this attribute (compared witha >= expected
)max: expected
states maximum valid value for this attribute (compared witha <= expected
)any-of: list-of-expected
states that valid value must be equal to one in the given list of expressionsexpr: expression
states that given expression need to be evaluated (substituting_
with an actual attribute value) to be true to treat attribute value as valid.All
expected
andexpression
values are KS expression strings, which can be constants or can depend on other attributes.list-of-expected
must be a YAML array of KS expression strings.expected
inferred type must be comparable with type of the attribute.expression
inferred type must be boolean.The text was updated successfully, but these errors were encountered: