
YAML validation #463

Open
codebrain opened this issue May 20, 2019 · 7 comments
Labels
ready Issues we'd like to address in the future.

Comments

@codebrain
Contributor

Can we look at implementing some kind of schema validation on the YAML files? Something along the lines of: https://github.com/paazmaya/yaml-validator?

Alternatively, perhaps a move towards a schema type that has native validation? JSON Schema?
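For illustration, a minimal JSON Schema along those lines might look like the following sketch (the field names and allowed types here are hypothetical placeholders, not the actual ECS schema):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["name", "type", "description"],
  "properties": {
    "name": { "type": "string" },
    "type": { "type": "string", "enum": ["keyword", "text", "long", "date", "ip"] },
    "description": { "type": "string" }
  },
  "additionalProperties": false
}
```

Since a parsed YAML document maps onto the same data model as JSON, a schema like this could be applied to the YAML files after loading them.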

@ruflin
Contributor

ruflin commented May 20, 2019

Can you elaborate a bit on where the users would use / deploy this validator?

@codebrain
Contributor Author

The thrust of my argument comes from the code generation perspective, whereby the specification will be used as the source of truth for generating C# types and helpers. In these instances, the downstream impact of an incorrect or incomplete specification often means breaking binary compatibility in the generated artifact. I believe this is also true for other statically typed languages, not just C#.

Anything we can do to validate that the specification is complete and correct minimises some of this downstream impact.

Perhaps this validation can be tied to the CI system, such that an update to the YAML specification triggers a CI validation check asserting that the YAML is:

  1. Well formed: the document parses
  2. Complete: all required fields and their values are present
  3. Correct: all field types match expected types and all values fall within known boundaries
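The three assertions above could be sketched as a single CI-style check roughly like this (assumes PyYAML is available; the required attributes and allowed types are illustrative placeholders, not the real specification):

```python
# Sketch of a CI-style check covering the three assertions above:
# 1. well formed, 2. complete, 3. correct.
# Assumes PyYAML; REQUIRED_ATTRS and KNOWN_TYPES are illustrative.
import yaml

REQUIRED_ATTRS = ("name", "type", "description")
KNOWN_TYPES = {"keyword", "text", "long", "date", "ip"}

def check_spec(yaml_text):
    """Return (ok, errors) for a single field definition given as YAML text."""
    errors = []
    # 1. Well formed: the document must parse.
    try:
        doc = yaml.safe_load(yaml_text)
    except yaml.YAMLError as exc:
        return False, [f"unparsable YAML: {exc}"]
    if not isinstance(doc, dict):
        return False, ["top level must be a mapping"]
    # 2. Complete: all required attributes present.
    for attr in REQUIRED_ATTRS:
        if attr not in doc:
            errors.append(f"missing: {attr}")
    # 3. Correct: values within known boundaries.
    if "type" in doc and doc["type"] not in KNOWN_TYPES:
        errors.append(f"unknown type: {doc['type']}")
    return not errors, errors
```

In CI, a script like this would run over every changed YAML file and fail the build if any check returns errors.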

@ruflin
Contributor

ruflin commented Aug 15, 2019

++. I remember just recently seeing a PR from @jsoriano in Beats which does some YAML validation. Perhaps we can reuse parts from there?

@jsoriano
Member

In elastic/beats#13188 I am adding a check to the Beats build to validate that the fields definitions have correct types and formats. The Beats case is a bit special because we compose the big fields-definition YAML file by concatenating many small YAML-like files, and we weren't checking the validity of the result yet, so we could end up with obscure errors in tests or with unexpected behaviours.

For the check I am using the LoadFieldsYaml() function from github.com/elastic/beats/libbeat/mapping, which by itself is enough to check that the final YAML is well formed. Under the hood this function uses go-ucfg (github.com/elastic/go-ucfg), which allows defining custom validation for each parsed object; this is how I check that format and type are correct. Other checks could be added to better verify the completeness and correctness of all fields.

One thing I think go-ucfg doesn't cover yet is detecting unsupported attributes in parsed objects.
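The concatenate-then-validate flow described above can be sketched in a few lines (assumes PyYAML; the fragments shown are illustrative, not real Beats field definitions):

```python
# Sketch of the concatenate-then-validate flow described above: many
# small YAML fragments are joined into one document, so a syntax error
# in any fragment only surfaces when the combined file is parsed.
# Assumes PyYAML; the fragments are illustrative.
import yaml

def combine_and_check(fragments):
    """Concatenate YAML fragments and verify the combined result still parses."""
    combined = "\n".join(fragments)
    try:
        return yaml.safe_load(combined), None
    except yaml.YAMLError as exc:
        return None, str(exc)

doc, err = combine_and_check(
    ["- name: a\n  type: keyword", "- name: b\n  type: long"]
)
# doc is a list of two field definitions when every fragment is valid
```

A broken fragment anywhere in the list makes the whole combined document fail to parse, which is exactly why validating the final file in the build catches these issues early.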

@webmat
Contributor

webmat commented Aug 16, 2019

I agree with this need. An embryo of some of these ideas is already in place, but is far from perfect.

Here's a few of the things we have already:

  • Setting defaults and basic checks at the fieldset (or schema) level here
  • Setting defaults and basic checks at the field level here
  • I use the test_ecs_spec.py test file to sanity check that the end result of loading all ECS schemas is as expected. Are base fields at the root, are "reusable" fields actually nested where expected?

The checks in place right now are not enough for my liking, I agree we need more.

Side point: when you talk about bounds checking, I assume you mean the ignore_above param on keyword fields? This param is a kind of protection for Elasticsearch. A user can have as much text as they want in these fields; all the param means is that if the text is longer than ignore_above, this doc's value for the field won't be added to the search index. The field value will still be present in full in the document's source, though. So I'm not sure I would enforce bounds checking based on that.
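For readers unfamiliar with the param, a mapping using it might look like this (the field name and limit are illustrative, not taken from ECS):

```json
{
  "mappings": {
    "properties": {
      "some_field": {
        "type": "keyword",
        "ignore_above": 1024
      }
    }
  }
}
```

With this mapping, a value longer than 1024 characters is simply not indexed for `some_field`, but the full value remains intact in the stored document.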

Related to this, there is work that needs to happen on adjusting ignore_above in many places, though. You can see past discussions in #105 and #270.

@ChrisHines

Is a schema validator still under consideration? If so, would a CUE schema (cuelang.org) be a viable alternative?

@ebeahan
Member

ebeahan commented Nov 22, 2021

@ChrisHines, yes, this is something valuable to incorporate at some point. I've done some JSON schema work as part of other ECS-related projects that could serve as a starting point: https://github.com/ebeahan/ecs-vscode/blob/main/schema/schema.json

@ebeahan ebeahan added the ready Issues we'd like to address in the future. label Nov 30, 2021