
YAML validation #463

Open
codebrain opened this issue May 20, 2019 · 7 comments
Labels
ready Issues we'd like to address in the future.

Comments

@codebrain
Contributor

Can we look at implementing some kind of schema validation on the YAML files? Something along the lines of: https://github.com/paazmaya/yaml-validator?

Alternatively, perhaps a move towards a schema type that has native validation? JSON Schema?
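For illustration, a minimal JSON Schema along those lines might look like the following sketch (the field names and allowed types here are hypothetical placeholders, not the actual ECS schema):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["name", "type", "description"],
  "properties": {
    "name": { "type": "string" },
    "type": { "type": "string", "enum": ["keyword", "text", "long", "date", "ip"] },
    "description": { "type": "string" }
  },
  "additionalProperties": false
}
```

Since a parsed YAML document maps onto the same data model as JSON, a schema like this could be applied to the YAML files after loading them.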

@ruflin
Contributor

ruflin commented May 20, 2019

Can you elaborate a bit on where the users would use / deploy this validator?

@codebrain
Contributor Author

The thrust of my argument comes from the code generation perspective, whereby the specification will be used as the source of truth for generating C# types and helpers. In these instances, the downstream impact of an incorrect or incomplete specification often means breaking binary compatibility in the generated artifact. I believe this is also true for other statically typed languages, not just C#.

Anything we can do to validate that the specification is complete and correct minimises some of this downstream impact.

Perhaps this validation can be tied to the CI system, such that an update to the YAML specification triggers a CI validation check asserting that the YAML is:

  1. Well formed: the document parses
  2. Complete: all required fields and their values are present
  3. Correct: all field types match expected types and all values fall within known boundaries
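The three assertions above could be sketched as a single CI-style check roughly like this (assumes PyYAML is available; the required attributes and allowed types are illustrative placeholders, not the real specification):

```python
# Sketch of a CI-style check covering the three assertions above:
# 1. well formed, 2. complete, 3. correct.
# Assumes PyYAML; REQUIRED_ATTRS and KNOWN_TYPES are illustrative.
import yaml

REQUIRED_ATTRS = ("name", "type", "description")
KNOWN_TYPES = {"keyword", "text", "long", "date", "ip"}

def check_spec(yaml_text):
    """Return (ok, errors) for a single field definition given as YAML text."""
    errors = []
    # 1. Well formed: the document must parse.
    try:
        doc = yaml.safe_load(yaml_text)
    except yaml.YAMLError as exc:
        return False, [f"unparsable YAML: {exc}"]
    if not isinstance(doc, dict):
        return False, ["top level must be a mapping"]
    # 2. Complete: all required attributes present.
    for attr in REQUIRED_ATTRS:
        if attr not in doc:
            errors.append(f"missing: {attr}")
    # 3. Correct: values within known boundaries.
    if "type" in doc and doc["type"] not in KNOWN_TYPES:
        errors.append(f"unknown type: {doc['type']}")
    return not errors, errors
```

In CI, a script like this would run over every changed YAML file and fail the build if any check returns errors.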

@ruflin
Contributor

ruflin commented Aug 15, 2019

++. I remember just recently seeing a PR from @jsoriano in Beats which does some YAML validation. Perhaps we can reuse parts from there?

@jsoriano
Member

In elastic/beats#13188 I am adding a check to the Beats build to validate that the fields definitions have correct types and formats. The Beats case is a bit special because we compose the big fields-definition YAML file by concatenating many small YAML-like files, and we weren't checking the validity of the result yet, so we could end up with obscure errors in tests or with unexpected behaviours.

For the check I am using the LoadFieldsYaml() function from github.com/elastic/beats/libbeat/mapping, which by itself is enough to check that the final YAML is well formed. Under the hood this function uses go-ucfg (github.com/elastic/go-ucfg), which allows defining custom validation for each parsed object; this is how I check that format and type are correct. Other checks could be added to better verify the completeness and correctness of all fields.

One thing I think go-ucfg doesn't cover yet is detecting unsupported attributes in parsed objects.
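The concatenate-then-validate flow described above can be sketched in a few lines (assumes PyYAML; the fragments shown are illustrative, not real Beats field definitions):

```python
# Sketch of the concatenate-then-validate flow described above: many
# small YAML fragments are joined into one document, so a syntax error
# in any fragment only surfaces when the combined file is parsed.
# Assumes PyYAML; the fragments are illustrative.
import yaml

def combine_and_check(fragments):
    """Concatenate YAML fragments and verify the combined result still parses."""
    combined = "\n".join(fragments)
    try:
        return yaml.safe_load(combined), None
    except yaml.YAMLError as exc:
        return None, str(exc)

doc, err = combine_and_check(
    ["- name: a\n  type: keyword", "- name: b\n  type: long"]
)
# doc is a list of two field definitions when every fragment is valid
```

A broken fragment anywhere in the list makes the whole combined document fail to parse, which is exactly why validating the final file in the build catches these issues early.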

@webmat
Contributor

webmat commented Aug 16, 2019

I agree with this need. An embryo of some of these ideas is already in place, but is far from perfect.

Here's a few of the things we have already:

  • Setting defaults and basic checks at the fieldset (or schema) level here
  • Setting defaults and basic checks at the field level here
  • I use the test_ecs_spec.py test file to sanity check that the end result of loading all ECS schemas is as expected. Are base fields at the root, are "reusable" fields actually nested where expected?

The checks in place right now are not enough for my liking, I agree we need more.

Side point: when you talk about bounds checking, I assume you mean the ignore_above param on keyword fields? This param is a kind of protection for Elasticsearch. A user can have as much text as they want in these fields; all the param means is that if the text is longer than ignore_above, this doc's value for the field won't be added to the search index. The field value will still be present in full in the document's source, though. So I'm not sure I would enforce bounds checking based on that.
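For readers unfamiliar with the param, a mapping using it might look like this (the field name and limit are illustrative, not taken from ECS):

```json
{
  "mappings": {
    "properties": {
      "some_field": {
        "type": "keyword",
        "ignore_above": 1024
      }
    }
  }
}
```

With this mapping, a value longer than 1024 characters is simply not indexed for `some_field`, but the full value remains intact in the stored document.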

Related to this, there is work that needs to happen on adjusting ignore_above in many places, though. You can see past discussions in #105 and #270.

@ChrisHines

Is a schema validator still under consideration? If so, would a CUE schema (cuelang.org) be a viable alternative?

@ebeahan
Member

ebeahan commented Nov 22, 2021

@ChrisHines, yes, this is something valuable to incorporate at some point. I've done some JSON schema work as part of other ECS-related projects that could serve as a starting point: https://github.com/ebeahan/ecs-vscode/blob/main/schema/schema.json

@ebeahan ebeahan added the ready Issues we'd like to address in the future. label Nov 30, 2021