cannot validate CustomResourceDefinitions #47

Open
mattnworb opened this issue Jan 26, 2018 · 38 comments

@mattnworb

mattnworb commented Jan 26, 2018

Apologies if this should be created against https://github.com/garethr/kubernetes-json-schema instead.

Attempting to validate an apiextensions.k8s.io/v1beta1 CustomResourceDefinition resource fails because the schema file in $VERSION-standalone is empty:

1 error occurred:

* Problem loading schema from the network at https://raw.githubusercontent.com/garethr/kubernetes-json-schema/master/v1.8.5-standalone/customresourcedefinition.json: EOF
[mattbrown@mattmbp kubernetes-json-schema]$ wc -c v1.*-standalone/customresourcedefinition.json
       0 v1.8.0-standalone/customresourcedefinition.json
       0 v1.8.1-standalone/customresourcedefinition.json
       0 v1.8.2-standalone/customresourcedefinition.json
       0 v1.8.3-standalone/customresourcedefinition.json
       0 v1.8.4-standalone/customresourcedefinition.json
       0 v1.8.5-standalone/customresourcedefinition.json
       0 v1.8.6-standalone/customresourcedefinition.json
       0 v1.9.0-standalone/customresourcedefinition.json
       0 total

Is this intentional? In its current form it seems impossible to lint any CustomResourceDefinitions. The kubernetes-json-schema repo does have non-empty versions of the schema in the non-standalone directories (e.g. in /v1.8.0/), but kubeval is hardcoded to load the -standalone flavor of each schema.

@garethr
Collaborator

garethr commented Feb 24, 2018

Ah, yes. At present kubeval doesn't handle CRDs. I have some ideas for how to fix that but haven't quite had time, and those 0 byte files are a bug in the script used to extract the schemas. I'll at least try and provide better error handling for this. Thanks for opening the issue.

@ghost

ghost commented Jul 12, 2018

We're hitting this same issue, unfortunately. I even tried running kubectl apply --recursive --dry-run -f . in the directory that contains our YAML files, in an effort to get our k8s cluster itself to validate our YAML (some of which relies on CRDs). A piece of advice to anyone who tries the same approach: it may seem to work for a while, but only because CRDs that are created and then deleted get cached for the purposes of subsequent calls to kubectl apply --dry-run. So if you've ever actually run your CRD-creation YAML against your cluster, kubectl apply --dry-run -f <your-file-that-uses-a-CRD> will appear to work, even though kubectl apply -f <your-file-that-uses-a-CRD> would fail.

@jaredallard

In the short term, while this isn't implemented, could we exit with a specific exit code if the only errors are schema misses? This would let people "ignore" those errors in CI.

@karlskewes

karlskewes commented Mar 20, 2019

Had a quick look. I wonder if an approach like this would work:

  1. Add an additional Cobra flag, --skip-crd-schema-miss or similar
  2. Add another condition here: https://github.com/garethr/kubeval/blob/master/kubeval/kubeval.go#L159
  3. Add a suitable test-crd.yaml
  4. Add tests.

If the above is a suitable approach (possibly with other suggestions or required changes), I'm happy to submit a PR (or someone else can).

@caarlos0

caarlos0 commented Apr 1, 2019

For now, my workaround has been: find . -name '*.yaml' | grep -v crd | xargs kubeval

@grzesuav

@garethr could you post some of your ideas for adding CRD validation? Maybe someone else could then work on this.

@jaredallard

jaredallard commented Apr 12, 2019

For the short term, I have a fork that sorta, kinda implements a bad fix to this: https://github.com/jaredallard/kubeval

Some might find it useful in the short term. I don't think I will have the bandwidth to do this properly anytime soon, though.

@stefansedich

+1, we're also looking at adding kubeval to our CI, and CRDs are currently the blocking step.

@trnl

trnl commented May 5, 2019

Same for us. We decided to introduce an additional CRD in our manifests, and this issue is blocking us.

@emirozer
Contributor

emirozer commented May 9, 2019

I have opened PR #127, which implements what @kskewes suggested in his comment.

@Hiruma31

Same thing here when piping helm template straight to kubeval.
A note, though:
https://kubernetesjsonschema.dev/v1.11.0-standalone/customresourcedefinition-apiextensions-v1beta1.json leads to a 404, which according to the source could be normal.
Yet if we remove -standalone, we get the JSON.
I'm not sure I understand what the standalone part is meant for...

@dwight-biddle

This would be a really helpful feature for us as well. Although hiding CRD errors would be a good start, we would ideally like to be able to register schemas for a list of commonly used CRDs such as SealedSecrets. We also plan to move all our deployments to HelmReleases, which means that kubeval would no longer be able to validate any of our workloads.

@garethr - Would love to understand at least when PR #127 will be reviewed/merged.

@bgagnon

bgagnon commented May 30, 2019

Agree with @dwightbiddle-ef. While the ability to skip validation of CustomResourceDefinitions is desirable, the ultimate goal should be to support their validation according to the schema.

Modern CRDs have the OpenAPI schema embedded in their manifest, so in theory it's a matter of collecting these schemas:

  • from the API server (this would require K8S API access, something generally undesirable for kubeval)
  • through HTTP calls to fetch published CRD manifests (could be solved by reading them from upstream github repos or some other curated repository like https://kubernetesjsonschema.dev/)

To give a concrete example, the Prometheus Operator has a prometheuses.monitoring.coreos.com/Prometheus CRD for which the canonical definition lives at https://github.com/coreos/prometheus-operator/blob/master/jsonnet/prometheus-operator/prometheus-crd.libsonnet

If I can somehow pass this CRD manifest (in Jsonnet, JSON or YAML form) to kubeval at runtime, it should be able to understand the embedded openAPIV3Schema data it contains and validate any input manifests against it.
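As a rough illustration of the "collect the embedded schema" step, here is a minimal Python sketch that pulls every openAPIV3Schema out of an already-parsed CRD manifest. It assumes the two layouts used by the apiextensions.k8s.io API (a shared spec.validation block in v1beta1, a per-version schema block in v1); the function name embedded_schemas and the sample manifest are hypothetical and not part of kubeval.

```python
def embedded_schemas(crd):
    """Return {version_name: schema} for each openAPIV3Schema embedded in a CRD dict."""
    spec = crd.get("spec", {})
    schemas = {}
    # v1beta1 layout: a single schema shared by all versions.
    shared = spec.get("validation", {}).get("openAPIV3Schema")
    for version in spec.get("versions", []):
        # v1 layout: each served version carries its own schema.
        own = version.get("schema", {}).get("openAPIV3Schema")
        schema = own or shared
        if schema is not None:
            schemas[version["name"]] = schema
    # Older v1beta1 manifests may only set the singular spec.version field.
    if not schemas and shared is not None and "version" in spec:
        schemas[spec["version"]] = shared
    return schemas


# Hypothetical, heavily trimmed CRD used only to exercise the function.
crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "spec": {
        "group": "monitoring.coreos.com",
        "names": {"kind": "Prometheus"},
        "versions": [
            {"name": "v1", "schema": {"openAPIV3Schema": {"type": "object"}}},
        ],
    },
}
print(sorted(embedded_schemas(crd)))  # ['v1']
```

The returned schemas would still need the same post-processing as the core Kubernetes schemas before a JSON Schema validator can use them.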

@dwight-biddle

dwight-biddle commented May 30, 2019

We have actually implemented a way to skip CRDs for now by using a simple naming convention and filtering out files with "secret" or "helm" in the filename, since that covers the bulk of our CRD usage. For those who are curious, we're using the following command:

find . -name '*.yaml' ! -regex '.*[Ss]ecret.*' ! -regex '.*[Hh]elm.*' -print0 | xargs -0 -n1 kubeval

Just want to reiterate that the more valuable feature to spend time investigating and implementing is the ability to reference the OpenAPI schemas for CRDs at runtime, as @bgagnon says above.

@bgagnon

bgagnon commented May 31, 2019

I have a working prototype for a potential solution to this problem.
Here's the general idea:

  • use git submodules to collect upstream definitions
  • use js-yaml to convert from YAML to JSON (including multi-document YAML)
  • keep only the CustomResourceDefinitions objects from these projects
  • using Jsonnet, extract the openAPIV3 schemas from those CRDs and apply the same transformations that the instrumenta/kubernetes-json-schema Python tool would do
  • using jsonnet -m, output files with the naming convention that kubeval expects
  • store the combined schemas (Kubernetes + third parties) in a single directory
  • serve that directory over HTTP
  • invoke kubeval with --schema-location set to that base URL
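The "output files with the naming convention that kubeval expects" step can be sketched in Python as follows. This is only an illustration under assumptions: the file-name pattern (lowercased kind, first segment of the API group, version) is inferred from the paths quoted in this thread, such as customresourcedefinition-apiextensions-v1beta1.json, and the helper names schema_filename and write_schema are hypothetical.

```python
import json
import os
import tempfile


def schema_filename(kind, group, version):
    # Pattern inferred from paths in this thread, e.g.
    # ("Application", "argoproj.io", "v1alpha1") -> "application-argoproj-v1alpha1.json".
    # CRDs always have a group, so the group-less core API case is not handled.
    return "{}-{}-{}.json".format(kind.lower(), group.split(".")[0], version)


def write_schema(base_dir, flavor, kind, group, version, schema):
    """Write one schema under <base_dir>/<flavor>/ and return the file path."""
    target_dir = os.path.join(base_dir, flavor)  # e.g. "master-standalone"
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, schema_filename(kind, group, version))
    with open(path, "w") as handle:
        json.dump(schema, handle, indent=2, sort_keys=True)
    return path


print(schema_filename("Prometheus", "monitoring.coreos.com", "v1"))  # prometheus-monitoring-v1.json
```

A directory populated this way can then be served over HTTP and used as the schema base URL, as the list above describes.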

So far I have collected schemas for:

  • Kubernetes
  • Prometheus Operator
  • Heptio Contour

At this point I am wondering if I should simply move to another JSON Schema command line validator such as ajv-cli which might be easier to work with. But for now, I'll keep toying around with kubeval.

Some lessons learned:

  • Jsonnet is the best way to do JSON data transformations
  • upstream projects don't publish their CRDs in any standard way; some have Jsonnet, single-document YAML, multi-document YAML, Helm templates, JSON, etc.
  • it's best to find and use the "source of truth" for the CRDs in the upstream repos in their original form, as opposed to derived versions that may get out of sync over time
  • technically, CRDs are generated by Go code inside the operators/controllers, so we have to trust that the Git repositories contain that output -- some operators might only produce their CRD during registration (i.e. at runtime)
  • the relationship between OpenAPI and JSON schema is poorly documented
  • the "strict mode" is unnecessarily complicated to implement

Any thoughts on that general idea?

@bgagnon

bgagnon commented May 31, 2019

The schema generation bug that @garethr mentioned is likely due to the circular reference contained in the Kubernetes OpenAPI/Swagger schemas.

It is not possible to "resolve" (de-reference and inline) all $ref references because of that:

$ swagger-cli bundle -r swagger.json
Circular $ref pointer found at swagger.json#/definitions/io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaProps/properties

To fix this, the "standalone" mode would need to make an exception for this circular reference and allow a top-level object to be referenced by $ref.
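To make the circular-reference problem concrete, here is a small Python sketch that walks a Swagger-style definitions map and reports one $ref cycle if any exists. It is not the script used to generate the schemas; the function name find_ref_cycle and the shortened definition names in the example are hypothetical.

```python
def find_ref_cycle(definitions):
    """Return one cycle of definition names reachable via $ref, or None."""

    def refs_in(node):
        # Yield every definition name referenced anywhere inside `node`.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "$ref" and isinstance(value, str):
                    yield value.rsplit("/", 1)[-1]
                else:
                    yield from refs_in(value)
        elif isinstance(node, list):
            for item in node:
                yield from refs_in(item)

    done = set()  # definitions already proven cycle-free

    def visit(name, path):
        if name in path:
            return path[path.index(name):] + [name]
        if name in done or name not in definitions:
            return None
        for target in refs_in(definitions[name]):
            cycle = visit(target, path + [name])
            if cycle:
                return cycle
        done.add(name)
        return None

    for name in definitions:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None


# Shortened stand-in for the self-referencing JSONSchemaProps definition.
definitions = {
    "JSONSchemaProps": {
        "properties": {
            "properties": {
                "additionalProperties": {"$ref": "#/definitions/JSONSchemaProps"}
            }
        }
    },
    "ObjectMeta": {"properties": {"name": {"type": "string"}}},
}
print(find_ref_cycle(definitions))  # ['JSONSchemaProps', 'JSONSchemaProps']
```

A generator that inlines every $ref could use a check like this to leave the offending reference in place instead of recursing forever or emitting an empty file.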

@garethr
Collaborator

garethr commented Jul 13, 2019

0.11.0 now has the --skip-missing-schemas flag which at least allows for ignoring these resources.

@nvtkaszpir

nvtkaszpir commented Jul 15, 2019

> 0.11.0 now has the --skip-missing-schemas flag which at least allows for ignoring these resources.

Note: the flag is actually --ignore-missing-schemas

@grzesuav

@bgagnon could you elaborate on how to extract the OpenAPI schema from the JSON?

Or, preferably, describe the steps you took for the Prometheus Operator CRD as an example.

@ian-howell
Contributor

@garethr The fix in 0.11.0 that added the --ignore-missing-schemas flag is insufficient. This issue demonstrates the unwanted behavior.

This PR addresses the issue.

2opremio pushed a commit to 2opremio/helm-operator that referenced this issue on Aug 6, 2019:
kubectl doesn't support it yet, see instrumenta/kubeval#47

@marshallford

Not having support for easily importing CRDs is a bummer. The ability to validate CRs related to foundational projects like Istio, Cert-Manager, and the Prometheus Operator would be great. I am considering implementing roughly the same flow that @bgagnon described but it sounds like a lot to maintain.

Just a thought: As a first step, how feasible would it be to support the OpenAPI spec directly to avoid the conversion step?

@ams0

ams0 commented Jun 12, 2020

I was looking into this for validating my manifests and resorted to creating my own custom schemas here (using js-yaml/jsonnet). You can find HelmReleases, ClusterIssuers and system-update-controller schemas there.

@leosunmo

@ams0 Does that actually validate anything other than that it's valid YAML? I'm running into #247, and while it does fail on a completely broken YAML file, it doesn't detect clear violations of the JSON schemas.

@joshuaspence

@ams0 I was having issues with your JSON schemas, similar to what @leosunmo had experienced.

> cat helmrelease.yaml 
---
apiVersion: 'helm.fluxcd.io/v1'
kind: 'HelmRelease'
metadata:
  name: 'test'
  namespace: 'test'
spec: {}

> kubeval --additional-schema-locations https://raw.githubusercontent.com/ams0/kubernetes-json-schema/master helmrelease.yaml 
PASS - helmrelease.yaml contains a valid HelmRelease (test.test)

I forked your repository to https://github.com/joshuaspence/kubernetes-json-schema and added some additional processing of the JSON schemas and it seems to work as I would expect now.

> kubeval --additional-schema-locations https://raw.githubusercontent.com/joshuaspence/kubernetes-json-schema/master helmrelease.yaml  
WARN - helmrelease.yaml contains an invalid HelmRelease (test.test) - chart: chart is required

@yannh

yannh commented Oct 18, 2020

Welp, the documentation for kubeval at https://www.kubeval.com/ has not been updated to include --additional-schema-locations :( I wrote kubeconform ( https://github.com/yannh/kubeconform ) partly to solve this.

I've included a Python script derived from @garethr's openapi2jsonschema ( https://github.com/yannh/kubeconform/blob/master/cmd/openapi2jsonschema/main.py ) to generate the JSON schemas. I think it does a couple of things your rq magic ( https://github.com/joshuaspence/kubernetes-json-schema/blob/master/build.sh ) does not, such as support for the "strict" mode. Maybe that could generate better schemas for https://github.com/joshuaspence/kubernetes-json-schema/ ?
I share the need for a repository with a large pool of JSON schemas for CRDs, so thanks a lot @joshuaspence !
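For readers unfamiliar with the "strict" mode mentioned above: the usual understanding is that it forbids properties that the schema does not declare, by setting additionalProperties to false. The Python sketch below illustrates that idea only; it is not the code of openapi2jsonschema or kubeconform, and the function name make_strict is hypothetical.

```python
def make_strict(schema):
    """Return a copy of `schema` in which objects with declared properties reject unknown keys."""
    if not isinstance(schema, dict):
        return schema
    strict = dict(schema)
    if "properties" in strict:
        strict["properties"] = {
            name: make_strict(sub) for name, sub in strict["properties"].items()
        }
        # Keep an explicit additionalProperties setting if the author provided one.
        strict.setdefault("additionalProperties", False)
    if isinstance(strict.get("items"), dict):
        strict["items"] = make_strict(strict["items"])
    if isinstance(strict.get("additionalProperties"), dict):
        strict["additionalProperties"] = make_strict(strict["additionalProperties"])
    return strict


# Hypothetical schema fragment, loosely shaped like a HelmRelease spec.
schema = {
    "type": "object",
    "properties": {
        "spec": {"type": "object", "properties": {"chart": {"type": "object"}}}
    },
}
strict = make_strict(schema)
print(strict["properties"]["spec"]["additionalProperties"])  # False
```

With a strict schema, a misspelled field name fails validation instead of being silently accepted as an unknown property.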

In kubeconform I also added support for "configurable" JSON schema paths, since I don't think the Kubernetes version needs to be part of the path in JSON schema registries for custom resources...

This ticket can probably be closed though.

@joshuaspence

Thanks for the pointers @yannh. Your script didn't work on all of the CRDs I have, but it helped me find a bug in my own script, which I will try to get around to fixing.

@ilyesAj

ilyesAj commented Feb 3, 2021

Any updates on this?

@RichiCoder1

One of the features of doc.crds.dev is that it can output the underlying CRD YAML for the CRDs in a given repo.
e.g.: https://doc.crds.dev/raw/github.com/crossplane/[email protected]

I wonder if it wouldn't be possible to take those, download and massage them via kubeval, and then output them to a cache dir that kubeval can use.

@jstrachan

FWIW we've added some to this additional repository:

kubeval --additional-schema-locations https://jenkins-x.github.io/jenkins-x-schemas

@maxenglander

maxenglander commented Jun 29, 2021

--additional-schema-locations is helpful, but not ideal. For example, I might want to validate against K8S schema v1.17, and Istio schema v1.8. Suppose I have a private URL serving Istio schemas at <private-istio-json-schema-baseurl>/v1.8

If I try to supply --kubernetes-version v1.17 --additional-schema-locations <private-istio-json-schema-baseurl>, kubeval will try to download the Istio schemas from <private-istio-json-schema-baseurl>/v1.17, which isn't what I want.

I can currently work around this by setting --kubernetes-version master or by storing my Istio schemas in a v1.17 path.

I think it might be nice to be able to specify something like --versioned-crd-schema-locations and have kubeval download schemas from the supplied URL without appending the --kubernetes-version, or perhaps --exact-crd-schema-locations without appending either --kubernetes-version or --strict to the path.
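The path behaviour described in this comment can be sketched in Python as below. The URL layout (base URL, then a version-standalone[-strict] directory, then kind-group-version.json) is inferred from URLs and paths quoted elsewhere in this thread, not from kubeval's source. The function name schema_url, the exact parameter, and the example base URL are all hypothetical; exact=True models the proposed flag that would skip the version and strict segments.

```python
def schema_url(base_url, kubernetes_version, kind, group, api_version,
               strict=False, exact=False):
    """Build the URL a kubeval-style loader would request for one resource kind."""
    filename = "{}-{}-{}.json".format(kind.lower(), group.split(".")[0], api_version)
    parts = [base_url.rstrip("/")]
    if not exact:
        # The version string is used as given; the Kubernetes version is always
        # appended, even for schemas that are versioned independently of it.
        parts.append(kubernetes_version + "-standalone" + ("-strict" if strict else ""))
    parts.append(filename)
    return "/".join(parts)


base = "https://schemas.example.invalid/istio"  # placeholder host
print(schema_url(base, "v1.17", "VirtualService", "networking.istio.io", "v1beta1"))
# https://schemas.example.invalid/istio/v1.17-standalone/virtualservice-networking-v1beta1.json
print(schema_url(base, "v1.17", "VirtualService", "networking.istio.io", "v1beta1", exact=True))
# https://schemas.example.invalid/istio/virtualservice-networking-v1beta1.json
```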

@schmurfy

Any news on this?
I am currently using --ignore-missing-schemas since CRDs are present in all our manifests, but I would prefer to validate everything.

@tarioch

tarioch commented Sep 28, 2021

I switched to kubeconform, and my workflow now looks like this:

I have a repo with all my schemas: https://github.com/tarioch/k8s-schemas

Whenever I add new CRDs to my cluster, I update the schemas on a machine that has access to the cluster; see https://github.com/tarioch/k8s-schemas/blob/master/update.sh. For special cases (in my case, jaeger-operator) I get the CRDs not from the cluster but from another place.

This then gets checked in.

Whenever I want to validate (e.g. in CI or in a pre-commit hook), I can just point kubeconform at that repository and validate.

@schmurfy

@tarioch Thanks. Which command do you use to point kubeconform to your schema repository?
I tried:

kubeconform --schema-location default --schema-location https://github.com/tarioch/k8s-schemas/ manifest.yaml
kubeconform --schema-location default --schema-location https://github.com/tarioch/k8s-schemas/tree/master/schemas manifest.yaml
kubeconform -verbose --schema-location default --schema-location "https://github.com/tarioch/k8s-schemas/tree/master/schemas/{{.ResourceKind}}{{.KindSuffix}}.json" manifest.yaml

None of them worked, and since I have no real way to see which path is actually queried, I am kind of blind here.

@tarioch

tarioch commented Sep 29, 2021

kubeconform -kubernetes-version 1.21.0 -strict -schema-location default -schema-location 'https://raw.githubusercontent.com/tarioch/k8s-schemas/master/schemas/{{ .ResourceKind }}_{{ .ResourceAPIVersion }}.json'
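To see which URL a template like the one above resolves to for a given manifest, the substitution can be mimicked in Python. This is only an approximation of kubeconform's templating, assuming that ResourceKind is the lowercased kind, ResourceAPIVersion is the version part of apiVersion, and Group is the part before the slash; the function name render_schema_location is hypothetical.

```python
def render_schema_location(template, api_version, kind):
    """Substitute the kubeconform-style placeholders in a schema-location template."""
    group, _, version = api_version.rpartition("/")
    values = {
        "ResourceKind": kind.lower(),   # assumed: kind is lowercased
        "ResourceAPIVersion": version,  # assumed: "v1" in "helm.fluxcd.io/v1"
        "Group": group,                 # assumed: "helm.fluxcd.io" in "helm.fluxcd.io/v1"
    }
    rendered = template
    for name, value in values.items():
        # Accept the placeholder with or without inner spaces.
        for placeholder in ("{{ ." + name + " }}", "{{." + name + "}}"):
            rendered = rendered.replace(placeholder, value)
    return rendered


template = (
    "https://raw.githubusercontent.com/tarioch/k8s-schemas/master/schemas/"
    "{{ .ResourceKind }}_{{ .ResourceAPIVersion }}.json"
)
print(render_schema_location(template, "helm.fluxcd.io/v1", "HelmRelease"))
# https://raw.githubusercontent.com/tarioch/k8s-schemas/master/schemas/helmrelease_v1.json
```

Note that the working command points at raw.githubusercontent.com, which serves raw file contents, whereas github.com/.../tree/... URLs return HTML pages.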

@rdelpret

rdelpret commented Feb 8, 2022

Just because you can doesn't mean you should?

mkdir -p master-standalone-strict && curl --silent https://raw.githubusercontent.com/argoproj/argo-cd/master/manifests/crds/application-crd.yaml | \
yq -o=json eval > ./master-standalone-strict/application-argoproj-v1alpha1.json && cat backstage.yaml | \
kubeval --strict --additional-schema-locations file://. ; rm -rf master-standalone-strict
PASS - stdin contains a valid Application (argocd.backstage-staging)

@AtzeDeVries

@rdelpret I would really like to understand this: the documentation at https://www.kubeval.com/#crds says that kubeval does not validate CRDs, but it seems it can. What's the catch here?

@rdelpret

@AtzeDeVries You can pass it a JSON schema, so I converted a remote YAML schema to JSON, tricked the tool into thinking the schema is in the file path format it expects, then profit. Basically, this shows they could implement this easily.

@eyarz

eyarz commented Aug 21, 2022

The main problem with CR schemas is that they are spread across GitHub, unlike the schemas for native K8s objects.
Therefore, we created the CRDs-catalog project to gather them in one place. This repository already contains over 100 popular CRs in JSON Schema format. It can be used out of the box with kubeval, kubeconform, and datree.

We have also created a helpful utility, a CRD Extractor, that pulls CRDs from the cluster and converts them locally to JSON Schema.
