Flux aborts synchronization on manifest syntax errors #2861
Thanks for reporting the problem! I understand your frustration, but the fact that this was caused by a tab, and the number of manifests you are dealing with, is just incidental. Flux takes a no-harm approach: finding an error in a manifest and blindly applying the rest can be disastrous in a production environment (e.g. ignoring a syntax error in an updated config map which is needed for the correct functioning of a critical workload that was also updated accordingly).

That said, users should be made aware that reconciliation has been aborted as soon as possible, to avoid situations like the one described in this issue. Flux already offers an (unfortunately, as of yet undocumented) event API, used by e.g. Weave Cloud and fluxcloud, and some Prometheus metrics. Please let us know if those two methods are not enough for you. If you are already using the event API through your bots (which you didn't clarify in the description), that should be fixed.

We understand that Flux's observability could be improved, and we are keeping track of that at #2812. We will happily accept your contributions improving the situation. This is open source after all!
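A minimal sketch of the all-or-nothing ("no-harm") policy described above, assuming a hypothetical plan_sync step that checks every manifest before anything is applied. The Manifest type, the function names and the check itself are illustrative assumptions, not Flux's actual code: the check is a rough stand-in that only flags tab characters in a line's leading indentation (YAML does not allow tabs for indentation).

```python
from dataclasses import dataclass


@dataclass
class Manifest:
    path: str  # path of the file inside the Git repo (illustrative)
    text: str  # raw YAML content of the file


def tab_indented_lines(text: str) -> list[int]:
    """Return 1-based numbers of non-blank lines whose leading whitespace contains a tab.

    Rough heuristic only: YAML forbids tabs for indentation, but this does not
    parse YAML and may also flag legitimate tabs inside block scalars.
    """
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        if body and "\t" in indent:
            bad.append(lineno)
    return bad


def plan_sync(manifests: list[Manifest]) -> tuple[list[str], dict[str, list[int]]]:
    """Hypothetical all-or-nothing planner.

    Every manifest is checked before anything is applied. If any check fails,
    nothing is scheduled and the errors are returned so they can be reported.
    """
    errors: dict[str, list[int]] = {}
    for m in manifests:
        bad = tab_indented_lines(m.text)
        if bad:
            errors[m.path] = bad
    if errors:
        return [], errors  # abort the whole sync: apply nothing
    return [m.path for m in manifests], {}


if __name__ == "__main__":
    good = Manifest("apps/web.yaml", "kind: ConfigMap\ndata:\n  key: value\n")
    broken = Manifest("apps/cm.yaml", "kind: ConfigMap\ndata:\n\tkey: value\n")
    print(plan_sync([good, broken]))  # ([], {'apps/cm.yaml': [3]})
```

The point of the design is that validation runs over the whole set before any apply, so a broken config map can never be skipped while the workload that depends on it is updated; the cost is that one bad file blocks everything, which is why the errors need to be surfaced promptly.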
Related, and probably solvable by the same means as #2535.
Flux v2 resolves this with a model that allows sharding of your resources across https://toolkit.fluxcd.io/guides/notifications/

Flux v1 is in maintenance mode now, and is not adding any new features unless they are critical. As Flux contrib efforts have been focused on Flux v2, the Flux project has moved to a new repo, fluxcd/flux2.

In the interest of reducing the number of open issues not directly related to supporting Flux v1 in maintenance mode, and respecting that you may have moved on already, I will go ahead and close out this issue for now. Thanks for using Flux!
We noticed today that fluxd stopped applying our commits for over an hour. Flabbergasted by the lack of errors posted to our #bots Slack channel, we found these logs being repeatedly output:

Indeed, there was a problem with that single file. However, this is a repo with 34 folders totalling 267 yaml files, and it would be nice if fluxd would just ignore the failing file and keep going. We had other commits that were not applied to the k8s cluster, and people had no clue that something was going on, other than from the flux-sync tag not changing.
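For contrast, the behavior requested here (skip the failing file and keep going) might look like the hypothetical plan_best_effort below. It is a sketch under stated assumptions, not a Flux feature: the function names are illustrative, and a rough tab-indentation check stands in for real manifest validation. It applies every file that passes and returns the skipped ones so they can still be reported.

```python
def tab_indented_lines(text: str) -> list[int]:
    """Rough heuristic: 1-based non-blank lines whose leading whitespace contains a tab."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        if body and "\t" in indent:
            bad.append(lineno)
    return bad


def plan_best_effort(files: dict[str, str]) -> tuple[list[str], dict[str, list[int]]]:
    """Hypothetical best-effort planner: skip failing files, keep the rest.

    Skipped files are returned so they can still be reported (e.g. to a chat
    channel); otherwise nobody notices that part of the repo is not applied.
    """
    to_apply: list[str] = []
    skipped: dict[str, list[int]] = {}
    for path, text in files.items():
        bad = tab_indented_lines(text)
        if bad:
            skipped[path] = bad  # report, but do not block the other files
        else:
            to_apply.append(path)
    return to_apply, skipped


if __name__ == "__main__":
    repo = {
        "apps/web.yaml": "kind: Deployment\nspec:\n  replicas: 1\n",
        "apps/cm.yaml": "kind: ConfigMap\ndata:\n\tkey: value\n",
    }
    print(plan_best_effort(repo))  # (['apps/web.yaml'], {'apps/cm.yaml': [3]})
```

As the maintainer reply points out, the trade-off is that a workload may be applied while the config map it depends on is silently skipped, so this mode only makes sense together with reliable reporting of the skipped files.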