I think the Fluent Bit community should work towards a higher bar for releases, to ensure stability and improve user confidence.
The most common use case for Fluent Bit users is collecting k8s log files. It would be really cool if we had automated testing prior to releases that did the following:
- deploy the release candidate to a k8s node and collect logs
- use the kubernetes filter to decorate the records with metadata
- some of the logs should be multiline
- testing custom parsers would be ideal as well
- as time goes on, we can add other common use cases
- send the logs via some open source, non-vendor output plugin, like forward or http; the destination receiving the logs should validate that every log emitted by the k8s applications arrived, carries k8s metadata, and is in the right format (a config sketch follows this list)
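For illustration, the DaemonSet under test could run something like the sketch below. This is only a hypothetical starting point; the validator hostname, port, and parsers file name are placeholders, not an agreed design.

```
[SERVICE]
    Flush         1
    # exercise custom parsers from a test-owned parsers file
    Parsers_File  custom_parsers.conf

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    # built-in multiline handling for docker/cri container logs
    multiline.parser  docker, cri

[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  On
    Keep_Log   Off

[OUTPUT]
    Name   forward
    Match  *
    # placeholder service that receives and validates the test logs
    Host   log-validator.test.svc.cluster.local
    Port   24224
```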
This way, we test each release candidate against real-world use cases before it ships.
We could have two types of tests:
Performance tests: send logs at a reasonably high rate for a short period of time and check that they all end up at the destination. We should set a minimum performance bar for each release. As time goes on, this could be expanded into automated benchmarking for releases: we measure the maximum throughput of each release in some common use case, require it to meet the minimum bar, and publish the final result (which should be above that bar) in the release notes.
Stability tests: run Fluent Bit in the k8s cluster for some non-trivial period of time; the test fails if it crashes or restarts. For patch/bug-fix releases we can use a short window, so these tests can run overnight. For minor releases with new features we would set a higher bar, e.g. Fluent Bit must run without restarts for 3-5 days. (A minimal sketch of both checks follows below.)
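As a concrete starting point, a release gate covering both checks could look roughly like this sketch. The namespace, pod label, throughput bar, and the way the received count and test duration are obtained are all assumptions for illustration, not an agreed design.

```python
#!/usr/bin/env python3
"""Hypothetical release gate: fail if any Fluent Bit pod restarted during the
soak window, or if measured delivery throughput is below a minimum bar."""
import json
import subprocess
import sys

NAMESPACE = "logging"          # assumed test namespace
LABEL = "app=fluent-bit"       # assumed pod label for the DaemonSet under test
MIN_RECORDS_PER_SEC = 10_000   # assumed minimum performance bar

def restart_counts():
    """Total container restarts per Fluent Bit pod, read via kubectl."""
    out = subprocess.check_output(
        ["kubectl", "get", "pods", "-n", NAMESPACE, "-l", LABEL, "-o", "json"]
    )
    pods = json.loads(out)["items"]
    return {
        p["metadata"]["name"]: sum(
            cs.get("restartCount", 0)
            for cs in p["status"].get("containerStatuses", [])
        )
        for p in pods
    }

def gate(received_count, duration_sec):
    failures = []

    # stability: any restart during the soak window fails the gate
    restarted = {name: c for name, c in restart_counts().items() if c > 0}
    if restarted:
        failures.append(f"pods restarted during soak: {restarted}")

    # performance: measured delivery rate must meet the minimum bar
    rate = received_count / duration_sec
    if rate < MIN_RECORDS_PER_SEC:
        failures.append(f"throughput {rate:.0f} rec/s is below bar {MIN_RECORDS_PER_SEC}")

    for failure in failures:
        print("FAIL:", failure)
    return 1 if failures else 0

if __name__ == "__main__":
    # received count and duration would be reported by the destination side of the test
    sys.exit(gate(received_count=int(sys.argv[1]), duration_sec=float(sys.argv[2])))
```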
Agreed, I'm looking at general improvements under this change: #3753
- Staging build automation
- Testing of staging build <-- insert the suggestions here
- Promotion of staging to release
It includes some level of testing for releases, although the tests above are more specifically resilience and performance tests. I agree these should feed in: essentially there is some minimum level of validation for staging builds, and then we trigger these longer-running tests on those staging builds before approving the release.
The actual test cases can also be used as a form of validation for user infrastructure, i.e. run them in-situ to help identify any issues there.
I agree with keeping verification vendor-agnostic, although it would also be useful to include some level of verification for common output plugins. We'll probably need a monotonic counter or similar in the output messages to verify that all messages were received (coping with out-of-order retries too); a sketch of that check follows below.
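As a sketch of that check, the receiver side could assert completeness and metadata roughly as follows. The field names (`emitter`, `seq`, `kubernetes`), the expected count, and the `received.jsonl` file are assumptions for illustration only.

```python
"""Hypothetical completeness check for the test destination: every test log
record carries an emitter id and a monotonically increasing sequence number.
Tracking a set per emitter copes with out-of-order delivery and duplicate
retries; any gap means a record was lost."""
import json
from collections import defaultdict

def verify(records, expected_per_emitter):
    seen = defaultdict(set)
    missing_metadata = 0
    for rec in records:
        # the kubernetes filter should have attached a `kubernetes` map
        if "pod_name" not in rec.get("kubernetes", {}):
            missing_metadata += 1
        seen[rec["emitter"]].add(int(rec["seq"]))
    gaps = {}
    for emitter, seqs in seen.items():
        missing = set(range(expected_per_emitter)) - seqs
        if missing:
            gaps[emitter] = sorted(missing)
    return gaps, missing_metadata

if __name__ == "__main__":
    # one JSON object per line, as written by the forward/http receiver
    with open("received.jsonl") as f:
        records = [json.loads(line) for line in f]
    gaps, missing_metadata = verify(records, expected_per_emitter=100_000)
    assert not gaps, f"missing sequence numbers: {gaps}"
    assert missing_metadata == 0, f"{missing_metadata} records lack k8s metadata"
    print(f"PASS: {len(records)} records verified")
```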
If we have the basic framework in place, it will be easy to evolve it through user-submitted PRs for new test cases, targets, etc., benefiting everyone.