-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enforce requirment for periodic config for index tables to be 24h when using boltdb shipper #2166
enforce requirment for periodic config for index tables to be 24h when using boltdb shipper #2166
Conversation
77aa519
to
a80ad48
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2166 +/- ##
==========================================
- Coverage 62.91% 62.77% -0.14%
==========================================
Files 162 162
Lines 13952 13959 +7
==========================================
- Hits 8778 8763 -15
- Misses 4491 4512 +21
- Partials 683 684 +1
|
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
a80ad48
to
4f37230
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…n using boltdb shipper
bf669f2
to
0c07aba
Compare
* First step towards Cortex using services model. Only Server module converted. Signed-off-by: Peter Štibraný <[email protected]> * Converted runtimeconfig.Manager to service. Signed-off-by: Peter Štibraný <[email protected]> * Added /services page. Signed-off-by: Peter Štibraný <[email protected]> * Converted memberlistKV to service. Signed-off-by: Peter Štibraný <[email protected]> * Fixed runtimeconfig tests. Signed-off-by: Peter Štibraný <[email protected]> * Converted Ring to service. Signed-off-by: Peter Štibraný <[email protected]> * Log waiting for other module to initialize. Signed-off-by: Peter Štibraný <[email protected]> * Converted overrides to service. Signed-off-by: Peter Štibraný <[email protected]> * Converted client pool to a service. Signed-off-by: Peter Štibraný <[email protected]> * Convert ring lifecycler into a service. Signed-off-by: Peter Štibraný <[email protected]> * Converted HA Tracker to a service. Signed-off-by: Peter Štibraný <[email protected]> * Converted Distributor to a service. Signed-off-by: Peter Štibraný <[email protected]> * Handle nil from wrappedService call. Signed-off-by: Peter Štibraný <[email protected]> * Explain why Server uses service and not a wrappedService. Signed-off-by: Peter Štibraný <[email protected]> * Make service from Store. Signed-off-by: Peter Štibraný <[email protected]> * Convert ingester to a service. Signed-off-by: Peter Štibraný <[email protected]> * Convert querier initialization into a service. This isn't full conversion, but more work would be required to fully use services for querier. Left TODO comment instead. Signed-off-by: Peter Štibraný <[email protected]> * Listen for module failures, and log them. Also log when all services start successfully. Signed-off-by: Peter Štibraný <[email protected]> * Convert blockQueryable and UserStore into a services. UserStore now does initial sync in Starting state. blockQueryable doesn't return querier until it has finished starting. Signed-off-by: Peter Štibraný <[email protected]> * Wait a little before shutdown. Signed-off-by: Peter Štibraný <[email protected]> * Converted queryFrontend to a service. Signed-off-by: Peter Štibraný <[email protected]> * Logging Signed-off-by: Peter Štibraný <[email protected]> * Convert TableManager to service Signed-off-by: Peter Štibraný <[email protected]> * Convert Ruler to service Signed-off-by: Peter Štibraný <[email protected]> * Convert Configs to service Signed-off-by: Peter Štibraný <[email protected]> * Convert AlertManager to service. Signed-off-by: Peter Štibraný <[email protected]> * Renamed init methods back. Signed-off-by: Peter Štibraný <[email protected]> * Fixed tests. Signed-off-by: Peter Štibraný <[email protected]> * Converted Compactor to a service. Signed-off-by: Peter Štibraný <[email protected]> * Polishing, comments. Signed-off-by: Peter Štibraný <[email protected]> * Comments. Signed-off-by: Peter Štibraný <[email protected]> * Lint comments. Signed-off-by: Peter Štibraný <[email protected]> * Stop server only after all other modules have finished. Signed-off-by: Peter Štibraný <[email protected]> * Don't send more jobs to lifecycler loop, if it's not running anymore. Signed-off-by: Peter Štibraný <[email protected]> * Don't stop Server until other modules have stopped. Signed-off-by: Peter Štibraný <[email protected]> * Removed Compactor from All target. It was meant to be for testing only. Signed-off-by: Peter Štibraný <[email protected]> * Comment. Signed-off-by: Peter Štibraný <[email protected]> * More comments around startup logic. Signed-off-by: Peter Štibraný <[email protected]> * moduleServiceWrapper doesn't need full Cortex, only serviceMap Signed-off-by: Peter Štibraný <[email protected]> * Messages Signed-off-by: Peter Štibraný <[email protected]> * Fix outdated comment. Signed-off-by: Peter Štibraný <[email protected]> * Start lifecycler in starting functions. Signed-off-by: Peter Štibraný <[email protected]> * Fixed comment. Return lifecycler's failure case, if any, as error. Signed-off-by: Peter Štibraný <[email protected]> * Fix lifecycler usage. Pass independent context to lifecycler, so that it doesn't stop too early. Signed-off-by: Peter Štibraný <[email protected]> * Fix test. Signed-off-by: Peter Štibraný <[email protected]> * Removed obsolete waiting code. Only log error if it is real error. Signed-off-by: Peter Štibraný <[email protected]> * Renamed servManager to subservices. Signed-off-by: Peter Štibraný <[email protected]> * Addressing review feedback. Signed-off-by: Peter Štibraný <[email protected]> * Fix test. Signed-off-by: Peter Štibraný <[email protected]> * Fix compilation errors after rebase. Signed-off-by: Peter Štibraný <[email protected]> * Extracted code that creates server service into separate file. Signed-off-by: Peter Štibraný <[email protected]> * Added some helper methods. Signed-off-by: Peter Štibraný <[email protected]> * Use helper methods to simplify service creation. Signed-off-by: Peter Štibraný <[email protected]> * Comment. Signed-off-by: Peter Štibraný <[email protected]> * Helper functions for manager. Signed-off-by: Peter Štibraný <[email protected]> * Use helper functions. Signed-off-by: Peter Štibraný <[email protected]> * Fixes, use helper functions. Signed-off-by: Peter Štibraný <[email protected]> * Fixes, use helper functions. Signed-off-by: Peter Štibraný <[email protected]> * comment Signed-off-by: Peter Štibraný <[email protected]> * Helper function Signed-off-by: Peter Štibraný <[email protected]> * Use helper functions to reduce amount of code. Signed-off-by: Peter Štibraný <[email protected]> * Added tests for helper functions. Signed-off-by: Peter Štibraný <[email protected]> * Fixed imports Signed-off-by: Peter Štibraný <[email protected]> * Comment Signed-off-by: Peter Štibraný <[email protected]> * Simplify code. Signed-off-by: Peter Štibraný <[email protected]> * Stop and wait until stopped. Signed-off-by: Peter Štibraný <[email protected]> * Imports Signed-off-by: Peter Štibraný <[email protected]> * Manage compaction and shipper via subservices manager. Signed-off-by: Peter Štibraný <[email protected]> * Improve error message. Signed-off-by: Peter Štibraný <[email protected]> * Comment. Signed-off-by: Peter Štibraný <[email protected]> * Comment. Signed-off-by: Peter Štibraný <[email protected]> * Added unit test for Cortex initialization and module dependencies. Signed-off-by: Peter Štibraný <[email protected]> * Comments, return errors. Signed-off-by: Peter Štibraný <[email protected]> * Unified /ready handlers into one, that reflects state of all services. It also uses ingester's check (if configured) to limit rate of starting ingesters. Signed-off-by: Peter Štibraný <[email protected]> * Added //nolint:errcheck to `defer services.StopAndAwaitTerminated(...)` calls. Signed-off-by: Peter Štibraný <[email protected]> * Fix http handler logic. Also renamed it. Signed-off-by: Peter Štibraný <[email protected]> * Address review feedback. Signed-off-by: Peter Štibraný <[email protected]> * One more test... since Shutdown already stops Run, no need to call Stop(). We can use Stop in the test instead. Signed-off-by: Peter Štibraný <[email protected]> * CHANGELOG.md Signed-off-by: Peter Štibraný <[email protected]> * Fixed integration test, old versions of Cortex didn't have /ready probe. Signed-off-by: Peter Štibraný <[email protected]> * Make lint happy. Signed-off-by: Peter Štibraný <[email protected]> * Mention /ready for all services. Signed-off-by: Peter Štibraný <[email protected]> * Wrap "not running" error into promql.ErrStorage. That makes API handler to return HTTP code 500 (server error) instead of 422 (client error). Signed-off-by: Peter Štibraný <[email protected]> * Expose http port via method on HTTPService. Signed-off-by: Peter Štibraný <[email protected]> * Print number of services in each state. Signed-off-by: Peter Štibraný <[email protected]> * Fix comment and remove obsolete line. Signed-off-by: Peter Štibraný <[email protected]> * Fix compile errors after rebase. Signed-off-by: Peter Štibraný <[email protected]> * Rebased and converted data purger to a service. Signed-off-by: Peter Štibraný <[email protected]> * Pass context to the bucket client. Signed-off-by: Peter Štibraný <[email protected]>
What this PR does / why we need it:
Boltdb shipper keeps uploading modified boltdb files by ingesters, which are by default created weekly. Queriers also downloads them during reads and keeps downloading the updated files. This means for a big cluster, files could grow a lot and ingesters have to upload bigger files even for minor updates and on the other side Queriers have to download them.
This PR enforces periodic config for index tables to be 24h when using boltdb shipper, which would help create smaller files. We anyways wanted to move towards that in future to help to add more components which can modify the index other than ingesters. The plan is to let only ingesters modify last 24 hours of the index and let other components to modify the rest of the index.
I think we should do it now while boltdb shipper is still in an experimental stage to avoid any problems in future and doing the provisioning in code for variable index file sizes.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Since this would not work with existing deployments which already used non-24h periodic duration, I am going to build a helper tool to rebuild boltdb files to make them have 24 hours worth of index.
Checklist