Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enforce requirment for periodic config for index tables to be 24h when using boltdb shipper #2166

Conversation

sandeepsukhani
Copy link
Contributor

What this PR does / why we need it:
Boltdb shipper keeps uploading modified boltdb files by ingesters, which are by default created weekly. Queriers also downloads them during reads and keeps downloading the updated files. This means for a big cluster, files could grow a lot and ingesters have to upload bigger files even for minor updates and on the other side Queriers have to download them.

This PR enforces periodic config for index tables to be 24h when using boltdb shipper, which would help create smaller files. We anyways wanted to move towards that in future to help to add more components which can modify the index other than ingesters. The plan is to let only ingesters modify last 24 hours of the index and let other components to modify the rest of the index.

I think we should do it now while boltdb shipper is still in an experimental stage to avoid any problems in future and doing the provisioning in code for variable index file sizes.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
Since this would not work with existing deployments which already used non-24h periodic duration, I am going to build a helper tool to rebuild boltdb files to make them have 24 hours worth of index.

Checklist

  • Documentation added

@sandeepsukhani sandeepsukhani force-pushed the boltdb-shipper-24h-periodic-config branch from 77aa519 to a80ad48 Compare June 3, 2020 09:17
@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2020

Codecov Report

Attention: Patch coverage is 82.14286% with 5 lines in your changes missing coverage. Please review.

Project coverage is 62.77%. Comparing base (2a596a7) to head (21ea033).

Files with missing lines Patch % Lines
pkg/loki/modules.go 0.00% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2166      +/-   ##
==========================================
- Coverage   62.91%   62.77%   -0.14%     
==========================================
  Files         162      162              
  Lines       13952    13959       +7     
==========================================
- Hits         8778     8763      -15     
- Misses       4491     4512      +21     
- Partials      683      684       +1     
Files with missing lines Coverage Δ
pkg/loki/loki.go 0.00% <ø> (ø)
pkg/storage/store.go 66.48% <100.00%> (+4.45%) ⬆️
pkg/loki/modules.go 5.04% <0.00%> (-5.73%) ⬇️

... and 3 files with indirect coverage changes

@cyriltovena cyriltovena requested a review from slim-bean June 9, 2020 18:59
@stale
Copy link

stale bot commented Jul 9, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Jul 9, 2020
@stale stale bot closed this Jul 17, 2020
@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Jul 17, 2020
@sandeepsukhani sandeepsukhani added the keepalive An issue or PR that will be kept alive and never marked as stale. label Jul 17, 2020
@sandeepsukhani sandeepsukhani force-pushed the boltdb-shipper-24h-periodic-config branch from a80ad48 to 4f37230 Compare July 27, 2020 08:20
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 27, 2020
Copy link
Collaborator

@slim-bean slim-bean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sandeepsukhani sandeepsukhani force-pushed the boltdb-shipper-24h-periodic-config branch from bf669f2 to 0c07aba Compare July 30, 2020 07:44
@sandeepsukhani sandeepsukhani merged commit d5c840b into grafana:master Jul 30, 2020
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Jun 11, 2021
* First step towards Cortex using services model. Only Server module converted.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted runtimeconfig.Manager to service.

Signed-off-by: Peter Štibraný <[email protected]>

* Added /services page.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted memberlistKV to service.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed runtimeconfig tests.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted Ring to service.

Signed-off-by: Peter Štibraný <[email protected]>

* Log waiting for other module to initialize.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted overrides to service.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted client pool to a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Convert ring lifecycler into a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted HA Tracker to a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted Distributor to a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Handle nil from wrappedService call.

Signed-off-by: Peter Štibraný <[email protected]>

* Explain why Server uses service and not a wrappedService.

Signed-off-by: Peter Štibraný <[email protected]>

* Make service from Store.

Signed-off-by: Peter Štibraný <[email protected]>

* Convert ingester to a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Convert querier initialization into a service. This isn't full
conversion, but more work would be required to fully use services
for querier. Left TODO comment instead.

Signed-off-by: Peter Štibraný <[email protected]>

* Listen for module failures, and log them.

Also log when all services start successfully.

Signed-off-by: Peter Štibraný <[email protected]>

* Convert blockQueryable and UserStore into a services.

UserStore now does initial sync in Starting state.
blockQueryable doesn't return querier until it has finished starting.

Signed-off-by: Peter Štibraný <[email protected]>

* Wait a little before shutdown.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted queryFrontend to a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Logging

Signed-off-by: Peter Štibraný <[email protected]>

* Convert TableManager to service

Signed-off-by: Peter Štibraný <[email protected]>

* Convert Ruler to service

Signed-off-by: Peter Štibraný <[email protected]>

* Convert Configs to service

Signed-off-by: Peter Štibraný <[email protected]>

* Convert AlertManager to service.

Signed-off-by: Peter Štibraný <[email protected]>

* Renamed init methods back.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed tests.

Signed-off-by: Peter Štibraný <[email protected]>

* Converted Compactor to a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Polishing, comments.

Signed-off-by: Peter Štibraný <[email protected]>

* Comments.

Signed-off-by: Peter Štibraný <[email protected]>

* Lint comments.

Signed-off-by: Peter Štibraný <[email protected]>

* Stop server only after all other modules have finished.

Signed-off-by: Peter Štibraný <[email protected]>

* Don't send more jobs to lifecycler loop, if it's not running anymore.

Signed-off-by: Peter Štibraný <[email protected]>

* Don't stop Server until other modules have stopped.

Signed-off-by: Peter Štibraný <[email protected]>

* Removed Compactor from All target. It was meant to be for testing only.

Signed-off-by: Peter Štibraný <[email protected]>

* Comment.

Signed-off-by: Peter Štibraný <[email protected]>

* More comments around startup logic.

Signed-off-by: Peter Štibraný <[email protected]>

* moduleServiceWrapper doesn't need full Cortex, only serviceMap

Signed-off-by: Peter Štibraný <[email protected]>

* Messages

Signed-off-by: Peter Štibraný <[email protected]>

* Fix outdated comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Start lifecycler in starting functions.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed comment. Return lifecycler's failure case, if any, as error.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix lifecycler usage.

Pass independent context to lifecycler, so that it doesn't stop too early.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix test.

Signed-off-by: Peter Štibraný <[email protected]>

* Removed obsolete waiting code. Only log error if it is real error.

Signed-off-by: Peter Štibraný <[email protected]>

* Renamed servManager to subservices.

Signed-off-by: Peter Štibraný <[email protected]>

* Addressing review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix test.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix compilation errors after rebase.

Signed-off-by: Peter Štibraný <[email protected]>

* Extracted code that creates server service into separate file.

Signed-off-by: Peter Štibraný <[email protected]>

* Added some helper methods.

Signed-off-by: Peter Štibraný <[email protected]>

* Use helper methods to simplify service creation.

Signed-off-by: Peter Štibraný <[email protected]>

* Comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Helper functions for manager.

Signed-off-by: Peter Štibraný <[email protected]>

* Use helper functions.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixes, use helper functions.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixes, use helper functions.

Signed-off-by: Peter Štibraný <[email protected]>

* comment

Signed-off-by: Peter Štibraný <[email protected]>

* Helper function

Signed-off-by: Peter Štibraný <[email protected]>

* Use helper functions to reduce amount of code.

Signed-off-by: Peter Štibraný <[email protected]>

* Added tests for helper functions.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed imports

Signed-off-by: Peter Štibraný <[email protected]>

* Comment

Signed-off-by: Peter Štibraný <[email protected]>

* Simplify code.

Signed-off-by: Peter Štibraný <[email protected]>

* Stop and wait until stopped.

Signed-off-by: Peter Štibraný <[email protected]>

* Imports

Signed-off-by: Peter Štibraný <[email protected]>

* Manage compaction and shipper via subservices manager.

Signed-off-by: Peter Štibraný <[email protected]>

* Improve error message.

Signed-off-by: Peter Štibraný <[email protected]>

* Comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Added unit test for Cortex initialization and module dependencies.

Signed-off-by: Peter Štibraný <[email protected]>

* Comments, return errors.

Signed-off-by: Peter Štibraný <[email protected]>

* Unified /ready handlers into one, that reflects state of all services.

It also uses ingester's check (if configured) to limit rate of starting
ingesters.

Signed-off-by: Peter Štibraný <[email protected]>

* Added //nolint:errcheck to `defer services.StopAndAwaitTerminated(...)` calls.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix http handler logic. Also renamed it.

Signed-off-by: Peter Štibraný <[email protected]>

* Address review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* One more test... since Shutdown already stops Run, no need to call Stop().

We can use Stop in the test instead.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed integration test, old versions of Cortex didn't have /ready probe.

Signed-off-by: Peter Štibraný <[email protected]>

* Make lint happy.

Signed-off-by: Peter Štibraný <[email protected]>

* Mention /ready for all services.

Signed-off-by: Peter Štibraný <[email protected]>

* Wrap "not running" error into promql.ErrStorage.

That makes API handler to return HTTP code 500 (server error)
instead of 422 (client error).

Signed-off-by: Peter Štibraný <[email protected]>

* Expose http port via method on HTTPService.

Signed-off-by: Peter Štibraný <[email protected]>

* Print number of services in each state.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix comment and remove obsolete line.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix compile errors after rebase.

Signed-off-by: Peter Štibraný <[email protected]>

* Rebased and converted data purger to a service.

Signed-off-by: Peter Štibraný <[email protected]>

* Pass context to the bucket client.

Signed-off-by: Peter Štibraný <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive An issue or PR that will be kept alive and never marked as stale. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants