Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added shipper capability to sync compacted blocks once. #731

Merged
merged 1 commit into from
Feb 8, 2019

Conversation

bwplotka
Copy link
Member

@bwplotka bwplotka commented Jan 14, 2019

Added shipper capability to sync compacted blocks once.

Changes:

  • Use shipper metrics properly.
  • Do not add compacted blocks to uploaded arr in thanos shipper meta if they are not uploaded.
  • Added flag for syncing compacted blocks once.
  • Enable working on different work dir.
  • Enable using snapshot API to avoid race conditions.
  • Ignore empty blocks.

Signed-off-by: Bartek Plotka [email protected]

Copy link
Member

@jojohappy jojohappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR! LGTM! just some minor nits for me!

cmd/thanos/bucket.go Outdated Show resolved Hide resolved
cmd/thanos/bucket.go Outdated Show resolved Hide resolved
pkg/promclient/promclient.go Outdated Show resolved Hide resolved
pkg/shipper/shipper.go Outdated Show resolved Hide resolved
@bwplotka bwplotka changed the title Added shipper capability to sync compacted blocks once. Hooked in a sync command. [sync part4] Added shipper capability to sync compacted blocks once. Hooked in a sync command. Jan 21, 2019
@bwplotka bwplotka force-pushed the promclient-snapshot branch from 0e0b864 to bbd0b52 Compare January 23, 2019 18:22
pkg/shipper/shipper.go Outdated Show resolved Hide resolved
Copy link
Member

@FUSAKLA FUSAKLA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, ok from me just few minor suggestions or nits.

if *dataDir != "" {
level.Warn(logger).Log("msg", "is Prometheus operating on specified data dir? If yes, turn it's local compaction or leave dataDir empty to safely snapshot data to avoid races.")
}
level.Warn(logger).Log("msg", "have you turned off compactor? Currently it is unsafe to run compactor in the same time as this command! (y/N)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe adding flag --force-yes would be good to allow automating?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on other hand, you can do the same with just yes command on unix, so let's skip this for now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, you are right we can skip it.

cmd/thanos/bucket.go Outdated Show resolved Hide resolved
cmd/thanos/bucket.go Outdated Show resolved Hide resolved
pkg/shipper/shipper.go Outdated Show resolved Hide resolved
@bwplotka bwplotka force-pushed the promclient-snapshot branch from bbd0b52 to a8391db Compare January 30, 2019 13:32
@bwplotka bwplotka force-pushed the sync-command branch 2 times, most recently from 8c9de1f to d12b918 Compare January 30, 2019 13:48
@bwplotka bwplotka force-pushed the promclient-snapshot branch from a8391db to 8abd8d1 Compare January 30, 2019 13:51
@bwplotka bwplotka force-pushed the promclient-snapshot branch 4 times, most recently from 737cf23 to e15f983 Compare February 1, 2019 20:48
@bwplotka bwplotka changed the base branch from promclient-snapshot to master February 1, 2019 21:21
@bwplotka bwplotka force-pushed the sync-command branch 7 times, most recently from ad0c1fa to bbdbe00 Compare February 5, 2019 17:21
@bwplotka bwplotka changed the title [sync part4] Added shipper capability to sync compacted blocks once. Hooked in a sync command. Added shipper capability to sync compacted blocks once. Feb 5, 2019
@bwplotka
Copy link
Member Author

bwplotka commented Feb 5, 2019

Moved from command to proper migration handling in sidecar. Tested that during FOSDEM demo. (:

@bwplotka bwplotka force-pushed the sync-command branch 3 times, most recently from 43fa7f3 to 17c496b Compare February 6, 2019 00:22
@bwplotka bwplotka mentioned this pull request Feb 6, 2019
3 tasks
Copy link
Member

@FUSAKLA FUSAKLA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :) just few questions

@@ -51,13 +57,18 @@ func newMetrics(r prometheus.Registerer) *metrics {
Name: "thanos_shipper_upload_failures_total",
Help: "Total number of failed object uploads",
})
m.uploadedCompacted = prometheus.NewGauge(prometheus.GaugeOpts{
Name: "thanos_shipper_upload_compacted_done",
Help: "If 1 it means shipper uploaded all compacted blocks from the filesystem.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity what is the use-case for this metric?
It strikes me that shipping of compacted blocks is one time issue during migration of pure Prometheus setup.
In that case you should see it in the logs because you know it happens in this time and after that you will disable the compacted syncing I suppose?

Maybe I'm just missing the use-case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at a minimum, this metric should only be registered if uploadCompacted is true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, but wanted just more flexible way to tell if that one operation is done or not....will register it only on that flag good point 👍

return errors.New("method Start was not invoked.")
}
return runutil.Retry(time.Second, ctx.Done(), func() error {
r, err := http.Get(fmt.Sprintf("http://%s/metrics", p.addr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to check the /-/ready endpoint? IIRC /metrics is ready sooner than the instance is ready to answer queries for example.

return errors.Wrap(err, "failed to kill Prometheus. Kill it manually")
}

time.Sleep(time.Second / 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use p.wait() ? https://golang.org/pkg/os/#Process.Wait
in case of lazy stop running two instances on the same data could cause issues.

Copy link
Member

@jojohappy jojohappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a small question.

@@ -295,6 +457,7 @@ func hardlinkBlock(src, dst string) error {
type Meta struct {
Version int `json:"version"`
Uploaded []ulid.ULID `json:"uploaded"`
Ignored []ulid.ULID `json:"ignored"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this filed used?

Copy link
Member Author

@bwplotka bwplotka Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not, good point!

@@ -27,6 +27,7 @@ const (
CompactorRepairSource SourceType = "compactor.repair"
RulerSource SourceType = "ruler"
BucketRepairSource SourceType = "bucket.repair"
BucketSyncSource SourceType = "bucket.sync"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

@@ -51,6 +51,8 @@ func registerSidecar(m map[string]setupFunc, app *kingpin.Application, name stri

objStoreConfig := regCommonObjStoreFlags(cmd, "", false)

uploadCompacted := cmd.Flag("shipper.upload-compacted", "[Experimental] If true sidecar will try to upload compacted blocks as well. Useful for migration purposes. Works only if compaction is disabled on Prometheus.").Hidden().Bool()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably be explicit and Default("false") here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@bwplotka
Copy link
Member Author

bwplotka commented Feb 7, 2019

Addressed all comments @SuperQ @FUSAKLA

Copy link
Contributor

@domgreen domgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - minor naming nits :)

// NewMigrating creates a new shipper that detects new TSDB blocks in dir and uploads them
// to remote if necessary, including compacted blocks which are already in filesystem.
// It attaches the Thanos metadata section in each meta JSON file.
func NewMigrating(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - would update to NewWithCompacted its a shipper that also ships compacted blocks.

// If updload
//
// It is not concurrency-safe, however it is compactor-safe (running concurrently with compactor is ok)
func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - uploaded is used a few times in this func ... might be clearer to be something like uploadedCount

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm, I would stay with upload int - you can see integer so you can assume this?

)
// Sync non compacted blocks first.
if err := s.iterBlockMetas(func(m *metadata.Meta) error {
// Do not sync a block if we already uploaded or ignored it. If it is no longer found in the bucket,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - remove it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which it? (:

Copy link
Member Author

@bwplotka bwplotka Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess later - mgm not sure

Changes:
* Use shipper metrics properly.
* Do not add compacted blocks to `uploaded` arr in thanos shipper meta if they are not uploaded.
* Added flag for syncing compacted blocks once.
* Enable working on different work dir.
* Enable using snapshot API to avoid race conditions.
* Ignore empty blocks.

Signed-off-by: Bartek Plotka <[email protected]>
@bwplotka bwplotka merged commit 9eb0944 into master Feb 8, 2019
@bwplotka bwplotka deleted the sync-command branch February 8, 2019 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants