-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: remember shards that fail Open(), avoid repeated attempts #23437
Conversation
If a shard cannot be opened, store its ID and last error. Prevent future attempts to open during this invocation of influxDB. This information is not persisted. closes #23426
8b8c2f4
to
f6da4af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments.
Also, I edited #23428 to also include setting max-series
related errors at Warn
as we discussed - can we do that here?
coordinator/points_writer.go
Outdated
@@ -407,7 +407,7 @@ func (w *PointsWriter) writeToShard(writeCtx tsdb.WriteContext, shard *meta.Shar | |||
// store has not actually created this shard, tell it to create it and | |||
// retry the write | |||
if err = w.TSDBStore.CreateShard(database, retentionPolicy, shard.ID, true); err != nil { | |||
w.Logger.Info("Write failed", zap.Uint64("shard", shard.ID), zap.Error(err)) | |||
w.Logger.Error("Write failed creating shard", zap.Uint64("shard", shard.ID), zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is maybe safe as Warn
, the write will fail back to the caller.
if err != nil { | ||
log.Info("Failed to open shard", logger.Shard(shardID), zap.Error(err)) | ||
log.Error("Failed to open shard", logger.Shard(shardID), zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one definitely should be an error, since we are failing to open a shard with existing data.
require.EqualError(t, err2, "opening shard previously failed with: "+errStr) | ||
|
||
// This should succeed with the force (and because opening an open shard automatically succeeds) | ||
require.NoError(t, s.OpenShard(sh.Shard, true), "forced re-opening previously failing shard") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need force
, since it is only used in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was discussion of adding a forced retry option on shard opening, so I added the force
parameter to ensure we had a tested mechanism for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this parameter, should it be allowRetry
instead of force
? Or, maybe a const rather than a bool flag so the calling signature looks like s.OpenShard(sh.Shard, allowRetry)
?
I would like to implement the change of reporting MaxSeriesDataBaseExceeded as a separate issue, because the changes are wide-ranging (it gets passed around as a string field in a PartialWriteError and that has to be converted in a |
Added to https://github.com/influxdata/plutonium/issues/3870 (general logging audit) |
…xdata#23437) If a shard cannot be opened, store its ID and last error. Prevent future attempts to open during this invocation of influxDB. This information is not persisted. closes influxdata#23428 closes influxdata#23426
If a shard cannot be opened store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted, however.
partial close of #23428
closes #23426