Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remember shards that fail Open(), avoid repeated attempts #23437

Merged
merged 3 commits into from
Jun 13, 2022

Conversation

davidby-influx
Copy link
Contributor

@davidby-influx davidby-influx commented Jun 11, 2022

If a shard cannot be opened store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted, however.

partial close of #23428

closes #23426

If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted.

closes #23426
@davidby-influx davidby-influx marked this pull request as ready for review June 11, 2022 16:28
@davidby-influx davidby-influx requested a review from lesam June 11, 2022 16:41
Copy link
Contributor

@lesam lesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

Also, I edited #23428 to also include setting max-series related errors at Warn as we discussed - can we do that here?

@@ -407,7 +407,7 @@ func (w *PointsWriter) writeToShard(writeCtx tsdb.WriteContext, shard *meta.Shar
// store has not actually created this shard, tell it to create it and
// retry the write
if err = w.TSDBStore.CreateShard(database, retentionPolicy, shard.ID, true); err != nil {
w.Logger.Info("Write failed", zap.Uint64("shard", shard.ID), zap.Error(err))
w.Logger.Error("Write failed creating shard", zap.Uint64("shard", shard.ID), zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is maybe safe as Warn, the write will fail back to the caller.

if err != nil {
log.Info("Failed to open shard", logger.Shard(shardID), zap.Error(err))
log.Error("Failed to open shard", logger.Shard(shardID), zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one definitely should be an error, since we are failing to open a shard with existing data.

require.EqualError(t, err2, "opening shard previously failed with: "+errStr)

// This should succeed with the force (and because opening an open shard automatically succeeds)
require.NoError(t, s.OpenShard(sh.Shard, true), "forced re-opening previously failing shard")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need force, since it is only used in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was discussion of adding a forced retry option on shard opening, so I added the force parameter to ensure we had a tested mechanism for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this parameter, should it be allowRetry instead of force? Or, maybe a const rather than a bool flag so the calling signature looks like s.OpenShard(sh.Shard, allowRetry)?

@davidby-influx
Copy link
Contributor Author

I would like to implement the change of reporting MaxSeriesDataBaseExceeded as a separate issue, because the changes are wide-ranging (it gets passed around as a string field in a PartialWriteError and that has to be converted in a error to allow distinguishing it from other partial writes.

@davidby-influx davidby-influx requested a review from lesam June 13, 2022 16:37
@lesam
Copy link
Contributor

lesam commented Jun 13, 2022

I would like to implement the change of reporting MaxSeriesDataBaseExceeded as a separate issue, because the changes are wide-ranging (it gets passed around as a string field in a PartialWriteError and that has to be converted in a error to allow distinguishing it from other partial writes.

Added to https://github.com/influxdata/plutonium/issues/3870 (general logging audit)

@davidby-influx davidby-influx merged commit 54ac7e5 into master-1.x Jun 13, 2022
@davidby-influx davidby-influx deleted the DSB_shard_open branch June 13, 2022 17:33
davidby-influx added a commit that referenced this pull request Jun 13, 2022
If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted.

closes #23428
closes #23426

(cherry picked from commit 54ac7e5)

closes #23433
closes #23435
davidby-influx added a commit that referenced this pull request Jun 13, 2022
… (#23444)

If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted.

closes #23428
closes #23426

(cherry picked from commit 54ac7e5)

closes #23433
closes #23435
davidby-influx added a commit that referenced this pull request Jun 14, 2022
If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted.

closes #23428
closes #23426

(cherry picked from commit 54ac7e5)

closes #23434
closes #23436
davidby-influx added a commit that referenced this pull request Jun 14, 2022
… (#23455)

If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted.

closes #23428
closes #23426

(cherry picked from commit 54ac7e5)

closes #23434
closes #23436
chengshiwen pushed a commit to chengshiwen/influxdb that referenced this pull request Aug 27, 2024
…xdata#23437)

If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted.

closes influxdata#23428
closes influxdata#23426
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants