Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3: clone putUserMetadata map to prevent concurrent modification #78

Merged
merged 2 commits into from
Sep 21, 2023

Conversation

andyasp
Copy link
Contributor

@andyasp andyasp commented Sep 20, 2023

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

When pulling in the latest objstore in a Mimir PR (grafana/mimir#6061) the tests ran into a data race:

Logs
21:56:02 alertmanager-1: ==================
21:56:02 alertmanager-1: WARNING: DATA RACE
21:56:02 alertmanager-1: Read at 0x00c0007b32f0 by goroutine 524:
21:56:02 alertmanager-1: runtime.mapiterinit()
21:56:02 alertmanager-1: /usr/local/go/src/runtime/map.go:816 +0x0
21:56:02 alertmanager-1: github.com/minio/minio-go/v7.PutObjectOptions.validate()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/minio/minio-go/v7/api-put-object.go:227 +0x66
21:56:02 alertmanager-1: github.com/minio/minio-go/v7.(*Client).PutObject()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/minio/minio-go/v7/api-put-object.go:277 +0x184
21:56:02 alertmanager-1: github.com/thanos-io/objstore/providers/s3.(*Bucket).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/providers/s3/s3.go:495 +0x655
21:56:02 alertmanager-1: github.com/thanos-io/objstore.(*metricBucket).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/objstore.go:606 +0x328
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.(*TracingBucket).Upload.TracingBucket.Upload.func1()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/tracing/opentracing/opentracing.go:102 +0x18a
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.doWithSpan()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/tracing/opentracing/opentracing.go:215 +0x142
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.TracingBucket.Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/tracing/opentracing/opentracing.go:100 +0x146
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.(*TracingBucket).Upload()
21:56:02 alertmanager-1: <autogenerated>:1 +0x29
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/storage/bucket.(*PrefixedBucketClient).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/storage/bucket/prefixed_bucket_client.go:40 +0xdd
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/storage/bucket.(*PrefixedBucketClient).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/storage/bucket/prefixed_bucket_client.go:40 +0xdd
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/storage/bucket.(*SSEBucketClient).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/storage/bucket/sse_bucket_client.go:64 +0x139
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager/alertstore/bucketclient.(*BucketAlertStore).SetFullState()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/alertstore/bucketclient/bucket_client.go:166 +0x161
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*statePersister).persist()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/state_persister.go:130 +0x4cf
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*statePersister).iteration()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/state_persister.go:99 +0x5b
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*statePersister).iteration-fm()
21:56:02 alertmanager-1: <autogenerated>:1 +0x47
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.newStatePersister.NewTimerService.func1()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/services.go:33 +0x195
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).main()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:190 +0x34c
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync.func1.2()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x33
21:56:02 alertmanager-1: Previous write at 0x00c0007b32f0 by goroutine 454:
21:56:02 alertmanager-1: runtime.mapdelete_faststr()
21:56:02 alertmanager-1: /usr/local/go/src/runtime/map_faststr.go:301 +0x0
21:56:02 alertmanager-1: github.com/minio/minio-go/v7.(*Client).putObjectMultipartStreamNoLength()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/minio/minio-go/v7/api-put-object.go:359 +0x2b8
21:56:02 alertmanager-1: github.com/minio/minio-go/v7.(*Client).putObjectCommon()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/minio/minio-go/v7/api-put-object.go:315 +0x6f7
21:56:02 alertmanager-1: github.com/minio/minio-go/v7.(*Client).PutObject()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/minio/minio-go/v7/api-put-object.go:282 +0x2e4
21:56:02 alertmanager-1: github.com/thanos-io/objstore/providers/s3.(*Bucket).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/providers/s3/s3.go:495 +0x655
21:56:02 alertmanager-1: github.com/thanos-io/objstore.(*metricBucket).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/objstore.go:606 +0x328
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.(*TracingBucket).Upload.TracingBucket.Upload.func1()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/tracing/opentracing/opentracing.go:102 +0x18a
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.doWithSpan()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/tracing/opentracing/opentracing.go:215 +0x142
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.TracingBucket.Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/thanos-io/objstore/tracing/opentracing/opentracing.go:100 +0x146
21:56:02 alertmanager-1: github.com/thanos-io/objstore/tracing/opentracing.(*TracingBucket).Upload()
21:56:02 alertmanager-1: <autogenerated>:1 +0x29
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/storage/bucket.(*PrefixedBucketClient).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/storage/bucket/prefixed_bucket_client.go:40 +0xdd
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/storage/bucket.(*PrefixedBucketClient).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/storage/bucket/prefixed_bucket_client.go:40 +0xdd
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/storage/bucket.(*SSEBucketClient).Upload()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/storage/bucket/sse_bucket_client.go:64 +0x139
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager/alertstore/bucketclient.(*BucketAlertStore).SetFullState()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/alertstore/bucketclient/bucket_client.go:166 +0x161
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*statePersister).persist()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/state_persister.go:130 +0x4cf
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*statePersister).iteration()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/state_persister.go:99 +0x5b
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*statePersister).iteration-fm()
21:56:02 alertmanager-1: <autogenerated>:1 +0x47
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.newStatePersister.NewTimerService.func1()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/services.go:33 +0x195
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).main()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:190 +0x34c
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync.func1.2()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x33
21:56:02 alertmanager-1: Goroutine 524 (running) created at:
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync.func1()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x1dc
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).switchState()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:139 +0x10b
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:116 +0x86
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.New()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/alertmanager.go:236 +0x162a
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).newAlertmanager()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:774 +0x896
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).setConfig()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:743 +0x19ad
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).syncConfigs()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:625 +0x452
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).loadAndSyncConfigs()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:536 +0x315
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).starting()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:461 +0x904
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).starting-fm()
21:56:02 alertmanager-1: <autogenerated>:1 +0x47
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).main()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:157 +0xe6
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync.func1.2()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x33
21:56:02 alertmanager-1: Goroutine 454 (running) created at:
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync.func1()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x1dc
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).switchState()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:139 +0x10b
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:116 +0x86
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.New()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/alertmanager.go:236 +0x162a
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).newAlertmanager()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:774 +0x896
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).setConfig()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:743 +0x19ad
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).syncConfigs()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:625 +0x452
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).loadAndSyncConfigs()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:536 +0x315
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).starting()
21:56:02 alertmanager-1: /__w/mimir/mimir/pkg/alertmanager/multitenant.go:461 +0x904
21:56:02 alertmanager-1: github.com/grafana/mimir/pkg/alertmanager.(*MultitenantAlertmanager).starting-fm()
21:56:02 alertmanager-1: <autogenerated>:1 +0x47
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).main()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:157 +0xe6
21:56:02 alertmanager-1: github.com/grafana/dskit/services.(*BasicService).StartAsync.func1.2()
21:56:02 alertmanager-1: /__w/mimir/mimir/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x33
21:56:02 alertmanager-1: ==================

From those logs I determined the race was from concurrent modification to the putUserMetadata map stored in Bucket:

// Bucket implements the store.Bucket interface against s3-compatible APIs.
type Bucket struct {
	logger          log.Logger
	name            string
	client          *minio.Client
	defaultSSE      encrypt.ServerSide
	putUserMetadata map[string]string
	storageClass    string
	partSize        uint64
	listObjectsV1   bool
}

The map is passed directly into the PutObjectOptions from the Bucket struct in s3's Upload:

	if _, err := b.client.PutObject(
		ctx,
		b.name,
		name,
		r,
		size,
		minio.PutObjectOptions{
			PartSize:             partSize,
			ServerSideEncryption: sse,
			UserMetadata:         b.putUserMetadata,
			StorageClass:         b.storageClass,
			// 4 is what minio-go have as the default. To be certain we do micro benchmark before any changes we
			// ensure we pin this number to four.
			// TODO(bwplotka): Consider adjusting this number to GOMAXPROCS or to expose this in config if it becomes bottleneck.
			NumThreads: 4,
		},
	); err != nil {
		return errors.Wrap(err, "upload s3 object")
	}

The problem fixed in #77 caused the upload to be done by minio's putObjectMultipartStreamNoLength, which surpisingly modifies the map passed in by writing/deleting "X-Amz-Checksum-Algorithm".

Verification

Reran the affected test locally with this modification vendored.

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yeya24
Copy link
Contributor

yeya24 commented Sep 21, 2023

Need to fix changelog so that I can merge it. Thanks

@andyasp andyasp force-pushed the clone-putUserMetadata branch from d417137 to a557e75 Compare September 21, 2023 12:41
@andyasp
Copy link
Contributor Author

andyasp commented Sep 21, 2023

Fixed the conflict, thanks!

@fpetkovski
Copy link
Contributor

Thanks @andyasp !

@fpetkovski fpetkovski merged commit 63a603e into thanos-io:main Sep 21, 2023
@andyasp andyasp deleted the clone-putUserMetadata branch September 21, 2023 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants