-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add storageID to tier_fs #8549
base: master
Are you sure you want to change the base?
Add storageID to tier_fs #8549
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice stuff, thanks for this.
I think that some places are are still missing storageID
,
Plus the ns
to namespace
renaming.
if err != nil { | ||
return nil, err | ||
} | ||
return NewValueIterator(it), nil | ||
} | ||
|
||
func (c *committedManager) WriteRange(ctx context.Context, ns graveler.StorageNamespace, it graveler.ValueIterator) (*graveler.RangeInfo, error) { | ||
writer, err := c.RangeManager.GetWriter(ctx, Namespace(ns), nil) | ||
func (c *committedManager) WriteRange(ctx context.Context, storageID graveler.StorageID, ns graveler.StorageNamespace, it graveler.ValueIterator) (*graveler.RangeInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please rename ns
to namespace
?
So at least in the context of this manager, the term StorageID
will always be attached to Namespace
.
(Here, and in every function that have ns
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear what you're saying, but I don't think we should try to go into graveler and try to make all the names conform to some coding convention- that's not the goal of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the goal, but a side-effect. Renaming it to namespace
will help us maintain the code in its new form (that's including storageID
).
It's a simple search-and-replace in this file that will improve its readability and align with our general code style of this repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is like tabs/spaces. You end up with a commit war over code-style nonsense that's a huge waste of everyone's time.
ns is an acceptable variable name for Go: https://www.reddit.com/r/golang/comments/3aporh/why_so_many_gophers_use_single_letter_variables/
I prefer a little more descriptive names, but I'm not going to go through our whole codebase and start correcting other people's naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So at least rename storageID
to sid
to keep the same convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can both be right!
- We can decide that "ns" is a "familiar variable name". Reddit is not authoritative; the Google Go Style Guide is, and this is the section on abbreviated naming. Single letters are encouraged in receivers, and abbreviations are cool for a few other cases. I am not convinced that "ns" fits these, but I have no strong feelings.
- We should definitely not do this renaming in this PR, it is already quite large. It is perfectly acceptable to decide and then do this in a separate PR, either before or after this one. But not as part of this one.
@@ -228,6 +228,7 @@ type mergeContext struct { | |||
srcIt Iterator | |||
baseIt Iterator | |||
strategy graveler.MergeStrategy | |||
storageID graveler.StorageID | |||
ns graveler.StorageNamespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this also be renamed from ns
to namespace
, to align terminology?
@@ -14,6 +14,9 @@ type ID string | |||
// Namespace is namespace for ID ranges | |||
type Namespace string | |||
|
|||
// StorageID is id for object storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please place StorageID
above Namespace
(to be consistent).
pkg/graveler/graveler.go
Outdated
|
||
// Exists returns true if a MetaRange matching ID exists in namespace ns. | ||
Exists(ctx context.Context, ns StorageNamespace, id MetaRangeID) (bool, error) | ||
Exists(ctx context.Context, storageID StorageID, ns StorageNamespace, id MetaRangeID) (bool, error) | ||
|
||
// WriteMetaRangeByIterator flushes the iterator to a new MetaRange and returns the created ID. | ||
WriteMetaRangeByIterator(ctx context.Context, ns StorageNamespace, it ValueIterator, metadata Metadata) (*MetaRangeID, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't WriteMetaRangeByIterator
also get StorageID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
|
||
// Commit is the act of taking an existing metaRange (snapshot) and applying a set of changes to it. | ||
// A change is either an entity to write/overwrite, or a tombstone to mark a deletion | ||
// it returns a new MetaRangeID that is expected to be immediately addressable | ||
Commit(ctx context.Context, ns StorageNamespace, baseMetaRangeID MetaRangeID, changes ValueIterator, allowEmpty bool, opts ...SetOptionsFunc) (MetaRangeID, DiffSummary, error) | ||
Commit(ctx context.Context, storageID StorageID, ns StorageNamespace, baseMetaRangeID MetaRangeID, changes ValueIterator, allowEmpty bool, opts ...SetOptionsFunc) (MetaRangeID, DiffSummary, error) | ||
|
||
// GetMetaRange returns information where metarangeID is stored. | ||
GetMetaRange(ctx context.Context, ns StorageNamespace, metaRangeID MetaRangeID) (MetaRangeAddress, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't GetMetaRange
and GetRange
also get StorageID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please check with @arielshaqed or someone with more context if these params can be removed from these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right! Please remove it, but in a separate PR. Otherwise it is very hard to navigate among so many changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both GetMetaRange
and GetRange
end up in the GetRemoteURI()
implementation, in tier_fs.go
.
So if I get it right, FS.GetRemoteURI
(in pyramid.go
) gets namespace
as a parameter, but it isn't used.
The way I see it, we have two options:
Either validate that namespace
is indeed not needed in the FS
interface and remove it,
Or assume that namespace
was added to this interface so that all of its methods will get namespace
as a parameter, and in this case storageID
should also be added, to align.
Co-authored-by: itaigilo <[email protected]>
0a12d54
to
3836080
Compare
OK I added to the other places. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Barging in again as my username was uttered.
First of all thanks for handling a change that manages to infiltrate everything! Even if we reduce explicitness somewhat as offered in some comments, it still obviously changes many components.
As a consequence, though this PR is really large and makes changes all over the code. This is obviously not bad! But I have some suggestions for how we might manage the scope here.
More smaller changes
I would really like it broken into smaller PRs or into more logical commits. For instance, there's a chain of removing the unneeded StorageNamespace argument from GetRemoteURI. This change seems very good - but it does not belong on this PR. Having to review it together with StorageIDs is much harder. (This request is not blocking, but I believe that it will speed time-to-merge of the sum of all PRs.)
Tests
On the dev side, I am unable to find anywhere that tests a non-empty storage ID. This might be worrying? (This request is blocking: we should test non-empty storage IDs, or state on the description of this PR why the existing tests are sufficient, or keep the issue open until we do have a PR that tests it.)
Scope
In some cases it might clarify the code to keep StorageID on some object, rather than drill it through the objects. For instance, AFAIU a MetaRangeManager and its committed.rangeManager refer to objects on a single StorageID; if we keep that StorageID on the object we might be able to avoid having to pass it through so many layers. Everything in the same commit should be on the same storage adapter, amirite? So the CommittedManager might just keep maps StorageID -> stuff, and then never pass all those StorageID arguments.
As a reminder, at some point in the design process we talked about keeping StorageID on the context of each request, and then retrieving it only in the adapter. We decided on the more explicit approach, but perhaps we can dial back on this explicitness? (@itaigilo and/or @N-o-Z can say).
(The only part of this request which is blocking is to understand whether StorageIDs need to be quite this explicit. Explaining why they should be explicit will unblock this.)
@@ -889,7 +889,7 @@ func createPrepareUncommittedTestScenario(t *testing.T, repositoryID string, num | |||
sort.Slice(diffs[i], func(ii, jj int) bool { | |||
return bytes.Compare(diffs[i][ii].Key, diffs[i][jj].Key) < 0 | |||
}) | |||
test.CommittedManager.EXPECT().Diff(gomock.Any(), repository.StorageNamespace, gomock.Any(), branches[i].CompactedBaseMetaRangeID).MinTimes(1).Return(gUtils.NewDiffIter(diffs[i]), nil) | |||
test.CommittedManager.EXPECT().Diff(gomock.Any(), gomock.Any(), repository.StorageNamespace, gomock.Any(), branches[i].CompactedBaseMetaRangeID).MinTimes(1).Return(gUtils.NewDiffIter(diffs[i]), nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we expect the correct storageID here and in other places?
|
||
rangeManager := mock.NewMockRangeManager(ctrl) | ||
|
||
writer.EXPECT().Abort().AnyTimes() | ||
metaRangeId := graveler.MetaRangeID("import") | ||
writer.EXPECT().Close(gomock.Any()).Return(&metaRangeId, nil).AnyTimes() | ||
committedManager := committed.NewCommittedManager(metaRangeManager, rangeManager, params) | ||
_, err := committedManager.Import(ctx, "ns", destMetaRangeID, sourceMetaRangeID, tst.prefixes) | ||
_, err := committedManager.Import(ctx, "", "ns", destMetaRangeID, sourceMetaRangeID, tst.prefixes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may indicate a TODO: Import will need to support multiple storages, may as well mark that we'll test it here (or elsewhere; depends on how you plan to do that).
if err != nil { | ||
return nil, err | ||
} | ||
return NewValueIterator(it), nil | ||
} | ||
|
||
func (c *committedManager) WriteRange(ctx context.Context, ns graveler.StorageNamespace, it graveler.ValueIterator) (*graveler.RangeInfo, error) { | ||
writer, err := c.RangeManager.GetWriter(ctx, Namespace(ns), nil) | ||
func (c *committedManager) WriteRange(ctx context.Context, storageID graveler.StorageID, ns graveler.StorageNamespace, it graveler.ValueIterator) (*graveler.RangeInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can both be right!
- We can decide that "ns" is a "familiar variable name". Reddit is not authoritative; the Google Go Style Guide is, and this is the section on abbreviated naming. Single letters are encouraged in receivers, and abbreviations are cool for a few other cases. I am not convinced that "ns" fits these, but I have no strong feelings.
- We should definitely not do this renaming in this PR, it is already quite large. It is perfectly acceptable to decide and then do this in a separate PR, either before or after this one. But not as part of this one.
@@ -43,19 +43,19 @@ func NewMetaRangeManager(params Params, metaManager, rangeManager RangeManager) | |||
}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here and in range_manager.go seem odd. AFAICT, a metarange from a storage backend will manage ranges from that same backend. So why not hold storage ID on the metarange and range managers? Everything on them is either externally owned (like TierFS) or reference-counted (like the pebble read cache, which we manage as "cache Unrefer"). So committedManager might hold a map from StorageID to their metarange and range managers. This would allow us to limit the spread of StorageIDs throughout the code.
|
||
// Commit is the act of taking an existing metaRange (snapshot) and applying a set of changes to it. | ||
// A change is either an entity to write/overwrite, or a tombstone to mark a deletion | ||
// it returns a new MetaRangeID that is expected to be immediately addressable | ||
Commit(ctx context.Context, ns StorageNamespace, baseMetaRangeID MetaRangeID, changes ValueIterator, allowEmpty bool, opts ...SetOptionsFunc) (MetaRangeID, DiffSummary, error) | ||
Commit(ctx context.Context, storageID StorageID, ns StorageNamespace, baseMetaRangeID MetaRangeID, changes ValueIterator, allowEmpty bool, opts ...SetOptionsFunc) (MetaRangeID, DiffSummary, error) | ||
|
||
// GetMetaRange returns information where metarangeID is stored. | ||
GetMetaRange(ctx context.Context, ns StorageNamespace, metaRangeID MetaRangeID) (MetaRangeAddress, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right! Please remove it, but in a separate PR. Otherwise it is very hard to navigate among so many changes.
Well, just saying that my last batch of comments were added before seeing Ariel's review 😄 And regarding this important comment:
I'm still good with the explicit approach - mainly since being implicit makes it much harder to trace bugs, and easier to forget when adding new flows (that |
Closes #8546