
Reduce copying in GetSnapshotsOperation#snapshots() #105765


DaveCTurner
Contributor

No need to create a set (the values are distinct anyway), make a
separate synchronized list, populate them both, and finally copy them
both again into another concatenated list. We can just make the final
list up front.
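Here is a minimal, self-contained sketch that contrasts the two shapes described above. It is not the actual `GetSnapshotsOperation` code. The `Info` record, the `known`/`toLoad` inputs and the `loader` function are hypothetical stand-ins, and the parallel stream only simulates results arriving on several threads.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class ReduceCopyingSketch {

    // Hypothetical stand-in for SnapshotInfo; not the Elasticsearch class.
    record Info(String id) {}

    // "Before" shape: a set and a separate synchronized list are populated,
    // then both are copied again into a concatenated result list.
    static List<Info> before(Collection<Info> known, Collection<String> toLoad, Function<String, Info> loader) {
        final Set<Info> set = new HashSet<>(known); // deduplication the distinct inputs never needed
        final List<Info> loaded = Collections.synchronizedList(new ArrayList<>());
        toLoad.parallelStream().forEach(id -> loaded.add(loader.apply(id)));
        final List<Info> result = new ArrayList<>(set.size() + loaded.size());
        result.addAll(set);    // second copy of the set's contents
        result.addAll(loaded); // second copy of the list's contents
        return result;
    }

    // "After" shape: one list, sized up front, populated directly.
    static List<Info> after(Collection<Info> known, Collection<String> toLoad, Function<String, Info> loader) {
        final List<Info> result = new ArrayList<>(known.size() + toLoad.size());
        result.addAll(known);
        toLoad.parallelStream().forEach(id -> {
            final Info info = loader.apply(id);
            synchronized (result) { // ArrayList is not thread-safe, so serialize concurrent adds
                result.add(info);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<Info> known = List.of(new Info("a"), new Info("b"));
        List<String> toLoad = List.of("c", "d", "e");
        List<Info> viaBefore = before(known, toLoad, Info::new);
        List<Info> viaAfter = after(known, toLoad, Info::new);
        System.out.println(viaBefore.size() + " " + viaAfter.size()); // 5 5
        System.out.println(new HashSet<>(viaBefore).equals(new HashSet<>(viaAfter))); // true
    }
}
```

In this sketch the final `ArrayList` is allocated once with its expected capacity and filled under a lock. No intermediate collection is created, so nothing is deduplicated or copied a second time.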

DaveCTurner added the >non-issue, :Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs) and v8.14.0 labels Feb 23, 2024
elasticsearchmachine added the Team:Distributed (Obsolete) label Feb 23, 2024 (meta label for the distributed team, now obsolete and replaced by Distributed Indexing/Coordination)
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor Author

Reducing the unnecessary copying here is clearly good in its own right, but this change is also motivated by wanting to take more control over the `SnapshotInfo` retrieval process rather than just handing a list of snapshot IDs off to `repository.getSnapshotInfo()`. In particular, if we've already found `size` matching snapshots then (depending on the chosen sort key) we might be able to tell in advance that certain other snapshots won't make it into the final list, avoiding the need to retrieve their `SnapshotInfo` blobs at all. That's the main reason for introducing a `RefCountingListener` here: it isn't necessary as written yet, but it will be important once we're tracking the individual retrievals ourselves.
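The pruning idea above might look something like the following sketch. It is not taken from #105769. It assumes the sort key is the snapshot name, which can be read from an ID without fetching any blob. The `Id` and `Info` records and the `loader` function are hypothetical stand-ins for `SnapshotId`, `SnapshotInfo` and a repository read. The best `size` matches so far are kept in a bounded max-heap, and a load is skipped for any candidate whose name cannot beat the worst one retained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PruneRetrievalSketch {

    // Hypothetical stand-ins: an Id's name is cheap to read, an Info is "expensive" to load.
    record Id(String name) {}
    record Info(String name, boolean matches) {}

    // Returns up to `size` matching infos with the smallest names, loading an Info only
    // when its name could still make it into the result.
    static List<Info> smallestByName(List<Id> ids, int size, Function<Id, Info> loader) {
        if (size <= 0) {
            return List.of();
        }
        // Max-heap on name: the head is the worst of the best matches kept so far.
        PriorityQueue<Info> best = new PriorityQueue<>(Comparator.comparing(Info::name).reversed());
        for (Id id : ids) {
            if (best.size() == size && id.name().compareTo(best.peek().name()) >= 0) {
                continue; // cannot beat the worst retained match, so skip the expensive load
            }
            Info info = loader.apply(id);
            if (!info.matches()) {
                continue;
            }
            best.add(info);
            if (best.size() > size) {
                best.poll(); // evict the current worst
            }
        }
        List<Info> result = new ArrayList<>(best);
        result.sort(Comparator.comparing(Info::name));
        return result;
    }

    public static void main(String[] args) {
        List<Id> ids = List.of(new Id("a"), new Id("b"), new Id("c"), new Id("d"), new Id("e"), new Id("f"));
        AtomicInteger loads = new AtomicInteger();
        List<Info> top = smallestByName(ids, 2, id -> {
            loads.incrementAndGet();
            return new Info(id.name(), !id.name().equals("b")); // pretend "b" does not match the request
        });
        System.out.println(top.stream().map(Info::name).toList() + " loads=" + loads.get()); // [a, c] loads=3
    }
}
```

Take IDs `a` to `f` in name order, with `size = 2` and `b` not matching. Only `a`, `b` and `c` are loaded. `d`, `e` and `f` are skipped because none of them sorts before `c`. Other sort keys, such as start time, would need that key to be available without the blob for this kind of pruning to apply.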

@DaveCTurner
Contributor Author

#105769 is what I mean by taking more control of the `SnapshotInfo` retrieval process. That PR branch includes the changes in this one, so there's quite a lot of extra noise.

@@ -400,7 +400,7 @@ private void snapshots(String repositoryName, Collection<SnapshotId> snapshotIds
if (cancellableTask.notifyIfCancelled(listener)) {
return;
}
final Set<SnapshotInfo> snapshotSet = new HashSet<>();
final List<SnapshotInfo> snapshots = new ArrayList<>(snapshotIds.size());
Contributor


No need to create a set (the values are distinct anyway)

`snapshotIds` is declared as a `Collection` in the method signature

private void snapshots(String repositoryName, Collection<SnapshotId> snapshotIds, ActionListener<SnapshotsInRepo> listener) {

and the method is only called from

snapshots(repo, toResolve.stream().map(Snapshot::getSnapshotId).toList(), listener);

where `toResolve` is defined as a `HashSet`. I think it is worth updating the method signature to accept `Set<SnapshotId> snapshotIds` to better reflect that no duplicates are expected.

Contributor Author

DaveCTurner Feb 23, 2024


I know what you mean, but (a) nothing particularly bad happens if there are duplicates here, and (b) we'd have to change that `.toList()` into a `.collect(Collectors.toSet())`, which is quite a bit more expensive. This whole thing will go away eventually anyway, so I'd rather leave it as-is for now.
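The trade-off in this exchange can be sketched with hypothetical `Snapshot` and `SnapshotId` records (not the Elasticsearch classes) and two stub methods. The stubs stand in for the current `Collection` signature and the suggested `Set` one. `Stream.toList()` gathers the mapped elements into an unmodifiable list without hashing anything. `Collectors.toSet()` builds a hash-based set (a `HashSet` in current JDKs), which hashes every element again. In this sketch the source is already a set of snapshots from a single repository, so both paths end up with the same distinct IDs.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SignatureTradeoffSketch {

    // Hypothetical stand-ins for SnapshotId and Snapshot; not the Elasticsearch classes.
    record SnapshotId(String name, String uuid) {}
    record Snapshot(String repository, SnapshotId snapshotId) {}

    // Current shape: any Collection is accepted, so the caller can hand over a cheap List.
    static int snapshots(String repositoryName, Collection<SnapshotId> snapshotIds) {
        return snapshotIds.size();
    }

    // Suggested shape: the parameter type itself documents "no duplicates".
    static int snapshotsFromSet(String repositoryName, Set<SnapshotId> snapshotIds) {
        return snapshotIds.size();
    }

    public static void main(String[] args) {
        Set<Snapshot> toResolve = new HashSet<>();
        toResolve.add(new Snapshot("repo", new SnapshotId("snap-1", "uuid-1")));
        toResolve.add(new Snapshot("repo", new SnapshotId("snap-2", "uuid-2")));

        // Stream.toList() collects into an unmodifiable List; no hashing is needed.
        List<SnapshotId> asList = toResolve.stream().map(Snapshot::snapshotId).toList();

        // Collectors.toSet() inserts each element into a hash-based set, hashing every SnapshotId again.
        Set<SnapshotId> asSet = toResolve.stream().map(Snapshot::snapshotId).collect(Collectors.toSet());

        System.out.println(snapshots("repo", asList) + " " + snapshotsFromSet("repo", asSet)); // 2 2
    }
}
```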

DaveCTurner removed the request for review from DiannaHohensee March 4, 2024 07:37
DaveCTurner added the auto-merge-without-approval label (automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Mar 4, 2024
elasticsearchmachine merged commit 42786aa into elastic:main Mar 4, 2024
14 checks passed
DaveCTurner deleted the 2024/02/23/GetSnapshotsOperation-snapshots-copying branch March 4, 2024 08:37
idegtiarenko pushed a commit to idegtiarenko/elasticsearch that referenced this pull request Mar 4, 2024
DaveCTurner restored the 2024/02/23/GetSnapshotsOperation-snapshots-copying branch June 17, 2024 06:17
3 participants