Summarizer: Dedup blobs with same content #398

Closed
vladsud opened this issue Oct 23, 2019 · 2 comments
Labels
area: runtime, perf

Comments

@vladsud
Contributor

vladsud commented Oct 23, 2019

Reducing the number of blobs we send to storage improves system performance.
Ideally, the runtime should be able to do this across the board in an efficient way, but here are a couple of concrete scenarios that cover 99% of the current need:

1: We produce the same attributes blob for every map, every directory, and every shared string. It would be great for all of them to point to the same baseId across components.

channelContext.ts

export function snapshotChannel(channel: IChannel, baseId: string | null) {
    const snapshot = channel.snapshot();

    // Add in the object attributes to the returned tree
    const objectAttributes = channel.attributes;
    snapshot.entries.push(new BlobTreeEntry(".attributes", JSON.stringify(objectAttributes)));

    // If baseId exists then the previous snapshot is still valid
    snapshot.id = baseId;

    return snapshot;
}
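
A minimal sketch of scenario 1, assuming the runtime keeps a per-summary table keyed by blob content: the first channel that produces a given `.attributes` string triggers an upload, and every later channel with identical content reuses that blob's id. `BlobDeduper`, `getOrUpload`, `UploadResult`, and the `blob-N` ids are hypothetical names, not part of the Fluid runtime, and the upload itself is simulated with a counter.

```typescript
// Hypothetical sketch: content-addressed dedup of blobs within one summary.
// BlobDeduper and UploadResult are illustrative names, not Fluid runtime APIs.

interface UploadResult {
    id: string;      // id the tree entry should point at
    reused: boolean; // true if identical content was already uploaded
}

class BlobDeduper {
    // Keyed by exact content for simplicity; a real implementation would more
    // likely key by a cryptographic hash (e.g. SHA-256) of the content.
    private readonly idsByContent = new Map<string, string>();
    private nextId = 0;
    public uploads = 0; // number of simulated storage uploads

    getOrUpload(content: string): UploadResult {
        const existing = this.idsByContent.get(content);
        if (existing !== undefined) {
            return { id: existing, reused: true };
        }
        // Stand-in for a real storage upload that would return a server id.
        const id = `blob-${this.nextId++}`;
        this.uploads++;
        this.idsByContent.set(content, id);
        return { id, reused: false };
    }
}

// Usage: three map channels emit identical ".attributes" blobs.
// The attribute values below are made up for illustration.
const deduper = new BlobDeduper();
const results = ["mapA", "mapB", "mapC"].map(() =>
    deduper.getOrUpload(JSON.stringify({ type: "map", snapshotFormatVersion: "0.1" })),
);
```

This relies on `JSON.stringify` producing identical strings for identical attributes, which holds when the attribute objects are built with the same keys in the same insertion order.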

2: It would be great if some dedup happened for blobs within a DDS, even when the DDS changed and needs to regenerate its blobs. For example, imagine that SharedString implements a more stable separation of header and body blobs. SharedString does not (today) track which parts changed, only whether anything changed, but if it split its content in a stable way and nothing changed in the body, it would be great for the runtime to notice that and reuse the prior blob. It's not ideal (we still spend time generating content for nothing), but it would be much better than what we have today, because we would not transfer those bits over the wire.
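
Scenario 2 can be sketched as a per-path comparison against the previous summary, assuming the runtime still has each prior blob's content and id available: regenerated content that matches exactly is pointed at the old id, and only changed blobs are uploaded. `PriorBlob`, `resolveBlobIds`, and `fakeUpload` are hypothetical names, and the `header`/`body` paths stand in for the stable SharedString split described in scenario 2.

```typescript
// Hypothetical sketch: reuse the previous summary's blob when a DDS
// regenerates identical content. Names are illustrative, not Fluid APIs.

interface PriorBlob {
    content: string; // content uploaded in the previous summary
    id: string;      // storage id that upload received
}

function resolveBlobIds(
    previous: ReadonlyMap<string, PriorBlob>, // path -> blob from last summary
    regenerated: ReadonlyMap<string, string>, // path -> freshly generated content
    upload: (content: string) => string,      // uploads content, returns new id
): Map<string, string> {
    const ids = new Map<string, string>();
    regenerated.forEach((content, path) => {
        const prior = previous.get(path);
        if (prior !== undefined && prior.content === content) {
            // Unchanged: point at the old blob and skip the upload.
            ids.set(path, prior.id);
        } else {
            // New or changed: the only case that costs network traffic.
            ids.set(path, upload(content));
        }
    });
    return ids;
}

// Usage: a SharedString-like DDS with a stable header/body split, where
// only the header changed since the last summary.
let uploadCount = 0;
const fakeUpload = (_content: string): string => `new-${++uploadCount}`;

const previous = new Map<string, PriorBlob>([
    ["header", { content: "header-v1", id: "old-header" }],
    ["body", { content: "body-text", id: "old-body" }],
]);
const regenerated = new Map<string, string>([
    ["header", "header-v2"],
    ["body", "body-text"],
]);
const ids = resolveBlobIds(previous, regenerated, fakeUpload);
```

The DDS still pays to regenerate the body here; the saving is only that the unchanged body is not sent over the wire again.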

@arinwt - FYI

@vladsud
Contributor Author

vladsud commented Dec 10, 2019

Tracks re-enabling the blob deduping work by Jatin.

@vladsud vladsud added this to the Build2020 milestone Jan 31, 2020
@curtisman curtisman added area: runtime and removed Ignite labels Jan 31, 2020
@curtisman curtisman modified the milestones: Build 2020, February 2020 Feb 5, 2020
@vladsud
Contributor Author

vladsud commented Mar 2, 2020

Checked into 0.14

@vladsud vladsud closed this as completed Mar 2, 2020