Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataSegment memory optimizations #5094

Merged
merged 18 commits into from
Dec 12, 2017
Merged

Conversation

leventov
Copy link
Member

  • Deduplicated DataSegment contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively;
  • Used ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average;
  • Pruned DataSegment.loadSpec on Brokers

…trics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers
@leventov
Copy link
Member Author

Is it expected that we have two almost identical classes called "DataSegmentTest" in different modules?

@himanshug
Copy link
Contributor

himanshug commented Nov 16, 2017

Is it expected that we have two almost identical classes called "DataSegmentTest" in different modules?

If they are testing different things and can't be merged into single class, then its fine. Or else, please merge them.

Pruned DataSegment.loadSpec on Brokers

Can we just drop them from announcements made from historicals/realtime-processes instead ? I think @nishantmonu51 already made a change at some point to optionally drop metric/dimension names.

@leventov
Copy link
Member Author

@himanshug could you point to a PR?

@himanshug
Copy link
Contributor

#2784

@leventov
Copy link
Member Author

@himanshug thanks. I think it's different because it doesn't affect the presence of the data on coordinator endpoints. And it's a more decoupled approach.

@himanshug
Copy link
Contributor

There are endpoints at coordinator ( https://github.com/druid-io/druid/blob/master/server/src/main/java/io/druid/server/http/MetadataResource.java ) to get source of truth details of a Segment information from DB.
So, I'm not sure if there is any benefit in keeping that information even in Coordinator memory.

In fact I would change even the default behavior to not publish dimension, metrics, loadSpecs information in announcements so that all users get that.

does anyone else have an opinion ? @gianm @drcrallen @nishantmonu51

@himanshug
Copy link
Contributor

it was decided in dev-sync today to go with current state of this PR wherein historicals announce the loadSpec and they are dropped by broker only.

@leventov
Copy link
Member Author

@himanshug @gianm could you please take a look?

@gianm
Copy link
Contributor

gianm commented Nov 27, 2017

@leventov i'll take a look now

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, couple minor comments.

return interner.intern(input);
}
};
private static final Interner<String> stringInterner = Interners.newWeakInterner();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well CAPITALIZE these if you are renaming them anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

private static final Interner<List<String>> dimensionsInterner = Interners.newWeakInterner();
private static final Interner<List<String>> metricsInterner = Interners.newWeakInterner();
private static final Function<String, String> internFun = Interners.asFunction(stringInterner);
private static final Map<String, Object> PRUNED_LOAD_SPEC = ImmutableMap.of(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any APIs on the Broker that return DataSegment objects? If so, this needs a mention in release notes at least.

private static final Interner<String> stringInterner = Interners.newWeakInterner();
private static final Interner<List<String>> dimensionsInterner = Interners.newWeakInterner();
private static final Interner<List<String>> metricsInterner = Interners.newWeakInterner();
private static final Map<String, Object> PRUNED_LOAD_SPEC = ImmutableMap.of(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any APIs on the broker that return DataSegment objects? If so, this needs at least a mention on the release notes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also suggest adding a note to the docs for any relevant methods, something like "This method does not return loadSpecs for segments; if you need those then you can use the Coordinator API XXX."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as far as I can tell. It's exposed only in MetadataResource and DatasourcesResource, which are present only on Coordinator node.

@leventov
Copy link
Member Author

leventov commented Dec 8, 2017

@gianm do you have more comments on this PR?

@gianm
Copy link
Contributor

gianm commented Dec 12, 2017

@leventov LGTM; sorry, am a few days behind on github email at the moment.

@gianm gianm merged commit 64848c7 into apache:master Dec 12, 2017
@leventov leventov deleted the fat-data-segments branch December 18, 2017 14:31
@jon-wei jon-wei added this to the 0.12.0 milestone Jan 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants