Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. #11549
- * `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
-   If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+ * `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+   If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.
This is not clear to us newbies. Does "max" mean the largest number of any segment, or the aggregated total across segments? Both are useful: if I have 1M rows, and see a cardinality of 1K, that could mean either A) 1K total, or B) 1K per segment. If there are 100K rows per segment, 1K per segment says one thing. If there are 1K rows per segment, then a cardinality of 1K per segment says something else.
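The two readings in the comment above can be made concrete with a small sketch. The segment contents here are made up for illustration and nothing in it calls Druid. It only contrasts the largest per-segment count with the distinct count over all segments. Read literally, the doc text ("the max of this value across segments") corresponds to the first number, not the second.

```python
# Hypothetical values of one string column in three segments (illustrative data only).
segments = [
    {"a", "b", "c"},
    {"b", "c", "d", "e"},
    {"a", "e"},
]

# Number of unique values in each segment taken on its own.
per_segment = [len(values) for values in segments]

# Reading B: the largest per-segment cardinality.
max_per_segment = max(per_segment)

# Reading A: the number of distinct values across all segments combined.
total_distinct = len(set().union(*segments))

print(per_segment, max_per_segment, total_distinct)
```

The two numbers differ whenever segments hold partly different values, which is why the wording matters for interpreting a merged result.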
I pushed some new content that hopefully clears this up. Please let me know if it makes sense to you.
### size

- * `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
+ * `size` in the result will contain the estimated total segment byte size as if the data were stored in text format. This is _not_ the actual storage size of the column in Druid.
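For context, here is a minimal sketch of a `segmentMetadata` query body that requests the three analysis types this PR documents, with `merge` enabled. The datasource name and interval are placeholders, and the sketch only builds and prints the JSON. It does not send the query to a Druid cluster.

```python
import json

# Sketch of a segmentMetadata query body.
# "sample_datasource" and the interval are hypothetical placeholders.
query = {
    "queryType": "segmentMetadata",
    "dataSource": "sample_datasource",
    "intervals": ["2013-01-01/2014-01-01"],
    # Analysis types discussed in this PR.
    "analysisTypes": ["cardinality", "minmax", "size"],
    # Merge the per-segment results into a single result.
    "merge": True,
}

body = json.dumps(query, indent=2)
print(body)
```

With `merge` set, the per-column `size` in the response is still the text-format estimate described above, not on-disk storage.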
Pointer to where I might find the actual storage size? I want to know the amount of space the column takes so I know if it is worth the cost to store.
I pushed some new content that hopefully clears this up. Please let me know if it makes sense to you.
Request small style changes. Otherwise LGTM.
style update
@gianm, @paul-rogers if we need more clarification, let's pick that up in a new PR. Thanks.
`size` isn't actually the storage size. (See also: rework segment metadata query "size" analysis #7124)