Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. #11549

Merged 3 commits on Aug 26, 2021

Contributor

@gianm gianm commented Aug 4, 2021

  1. Dimensions exist that aren't strings, so say "string" explicitly when that's what we mean.
  2. Emphasize that size isn't actually the storage size. (See also: rework segment metadata query "size" analysis #7124)

-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
-If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.
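
To illustrate the "max across segments" wording, here is a minimal Python sketch. The segment contents are made-up values, and the code only models the semantics described in the doc text above; it is not Druid's implementation. It shows that the merged value is the largest per-segment count, which can be lower than the number of distinct values across all segments.

```python
# Hypothetical contents of one string column in three segments.
segments = [
    ["a", "b", "c"],
    ["c", "d"],
    ["e"],
]

# Cardinality as reported per segment: unique values in that segment.
per_segment_cardinality = [len(set(values)) for values in segments]

# With merge set, the doc text says the result is the max across segments.
merged_cardinality = max(per_segment_cardinality)

# For comparison: distinct values across all segments combined.
# This is NOT what the merged result reports.
true_distinct = len(set().union(*segments))

print(per_segment_cardinality, merged_cardinality, true_distinct)
```

In this example the merged value is 3 while the column holds 5 distinct values overall, so the merged cardinality is best read as a lower bound on the total.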
Contributor

@paul-rogers paul-rogers Aug 4, 2021

This is not clear to us newbies. Does "max" mean the largest number of any segment, or the aggregated total across segments? Both are useful: if I have 1M rows, and see a cardinality of 1K, that could mean either A) 1K total, or B) 1K per segment. If there are 100K rows per segment, 1K per segment says one thing. If there are 1K rows per segment, then a cardinality of 1K per segment says something else.

Contributor Author

I pushed some new content that hopefully clears this up. Please let me know if it makes sense to you.


### size

-* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
+* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format. This is _not_ the actual storage size of the column in Druid.
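
For context, a query that requests the analyses discussed in this PR might be built as in the sketch below. The datasource name and interval are placeholders, and the field names follow the segmentMetadata query documentation as I understand it, so check them against the docs for your Druid version.

```python
import json

# Sketch of a segmentMetadata query body; values marked hypothetical are
# placeholders, not taken from this PR.
query = {
    "queryType": "segmentMetadata",
    "dataSource": "example_datasource",        # hypothetical datasource name
    "intervals": ["2021-08-01/2021-09-01"],    # hypothetical interval
    # Request the three analyses this PR clarifies.
    "analysisTypes": ["cardinality", "minmax", "size"],
    # Merge per-segment results into a single result.
    "merge": True,
}

payload = json.dumps(query, indent=2)
print(payload)
```

The `size` value returned for such a query is the text-format estimate described above, so it should not be used to judge how much storage a column actually takes.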
Contributor

Pointer to where I might find the actual storage size? I want to know the amount of space the column takes so I know if it is worth the cost to store.

Contributor Author

I pushed some new content that hopefully clears this up. Please let me know if it makes sense to you.

Contributor

@techdocsmith techdocsmith left a comment

Request small style changes. Otherwise LGTM.

docs/querying/segmentmetadataquery.md (outdated, resolved)
@techdocsmith techdocsmith merged commit ec6c6e2 into apache:master Aug 26, 2021
@techdocsmith
Contributor

@gianm, @paul-rogers if we need more clarification, let's pick that up in a new PR. Thanks.

@clintropolis clintropolis added this to the 0.22.0 milestone Sep 3, 2021
@gianm gianm deleted the docs-clarify-segmentmetadata branch September 23, 2022 19:22