Commit

Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. (#11549)

* Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

* Further clarifications.

* Update docs/querying/segmentmetadataquery.md

style update

Co-authored-by: Charles Smith <[email protected]>
gianm and techdocsmith authored Aug 26, 2021
1 parent 9032a0b commit ec6c6e2
Showing 1 changed file with 15 additions and 4 deletions.
19 changes: 15 additions & 4 deletions docs/querying/segmentmetadataquery.md
@@ -144,16 +144,27 @@ Types of column analyses are described below:

### cardinality

-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
-If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` is the number of unique values present in string columns. It is null for other column types.
+
+Druid examines the size of string column dictionaries to compute the cardinality value. There is one dictionary per column per
+segment. If `merge` is off (false), this reports the cardinality of each column of each segment individually. If
+`merge` is on (true), this reports the highest cardinality encountered for a particular column across all relevant
+segments.
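
The `merge` behavior described above can be illustrated with a minimal Python sketch. It is not Druid's implementation: the function name `report_cardinality`, the segment names, and the per-segment numbers are all hypothetical and exist only to show the difference between per-segment reporting and the merged maximum.

```python
from typing import Dict, Optional

# Hypothetical input: one dictionary size per column per segment.
# None stands for a non-string column, where cardinality is null.
SegmentCardinalities = Dict[str, Optional[int]]


def report_cardinality(segments: Dict[str, SegmentCardinalities], merge: bool):
    """Mimic the documented reporting behavior (illustration only)."""
    if not merge:
        # merge = false: each column of each segment is reported individually.
        return segments

    # merge = true: keep the highest cardinality seen for each column.
    merged: Dict[str, Optional[int]] = {}
    for columns in segments.values():
        for name, cardinality in columns.items():
            previous = merged.get(name)
            if cardinality is None:
                # Non-string column: stays null unless a number was already seen.
                merged.setdefault(name, None)
            elif previous is None or cardinality > previous:
                merged[name] = cardinality
    return merged


# Made-up example data for two segments of the same datasource.
segments = {
    "segment_a": {"country": 120, "added": None},
    "segment_b": {"country": 95, "added": None},
}
merged = report_cardinality(segments, merge=True)
print(merged)
```

With `merge=True` the sketch reports 120 for `country`, the larger of the two per-segment values. Because the merged figure is a maximum, it is not a count of distinct values across all segments combined.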

### minmax

-* Estimated min/max values for each column. Only relevant for dimension columns.
+* Estimated min/max values for each column. Only reported for string columns.

### size

-* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
+* `size` is the estimated total byte size as if the data were stored in text format. This is _not_ the actual storage
+size of the column in Druid. If you want the actual storage size in bytes of a segment, look elsewhere. Some pointers:
+
+- To get the storage size in bytes of an entire segment, check the `size` field in the
+[`sys.segments` table](sql.md#segments-table). This is the size of the memory-mappable content.
+- To get the storage size in bytes of a particular column in a particular segment, unpack the segment and look at the
+`meta.smoosh` file inside the archive. The difference between the third and fourth columns is the size in bytes.
+Currently, there is no API for retrieving this information.
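
The `meta.smoosh` arithmetic can be sketched in Python as follows. This is a hedged illustration, not an official tool. It assumes the file is comma-separated, that each entry line has four fields with the start and end offsets in the third and fourth positions, and that any line with a different field count (such as a header) can be skipped. The function name `column_sizes_from_smoosh` and the sample contents are made up.

```python
from typing import Dict


def column_sizes_from_smoosh(meta_text: str) -> Dict[str, int]:
    """Return bytes per entry as (fourth column - third column)."""
    sizes: Dict[str, int] = {}
    for line in meta_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) != 4:
            # Assumption: header or blank lines do not have four fields.
            continue
        name, _file_number, start, end = fields
        sizes[name] = int(end) - int(start)
    return sizes


# Invented sample contents; real files come from an unpacked segment archive.
sample = """v1,2147483647,1
__time,0,0,1500
country,0,1500,4200
"""
sizes = column_sizes_from_smoosh(sample)
print(sizes)
```

For the invented sample, the sketch yields 1500 bytes for `__time` and 2700 bytes for `country`. To use it on a real segment, read the `meta.smoosh` file from the unpacked archive and pass its text to the function.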

### interval
