Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add long/float ColumnSelectorStrategy implementations #3838

Merged
merged 23 commits into from
Feb 9, 2017
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 24 additions & 2 deletions docs/content/querying/dimensionspecs.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,9 +23,9 @@ Returns dimension values as is and optionally renames the dimension.
}
```

When specifying a DimensionSpec on a numeric column, the user should include the type of the column in the `outputType` field. This is necessary as it is possible for a column with given name to have different value types in different segments; results will be converted to the type specified by `outputType` before merging.
When specifying a DimensionSpec on a numeric column, the user should include the type of the column in the `outputType` field. If left unspecified, the `outputType` defaults to STRING.

If left unspecified, the `outputType` defaults to STRING.
Please refer to the [Output Types](#output-types) section for more details.

### Extraction DimensionSpec

Expand All @@ -36,10 +36,15 @@ Returns dimension values transformed using the given [extraction function](#extr
"type" : "extraction",
"dimension" : <dimension>,
"outputName" : <output_name>,
"outputType": <"STRING"|"LONG"|"FLOAT">,
"extractionFn" : <extraction_function>
}
```

`outputType` may also be specified in an ExtractionDimensionSpec to apply type conversion to results before merging. If left unspecified, the `outputType` defaults to STRING.

Please refer to the [Output Types](#output-types) section for more details.

### Filtered DimensionSpecs

These are only useful for multi-value dimensions. If you have a row in druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with [query filter](filters.html) for value "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
Expand All @@ -61,6 +66,8 @@ Following filtered dimension spec retains only the values matching regex. Note t
{ "type" : "regexFiltered", "delegate" : <dimensionSpec>, "pattern": <java regex pattern> }
```

Note that filtered DimensionSpecs are not supported when applied to a numeric column (when the base column in a segment contains long or float values, not related to what `outputType` specifies).

For more details and examples, see [multi-value dimensions](multi-value-dimensions.html).

### Lookup DimensionSpecs
Expand Down Expand Up @@ -105,6 +112,21 @@ The second kind where it is not possible to pass at query time due to their size
}
```

## Output Types

The dimension specs provide an option to specify the output type of a column's values. This is necessary as it is possible for a column with given name to have different value types in different segments; results will be converted to the type specified by `outputType` before merging.

Note that not all use cases for DimensionSpec currently support `outputType`, the table below shows which use cases support this option:

|Query Type|Supported?|
|--------|---------|
|GroupBy (v1)|no|
|GroupBy (v2)|yes|
|TopN|yes|
|Search|no|
|Select|no|
|Cardinality Aggregator|no|

## Extraction Functions

Extraction functions define the transformation applied to each dimension value.
Expand Down
2 changes: 0 additions & 2 deletions docs/content/querying/sql.md
Original file line number Diff line number Diff line change
Expand Up @@ -160,8 +160,6 @@ Druid does not support all SQL features. Most of these are due to missing featur
language. Some unsupported SQL features include:

- Grouping on functions of multiple columns, like concatenation: `SELECT COUNT(*) FROM data_source GROUP BY dim1 || ' ' || dim2`
- Grouping on long and float columns.
- Filtering on float columns.
- Filtering on non-boolean interactions between columns, like two columns equaling each other: `SELECT COUNT(*) FROM data_source WHERE dim1 = dim2`.
- A number of miscellaneous functions, like `TRIM`.
- Joins, other than semi-joins as described above.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@
import io.druid.segment.DimensionSelector;
import io.druid.segment.StorageAdapter;
import io.druid.segment.VirtualColumns;
import io.druid.segment.column.ValueType;
import io.druid.segment.data.IndexedInts;
import io.druid.segment.filter.Filters;
import org.joda.time.DateTime;
Expand Down Expand Up @@ -326,13 +327,18 @@ public RowIterator(GroupByQuery query, final Cursor cursor, ByteBuffer metricsBu

for (int i = 0; i < dimensionSpecs.size(); ++i) {
final DimensionSpec dimSpec = dimensionSpecs.get(i);
if ( dimSpec.getOutputType() != ValueType.STRING) {
throw new UnsupportedOperationException(
"GroupBy v1 only supports dimensions with an outputType of STRING."
);
}

final DimensionSelector selector = cursor.makeDimensionSelector(dimSpec);
if (selector != null) {
if (selector.getValueCardinality() == DimensionSelector.CARDINALITY_UNKNOWN) {
throw new UnsupportedOperationException(
"GroupBy v1 does not support dimension selectors with unknown cardinality.");
}

dimensions.add(selector);
dimNames.add(dimSpec.getOutputName());
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,10 +49,10 @@ public class DictionaryBuildingStringGroupByColumnSelectorStrategy extends Strin
public void processValueFromGroupingKey(GroupByColumnSelectorPlus selectorPlus, ByteBuffer key, Map<String, Object> resultMap)
{
final int id = key.getInt(selectorPlus.getKeyBufferPosition());
final String value = dictionary.get(id);

// GROUP_BY_MISSING_VALUE is used to indicate empty rows, which are omitted from the result map.
if (id != GROUP_BY_MISSING_VALUE) {
final String value = dictionary.get(id);
resultMap.put(
selectorPlus.getOutputName(),
value
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,8 +117,11 @@ public Function<Sequence<Row>, Sequence<Row>> build(
final StringComparator naturalComparator;
if (columnType == ValueType.STRING) {
naturalComparator = StringComparators.LEXICOGRAPHIC;
} else {
} else if (columnType == ValueType.LONG || columnType == ValueType.FLOAT) {
naturalComparator = StringComparators.NUMERIC;
} else {
sortingNeeded = true;
break;
}

if (columnSpec.getDirection() != OrderByColumnSpec.Direction.ASCENDING
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -133,9 +133,8 @@ public static <ColumnSelectorStrategyClass extends ColumnSelectorStrategy> Colum
);
ColumnSelectorStrategyClass strategy = makeStrategy(
strategyFactory,
dimName,
dimSpec,
cursor.getColumnCapabilities(dimSpec.getDimension()),
dimSpec.getExtractionFn() != null,
selector
);
final ColumnSelectorPlus<ColumnSelectorStrategyClass> selectorPlus = new ColumnSelectorPlus<>(
Expand All @@ -156,19 +155,13 @@ public static ColumnValueSelector getColumnValueSelectorFromDimensionSpec(
{
String dimName = dimSpec.getDimension();
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(dimName);
capabilities = getEffectiveCapabilities(dimName, capabilities, dimSpec.getExtractionFn() != null);
capabilities = getEffectiveCapabilities(dimSpec, capabilities);
switch (capabilities.getType()) {
case STRING:
return columnSelectorFactory.makeDimensionSelector(dimSpec);
case LONG:
if (dimSpec instanceof BaseFilteredDimensionSpec) {
throw new UnsupportedOperationException("Filtered dimension specs are not supported on numeric columns.");
}
return columnSelectorFactory.makeLongColumnSelector(dimSpec.getDimension());
case FLOAT:
if (dimSpec instanceof BaseFilteredDimensionSpec) {
throw new UnsupportedOperationException("Filtered dimension specs are not supported on numeric columns.");
}
return columnSelectorFactory.makeFloatColumnSelector(dimSpec.getDimension());
default:
return null;
Expand All @@ -179,9 +172,8 @@ public static ColumnValueSelector getColumnValueSelectorFromDimensionSpec(
// adjusts the capabilities for columns that cannot be handled as-is to manageable defaults
// (e.g., treating missing columns as empty String columns)
private static ColumnCapabilities getEffectiveCapabilities(
String dimName,
ColumnCapabilities capabilities,
boolean hasExFn
DimensionSpec dimSpec,
ColumnCapabilities capabilities
)
{
if (capabilities == null) {
Expand All @@ -195,22 +187,29 @@ private static ColumnCapabilities getEffectiveCapabilities(

// Currently, all extractionFns output Strings, so the column will return String values via a
// DimensionSelector if an extractionFn is present.
if (hasExFn) {
if (dimSpec.getExtractionFn() != null) {
capabilities = DEFAULT_STRING_CAPABILITIES;
}

// Filtered dimension specs are not supported on numerics, the numeric column
// will be treated as a null String column in that case
if (capabilities.getType() == ValueType.LONG || capabilities.getType() == ValueType.FLOAT) {
if (dimSpec instanceof BaseFilteredDimensionSpec) {
capabilities = DEFAULT_STRING_CAPABILITIES;
}
}

return capabilities;
}

private static <ColumnSelectorStrategyClass extends ColumnSelectorStrategy> ColumnSelectorStrategyClass makeStrategy(
ColumnSelectorStrategyFactory<ColumnSelectorStrategyClass> strategyFactory,
String dimName,
DimensionSpec dimSpec,
ColumnCapabilities capabilities,
boolean hasExFn,
ColumnValueSelector selector
)
{
capabilities = getEffectiveCapabilities(dimName, capabilities, hasExFn);
capabilities = getEffectiveCapabilities(dimSpec, capabilities);
return strategyFactory.makeColumnSelectorStrategy(capabilities, selector);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
import io.druid.java.util.common.guava.Sequence;
import io.druid.java.util.common.guava.Sequences;
import io.druid.query.QueryInterruptedException;
import io.druid.query.dimension.BaseFilteredDimensionSpec;
import io.druid.query.dimension.DimensionSpec;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.BooleanFilter;
Expand Down Expand Up @@ -455,10 +456,17 @@ private DimensionSelector makeDimensionSelectorUndecorated(
}

if (columnDesc.getCapabilities().getType() == ValueType.LONG) {
// Filtered dimension specs are not supported on numerics
if (dimensionSpec instanceof BaseFilteredDimensionSpec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, these instanceof BaseFilteredDimensionSpec checks everywhere are kind of gnarly. There's way too many of them for comfort. Is there something better we can do? Maybe making them work is a better option if it's not too hard… why don't they, again? They should be able to decorate a wrapped dimension selector just fine, right?

Is it just that we don't know if a DimensionSpec's "decorate" method is actually going to get called, because getColumnValueSelectorFromDimensionSpec might skip it by going and making something like a LongColumnSelector instead? If so, we could address that by adding a "mustDecorate" method to DimensionSpec, and if it mustDecorate, then we force creation of a DimensionSelector (rather than whatever type the underlying column is).

I would also accept alternate solutions that don't involve instanceof checks all over the place.

return NullDimensionSelector.instance();
}
return new LongWrappingDimensionSelector(makeLongColumnSelector(dimension), extractionFn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to create a bad selector if extractionFn is null. Consider modifying LongWrappingDimensionSelector and FloatWrappingDimensionSelector to do something sensible even if extractionFn is null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, why didn't tests catch this? Is there a missing test case, or is there some reason why this method won't ever get called with a null extractionFn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only case these would get created is if a query has a dimension spec on a base numeric column but with an extractionFn applied, so exFn was never null

}

if (columnDesc.getCapabilities().getType() == ValueType.FLOAT) {
if (dimensionSpec instanceof BaseFilteredDimensionSpec) {
return NullDimensionSelector.instance();
}
return new FloatWrappingDimensionSelector(makeFloatColumnSelector(dimension), extractionFn);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
import io.druid.java.util.common.guava.Sequence;
import io.druid.java.util.common.guava.Sequences;
import io.druid.query.QueryInterruptedException;
import io.druid.query.dimension.BaseFilteredDimensionSpec;
import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.query.dimension.DimensionSpec;
import io.druid.query.extraction.ExtractionFn;
Expand Down Expand Up @@ -359,9 +360,16 @@ public DimensionSelector makeDimensionSelector(
ColumnCapabilities capabilities = getColumnCapabilities(dimension);
if (capabilities != null) {
if (capabilities.getType() == ValueType.LONG) {
if (dimensionSpec instanceof BaseFilteredDimensionSpec) {
// Filtered dimension specs are not supported on numerics
return dimensionSpec.decorate(NullDimensionSelector.instance());
}
return new LongWrappingDimensionSelector(makeLongColumnSelector(dimension), extractionFn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar issues to QueryableIndexStorageAdapter with null extractionFns.

}
if (capabilities.getType() == ValueType.FLOAT) {
if (dimensionSpec instanceof BaseFilteredDimensionSpec) {
return dimensionSpec.decorate(NullDimensionSelector.instance());
}
return new FloatWrappingDimensionSelector(makeFloatColumnSelector(dimension), extractionFn);
}
}
Expand Down
Loading