-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch from using docvalue_fields to extracting values from _source #44062
Conversation
where applicable. Doing this means parsing the _source and handling the numbers parsing just like Elasticsearch is doing it when it's indexing a document. This also introduces a minor limitation: aliases type of fields that are NOT part of a tree of sub-fields will not be able to be retrieved anymore. field_caps API doesn't shed any light into a field being an alias or not and at _source parsing time there is no way to know if a root field is an alias or not. Fields of the type "a.b.c.alias" can be extracted from docvalue_fields, only if the field they point to can be extracted from docvalue_fields. Also, not all fields in a hierarchy of fields can be evaluated to being an alias.
Pinging @elastic/es-search |
@elasticmachine run elasticsearch-ci/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work overall!
Left some comments but nothing major.
Also, is there a need to have the isAggregatable
as an attribute of EsField
class? I was thinking it can just delegate to dataType.hasDocValues(), or there is some distinction between them?
x-pack/plugin/sql/qa/src/main/java/org/elasticsearch/xpack/sql/qa/FieldExtractorTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/qa/src/main/java/org/elasticsearch/xpack/sql/qa/FieldExtractorTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/qa/src/main/java/org/elasticsearch/xpack/sql/qa/FieldExtractorTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/qa/src/main/java/org/elasticsearch/xpack/sql/qa/FieldExtractorTestCase.java
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/xpack/sql/execution/search/extractor/FieldHitExtractor.java
Outdated
Show resolved
Hide resolved
@@ -520,9 +522,26 @@ public void resolveAsSeparateMappings(String indexWildcard, String javaRegex, bo | |||
} | |||
EsField field = indexFields.flattedMapping.get(fieldName); | |||
if (field == null || (invalidField != null && (field instanceof InvalidMappedField) == false)) { | |||
int dot = fieldName.lastIndexOf('.'); | |||
/* | |||
* Looking up the "tree" at the parent fields here to see if the field is an alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline that this isn't a very solid way of checking whether a field is an alias. The fact that we don't return parent information about field aliases is kind of an 'accident' of the field caps logic, and I don't think it was an intentional API decision.
I filed #44298 to propose adding explicit field alias information to the field caps API, which would help make this more robust. It might also give enough information to be able to resolve + extract the alias field values from _source
.
@matriv thank you for the review. I've addressed those and added more tests and covered some more scenarios in code. Would you mind having a second look, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work with the extensive testing and code comments.
Left a few more comments.
`date`, `scaled_float`, `geo_point`, `geo_shape`. | ||
Most of {es-sql}'s columns are retrieved from the document's `_source` and there is no attempt to get the columns content from | ||
`docvalue_fields` not even in the case <<mapping-source-field,`_source`>> field is disabled in the mapping explicitly. | ||
If a column, for which there is no source stored, is asked for in a query, {es-sql} will not return it. Field types NOT returned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe: Field types that don't follow this restriction are: .... since they are NOT returned from _source but from doc values.
@@ -84,6 +88,7 @@ public FieldHitExtractor(String name, DataType dataType, ZoneId zoneId, boolean | |||
|
|||
FieldHitExtractor(StreamInput in) throws IOException { | |||
fieldName = in.readString(); | |||
fullFieldName = in.readOptionalString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need some version check here to ensure backwards compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have this in place in other place in code? I mean, at the moment we enforce strict version matching for jdbc and, I think, for cli as well....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is not for server-client mismatch but for version mismatch between nodes during a rolling upgrade.
return l; | ||
Number result = null; | ||
try { | ||
result = NumberType.valueOf(dataType.esType.toUpperCase(Locale.ROOT)).parse(values, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we extract this to a method in DataType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NumberType.valueOf(dataType.esType.toUpperCase(Locale.ROOT))
so it's a class method of a DataType and one can do:
dataType.numberType()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, please, it's minor and future oriented in case we want to reuse it somewhere else.
} | ||
} | ||
while (rootField.parent() != null) { | ||
fullFieldName = rootField.parent().field().getName() + "." + fullFieldName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but I think it makes sense to make fullFieldName
a StringBuilder
because of the potential iterations in the while loop.
// Only if the field is not an alias (in which case it will be taken out from docvalue_fields if it's isAggregatable()), | ||
// go up the tree of parents until a non-object (and non-nested) type of field is found and use that specific parent | ||
// as the field to extract data from, from _source. We do it like this because sub-fields are not in the _source, only | ||
// the root field to which those sub-fields belong to, are. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to add also an example with a subfield so it's more clear for someone that reads it in the future.
@astefan Should we also add a sentence in limitations page to say that if there are many docvalue fields (like date) a use can still experience the issue when doing a |
.../plugin/sql/src/main/java/org/elasticsearch/xpack/sql/querydsl/container/QueryContainer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Once again: great work!
@elasticmachine run elasticsearch-ci/2 |
…lastic#44062) * Switch from using docvalue_fields to extracting values from _source where applicable. Doing this means parsing the _source and handling the numbers parsing just like Elasticsearch is doing it when it's indexing a document. * This also introduces a minor limitation: aliases type of fields that are NOT part of a tree of sub-fields will not be able to be retrieved anymore. field_caps API doesn't shed any light into a field being an alias or not and at _source parsing time there is no way to know if a root field is an alias or not. Fields of the type "a.b.c.alias" can be extracted from docvalue_fields, only if the field they point to can be extracted from docvalue_fields. Also, not all fields in a hierarchy of fields can be evaluated to being an alias. (cherry picked from commit 8bf8a05)
…44062) (#44804) * Switch from using docvalue_fields to extracting values from _source where applicable. Doing this means parsing the _source and handling the numbers parsing just like Elasticsearch is doing it when it's indexing a document. * This also introduces a minor limitation: aliases type of fields that are NOT part of a tree of sub-fields will not be able to be retrieved anymore. field_caps API doesn't shed any light into a field being an alias or not and at _source parsing time there is no way to know if a root field is an alias or not. Fields of the type "a.b.c.alias" can be extracted from docvalue_fields, only if the field they point to can be extracted from docvalue_fields. Also, not all fields in a hierarchy of fields can be evaluated to being an alias. (cherry picked from commit 8bf8a05)
Doing this means parsing the
_source
and handling any fields that need to be parsed and have its outcome potentially changed by the parsers (for example, the numbers whencoerce=true
), just like Elasticsearch is doing it when it's indexing a document.Fields that will not be taken from
_source
(but fromdocvalue_fields
) are:keyword
,date
,scaled_float
,geo_point
,geo_shape
.This change also introduces a limitation:
alias
es type of fields that are NOT part of a tree of sub-fields will not be able to be retrieved anymore.field_caps
API doesn't shed any light into a field being analias
or not and at_source
parsing time there is no way to know if aroot field is an alias or not. Fields of the typea.b.c.alias
can be extracted fromdocvalue_fields
, only if the field they point to can beextracted from
docvalue_fields
. Also, not all fields in a hierarchy of fields can be evaluated to being an alias.Fixes #41852.