-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML-Dataframe] Data frame transforms config HLRC objects #39691
Conversation
Pinging @elastic/ml-core |
|
||
// TODO should this function be removed? | ||
public String getCron() { | ||
return "*"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot be set and not appear in the xContent representation. Should it be dropped?
} | ||
|
||
|
||
public DataFrameTransformConfig(final String id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note I've taken the headers out of this class
} | ||
|
||
while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) { | ||
ensureExpectedToken(XContentParser.Token.FIELD_NAME, token, parser::getTokenLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parsing is a little brittle as it does not handle unknown fields. We may to revisit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it should silently ignore unknown fields. This is quite important as the whole point of the HLRC is that it's not locked to a specific server version and does as best it can with a mismatched server version.
return id; | ||
} | ||
|
||
// TODO should this function be removed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the time this PR is merged we need to keep these config classes synchronised when changes are made to the server-side classes.
The corresponding method in the server side class is not used anywhere, and trivial to add back if ever required. Therefore I would remove it both here and in the server-side class. We don't want to create a BWC headache for ourselves after release in 7.1.
String id = (String) args[0]; | ||
String source = (String) args[1]; | ||
String dest = (String) args[2]; | ||
// default handling: if the user does not specify a query, we default to match_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave the default handling server-side only. It will improve cross-version compatibility if we ever change the default. So the client class can mindlessly store what it parses, preserving no entry as null
.
}); | ||
|
||
static { | ||
PARSER.declareString(optionalConstructorArg(), ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that ID
is optional. It is required and needs to be non-null. The reason it is an optional arg on the server side is because the user COULD provide it in the URL, but we will make that choice for the user when we transform the rest request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot 👓
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg; | ||
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg; | ||
|
||
public class DataFrameTransformConfig implements ToXContentObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, it would be nice for this class to provide a fluent builder API.
ensureExpectedToken(XContentParser.Token.START_OBJECT, token, parser::getTokenLocation); | ||
token = parser.nextToken(); | ||
ensureExpectedToken(XContentParser.Token.FIELD_NAME, token, parser::getTokenLocation); | ||
SingleGroupSource.Type groupType = SingleGroupSource.Type.valueOf(parser.currentName().toUpperCase(Locale.ROOT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this blows up horrifically if parser.currentName()
is not contained in the enum.
The expected json representation of the GroupConfig object is:
Where GROUP_TYPE is one of terms, histogram or date_histogram. I tried different approaches to lenient parsing and it quickly gets complicated and runs the risk of making most things validly parseable. I thought a good compromise and a relatively simple change was to leniently ignore fields in the root of the object if they are not a nested object type i.e. ignore strings, arrays etc.
In this case I pushed this change in 530f070 |
run elasticsearch-ci/2 |
We discussed the lenient parsing and decided that unknown groups should also be leniently parsed. Now the following will be parsed with a single
There is still a restriction that the object after the destination field must have a single field whose name is a valid group type.
In this case the parser will trip up on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
run elasticsearch-ci/bwc |
run elasticsearch-ci/1 |
Adds the data frame transforms configuration objects to the HLRC.