Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML-Dataframe] Data frame transforms config HLRC objects #39691

Merged
merged 5 commits into from
Mar 8, 2019

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented Mar 5, 2019

Adds the data frame transforms configuration objects to the HLRC.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@droberts195 droberts195 changed the title [ML-Dataframe] Dataframe config HLRC objects [ML-Dataframe] Data frame transforms config HLRC objects Mar 5, 2019

// TODO should this function be removed?
public String getCron() {
return "*";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be set and not appear in the xContent representation. Should it be dropped?

}


public DataFrameTransformConfig(final String id,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note I've taken the headers out of this class

}

while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
ensureExpectedToken(XContentParser.Token.FIELD_NAME, token, parser::getTokenLocation);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parsing is a little brittle as it does not handle unknown fields. We may to revisit this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it should silently ignore unknown fields. This is quite important as the whole point of the HLRC is that it's not locked to a specific server version and does as best it can with a mismatched server version.

return id;
}

// TODO should this function be removed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the time this PR is merged we need to keep these config classes synchronised when changes are made to the server-side classes.

The corresponding method in the server side class is not used anywhere, and trivial to add back if ever required. Therefore I would remove it both here and in the server-side class. We don't want to create a BWC headache for ourselves after release in 7.1.

String id = (String) args[0];
String source = (String) args[1];
String dest = (String) args[2];
// default handling: if the user does not specify a query, we default to match_all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave the default handling server-side only. It will improve cross-version compatibility if we ever change the default. So the client class can mindlessly store what it parses, preserving no entry as null.

});

static {
PARSER.declareString(optionalConstructorArg(), ID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that ID is optional. It is required and needs to be non-null. The reason it is an optional arg on the server side is because the user COULD provide it in the URL, but we will make that choice for the user when we transform the rest request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot 👓

import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg;
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg;

public class DataFrameTransformConfig implements ToXContentObject {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, it would be nice for this class to provide a fluent builder API.

ensureExpectedToken(XContentParser.Token.START_OBJECT, token, parser::getTokenLocation);
token = parser.nextToken();
ensureExpectedToken(XContentParser.Token.FIELD_NAME, token, parser::getTokenLocation);
SingleGroupSource.Type groupType = SingleGroupSource.Type.valueOf(parser.currentName().toUpperCase(Locale.ROOT));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this blows up horrifically if parser.currentName() is not contained in the enum.

@davidkyle
Copy link
Member Author

davidkyle commented Mar 5, 2019

The expected json representation of the GroupConfig object is:

{
    destination-field1 : { GROUP_TYPE : { type specific fields } },
    destination-field2 : { GROUP_TYPE : { type specific fields } },
   ...
}

Where GROUP_TYPE is one of terms, histogram or date_histogram. I tried different approaches to lenient parsing and it quickly gets complicated and runs the risk of making most things validly parseable. I thought a good compromise and a relatively simple change was to leniently ignore fields in the root of the object if they are not a nested object type i.e. ignore strings, arrays etc.

{
    destination-field1 : { GROUP_TYPE : { type specific fields } },
    some-new-field : foo
   ...
}

In this case some-new-field is ignored.

I pushed this change in 530f070

@davidkyle
Copy link
Member Author

run elasticsearch-ci/2

@davidkyle
Copy link
Member Author

We discussed the lenient parsing and decided that unknown groups should also be leniently parsed. Now the following will be parsed with a single terms group and unknown-group-type will be ignored.

{
    destination-field-foo : { terms : { type specific fields } },
    destination-field-bar : { unknown-group-type : { type specific fields } },   
}

There is still a restriction that the object after the destination field must have a single field whose name is a valid group type.

{
    destination-field-foo : { 
        some-new-field : bar
        terms : { type specific fields } 
    }
}

In this case the parser will trip up on some-new-field and the following terms object will not be parsed.

Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davidkyle
Copy link
Member Author

run elasticsearch-ci/bwc

@davidkyle
Copy link
Member Author

run elasticsearch-ci/1

@davidkyle davidkyle merged commit 2f07402 into elastic:master Mar 8, 2019
@davidkyle davidkyle deleted the df-hlrc-objs branch March 8, 2019 09:04
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Mar 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants