[ML-Dataframe] Data frame transforms config HLRC objects #39691

davidkyle · 2019-03-05T11:26:44Z

Adds the data frame transforms configuration objects to the HLRC.

elasticmachine · 2019-03-05T11:26:45Z

Pinging @elastic/ml-core

davidkyle · 2019-03-05T11:28:03Z

...el/src/main/java/org/elasticsearch/client/dataframe/transforms/DataFrameTransformConfig.java

+
+    // TODO should this function be removed?
+    public String getCron() {
+        return "*";


This cannot be set and does not appear in the xContent representation. Should it be dropped?

davidkyle · 2019-03-05T11:28:50Z

...el/src/main/java/org/elasticsearch/client/dataframe/transforms/DataFrameTransformConfig.java

+    }
+
+
+    public DataFrameTransformConfig(final String id,


Note: I've taken the headers out of this class.

davidkyle · 2019-03-05T11:32:17Z

...igh-level/src/main/java/org/elasticsearch/client/dataframe/transforms/pivot/GroupConfig.java

+        }
+
+        while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
+            ensureExpectedToken(XContentParser.Token.FIELD_NAME, token, parser::getTokenLocation);


This parsing is a little brittle as it does not handle unknown fields. We may need to revisit this.

Yes, I think it should silently ignore unknown fields. This is quite important, as the whole point of the HLRC is that it's not locked to a specific server version and does the best it can with a mismatched server version.

droberts195 · 2019-03-05T11:35:05Z

...el/src/main/java/org/elasticsearch/client/dataframe/transforms/DataFrameTransformConfig.java

+        return id;
+    }
+
+    // TODO should this function be removed?


From the time this PR is merged we need to keep these config classes synchronised when changes are made to the server-side classes.

The corresponding method in the server side class is not used anywhere, and trivial to add back if ever required. Therefore I would remove it both here and in the server-side class. We don't want to create a BWC headache for ourselves after release in 7.1.

droberts195 · 2019-03-05T11:36:46Z

...el/src/main/java/org/elasticsearch/client/dataframe/transforms/DataFrameTransformConfig.java

+                    String id = (String) args[0];
+                    String source = (String) args[1];
+                    String dest = (String) args[2];
+                    // default handling: if the user does not specify a query, we default to match_all


I would leave the default handling server-side only. It will improve cross-version compatibility if we ever change the default. The client class can then simply store what it parses, preserving a missing entry as null.
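
A minimal sketch of this suggestion, assuming a hypothetical, simplified client class (`ClientTransformConfigSketch`, with the query held as a plain string rather than a real query object). It is not the actual HLRC class; it only illustrates storing what was parsed and leaving an absent query as null rather than substituting match_all:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a client-side config class.
final class ClientTransformConfigSketch {
    private final String id;
    private final String query; // null means "not specified"; the server applies its own default

    ClientTransformConfigSketch(String id, String query) {
        this.id = id;
        this.query = query; // deliberately no fallback to match_all on the client
    }

    String getQuery() {
        return query;
    }

    // Emit only the fields that were actually set, so a null query is omitted.
    Map<String, Object> toMap() {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("id", id);
        if (query != null) {
            out.put("query", query);
        }
        return out;
    }
}
```

With this shape, a config parsed from a response that has no query round-trips without one, so a later change to the server-side default would not be masked by the client.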

benwtrent · 2019-03-05T13:30:41Z

...el/src/main/java/org/elasticsearch/client/dataframe/transforms/DataFrameTransformConfig.java

+                });
+
+    static {
+        PARSER.declareString(optionalConstructorArg(), ID);


I do not think that ID is optional. It is required and needs to be non-null. The reason it is an optional arg on the server side is because the user COULD provide it in the URL, but we will make that choice for the user when we transform the rest request.

Good spot 👓
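
For illustration only, a sketch of what a required, non-null id could look like on the client side. The class name `RequiredIdConfigSketch` is hypothetical; in the real parser declaration this would presumably mean using `constructorArg()` rather than `optionalConstructorArg()` for ID, but that is an assumption about the eventual change:

```java
import java.util.Objects;

// Hypothetical, simplified client-side class in which the id is mandatory.
final class RequiredIdConfigSketch {
    private final String id;

    RequiredIdConfigSketch(String id) {
        // Fail fast with a clear message instead of carrying a null id around.
        this.id = Objects.requireNonNull(id, "id must not be null");
    }

    String getId() {
        return id;
    }
}
```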

benwtrent · 2019-03-05T13:31:49Z

...el/src/main/java/org/elasticsearch/client/dataframe/transforms/DataFrameTransformConfig.java

+import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg;
+import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg;
+
+public class DataFrameTransformConfig implements ToXContentObject {


Personally, it would be nice for this class to provide a fluent builder API.
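
A rough sketch of what such a fluent builder might look like. The class `TransformConfigBuilderSketch` and its setter names are hypothetical; the fields mirror the id, source and dest arguments visible in the diff, with plain strings standing in for the real types:

```java
import java.util.Objects;

// Hypothetical sketch of a config class with a fluent builder; not the real client class.
final class TransformConfigBuilderSketch {
    private final String id;
    private final String source;
    private final String dest;

    private TransformConfigBuilderSketch(Builder builder) {
        this.id = Objects.requireNonNull(builder.id, "id must not be null");
        this.source = builder.source;
        this.dest = builder.dest;
    }

    static Builder builder() {
        return new Builder();
    }

    String getId() { return id; }
    String getSource() { return source; }
    String getDest() { return dest; }

    static final class Builder {
        private String id;
        private String source;
        private String dest;

        // Each setter returns the builder so that calls can be chained.
        Builder setId(String id) { this.id = id; return this; }
        Builder setSource(String source) { this.source = source; return this; }
        Builder setDest(String dest) { this.dest = dest; return this; }

        TransformConfigBuilderSketch build() {
            return new TransformConfigBuilderSketch(this);
        }
    }
}
```

Usage would then read as `TransformConfigBuilderSketch.builder().setId("t1").setSource("src").setDest("dst").build()`.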

benwtrent · 2019-03-05T13:36:23Z

...igh-level/src/main/java/org/elasticsearch/client/dataframe/transforms/pivot/GroupConfig.java

+            ensureExpectedToken(XContentParser.Token.START_OBJECT, token, parser::getTokenLocation);
+            token = parser.nextToken();
+            ensureExpectedToken(XContentParser.Token.FIELD_NAME, token, parser::getTokenLocation);
+            SingleGroupSource.Type groupType = SingleGroupSource.Type.valueOf(parser.currentName().toUpperCase(Locale.ROOT));


FWIW, this blows up horrifically if parser.currentName() is not contained in the enum.
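
To make the failure mode concrete: `Enum.valueOf` throws `IllegalArgumentException` for a name that is not a constant of the enum. The sketch below uses a stand-in enum (`GroupTypeLookupSketch.Type`, not the real `SingleGroupSource.Type`) and a hypothetical `fromString` helper that returns an empty `Optional` instead of throwing; it is one possible approach, not necessarily what the PR ends up doing:

```java
import java.util.Locale;
import java.util.Optional;

final class GroupTypeLookupSketch {
    // Stand-in for SingleGroupSource.Type, using the group types named in this discussion.
    enum Type { TERMS, HISTOGRAM, DATE_HISTOGRAM }

    // Returns empty instead of throwing when the name is not a known group type.
    static Optional<Type> fromString(String name) {
        try {
            return Optional.of(Type.valueOf(name.toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```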

davidkyle · 2019-03-05T18:44:44Z

The expected JSON representation of the GroupConfig object is:

{
    destination-field1 : { GROUP_TYPE : { type specific fields } },
    destination-field2 : { GROUP_TYPE : { type specific fields } },
   ...
}

where GROUP_TYPE is one of terms, histogram or date_histogram. I tried different approaches to lenient parsing, but it quickly gets complicated and runs the risk of making almost anything parse as valid. I thought a good compromise, and a relatively simple change, was to leniently ignore fields in the root of the object if they are not nested objects, i.e. ignore strings, arrays, etc.

{
    destination-field1 : { GROUP_TYPE : { type specific fields } },
    some-new-field : foo
   ...
}

In this case some-new-field is ignored.

I pushed this change in 530f070

davidkyle · 2019-03-06T08:51:31Z

run elasticsearch-ci/2

davidkyle · 2019-03-06T14:51:13Z

We discussed the lenient parsing and decided that unknown groups should also be leniently parsed. Now the following will be parsed with a single terms group and unknown-group-type will be ignored.

{
    destination-field-foo : { terms : { type specific fields } },
    destination-field-bar : { unknown-group-type : { type specific fields } },   
}

There is still a restriction that the object after the destination field must have a single field whose name is a valid group type.

{
    destination-field-foo : { 
        some-new-field : bar,
        terms : { type specific fields } 
    }
}

In this case the parser will trip up on some-new-field and the following terms object will not be parsed.
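
The rules described above can be sketched over an already-parsed map, which is a simplification: the real code works on a streaming XContentParser. The class `LenientGroupParsingSketch` is hypothetical, and the choice to skip an entry whose first field is not a known group type, rather than throw, is an assumption, since the thread does not say exactly how the parser fails in that case:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class LenientGroupParsingSketch {
    private static final Set<String> KNOWN_TYPES =
        new HashSet<>(Arrays.asList("terms", "histogram", "date_histogram"));

    // Returns destination field -> group type for every entry the sketch understands.
    static Map<String, String> parseGroups(Map<String, Object> root) {
        Map<String, String> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : root.entrySet()) {
            if (!(entry.getValue() instanceof Map)) {
                continue; // strings, arrays and other non-object values at the root are ignored
            }
            Map<?, ?> inner = (Map<?, ?>) entry.getValue();
            if (inner.isEmpty()) {
                continue;
            }
            // Only the first field is inspected: it must name a valid group type.
            String type = String.valueOf(inner.keySet().iterator().next()).toLowerCase(Locale.ROOT);
            if (KNOWN_TYPES.contains(type)) {
                groups.put(entry.getKey(), type);
            }
            // Unknown group types, or an unexpected leading field, cause the entry to be skipped.
        }
        return groups;
    }
}
```

The sketch relies on insertion-ordered maps (for example `LinkedHashMap`) so that "first field" matches the order in the document.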

hendrikmuhs

LGTM

davidkyle · 2019-03-07T09:54:05Z

run elasticsearch-ci/bwc

davidkyle · 2019-03-07T15:00:11Z

run elasticsearch-ci/1

davidkyle added >enhancement :ml Machine learning v8.0.0 v7.2.0 labels Mar 5, 2019

droberts195 changed the title ~~[ML-Dataframe] Dataframe config HLRC objects~~ [ML-Dataframe] Data frame transforms config HLRC objects Mar 5, 2019

davidkyle commented Mar 5, 2019

droberts195 reviewed Mar 5, 2019

benwtrent reviewed Mar 5, 2019

davidkyle force-pushed the df-hlrc-objs branch from 8157eaa to 14a012c Compare March 5, 2019 14:14

benwtrent approved these changes Mar 5, 2019

davidkyle force-pushed the df-hlrc-objs branch from 14a012c to 530f070 Compare March 5, 2019 18:40

hendrikmuhs approved these changes Mar 7, 2019

davidkyle added 4 commits March 7, 2019 10:44

some config objs

d1b22db

Remove unused getter + other feedback

4466a1b

Lenient parsing for GroupConfig

ce08a38

Allow unknown group types in lenient parser

31b3b9f

davidkyle force-pushed the df-hlrc-objs branch from 63cbeb8 to 31b3b9f Compare March 7, 2019 10:44

Fix test error where 2 aggs can have the same name

cfae00f

davidkyle merged commit 2f07402 into elastic:master Mar 8, 2019

davidkyle deleted the df-hlrc-objs branch March 8, 2019 09:04

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Mar 8, 2019

[ML-Dataframe] Data frame config HLRC objects (elastic#39691)

dc34f28

This was referenced Mar 8, 2019

[ML-Dataframe] Data frame config HLRC objects #39825

Merged

[ML-Dataframe] Data frame config HLRC objects (#39691) #39906

Closed

codebrain mentioned this pull request Aug 5, 2019

[meta] 7.2 Release elastic/elasticsearch-net#3980

Closed

37 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
