Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML-DataFrame] add a stats endpoint #35911

Merged
merged 6 commits into from
Nov 28, 2018

Conversation

hendrikmuhs
Copy link

Feature Branch PR

add a stats endpoint for data frame jobs:

GET _xpack/feature_index_builder/jobs/_all/_stats

(instead of _all you can specify an exact/explicit job id)
example output:

{
  "count" : 2,
  "jobs" : [
    {
      "id" : "pivot_reviews_4",
      "state" : {
        "job_state" : "stopped"
      },
      "stats" : {
        "pages_processed" : 0,
        "documents_processed" : 0,
        "documents_indexed" : 0,
        "trigger_count" : 0
      }
    },
    {
      "id" : "pivot_reviews_5",
      "state" : {
        "job_state" : "indexing",
        "current_position" : {
          "business" : "a_xdfGzrJ1mx3mBx6m14Ebhw",
          "reviewer" : "a_U2dPmHaUtlcjDhJjX7oN3w"
        }
      },
      "stats" : {
        "pages_processed" : 330,
        "documents_processed" : 330001,
        "documents_indexed" : 329000,
        "trigger_count" : 2
      }
    }
  ]
}

Notes:

  • the path (_xpack/feature...) will be renamed in a separate PR.
  • the output for state will be changed to make it more user-friendly in an upcoming PR

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

}

public DataFrameJobStateAndStats(String id, FeatureIndexBuilderJobState state, DataFrameIndexerJobStats stats) {
this.id = id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems all 3 fields should not be null. Add Objects.requireNotNull checks? (Btw, just realised the main reason I like those is the way they serve as documentation)

}

public Response() {
super(Collections.emptyList(), Collections.emptyList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also init jobsStateAndStats to empty list?

builder.startObject();
builder.field(COUNT.getPreferredName(), jobsStateAndStats.size());
// XContentBuilder does not support passing the params object for Iterables
builder.field(JOBS.getPreferredName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to simply do builder.field(JOBS.getPreferredName(), jobsStateAndStats) here. Unless we need to use Params it should be fine.


@Override
protected void taskOperation(Request request, FeatureIndexBuilderJobTask task, ActionListener<Response> listener) {
List<DataFrameJobStateAndStats> jobsStateAndStats = Collections.emptyList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be simpler if instead of declaring the return value we call the listener once for each branch directly like:

listener.onResponse(CollectionsSingletonList(jobStateAndStats)

etc. Just a suggestion, I know this is quite subjective :-)


if (pTasksMeta != null) {
// If the request was for _all jobs, we need to look through the
// list of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to squash this comment in 2 lines

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eclipse formatting drives me crazy! -> #35984

import java.util.List;
import java.util.Objects;

public class GetDataFrameJobsStatsAction extends Action<GetDataFrameJobsStatsAction.Response> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anomaly detection, the equivalent action is called GetJobStatsAction where job is singular. We might want to be consistent but I don't feel strongly about it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with @droberts195 on a previous PR and we went all for plural, all the endpoint have changed to .../jobs/{id}/... GetDataFrameJob changed to GetDataFrameJobs, etc.

Anyway, it's a good point and I suggest to keep it as is for now but have a session about naming where we can go over all endpoints and then do a renaming PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ML actions are inconsistent in this respect: we have GetJobsStatsAction and GetDatafeedsStatsAction but GetJobStatsActionRequest and RestGetJobStatsAction. I think plural is correct.

@hendrikmuhs
Copy link
Author

hendrikmuhs commented Nov 28, 2018

addressed the review comments.

As #35471 has been merged yesterday, the stats API is going to break with the next merge, I will therefore do a master merge now and add fixes as part of this PR.

@hendrikmuhs
Copy link
Author

@droberts195 @dimitris-athanasiou Thanks for your comments, this PR should be ready now.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dimitris-athanasiou
Copy link
Contributor

dimitris-athanasiou commented Nov 28, 2018

Oh, just realised this is missing the REST endpoint specification which also enables writing YAML tests.

@hendrikmuhs
Copy link
Author

@dimitris-athanasiou Yes, you are right. This is true for all endpoints, tracked in the team repro issue number 7.

@hendrikmuhs hendrikmuhs merged commit 7a78e1f into elastic:feature/fib Nov 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :ml Machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants