[ML] Datafeed _preview should work with configs #70264

Closed
sophiec20 opened this issue Mar 10, 2021 · 3 comments · Fixed by #70836
Labels: >enhancement, :ml, Team:ML

Comments

@sophiec20
Contributor

sophiec20 commented Mar 10, 2021

The anomaly detection datafeed _preview currently requires that the datafeed has already been created. https://www.elastic.co/guide/en/elasticsearch/reference/7.11/ml-preview-datafeed.html

As an enhancement, it would be useful to preview the source data before creating the datafeed. For example:

  • To check whether relevant data exists before creating the job. This avoids unnecessary object creation and memory usage, and helps explain to users why no anomalies are being found
  • To check that the expected data will be analysed, which is useful when hand-crafting complex aggs, scripted fields, or filters
  • To simplify UI code that already performs a preview step of its own, and to make sure the results match

Being able to preview source data by supplying a potential config also aligns with POST _ml/data_frame/analytics/_explain and POST _transform/_preview, both of which can be performed prior to creating the job/transform.

sophiec20 added the >enhancement, :ml, Team:ML, and needs:triage labels Mar 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

davidkyle removed the needs:triage label Mar 11, 2021
@benwtrent
Member

In the UI wizard, neither the job nor the datafeed has been created yet, so the API will have to accept both configurations. Consequently, I am wondering whether this should live under anomaly_detectors/_preview rather than under the datafeed URL. The user will have to provide BOTH configurations, and we do prefer to think of datafeeds as dependent on anomaly detection jobs and not the other way around.

Something like:

POST|GET _ml/anomaly_detectors/_preview
{
  "job_config": {...},
  "datafeed_config":{...}
}

Then you can also do something like:

POST|GET _ml/anomaly_detectors/<job_id>/_preview
{
  "datafeed_config":{...}
}
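
A minimal sketch of how a client might choose between the two proposed request shapes. The helper name `build_preview_request` and the sample values are hypothetical, and the paths are only the ones suggested in this comment, not a confirmed API:

```python
from typing import Any, Dict, Optional, Tuple

# Path prefix as written in the proposal above (a suggestion, not a shipped endpoint).
BASE = "_ml/anomaly_detectors"


def build_preview_request(
    datafeed_config: Dict[str, Any],
    job_config: Optional[Dict[str, Any]] = None,
    job_id: Optional[str] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return (path, body) for one of the two proposed preview requests."""
    if job_id is not None:
        # The job already exists: address it by ID and send only the datafeed config.
        return f"{BASE}/{job_id}/_preview", {"datafeed_config": datafeed_config}
    if job_config is None:
        raise ValueError("job_config is required when no job_id is given")
    # Neither object exists yet: send both configurations in the body.
    body = {"job_config": job_config, "datafeed_config": datafeed_config}
    return f"{BASE}/_preview", body
```

Only the request is built here; nothing is sent to a cluster.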

@dimitris-athanasiou
Contributor

I think it makes sense to keep it under datafeeds. The response of these two APIs should be the same. And it is really a preview of what the datafeed sends to the job. Previewing an anomaly detector could imply more things.

I think it still makes sense for the request to expect a job_config and a datafeed_config. It would also be nice if it accepted a job_id instead of a job_config, so that it works with a job that has already been created and a datafeed that has not been created yet. That's a nice-to-have btw, definitely not a must.
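
As a rough illustration of the datafeed-scoped variant described in this comment, the body could carry a datafeed_config plus exactly one of job_config or job_id. The function name `datafeed_preview_body` is hypothetical, and placing job_id in the request body is an assumption, since the thread does not say where it would go:

```python
from typing import Any, Dict, Optional


def datafeed_preview_body(
    datafeed_config: Dict[str, Any],
    job_config: Optional[Dict[str, Any]] = None,
    job_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a preview body that names the job by config or by ID, never both."""
    # Exactly one of the two job references must be present.
    if (job_config is None) == (job_id is None):
        raise ValueError("supply exactly one of job_config or job_id")
    body: Dict[str, Any] = {"datafeed_config": datafeed_config}
    if job_config is not None:
        body["job_config"] = job_config
    else:
        # Assumption: job_id travels in the body; the thread leaves this open.
        body["job_id"] = job_id
    return body
```

Rejecting the case where both are supplied avoids any ambiguity about which job definition the preview should use.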

benwtrent self-assigned this Mar 24, 2021
benwtrent added a commit that referenced this issue Mar 26, 2021
Previously, both the datafeed and the job had to exist already for the `_preview` API to work.

With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job
without creating either of them. 

closes #70264
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Mar 26, 2021
…#70836)

closes elastic#70264
benwtrent added a commit that referenced this issue Mar 26, 2021
…#70927)

closes #70264