Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] domainSplit painless extension not available in runtime fields #71946

Open
davidkyle opened this issue Apr 20, 2021 · 7 comments
Open

[ML] domainSplit painless extension not available in runtime fields #71946

davidkyle opened this issue Apr 20, 2021 · 7 comments
Labels
:ml Machine learning Team:ML Meta label for the ML team

Comments

@davidkyle
Copy link
Member

The ml plugin defines a custom painless extension function domainSplit which splits a URL into the highest registered domain and subdomains. The function can be called via script_fields in a search but if called in runtime_mappings the script errors with:
Unknown call [domainSplit] with [[org.elasticsearch.painless.node.EDot@156a1756]] arguments

To Reproduce

PUT /domain-split-test
{
  "mappings":{
    "properties": {
      "url": {
        "type": "keyword" 
      }
    }
  }
}

POST /domain-split-test/_doc
{
  "url":"www.elastic.co"
}

# OK with script fields 
GET /domain-split-test/_search
{
  "script_fields":{
    "sub":{
      "script":"return domainSplit(doc['url'].value).get(0)"
    },
    "hrd":{
      "script":"return domainSplit(doc['url'].value).get(1);"
    }
  }
}

# Error using RT mappings
GET /domain-split-test/_search
{
  "runtime_mappings":{
    "sub":{
      "type": "keyword",
      "script":"emit(domainSplit(doc['url'].value).get(0))"
    },
    "hrd":{
      "type": "keyword",
      "script":"emit(domainSplit(doc['url'].value).get(1))"
    }
  }
}
@davidkyle davidkyle added the :ml Machine learning label Apr 20, 2021
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Apr 20, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@javanna
Copy link
Member

javanna commented Apr 20, 2021

I had a look at this, and it looks like the domainSplit function is registered by the MachineLearningPainlessExtension and enhances the existing script_field context. Runtime fields have different contexts, one per data type. It is possible to enhance the existing runtime fields contexts as well in the same way. I wonder if this is needed, and if so if the function should be built-in.

@javanna javanna added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Apr 20, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Apr 20, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@droberts195
Copy link
Contributor

#36359 has been open for years to get the domainSplit function moved to core Elasticsearch, and it's on the list of TODO items in the meta issue #48986.

@davidkyle
Copy link
Member Author

A Registered domain ingest processor has been added to the 7.13 release in #67611. The implementation is different to the one in the ml plugin as it uses org.apache.http.conn.util.PublicSuffixMatcher although both use the same public suffix list. If domainSplit'() becomes a built-in function it would make sense to use just one of the implementations throughout.

@javanna
Copy link
Member

javanna commented Apr 21, 2021

Thanks for the context: I have nothing against adding the function to the runtime fields contexts, but it looks like it may needed in other places too and this was raised before. I will leave it for the @elastic/es-core-infra team to discuss and decide.

@stu-elastic
Copy link
Contributor

Let's move this into the core whitelists, as there is nothing core here, I'm going to remove the scripting label. Please reach out if you need help with the work.

@stu-elastic stu-elastic removed the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label May 27, 2021
@elasticmachine elasticmachine removed the Team:Core/Infra Meta label for core/infra team label May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning Team:ML Meta label for the ML team
Projects
None yet
Development

No branches or pull requests

5 participants