Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] separate pivot and extract function interface #58744

Merged
merged 12 commits into from
Jul 14, 2020

Conversation

hendrikmuhs
Copy link

@hendrikmuhs hendrikmuhs commented Jun 30, 2020

separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function. This opens
the possibility to add more functions to transform.

cleaned up version of #55287 after removing experimental code

piggy backed fixes:

  • when running geo tile group_by it could fail due to query clause limit (unreleased)
  • new style page size using settings was not validating limit of 10k (7.8)

Review hints

  • the most important abstraction is the Function interface
  • change collection - the essential part for continuous transform - has been abstracted as Function.ChangeCollector, all pivot related parts are now in CompositeBucketsChangeCollector

ToDo's

Must

  • abstract progress gatherer

Found problems not introduced by this PR

  • update should validate ingest pipeline
  • GeoTileFieldCollector can not run with pagesize > 1024
  • validateConfig/Query should use ValidationException

@hendrikmuhs hendrikmuhs marked this pull request as ready for review June 30, 2020 15:02
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to read through it a second time.

The separation of "function" from "data" gives us flexibility. But, it seems we might be sacrificing polymorphism. I am unsure the best way to reconcile this right now as it is designed.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like the redesign it makes the code cleaner

static Map<String, FieldCollector> createFieldCollectors(Map<String, SingleGroupSource> groups, String synchronizationField) {
Map<String, FieldCollector> fieldCollectors = new HashMap<>();

for (Entry<String, SingleGroupSource> entry : groups.entrySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what we are gaining by not having SingleGroupSource have a createFieldCollector method. We would avoid this switch statement.

Are we just trying to keep from adding classes to core?

Copy link
Author

@hendrikmuhs hendrikmuhs Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it seems unavoidable, config and implementation should be separate, like PivotConfig and Pivot. so far it was simple enough to put in into ...GroupSource (which was totally ok). With the re-factoring it gets more complicated and there are already plans, that will add more code (maybe one day we end up with 1 class per GroupSource).

Yes keeping code away from core is a reason. If I would add it the implementation to ...GroupSource, I would not only need to move the code there, but also the collector interface.

I think this change is similar to what is done in other places. The boiler-plate (switch) isn't great, but I do not see a need for something registry-style (like aggregations), at least at the moment.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

👍 nice change

@hendrikmuhs hendrikmuhs force-pushed the transform-refactor-functions-2 branch from b4bb7cd to 12601c2 Compare July 13, 2020 11:29
@hendrikmuhs hendrikmuhs merged commit 3280bd2 into elastic:master Jul 14, 2020
@hendrikmuhs hendrikmuhs deleted the transform-refactor-functions-2 branch July 14, 2020 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants