Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ARRAY_NORMALIZE function #16159

Merged
merged 1 commit into from
Jun 3, 2021
Merged

Conversation

yuanzhanhku
Copy link
Contributor

fixes #16134

Normalizes array x by dividing each element by the p-norm of the array.
It is equivalent to
TRANSFORM(array, v -> v / REDUCE(array, 0, (a, v) -> a + POW(ABS(v), p), a -> POW(a, 1 / p)).
But the reduce part is only executed once to improve performance.
Returns null if the array is null or there are null array elements.

Test plan

  • Added unit tests in src/test/java/com/facebook/presto/operator/scalar/TestArrayNormalizeFunction.java.
  • Also started a presto server locally and tested the change manually.
== RELEASE NOTES ==

General Changes
* Added array_normalize function

@yuanzhanhku yuanzhanhku requested a review from highker May 26, 2021 17:30
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 26, 2021

CLA Signed

The committers are authorized under a signed CLA.

  • ✅ Zhan Yuan (f50c6ed85f489bcea4f3f0f3e7e529e9c00bb813)

@yuanzhanhku yuanzhanhku force-pushed the master branch 3 times, most recently from a087f38 to 254cb32 Compare May 27, 2021 23:52
Copy link
Contributor

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits only

@highker highker self-assigned this May 31, 2021
@yuanzhanhku yuanzhanhku force-pushed the master branch 4 times, most recently from a2b1a3e to f50c6ed Compare June 2, 2021 01:06
Copy link
Contributor

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more nit

presto-docs/src/main/sphinx/functions/array.rst Outdated Show resolved Hide resolved
@yuanzhanhku yuanzhanhku requested a review from highker June 2, 2021 16:31
@yuanzhanhku yuanzhanhku force-pushed the master branch 2 times, most recently from 3ef8fe2 to 0c51328 Compare June 2, 2021 19:13
Normalizes array ``x`` by dividing each element by the p-norm of the array.
It is equivalent to
  `TRANSFORM(array, v -> v / REDUCE(array, 0, (a, v) -> a + POW(ABS(v), p), a -> POW(a, 1 / p))`.
But the reduce part is only executed once to improve performance.
Returns null if the array is null or there are null array elements.
@highker highker merged commit 9095346 into prestodb:master Jun 3, 2021
double pNorm = 0;
for (int i = 0; i < elementCount; i++) {
if (block.isNull(i)) {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should work like an "Aggregation" so ignore nulls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Before I prepared this commit, I discussed with erikbrinkman who requested this function in issue #16134. There was no specific requirement for how to handle null element.

Technically, this is not an Aggregation function as it returns an array instead of a scalar value. Thus there is no absolute right and wrong answers for this custom function. We just need to make sure the function comment states the behavior correctly. If you think ignoring nulls is better, I am happy to change this behavior.

return blockBuilder.build();
}

private interface ValueAccessor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular thing of every operation adding their own classes for typed accessors is becoming too much code bloat. See if you can make it some kind of utility class at the top level. For example, I had to do something similar in MapUnionSum:

type.writeLong(blockBuilder, BigintOperators.add(type.getLong(block1, position1), type.getLong(block2, position2)));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree. I took a closer look at the Type interface. I think it would be possible to move the type specific accessors into the subclasses of Types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prepared #16209 to address this. Please see if the change makes sense.

@jainxrohit jainxrohit mentioned this pull request Jun 5, 2021
4 tasks
@ajaygeorge ajaygeorge mentioned this pull request Jun 9, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UDF Suggestions: ARRAY_NORMALIZE and ARRAY_AGG_SUM / ARRAY_AGG_AVERAGE
3 participants