Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adds support for 0x, X'...', x'...' type hex strings in udf:encode #6118

Merged
merged 4 commits into from
Sep 3, 2020

Conversation

cprasad1
Copy link
Contributor

@cprasad1 cprasad1 commented Aug 27, 2020

Description

Earlier, the encode udf would not recognize hex strings of types belong to {0x, X'...', x'...'} while converting to ascii, utf8 and base 64 and would give nulls. Now it recognizes them and converts them accordingly from hex to other formats. Behavior is similar to https://dev.mysql.com/doc/refman/8.0/en/hexadecimal-literals.html

Testing done

Unit + query-validation-tests + manual

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@cprasad1 cprasad1 requested a review from a team as a code owner August 27, 2020 20:24
@ghost
Copy link

ghost commented Aug 27, 2020

It looks like @cprasad1 hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

@cprasad1 cprasad1 marked this pull request as draft August 27, 2020 20:32
@cprasad1 cprasad1 marked this pull request as ready for review August 27, 2020 20:38
@cprasad1
Copy link
Contributor Author

[clabot:check]

@ghost
Copy link

ghost commented Aug 27, 2020

@confluentinc It looks like @cprasad1 just signed our Contributor License Agreement. 👍

Always at your service,

clabot

@vpapavas vpapavas self-assigned this Aug 27, 2020
@@ -1,3 +1,4 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an accidental change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this change from the PR please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@vpapavas vpapavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @cprasad1 ! Congratz on your first PR!
Before merging we need to make sure that both all checks are green. We are still waiting for jenkins but there seems to be something wrong with the PR title as well (though it's not obvious to me what).

@@ -41,6 +41,15 @@ public void shouldEncodeHexToAscii() {
assertThat(udf.encode("31202b2031203d2031", "hex", "ascii"), is("1 + 1 = 1"));
assertThat(udf.encode("ce95cebbcebbceacceb4ceb1", "hex", "ascii"), is("������������"));
assertThat(udf.encode("c39c6265726d656e736368", "hex", "ascii"), is("��bermensch"));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the string is 0x0x? Or 0X? Or X' without a closing quote? Are there any characters that are not allowed with these formats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that you are worried about illegal hex literals (eg: 'p', 'x', '*' etc) in these formats. The function would throw an exception and we would get nulls in the end which was the default behavior before this was implemented. That is because this feature piggybacks on the pre-existing hex decoders that existed before these changes. I am adding a few more tests to demonstrate these corrupted formats.

Copy link
Contributor Author

@cprasad1 cprasad1 Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I noticed that the pre-existing encode udf couldn't deal with Hex of odd lengths (It would throw an exception). Now, it supports odd lengths by padding a "0" to the left in the 0x... format but would throw an exception with X'...' and x'...' formats which is consistent with https://dev.mysql.com/doc/refman/8.0/en/hexadecimal-literals.html. But I am not sure if we should support hex of odd lengths before without any of these special characters around them. It would just throw an exception like it used to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added more tests to demonstrate that behavior

@AlanConfluent
Copy link
Member

Not sure if this is part of a larger set of changes, but with this UDF, you already give the hint of what the string is encoded as, e.g. encode("4578616d706C6521", 'hex', 'ascii').

I thought that supporting a literal means that anywhere that you can use a normal ASCII string today, you could instead write 0x4578616d706C6521, without the need to explicitly call encode since the hint is in the "0x." I would assume that the internal representation is UTF8.

Copy link
Member

@vpapavas vpapavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cprasad1 cprasad1 closed this Sep 1, 2020
@cprasad1
Copy link
Contributor Author

cprasad1 commented Sep 1, 2020

Accidentally closed this. @agavra do we want to support the Hex literals via the encode() method or in general?

@cprasad1 cprasad1 reopened this Sep 1, 2020
@agavra
Copy link
Contributor

agavra commented Sep 1, 2020

I think this PR is a valuable improvement to the UDF, but the ticket (#5544) specifically refers to the ability to (as @AlanConfluent thought) write 0xdeadbeef anywhere and have it be recognized as a hex literal. You can see the corresponding MySQL documentation on HEX literals (linked in the above ticket): https://dev.mysql.com/doc/refman/8.0/en/hexadecimal-literals.html

@vpapavas
Copy link
Member

vpapavas commented Sep 3, 2020

My bad, when I asked @agavra about the ticket he pointed me to the encode UDF, so I thought that's what he meant. I know realize he meant to use the encode udf to support hex literals. We will still merge this PR since it is still valuable and a step in the right direction but NOT close the ticket. Do you agree? @AlanConfluent , @agavra ? If yes, can you give the green light please?

@agavra
Copy link
Contributor

agavra commented Sep 3, 2020

I haven't reviewed the PR, but I agree that the feature is valuable! Green light from my perspective (I trust your code review @vpapavas 😉 ).

@AlanConfluent
Copy link
Member

We will still merge this PR since it is still valuable and a step in the right direction but NOT close the ticket. Do you agree? @AlanConfluent , @agavra ? If yes, can you give the green light please?

Yes, that makes sense. The same logic can potentially just be applied in a different place and it's fine to have that as part of the UDF as well.

Copy link
Member

@AlanConfluent AlanConfluent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cprasad1
Copy link
Contributor Author

cprasad1 commented Sep 3, 2020

updated the PR description. Will merge as soon as checks are completed

@agavra agavra merged commit d492556 into confluentinc:master Sep 3, 2020
sarwarbhuiyan pushed a commit to sarwarbhuiyan/ksql that referenced this pull request Sep 4, 2020
@cprasad1 cprasad1 deleted the Support_HEX_Literals branch September 16, 2020 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants