-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-8234][SQL] misc function: md5 #6779
Conversation
@rxin @liancheng can you trigger the unit test? |
Jenkins, ok to test. |
override def dataType: DataType = StringType | ||
|
||
override def expectedChildTypes: Seq[DataType] = | ||
if (child.dataType == BinaryType) Seq(BinaryType) else Seq(StringType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't make sense -- since it is using child's data type to check data type... you should remove ExpectsInputTypes and explicitly define the type check
Test build #34756 has finished for PR 6779 at commit
|
val s1 = 'a.string.at(0) | ||
val s2 = 'a.binary.at(0) | ||
checkEvaluation(Md5(s1), "902fbdd2b1df0c4f70b4a5d23525e932", create_row("ABC")) | ||
checkEvaluation(Md5(s2), "6ac1e56bc78f031059be7be854522c4c", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test would be simpler/clearer if you just pass literals to the expression. There are other tests that make sure that we can extract values from rows.
Additionally It would be good to test some edge cases: an empty string, null
, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for edge cases.
@qiansl127 you can write code like
checkEvaluation(Md5(Literal("ABC"), "902fbdd2b1df0c4f70b4a5d23525e932")
checkEvaluation(Md5(Literal.create(null, BinaryType), "...")
checkEvaluation(Md5(Literal.create(null, StringType), "...")
Test build #34765 has finished for PR 6779 at commit
|
////////////////////////////////////////////////////////////////////////////////////////////// | ||
|
||
/** | ||
* Calculates the MD5 digest and returns the value as a 32 character hex string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I set a bad example in strlen
, we'd better insert an empty line after the doc and right before the @ group
Test build #34907 has finished for PR 6779 at commit
|
Test build #34908 has finished for PR 6779 at commit
|
Test build #34922 has finished for PR 6779 at commit
|
Test build #34964 has finished for PR 6779 at commit
|
@rxin @chenghao-intel Any thing need to be improved? |
add a test using LGTM otherwise. |
|
||
override def expectedChildTypes: Seq[DataType] = Seq(BinaryType) | ||
|
||
override def checkInputDataTypes(): TypeCheckResult = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This override is not necessary now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean we can remove this function overriding, as we implemented the expectedChildTypes
already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, please see doc for ExpectsInputTypes
. We don't need to do type check for it.
Test build #35013 has finished for PR 6779 at commit
|
LGTM, cc @rxin |
} | ||
} | ||
|
||
override def toString: String = s"MD5($child)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think you need the toString here ... since case class already handles it.
Looks good - but please add a codegen version of this. |
override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = { | ||
defineCodeGen(ctx, ev, c => | ||
s"""org.apache.spark.unsafe.types.UTF8String.fromString | ||
|(org.apache.commons.codec.digest.DigestUtils.md5Hex($c))""".stripMargin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a new line here? maybe just
"abcd" +
"efgh"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably not a big problem, as we don't want to make the generated code more than 100 characters per line, right? Or @qiansl127 can update that in the follow up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have called stripMargin
, so it will still result in one line.
Anyway I think the current code is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the generated code can be as long as we want.
in this case indenting this way is really weird. we should indent it this way
"org.apache.spark.unsafe.types.UTF8String.fromString" +
s"(org.apache.commons.codec.digest.DigestUtils.md5Hex($c))"
Test build #35210 has finished for PR 6779 at commit
|
@rxin I think this is ready to be merged. |
Can you fix the indent? LGTM other than that. |
Test build #35228 has finished for PR 6779 at commit
|
@rxin Done! |
Thanks. Merging. |
No description provided.