-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SQL] [SPARK-6620] Speed up toDF() and rdd() functions by constructing converters in ScalaReflection #5279
Conversation
i += 1 | ||
} | ||
|
||
mutableRow | ||
new GenericRowWithSchema(ar, schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite related to this PR, but I don't think it was necessary to use GenericMutableRow here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. The original version uses a mutable row mostly because of the updates in the while loop I guess.
Test build #29441 has started for PR 5279 at commit |
Test build #29441 has finished for PR 5279 at commit
|
Test PASSed. |
I know you are probably still working on this - any benchmark numbers? |
cc @davies since you guys are both changing this part of the code lately. |
|
||
dataType match { | ||
// Check UDT first since UDTs can override other types | ||
case udt: UserDefinedType[_] => (item) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please break the line after the first =>
.
Went though very quickly for the first time, left some styling comments. |
Test build #29499 has started for PR 5279 at commit |
Test build #29499 has finished for PR 5279 at commit
|
Test PASSed. |
Here are the benchmark numbers: http://pastie.org/private/6vg7kg2yfwop2ov5zu2eq |
Test build #29514 has started for PR 5279 at commit |
Test build #29514 has finished for PR 5279 at commit
|
Test PASSed. |
ping |
@vlyubin who are you pinging? is this still "WIP"? |
@rxin Sorry, I just removed the WIP tag. There isn't anything more to add, as it turned out that there are no places here where we could use SpecificMutableRow to speed things up. |
Jenkins, retest this please. |
override def executeTake(limit: Int): Array[Row] = | ||
rows.map(ScalaReflection.convertRowToScala(_, schema)).take(limit).toArray | ||
override def executeCollect(): Array[Row] = { | ||
val converters = schema.fields.map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this pattern cannot be
val converter = CatalystTypeConverters.createToScalaConverter(schema)
rows.map(converter).toArray
?
It looks like this pretty efficiently handles this situation in the same way that you've extracted it here (even with calling convertRowWithConverters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, I'll update these.
Test build #29896 has started for PR 5279 at commit |
Test build #29897 has started for PR 5279 at commit |
Test build #29896 has finished for PR 5279 at commit
|
Test PASSed. |
Test build #29897 has finished for PR 5279 at commit
|
Test PASSed. |
@vlyubin Would you mind to add |
val convertedMap: HashMap[Any, Any] = HashMap() | ||
while (iter.hasNext) { | ||
val entry = iter.next() | ||
convertedMap += Tuple2(keyConverter(entry.getKey), valueConverter(entry.getValue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use
convertedMap(keyConverter(entry.getKey)) = valueConverter(entry.getValue)
to avoid creating a tuple.
LGTM, @marmbrus would you mind doing a final pass? |
Test build #29988 has started for PR 5279 at commit |
Test build #29988 has finished for PR 5279 at commit
|
Test PASSed. |
} | ||
} | ||
|
||
/** Converts Catalyst types used internally in rows to standard Scala types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong comment style.
Test build #30051 has started for PR 5279 at commit |
Test build #30051 has finished for PR 5279 at commit
|
Test PASSed. |
case 1 => | ||
val func = function.asInstanceOf[(Any) => Any] | ||
val child0 = children(0) | ||
lazy val converter0 = CatalystTypeConverters.createToScalaConverter(child0.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use lazy val
here?
cc @marmbrus