[SPARK-4202][SQL] Simple DSL support for Scala UDF #3067

Closed

liancheng

This feature is based on an offline discussion with @mengxr and will hopefully be useful for the new MLlib pipeline API.

For the following test snippet

case class KeyValue(key: Int, value: String)
val testData = sc.parallelize(1 to 10).map(i => KeyValue(i, i.toString)).toSchemaRDD
val foo = (a: Int, b: String) => a.toString + b

the newly introduced DSL enables the following syntax

import org.apache.spark.sql.catalyst.dsl._
testData.select(Star(None), foo.call('key, 'value) as 'result)

which is equivalent to

testData.registerTempTable("testData")
sqlContext.registerFunction("foo", foo)
sql("SELECT *, foo(key, value) AS result FROM testData")
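
For readers unfamiliar with the pattern, the `foo.call(...)` syntax can be obtained by enriching function values through an implicit class. The following is a minimal, Spark-free sketch of that idea, not the code in this patch: `UdfDslSketch`, `Expr`, `Column`, `UdfCall`, `Alias`, `Function2Udf`, and `ExprOps` are hypothetical names standing in for the Catalyst expression classes, and the sketch leaves out the return-type handling that a real UDF expression would need. It uses `Symbol("key")` instead of the `'key` literal so that it also compiles on newer Scala versions.

```scala
import scala.language.implicitConversions

object UdfDslSketch {
  // Toy expression tree standing in for Catalyst expressions (hypothetical names).
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class UdfCall(f: AnyRef, args: Seq[Expr]) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  // Lets a Symbol be passed wherever an Expr is expected.
  implicit def symbolToColumn(s: Symbol): Expr = Column(s.name)

  // Adds a `call` method to any two-argument function value.
  implicit class Function2Udf[A, B, R](f: (A, B) => R) {
    def call(a: Expr, b: Expr): UdfCall = UdfCall(f, Seq(a, b))
  }

  // Adds an `as` method to any expression so the result can be named.
  implicit class ExprOps(e: Expr) {
    def as(alias: Symbol): Alias = Alias(e, alias.name)
  }
}

object UdfDslSketchDemo {
  import UdfDslSketch._

  def main(args: Array[String]): Unit = {
    // A function value (not a method), so the implicit class applies directly.
    val foo = (a: Int, b: String) => a.toString + b

    // Builds an expression tree; nothing is evaluated at this point.
    val expr = foo.call(Symbol("key"), Symbol("value")) as Symbol("result")
    println(expr)
  }
}
```

In this sketch `foo` has to be a function value for `.call` to resolve, which is why the test snippet above defines it with `val` rather than as a method.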

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22797 has started for PR 3067 at commit f132818.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22797 has finished for PR 3067 at commit f132818.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ScalaUdfBuilder[T: TypeTag](f: AnyRef)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22797/

@liancheng liancheng changed the title [SQL] Simple DSL support for Scala UDF [SPARK-4202][SQL] Simple DSL support for Scala UDF Nov 3, 2014
@asfgit asfgit closed this in c238fb4 Nov 3, 2014
asfgit pushed a commit that referenced this pull request Nov 3, 2014
This feature is based on an offline discussion with mengxr and will hopefully be useful for the new MLlib pipeline API.

For the following test snippet

```scala
case class KeyValue(key: Int, value: String)
val testData = sc.parallelize(1 to 10).map(i => KeyValue(i, i.toString)).toSchemaRDD
val foo = (a: Int, b: String) => a.toString + b
```

the newly introduced DSL enables the following syntax

```scala
import org.apache.spark.sql.catalyst.dsl._
testData.select(Star(None), foo.call('key, 'value) as 'result)
```

which is equivalent to

```scala
testData.registerTempTable("testData")
sqlContext.registerFunction("foo", foo)
sql("SELECT *, foo(key, value) AS result FROM testData")
```

Author: Cheng Lian <[email protected]>

Closes #3067 from liancheng/udf-dsl and squashes the following commits:

f132818 [Cheng Lian] Adds DSL support for Scala UDF

(cherry picked from commit c238fb4)
Signed-off-by: Michael Armbrust <[email protected]>
@liancheng liancheng deleted the udf-dsl branch November 4, 2014 00:14