
[WIP] Simplify the build with sbt 0.13.2 features #706

Closed

Conversation

jaceklaskowski (Contributor)

It's a WIP, but I'm opening this pull request with the current changes, hoping that someone from the dev team will have a look and guide me on how to merge them into the project.

Ultimately I'd like to refactor the build to make it easier to understand.

@AmplabJenkins
Can one of the admins verify this patch?

@pwendell (Contributor) commented May 9, 2014

Hey @jaceklaskowski - this is nice to have as a cleanup. Just a heads up: we're likely to restructure the sbt build to read dependencies from Maven, since right now we are maintaining two builds. So we might wait until we see what that looks like before we consider other major changes to the build.

@jaceklaskowski (Contributor, Author)

Hey @pwendell, thanks for the prompt response! I'd love to be involved in the effort if possible. Where could we discuss how much I could do regarding sbt (I think I might be quite helpful here and there)? Is there a JIRA issue where I could follow the progress of the task?

Please guide me so the project can benefit from the spare time I can devote.

@jaceklaskowski changed the title from "Simplify the build with sbt 0.13.2 features" to "[WIP] Simplify the build with sbt 0.13.2 features" on May 9, 2014
@markhamstra (Contributor)

 import sbtassembly.Plugin._
 import AssemblyKeys._
-import scala.util.Properties
+import util.Properties
Contributor Author:

Thanks! Will keep that in mind when sending patches in the future.

Contributor Author:

Fixed

@jaceklaskowski (Contributor, Author)

As to the changes I proposed in the PR, I think that whatever the future steps with sbt-pom-reader turn out to be, these changes remain easily applicable to the build. They (are supposed to) simplify the build definition format to leverage sbt 0.13.2 macros, and are not expected to interfere with the upcoming changes. They're slated for 1.1.0 after all.

I'm writing all this hoping to convince you guys, the committers, to accept the PR so I can propose others ;-) I'd like to play a bit with the optional project inclusions, as I think there are too many env vars involved in working with the different versions of Hadoop.

@jaceklaskowski (Contributor, Author)

Knock, knock. Could I ask you to approve the changes, or to let me know what portion to change to get this in?

@@ -297,7 +273,7 @@ object SparkBuild extends Build {
 val chillVersion = "0.3.6"
 val codahaleMetricsVersion = "3.0.0"
 val jblasVersion = "1.2.3"
-val jets3tVersion = if ("^2\\.[3-9]+".r.findFirstIn(hadoopVersion).isDefined) "0.9.0" else "0.7.1"
+val jets3tVersion = "^2\\.[3-9]+".r.findFirstIn(hadoopVersion).fold("0.7.1")(_ => "0.9.0")
Member:

This is a tangential question: is this more idiomatic Scala? I've seen fold as an alternative to if-else or to using Option, but had thought it harder to understand.

Contributor Author:

I myself learned it not so long ago, and noticed it has spurred some discussions in the Scala community (with Martin Odersky himself).

Functional Programming in Scala says on page 69 (in the PDF version):

"It’s fine to use pattern matching, though you should be able to implement all the functions besides map and getOrElse without resorting to pattern matching."

So I followed the advice and applied Option.map(...).getOrElse(...) quite extensively, but... just before I committed the change, IntelliJ IDEA suggested replacing it with fold. I thought I needed to change my habits once more and sent the PR with what IDEA offered.

With all that said, it's not clear what the most idiomatic approach is, but pattern matching is, in my opinion, a step back from map/getOrElse, and there's no need for it here.

I'd appreciate being corrected, and could even replace Option.fold with Option.map/Option.getOrElse if that would make the PR better for more committers.
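
To make the comparison concrete, here is a minimal, self-contained sketch of the three equivalent styles under discussion (the hadoopVersion value is a hypothetical example; in SparkBuild it comes from the build configuration):

import scala.util.matching.Regex

val hadoopVersion: String = "2.4.0" // hypothetical example value
val hadoop23Plus: Regex = "^2\\.[3-9]+".r

// 1. if-else on Option.isDefined (the original code).
val v1 = if (hadoop23Plus.findFirstIn(hadoopVersion).isDefined) "0.9.0" else "0.7.1"

// 2. map/getOrElse (the style preferred in this thread).
val v2 = hadoop23Plus.findFirstIn(hadoopVersion).map(_ => "0.9.0").getOrElse("0.7.1")

// 3. fold (what IntelliJ IDEA suggested): the ifEmpty default comes first.
val v3 = hadoop23Plus.findFirstIn(hadoopVersion).fold("0.7.1")(_ => "0.9.0")

assert(v1 == v2 && v2 == v3) // all three yield "0.9.0" for Hadoop 2.4.0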

Contributor:

We discussed this a while ago, soon after Spark development moved to Scala 2.10. The overwhelming preference was that we use map/getOrElse instead of fold. When working on Spark, consider turning off that particular inspection and warning in the IntelliJ preferences.

@ScrapCodes (Member)

This is what Patrick was trying to say: https://issues.apache.org/jira/browse/SPARK-1776. So while we have this in mind, even if we merge this patch, it will eventually have to be replaced entirely. We are still experimenting; let's see if we all agree to this. OTOH, we should retain the comments in case we decide to merge this patch.

Jenkins, test this please.

@jaceklaskowski (Contributor, Author)

Thanks! I've said it before (when @pwendell asked to hold off) and now I'll say it again, as my changes don't seem likely to find a home before the "We are still experimenting" phase is over. When is the experimentation happening? Is there a branch for it? Is there a discussion on the mailing list(s) about how it's going to be done? I'd appreciate more openness in this regard (to avoid Kafka's case, where they moved to Gradle for no apparent reason other than that they didn't seem to have cared to learn sbt well enough).

@aarondav (Contributor)

Here is a link to our mailing list discussion on the topic:
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-tp2315.html

Note also that the biggest time sink right now is preparing the 1.0 release; many people are running around trying to finalize APIs and bug fixes before 1.0 ships, and don't have much time at the moment to closely examine features for 1.1.


@@ -55,81 +58,77 @@ object SparkBuild extends Build {
 val SCALAC_JVM_VERSION = "jvm-1.6"
 val JAVAC_JVM_VERSION = "1.6"

-lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
+lazy val root = project in file(".") settings (rootSettings: _*) aggregate(allProjects: _*)
Contributor:

Hm, actually, let me step back. Is this change purporting to make the style better, or is there a functionality change?
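
For comparison, a minimal sketch of the two forms side by side (not Spark's actual build; the ExampleBuild object and the Defaults.defaultSettings stand-in are illustrative assumptions). In sbt 0.13.2 the project macro derives the project ID from the val name, so with a val named root the new form defines the same project as the old one:

import sbt._
import Keys._

object ExampleBuild extends Build {
  val rootSettings: Seq[Setting[_]] = Defaults.defaultSettings

  // Pre-0.13 style: the project ID "root" is passed explicitly.
  // lazy val root = Project("root", file("."), settings = rootSettings)

  // sbt 0.13.2 style: the ID "root" is inferred from the val name,
  // so this is a syntactic change as long as the names line up.
  lazy val root = project in file(".") settings (rootSettings: _*)
}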

@pwendell (Contributor) commented Aug 1, 2014

I think we should close this issue now, because we separately refactored this build.

@asfgit closed this in 87738bf on Aug 2, 2014
agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022
* [SPARK-29444] Add configuration to support JacksonGenerator to keep fields with null values

As mentioned in the JIRA, sometimes we need to be able to retain null columns when writing JSON. For example, sparkmagic (used widely in Jupyter with Livy) generates SQL query results based on DataSet.toJSON and parses the JSON into a pandas DataFrame for display. If there is a null column, it is easy to end up with missing columns, or even an empty query result; losing a null column in the first row may cause parsing exceptions or the loss of an entire column's data.

Example in spark-shell.
scala> spark.sql("select null as a, 1 as b").toJSON.collect.foreach(println)
{"b":1}

scala> spark.sql("set spark.sql.jsonGenerator.struct.ignore.null=false")
res2: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.sql("select null as a, 1 as b").toJSON.collect.foreach(println)
{"a":null,"b":1}

Add new test to JacksonGeneratorSuite

Lead-authored-by: stczwd <[email protected]>
Co-authored-by: Jackey Lee <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit with id 78b0cbe)

* [SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating

### What changes were proposed in this pull request?
Add a description for ignoreNullFields, which was committed in apache#26098, to DataFrameWriter and readwriter.py.
Enable users to use ignoreNullFields in PySpark.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
run unit tests

Closes apache#26227 from stczwd/json-generator-doc.

Authored-by: stczwd <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

Co-authored-by: stczwd <[email protected]>
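
A minimal usage sketch of the ignoreNullFields writer option documented above (assumes a running SparkSession named spark, e.g. in spark-shell; the output paths are hypothetical):

val df = spark.sql("select null as a, 1 as b")

// Default behavior drops null fields on write: {"b":1}
df.write.json("/tmp/json-default")

// Keeping null fields per row: {"a":null,"b":1}
df.write.option("ignoreNullFields", "false").json("/tmp/json-with-nulls")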
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Dec 8, 2023
[SPARK-41958] Disallow arbitrary custom classpath with proxy user in cluster mode (apache#706)

Backporting fix for SPARK-41958 to 3.3 branch from apache#39474
Below description from original PR.

--------------------------

### What changes were proposed in this pull request?

This PR proposes to disallow arbitrary custom classpath with proxy user in cluster mode by default.

### Why are the changes needed?

To avoid arbitrary classpath in spark cluster.

### Does this PR introduce _any_ user-facing change?

Yes. Users must re-enable this behavior via `spark.submit.proxyUser.allowCustomClasspathInClusterMode`.

### How was this patch tested?

Manually tested.

Closes apache#39474 from Ngone51/dev.

Lead-authored-by: Peter Toth <peter.toth@gmail.com>



(cherry picked from commit 909da96)


Closes apache#41428 from degant/spark-41958-3.3.

Lead-authored-by: Degant Puri <[email protected]>

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Degant Puri <[email protected]>
Co-authored-by: Peter Toth <[email protected]>
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Dec 11, 2023
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024