Merge pull request #311 from InterestingLab/garyelephant.fea.doctor
Removed all duplicate dependencies for spark and hadoop in assembly jar
RickyHuo authored May 12, 2019
2 parents dd2db18 + 1c69db3 commit 4926d85
Showing 4 changed files with 50 additions and 20 deletions.
36 changes: 30 additions & 6 deletions build.md
@@ -9,6 +9,11 @@ Building Tool: SBT
```
sbt scalastyle
```
## Show Dependency Tree

```
sbt dependencyTree > tree.log
```
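Since this commit removes duplicated spark/hadoop jars from the assembly, the dependency tree is the natural place to look for them. A quick check (the grep pattern and artifact names are illustrative, not part of the build):

```
# list spark/hadoop artifacts captured in tree.log (pattern is an example)
grep -E 'spark-(core|sql|streaming|hive)|hadoop-' tree.log
```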

## build and package

@@ -23,24 +28,32 @@ sbt package
Build fat jar(with all dependencies) for Spark Application

```
# Linux, MACOS, Windows
# on Linux, MACOS
sbt "-DprovidedDeps=true" clean assembly
# on Windows
set JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF8'
sbt package
```

Package Distribution

```
sbt -DprovidedDeps=true universal:packageBin
# on Linux, Mac
sbt "-DprovidedDeps=true" universal:packageBin
# you can find distribution here:
# on Windows
set JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF8'
sbt "-DprovidedDeps=true" universal:packageBin
# When packaging finished, you can find distribution here:
target/universal/waterdrop-<version>.zip
```

If you want to check what files/directories will be included in the distribution package

```
sbt -DprovidedDeps=true stage
sbt "-DprovidedDeps=true" stage
ls ./target/universal/stage/
```

@@ -50,8 +63,19 @@ check sbt native packager [universal plugin](http://www.scala-sbt.org/sbt-native

## FAQs

1. IntelliJ IDEA doesn't recognize antlr4 generated source?

File -> Project Structure -> Modules, in the `Sources` tab,
mark directory `target/scala-2.11/src_managed/main/antlr4/` as `Sources` (blue icon)

2. OutOfMemoryError occurs during compilation?

```
# Linux, Mac
export JAVA_OPTS=-Xmx4G
sbt ...
# Windows
set JAVA_OPTS=-Xmx4G
sbt ...
```
2 changes: 1 addition & 1 deletion build.sbt
@@ -1,5 +1,5 @@
name := "Waterdrop"
version := "1.3.3"
version := "1.3.4"
organization := "io.github.interestinglab.waterdrop"

scalaVersion := "2.11.8"
6 changes: 6 additions & 0 deletions waterdrop-apis/build.sbt
@@ -6,6 +6,10 @@ scalaVersion := "2.11.8"

val sparkVersion = "2.4.0"

// We should put all spark or hadoop dependencies here,
// if the corresponding jar file exists in the jars directory of the online Spark distribution,
// such as spark-core-xxx.jar, spark-sql-xxx.jar,
// or jars in the Hadoop distribution, such as hadoop-common-xxx.jar, hadoop-hdfs-xxx.jar
lazy val providedDependencies = Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
@@ -33,4 +37,6 @@ unmanagedJars in Compile += file("lib/config-1.3.3-SNAPSHOT.jar")
libraryDependencies ++= Seq(
)

// TODO: exclude spark, hadoop for all dependencies

dependencyOverrides += "com.google.guava" % "guava" % "15.0"
26 changes: 13 additions & 13 deletions waterdrop-core/build.sbt
@@ -1,16 +1,21 @@
name := "Waterdrop-core"
version := "1.3.3"
version := "1.3.4"
organization := "io.github.interestinglab.waterdrop"

scalaVersion := "2.11.8"


val sparkVersion = "2.4.0"

// We should put all spark or hadoop dependencies here,
// if the corresponding jar file exists in the jars directory of the online Spark distribution,
// such as spark-core-xxx.jar, spark-sql-xxx.jar,
// or jars in the Hadoop distribution, such as hadoop-common-xxx.jar, hadoop-hdfs-xxx.jar
lazy val providedDependencies = Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion
)

// Change dependency scope to "provided" via: sbt -DprovidedDeps=true <task>
@@ -33,11 +38,14 @@ unmanagedJars in Compile += file("lib/config-1.3.3-SNAPSHOT.jar")

libraryDependencies ++= Seq(

// ------ Spark Dependencies ---------------------------------
// spark distribution doesn't provide this dependency.
"org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion
exclude("org.spark-project.spark", "unused")
exclude("net.jpountz.lz4", "unused"),
"org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion ,
// --------------------------------------------------------

"org.mongodb.spark" %% "mongo-spark-connector" % "2.2.0",
"org.apache.kudu" %% "kudu-spark2" % "1.7.0",
"com.alibaba" % "QLExpress" % "3.2.0",
@@ -56,18 +64,10 @@ libraryDependencies ++= Seq(
exclude("com.google.guava","guava")
excludeAll(ExclusionRule(organization="com.fasterxml.jackson.core")),
"com.databricks" %% "spark-xml" % "0.5.0",
"org.apache.httpcomponents" % "httpasyncclient" % "4.1.3",
"com.databricks" %% "spark-xml" % "0.5.0"
"org.apache.httpcomponents" % "httpasyncclient" % "4.1.3"
).map(_.exclude("com.typesafe", "config"))



//excludeDependencies += "com.typesafe" % "config" % "1.2.0"
//excludeDependencies += "com.typesafe" % "config" % "1.2.1"

//excludeDependencies ++= Seq(
// ExclusionRule("com.typesafe", "config")
//)
// TODO: exclude spark, hadoop for all dependencies

// For binary compatible conflicts, sbt provides dependency overrides.
// They are configured with the dependencyOverrides setting.
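For context, the `-DprovidedDeps=true` switch referenced in the comments above is typically wired up in build.sbt along these lines (a sketch under assumptions; `useProvidedDeps` is a hypothetical name, and the actual wiring in this repository may differ):

```
// Sketch (assumed wiring): flip providedDependencies to "provided" scope
// when -DprovidedDeps=true is passed on the sbt command line
val useProvidedDeps = sys.props.get("providedDeps").exists(_.toBoolean)

libraryDependencies ++= (
  if (useProvidedDeps) providedDependencies.map(_ % "provided")
  else providedDependencies
)
```

With "provided" scope, sbt-assembly leaves those jars out of the fat jar, since the Spark runtime already ships them.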
