[SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevant POM fix ups #20923
Conversation
…rage artifacts and binding Change-Id: Ia4526f184ced9eef5b67aee9e91eced0dd38d723
Test build #88670 has finished for PR 20923 at commit
I only looked at the build stuff so far...
hadoop-cloud/pom.xml
<build>
<plugins>
<!-- Include a source dir depending on the Scala version -->
Not really based on the Scala version right?
my bad. Cut and paste error. Will make explicit what it's really doing.
hadoop-cloud/pom.xml
</executions>
</plugin>
</plugins>
nit: remove
done
hadoop-cloud/pom.xml
<!--
There's now a hadoop-cloud-storage which transitively pulls in the store JARs,
but it still needs some selective exclusion across versions, especially 3.0.x.
Can you expand a little on why the exclusions are needed? Some look a bit suspicious.
e.g. hadoop-common is already pulled transitively by other parts of Spark (including this very module, which does so via hadoop-client), so I'm not sure why the explicit exclusion is needed.
Excluding hadoop-client means there's no need to worry about any of the stuff explicitly excluded from hadoop-client in the spark root pom (asm/asm, jackson, etc).
Hadoop 3.0.1 declares hadoop-client as a compile-time dependency of hadoop-cloud-storage.
From 3.0.2+ it's been cut down to provided, and azure-datalake was added as a dependency in commit 3c03672e, so it's complete w.r.t. the ASF connectors.
There's also a fix for the aws shaded SDK to exclude netty HADOOP-15264, because of aws-sdk-java/issues/1488.
The individual hadoop cloud modules (hadoop-aws, hadoop-azure, ...) have also downgraded hadoop-client to being provided, so if you pull in any of those, you will only get the extra artifacts needed to connect to the relevant cloud endpoint, and are expected to pull in the same hadoop-client version elsewhere for things to work.
Here's the dependency list for spark-hadoop-cloud and 3.0.2-SNAPSHOT; 3.1 will be the same unless there's a last minute update to one of the external SDKs or jetty.
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
[INFO] | +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
[INFO] | | \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
[INFO] | | \- org.jdom:jdom:jar:1.1:compile
[INFO] | +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
[INFO] | | \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
[INFO] | +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
[INFO] | | +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
[INFO] | | | \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
[INFO] | | \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
[INFO] | +- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
[INFO] | | \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
[INFO] | \- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
Given that Hadoop 3.0.2+ downgrades hadoop-client to provided, and that's the minimum version this patch will build against, the exclusion is mostly superfluous: it's there more to block regressions than to actually keep anything out.
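For illustration, the exclusion being debated would look roughly like this in hadoop-cloud/pom.xml. This is a sketch of the pattern, not the exact patch text, and the version property is assumed to come from the parent POM:

```xml
<!-- Sketch only: exclude hadoop-client from hadoop-cloud-storage so none of
     the transitives banned in the Spark root POM (asm, jackson, ...) can
     leak back in through the 3.0.1 cloud-storage POM. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```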
Update: tried taking it out. Turns out that the hadoop-aliyun module still declares hadoop-common at compile scope, so it gets in. The exclusion is needed, and I've filed HADOOP-15354. I don't know if that'll get into Hadoop 3.1.0 though.
hadoop-cloud/pom.xml
<profile>
<id>hadoop-2.6</id>
<activation>
<activeByDefault>true</activeByDefault>
activeByDefault is a little misleading: it only enables the profile if you don't explicitly activate any other profiles.
So if you enable any other profile in the build, this won't be enabled automatically. And since the cloud module itself is already under a profile, I don't think you can ever trigger this.
Probably will need to be documented in the build docs, or maybe you can think of a different solution like enabling the cloud profile via a property instead.
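The property-based alternative mentioned above could be sketched like this. The property name is invented for the example, and whether it composes cleanly with the existing hadoop-cloud profile would need checking:

```xml
<!-- Hypothetical sketch: activate the profile when -Dhadoop.profile=2.6 is
     passed, instead of relying on activeByDefault (which switches off as
     soon as any other profile is activated). -->
<profile>
  <id>hadoop-2.6</id>
  <activation>
    <property>
      <name>hadoop.profile</name>
      <value>2.6</value>
    </property>
  </activation>
</profile>
```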
Hmmm. There's another option which is to leave all those in the standard list, and you get a few extra dependencies which aren't needed for the 3.x line:
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1:compile *
[INFO] | \- com.fasterxml.jackson.core:jackson-core:jar:2.6.7:compile *
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7:compile *
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.6.7:compile *
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.4:compile
[INFO] | +- commons-logging:commons-logging:jar:1.2:compile
[INFO] | \- commons-codec:commons-codec:jar:1.10:compile
[INFO] +- org.apache.httpcomponents:httpcore:jar:4.4.8:compile
[INFO] +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
[INFO] | \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
[INFO] +- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
[INFO] +- joda-time:joda-time:jar:2.9.3:compile *
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
[INFO] | +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
[INFO] | | \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
[INFO] | | \- org.jdom:jdom:jar:1.1:compile
[INFO] | +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
[INFO] | | +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
[INFO] | | | \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
[INFO] | | \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
[INFO] | \- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
[INFO] | \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
jackson-dataformat-cbor is the funny one; this is the sole declaration within spark. With the shaded aws JAR it's not needed at all.
The rest all make their way to the spark assembly through other routes.
What do you think? Leave them as the default and not worry about it? It would remove the duplication in the 2.7 profile and, apart from being extraneous on hadoop-3 builds, be harmless.
I think that's ok as an initial step. It would be better if you could, in profiles, customize individual dependencies (e.g. in the hadoop-3 profile exclude some transitive deps), but I'm not sure whether maven would complain about something like that.
jackson-dataformat-cbor can become interesting if Spark decides to upgrade jackson, since the github for that project says it's been removed in 2.8.
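As a hedged sketch of the per-profile customization idea: Maven does accept a dependencies block inside a profile, including exclusions, though (as noted above) how that merges with the default declarations is exactly the open question. The artifact and exclusion chosen here are purely illustrative:

```xml
<!-- Untested sketch: redeclare a dependency inside the hadoop-3 profile with
     an exclusion, to drop a transitive the 3.x line doesn't need. -->
<profile>
  <id>hadoop-3</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <exclusion>
          <groupId>com.fasterxml.jackson.dataformat</groupId>
          <artifactId>jackson-dataformat-cbor</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
```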
* hadoop branch-2 dependencies always declared
* minor nits in POM addressed
* added log4j.properties for tests

Change-Id: Ibb64b20a0be8624d1709e592b9fe85bdc4dd1af7
Test build #88713 has finished for PR 20923 at commit
pom.xml
<profile>
<id>hadoop-3</id>
<properties>
<hadoop.version>3.1.0-SNAPSHOT</hadoop.version>
Hey @steveloughran what is the possible release date for Hadoop 3.1.0?
RC0 is up for testing right now! @leftnoteasy is managing the release
I think we could separate the cloud-related stuff into another PR, and fix only build-related stuff in this PR.
Also, I think we need to create a related spark-deps-hadoop-3.x under dev/deps and make the dependency check work for Hadoop 3.
OK
… the build with all the POM changes other than those adding the optional hadoop-3.0.2+ source tree to the spark-hadoop-cloud build Change-Id: Iccc2b66602db05db132ce5cf5c8546fe9a13a3fa
Change-Id: Ic13caf5fcf96d617085051579ede8380b2106119
@jerryshao the latest revision only has the POM changes, and it also excludes the build profile option to compile the hadoop-3 source trees. It also switches the hadoop 3.1 version to 3.1.0, which matches that of the current RC, and will be downloaded from the ASF staging repo if the snapshots-and-staging profile is enabled.
Remember that if you do download the staged artifacts, then if new RCs are cut you will need to purge the old RCs from your local Maven repository.
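As a sketch of that purge step (the repository path and layout here are assumptions about the standard local Maven repository; adjust to taste):

```shell
# purge_rc: delete every cached copy of a given Hadoop artifact version from a
# local Maven repository, so the next build re-resolves it from staging.
# Assumes the standard ~/.m2/repository directory layout.
purge_rc() {
  local repo="$1"
  local version="$2"
  rm -rf "${repo}/org/apache/hadoop"/*/"${version}"
}

# e.g. purge_rc "${HOME}/.m2/repository" 3.1.0
```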
Test build #88845 has finished for PR 20923 at commit
sbt isn't going to test this profile, obviously. Ran both the mvn and sbt package targets with the profiles hadoop-3,hadoop-cloud,yarn,snapshots-and-staging
Hi @steveloughran , I think you missed this comment. You need to create a deps file under dev/deps and change the related script.
I saw that, but given there isn't much in the way of a 2.8 profile, I thought it was more of a wish list than a requirement. How do I go about creating it?
This includes the profile in test-dependencies.sh, so this part of the build will work: hive doesn't need to be working to build that dependency graph. Change-Id: I1ecfd4b1a8bea26600765b1de59f2425c42f6b03
Test build #88941 has finished for PR 20923 at commit
Failure is the test dependencies check failing, as the checker is trying to pull in hadoop-3.1.0 and it's still in ASF staging.
I'll turn off that check now, leaving the dependency list as is.
….1 is still in staging Change-Id: Id2d5655088b2a8c2bdec43f7d17110a513be3f7c
Test build #88953 has finished for PR 20923 at commit
Test build #88955 has finished for PR 20923 at commit
Test failures are all in
Sorry @steveloughran for the late response. Most of the deps in the file are similar to "spark-deps-hadoop-2.7", so copying/renaming it and running "test-dependencies.sh" will show you the diffs; updating the deps file based on those diffs will give you a new hadoop3 deps file.
I think you should also update "test-dependencies.sh" to make the new deps file work.
Jenkins, retest this please.
Test build #89038 has finished for PR 20923 at commit
I did, but then things failed because the artifacts were only visible if you did a run with the -Psnapshots-and-staging profile, which jenkins doesn't do here; it's why test run 88953 failed. The 3.1.0 artifacts are now up on the mvnrepo, so I can turn that on. Because it's out, I think it'd also be good to call the profile hadoop-3.1, to make clear that's the target release, and to avoid confusion if anyone ever creates a 3.2, 3.3, ... profile in future.
…hadoop 3.1 is still in staging" This reverts commit 7c93d98.
…t-dependencies.sh knows about it Change-Id: Ie4906e2f41e9992e803674dce283f03b4dbab67e
Test build #89058 has finished for PR 20923 at commit
Test failures are org.apache.spark.sql.sources.BucketedWriteWithoutHiveSupportSuite; SPARK-23894
Ping @vanzin @gatorsmile , would like to hear your comments. Thanks!
I should add that the spark-shell doesn't bring up the Azure client, though it's happy with the rest, because of jetty-util not making it into dist/jars... I fear this is shading related. Here's the JAR I'm using to do diagnostics of CP setup there; it tries to bootstrap access, from searching for & loading the JARs to actually reading and writing the data. Significantly more informative when things don't work. https://github.com/steveloughran/cloudstore/releases/tag/tag_2018_04_11_release
jetty-util and jetty-util-ajax are forced into the dist/jars directory by explicit identification in the relevant POMs as in the hadoop-dist-scope. Without this they weren't coming in as spark-assembly was seeing jetty-util marked as provided. It's not needed for the spark-* JARs, which all use the shaded reference, but it is needed indirectly via hadoop-azure. This change to the poms reinstates it. Maven has proven surprisingly "fussy" here; the implication being its "closest declaration wins" resolution policy doesn't just control versions, it has influence over scoping. Change-Id: I081023cae84236c925fad4e94168f1dac5a8026a
The jetty problem has been dealt with; because of the shading declaration of jetty-util as provided (it isn't needed in spark any more), it wasn't getting into dist/jars even for those dependencies which did need it. Fix: explicitly declaring it in the hadoop-cloud module and the hadoop cloud profile in the assembly module. The jetty-util-ajax dependency, which is the direct dependency on it from hadoop-azure, is also declared, so that the spark build can keep control of jetty versions everywhere. That one was getting through fine, because it wasn't being tagged as provided.
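The fix described would amount to something like the following in the hadoop-cloud POM. This is a sketch; the scope property name is an assumption about how Spark's parent POM manages Hadoop-related scopes, and versions are managed in the parent:

```xml
<!-- Sketch: redeclare jetty-util and jetty-util-ajax (the hadoop-azure
     dependency) at a non-provided scope so the assembly copies them into
     dist/jars, and so Spark keeps control of the jetty version. -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util-ajax</artifactId>
  <scope>${hadoop.deps.scope}</scope>
</dependency>
```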
Test build #89295 has finished for PR 20923 at commit
@jerryshao comments? I know without the patched hive or mutant hadoop build Spark doesn't work with Hadoop 3, but this sets everything up to build consistently, which is a prerequisite to fixing up the semi-official spark hive JAR. |
Just a few minor things.
Redeclare this dependency to force it into the distribution.
-->
<dependency>
<groupId>org.eclipse.jetty</groupId>
This kinda sucks. Doesn't this also end up pulling up a bunch of other jetty stuff into the packaging?
I guess there's no way around it until Hadoop itself shades jetty in some way...
> Doesn't this also end up pulling up a bunch of other jetty stuff into the packaging?
It doesn't pull in anything else. There's already one of the jetty- JARs in the dist/jars directory BTW.
> I guess there's no way around it until Hadoop itself shades jetty in some way...
Or when @aajisaka & colleagues implement the Java 9 support and everyone runs to it. This is one of those examples of why, from a packaging and deployment perspective, Java 9 is the good one.
Created HADOOP-15387 for the shading task; put my name to it, as Bikas has already been expressing a desire for it.
<dependencies>
<!--used during compilation but not exported as transitive dependencies-->
Is this still needed after you removed the committer code?
see below
@@ -38,7 +38,32 @@
<sbt.project.name>hadoop-cloud</sbt.project.name>
</properties>

<build>
Is this still needed after you removed the committer code?
it's in an adjacent PR, I've just pulled in all the POM dependency changes to keep everything related to the dependency digraph in this one so it can be audited in one go.
@vanzin : The followup to this is #21066; I could move the compile-time changes there, but if you are going to have POMs playing with dependencies, it seems best to have it all in one place... the other one just sets up the compile and tests. @jerryshao what do you suggest? It was your proposal to split things into pom and source for ease of review, after all?
@@ -2671,6 +2671,15 @@
</properties>
</profile>

<profile>
<id>hadoop-3.1</id>
+1 for skipping Hadoop 3.0 and supporting Hadoop 3.1+ only.
retest this please
I asked for the tests to run once more just to be sure. I'd have preferred to keep the code-related build changes in the other PR where the code is actually added. But since the PR is already up, that would just slow things down, so this is fine as long as tests pass.
I would guess the tests here don't actually run with the Hadoop 3 profile, so we don't actually test anything. Also, we still cannot use Hadoop 3 even if we merge this, because of the Hive issue, unless we use some of the tricks mentioned above. So I'm not sure if we should address the Hive issue first.
+1 for @jerryshao 's comment. Some of the Hive UTs will fail with the Hadoop 3 profile.
Test build #89748 has finished for PR 20923 at commit
I can and do build Hadoop with this local version enabled, so it's easy enough to set things up locally. Indeed, the ability to change the Hadoop version (HADOOP-13852) came about precisely because I was the first person to try and do that hadoop-3+spark test, with a precursor profile for this. Get this in and things are set up for the hive work, as everything else is ready for it. We've decoupled the work, and for those people who do have a compatible hadoop/hive setup, this provides a standard profile for them to use, instead of having to write their own, determine zookeeper and curator versions, etc, etc.
I've also added a comment to SPARK-18673 offering to fix the org.spark-project.hive JAR, but only once this patch is in. This bit is ready; that may take time, and it will need this setup so that the actual tests can be run.
Merging to master.
thank you! I guess that means I'm down for the hive JAR, doesn't it :) Better make a list of patches which should go in; I think internally we have 1+ kerberos-related (pwendell/hive#2) as well as the Hadoop version case statement.
jersey-server-2.22.2.jar
jets3t-0.9.4.jar
jetty-webapp-9.3.20.v20170531.jar
jetty-xml-9.3.20.v20170531.jar
Sorry, but what do you mean? Apache Spark 2.4.5 Hadoop 2.7 binary has jetty jars while Apache Spark 3.0.0 Hadoop 3.2 binary does not.
$ tar tvf spark-2.4.5-bin-hadoop2.7.tgz | grep jetty
-rw-r--r-- spark-rm/spark-rm 177131 2020-01-13 02:30 spark-2.4.5-bin-hadoop2.7/jars/jetty-util-6.1.26.jar
-rw-r--r-- spark-rm/spark-rm 539912 2020-01-13 02:30 spark-2.4.5-bin-hadoop2.7/jars/jetty-6.1.26.jar
$ tar tvf spark-3.0.0-bin-hadoop3.2.tgz | grep jetty
$
Please see [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency
cc @dbtsai
The other required jetty jars are still shaded correctly. Please let me know if there is something missed.
$ jar tvf spark-core_2.12-3.0.0.jar | grep jetty | wc -l
1308
Great! Thank you for answering the question.
What changes were proposed in this pull request?
A hadoop-3.1 profile, building against the hadoop-3.1 artifacts.
How was this patch tested?
The spark hive-1.2.1 JAR has problems here, as its version check logic fails for Hadoop versions > 2.
This can be avoided with either of
mvn install -DskipTests -DskipShade -Ddeclared.hadoop.version=2.11
This is safe for local test runs, not for deployment (HDFS is very strict about cross-version deployment).
I've done both, with maven and SBT.
Three issues surfaced