[SPARK-24372][build] Add scripts to help with preparing releases. #21515

Closed
wants to merge 11 commits

Conversation

@vanzin (Contributor) commented Jun 8, 2018

The "do-release.sh" script asks questions about the RC being prepared,
trying to find out as much as possible automatically, and then executes
the existing scripts with proper arguments to prepare the release. This
script was used to prepare the 2.3.1 release candidates, so was tested
in that context.

The docker version runs that same script inside a docker image specially
crafted for building Spark releases. That image is based on the work
by Felix C. linked in the bug. At this point it has only been mildly
tested.

I also added a template for the vote e-mail, with placeholders for
things that need to be replaced, although there is no automation around
that for the moment. It shouldn't be hard to hook up certain things like
version and tags to this, or to figure out certain things like the
repo URL from the output of the release scripts.
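
For a concrete picture of the question-and-answer flow, a prompt-with-default helper along these lines would do the job (a minimal sketch; a read_config helper is used by these scripts, as seen later in this thread, but this particular body is an assumption, not the actual implementation):

#!/usr/bin/env bash
# Minimal sketch only: ask a question, fall back to a default answer.
read_config() {
  local prompt="$1"
  local default="$2"
  local reply
  read -r -p "$prompt [$default]: " reply
  echo "${reply:-$default}"
}

# Example: pre-fill the ASF user from the environment, allow an override.
ASF_USERNAME=$(read_config "ASF user" "$LOGNAME")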

The "do-release.sh" script asks questions about the RC being prepared,
trying to find out as much as possible automatically, and then executes
the existing scripts with proper arguments to prepare the release. This
script was used to prepare the 2.3.1 release candidates, so was tested
in that context.

The docker version runs that same script inside a docker image especially
crafted for building Spark releases. That image is based on the work
by Felix C. linked in the bug. At this point is has been only midly
tested.

I also added a template for the vote e-mail, with placeholders for
things that need to be replaced, although there is no automation around
that for the moment. It shouldn't be hard to hook up certain things like
version and tags to this, or to figure out certain things like the
repo URL from the output of the release scripts.
@SparkQA commented Jun 8, 2018

Test build #91590 has finished for PR 21515 at commit 44454b6.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 11, 2018

Test build #91673 has finished for PR 21515 at commit 1366f23.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -0,0 +1,64 @@
Please vote on releasing the following candidate as Apache Spark version {version}.

The vote is open until {deadline} and passes if a majority of at least 3 +1 PMC votes are cast.
Member

nit: personally I find "3 PMC +1 votes" clearer

Contributor Author

The rules are a little bit more complicated than that. Updated.

@@ -106,3 +106,4 @@ spark-warehouse
structured-streaming/*
kafka-source-initial-offset-version-2.1.0.bin
kafka-source-initial-offset-future-version.bin
vote.tmpl
Member

even if rat doesn't check, isn't vote.tmpl packaged into the source release this way?

Contributor Author

Are you saying this file should not be packaged in the source release? Not sure I see why that would be the case. There's a lot of stuff in .rat-excludes that is still packaged.

WORKDIR /opt/spark-rm/output

ARG UID
RUN useradd -m -s /bin/bash -p spark-rm -u $UID spark-rm
Member

does that mean the do-release script is run as the user "spark-rm"?
I thought it's generally best practice to gpg sign as yourself?

Contributor Author

The gpg signature is completely unrelated to the user running the process.

Also, Docker being the weird thing it is, the user name in the container doesn't really matter much. It's running with the same UID as the host user who built the image, which in this case is expected to be the person preparing the release, and this will only really matter for files that are written back to the host's file system.
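
To make that concrete, the pattern looks roughly like this (illustrative commands; the image tag and mount path here are assumptions based on the Dockerfile above, not the exact invocations in the scripts):

# Bake the host user's numeric UID into the image at build time.
docker build -t spark-rm --build-arg UID=$(id -u) .

# Files written under the mounted output directory are then owned by
# that same UID on the host, whatever the container user is called.
docker run -v "$PWD/output:/opt/spark-rm/output" spark-rm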

@tgravescs (Contributor)

I guess we will need to update the release process website once these go in?

It's not clear to me how the docker stuff applies here: are we recommending creating the release inside docker, or leaving it as optional? Did you use it for the 2.3.1 release?

@vanzin (Contributor Author) commented Jun 13, 2018

We could update the docs once these get in.

I did not use docker for 2.3.1 but I plan to give it a shot for 2.1.3. The goal of the docker path is to always build Spark in a known environment, so things like R packages and everything else are the same. It could also help folks who use other OSes (although Sameer used MacOS for 2.3.0 and it seemed to work fine, aside from some warnings from gnu tar when decompressing the tarballs on Linux).

@SparkQA commented Jun 13, 2018

Test build #91734 has finished for PR 21515 at commit 04f6371.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor Author) commented Jun 14, 2018

FYI I'm playing with building 2.1.3 using the docker-based approach, and there are a few things that I need to adjust before that works. I hope to update this PR later today.

- Allow mounting a local JDK (2.1 wants 1.7).
- Tweak profiles and builds for 2.1.
- Restore support for Scala 2.10.
- Set up JAVA_HOME in the docker container.

I also expanded "dry run" mode a bit to build one of the binary
packages, and also the documentation. The built documentation is
now also kept in the output directory along with the other
artifacts.

@SparkQA commented Jun 15, 2018

Test build #91869 has finished for PR 21515 at commit 48eda71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SELF=$(cd $(dirname $0) && pwd)
. "$SELF/release-util.sh"

while getopts "bn" opt; do
Contributor

It would be nice to have a high-level description in the script, just saying that it does a release: things like tagging, building, etc., and pushing artifacts to the ASF Spark repo.
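
Something along these lines, say (wording illustrative, assembled from this PR's description rather than taken from the script):

#!/usr/bin/env bash
#
# do-release.sh: drives a Spark release. Asks questions about the RC
# being prepared, then tags the release candidate in the ASF git repo,
# builds the source and binary packages and the docs, and publishes
# the artifacts to the ASF Spark repositories.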

# Check if the RC already exists, and if re-creating the RC, skip tag creation.
RELEASE_TAG="v${RELEASE_VERSION}-rc${RC_COUNT}"
SKIP_TAG=0
if check_for_tag "RELEASE_TAG"; then
Contributor

this needs to be $RELEASE_TAG
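
For reference, a tag-existence helper of this shape (a hypothetical sketch; the real check_for_tag in release-util.sh may differ, and ASF_REPO is assumed to hold the upstream repo URL) shows why the literal string can never match:

# Hypothetical sketch: succeed if the tag already exists upstream.
check_for_tag() {
  git ls-remote --tags "$ASF_REPO" "$1" | grep -q "$1"
}

# Must be invoked with the variable expanded:
#   check_for_tag "$RELEASE_TAG"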

@tgravescs (Contributor)

Not sure if this was something in my env, but somehow it pushed the next version as 2.2-3-snapshot:
https://github.com/apache/spark/commits/branch-2.2
I manually fixed it and am building now, but skipped pushing the tag since it was already there. The build had failed due to JAVA_HOME not being set.
If I reproduce it, I'll debug.

@vanzin (Contributor Author) commented Jun 18, 2018

pushed next version as 2.2-3-snapshot

Hmm, that must be something I introduced in the last changes (2.3.2 seems correct). Sorry about that.

failed due to JAVA_HOME not set

I think that might happen if you don't use the -j option I added, but let me double check.

@vanzin (Contributor Author) commented Jun 18, 2018

I was able to run without -j, no problems with JAVA_HOME...

@SparkQA commented Jun 18, 2018

Test build #92036 has finished for PR 21515 at commit 7fe133a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Seems like Maven Central updated something over the weekend, and Java 7
was having trouble connecting; needed some extra SSL-related options.

I also made the build sequential, which makes it slower, but avoids some
issues with parallel maven processes that I was hitting when building
2.1.
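
The SSL workaround is of this shape (the exact flags used in the PR are an assumption; Java 7 clients default to TLS 1.0, which Maven Central stopped accepting around that time):

# Illustrative: make the Java 7 HTTP client negotiate newer TLS
# versions when talking to Maven Central.
export MAVEN_OPTS="-Dhttps.protocols=TLSv1.1,TLSv1.2 $MAVEN_OPTS"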

@SparkQA commented Jun 19, 2018

Test build #92044 has finished for PR 21515 at commit faab8ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 19, 2018

Test build #92045 has finished for PR 21515 at commit 82ae00e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Marcelo Vanzin added 2 commits June 18, 2018 18:44
- Override SBT_OPTS for Java 7
- Allow executing individual steps in docker
- Fix docker image so that changing what to install also forces
  an "apt update" to happen.
- Fix some bugs I introduced in the last commit.
- Fix release-tag.sh with Java 1.7.
- Expand dry run to tag and publish steps.

@SparkQA commented Jun 19, 2018

Test build #92051 has finished for PR 21515 at commit 5771354.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 19, 2018

Test build #92058 has finished for PR 21515 at commit a39933e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 19, 2018

Test build #92061 has finished for PR 21515 at commit b6d4c70.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

# Gather some user information.
export ASF_USERNAME=$(read_config "ASF user" "$LOGNAME")

GIT_NAME=$(git config user.name)
Contributor

script silently fails if user.name is not set
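
A guard of this shape would surface the failure instead (illustrative sketch, not the fix that landed):

# Fail loudly when git has no user.name configured.
GIT_NAME=$(git config user.name || true)
if [ -z "$GIT_NAME" ]; then
  echo "ERROR: set it first with: git config user.name 'Your Name'" >&2
  exit 1
fi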

@SparkQA commented Jun 19, 2018

Test build #92097 has finished for PR 21515 at commit 745342d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor)

Just a note: I successfully used these scripts with docker, with the dry-run option, for Spark 2.2.2 RC1.

@tgravescs (Contributor)

At this point, should we just merge these and add anything else that comes up in other PRs?

@vanzin (Contributor Author) commented Jun 22, 2018

SGTM.

@tgravescs (Contributor)

ok, +1, will merge shortly

@tgravescs (Contributor)

Merged. Did you have a JIRA to update the documents to match?

@tgravescs (Contributor)

Also, thanks again for doing these scripts; the docker route was much easier than me trying to get all the dependencies on RHEL!

@asfgit closed this in 4e7d867 on Jun 22, 2018
@vanzin (Contributor Author) commented Jun 22, 2018

I did not have a task filed for the doc updates.

@vanzin deleted the SPARK-24372 branch on June 22, 2018 at 21:29
@cloud-fan (Contributor) commented Sep 16, 2018

I hit an issue when using do-release-docker.sh to prepare 2.4.0 rc1

useradd: UID 0 is not unique
The command '/bin/sh -c useradd -m -s /bin/bash -p spark-rm -u $UID spark-rm' returned a non-zero code: 4

Any ideas about how to resolve it?

@felixcheung (Member)

UID already exists?

$GPG --export-secret-key --armor "$GPG_KEY" > "$GPG_KEY_FILE"

run_silent "Building spark-rm image with tag $IMGTAG..." "docker-build.log" \
docker build -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"
Contributor

So we need to do export UID=xxx before running this script?

Contributor

Got it. This is a system variable, so we can't run this script as the root user...
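
Concretely, UID is a read-only variable that bash sets automatically, which is why running the build as root breaks the useradd step above (illustrative):

# bash sets UID to the numeric id of the invoking user; e.g.:
echo "$UID"    # 1000 for a typical user, 0 when running as root
# As root, --build-arg UID=0 turns the Dockerfile step into
# "useradd -u 0 ... spark-rm", which collides with the existing
# root account ("UID 0 is not unique").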

@cloud-fan (Contributor)

Problem solved. This script assumes the current user can run the docker command without sudo, which may not be true. But we can't run the script as root either, since the container setup can't add another user with UID 0.

Following the steps in https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user solves it.
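
The relevant steps from that page boil down to (standard Docker post-install commands, summarized from the linked doc):

sudo groupadd docker             # create the docker group if needed
sudo usermod -aG docker $USER    # add your user to it
newgrp docker                    # pick up the new membership in this shell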
