[SPARK-24372][build] Add scripts to help with preparing releases. #21515
Conversation
The "do-release.sh" script asks questions about the RC being prepared, trying to find out as much as possible automatically, and then executes the existing scripts with proper arguments to prepare the release. This script was used to prepare the 2.3.1 release candidates, so was tested in that context. The docker version runs that same script inside a docker image especially crafted for building Spark releases. That image is based on the work by Felix C. linked in the bug. At this point is has been only midly tested. I also added a template for the vote e-mail, with placeholders for things that need to be replaced, although there is no automation around that for the moment. It shouldn't be hard to hook up certain things like version and tags to this, or to figure out certain things like the repo URL from the output of the release scripts.
Test build #91590 has finished for PR 21515 at commit
Test build #91673 has finished for PR 21515 at commit
dev/create-release/vote.tmpl
Outdated
```
@@ -0,0 +1,64 @@
Please vote on releasing the following candidate as Apache Spark version {version}.

The vote is open until {deadline} and passes if a majority of at least 3 +1 PMC votes are cast.
```
nit: personally I find "3 PMC +1 votes" more clear
The rules are a little bit more complicated than that. Updated.
```
@@ -106,3 +106,4 @@ spark-warehouse
structured-streaming/*
kafka-source-initial-offset-version-2.1.0.bin
kafka-source-initial-offset-future-version.bin
vote.tmpl
```
even if rat doesn't check, isn't vote.tmpl packaged into the source release this way?
for example, .git is removed from the release here: https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L157
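(For illustration only: the exact line at that link may differ, but the cleanup is of this general shape, stripping VCS metadata before the source tarball is created. The path below is a hypothetical example.)

```bash
# Hypothetical sketch of the kind of exclusion done in release-build.sh:
# remove the .git directory from the staged source tree before tar-ing it up.
rm -rf "spark-$SPARK_VERSION/.git"
```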
Are you saying this file should not be packaged in the source release? Not sure I see why that would be the case. There's a lot of stuff in .rat-excludes that is still packaged.
```dockerfile
WORKDIR /opt/spark-rm/output

ARG UID
RUN useradd -m -s /bin/bash -p spark-rm -u $UID spark-rm
```
does that mean the do-release script is run as the user "spark-rm"?
I thought it's generally best practice to gpg sign as yourself?
The gpg signature is completely unrelated to the user running the process.

Also, Docker being the weird thing it is, the user name in the container doesn't really matter much. It's running with the same UID as the host user who built the image, which in this case is expected to be the person preparing the release, and this will only really matter for files that are written back to the host's file system.
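To make the UID mapping concrete, here is a sketch of how the image build and run might look; the image name, build context, and output path are taken from elsewhere in this PR, but the exact flags are illustrative:

```bash
# Build the image with the host user's numeric UID, so the in-container
# "spark-rm" user creates files owned by the host user.
docker build -t spark-rm:latest --build-arg UID=$(id -u) dev/create-release/spark-rm

# Run with the output directory mounted; artifacts written to
# /opt/spark-rm/output land on the host with the right owner.
docker run -ti -v "$PWD/output:/opt/spark-rm/output" spark-rm:latest
```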
I guess we will need to update the release process website once these go in? It's not clear to me how the docker stuff applies here: are we recommending creating the release inside docker, or leaving it as optional? Did you use it for the 2.3.1 release?
We could update the docs once these get in. I did not use docker for 2.3.1 but I plan to give it a shot for 2.1.3. The goal of the docker path is to always build Spark in a known environment, so things like R packages and everything else are the same. It could also help folks who use other OSes (although Sameer used MacOS for 2.3.0 and it seemed to work fine, aside from some warnings from gnu tar when decompressing the tarballs on Linux).
Test build #91734 has finished for PR 21515 at commit
FYI I'm playing with building 2.1.3 using the docker-based approach, and there are a few things that I need to adjust before that works. I hope to update this PR later today.
- Allow mounting a local JDK (2.1 wants 1.7).
- Tweak profiles and builds to 2.1.
- Restore support for Scala 2.10.
- Set up JAVA_HOME in the docker container.

I also expanded "dry run" mode a bit to build one of the binary packages, and also the documentation. The built documentation is now also kept in the output directory along with the other artifacts.
Test build #91869 has finished for PR 21515 at commit
```bash
SELF=$(cd $(dirname $0) && pwd)
. "$SELF/release-util.sh"

while getopts "bn" opt; do
```
it would be nice to have a high-level description in the script just saying that this does a release, covering things like tagging and building, and pushes things to the ASF Spark repo.
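Something along these lines, as a sketch of the kind of header the comment asks for (wording invented here, not taken from the PR):

```bash
#!/usr/bin/env bash
#
# do-release.sh: drives the preparation of a Spark release candidate.
# Asks about the RC being prepared, then runs the existing release
# scripts to tag the release, build the packages and docs, and push
# the resulting artifacts to the ASF Spark repositories.
```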
dev/create-release/release-util.sh
Outdated
```bash
# Check if the RC already exists, and if re-creating the RC, skip tag creation.
RELEASE_TAG="v${RELEASE_VERSION}-rc${RC_COUNT}"
SKIP_TAG=0
if check_for_tag "RELEASE_TAG"; then
```
this needs to be $RELEASE_TAG
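That is, the quoted string passes the literal name rather than the tag value; the corrected line would be:

```bash
if check_for_tag "$RELEASE_TAG"; then
```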
Not sure if this was something in my env, but somehow it pushed the next version as 2.2-3-snapshot.
Hmm, must be something I introduced in the last changes (2.3.2 seems correct). Sorry about that.
I think that might happen if you don't use the |
I was able to run without |
Test build #92036 has finished for PR 21515 at commit
Seems like Maven Central updated something over the weekend, and Java 7 was having trouble connecting; it needed some extra SSL-related options. I also made the build sequential, which makes it slower but avoids some issues with parallel maven processes that I was hitting when building 2.1.
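For context, this is the usual Java 7 TLS problem: Java 7 clients default to TLS 1.0, which Maven Central stopped accepting, so the JVM has to be told to negotiate a newer protocol. A sketch of the kind of option involved (the exact setting used in this PR may differ):

```bash
# Force the Java 7 JVM used by the build to offer modern TLS versions
# when talking to Maven Central.
export SBT_OPTS="$SBT_OPTS -Dhttps.protocols=TLSv1.1,TLSv1.2"
```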
Test build #92044 has finished for PR 21515 at commit
Test build #92045 has finished for PR 21515 at commit
- Override SBT_OPTS for Java 7.
- Allow executing individual steps in docker.
- Fix docker image so that changing what to install also forces an "apt update" to happen.
- Fix some bugs I introduced in the last commit.
- Fix release-tag.sh with Java 1.7.
- Expand dry run to tag and publish steps.
Test build #92051 has finished for PR 21515 at commit
Test build #92058 has finished for PR 21515 at commit
Test build #92061 has finished for PR 21515 at commit
dev/create-release/release-util.sh
Outdated
```bash
# Gather some user information.
export ASF_USERNAME=$(read_config "ASF user" "$LOGNAME")

GIT_NAME=$(git config user.name)
```
the script silently fails if user.name is not set
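One possible guard, as a sketch (not part of the PR): fail loudly instead of continuing with an empty name.

```bash
GIT_NAME=$(git config user.name)
if [ -z "$GIT_NAME" ]; then
  echo "git user.name is not set; please configure it before running." >&2
  exit 1
fi
```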
Test build #92097 has finished for PR 21515 at commit
Just a note: I successfully used these scripts with docker, with the dry-run option, for Spark 2.2.2 RC1.
At this point, should we just merge these and add anything else that comes up in other PRs?
SGTM.
ok, +1, will merge shortly
Merged. Did you have a JIRA to update the documents to match?
Also, thanks again for doing these scripts; the docker was much easier than me trying to get all the dependencies on RHEL!
I did not have a task filed for the doc updates.
I hit an issue when using
Any ideas about how to resolve it?
UID already exists?
```bash
$GPG --export-secret-key --armor "$GPG_KEY" > "$GPG_KEY_FILE"

run_silent "Building spark-rm image with tag $IMGTAG..." "docker-build.log" \
  docker build -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"
```
So we need to do export UID=xxx before running this script?
Got it. This is a system variable, so we can't run this script as the root user...
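For reference, $UID is set automatically by bash, so nothing needs to be exported; a quick way to see what the build will pick up:

```bash
echo "$UID"   # bash sets this to the current user's numeric ID (0 for root)
id -u         # the equivalent POSIX command
```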
Problem solved. This script assumes the current user can run the docker command without sudo. Following the steps in https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user solves it.
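The steps behind that link boil down to the following (standard Docker post-install setup, shown here for convenience):

```bash
# Add the current user to the docker group so the docker CLI works
# without sudo, then pick up the new group membership.
sudo usermod -aG docker "$USER"
newgrp docker            # or log out and back in
docker run hello-world   # verify non-root access works
```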
The "do-release.sh" script asks questions about the RC being prepared,
trying to find out as much as possible automatically, and then executes
the existing scripts with proper arguments to prepare the release. This
script was used to prepare the 2.3.1 release candidates, so was tested
in that context.
The docker version runs that same script inside a docker image especially
crafted for building Spark releases. That image is based on the work
by Felix C. linked in the bug. At this point is has been only midly
tested.
I also added a template for the vote e-mail, with placeholders for
things that need to be replaced, although there is no automation around
that for the moment. It shouldn't be hard to hook up certain things like
version and tags to this, or to figure out certain things like the
repo URL from the output of the release scripts.