Create base-image and minimize layer count (apache-spark-on-k8s#324)
* Create base-image and minimize layer count

* Create running-on-kubernetes.md
johscheuer authored and Puneet Loya committed Mar 8, 2019
1 parent 2221232 commit ab1f68e
Showing 8 changed files with 50 additions and 76 deletions.
6 changes: 5 additions & 1 deletion docs/running-on-kubernetes.md
@@ -49,20 +49,24 @@ If you wish to use pre-built docker images, you may use the images published in
</table>

You may also build these docker images from sources, or customize them as required. Spark distributions include the
-Docker files for the driver, executor, and init-container at `dockerfiles/driver/Dockerfile`,
+Docker files for the base-image, driver, executor, and init-container at `dockerfiles/spark-base/Dockerfile`, `dockerfiles/driver/Dockerfile`,
`dockerfiles/executor/Dockerfile`, and `dockerfiles/init-container/Dockerfile` respectively. Use these Docker files to
build the Docker images, and then tag them with the registry that the images should be sent to. Finally, push the images
to the registry.

For example, if the registry host is `registry-host` and the registry is listening on port 5000:

     cd $SPARK_HOME
+    docker build -t registry-host:5000/spark-base:latest -f dockerfiles/spark-base/Dockerfile .
     docker build -t registry-host:5000/spark-driver:latest -f dockerfiles/driver/Dockerfile .
     docker build -t registry-host:5000/spark-executor:latest -f dockerfiles/executor/Dockerfile .
     docker build -t registry-host:5000/spark-init:latest -f dockerfiles/init-container/Dockerfile .
+    docker push registry-host:5000/spark-base:latest
     docker push registry-host:5000/spark-driver:latest
     docker push registry-host:5000/spark-executor:latest
     docker push registry-host:5000/spark-init:latest

+Note that `spark-base` is the base image for the other images. It must be built before them; the other images can then be built in any order.
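
The build order described in the note can be scripted. The following is a minimal sketch, not part of this commit. `REGISTRY` and `TAG` mirror the example values, while `DOCKER`, `build_and_push`, and `build_all` are hypothetical names introduced here. It also assumes that, because the derived Dockerfiles use `FROM spark-base`, the base image needs the bare local tag `spark-base:latest` in addition to its registry tag.

```shell
# Sketch: build and push spark-base first, then the images derived from it.
# REGISTRY and TAG mirror the documented example. DOCKER is a hypothetical
# override (not a Docker or Spark setting): DOCKER=echo prints the commands.
REGISTRY="${REGISTRY:-registry-host:5000}"
TAG="${TAG:-latest}"
DOCKER="${DOCKER:-docker}"

# build_and_push NAME DOCKERFILE: build one image from the distribution root, then push it.
build_and_push() {
  "$DOCKER" build -t "$REGISTRY/$1:$TAG" -f "$2" . || return 1
  "$DOCKER" push "$REGISTRY/$1:$TAG"
}

build_all() {
  # Assumption: the derived Dockerfiles say `FROM spark-base`, so the base image
  # also gets the bare local tag spark-base:latest next to its registry tag.
  "$DOCKER" build -t spark-base:latest -t "$REGISTRY/spark-base:$TAG" \
    -f dockerfiles/spark-base/Dockerfile . || return 1
  "$DOCKER" push "$REGISTRY/spark-base:$TAG" || return 1

  # These images depend only on spark-base, so their relative order is free.
  build_and_push spark-driver dockerfiles/driver/Dockerfile || return 1
  build_and_push spark-executor dockerfiles/executor/Dockerfile || return 1
  build_and_push spark-init dockerfiles/init-container/Dockerfile
}
```

Call `build_all` from `$SPARK_HOME`. With `DOCKER=echo` it only prints the `build` and `push` commands, which is a cheap way to check the order before running it against a real registry.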

## Submitting Applications to Kubernetes

@@ -15,26 +15,13 @@
# limitations under the License.
#

-FROM openjdk:8-alpine
+FROM spark-base

# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-driver:latest -f dockerfiles/driver/Dockerfile .

-RUN apk upgrade --update
-RUN apk add --update bash tini
-RUN mkdir -p /opt/spark
-RUN touch /opt/spark/RELEASE
-
-ADD jars /opt/spark/jars
-ADD examples /opt/spark/examples
-ADD bin /opt/spark/bin
-ADD sbin /opt/spark/sbin
-ADD conf /opt/spark/conf
-
-ENV SPARK_HOME /opt/spark
-
-WORKDIR /opt/spark
+COPY examples /opt/spark/examples

CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
@@ -15,26 +15,13 @@
# limitations under the License.
#

-FROM openjdk:8-alpine
+FROM spark-base

# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-executor:latest -f dockerfiles/executor/Dockerfile .

-RUN apk upgrade --update
-RUN apk add --update bash tini
-RUN mkdir -p /opt/spark
-RUN touch /opt/spark/RELEASE
-
-ADD jars /opt/spark/jars
-ADD examples /opt/spark/examples
-ADD bin /opt/spark/bin
-ADD sbin /opt/spark/sbin
-ADD conf /opt/spark/conf
-
-ENV SPARK_HOME /opt/spark
-
-WORKDIR /opt/spark
+COPY examples /opt/spark/examples

# TODO support spark.executor.extraClassPath
CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
@@ -15,24 +15,10 @@
# limitations under the License.
#

-FROM openjdk:8-alpine
+FROM spark-base

# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-executor:latest -f dockerfiles/executor/Dockerfile .

-RUN apk upgrade --update
-RUN apk add --update bash tini
-RUN mkdir -p /opt/spark
-RUN touch /opt/spark/RELEASE
-
-ADD jars /opt/spark/jars
-ADD bin /opt/spark/bin
-ADD sbin /opt/spark/sbin
-ADD conf /opt/spark/conf
-
-ENV SPARK_HOME /opt/spark
-
-WORKDIR /opt/spark
-
ENTRYPOINT [ "/sbin/tini", "--", "bin/spark-class", "org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer" ]
@@ -15,24 +15,10 @@
# limitations under the License.
#

-FROM openjdk:8-alpine
+FROM spark-base

# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-executor:latest -f dockerfiles/executor/Dockerfile .

-RUN apk upgrade --update
-RUN apk add --update bash tini
-RUN mkdir -p /opt/spark
-RUN touch /opt/spark/RELEASE
-
-ADD jars /opt/spark/jars
-ADD bin /opt/spark/bin
-ADD sbin /opt/spark/sbin
-ADD conf /opt/spark/conf
-
-ENV SPARK_HOME /opt/spark
-
-WORKDIR /opt/spark
-
ENTRYPOINT [ "/sbin/tini", "--", "bin/spark-class", "org.apache.spark.deploy.rest.kubernetes.ResourceStagingServer" ]
@@ -15,25 +15,12 @@
# limitations under the License.
#

-FROM openjdk:8-alpine
+FROM spark-base

# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-shuffle:latest -f dockerfiles/shuffle/Dockerfile .

-RUN apk upgrade --update
-RUN apk add --update bash tini
-RUN mkdir -p /opt/spark
-RUN touch /opt/spark/RELEASE
-
-ADD jars /opt/spark/jars
-ADD examples /opt/spark/examples
-ADD bin /opt/spark/bin
-ADD sbin /opt/spark/sbin
-ADD conf /opt/spark/conf
-
-ENV SPARK_HOME /opt/spark
-
-WORKDIR /opt/spark
+COPY examples /opt/spark/examples

ENTRYPOINT [ "/sbin/tini", "--", "bin/spark-class", "org.apache.spark.deploy.kubernetes.KubernetesExternalShuffleService", "1" ]
@@ -0,0 +1,35 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
+
+# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
+# command should be invoked from the top level directory of the Spark distribution. E.g.:
+# docker build -t spark-base:latest -f dockerfiles/spark-base/Dockerfile .
+
+RUN apk upgrade --no-cache && \
+    apk add --no-cache bash tini && \
+    mkdir -p /opt/spark && \
+    touch /opt/spark/RELEASE
+
+COPY jars /opt/spark/jars
+COPY bin /opt/spark/bin
+COPY sbin /opt/spark/sbin
+COPY conf /opt/spark/conf
+
+ENV SPARK_HOME /opt/spark
+
+WORKDIR /opt/spark
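
A rough way to see the effect of chaining the `RUN` commands is to count the instructions that add a filesystem layer. The sketch below assumes, following Docker's build guidance, that only `RUN`, `COPY`, and `ADD` add layers. `count_layer_instructions` is a hypothetical helper name, not something provided by Spark or Docker.

```shell
# count_layer_instructions FILE: print how many lines of FILE start with a
# RUN, COPY, or ADD instruction. Hypothetical helper for illustration only.
# Continuation lines of a chained RUN are not counted, because they do not
# start with an instruction keyword.
count_layer_instructions() {
  grep -cE '^[[:space:]]*(RUN|COPY|ADD)[[:space:]]' "$1"
}
```

For the `spark-base` Dockerfile in this commit, `count_layer_instructions dockerfiles/spark-base/Dockerfile` prints 5: one chained `RUN` and four `COPY` lines.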
@@ -28,6 +28,7 @@ private[spark] class SparkDockerImageBuilder(private val dockerEnv: Map[String,

   private val DOCKER_BUILD_PATH = Paths.get("target", "docker")
   // Dockerfile paths must be relative to the build path.
+  private val BASE_DOCKER_FILE = "dockerfiles/spark-base/Dockerfile"
   private val DRIVER_DOCKER_FILE = "dockerfiles/driver/Dockerfile"
   private val EXECUTOR_DOCKER_FILE = "dockerfiles/executor/Dockerfile"
   private val SHUFFLE_SERVICE_DOCKER_FILE = "dockerfiles/shuffle-service/Dockerfile"
@@ -60,6 +61,7 @@ private[spark] class SparkDockerImageBuilder(private val dockerEnv: Map[String,

   def buildSparkDockerImages(): Unit = {
     Eventually.eventually(TIMEOUT, INTERVAL) { dockerClient.ping() }
+    buildImage("spark-base", BASE_DOCKER_FILE)
     buildImage("spark-driver", DRIVER_DOCKER_FILE)
     buildImage("spark-executor", EXECUTOR_DOCKER_FILE)
     buildImage("spark-shuffle", SHUFFLE_SERVICE_DOCKER_FILE)