Use wheels directly in the build process #1555

Merged 2 commits on Nov 15, 2021
58 changes: 23 additions & 35 deletions Integrations/build.gradle
@@ -65,34 +65,9 @@ idea {
}
}

/**
* Produce a docker image with the prereqs to run and use deephaven python
*/
def deephavenPythonDockerfile = tasks.register('deephavenPythonDockerfile', Dockerfile) {
destFile.set layout.buildDirectory.file('deephaven-python-docker/Dockerfile')
from new Dockerfile.From('deephaven/deephaven-jpy-wheel:local-build').withStage('deephaven-jpy-wheel')
from new Dockerfile.From('deephaven/deephaven-wheel:local-build').withStage('deephaven-wheel')
from new Dockerfile.From('deephaven/deephaven2-wheel:local-build').withStage('deephaven2-wheel')
from new Dockerfile.From('deephaven/java-and-python:local-build')

copyFile(new Dockerfile.CopyFile('/usr/src/app/dist', '.').withStage('deephaven-jpy-wheel'))
copyFile(new Dockerfile.CopyFile('/usr/src/app/dist', '.').withStage('deephaven-wheel'))
copyFile(new Dockerfile.CopyFile('/usr/src/app/dist', '.').withStage('deephaven2-wheel'))

// liblzo2-2 below is needed for supporting LZO parquet encoding.
runCommand('''set -eux; \\
apt update; \\
apt -y install liblzo2-2; \\
apt -y install curl; \\
pip3 install setuptools wheel; \\
pip3 install *.whl; \\
rm *.whl
''')
}
Docker.registerDockerImage(project, 'buildDeephavenPython') {
// deephaven/java-and-python:local-build
def javaAndPython = project(':deephaven-jpy').tasks.getByName('buildDockerForRuntime')
def dockerContext = project.layout.buildDirectory.dir('context')

def prepareDocker = project.tasks.register('prepareDocker', Sync) {
// deephaven-jpy.whl
def deephavenJpyWheel = project(':deephaven-jpy').tasks.getByName('buildDockerForWheel')

@@ -102,16 +77,29 @@ Docker.registerDockerImage(project, 'buildDeephavenPython') {
// deephaven2.whl
def deephaven2Wheel = project(':deephaven2-wheel').tasks.getByName('buildWheel')

// Dockerfile
def dockerfile = deephavenPythonDockerfile.get()
from (deephavenJpyWheel.outputs.files) {
into 'deephaven-jpy-wheel'
}
from (deephavenWheel.outputs.files) {
into 'deephaven-wheel'
}
from (deephaven2Wheel.outputs.files) {
into 'deephaven2-wheel'
}

from 'src/main/docker'
Member:

why not embed the Dockerfile as a task, rather than an "external" build file to show how to use the wheels mentioned above?

Also, along with the "one big copy" comment, consider making the three froms use one common into?

Member (Author):

I don't think of it so much as an "external" build file. When we don't need build-level parameterization of a dockerfile, I think a textual Dockerfile is easier to grok and modify.

Member:

Perhaps - I find it easier to have it all in one place, rather than needing to know whether to look at py, groovy, Dockerfile, .gradle, .sh, etc. to find configuration: consolidate the sh into the Dockerfile, and the Dockerfile into gradle.

Plus if we have a centralized tagging system (so we can produce arm images correctly) we are going to have to modify all these images to have the right tag for in-build. For example FROM dh/java-and-python:local-build-arm64, or else we have to do a docker+gradle clean on CI to get it to start the next platform, etc. (or pick another option which keeps gradle in the dark about what kinds of images it is producing).
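The arch-tag concern above could, as one option, be handled with a Dockerfile build argument instead of per-arch file edits. This is a hypothetical sketch only (TARGET_SUFFIX and the tag scheme are invented names, not anything in this repo); an ARG declared before the first FROM may be referenced in the FROM line:

```dockerfile
# Hypothetical sketch: parameterize the platform suffix of the base image
# tag with a build arg, so the build system can select e.g. ":local-build"
# vs ":local-build-arm64" without editing the Dockerfile itself.
ARG TARGET_SUFFIX=""
FROM deephaven/java-and-python:local-build${TARGET_SUFFIX}
```

Gradle would then pass the suffix through build arguments, at the cost of keeping gradle aware of which image variants it is producing.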

Member (Author):

In a follow-up PR, I'm adding a "requirements.txt" file - now, we could consolidate the requirements file into a RUN/sh install command instead (no file). But I think it's easier to maintain as a text file, and it lives right next to the Dockerfile.

into dockerContext
}

Docker.registerDockerImage(project, 'buildDeephavenPython') {
// deephaven/java-and-python:local-build
def javaAndPython = project(':deephaven-jpy').tasks.getByName('buildDockerForRuntime')

dependsOn prepareDocker

inputs.files javaAndPython.outputs.files,
deephavenJpyWheel.outputs.files,
deephavenWheel.outputs.files,
deephaven2Wheel.outputs.files,
dockerfile.outputs.files
inputs.files javaAndPython.outputs.files

inputDir.set layout.buildDirectory.dir('deephaven-python-docker')
inputDir.set dockerContext

images.add('deephaven/runtime-base:local-build')
}
24 changes: 24 additions & 0 deletions Integrations/src/main/docker/Dockerfile
@@ -0,0 +1,24 @@
FROM deephaven/java-and-python:local-build as install-wheels

RUN set -eux; \
apt-get -qq update; \
apt-get -qq -y --no-install-recommends install liblzo2-2 curl; \
Member:

consider versioning these, so the Dockerfile command changes when there are updates to pick up, rather than just taking it for granted that the docker build cache is empty.

Member (Author):

Versioning the package installs? ie, liblzo2-2=x.y.z curl=...?

Member:

Yes, otherwise when apt-get installs something new, we won't detect it and update local cached images. It isn't likely to break things... except when it does, and we won't have visibility, will have to either push a garbage step above (or in) the RUN, or tell downstream users to clear the docker caches.

Member (Author):

There's a scale between being very specific to maximize "reproducibility" vs having generically valid code that should be expected to work when upstream packages have changes. I'm not too keen on trying to be shepherds of package manager versions. It's somewhat nice to have "loose" dependencies that automatically update to their latest versions without needing a lot of engineering management around them.

It's true that you'll need to explicitly bust the cache to pick up changes, or you'll implicitly pick up changes when something earlier causes a cache bust; and when things do break it will take some image inspection to list out all of the differences in system packages.

We could potentially think about doing a single apt-get update very high up in the stack and not deleting it - with the expectation we explicitly bust that part of the cache after a new major/minor release - such that any downstream installs should always install the same.

That said, I don't think I've seen these patterns in use elsewhere (neither being very specific with 3rd-party packages, nor the persisted apt-get update). Not that I'm unwilling to try if we really think it's valuable.

Related: apt-get upgrade and apt-get dist-upgrade have larger implications wrt reproducibility as well.
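For concreteness, the pinning approach under discussion would look roughly like the sketch below. The version strings are placeholders, not real pins; actual values would come from the target Debian/Ubuntu release:

```dockerfile
# Sketch of pinned apt installs (placeholder versions, not real pins).
# Changing a pin changes the RUN instruction text, which busts the build
# cache deliberately instead of relying on the cache being empty.
RUN set -eux; \
    apt-get -qq update; \
    apt-get -qq -y --no-install-recommends install \
        liblzo2-2=2.10-2 \
        curl=7.64.0-4; \
    rm -rf /var/lib/apt/lists/*
```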

rm -rf /var/lib/apt/lists/*

COPY deephaven-jpy-wheel/ /deephaven-jpy-wheel
Member:

why not do just one big copy here, then one big run, install all three wheels in one step? I expect that jpy itself changes less often than the other two wheels, but do you think its enough to warrant tripling the layers?

The concern I have is that since old deephaven-wheel is based in part on our javadoc, it will see a change any time the JSON is correctly updated, so we will have to copy it, install it, copy the dh2-wheel, install it (even if it didn't change) - might as well just do them in one step?

("yes, i meant it this way" is a totally valid answer, just trying to make sure i get it all)

Member (Author):

So, we are "tripling" the layers, but it only affects the build stage, and not the final produced image. In essence, I think we are able to get the "best of both worlds" - we don't have to re-install deephaven-jpy.whl when deephaven.whl or deephaven2.whl changes.

Member:

it also affects the local build cache, 2x the image churn - and deephaven.whl is the "big" one, relative to the other two (1MB, 50kB, 217MB) due to downloading other packages from the internet. On the one hand it should be first so we don't churn it so much, on the other hand, afaict it is the only one that changes when javadoc changes, so it should be last to avoid rebuilding the others.

And the final layer gets a complete fresh copy made when anything else changes.

Member (Author):

I'm happy to change the order of the COPY/RUN steps, or move into a single COPY/RUN.

But this also brings up another "reproducibility" concern very similar to your concern about apt-get installing specific versions. pip will likely choose the most recent version for all of the dependencies the wheel specifies. Our requirements aren't specific:

$ unzip -q -c docker/runtime-base/build/context/deephaven-wheel/deephaven-0.7.0-py2.py3-none-any.whl | grep Requires

Requires-Dist: deephaven-jpy (==0.7.0)
Requires-Dist: numpy
Requires-Dist: dill (>=0.2.8)
Requires-Dist: wrapt
Requires-Dist: pandas
Requires-Dist: enum34 ; python_version < "3.4"
Requires-Dist: numba ; python_version > "3.0"

If we want to maximize cache-ability, we would likely want to have a freeze list that we install before any of our wheels; and then can install just our wheels w/o any additional download.

Member (Author) - @devinrsmith, Nov 13, 2021:

I've added a requirements file into the runtime-base PR - which in addition to being better for reproducibility, aids in cache-ability - but comes at the cost of additional maintenance.
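A minimal sketch of that freeze-list idea, assuming a requirements.txt generated from pip freeze is checked in next to the Dockerfile (the file name and the --no-deps usage here are illustrative, not necessarily what the follow-up PR ships):

```dockerfile
# Sketch: install pinned third-party dependencies first, so that layer
# caches independently of the wheels; then install our wheel with
# --no-deps so no additional downloads occur during the wheel step.
COPY requirements.txt /requirements.txt
RUN set -eux; \
    python3 -m pip install -q --no-cache-dir -r /requirements.txt

COPY deephaven-jpy-wheel/ /deephaven-jpy-wheel
RUN set -eux; \
    python3 -m pip install -q --no-cache-dir --no-deps /deephaven-jpy-wheel/*.whl; \
    rm -r /deephaven-jpy-wheel
```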

RUN set -eux; \
python3 -m pip install -q --no-cache-dir /deephaven-jpy-wheel/*.whl; \
rm -r /deephaven-jpy-wheel

COPY deephaven-wheel/ /deephaven-wheel
RUN set -eux; \
python3 -m pip install -q --no-cache-dir /deephaven-wheel/*.whl; \
Member:

(cc @jmao-denver) I just noticed this and am wondering if you have any insights - if I'm reading this right, the wheel itself is 137kB, but the installed size is 217MB?

3c7d9ad99d54   8 minutes ago   /bin/sh -c set -eux;     python3 -m pip inst…   217MB     
ec679d5275dd   8 minutes ago   /bin/sh -c #(nop) COPY dir:7dc2bc98d33cb7f8d…   137kB  
Step 5/10 : COPY deephaven-wheel/ /deephaven-wheel
 ---> ec679d5275dd
Step 6/10 : RUN set -eux;     python3 -m pip install -q --no-cache-dir /deephaven-wheel/*.whl;     rm -r /deephaven-wheel
 ---> Running in 046d3e02500b
+ python3 -m pip install -q --no-cache-dir /deephaven-wheel/deephaven-0.7.0-py2.py3-none-any.whl
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
+ rm -r /deephaven-wheel
Removing intermediate container 046d3e02500b
 ---> 3c7d9ad99d54

Member (Author):

Here are the sizes I get locally:

$ ls -la Integrations/build/context/*         
-rw-r--r--. 1 devin devin 761 Nov 12 12:03 Integrations/build/context/Dockerfile

Integrations/build/context/deephaven2-wheel:
total 16
drwxr-xr-x. 1 devin devin    66 Nov 12 12:03 .
drwxr-xr-x. 1 devin devin   120 Nov 12 12:03 ..
-rw-r--r--. 1 devin devin 12942 Nov 12 12:03 deephaven2-0.7.0-py3-none-any.whl

Integrations/build/context/deephaven-jpy-wheel:
total 352
drwxr-xr-x. 1 devin devin     94 Nov 12 12:03 .
drwxr-xr-x. 1 devin devin    120 Nov 12 12:03 ..
-rw-r--r--. 1 devin devin 356445 Nov 12 12:03 deephaven_jpy-0.7.0-cp37-cp37m-linux_x86_64.whl

Integrations/build/context/deephaven-wheel:
total 136
drwxr-xr-x. 1 devin devin     72 Nov 12 12:03 .
drwxr-xr-x. 1 devin devin    120 Nov 12 12:03 ..
-rw-r--r--. 1 devin devin 137119 Nov 12 12:03 deephaven-0.7.0-py2.py3-none-any.whl

Member (Author):

The large install size is due to dependency on numba and other stuff.

rm -r /deephaven-wheel

COPY deephaven2-wheel/ /deephaven2-wheel
RUN set -eux; \
python3 -m pip install -q --no-cache-dir /deephaven2-wheel/*.whl; \
rm -r /deephaven2-wheel

FROM deephaven/java-and-python:local-build
COPY --from=install-wheels / /
Member:

Do I understand properly that these last two lines "start over" with the fresh image, and just put one big layer on top, with the contents of the above steps?

Assuming so, would it be possible to just mash in the created dist-packages from the three above RUNs, and let each RUN happen in its own distinct FROM, so that they don't affect each other? This is probably overkill, but since we're making an entirely fresh 200mb build each time the above 200mb build changes, it seems like we could have some room to improve a bit, or drop this "flattening" step and cut our churn in half?

Member (Author):

Yes - we are taking the changes from the "install-wheels" stage, and just adding the bits that have changed back against java-and-python:local-build.

The reason I don't want to run the 3 pip installs in parallel (instead of serially) is that I can't guarantee they won't have overlapping bits; and I'd prefer to let the pip package manager deal with any potential version conflicts rather than either breaking or silently choosing one during a copy stage.
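For reference, the parallel-stage alternative being declined would look something like this sketch (the site-packages path is an assumption based on the python3.7 usage elsewhere in this PR); the risk noted above is that overlapping files get resolved by copy order rather than by pip:

```dockerfile
# Sketch of the rejected parallel-install alternative: each wheel
# installs in its own stage, and the resulting site-packages trees are
# merged by COPY. Overlapping files would be silently overwritten in
# copy order instead of being reconciled by pip.
FROM deephaven/java-and-python:local-build AS install-jpy
COPY deephaven-jpy-wheel/ /w
RUN python3 -m pip install -q --no-cache-dir /w/*.whl

FROM deephaven/java-and-python:local-build AS install-dh
COPY deephaven-wheel/ /w
RUN python3 -m pip install -q --no-cache-dir /w/*.whl

FROM deephaven/java-and-python:local-build
COPY --from=install-jpy /usr/local/lib/python3.7/site-packages /usr/local/lib/python3.7/site-packages
COPY --from=install-dh /usr/local/lib/python3.7/site-packages /usr/local/lib/python3.7/site-packages
```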

10 changes: 10 additions & 0 deletions buildSrc/src/main/groovy/Docker.groovy
@@ -149,6 +149,11 @@ class Docker {
*/
List<String> entrypoint;

/**
* Optional build arguments
*/
Map<String, String> buildArgs;

/**
* Logs are always printed from the build task when it runs, but entrypoint logs are only printed
* when it fails. Set this flag to always show logs, even when entrypoint is successful.
@@ -256,6 +261,11 @@ class Docker {
if (cfg.imageName) {
images.add(cfg.imageName)
}

// add build arguments, if provided
if (cfg.buildArgs) {
buildArgs.putAll(cfg.buildArgs)
}
}
}

21 changes: 14 additions & 7 deletions gradle/deephaven-jpy.gradle
@@ -24,24 +24,31 @@ project(':deephaven-jpy') {
// For OSS, we need to currently move fast, and will prefer to make edits directly in py/jpy.
// These can be backported to jpy proper in the future if necessary.

task prepareDocker(type: Sync) {
def prepareDocker = project.tasks.register('prepareDocker', Sync) {
from(project.projectDir) {
exclude 'build'
}
into 'build/docker'
}

Docker.registerDockerImage(project, 'buildDockerForRuntime') {
dependsOn prepareDocker
buildArgs.put('DEEPHAVEN_VERSION', "${project.version}")
images.add('deephaven/java-and-python:local-build')
target.set('runtime_reqs')
}

Docker.registerDockerImage(project, 'buildDockerForWheel') {
dependsOn prepareDocker
buildArgs.put('DEEPHAVEN_VERSION', "${project.version}")
images.add('deephaven/deephaven-jpy-wheel:local-build')

target.set('build')
Docker.registerDockerTask(project, 'buildDockerForWheel') {
copyIn {
from(project.projectDir) {
exclude 'build'
}
}
buildArgs = [ DEEPHAVEN_VERSION: project.version.toString() ]
imageName = 'deephaven/deephaven-jpy-wheel:local-build'
containerOutPath = '/usr/src/app/dist'
copyOut {
into 'build/deephaven-jpy-wheel'
}
}
}
8 changes: 1 addition & 7 deletions py/jpy/Dockerfile
@@ -51,14 +51,8 @@ FROM build_reqs as sources
WORKDIR /usr/src/app
COPY . .

FROM sources as build
FROM sources
ARG DEEPHAVEN_VERSION
RUN set -eux; \
test -n "${DEEPHAVEN_VERSION}"; \
python3.7 setup.py bdist_wheel

FROM runtime_reqs
COPY --from=build /usr/src/app/dist/ .
RUN set -eux; \
pip3 install *.whl; \
rm *.whl
4 changes: 3 additions & 1 deletion sphinx/sphinx.gradle
@@ -89,7 +89,9 @@ def cppClientDoxygenTask = Docker.registerDockerTask(project, 'cppClientDoxygen'
from 'deephaven/runtime-base:local-build'

runCommand('''set -eux; \\
apt-get -y install doxygen graphviz
apt-get update; \\
apt-get -y install doxygen graphviz; \\
rm -rf /var/lib/apt/lists/*
''')

copyFile('.', '/project')