Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix R-builds deploy job #171

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Fix R-builds deploy job #171

wants to merge 4 commits into from

Conversation

glin
Copy link
Contributor

@glin glin commented Jul 5, 2023

Fixes #165. Tested this in staging at https://build.posit.it/blue/organizations/jenkins/r-builds%2Fdeploy-r-builds/detail/deploy-r-builds/195/pipeline, and then confirmed that a staging rebuild works. The CI failure is for an unrelated reason.

If this works well, someone will need to update serverless-custom.yml since I don't think I have permissions for that.

We were using the serverless-python-requirements plugin with dockerizePip: true, which installs the Python requirements in a separate docker container. The plugin has to mount some directories (requirements.txt, cache files) into the separate container, and this failed in two ways.

The first issue is that the mounted directory was ending up as a relative path starting with a period, which docker doesn't support. With serverless --verbose, you get the actual error message:

[2023-07-04T00:11:49.031Z] + docker run --rm -v .cache/serverless-python-requirements/2b10c0467f75bfbcf12e9b77be65ec7563b90c5995f0eaa96faefbd42c337bdd_x86_64_slspyc:/var/task:z -v .cache/serverless-python-requirements/downloadCacheslspyc:/var/useDownloadCache:z lambci/lambda:build-python3.8 /bin/sh -c chown -R 0:0 /var/useDownloadCache
[2023-07-04T00:11:49.031Z] docker: Error response from daemon: create .cache/serverless-python-requirements/2b10c0467f75bfbcf12e9b77be65ec7563b90c5995f0eaa96faefbd42c337bdd_x86_64_slspyc: ".cache/serverless-python-requirements/2b10c0467f75bfbcf12e9b77be65ec7563b90c5995f0eaa96faefbd42c337bdd_x86_64_slspyc" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
[2023-07-04T00:11:49.031Z] See 'docker run --help'.
script returned exit code 125

This cache dir comes from the plugin's use of the appdirectory npm package, which generates the cache paths based off $HOME: https://github.com/MrJohz/appdirectory/blob/27f19a6eceb46110cd5d6882a18cae3a4da98331/lib/appdirectory.js#L91-L94

When running the plugin locally in Docker, it works fine because $HOME resolves to /root or /home/<user>. However, in Jenkins, the docker image is run with HOME set to .:

# Docker agent in Jenkins
[2023-07-05T18:36:45.755Z] echo "HOME: $HOME"
[2023-07-05T18:36:45.755Z] HOME: .

# Docker running locally
$ echo HOME: $HOME
HOME: /root

This may have been a Jenkins behavior change in some upgrade, which could explain why the job suddenly started failing. We were settting HOME=., see #171 (comment)

The second issue is that we were mounting the docker socket in the agent container. When you do this, any docker run -v mounts from within the container will use paths from the host rather than the container. So even with the cache directory fixed, the plugin was trying to mount directories in the python container that didn't exist, causing a new issue. I'm not sure how the Jenkins job was working before.

docker run --rm -v /root/.cache/2b10c0467f75bfbcf12e9b77be65ec7563b90c5995f0eaa96faefbd42c337bdd_x86_64_slspyc:/var/task:z -v /root/.cache/downloadCacheslspyc:/var/useDownloadCache:z lambci/lambda:build-python3.8 /bin/sh -c chown -R 0\:0 /var/useDownloadCache && python3.8 -m pip install -t /var/task/ -r /var/task/requirements.txt --cache-dir /var/useDownloadCache && chown -R 0\:0 /var/task && chown -R 0\:0 /var/useDownloadCache
chown: missing operand

To fix both of these, I temporarily patched serverless-custom.yml to not dockerize pip and disable caching:

pythonRequirements:
  dockerizePip: false

Switched the deploy image to a base OS image rather than a python/node or lambda image, because the language specific images felt hard to maintain. The nodesource install script didn't work on the python/node images' Debian 11, and the lambda images use AL2 which is super old and no longer supports newer versions of Node.

@glin
Copy link
Contributor Author

glin commented Jul 5, 2023

@jspiewak pointed out that we explicitly set HOME to . in the Jenkins job:

HOME = "."

That explains the setting then, and how it was able to work before. The HOME dir is used for the cache, which exists on the host because it's the workspace dir that gets mounted in. Changing it to env.WORKSPACE as @jspiewak suggested gets the original job working again without all these other changes.

I'll still propose running the whole deployment in a single container though, to keep things simpler and easier to debug locally.

Copy link
Contributor

@gaborcsardi gaborcsardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really check if the changes are OK, just saying thanks for fixing the deployment. :)

Copy link
Member

@jspiewak jspiewak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If setting HOME to env.WORKSPACE actually fixes the issue, is it worth separating that fix from the rest and getting it landed?
Otherwise, @stevenolen or @jforest will need to chime in with some historical context on the docker-in-docker situation.

Makefile Outdated
# The cache directory must either be an absolute path (optionally specified
# via cacheLocation: /path/to/cache), or caching must be disabled for the plugin
# to work. We disable caching completely to avoid all sorts of future issues.
# The cache directory is based on $HOME, which is set to "." when running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer the case


# Package the service only, for debugging.
# Requires deps and fetch-serverless-custom-file to be run first.
serverless-package:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not declare deps and fetch-serverless-custom-file as target dependencies then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to iterate on the packaging (without deploying) step to debug failures with that. I was testing various serverless-python-requirements settings in serverless-custom.yml and didn't want to overwrite it.

@glin
Copy link
Contributor Author

glin commented Jul 6, 2023

@jspiewak Good idea, let's go with that first to get the deploy unblocked asap. I'll follow up with a new PR soon and add some docs. The main problem was just how hard the deploy job was to understand and debug, and I would be fine with the original docker-inside-docker approach as long as it was documented well enough. I'd still prefer reducing it to just one container, but am curious what @stevenolen and @jforest think.

glin added 4 commits July 10, 2023 17:36
* Don't Dockerize pip and stop mounting Docker socket in Jenkins. Mounting
  the socket breaks docker run commands that attempt to mount a directory
  in the container - the mount uses the host rather than the container.
  The serverless-python-requirements plugin will do this and fail to locate
  files. No idea how this was working in Jenkins before, however!
* Switch to jammy Docker image with custom-installed dependencies. The
  language-specific images and lambda images are too tied to distros that
  make dependencies hard to install or maintain. e.g., the nodesource
  install script is broken on Debian 11 right now, and AL2 no longer supports
  newer versions of Node 18+.
* Disable caching in the serverless-python-requirements plugin. For some
  reason, the cache directory in Jenkins changed to a relative .cache
  directory at some point, based on /home/greg, set to . in Jenkins but
  an absolute path like /root typically. The plugin assumes that this cache
  directory is absolute and breaks otherwise with a relative path. To prevent
  future issues, we just disable caching completely rather than explicitly
  setting cacheLocation to an absolute path.
…mments

* Simplify the deployment as much as possible
* HOME can be removed, but needs the jenkins user restored to have a
  valid HOME dir that's not /, which breaks npm.
@glin glin force-pushed the glin-fix-serverless branch from 3264aa3 to fbcdbb2 Compare July 10, 2023 23:47
@glin
Copy link
Contributor Author

glin commented Jul 10, 2023

I updated the docs and tried to simplify the job a little more:

  • Eliminated the custom HOME config - restored the jenkins user so HOME wouldn't be / and break npm
  • Re-enabled plugin caching so only dockerizePip: true would need to be removed in serverless-custom.yml

#175 seems to be working fine for now though, but I'll still leave this up for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

R-builds deployment failing
3 participants