Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Fix ccache cache restoration to improve build times #5202

Merged
merged 12 commits into from
Nov 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
155 changes: 102 additions & 53 deletions .github/workflows/integration-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,12 @@ concurrency:
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
permissions: read-all
env:
TRITON_BUILD_WITH_CCACHE: "true"
TRITON_BUILD_WITH_CLANG_LLD: "TRUE"
TRITON_USE_ASSERT_ENABLED_LLVM: "TRUE"
TRITON_DISABLE_LINE_INFO: 1
PROTON_SKIP_PC_SAMPLING_TEST: 1
CCACHE_COMPRESS: "true"
jobs:
Runner-Preparation:
runs-on: ubuntu-latest
Expand Down Expand Up @@ -154,6 +156,8 @@ jobs:
strategy:
matrix:
runner: ${{fromJson(needs.Runner-Preparation.outputs.matrix-CUDA)}}
env:
RUNNER_TYPE: ${{ matrix.runner[0] }}
steps:
- name: Checkout
uses: actions/checkout@v4
Expand Down Expand Up @@ -199,22 +203,30 @@ jobs:
# "restore" step. This is to prevent the caches from accumulating stale
# files over time.
name: Restore cache of ccache and Triton compilation artifacts
if: github.event_name != 'push'
id: restore-build-cache
if: github.ref != 'refs/heads/main'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and this condition was disabling the cache from being restored on PRs

uses: actions/cache/restore@v4
with:
path: |
~/.triton/cache
~/.cache/ccache
~/.ccache
# Restore the most recent cache entry.
restore-keys: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-
restore-keys: |
triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-llvm-${{ steps.cache-key.outputs.llvm }}-
triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-
# We expect this cache key never to hit and for us to fall back
# unconditionally to the restore-key, so it doesn't actually matter
# what we put here (so long as it doesn't hit an existing key).
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directory
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
Copy link
Contributor Author

@peterbell10 peterbell10 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing runner.name -> env.RUNNER_TYPE here means if we ever have multiple runners servicing the same type of build then they will be able to use each others caches. Of course I did this change before I realised there was only one though...

- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.ccache
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default cache directory if ~/.ccache doesn't exist is actually different from OS to OS, so the macOS build was looking in the wrong directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant other platforms use ~/.triton but macos uses ~/.ccache?

Copy link
Contributor Author

@peterbell10 peterbell10 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring to the ccache directory. Linux defaults to ~/.cache/ccache whereas macOS defaults to ~/Library/Caches/ccache

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, you're using ~/.ccache, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ccache has multiple defaults which have an order of precedence. ~/.ccache is the highest precedence and the most cross-platform. Then if that doesn't already exist it will fall back to the platform specific options. See https://ccache.dev/manual/latest.html#_configuration_options in the cache_dir section

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I get it now. Thanks for the reference

ls -alh ~/.ccache
du -sh ~/.ccache
- name: Update PATH
run: |
echo "$HOME/.local/bin" >> $GITHUB_PATH
Expand All @@ -224,12 +236,14 @@ jobs:
python3 -m pip install cython setuptools wheel cmake==3.24 ninja pytest-forked pytest-xdist lit
- name: Install Triton
env:
TRITON_BUILD_WITH_CCACHE: "true"
CUDA_HOME: "/usr/local/cuda"
run: |
echo "PATH is '$PATH'"
cd python
python3 -m pip install '.[tests]'
ccache --zero-stats
python3 -m pip install -v '.[tests]'
- name: CCache Stats
run: ccache --print-stats
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some debug output to confirm we're getting cache hits.

- name: Run lit tests
run: |
cd python
Expand Down Expand Up @@ -278,6 +292,15 @@ jobs:
cd third_party/proton/test
python3 -m pytest -s .
cd ..
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.ccache
ls -alh ~/.ccache
du -sh ~/.ccache
- # If we're on branch `main`, save the ccache Triton compilation artifacts
# to the cache so they can be used by other (non-main) CI runs.
#
Expand All @@ -287,22 +310,17 @@ jobs:
if: github.ref == 'refs/heads/main'
uses: actions/cache/save@v4
with:
path: ~/.triton/cache ~/.cache/ccache
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.cache/ccache
ls -alh ~/.cache/ccache
du -sh ~/.cache/ccache
path: |
~/.triton/cache
~/.ccache
key: ${{ steps.restore-build-cache.outputs.cache-primary-key }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just saves you having to make sure the keys match, and enforces it programatically.

Integration-Tests-AMD:
needs: Runner-Preparation
if: needs.Runner-Preparation.outputs.matrix-HIP != ''
runs-on: ${{ matrix.runner }}
timeout-minutes: 30
env:
RUNNER_TYPE: ${{ matrix.runner[1] }}
strategy:
matrix:
runner: ${{fromJson(needs.Runner-Preparation.outputs.matrix-HIP)}}
Expand Down Expand Up @@ -355,40 +373,55 @@ jobs:
# "restore" step. This is to prevent the caches from accumulating stale
# files over time.
name: Restore cache of ccache and Triton compilation artifacts
if: github.event_name != 'push'
id: restore-build-cache
if: github.ref != 'refs/heads/main'
uses: actions/cache/restore@v4
with:
path: |
~/.triton/cache
~/.cache/ccache
~/.ccache
# Restore the most recent cache entry.
restore-keys: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-
restore-keys: |
triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-llvm-${{ steps.cache-key.outputs.llvm }}-
triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-
# We expect this cache key never to hit and for us to fall back
# unconditionally to the restore-key, so it doesn't actually matter
# what we put here (so long as it doesn't hit an existing key).
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directory
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.ccache
ls -alh ~/.ccache
du -sh ~/.ccache
- name: Update PATH
run: |
echo "/opt/rocm/llvm/bin" >> $GITHUB_PATH
- name: Install pip dependencies
run: |
python3 -m pip install --upgrade pip
python3 -m pip install lit
- name: Install apt dependencies
run: |
apt update
apt install ccache
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ccache didn't seem to be installed on the AMD runners

- name: Install Triton
id: amd-install-triton
run: |
echo "PATH is '$PATH'"
pip uninstall -y triton
cd python
ccache --zero-stats
pip install -v -e '.[tests]'
- name: Clean up after an unsuccessful build
if: ${{ !success() && steps.amd-install-triton.outcome != 'success' }}
run: |
rm -rf ~/.triton
- name: CCache Stats
run: ccache --print-stats
- name: Run lit tests
run: |
cd python
Expand Down Expand Up @@ -431,6 +464,15 @@ jobs:
cd python
cd "build/$(ls build | grep -i cmake)"
ctest -j32
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.ccache
ls -alh ~/.ccache
du -sh ~/.ccache
- # If we're on branch `main`, save the ccache Triton compilation artifacts
# to the cache so they can be used by other (non-main) CI runs.
#
Expand All @@ -440,17 +482,10 @@ jobs:
if: github.ref == 'refs/heads/main'
uses: actions/cache/save@v4
with:
path: ~/.triton/cache ~/.cache/ccache
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.cache/ccache
ls -alh ~/.cache/ccache
du -sh ~/.cache/ccache
path: |
~/.triton/cache
~/.ccache
key: ${{ steps.restore-build-cache.outputs.cache-primary-key }}
- name: Clean up caches
run: |
rm -rf ~/.triton/cache
Expand All @@ -462,6 +497,8 @@ jobs:
strategy:
matrix:
runner: ${{fromJson(needs.Runner-Preparation.outputs.matrix-MACOS)}}
env:
RUNNER_TYPE: ${{ matrix.runner[0] }}
steps:
- name: Checkout
uses: actions/checkout@v4
Expand All @@ -470,7 +507,7 @@ jobs:
- name: Install brew dependencies
run: |
brew update
brew install ccache llvm@19 lld
brew install ccache llvm@19 lld coreutils
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coreutils is needed for sha256sum which is needed to compute the cache keys.

- name: Compute cache keys
id: cache-key
run: |
Expand Down Expand Up @@ -511,22 +548,30 @@ jobs:
# "restore" step. This is to prevent the caches from accumulating stale
# files over time.
name: Restore cache of ccache and Triton compilation artifacts
if: github.event_name != 'push'
id: restore-build-cache
if: github.ref != 'refs/heads/main'
uses: actions/cache/restore@v4
with:
path: |
~/.triton/cache
~/.cache/ccache
~/.ccache
# Restore the most recent cache entry.
restore-keys: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-
restore-keys: |
triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-llvm-${{ steps.cache-key.outputs.llvm }}-
triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-
# We expect this cache key never to hit and for us to fall back
# unconditionally to the restore-key, so it doesn't actually matter
# what we put here (so long as it doesn't hit an existing key).
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directory
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ env.RUNNER_TYPE }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.ccache
ls -alh ~/.ccache
du -sh ~/.ccache
- name: Update PATH
run: |
echo "$HOME/.local/bin" >> $GITHUB_PATH
Expand All @@ -539,7 +584,6 @@ jobs:
python3 -m pip install cython setuptools wheel cmake==3.24 ninja pytest-xdist lit pybind11
- name: Install Triton
env:
TRITON_BUILD_WITH_CCACHE: "true"
TRITON_BUILD_WITH_O1: "true"
# macos-latest has 3 vcpus and 7GB DRAM, to save memory we limit the number of jobs to 3
# https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories
Expand All @@ -548,7 +592,19 @@ jobs:
source ~/.venv/bin/activate
echo "PATH is '$PATH'"
cd python
python3 -m pip install --no-build-isolation .
ccache --zero-stats
python3 -m pip install -v --no-build-isolation .
- name: CCache Stats
run: ccache --print-stats
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.ccache
ls -alh ~/.ccache
du -sh ~/.ccache
- # If we're on branch `main`, save the ccache Triton compilation artifacts
# to the cache so they can be used by other (non-main) CI runs.
#
Expand All @@ -558,14 +614,7 @@ jobs:
if: github.ref == 'refs/heads/main'
uses: actions/cache/save@v4
with:
path: ~/.triton/cache ~/.cache/ccache
key: triton-artifacts-${{ runner.os }}-${{ runner.arch }}-${{ runner.name }}-llvm-${{ steps.cache-key.outputs.llvm }}-${{ steps.cache-key.outputs.datetime }}
- name: Inspect cache directories
run: |
mkdir -p ~/.triton
ls -alh ~/.triton
du -sh ~/.triton/**

mkdir -p ~/.cache/ccache
ls -alh ~/.cache/ccache
du -sh ~/.cache/ccache
path: |
~/.triton/cache
~/.ccache
key: ${{ steps.restore-build-cache.outputs.cache-primary-key }}
Loading
Loading