Move merged PR job directories to trash_bin_dir #271

Merged on Aug 7, 2024
61 commits
afc2f67 Preliminary work on PR cleanup task (Neves-P, Apr 16, 2024)
5e17612 WIP disk cleanup (Neves-P, May 7, 2024)
901b8cc Merge branch 'develop' into feature/disk_cleanup (Neves-P, May 7, 2024)
5d8f0a1 cleanup script WIP (Neves-P, May 7, 2024)
568e4c5 Make date subdiretory (Neves-P, May 7, 2024)
7e91ce6 Have subdirectory with date and comment (Neves-P, May 7, 2024)
648232d Typo (Neves-P, May 7, 2024)
8e2082d Better read from config and logging (Neves-P, Jun 4, 2024)
d5f1d39 Scripting & cron job follow later. Closes #1 (Neves-P, Jun 4, 2024)
e0910f6 Treat hound to some white spaces (Neves-P, Jun 4, 2024)
fec556b Merge branch 'Neves-Bot:develop' into feature/disk_cleanup (Neves-P, Jun 6, 2024)
1d87ea2 Merge pull request #2 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 6, 2024)
d645202 Update eessi_bot_event_handler.py (Neves-P, Jun 10, 2024)
d75bb11 Update tasks/clean_up.py (Neves-P, Jun 10, 2024)
435bcae Handle based on `merged` not `action` from `request_body` (Neves-P, Jun 10, 2024)
6b38f37 Add docstring and use python routines (not shell commands) (Neves-P, Jun 10, 2024)
c4ee159 Merge pull request #3 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 10, 2024)
769daec Actions are 'closed', not 'merged' (Neves-P, Jun 10, 2024)
f321fe5 Merge pull request #4 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 10, 2024)
1c47315 Read from config properly (Neves-P, Jun 10, 2024)
f0c0082 Merge branch 'Neves-Bot:feature/disk_cleanup' into feature/disk_cleanup (Neves-P, Jun 10, 2024)
7b98722 Merge pull request #5 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 10, 2024)
1a6eab3 Fix typo in `merged`. (Neves-P, Jun 10, 2024)
b1dfd1d Merge branch 'feature/disk_cleanup' of https://github.com/Neves-P/ees… (Neves-P, Jun 10, 2024)
fd3912e Merge pull request #6 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 11, 2024)
0b91343 merged is a Bool (?) (Neves-P, Jun 11, 2024)
cee6416 Merge pull request #7 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 11, 2024)
784c9b9 Better formatting for merge conflict errors (Neves-P, Jun 11, 2024)
1f2000b Merge pull request #8 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 11, 2024)
4160bd1 Fix bug in getting repository name (Neves-P, Jun 11, 2024)
744e29e Merge pull request #9 from Neves-P/feature/disk_cleanup (Neves-Bot, Jun 12, 2024)
29ab450 Same formatting for directory timestamp (Neves-P, Jun 12, 2024)
2497993 Merge branch 'Neves-Bot:feature/disk_cleanup' into feature/disk_cleanup (Neves-P, Jun 25, 2024)
087d6c5 Best to not read comment from config (Neves-P, Jul 1, 2024)
5dbd132 Remove local variable (Neves-P, Jul 1, 2024)
0235586 Merge pull request #10 from Neves-P/feature/disk_cleanup (Neves-Bot, Jul 1, 2024)
b782889 Restore cleanup_pr.sh (Neves-P, Jul 1, 2024)
f4114b3 Restore cleanup_pr.sh (Neves-P, Jul 1, 2024)
283f602 Also move pr event dirs (Neves-P, Jul 1, 2024)
c025746 Unnecessary library (Neves-P, Jul 1, 2024)
b018037 Merge pull request #11 from Neves-P/feature/disk_cleanup (Neves-Bot, Jul 1, 2024)
e56e9ed Use shutil.copy2 and os.remove instead (Neves-P, Jul 1, 2024)
fbb16c3 Merge pull request #12 from Neves-P/feature/disk_cleanup (Neves-Bot, Jul 1, 2024)
00eeb89 Move whole directory(ies) to trash (Neves-P, Jul 2, 2024)
4b360b3 lint (Neves-P, Jul 2, 2024)
e3052e7 Merge pull request #13 from Neves-P/feature/disk_cleanup (Neves-Bot, Jul 2, 2024)
b8f6d57 Better formatting error messages (Neves-P, Jul 2, 2024)
59a337c Need copytree instead (Neves-P, Jul 2, 2024)
7a07249 Merge branch 'Neves-Bot:feature/disk_cleanup' into feature/disk_cleanup (Neves-P, Jul 2, 2024)
c4dc9b5 Merge pull request #14 from Neves-P/feature/disk_cleanup (Neves-Bot, Jul 2, 2024)
f80ab2b Use rmtree (Neves-P, Jul 2, 2024)
f0e0255 Merge branches 'feature/disk_cleanup' and 'feature/disk_cleanup' of h… (Neves-P, Jul 2, 2024)
1c08013 Merge pull request #15 from Neves-P/feature/disk_cleanup (Neves-Bot, Jul 3, 2024)
da970e7 copytree with dirs_exist_ok True (Neves-P, Jul 5, 2024)
3673540 Merge pull request #16 from Neves-P/feature/disk_cleanup (Neves-Bot, Jul 5, 2024)
71e7b31 tweak clean up method by (truib, Aug 5, 2024)
d1a0278 add information to README.md and app.cfg.example (truib, Aug 5, 2024)
1bc2bf4 Merge pull request #2 from trz42/tweak_cleanup (Neves-P, Aug 5, 2024)
d8d79b6 Lint (Neves-P, Aug 5, 2024)
f69c49c add empty line (truib, Aug 6, 2024)
f9332fe Merge pull request #3 from trz42/feature/disk_cleanup (Neves-P, Aug 7, 2024)
15 changes: 15 additions & 0 deletions README.md
@@ -720,6 +720,21 @@ git_apply_tip = _Tip: This can usually be resolved by syncing your branch and re
 `git_apply_tip` should guide the contributor/maintainer about resolving the cause
 of `git apply` failing.

+#### `[clean_up]` section
+
+The `[clean_up]` section includes settings related to cleaning up disk space used by merged PRs (PRs closed without merging are not cleaned up).
+```
+trash_bin_dir = PATH/TO/TRASH_BIN_DIRECTORY
+```
+Ideally this is on the same filesystem used by `jobs_base_dir` and `job_ids_dir`, so that data can be moved efficiently
+into the trash bin. If it resides on a different filesystem, the data will be copied and then removed, which is slower.
+
+```
+moved_job_dirs_comment = PR merged! Moved `{job_dirs}` to `{trash_bin_dir}`
+```
+Template used by the bot for the comment it adds to a PR, noting which directories have been
+moved and where.
+
 # Instructions to run the bot components

 The bot consists of three components:
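The comment template is filled in with Python's `str.format`: the handler calls `clean_up_comment.format(job_dirs=job_dirs, trash_bin_dir=trash_bin_dir)` with `job_dirs` being a list, so the list's `repr` ends up inside the backticks. A minimal sketch with hypothetical paths (the directory names are illustrations only):

```python
# Template as given in app.cfg.example
TEMPLATE = "PR merged! Moved `{job_dirs}` to `{trash_bin_dir}`"

# Hypothetical job directories and trash bin path, for illustration only
job_dirs = ["/jobs/2024.08/pr_42/1001", "/jobs/2024.08/pr_42/1002"]
trash_bin_dir = "/trash/EESSI/software-layer/2024.08.07"

# A list passed to str.format is rendered via str(), i.e. like its repr
comment = TEMPLATE.format(job_dirs=job_dirs, trash_bin_dir=trash_bin_dir)
print(comment)
# PR merged! Moved `['/jobs/2024.08/pr_42/1001', '/jobs/2024.08/pr_42/1002']` to `/trash/EESSI/software-layer/2024.08.07`
```

If a more readable comment were wanted, joining the paths first (for example with `", ".join(job_dirs)`) would be one option; that is only a suggestion, not what this PR does.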
4 changes: 4 additions & 0 deletions app.cfg.example
@@ -259,3 +259,7 @@ curl_failure = Unable to download the `.diff` file.
 curl_tip = _Tip: This could be a connection failure. Try again and if the issue remains check if the address is correct_
 git_apply_failure = Unable to download or merge changes between the source branch and the destination branch.
 git_apply_tip = _Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts._
+
+[clean_up]
+trash_bin_dir = $HOME/trash_bin
+moved_job_dirs_comment = PR merged! Moved `{job_dirs}` to `{trash_bin_dir}`
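`app.cfg.example` sets `trash_bin_dir = $HOME/trash_bin`. Python's `configparser`, with its default `BasicInterpolation`, only interpolates `%(name)s` references and does not expand environment variables, so the value comes back verbatim. Whether the bot expands `$HOME` elsewhere is not visible in this diff; the sketch below assumes it has to be expanded explicitly and uses `os.path.expandvars` for that:

```python
import configparser
import os

# The [clean_up] section as in app.cfg.example
CFG_TEXT = """
[clean_up]
trash_bin_dir = $HOME/trash_bin
moved_job_dirs_comment = PR merged! Moved `{job_dirs}` to `{trash_bin_dir}`
"""

cfg = configparser.ConfigParser()
cfg.read_string(CFG_TEXT)

# configparser returns "$HOME/trash_bin" unchanged
raw = cfg["clean_up"]["trash_bin_dir"]
# expandvars substitutes $HOME if it is set, otherwise leaves it as-is
expanded = os.path.expandvars(raw)
print(raw)
print(expanded)
```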
64 changes: 63 additions & 1 deletion eessi_bot_event_handler.py
@@ -18,6 +18,7 @@

 # Standard library imports
 import sys
+from datetime import datetime, timezone

 # Third party imports (anything installed into the local Python environment)
 from pyghee.lib import create_app, get_event_info, PyGHee, read_event_from_json
@@ -28,7 +29,8 @@
 from connections import github
 from tasks.build import check_build_permission, get_architecture_targets, get_repo_cfg, \
     request_bot_build_issue_comments, submit_build_jobs
-from tasks.deploy import deploy_built_artefacts
+from tasks.deploy import deploy_built_artefacts, determine_job_dirs
+from tasks.clean_up import move_to_trash_bin
 from tools import config
 from tools.args import event_handler_parse
 from tools.commands import EESSIBotCommand, EESSIBotCommandError, \
@@ -58,6 +60,9 @@
         config.BUILDENV_SETTING_SHARED_FS_PATH,  # optional+recommended
         # config.BUILDENV_SETTING_SLURM_PARAMS,  # optional
         config.BUILDENV_SETTING_SUBMIT_COMMAND],  # required
+    config.SECTION_CLEAN_UP: [
+        config.CLEAN_UP_SETTING_TRASH_BIN_ROOT_DIR,  # required
+        config.CLEAN_UP_SETTING_MOVED_JOB_DIRS_COMMENT],  # required
     config.SECTION_DEPLOYCFG: [
         config.DEPLOYCFG_SETTING_ARTEFACT_PREFIX,  # (required)
         config.DEPLOYCFG_SETTING_ARTEFACT_UPLOAD_SCRIPT,  # required
@@ -599,6 +604,63 @@ def start(self, app, port=3000):
         self.log(log_file_info)
         waitress.serve(app, listen='*:%s' % port)

+    def handle_pull_request_closed_event(self, event_info, pr):
+        """
+        Handle events of type pull_request with the action 'closed'. If the PR
+        was merged, determine the job directories used by the PR, move them to
+        the trash_bin_dir and report the move in a comment on the PR.
+
+        Args:
+            event_info (dict): event received by event_handler
+            pr (github.PullRequest.PullRequest): instance representing the pull request
+
+        Returns:
+            github.IssueComment.IssueComment instance or None (note, github refers to
+                PyGithub, not the github from the internal connections module)
+        """
+
+        # Detect event and only act if PR is merged
+        request_body = event_info['raw_request_body']
+        action = request_body['action']
+        merged = request_body['pull_request']['merged']
+
+        if merged:
+            self.log("PR merged: scanning directories used by PR")
+            self.log(f"pull_request event with action '{action}' and merged '{merged}' will be handled")
+        else:
+            self.log(f"Action '{action}' not handled as 'merged' is '{merged}'")
+            return
+        # at this point we know that we are handling a new merge
+        # NOTE: Permissions to merge are already handled through GitHub, we
+        # don't need to check here
+
+        # 1) determine the jobs that have been run for the PR
+        job_dirs = determine_job_dirs(pr.number)
+
+        # 2) Get trash_bin_dir from configs
+        trash_bin_root_dir = self.cfg[config.SECTION_CLEAN_UP][config.CLEAN_UP_SETTING_TRASH_BIN_ROOT_DIR]
+
+        repo_name = request_body['repository']['full_name']
+        dt = datetime.now(timezone.utc)
+        trash_bin_dir = "/".join([trash_bin_root_dir, repo_name, dt.strftime('%Y.%m.%d')])
+
+        # The trash bin path includes the repository name and the (UTC) date of the move.
+        # TODO: handle symbolic links (later?), possibly via a cron job that deletes them
+
+        # 3) move the directories to the trash_bin
+        self.log("Moving directories to trash_bin")
+        move_to_trash_bin(trash_bin_dir, job_dirs)
+
+        # 4) report move to pull request
+        repo_name = pr.base.repo.full_name
+        gh = github.get_instance()
+        repo = gh.get_repo(repo_name)
+        pull_request = repo.get_pull(pr.number)
+        clean_up_comment = self.cfg[config.SECTION_CLEAN_UP][config.CLEAN_UP_SETTING_MOVED_JOB_DIRS_COMMENT]
+        moved_comment = clean_up_comment.format(job_dirs=job_dirs, trash_bin_dir=trash_bin_dir)
+        issue_comment = pull_request.create_issue_comment(moved_comment)
+        return issue_comment
+

 def main():
     """
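The decision logic and the trash bin path in `handle_pull_request_closed_event` can be sketched as two pure functions. `should_clean_up` and `trash_bin_path` are hypothetical names, and the event payloads are trimmed-down illustrations rather than full GitHub webhook bodies. Note that the repository's `full_name` (`owner/repo`) contains a slash, so it yields two nested directories:

```python
from datetime import datetime, timezone


def should_clean_up(request_body):
    """Hypothetical helper: True only for merged PRs; closed-but-unmerged PRs are skipped."""
    return bool(request_body["pull_request"]["merged"])


def trash_bin_path(trash_bin_root_dir, repo_full_name, now=None):
    """Hypothetical helper: compose <root>/<owner>/<repo>/<YYYY.MM.DD>, dated in UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    return "/".join([trash_bin_root_dir, repo_full_name, now.strftime("%Y.%m.%d")])


# Trimmed-down, hypothetical payloads of 'closed' pull_request events
merged_event = {"action": "closed", "pull_request": {"merged": True},
                "repository": {"full_name": "EESSI/software-layer"}}
unmerged_event = {"action": "closed", "pull_request": {"merged": False},
                  "repository": {"full_name": "EESSI/software-layer"}}

print(should_clean_up(merged_event))    # True
print(should_clean_up(unmerged_event))  # False
print(trash_bin_path("/srv/trash_bin",
                     merged_event["repository"]["full_name"],
                     datetime(2024, 8, 7, tzinfo=timezone.utc)))
# /srv/trash_bin/EESSI/software-layer/2024.08.07
```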
16 changes: 8 additions & 8 deletions tasks/build.py
@@ -381,21 +381,21 @@ def comment_download_pr(base_repo_name, pr, download_pr_exit_code, download_pr_e

     download_pr_comments_cfg = config.read_config()[config.SECTION_DOWNLOAD_PR_COMMENTS]
     if error_stage == _ERROR_GIT_CLONE:
-        download_comment = (f"`{download_pr_error}`"
+        download_comment = (f"```{download_pr_error}```\n"
                             f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_CLONE_FAILURE]}"
-                            f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_CLONE_TIP]}")
+                            f"\n{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_CLONE_TIP]}")
     elif error_stage == _ERROR_GIT_CHECKOUT:
-        download_comment = (f"`{download_pr_error}`"
+        download_comment = (f"```{download_pr_error}```\n"
                             f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_CHECKOUT_FAILURE]}"
-                            f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_CHECKOUT_TIP]}")
+                            f"\n{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_CHECKOUT_TIP]}")
     elif error_stage == _ERROR_CURL:
-        download_comment = (f"`{download_pr_error}`"
+        download_comment = (f"```{download_pr_error}```\n"
                             f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_CURL_FAILURE]}"
-                            f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_CURL_TIP]}")
+                            f"\n{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_CURL_TIP]}")
     elif error_stage == _ERROR_GIT_APPLY:
-        download_comment = (f"`{download_pr_error}`"
+        download_comment = (f"```{download_pr_error}```\n"
                             f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_APPLY_FAILURE]}"
-                            f"{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_APPLY_TIP]}")
+                            f"\n{download_pr_comments_cfg[config.DOWNLOAD_PR_COMMENTS_SETTING_GIT_APPLY_TIP]}")

     download_comment = pr_comments.create_comment(
         repo_name=base_repo_name, pr_number=pr.number, comment=download_comment
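The change in `comment_download_pr` only alters the layout of the comment: the error text is wrapped in triple backticks and the failure message and tip each start on a new line, instead of everything running together on one line. A sketch contrasting both layouts, using the `git_apply` messages from `app.cfg.example` and a hypothetical error text (`old_layout` and `new_layout` are illustrative names, not functions in the bot):

```python
# Failure and tip messages as given in app.cfg.example
GIT_APPLY_FAILURE = ("Unable to download or merge changes between the source branch "
                     "and the destination branch.")
GIT_APPLY_TIP = ("_Tip: This can usually be resolved by syncing your branch "
                 "and resolving any merge conflicts._")


def old_layout(error, failure, tip):
    # before this PR: error, failure and tip concatenated on a single line
    return (f"`{error}`"
            f"{failure}"
            f"{tip}")


def new_layout(error, failure, tip):
    # after this PR: error in triple backticks, failure and tip on separate lines
    return (f"```{error}```\n"
            f"{failure}"
            f"\n{tip}")


error = "error: patch failed: example.txt:1"  # hypothetical error text
print(old_layout(error, GIT_APPLY_FAILURE, GIT_APPLY_TIP))
print(new_layout(error, GIT_APPLY_FAILURE, GIT_APPLY_TIP))
```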
66 changes: 66 additions & 0 deletions tasks/clean_up.py
@@ -0,0 +1,66 @@
+# This file is part of the EESSI build-and-deploy bot,
+# see https://github.com/EESSI/eessi-bot-software-layer
+#
+# The bot helps with requests to add software installations to the
+# EESSI software layer, see https://github.com/EESSI/software-layer
+#
+# author: Pedro Santos Neves (@Neves-P)
+#
+# license: GPLv2
+#
+
+# Standard library imports
+import sys
+import os
+import shutil
+
+# Third party imports (anything installed into the local Python environment)
+from pyghee.utils import log
+
+# Local application imports (anything from EESSI/eessi-bot-software-layer)
+
+
+def move_to_trash_bin(trash_bin_dir, job_dirs):
+    """
+    Move the PR directories that contain the given job directories to trash_bin_dir
+
+    Args:
+        trash_bin_dir (str): path to the trash_bin_dir. Defined in .cfg
+        job_dirs (list): list with job directory names
+
+    Returns:
+        True
+    """
+    # idea:
+    # - shutil.move YYYY.MM/pr_PR_NUM to trash_bin_dir
+    # - need to obtain list of YYYY.MM/pr_PR_NUM directories from job dirs
+    # - need to ensure that YYYY.MM under trash_bin_dir exists (or create it)
+    # - then we can just move YYYY.MM/pr_PR_NUM to trash_bin_dir/YYYY.MM
+    # - (LATER) we should also fix the symbolic links under job_ids_dir/finished
+    #   (remove the link for the job id and add a new one pointing to the new location)
+    funcname = sys._getframe().f_code.co_name
+    log(f"{funcname}(): trash_bin_dir = {trash_bin_dir}")
+
+    # ensure the 'trash_bin_dir' exists
+    os.makedirs(trash_bin_dir, exist_ok=True)
+
+    pr_dirs = []
+    for job_dir in job_dirs:
+        pr_dir = os.path.dirname(job_dir)
+        log(f"{funcname}(): adding PR dir '{pr_dir}' (from job dir '{job_dir}')")
+        pr_dirs.append(pr_dir)
+
+    # Move (or copy as fallback) entire pr_PR_NUM directories to trash_bin_dir/YYYY.MM
+    pr_dirs = list(set(pr_dirs))  # get only unique dirs
+    for pr_dir in pr_dirs:
+        # determine YYYY.MM parent of pr_dir
+        year_month_dir = pr_dir.split('/')[-2]
+        # make sure that directory exists under trash_bin_dir
+        target_bin_dir = os.path.join(trash_bin_dir, year_month_dir)
+        os.makedirs(target_bin_dir, exist_ok=True)
+
+        log(f"{funcname}(): attempting to move {pr_dir} to {target_bin_dir}")
+        destination_dir = shutil.move(pr_dir, target_bin_dir)
+        log(f"{funcname}(): moved {pr_dir} to {destination_dir}")
+
+    return True
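To illustrate what `move_to_trash_bin` does with the directory layout, here is a self-contained sketch of the same steps on a temporary directory tree. It assumes job directories follow `<jobs_base_dir>/<YYYY.MM>/pr_<N>/<job_id>`, as the comments in the function suggest; `move_pr_dirs` is a hypothetical stand-in that leaves out logging and uses `os.path` instead of `pr_dir.split('/')` to find the `YYYY.MM` part:

```python
import os
import shutil
import tempfile


def move_pr_dirs(trash_bin_dir, job_dirs):
    """Hypothetical sketch of move_to_trash_bin: move each unique pr_<N>
    directory (the parent of a job dir) to trash_bin_dir/<YYYY.MM>/."""
    os.makedirs(trash_bin_dir, exist_ok=True)
    pr_dirs = sorted({os.path.dirname(job_dir) for job_dir in job_dirs})
    moved = []
    for pr_dir in pr_dirs:
        year_month_dir = os.path.basename(os.path.dirname(pr_dir))  # the YYYY.MM part
        target_bin_dir = os.path.join(trash_bin_dir, year_month_dir)
        os.makedirs(target_bin_dir, exist_ok=True)
        # shutil.move renames on the same filesystem, otherwise copies then deletes;
        # it returns the final path target_bin_dir/pr_<N>
        moved.append(shutil.move(pr_dir, target_bin_dir))
    return moved


with tempfile.TemporaryDirectory() as root:
    jobs_base_dir = os.path.join(root, "jobs")
    # Hypothetical layout: <jobs_base_dir>/<YYYY.MM>/pr_<N>/<job_id>
    job_dirs = [os.path.join(jobs_base_dir, "2024.08", "pr_42", job_id)
                for job_id in ("1001", "1002")]
    for job_dir in job_dirs:
        os.makedirs(job_dir)
    trash_bin_dir = os.path.join(root, "trash_bin", "EESSI", "software-layer", "2024.08.07")
    moved = move_pr_dirs(trash_bin_dir, job_dirs)
    rel_moved = [os.path.relpath(path, root) for path in moved]
    pr_dir_gone = not os.path.exists(os.path.join(jobs_base_dir, "2024.08", "pr_42"))
    moved_jobs = sorted(os.listdir(moved[0]))

print(rel_moved)
```

Because `shutil.move` only falls back to copy-and-delete across filesystems, keeping `trash_bin_dir` on the filesystem of `jobs_base_dir` keeps the move cheap, as the README section recommends.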
4 changes: 4 additions & 0 deletions tools/config.py
@@ -105,6 +105,10 @@
 SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT = 'initial_comment'
 SUBMITTED_JOB_COMMENTS_SETTING_AWAITS_RELEASE = 'awaits_release'

+SECTION_CLEAN_UP = 'clean_up'
+CLEAN_UP_SETTING_TRASH_BIN_ROOT_DIR = 'trash_bin_dir'
+CLEAN_UP_SETTING_MOVED_JOB_DIRS_COMMENT = 'moved_job_dirs_comment'
+

 def read_config(path='app.cfg'):
     """
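The three constants name the new section and its settings, and `REQUIRED_CONFIG` in `eessi_bot_event_handler.py` marks both settings as required. How the bot enforces required settings is not part of this diff; the hypothetical helper `missing_settings` below sketches one way to detect missing ones with `configparser`:

```python
import configparser

# Section and setting names as defined in tools/config.py
SECTION_CLEAN_UP = 'clean_up'
CLEAN_UP_SETTING_TRASH_BIN_ROOT_DIR = 'trash_bin_dir'
CLEAN_UP_SETTING_MOVED_JOB_DIRS_COMMENT = 'moved_job_dirs_comment'

# Only the [clean_up] part of the handler's REQUIRED_CONFIG
REQUIRED_CONFIG = {
    SECTION_CLEAN_UP: [
        CLEAN_UP_SETTING_TRASH_BIN_ROOT_DIR,
        CLEAN_UP_SETTING_MOVED_JOB_DIRS_COMMENT],
}


def missing_settings(cfg, required):
    """Hypothetical helper: list (section, setting) pairs absent from cfg."""
    missing = []
    for section, settings in required.items():
        for setting in settings:
            # has_option() is False if the section or the option is missing
            if not cfg.has_option(section, setting):
                missing.append((section, setting))
    return missing


cfg = configparser.ConfigParser()
cfg.read_string("[clean_up]\ntrash_bin_dir = /srv/trash_bin\n")
print(missing_settings(cfg, REQUIRED_CONFIG))
# [('clean_up', 'moved_job_dirs_comment')]
```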