-
Notifications
You must be signed in to change notification settings - Fork 674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove I/O in event loop for backup create and restore operations #5634
Conversation
2f462ad
to
0be8ff4
Compare
Make remove_folder and remove_folder_with_excludes synchronous functions which need to be run in an executor thread to be safely used in asyncio. This makes them better composable with other I/O operations like checking for file existence etc.
0be8ff4
to
f75f9d6
Compare
📝 WalkthroughWalkthroughThis pull request refactors and enhances backup, restore, and folder removal routines across several modules. It updates type annotations and method signatures, encapsulates file handling logic into helper functions, and tightens error handling—particularly for mount validations and file existence checks. Additionally, synchronous execution is adopted for folder removal utilities with corresponding test updates. These changes affect the Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as User/System
participant HA as HomeAssistant Module
participant RHA as _restore_home_assistant
participant Tar as Tarfile Extraction Process
Caller->>HA: invoke restore(tar_file, exclude_database)
HA->>RHA: call _restore_home_assistant()
RHA->>Tar: Extract tar file and process metadata
Tar-->>RHA: Return extraction results
RHA-->>HA: Return Home Assistant metadata
HA->>Caller: Complete restoration with metadata validation
sequenceDiagram
participant Caller as User/System
participant GR as GitRepo Instance
participant Inner as _remove_git_dir
participant Exec as sys_run_in_executor
participant RF as remove_folder
Caller->>GR: Call _remove()
GR->>Inner: Execute removal logic (_remove_git_dir)
Inner->>Exec: Schedule removal in executor context
Exec->>RF: Invoke remove_folder for directory removal
RF-->>Exec: Return after folder removal
Exec-->>Inner: Complete asynchronous execution
Inner-->>GR: Removal process finished
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
supervisor/homeassistant/module.py (1)
12-12
: Consider more precise typing for_restore_home_assistant
It looks like_restore_home_assistant
returns either a dict orNone
. UsingAny
is functional but reduces clarity.Here's a sample diff for the return type:
-from typing import Any +from typing import Any, Optional, Dict-def _restore_home_assistant() -> Any: +def _restore_home_assistant() -> Optional[Dict[str, Any]]:supervisor/backups/backup.py (1)
97-97
: Enhance type annotation for clarity.The type annotation for
_tmp
should beTemporaryDirectory | None
since it can be None.- self._tmp: TemporaryDirectory = None + self._tmp: TemporaryDirectory | None = None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
supervisor/backups/backup.py
(7 hunks)supervisor/backups/manager.py
(6 hunks)supervisor/homeassistant/module.py
(3 hunks)supervisor/resolution/fixups/store_execute_reset.py
(1 hunks)supervisor/store/git.py
(1 hunks)supervisor/utils/__init__.py
(2 hunks)tests/utils/test_remove_folder.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Build armv7 supervisor
- GitHub Check: Build armhf supervisor
- GitHub Check: Build aarch64 supervisor
- GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (15)
supervisor/homeassistant/module.py (1)
50-50
: Good import adjustment
Importingremove_folder_with_excludes
aligns nicely with the new approach to handle excluded paths.tests/utils/test_remove_folder.py (4)
9-9
: Non-async test function looks good
Switching fromasync def
to a regular function is appropriate now thatremove_folder
is synchronous.
17-17
: Synchronous call is consistent with newremove_folder
behavior
Removingawait
usage aligns with the updated non-async function signature.
21-21
: Regular test function updated
Converting the second test function to a synchronous style is correct given the helper function changes.
37-37
: Removeawait
for synchronous folder removal
Again, consistent change following the refactor ofremove_folder
.supervisor/resolution/fixups/store_execute_reset.py (1)
43-43
: Offload folder removal to an executor
This change properly shifts blocking I/O out of the event loop, aligning well with the PR objective.supervisor/utils/__init__.py (2)
84-110
: LGTM! The synchronous implementation aligns with the PR objectives.The refactoring of
remove_folder
from async to sync is a good change as it:
- Moves I/O operations out of the event loop.
- Simplifies error handling by using
subprocess.run
.- Maintains the same functionality with cleaner code.
112-127
: LGTM! Well-structured implementation of file exclusion handling.The new
remove_folder_with_excludes
function:
- Safely handles file exclusions using a temporary directory.
- Properly restores excluded files after removal.
- Uses context manager for automatic cleanup.
supervisor/store/git.py (1)
192-198
: LGTM! Good encapsulation of removal logic.The inner function
_remove_git_dir
effectively:
- Encapsulates directory check and removal logic.
- Maintains clean error handling.
- Aligns with the PR's objective of moving I/O out of the event loop.
supervisor/backups/backup.py (4)
467-481
: LGTM! Well-structured extraction of tarfile creation logic.The new
_open_outer_tarfile
function:
- Improves code organization.
- Enhances error handling with clear messages.
- Aligns with the PR's objective of moving I/O out of the event loop.
508-513
: LGTM! Enhanced error handling for backup file access.The improved error handling:
- Checks file existence before proceeding.
- Provides clear error messages.
- Triggers backup reload when file is not found.
686-724
: LGTM! Improved folder save operation with return value.The changes:
- Add return value to indicate success/failure.
- Only append folder to list if save was successful.
- Better error handling.
783-799
: LGTM! Enhanced mount handling during folder restore.The changes:
- Properly handle bind mounts within the folder.
- Ensure mounts are unmounted before restore.
- Properly remount after restore.
supervisor/backups/manager.py (2)
473-484
: LGTM! Enhanced mount state verification before backup.The changes:
- Verify mount state before starting backup.
- Clear error messages for mount issues.
- Aligns with PR's objective of better I/O handling.
352-360
: LGTM! Improved error handling for mount operations.The changes:
- Check mount state before copy operations.
- Clear error messages for mount issues.
- Proper type handling with Mount objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Have to fix the small impact on import and I'd really prefer we avoid setting data on the Backup
instances in the executor since they are globals through BackupManager
. Just want to be extra cautious to prevent some of the race condition issues we've had before refactoring stuff to the executor.
supervisor/backups/manager.py
Outdated
# Before starting, check if the location is mounted, if a backup mount | ||
if location == DEFAULT and self.sys_mounts.default_backup_mount: | ||
location = self.sys_mounts.default_backup_mount | ||
|
||
if location not in (DEFAULT, LOCATION_CLOUD_BACKUP, None): | ||
location_mount: Mount = location | ||
if not await self.sys_run_in_executor(location_mount.local_where.is_mount): | ||
raise BackupMountDownError( | ||
f"{location_mount.name} is down, cannot back-up to it", | ||
_LOGGER.error, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of moving this logic into _do_backup
we just make some kind of _check_mount
method? Then we can call it from do_backup_partial
, do_backup_full
, import_backup
and copy_to_additional_locations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sounds like a good idea. There is already a is_mounted()
function in the Mount
class (which is overwritten by the NetworkMount
class. So we can reuse code here and make it applicable for all types of mounts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since I am using the is_mounted()
async function from the Mount
class, I can't use it in copy_to_additional_locations
. I could by simply looping through locations first, but this would make executors than necessary. It is a tradeoff between code duplication and slightly more optimal code. I opted for the latter.
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍 |
Completely agree, we should avoid touching state in the class from the executor as much as possible, this is good practice! That said, I think in practice it is not a problem for two reasons: Firstly, we do allow only a single backup to run (via job group limits/Supervisor state freeze etc.). Secondly, in today's Python, the global interpreter lock only allows a single Thread to execute Python code. So as long as the state updated in the executor represents a coherent state, it should not be a problem. But still, this is just good practice, and maybe there is even a case where this could be problematic. I'll change the code accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
supervisor/homeassistant/module.py (1)
462-466
: Improve type annotation for_restore_home_assistant
.The return type annotation
Any
is too generic. Consider using a more specific type to better document the expected metadata structure.-def _restore_home_assistant() -> Any: +def _restore_home_assistant() -> dict[str, Any] | None:supervisor/backups/manager.py (1)
365-371
: Consider using async mount check in_copy_to_additional_locations
.The function uses
is_mount()
which is a synchronous check. Consider using the asyncis_mounted()
method for consistency with_check_location
.- if not location_mount.local_where.is_mount(): + if not await location_mount.is_mounted():
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
supervisor/backups/backup.py
(7 hunks)supervisor/backups/manager.py
(5 hunks)supervisor/homeassistant/module.py
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (9)
supervisor/homeassistant/module.py (3)
480-484
: LGTM! Improved error handling for tar file extraction.The error handling for tar file extraction is well-structured, providing clear error messages and proper logging.
490-510
: LGTM! Robust cleanup and error handling.The code properly handles cleanup of temporary files and provides clear error messages for file operations.
525-536
: LGTM! Proper validation of metadata.The code correctly validates the metadata against the schema and provides clear error messages.
supervisor/backups/backup.py (4)
467-482
: LGTM! Well-structured tarfile handling.The
_open_outer_tarfile
function properly handles file existence checks and provides clear error messages.
690-729
: LGTM! Improved folder backup logic.The
_folder_save
function now returns a boolean to indicate success/failure and properly handles file existence checks.
757-768
: LGTM! Enhanced folder restore validation.The
_folder_restore
function now includes proper validation of the backup file and cleanup of existing content.
787-796
: LGTM! Proper handling of bind mounts.The code correctly handles unmounting and remounting of bind mounts during folder restoration.
supervisor/backups/manager.py (2)
126-138
: LGTM! Well-structured location validation.The
_check_location
method properly validates backup locations and provides clear error messages.
563-564
: LGTM! Consistent location validation.The code properly validates backup locations before performing backup operations.
Also applies to: 613-614
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
WOW! A lot of work have been done. Any idea, when we will see the fix in production? Thanks for the time you investigated into fixing this. 👍 |
Unfortunately, there is more cases where I/O is done in the event loop in the backup create path. I am not sure if it is relevant for your case, but I'd like to remove all of it ideally. |
Proposed change
This PR refactors the Supervisor backup create and restore operations to be asyncio event loop friendly. Specifically, all I/O is moved into executor threads, even "small" I/O like file existence checks. This is especially important since we support network storage, which may have significant delay for operations (e.g. when a NAS spins down HDDs).
There is still some I/O in the event loop, e.g. when loading backups. These will be addressed in follow-up PRs.
Type of change
Additional information
Checklist
ruff format supervisor tests
)If API endpoints or add-on configuration are added/changed:
Summary by CodeRabbit
Refactor
Bug Fixes
Tests