Added logging for DatasetNotFound error #7778

Merged (8 commits), Apr 26, 2024

klakhov (Contributor) commented Apr 16, 2024

Motivation and context

This PR adds logs for further investigation of the DatasetNotFound error, which frequently occurs when importing datasets in various formats.

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • [ ] I have created a changelog fragment
  • [ ] I have updated the documentation accordingly
  • [ ] I have added tests to cover my changes
  • [ ] I have linked related issues (see GitHub docs)
  • [ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • New Features

    • Added detailed error logging for dataset import errors to enhance troubleshooting and user feedback.
  • Enhancements

    • Introduced a new logging configuration for handling dataset import errors, including file location and formatting details.
  • Configuration Changes

    • Added an environment variable CVAT_LOG_IMPORT_ERRORS to control the logging of dataset import errors, set to 'true' by default in the Docker configuration.
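
The summary says CVAT_LOG_IMPORT_ERRORS toggles the extra logging and is set to 'true' in the Docker configuration. The PR text does not show how the settings module parses it, so the following is only a minimal sketch, assuming a Django-style settings file and assuming that values such as "1", "true", "yes" and "on" (case-insensitive) count as enabled. The helper env_flag and the setting name LOG_IMPORT_ERRORS are hypothetical.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean-like environment variable (hypothetical helper)."""
    value = os.getenv(name)
    if value is None:
        # Variable not set at all: fall back to the caller's default.
        return default
    # Accept a few common truthy spellings; anything else counts as disabled.
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical setting name; the real one in cvat/settings/base.py may differ.
LOG_IMPORT_ERRORS = env_flag("CVAT_LOG_IMPORT_ERRORS")
```

With CVAT_LOG_IMPORT_ERRORS: 'true' in a service's environment block, LOG_IMPORT_ERRORS would evaluate to True; leaving the variable unset falls back to the default argument, assumed here to be False.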

cvat/settings/base.py (outdated, resolved)
cvat/requirements/base.in (outdated, resolved)
cvat/apps/engine/utils.py (outdated, resolved)
base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
dir_tree = directory_tree(
    path=dir_path,
    max_depth=5,
Contributor

Please add a class constant or constructor parameter with a default value.

Contributor Author

@azhavoro Do you mean a param in DatasetLogManager for max depth?

for item in os.listdir(path):
    full_path = os.path.join(path, item)
    if os.path.isdir(full_path):
        output += directory_tree(full_path, max_depth, current_depth + 1)
Contributor

Wouldn't using os.walk be a simpler and more efficient way here?

Contributor Author

Alright, I've tried to use walk from datumaro, which is based on os.walk and also supports a depth limit.

nmanovic (Contributor)

@coderabbitai help

coderabbitai bot (Contributor) commented Apr 25, 2024

CodeRabbit Commands

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

nmanovic (Contributor)

@coderabbitai review

coderabbitai bot (Contributor) commented Apr 25, 2024

Walkthrough

The updates across several files in the CVAT application enhance error logging for dataset imports. New functionalities include logging detailed errors when datasets are not found and adjusting settings to enable or disable this feature. Additionally, these changes integrate a new logging class and utilize directory tree structures for better error insight, all controllable via environment settings in Docker.

Changes

  • .../dataset_manager/project.py, .../task.py: Enhanced error logging for dataset import issues, including re-raising exceptions for improved handling.
  • .../engine/log.py, .../engine/utils.py: Introduced DatasetLogManager for logging detailed import errors and added utility for generating directory trees.
  • cvat/settings/base.py: New logging configurations and a setting to toggle import error logging.
  • docker-compose.yml: Added environment variable to enable dataset import error logging.
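
The walkthrough says project.py and task.py now log dataset import failures and then re-raise the exception. A minimal sketch of that catch, log, re-raise pattern follows. DatasetNotFoundError here is a local stand-in for the DatasetNotFound error named in the PR title, and import_dataset, import_task_dataset and the LOG_IMPORT_ERRORS toggle are hypothetical names, not CVAT's actual functions.

```python
import logging

logger = logging.getLogger("dataset_logger")

# Hypothetical toggle; in a real deployment this would come from settings.
LOG_IMPORT_ERRORS = True


class DatasetNotFoundError(Exception):
    """Local stand-in for the DatasetNotFound error named in the PR title."""


def import_dataset(dir_path: str, format_name: str) -> None:
    # Hypothetical importer that always fails, to exercise the error path.
    raise DatasetNotFoundError(f"Failed to detect '{format_name}' dataset in {dir_path}")


def import_task_dataset(task_id: int, dir_path: str, format_name: str) -> None:
    try:
        import_dataset(dir_path, format_name)
    except DatasetNotFoundError as ex:
        if LOG_IMPORT_ERRORS:
            # Same "[entity.id=... format.name=... exc=...]" prefix as DatasetLogManager.
            logger.error("[task.id=%s format.name=%s exc=%s]", task_id, format_name, ex)
        raise  # re-raise unchanged so callers still receive the original exception
```

The bare raise keeps the original exception and traceback, so callers behave exactly as before and the log line is purely additive.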

🐰✨
Oh hooray, oh hooray, the logs will now say,
When datasets go astray or just won't obey.
With trees of directories, so tall and so wide,
In CVAT they'll reside, with nowhere to hide.
Let’s hop to success, with errors no less! 🌟
🐰✨


Recent Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 40df2aa and 3cfa295.
Files selected for processing (1)
  • cvat/apps/engine/log.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cvat/apps/engine/log.py

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Comment on lines 35 to 47
class DatasetLogManager:
    def __init__(self) -> None:
        self.glob = logging.getLogger("dataset_logger")

    def log_import_error(self, entity, entity_id, format_name, base_error, dir_path) -> None:
        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
        dir_tree = directory_tree(
            path=dir_path,
            max_depth=5,
        )
        log_error = f"{base_info} \nDirectory tree:\n{dir_tree}"
        self.glob.error(log_error)

Consider adding a class constant or constructor parameter for the logger name.

This would increase flexibility by allowing different logger configurations without modifying the class code directly. Here's a suggested change:

- self.glob = logging.getLogger("dataset_logger")
+ DEFAULT_LOGGER_NAME = "dataset_logger"
+ def __init__(self, logger_name=DEFAULT_LOGGER_NAME) -> None:
+     self.glob = logging.getLogger(logger_name)

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
class DatasetLogManager:
    DEFAULT_LOGGER_NAME = "dataset_logger"

    def __init__(self, logger_name=DEFAULT_LOGGER_NAME) -> None:
        self.glob = logging.getLogger(logger_name)

    def log_import_error(self, entity, entity_id, format_name, base_error, dir_path) -> None:
        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
        dir_tree = directory_tree(
            path=dir_path,
            max_depth=5,
        )
        log_error = f"{base_info} \nDirectory tree:\n{dir_tree}"
        self.glob.error(log_error)
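
The reviewer asked for a class constant or constructor parameter with a default value, and the author asked whether that was meant for the max depth. One way to cover both the logger name and the directory-tree depth is sketched below. DEFAULT_DIR_TREE_MAX_DEPTH and dir_tree_max_depth are hypothetical names, and directory_tree is a copy of the recursive function under review with sorted entries added for deterministic output; this is not necessarily what was merged.

```python
import logging
import os
from typing import Optional


def directory_tree(path: str, max_depth: Optional[int] = None, current_depth: int = 0) -> str:
    # Same recursive approach as the function under review, with sorted entries.
    if not os.path.exists(path):
        return f"No such file or directory: {path}"
    output = ""
    if current_depth > 0:
        output += "| " * (current_depth - 1) + "|- "
    output += os.path.basename(path) + "/\n"
    if max_depth is None or current_depth < max_depth:
        for item in sorted(os.listdir(path)):
            full_path = os.path.join(path, item)
            if os.path.isdir(full_path):
                output += directory_tree(full_path, max_depth, current_depth + 1)
            else:
                output += "| " * current_depth + "|- " + item + "\n"
    return output


class DatasetLogManager:
    DEFAULT_LOGGER_NAME = "dataset_logger"
    DEFAULT_DIR_TREE_MAX_DEPTH = 5  # hypothetical name for the former max_depth=5 literal

    def __init__(self, logger_name: str = DEFAULT_LOGGER_NAME,
                 dir_tree_max_depth: int = DEFAULT_DIR_TREE_MAX_DEPTH) -> None:
        self.glob = logging.getLogger(logger_name)
        self.dir_tree_max_depth = dir_tree_max_depth

    def log_import_error(self, entity, entity_id, format_name, base_error, dir_path) -> None:
        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
        dir_tree = directory_tree(path=dir_path, max_depth=self.dir_tree_max_depth)
        self.glob.error(f"{base_info} \nDirectory tree:\n{dir_tree}")
```

The class constants keep the current behavior (logger "dataset_logger", depth 5) when no arguments are passed, while tests or other callers can override either value without editing the class.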

Comment on lines 401 to 418
def directory_tree(path, max_depth=None, current_depth=0) -> str:
    if not os.path.exists(path):
        return f"No such file or directory: {path}"

    output = ""
    if current_depth > 0:
        output += "| " * (current_depth - 1) + "|- "

    output += os.path.basename(path) + "/\n"

    if max_depth is None or current_depth < max_depth:
        for item in os.listdir(path):
            full_path = os.path.join(path, item)
            if os.path.isdir(full_path):
                output += directory_tree(full_path, max_depth, current_depth + 1)
            else:
                output += "| " * current_depth + "|- " + item + "\n"
    return output

Optimize the directory_tree function using os.walk.

Using os.walk could simplify and potentially optimize the directory tree generation. Here's a suggested implementation:

- for item in os.listdir(path):
-     full_path = os.path.join(path, item)
-     if os.path.isdir(full_path):
-         output += directory_tree(full_path, max_depth, current_depth + 1)
-     else:
-         output += "|  " * current_depth + "|- " + item + "\n"
+ for root, dirs, files in os.walk(path, topdown=True):
+     for name in dirs:
+         output += "|  " * current_depth + "|- " + name + "/\n"
+     for name in files:
+         output += "|  " * current_depth + "|- " + name + "\n"
+     break  # This ensures only one level is processed at a time, similar to the original functionality

Suggested change
def directory_tree(path, max_depth=None, current_depth=0) -> str:
    if not os.path.exists(path):
        return f"No such file or directory: {path}"
    output = ""
    if current_depth > 0:
        output += "| " * (current_depth - 1) + "|- "
    output += os.path.basename(path) + "/\n"
    if max_depth is None or current_depth < max_depth:
        for root, dirs, files in os.walk(path, topdown=True):
            for name in dirs:
                output += "| " * current_depth + "|- " + name + "/\n"
            for name in files:
                output += "| " * current_depth + "|- " + name + "\n"
            break # This ensures only one level is processed at a time, similar to the original functionality
    return output
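
Note that in this suggestion the break exits os.walk after its first iteration and the function never calls itself, so the output never goes deeper than the entries directly under path, regardless of max_depth. The author ended up using datumaro's walk, which supports a depth limit. For comparison, here is a stdlib-only sketch of the same idea that prunes dirs in place, which os.walk respects when topdown=True. It is an illustration, not datumaro's implementation, and unlike the recursive original it lists each directory's files before its subdirectories.

```python
import os
from typing import Optional


def directory_tree(path: str, max_depth: Optional[int] = None) -> str:
    """Render path as an indented tree, descending at most max_depth levels."""
    if not os.path.exists(path):
        return f"No such file or directory: {path}"
    output = ""
    for root, dirs, files in os.walk(path, topdown=True):
        rel = os.path.relpath(root, path)
        # Depth of the current directory relative to path (path itself is 0).
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        dir_name = os.path.basename(os.path.normpath(root))
        if depth == 0:
            output += dir_name + "/\n"
        else:
            output += "| " * (depth - 1) + "|- " + dir_name + "/\n"
        if max_depth is not None and depth >= max_depth:
            # Emptying dirs in place stops os.walk from descending further;
            # as in the original, a directory at max_depth is shown but not expanded.
            dirs[:] = []
            continue
        dirs.sort()  # in-place, so subdirectories are visited in a stable order
        for file_name in sorted(files):
            output += "| " * depth + "|- " + file_name + "\n"
    return output
```

For a folder upload/ containing annotations.xml and images/train/0.jpg, directory_tree(upload_path, max_depth=1) lists upload/, |- annotations.xml and |- images/ and stops there, matching what the recursive version prints at the same depth.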

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Out of diff range and nitpick comments (1)
cvat/settings/base.py (1)

471-478: The new logging configuration for dataset handling looks good. Consider prefixing the handler name with CVAT_ to maintain consistency with other CVAT settings.
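
The nitpick refers to a new logging configuration for dataset handling in cvat/settings/base.py (lines 471-478), and the summary mentions that it includes a file location and formatting details. The actual entries are not shown here, so the following is a hypothetical sketch of a logging.config.dictConfig fragment (the format Django's LOGGING setting is passed to by default) that sends dataset_logger output to its own file. The handler name, file name and format string are all assumptions.

```python
import logging
import logging.config
import os


def dataset_logging_config(log_dir: str) -> dict:
    """Hypothetical dictConfig fragment that routes "dataset_logger" to its own file."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            # Assumed format: timestamp, level, then the message built by DatasetLogManager.
            "dataset": {"format": "%(asctime)s %(levelname)s %(message)s"},
        },
        "handlers": {
            "dataset_file": {
                "class": "logging.FileHandler",
                "filename": os.path.join(log_dir, "dataset_logs.log"),  # assumed file name
                "formatter": "dataset",
                "level": "ERROR",
            },
        },
        "loggers": {
            "dataset_logger": {
                "handlers": ["dataset_file"],
                "level": "ERROR",
                "propagate": False,  # do not also pass these records to the root logger
            },
        },
    }
```

In a Django settings module, entries like these would typically be merged into the existing LOGGING dictionary rather than passed to dictConfig on their own.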

azhavoro merged commit 9d2018f into develop on Apr 26, 2024 (32 checks passed).
bsekachev deleted the kl/dataset-not-found-logging branch on May 6, 2024 06:18.