Added logging for `DatasetNotFound` error #7778

klakhov · 2024-04-16T10:09:47Z

Motivation and context

This PR adds logging to support further investigation of the `DatasetNotFound` error, which frequently appears when importing datasets in various formats.

How has this been tested?

Checklist

I submit my changes into the develop branch
~~[ ] I have created a changelog fragment~~
~~[ ] I have updated the documentation accordingly~~
~~[ ] I have added tests to cover my changes~~
~~[ ] I have linked related issues (see GitHub docs)~~
[ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

New Features
- Added detailed error logging for dataset import errors to enhance troubleshooting and user feedback.
Enhancements
- Introduced a new logging configuration for handling dataset import errors, including file location and formatting details.
Configuration Changes
- Added an environment variable CVAT_LOG_IMPORT_ERRORS to control the logging of dataset import errors, set to 'true' by default in the Docker configuration.
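
A minimal sketch of how such a toggle could be wired into a Django-style `LOGGING` dictionary. Only the `CVAT_LOG_IMPORT_ERRORS` variable and the `dataset_logger` logger name come from this PR; the `to_bool` helper, the `dataset_file` handler name, the log file name, the rotation limits, and the "disabled when unset" default are assumptions for illustration and are not taken from `cvat/settings/base.py`.

```python
import os


def to_bool(value: str) -> bool:
    """Interpret common truthy strings (hypothetical helper)."""
    return value.strip().lower() in ("1", "true", "yes", "on")


def build_logging_config(log_dir: str, env=os.environ) -> dict:
    """Return a dictConfig-style mapping; the dataset logger is added only when enabled."""
    # Assumption: an unset variable means "disabled".
    enabled = to_bool(env.get("CVAT_LOG_IMPORT_ERRORS", "false"))
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"dataset": {"format": "[%(asctime)s] %(message)s"}},
        "handlers": {},
        "loggers": {},
    }
    if enabled:
        # Hypothetical handler: a rotating file dedicated to import errors.
        config["handlers"]["dataset_file"] = {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": os.path.join(log_dir, "dataset_import_errors.log"),
            "maxBytes": 50 * 1024 * 1024,
            "backupCount": 3,
            "formatter": "dataset",
            "level": "ERROR",
        }
        config["loggers"]["dataset_logger"] = {
            "handlers": ["dataset_file"],
            "level": "ERROR",
            "propagate": False,
        }
    return config
```

Keeping the handler out of the configuration entirely when the flag is off means no log file is created on deployments that do not opt in.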

cvat/settings/base.py

cvat/requirements/base.in

cvat/apps/engine/utils.py

azhavoro · 2024-04-23T13:02:59Z

cvat/apps/engine/log.py

+        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
+        dir_tree = directory_tree(
+            path=dir_path,
+            max_depth=5,


Please add a class constant or constructor parameter with default value

@azhavoro
Do you mean a param in DatasetLogManager for max depth?

azhavoro · 2024-04-24T12:03:19Z

cvat/apps/engine/utils.py

+        for item in os.listdir(path):
+            full_path = os.path.join(path, item)
+            if os.path.isdir(full_path):
+                output += directory_tree(full_path, max_depth, current_depth + 1)


Wouldn't using `os.walk` be a simpler and more efficient approach here?

Alright, I've tried using `walk` from datumaro, which is based on `os.walk` and also supports a depth limit.
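
For reference, a depth-limited tree can also be built directly on the standard library's `os.walk`. This is a sketch under assumptions, not datumaro's `walk` and not the code that was merged: it prunes traversal by clearing `dirnames` in place (which `os.walk` honours when `topdown=True`) and keeps the `|- ` prefix style from the original helper. Entries are sorted here for stable output, which the original did not do.

```python
import os
from typing import Optional


def directory_tree(path: str, max_depth: Optional[int] = None) -> str:
    """Render a directory as an indented tree, descending at most max_depth levels."""
    if not os.path.exists(path):
        return f"No such file or directory: {path}"

    root = os.path.abspath(path)
    lines = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        dirnames.sort()  # in-place sort fixes the order os.walk visits subdirectories

        prefix = ("|  " * (depth - 1) + "|- ") if depth > 0 else ""
        lines.append(prefix + os.path.basename(dirpath) + "/")

        if max_depth is not None and depth >= max_depth:
            # Show the directory name but do not list or descend into its contents.
            dirnames.clear()
            continue

        for name in sorted(filenames):
            lines.append("|  " * depth + "|- " + name)
    return "\n".join(lines) + "\n"
```

Unlike the recursive version, files of a directory are listed before its subdirectories, because `os.walk` yields each directory's files together with the directory itself.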

nmanovic · 2024-04-25T14:30:28Z

@coderabbitai help

coderabbitai · 2024-04-25T14:30:46Z

CodeRabbit Commands

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve to resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

nmanovic · 2024-04-25T14:31:01Z

@coderabbitai review

coderabbitai · 2024-04-25T14:31:21Z

Walkthrough

The updates across several files in the CVAT application enhance error logging for dataset imports. New functionalities include logging detailed errors when datasets are not found and adjusting settings to enable or disable this feature. Additionally, these changes integrate a new logging class and utilize directory tree structures for better error insight, all controllable via environment settings in Docker.

Changes

| Files | Change Summary |
|---|---|
| `.../dataset_manager/project.py`, `.../task.py` | Enhanced error logging for dataset import issues, including re-raising exceptions for improved handling. |
| `.../engine/log.py`, `.../engine/utils.py` | Introduced `DatasetLogManager` for logging detailed import errors and added utility for generating directory trees. |
| `cvat/settings/base.py` | New logging configurations and a setting to toggle import error logging. |
| `docker-compose.yml` | Added environment variable to enable dataset import error logging. |
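
The log-then-re-raise pattern summarized above could look roughly like the following. This is an illustrative sketch only: `DatasetNotFoundError` is a local stand-in for the importer's exception, `import_with_logging` and its `log_enabled` flag are hypothetical names, and the directory listing is a simplified placeholder for the PR's `directory_tree` helper. The `DatasetLogManager` name, the `dataset_logger` logger, and the message layout follow the code quoted in this PR.

```python
import logging
import os


class DatasetNotFoundError(Exception):
    """Local stand-in for the importer's 'dataset not found' exception."""


class DatasetLogManager:
    DEFAULT_LOGGER_NAME = "dataset_logger"

    def __init__(self, logger_name: str = DEFAULT_LOGGER_NAME) -> None:
        self.glob = logging.getLogger(logger_name)

    def log_import_error(self, entity, entity_id, format_name, base_error, dir_path) -> None:
        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
        # Placeholder for directory_tree(): top-level entries only.
        entries = sorted(os.listdir(dir_path)) if os.path.isdir(dir_path) else []
        self.glob.error("%s\nDirectory contents:\n%s", base_info, "\n".join(entries))


def import_with_logging(importer, entity, entity_id, format_name, dir_path, log_enabled=True):
    """Run importer(dir_path); on DatasetNotFoundError, log context and re-raise."""
    try:
        return importer(dir_path)
    except DatasetNotFoundError as ex:
        if log_enabled:
            DatasetLogManager().log_import_error(entity, entity_id, format_name, ex, dir_path)
        raise  # the caller still sees the original exception and traceback
```

Re-raising with a bare `raise` keeps the existing error handling unchanged, so the logging is purely additive.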

🐰✨
Oh hooray, oh hooray, the logs will now say,
When datasets go astray or just won't obey.
With trees of directories, so tall and so wide,
In CVAT they'll reside, with nowhere to hide.
Let’s hop to success, with errors no less! 🌟
🐰✨

Recent Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 40df2aa and 3cfa295.

Files selected for processing (1)

cvat/apps/engine/log.py (2 hunks)

Files skipped from review as they are similar to previous changes (1)

cvat/apps/engine/log.py


coderabbitai

Actionable comments posted: 2

coderabbitai · 2024-04-25T14:34:54Z

cvat/apps/engine/log.py

+class DatasetLogManager:
+    def __init__(self) -> None:
+        self.glob = logging.getLogger("dataset_logger")
+
+    def log_import_error(self, entity, entity_id, format_name, base_error, dir_path) -> None:
+        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
+        dir_tree = directory_tree(
+            path=dir_path,
+            max_depth=5,
+        )
+        log_error = f"{base_info} \nDirectory tree:\n{dir_tree}"
+        self.glob.error(log_error)
+


Consider adding a class constant or constructor parameter for the logger name.

This would increase flexibility by allowing different logger configurations without modifying the class code directly. Here's a suggested change:

-        self.glob = logging.getLogger("dataset_logger")
+    DEFAULT_LOGGER_NAME = "dataset_logger"
+    def __init__(self, logger_name=DEFAULT_LOGGER_NAME) -> None:
+        self.glob = logging.getLogger(logger_name)

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

-class DatasetLogManager:
-    def __init__(self) -> None:
-        self.glob = logging.getLogger("dataset_logger")
-
-    def log_import_error(self, entity, entity_id, format_name, base_error, dir_path) -> None:
-        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
-        dir_tree = directory_tree(
-            path=dir_path,
-            max_depth=5,
-        )
-        log_error = f"{base_info} \nDirectory tree:\n{dir_tree}"
-        self.glob.error(log_error)
+class DatasetLogManager:
+    DEFAULT_LOGGER_NAME = "dataset_logger"
+    def __init__(self, logger_name=DEFAULT_LOGGER_NAME) -> None:
+        self.glob = logging.getLogger(logger_name)
+
+    def log_import_error(self, entity, entity_id, format_name, base_error, dir_path) -> None:
+        base_info = f"[{entity}.id={entity_id} format.name={format_name} exc={base_error}]"
+        dir_tree = directory_tree(
+            path=dir_path,
+            max_depth=5,
+        )
+        log_error = f"{base_info} \nDirectory tree:\n{dir_tree}"
+        self.glob.error(log_error)

coderabbitai · 2024-04-25T14:34:54Z

cvat/apps/engine/utils.py

+def directory_tree(path, max_depth=None, current_depth=0) -> str:
+    if not os.path.exists(path):
+        return f"No such file or directory: {path}"
+
+    output = ""
+    if current_depth > 0:
+        output += "|  " * (current_depth - 1) + "|- "
+
+    output += os.path.basename(path) + "/\n"
+
+    if max_depth is None or current_depth < max_depth:
+        for item in os.listdir(path):
+            full_path = os.path.join(path, item)
+            if os.path.isdir(full_path):
+                output += directory_tree(full_path, max_depth, current_depth + 1)
+            else:
+                output += "|  " * current_depth + "|- " + item + "\n"
+    return output


Optimize the directory_tree function using os.walk.

Using os.walk could simplify and potentially optimize the directory tree generation. Here's a suggested implementation:

-        for item in os.listdir(path):
-            full_path = os.path.join(path, item)
-            if os.path.isdir(full_path):
-                output += directory_tree(full_path, max_depth, current_depth + 1)
-            else:
-                output += "|  " * current_depth + "|- " + item + "\n"
+        for root, dirs, files in os.walk(path, topdown=True):
+            for name in dirs:
+                output += "|  " * current_depth + "|- " + name + "/\n"
+            for name in files:
+                output += "|  " * current_depth + "|- " + name + "\n"
+            break  # This ensures only one level is processed at a time, similar to the original functionality

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

-def directory_tree(path, max_depth=None, current_depth=0) -> str:
-    if not os.path.exists(path):
-        return f"No such file or directory: {path}"
-
-    output = ""
-    if current_depth > 0:
-        output += "|  " * (current_depth - 1) + "|- "
-
-    output += os.path.basename(path) + "/\n"
-
-    if max_depth is None or current_depth < max_depth:
-        for item in os.listdir(path):
-            full_path = os.path.join(path, item)
-            if os.path.isdir(full_path):
-                output += directory_tree(full_path, max_depth, current_depth + 1)
-            else:
-                output += "|  " * current_depth + "|- " + item + "\n"
-    return output
+def directory_tree(path, max_depth=None, current_depth=0) -> str:
+    if not os.path.exists(path):
+        return f"No such file or directory: {path}"
+
+    output = ""
+    if current_depth > 0:
+        output += "|  " * (current_depth - 1) + "|- "
+
+    output += os.path.basename(path) + "/\n"
+
+    if max_depth is None or current_depth < max_depth:
+        for root, dirs, files in os.walk(path, topdown=True):
+            for name in dirs:
+                output += "|  " * current_depth + "|- " + name + "/\n"
+            for name in files:
+                output += "|  " * current_depth + "|- " + name + "\n"
+            break  # This ensures only one level is processed at a time, similar to the original functionality
+    return output

coderabbitai

Actionable comments posted: 0

Out of diff range and nitpick comments (1)

cvat/settings/base.py (1)

471-478: The new logging configuration for dataset handling looks good. Consider prefixing the handler name with CVAT_ to maintain consistency with other CVAT settings.

added logging

a1725a4

klakhov added the server label Apr 16, 2024

klakhov requested a review from azhavoro April 16, 2024 10:09

klakhov requested review from zhiltsov-max and Marishka17 as code owners April 16, 2024 10:09

klakhov removed request for zhiltsov-max and Marishka17 April 16, 2024 10:09

fixed linter errors

007fc09

azhavoro reviewed Apr 17, 2024

cvat/settings/base.py (outdated, resolved)

cvat/requirements/base.in (outdated, resolved)

klakhov added 4 commits April 18, 2024 12:01

revomoved dep, added own func

dfb5403

added env varibale

733bf38

fixed logging in container

60b15cd

linter fix

d211319

azhavoro reviewed Apr 24, 2024

coderabbitai bot reviewed Apr 25, 2024

klakhov added 2 commits April 25, 2024 20:51

applied comments

40df2aa

added class constructor parameter

3cfa295

coderabbitai bot reviewed Apr 25, 2024

azhavoro approved these changes Apr 26, 2024

azhavoro merged commit 9d2018f into develop Apr 26, 2024
32 checks passed

bsekachev deleted the kl/dataset-not-found-logging branch May 6, 2024 06:18
