
Add option to train AI using Tasks and their bounding boxes #8310

Merged (16 commits, Jan 16, 2025)

Conversation

@MichaelBuessemeyer (Contributor) commented Jan 8, 2025

This PR changes the AI training input under Administration -> AI Models to support task IDs. Additionally, the task bounding boxes of those tasks are now checked before the training job is submitted. The corresponding worker PR ensures that the worker also uses the task bounding boxes during training.

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Enable jobs.
  • Optional: create a new task type that allows volume annotation and not only skeleton annotation (as the sample task type does).
  • Create two new tasks with a task bounding box each. Ensure that each task bounding box is mag-aligned in the mag you want to choose for training; choosing mag 1 is easiest. The task bounding boxes should have a size of 64 in each dimension.
  • Do some volume annotation (within the task bbox) for these two tasks and then finish each of them.
  • Create two more annotations. For each, create at least one user bounding box of size 64 (mag-aligned) and do some volume annotation in it.
  • Go to the Administration -> AI Models view and open the task/annotation ID input to start a new AI training.
  • Enter the two task IDs and the two annotation IDs. No error should occur (even though the frontend first tries to resolve the annotation IDs as task IDs, which results in an error sent by the backend).
  • Then proceed to the modal to configure the training. Choose the volume annotation layers for the annotations (not the tasks) and choose the mag you decided on. Then submit.
  • Start the local worker.
  • The training should start successfully.
  • Then try the same again, this time with non-mag-aligned bounding boxes. The modal should show a warning that the bounding boxes are not mag-aligned (see the sketch below for the alignment rule).
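
For reference, a minimal sketch of the mag-alignment rule the steps above rely on (TypeScript; the Vector3 type and the isMagAligned helper are made up for illustration, not the actual frontend code):

// Hypothetical helper illustrating mag alignment.
type Vector3 = [number, number, number];

function isMagAligned(min: Vector3, max: Vector3, mag: Vector3): boolean {
  // A bounding box (given in mag-1 coordinates) is aligned with a magnification
  // if both its min and max corners are multiples of the mag in every dimension.
  return (
    min.every((value, dim) => value % mag[dim] === 0) &&
    max.every((value, dim) => value % mag[dim] === 0)
  );
}

// A 64³ box at (3584, 3584, 1024) is aligned in mag 2-2-1:
console.log(isMagAligned([3584, 3584, 1024], [3648, 3648, 1088], [2, 2, 1])); // true
// Shifting it by one voxel in x breaks the alignment:
console.log(isMagAligned([3585, 3584, 1024], [3649, 3648, 1088], [2, 2, 1])); // false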

Issues:



coderabbitai bot (Contributor) commented Jan 8, 2025

📝 Walkthrough

This pull request introduces modifications to the AI model training functionality, focusing on enhancing the ability to use task-based annotations for training. The changes primarily involve updating type definitions from AnnotationInfoForAIJob to AnnotationInfoForAITrainingJob across multiple frontend files. The modifications enable specifying task IDs as training data sources and incorporate task bounding boxes into the training process, addressing key requirements for more flexible AI model training.

Changes

  • frontend/javascripts/admin/voxelytics/ai_model_list_view.tsx: Updated type import and annotation type from AnnotationInfoForAIJob to AnnotationInfoForAITrainingJob.
  • frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx: Comprehensive updates: renamed type AnnotationInfoForAIJob to AnnotationInfoForAITrainingJob, added getAnnotationsForTask import, modified AnnotationsCsvInput to handle task URLs and annotation IDs, and updated function signatures to use the new type.
  • CHANGELOG.unreleased.md: Added entry for the new feature supporting task IDs and bounding boxes for AI training jobs, renamed "resolution" to "magnification", and documented various bug fixes and removals.
  • app/controllers/AiModelController.scala: Updated error message for model name conflict in the runTraining method.
  • conf/messages: Added new error message for AI model name conflict.
  • frontend/javascripts/admin/tasktype/task_type_create_view.tsx: Minor formatting change to checkbox label.
  • frontend/javascripts/admin/api/tasks.ts: Updated the getAnnotationsForTask function to accept an optional options parameter for request customization.

Assessment against linked issues

Objectives:
  • Specify task IDs for training
  • Include task bounding boxes
  • Support long-running jobs

Possibly related PRs

Suggested labels

bug, backend

Poem

🐰 A rabbit's tale of code so bright,

Types transform with algorithmic might

From job to training, annotations dance

Bounding boxes leap with a new stance

AI models learn, the future's in sight! 🤖


📜 Recent review details


📥 Commits

Reviewing files that changed from the base of the PR and between d53eb2a and aafd50b.

📒 Files selected for processing (1)
  • CHANGELOG.unreleased.md (1 hunks)
🔇 Additional comments (1)
CHANGELOG.unreleased.md (1)

30-30: LGTM! The changelog entry is clear and accurate.

The entry effectively communicates the changes, follows the changelog format, and has incorporated the previous review feedback.



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (1)

620-620: Consider using an enum or constant for task state.

The string literal "Finished" for task state could be replaced with a type-safe constant or enum.

Consider defining a constant or enum:

const TASK_STATE = {
  FINISHED: "Finished",
} as const;

Then use it in the code:

-          const finishedAnnotations = annotations.filter(({ state }) => state === "Finished");
+          const finishedAnnotations = annotations.filter(({ state }) => state === TASK_STATE.FINISHED);
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5a556be and 293c98f.

📒 Files selected for processing (2)
  • frontend/javascripts/admin/voxelytics/ai_model_list_view.tsx (2 hunks)
  • frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (8 hunks)
🔇 Additional comments (6)
frontend/javascripts/admin/voxelytics/ai_model_list_view.tsx (1)

15-18: LGTM! Type updates are consistent.

The changes correctly update the type imports and usage from AnnotationInfoForAIJob to AnnotationInfoForAITrainingJob.

Also applies to: 112-112

frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (5)

50-50: LGTM! Import added for task support.

The addition of getAnnotationsForTask import aligns with the PR objective of supporting tasks in AI training.


59-64: LGTM! Clear documentation of task bounding box support.

The comments effectively explain the rationale for including task bounding boxes and the relationship with user bounding boxes.


181-182: LGTM! Function parameters updated consistently.

The type annotations in TrainAiModelTab parameters have been correctly updated to use AnnotationInfoForAITrainingJob.


668-677: LGTM! Task bounding box handling.

The code correctly adds the task's bounding box to the list of user bounding boxes with proper ID generation and naming.


690-694: LGTM! Clear warning for unfinished tasks.

The code provides clear feedback when tasks without finished annotations are encountered.

Comment on lines 617 to 628
let isTask = true;
try {
  const annotations = await getAnnotationsForTask(taskOrAnnotationIdOrUrl);
  const finishedAnnotations = annotations.filter(({ state }) => state === "Finished");
  if (annotations.length > 0) {
    annotationIdsForTraining.push(...finishedAnnotations.map(({ id }) => id));
  } else {
    unfinishedTasks.push(taskOrAnnotationIdOrUrl);
  }
} catch (_e) {
  isTask = false;
}

🛠️ Refactor suggestion

Improve error handling in task validation.

The catch block silently swallows all errors to determine if the input is a task. This could hide real API errors or other issues.

Consider this improvement:

-        try {
-          const annotations = await getAnnotationsForTask(taskOrAnnotationIdOrUrl);
-          const finishedAnnotations = annotations.filter(({ state }) => state === "Finished");
-          if (annotations.length > 0) {
-            annotationIdsForTraining.push(...finishedAnnotations.map(({ id }) => id));
-          } else {
-            unfinishedTasks.push(taskOrAnnotationIdOrUrl);
-          }
-        } catch (_e) {
-          isTask = false;
-        }
+        try {
+          const annotations = await getAnnotationsForTask(taskOrAnnotationIdOrUrl);
+          const finishedAnnotations = annotations.filter(({ state }) => state === "Finished");
+          if (annotations.length > 0) {
+            annotationIdsForTraining.push(...finishedAnnotations.map(({ id }) => id));
+          } else {
+            unfinishedTasks.push(taskOrAnnotationIdOrUrl);
+          }
+        } catch (error) {
+          if (error.response?.status === 404) {
+            isTask = false;
+          } else {
+            throw new Error(`Failed to fetch task ${taskOrAnnotationIdOrUrl}: ${error.message}`);
+          }
+        }

@MichaelBuessemeyer (Contributor Author) commented:

@daniel-wer I just noticed that the worker bucket- and mag-aligns the bounding boxes before using them. This is not done by the frontend checking mechanism. But this alignment correction might change the sizes of the bounding boxes, so the frontend assertion that all bounding box extents are multiples of the smallest one might no longer hold. => Therefore I would suggest also doing the alignment in the frontend and then running the bbox size assertion.

Moreover, currently only bboxes with an extent of < 10 in at least one dimension are counted as too small. But according to https://scm.slack.com/archives/C03B6160ATY/p1736344267807119 this limit is too low. Therefore, I would suggest increasing it. But I am not familiar enough with the training code to tell what a better limit would be; I would need your help here 🙏
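
To see why the alignment can change extents, here is a minimal sketch (shrinkAlign is a hypothetical helper, not the worker's actual code):

// Shrink-align one axis of a mag-1 interval: ceil the min, floor the max.
function shrinkAlign(min: number, max: number, mag: number): [number, number] {
  return [Math.ceil(min / mag) * mag, Math.floor(max / mag) * mag];
}

// A box spanning [3585, 3649) in x has extent 64 in mag 1 ...
const [alignedMin, alignedMax] = shrinkAlign(3585, 3649, 2);
// ... but shrink-aligning to mag 2 yields [3586, 3648), i.e. extent 62.
console.log(alignedMax - alignedMin); // 62

So two boxes that both have extent 64 before alignment can end up with different extents afterwards, which is exactly why the size assertion should run on the aligned boxes.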

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 293c98f and 675d369.

📒 Files selected for processing (4)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/AiModelController.scala (1 hunks)
  • conf/messages (1 hunks)
  • frontend/javascripts/admin/tasktype/task_type_create_view.tsx (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • frontend/javascripts/admin/tasktype/task_type_create_view.tsx
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.unreleased.md

[style] ~25-~25: You’ve already used the word ‘also’ once in your sentence, so using it again may be redundant.
Context: ...now also support inputting task ids and also considers their task bounding box. [#8310](https:...

(REDUNDANT_FILLER)

🔇 Additional comments (2)
app/controllers/AiModelController.scala (1)

156-156: LGTM! Error message key change improves clarity.

The change from "model.nameInUse" to "aiModel.nameInUse" makes the error key more specific and consistent with the AI model domain.

conf/messages (1)

376-376: LGTM! Clear and consistent error message.

The new error message follows the established format and provides clear guidance to users when they attempt to use a duplicate AI model name.

@@ -22,6 +22,7 @@ For upgrade instructions, please check the [migration guide](MIGRATIONS.released
 - Renamed "resolution" to "magnification" in more places within the codebase, including local variables. [#8168](https://github.com/scalableminds/webknossos/pull/8168)
 - Layer names are now allowed to contain `$` as special characters. [#8241](https://github.com/scalableminds/webknossos/pull/8241)
 - Datasets can now be renamed and can have duplicate names. [#8075](https://github.com/scalableminds/webknossos/pull/8075)
+- Starting an ai training job using multiple annotations now also support inputting task ids and also considers their task bounding box. [#8310](https://github.com/scalableminds/webknossos/pull/8310)

🛠️ Refactor suggestion

Improve changelog entry clarity and correctness.

The changelog entry contains grammatical issues and redundant wording. Consider this revision:

-- Starting an ai training job using multiple annotations now also support inputting task ids and also considers their task bounding box. [#8310](https://github.com/scalableminds/webknossos/pull/8310)
+- Starting an AI training job using multiple annotations now supports inputting task-IDs and considers their task bounding boxes. [#8310](https://github.com/scalableminds/webknossos/pull/8310)

Changes made:

  • Fixed subject-verb agreement ("supports")
  • Capitalized "AI"
  • Hyphenated "task-IDs"
  • Removed redundant "also"
  • Made "bounding box" plural to match multiple tasks

@MichaelBuessemeyer (Contributor Author) commented:

Moreover, we could also check that enough annotations are provided. When training with multiple annotations, the training code seems to assume that at least 4 annotations are given by the user.

@daniel-wer (Member) commented:

I just noticed that the worker bucket- and mag-aligns the bounding boxes before using them. This is not done by the frontend checking mechanism. But this alignment correction might change the sizes of the bounding boxes, so the frontend assertion that all bounding box extents are multiples of the smallest one might no longer hold. => Therefore I would suggest also doing the alignment in the frontend and then running the bbox size assertion.

Good catch, please do so if it's not too complicated. Maybe a warning should also be shown if a bounding box is not mag-aligned. This would make it somewhat easier for users to understand why bounding box sizes might differ in mags > 1.

Moreover, currently only bboxes with an extent of < 10 in at least one dimension are counted as too small. But according to https://scm.slack.com/archives/C03B6160ATY/p1736344267807119 this limit is too low. Therefore, I would suggest increasing it. But I am not familiar enough with the training code to tell what a better limit would be; I would need your help here 🙏

I think it should be fine to go with a minimum extent of 32 vx in the training mag. Do you agree @valentin-pinkau?

Moreover, we could also check that enough annotations are provided. When training with multiple annotations, the training code seems to assume that at least 4 annotations are given by the user

Mmh, I'm not sure I get that. I don't know of any minimum number of annotations assumed by the training code. Do you mean bounding boxes? If so, the test_fraction config value determines a minimum number of training samples needed so that the validation set contains at least one example (e.g., if the test_fraction is 0.2, at least 5 training samples are needed). Or do you mean something else?
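
The constraint can be captured in one line (minTrainingSamples is a hypothetical name, assuming the validation split receives floor(n * test_fraction) of the n samples):

// At least one validation example requires floor(n * f) >= 1, i.e. n >= ceil(1 / f).
function minTrainingSamples(testFraction: number): number {
  return Math.ceil(1 / testFraction);
}

console.log(minTrainingSamples(0.2)); // 5, matching the example above
console.log(minTrainingSamples(0.3)); // 4, matching the observed minimum of four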

@MichaelBuessemeyer (Contributor Author) commented:

If so, the test_fraction config value determines a minimum number of training samples needed so that the validation set contains at least one example (e.g., if the test_fraction is 0.2, at least 5 training samples are needed). Or do you mean something else?

Yes, I mean exactly that: the training process always died on me before actually starting the training because of a test_fraction of 0.3. I could only resolve this by providing at least 4 annotations, not by adding more bounding boxes to one of the three annotations I had previously tried. At least that's what I observed.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (1)

721-730: Consider extracting the color constant.

The black color [0, 0, 0] for task bounding boxes could be defined as a named constant for better maintainability.

+const TASK_BOUNDING_BOX_COLOR = [0, 0, 0] as const;
+
 if (annotation.task?.boundingBox) {
   const largestId = Math.max(...userBoundingBoxes.map(({ id }) => id));
   userBoundingBoxes.push({
     name: "Task Bounding Box",
     boundingBox: Utils.computeBoundingBoxFromBoundingBoxObject(annotation.task.boundingBox),
-    color: [0, 0, 0],
+    color: TASK_BOUNDING_BOX_COLOR,
     isVisible: true,
     id: largestId + 1,
   });
 }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 0a5a046 and a329a16.

📒 Files selected for processing (2)
  • CHANGELOG.unreleased.md (1 hunks)
  • frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (13 hunks)
🔇 Additional comments (5)
CHANGELOG.unreleased.md (1)

25-25: LGTM! Clear and accurate changelog entry.

The entry effectively communicates the new feature of supporting task IDs and task bounding boxes in AI training jobs.

frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (4)

59-64: Well-documented changes!

The comments clearly explain the handling of task bounding boxes and multiple bounding box sizes.


538-538: Robust bounding box validation implementation!

The code correctly:

  1. Enforces minimum extent of 32 voxels per dimension
  2. Validates magnification alignment
  3. Provides clear warnings for misaligned boxes

Also applies to: 560-598, 617-626


743-747: Clear user feedback for unfinished tasks!

The warning toast effectively communicates when tasks have no finished annotations.


669-681: 🛠️ Refactor suggestion

Improve error handling for task validation.

The current implementation silently catches all errors to determine if the input is a task. This could mask real API errors.

Consider this improvement:

        try {
          const annotations = await getAnnotationsForTask(taskOrAnnotationIdOrUrl, {
            showErrorToast: false,
          });
          const finishedAnnotations = annotations.filter(({ state }) => state === "Finished");
          if (annotations.length > 0) {
            annotationIdsForTraining.push(...finishedAnnotations.map(({ id }) => id));
          } else {
            unfinishedTasks.push(taskOrAnnotationIdOrUrl);
          }
-        } catch (_e) {
+        } catch (error) {
+          if (error.response?.status === 404) {
            isTask = false;
+          } else {
+            throw new Error(`Failed to fetch task ${taskOrAnnotationIdOrUrl}: ${error.message}`);
+          }
         }

Likely invalid or redundant comment.

@daniel-wer (Member) commented:

Yes, I mean exactly that: the training process always died on me before actually starting the training because of a test_fraction of 0.3. I could only resolve this by providing at least 4 annotations, not by adding more bounding boxes to one of the three annotations I had previously tried. At least that's what I observed.

I just retested this by adding more and more bounding boxes to a single annotation. Once I reached 4 bounding boxes, the training no longer failed. Alternatively, enlarging a single bounding box also works. Not sure what happened in your case 🤔

The test_fraction is computed dynamically, but it is at most 0.3 and only goes down once the bounding boxes contain more than ~16M voxels, at which point there should be more training samples than are needed to have at least one example in the validation set. However, what complicates this is that not the number of bounding boxes is critical here, but the number of training samples: larger bounding boxes can contain more than one training sample. I think it would be very complicated to calculate this in the frontend and would therefore not do anything about it for now. If we see users hitting this issue, we can reconsider this decision.

@MichaelBuessemeyer (Contributor Author) commented:

@daniel-wer Can you do the review of this PR and the corresponding worker PR, as you are already familiar with the issue? The worker PR is very small and should be easy to understand.

@daniel-wer (Member) commented:

Yes, I'll happily do the reviews :)

daniel-wer self-requested a review on January 9, 2025, 12:45
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)

27-27: Enhance changelog entry with implementation details.

While the entry correctly describes the main change, consider adding important implementation details discussed in the PR:

  • The validation of task bounding boxes before training
  • The requirement for mag-aligned bounding boxes
  • The minimum number of annotations needed for training
- Starting an AI training job using multiple annotations now supports inputting task-IDs and considers their task bounding boxes. [#8310](https://github.com/scalableminds/webknossos/pull/8310)
+ Starting an AI training job using multiple annotations now supports inputting task-IDs and considers their task bounding boxes. The system validates that bounding boxes are properly mag-aligned and ensures a minimum of four annotations for successful training. [#8310](https://github.com/scalableminds/webknossos/pull/8310)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between a329a16 and b9e0946.

📒 Files selected for processing (5)
  • CHANGELOG.unreleased.md (1 hunks)
  • frontend/javascripts/admin/api/tasks.ts (1 hunks)
  • frontend/javascripts/admin/tasktype/task_type_create_view.tsx (1 hunks)
  • frontend/javascripts/admin/voxelytics/ai_model_list_view.tsx (1 hunks)
  • frontend/javascripts/oxalis/model/helpers/shader_editor.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • frontend/javascripts/oxalis/model/helpers/shader_editor.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • frontend/javascripts/admin/tasktype/task_type_create_view.tsx
  • frontend/javascripts/admin/api/tasks.ts
  • frontend/javascripts/admin/voxelytics/ai_model_list_view.tsx

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (1)

665-692: ⚠️ Potential issue

Improve error handling in task validation.

The catch block silently swallows all errors to determine if the input is a task. This could hide real API errors or other issues.

🧹 Nitpick comments (3)
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (3)

538-543: Consider extracting bounding box types to a separate file.

The type definition for bounding box validation could be moved to a separate types file to improve code organization.

+// In types/bounding_box_types.ts
+export type BoundingBoxWithTrainingInfo = UserBoundingBox & {
+  annotationId: string;
+  trainingMag: Vector3 | undefined;
+};

-function checkBoundingBoxesForErrorsAndWarnings(
-  userBoundingBoxes: (UserBoundingBox & {
-    annotationId: string;
-    trainingMag: Vector3 | undefined;
-  })[],
+function checkBoundingBoxesForErrorsAndWarnings(
+  userBoundingBoxes: BoundingBoxWithTrainingInfo[],

751-755: Consider adding more context to the warning message.

The warning message could be more informative by explaining what action users should take.

-        `The following tasks have no finished annotations: ${unfinishedTasks.join(", ")}`,
+        `The following tasks have no finished annotations and will be skipped: ${unfinishedTasks.join(", ")}. Please ensure these tasks are completed before including them for training.`,

785-785: Update placeholder text to be more descriptive.

The current placeholder text could be more helpful by providing examples.

-          placeholder="taskOrAnnotationIdOrUrl"
+          placeholder="Enter task IDs or annotation URLs (one per line)\nExample:\ntask_123\nannotation_456\nhttps://webknossos.org/annotations/789"
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between b9e0946 and 52a80ef.

📒 Files selected for processing (2)
  • frontend/javascripts/admin/voxelytics/ai_model_list_view.tsx (2 hunks)
  • frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/javascripts/admin/voxelytics/ai_model_list_view.tsx
🔇 Additional comments (4)
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (4)

59-64: LGTM! Clear and comprehensive documentation.

The comments effectively explain the rationale behind handling multiple bounding box sizes, fallback data support, and task bounding boxes.


181-182: LGTM! Type signature updates.

The type rename from AnnotationInfoForAIJob to AnnotationInfoForAITrainingJob better reflects its purpose.


560-606: LGTM! Robust bounding box alignment logic.

The code correctly handles magnification alignment by:

  1. Checking if training magnification is available
  2. Aligning bounding boxes with the magnification
  3. Tracking and warning about any alignment adjustments

729-738: LGTM! Task bounding box integration.

The code correctly:

  1. Checks for task bounding box presence
  2. Generates a unique ID for the new bounding box
  3. Adds it to the user bounding boxes with appropriate defaults

@daniel-wer (Member) left a comment

Well done 👍 Testing also went well 🎉

During testing I noticed that the minimum dimension warning does not seem to take the mag into account. At least I had a bounding box with extent 32³, chose mag 2-2-2, and there was no warning that it was too small (although it is only 16³ vx in mag 2-2-2). I think this was wrong before your PR, so you don't need to fix it if it's more involved.
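
For reference, a sketch of the mag-adjusted check being discussed (extentInMag is a hypothetical helper, assuming one uniform mag factor per axis):

// Convert a mag-1 extent into the training mag before applying the minimum-size rule.
function extentInMag(extentMag1: number, mag: number): number {
  return Math.floor(extentMag1 / mag);
}

console.log(extentInMag(32, 1)); // 32, passes a 32 vx minimum
console.log(extentInMag(32, 2)); // 16, too small in mag 2-2-2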

@@ -153,7 +153,7 @@ class AiModelController @Inject()(
       )
       existingAiModelsCount <- aiModelDAO.countByNameAndOrganization(request.body.name,
                                                                      request.identity._organization)
-      _ <- bool2Fox(existingAiModelsCount == 0) ?~> "model.nameInUse"
+      _ <- bool2Fox(existingAiModelsCount == 0) ?~> "aiModel.nameInUse"
@daniel-wer (Member) commented:

Thanks for fixing this!

// it is necessary to save userBoundingBoxes separately and not load them from volumeTracings entries to support skeleton only annotations.
// Moreover, in case an annotations is a task, its task bounding box should also be used for training.
@daniel-wer (Member) commented:

Suggested change
// Moreover, in case an annotations is a task, its task bounding box should also be used for training.
// Moreover, in case an annotation is a task, its task bounding box should also be used for training.

@@ -55,10 +56,12 @@ const FormItem = Form.Item;
// only the APIAnnotations of the given annotations to train on are loaded from the backend.
// Thus, the code needs to handle both HybridTracing | APIAnnotation where APIAnnotation is missing some information.
// Therefore, volumeTracings with the matching volumeTracingMags are needed to get more details on each volume annotation layer and its magnifications.
// The userBoundingBoxes are needed for checking for equal bounding box sizes. As training on fallback data is supported and an annotation is not required to have VolumeTracings,
// As the userBoundingBoxes should have multiple sizes of the smallest one, a check with a warning should be included.
@daniel-wer (Member) commented:

Suggested change
// As the userBoundingBoxes should have multiple sizes of the smallest one, a check with a warning should be included.
// As the userBoundingBoxes should have extents that are multiples of the smallest extent, a check with a warning should be included.

        `'${name}' of annotation ${annotationId}: ${boundingBox.join(", ")} -> ${alignedBoundingBox.join(", ")}`,
      );
      warnings.push(
        `The following bounding boxes are not aligned with the selected magnification. They will be automatically shrunk to be aligned with the magnification:\n${notMagAlignedBoundingBoxesStrings.join("\n")}`,
@daniel-wer (Member) commented:

I think it would be helpful to include the actual magnification here, for example at the end of the first sentence.

@MichaelBuessemeyer (Contributor Author) commented:

Yeah, the problem is that one can select different mags for different annotations. Therefore, I added the mag to the end of each bbox entry in the warning message.
But the problem is that this makes the text even longer; I don't know if that's a benefit.
Maybe a better tree structure for the error message would help here 🤔, but this would also complicate the code a little.

Here is what I currently have:
[screenshot]

And this is what I mean with an improved hierarchy structure:

The following bounding boxes are not aligned with the selected magnification. They will be automatically shrunk to be aligned with the magnification:
- Annotation 677e8b3f5f0100a80b1dc6ae
  - 'Bounding box 1' (3584, 3584, 1024, 64, 64, 64) will be 1792, 1792, 1024, 32, 32, 64 in mag 2, 2, 1
  - 'Bounding box 2' (3648, 3648, 1088, 64, 64, 64) will be 1824, 1824, 1088, 32, 32, 64 in mag 2, 2, 1
  - 'Bounding box 3' (3712, 3712, 1152, 64, 64, 64) will be 1856, 1856, 1152, 32, 32, 64 in mag 2, 2, 1 

What do you think about this tree-structured text?
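
For illustration, a rough sketch of how such a grouped message could be assembled (the BoxWarning shape and formatWarnings are made up for this example, not the PR's actual code):

type BoxWarning = { annotationId: string; name: string; before: string; after: string; mag: string };

function formatWarnings(warnings: BoxWarning[]): string {
  // Group the misaligned boxes by annotation, then emit one indented line per box.
  const byAnnotation = new Map<string, BoxWarning[]>();
  for (const warning of warnings) {
    byAnnotation.set(warning.annotationId, [...(byAnnotation.get(warning.annotationId) ?? []), warning]);
  }
  const lines = [
    "The following bounding boxes are not aligned with the selected magnification. They will be automatically shrunk to be aligned with the magnification:",
  ];
  for (const [annotationId, group] of byAnnotation) {
    lines.push(`- Annotation ${annotationId}`);
    for (const { name, before, after, mag } of group) {
      lines.push(`  - '${name}' (${before}) will be ${after} in mag ${mag}`);
    }
  }
  return lines.join("\n");
}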

@daniel-wer (Member) commented:

I like that a lot 👍

@MichaelBuessemeyer (Contributor Author) commented Jan 15, 2025:

Ok, I made it work 🎉
[screenshot]

Could you please check the newest changes?

- regarding bbox size calculation use bbox adjusted to mag (not just aligned)
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (1)

678-691: ⚠️ Potential issue

Improve error handling in task validation.

The catch block silently swallows all errors to determine if the input is a task.

Apply this improvement:

-        try {
-          const annotations = await getAnnotationsForTask(taskOrAnnotationIdOrUrl, {
-            showErrorToast: false,
-          });
-          const finishedAnnotations = annotations.filter(({ state }) => state === "Finished");
-          if (annotations.length > 0) {
-            annotationIdsForTraining.push(...finishedAnnotations.map(({ id }) => id));
-          } else {
-            unfinishedTasks.push(taskOrAnnotationIdOrUrl);
-          }
-        } catch (_e) {
-          isTask = false;
-        }
+        try {
+          const annotations = await getAnnotationsForTask(taskOrAnnotationIdOrUrl, {
+            showErrorToast: false,
+          });
+          const finishedAnnotations = annotations.filter(({ state }) => state === "Finished");
+          if (annotations.length > 0) {
+            annotationIdsForTraining.push(...finishedAnnotations.map(({ id }) => id));
+          } else {
+            unfinishedTasks.push(taskOrAnnotationIdOrUrl);
+          }
+        } catch (error) {
+          if (error.response?.status === 404) {
+            isTask = false;
+          } else {
+            throw new Error(`Failed to fetch task ${taskOrAnnotationIdOrUrl}: ${error.message}`);
+          }
+        }
🧹 Nitpick comments (3)
frontend/javascripts/oxalis/model/bucket_data_handling/bounding_box.ts (1)

209-211: Add JSDoc documentation for the new method.

The method combines two operations (alignWithMag and fromMag1ToMag) in a specific order. Adding documentation would help explain:

  • The purpose of this method
  • Why this specific order of operations is important
  • What each strategy does to the bounding box

Add this documentation:

+  /**
+   * Aligns a bounding box with the given magnification and converts it from mag 1 to the target mag.
+   * This is particularly useful for training data where bounding boxes need to be properly aligned
+   * with the magnification to ensure consistent training samples.
+   * 
+   * @param mag - The target magnification vector
+   * @param strategy - How to align the bounding box:
+   *   - "shrink": ceil min and floor max
+   *   - "grow": floor min and ceil max  
+   *   - "ceil": ceil both min and max
+   *   - "floor": floor both min and max
+   * @returns A new aligned bounding box in the target magnification
+   */
   alignFromMag1ToMag(mag: Vector3, strategy: "shrink" | "grow" | "ceil" | "floor"): BoundingBox {
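
As a standalone sketch of the two-step operation this method combines (illustrative only, not the actual class method):

type Vector3 = [number, number, number];

// First align the box with the mag in mag-1 space ("shrink": ceil the min, floor the max),
// then convert the aligned coordinates into the target mag.
function alignFromMag1ToMagSketch(min: Vector3, max: Vector3, mag: Vector3) {
  const alignedMin = min.map((v, i) => Math.ceil(v / mag[i]) * mag[i]) as Vector3;
  const alignedMax = max.map((v, i) => Math.floor(v / mag[i]) * mag[i]) as Vector3;
  return {
    min: alignedMin.map((v, i) => v / mag[i]) as Vector3,
    max: alignedMax.map((v, i) => v / mag[i]) as Vector3,
  };
}

// A 64³ box at an odd offset shrinks before it is converted into mag 2-2-1:
console.log(alignFromMag1ToMagSketch([3585, 3585, 1024], [3649, 3649, 1088], [2, 2, 1]));
// -> { min: [1793, 1793, 1024], max: [1824, 1824, 1088] }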
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (2)

731-740: Consider using a distinct color for task bounding boxes.

Currently using black ([0, 0, 0]) which might not be visually distinct enough.

Consider using a more distinctive color:

-            color: [0, 0, 0],
+            color: [255, 140, 0], // Distinctive orange color for task bounding boxes

787-787: Improve the placeholder text for better clarity.

The current placeholder "taskOrAnnotationIdOrUrl" could be more descriptive.

Consider this improvement:

-          placeholder="taskOrAnnotationIdOrUrl"
+          placeholder="Enter task IDs, annotation IDs, or URLs (one per line)"
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 52a80ef and c37fe7f.

📒 Files selected for processing (3)
  • frontend/javascripts/oxalis/model/bucket_data_handling/bounding_box.ts (1 hunks)
  • frontend/javascripts/oxalis/model/helpers/shader_editor.ts (1 hunks)
  • frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/javascripts/oxalis/model/helpers/shader_editor.ts
🔇 Additional comments (2)
frontend/javascripts/oxalis/view/jobs/train_ai_model.tsx (2)

59-64: Well-documented changes!

The comments clearly explain the requirements and considerations for:

  • Bounding box size validation
  • Task bounding box integration
  • Support for fallback data

Line range hint 538-636: Good implementation of bounding box validation!

The validation logic properly handles:

  • Magnification alignment checks with clear warnings
  • Minimum size requirements (32 voxels) as discussed
  • Clear error messages with box details

@daniel-wer (Member) left a comment

LGTM 👍

During testing I noticed that the minimum dimension warning does not seem to take the mag into account. At least I had a bounding box with extent 32³, chose mag 2-2-2, and there was no warning that it was too small (although it is only 16³ vx in mag 2-2-2). I think this was wrong before your PR, so you don't need to fix it if it's more involved.

Was this fixed by your changes or should it be fixed as a followup?

@MichaelBuessemeyer (Contributor Author) commented:

Was this fixed by your changes or should it be fixed as a followup?

That should also be solved here. The screenshot shows that the mag-adjusted size is used for the minimum bounding box comparison:
[screenshot]

MichaelBuessemeyer enabled auto-merge (squash) on January 16, 2025, 14:12
MichaelBuessemeyer merged commit 761f1b1 into master on January 16, 2025 (3 checks passed)
MichaelBuessemeyer deleted the enable-taskbased-training branch on January 16, 2025, 14:14
@MichaelBuessemeyer (Contributor Author) commented:

And this screenshot shows the minimum-size warning message:
[screenshot]

@daniel-wer (Member) commented:

Awesome, thank you!

Successfully merging this pull request may close these issues:

  • Make AI Model Training work with tasks