🚀 Add Multi-GPU Training Support #2435

ashwinvaidya17 · 2024-11-25T16:03:37Z

📝 Description

Testing script

#!/bin/bash

models=("Cfa" "Cflow" "Csflow" "Dfkde" "Dfm" "Draem" "Dsr" "EfficientAd" "Fastflow" "Fre" "Ganomaly" "Padim" "Patchcore" "ReverseDistillation" "Rkde" "Stfpm" "Uflow" "VlmAd" "WinClip" "AiVad")

# Loop through each model and run the anomalib train command
for model in "${models[@]}"; do
    anomalib train --model "$model" --data MVTec --trainer.max_epochs 2 --trainer.devices 2 --trainer.strategy='ddp_find_unused_parameters_true'
done
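
The split into working and non-working models could also be collected automatically from exit codes. The following is a minimal sketch, not the script used for this PR. It assumes a failed run makes `anomalib train` exit with a non-zero status. `train_one` and `sort_models` are hypothetical helper names, not part of anomalib.

```shell
#!/bin/sh
# Sketch only: sort model names into "works" / "fails" by the exit status
# of a short training run. Helper names are hypothetical, not anomalib APIs.

# Runs the training command from the script above for a single model.
train_one() {
    anomalib train --model "$1" --data MVTec --trainer.max_epochs 2 \
        --trainer.devices 2 --trainer.strategy='ddp_find_unused_parameters_true'
}

# sort_models RUNNER MODEL...
# Calls RUNNER once per model and fills the space-separated
# variables `works` and `fails`.
sort_models() {
    runner="$1"
    shift
    works=""
    fails=""
    for model in "$@"; do
        if "$runner" "$model"; then
            works="${works:+$works }$model"
        else
            fails="${fails:+$fails }$model"
        fi
    done
}

# Only launch real training when the anomalib CLI is installed.
if command -v anomalib >/dev/null 2>&1; then
    sort_models train_one Padim Patchcore
    printf 'Works: %s\n' "$works"
    printf 'Not working: %s\n' "$fails"
fi
```

Passing the runner as an argument keeps the sorting logic separate from the training command, so it can be tried with a stub function before spending GPU time.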

Works

CFA
CFlow
CSFlow
Dfkde
Dfm
Dsr
Fastflow
Ganomaly
Padim
Patchcore
ReverseDistillation
Stfpm
Uflow
WinCLIP
EfficientAd
VlmAd
AiVad // visualization stage does not work
Draem
Fre

Not Working

✨ Changes

Select what type of change your PR is:

🐞 Bug fix (non-breaking change which fixes an issue)
🔨 Refactor (non-breaking change which refactors the code base)
🚀 New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📚 Documentation update
🔒 Security update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
📚 I have made the necessary updates to the documentation (if applicable).
🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

Signed-off-by: Ashwin Vaidya <[email protected]>

ashwinvaidya17 · 2024-11-28T15:27:11Z

src/anomalib/data/validators/torch/video.py

@@ -600,17 +600,18 @@ def validate_gt_mask(mask: torch.Tensor | None) -> Mask | None:
        if mask is None:


We need to revisit the docstrings for this method

codecov · 2024-12-10T10:21:06Z

Codecov Report

Attention: Patch coverage is 70.58824% with 10 lines in your changes missing coverage. Please review.

Please upload report for BASE (feature/v2@c73e411). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/anomalib/data/validators/torch/video.py	77.77%	2 Missing ⚠️
src/anomalib/metrics/evaluator.py	77.77%	2 Missing ⚠️
...rc/anomalib/models/video/ai_vad/lightning_model.py	33.33%	2 Missing ⚠️
...malib/models/components/base/memory_bank_module.py	66.66%	1 Missing ⚠️
...models/components/classification/kde_classifier.py	0.00%	1 Missing ⚠️
src/anomalib/models/image/dfm/torch_model.py	0.00%	1 Missing ⚠️
src/anomalib/models/image/dsr/anomaly_generator.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##             feature/v2    #2435   +/-   ##
=============================================
  Coverage              ?   78.38%           
=============================================
  Files                 ?      302           
  Lines                 ?    12940           
  Branches              ?        0           
=============================================
  Hits                  ?    10143           
  Misses                ?     2797           
  Partials              ?        0

samet-akcay

looking good, thanks 🔥

haimat · 2024-12-10T13:18:13Z

Awesome, this is great news, thanks a lot!
When do you plan to release an official update with this change?

samet-akcay · 2024-12-11T09:54:57Z

Our plan is to pre-release before the 20th

haimat · 2025-01-08T09:31:22Z

Hello, we are very much looking forward to multi-GPU training :)
Do you know when you can release this change?

samet-akcay · 2025-01-08T09:37:52Z

Hi, as soon as this CI run passes :)
#2465

We didn't release it on the 20th of December, mainly because we thought the documentation was not sufficient. We worked on the documentation during Christmas.

@blaz-r kindly added a new algorithm to v2 as well, but one of the tests is failing now. As soon as we fix the test, which we hope to do by the end of today, we'll release.

haimat · 2025-01-08T09:40:10Z

> Hi, as soon as this CI run passes :) #2465

Awesome, thanks!

blaz-r · 2025-01-08T12:05:46Z

> Hi, as soon as this CI run passes :) #2465
>
> We didn't release it on the 20th of December, mainly because we thought the documentation was not sufficient. We worked on the documentation during Christmas.
>
> @blaz-r kindly added a new algorithm to v2 as well, but one of the tests is failing now. As soon as we fix the test, which we hope to do by the end of today, we'll release.

I think I found the issue, I'm going to open a PR in a few mins. Apologies about that.

samet-akcay · 2025-01-08T12:31:05Z

> > Hi, as soon as this CI run passes :) #2465
> > We didn't release it on the 20th of December, mainly because we thought the documentation was not sufficient. We worked on the documentation during Christmas.
> > @blaz-r kindly added a new algorithm to v2 as well, but one of the tests is failing now. As soon as we fix the test, which we hope to do by the end of today, we'll release.
>
> I think I found the issue, I'm going to open a PR in a few mins. Apologies about that.

no worries @blaz-r, I think I've fixed it already, let's wait for the test results
76dd186

blaz-r · 2025-01-08T12:34:02Z

Good @samet-akcay, that might fix it, but in case it doesn't, I also opened a PR that adds device= to those same lines in #2490.

samet-akcay · 2025-01-08T13:05:56Z

> Good @samet-akcay, that might fix it, but in case it doesn't, I also opened a PR that adds device= to those same lines in #2490.

Looks like it is passing
https://github.com/openvinotoolkit/anomalib/actions/runs/12670086114/job/35308989508?pr=2465

blaz-r · 2025-01-08T13:21:39Z

Great! 😄

haimat · 2025-01-24T15:17:25Z

@samet-akcay Heyho, just wanted to ask about the current status of multi-GPU training.
Are you still working on it, and is there a planned release date?

samet-akcay · 2025-01-24T16:34:28Z

You could try it with pip install anomalib==2.0.0b2

haimat · 2025-01-24T16:37:45Z

Thanks, that is great news - I will give it a try 👍
Do you have any docs on the breaking API changes?

haimat · 2025-01-27T11:26:15Z

@samet-akcay I tried to get beta 2 up and running; however, I am not sure how to update our workflow.
For example, neither the Folder nor the Engine class accepts the task argument any more.
How else do I specify the task?
Could you provide a short example of what a basic classification training run with a given folder would look like in version 2?

ashwinvaidya17 added 3 commits November 25, 2024 17:01

Initial changes

12a82d3

Signed-off-by: Ashwin Vaidya <[email protected]>

stash

07e3a9a

Signed-off-by: Ashwin Vaidya <[email protected]>

fix video mask

5bd6b69

Signed-off-by: Ashwin Vaidya <[email protected]>

samet-akcay mentioned this pull request Dec 1, 2024

Set devices to 1 if multi-gpu is configured #2256

Closed

9 tasks

ashwinvaidya17 added 2 commits December 6, 2024 14:19

Merge branch 'feature/v2' into ashwin/multi_gpu

5750dad

Merge branch 'feature/v2' into ashwin/multi_gpu

6423a72

ashwinvaidya17 marked this pull request as ready for review December 6, 2024 13:20

ashwinvaidya17 requested review from samet-akcay and djdameln as code owners December 6, 2024 13:20

ashwinvaidya17 marked this pull request as draft December 6, 2024 15:23

fix remaining models

9be6eb9

Signed-off-by: Ashwin Vaidya <[email protected]>

ashwinvaidya17 changed the title ~~[WIP] Multi-GPU fixes~~ Multi-GPU fixes Dec 9, 2024

ashwinvaidya17 marked this pull request as ready for review December 9, 2024 15:12

samet-akcay approved these changes Dec 10, 2024

ashwinvaidya17 merged commit 8bd06a9 into openvinotoolkit:feature/v2 Dec 10, 2024
7 checks passed

samet-akcay changed the title ~~Multi-GPU fixes~~ 🚀 Add Multi-GPU Training Support Dec 10, 2024

samet-akcay mentioned this pull request Dec 10, 2024

📋 [TASK] Implement Multi-GPU Training Support #2258

Closed

11 tasks

ashwinvaidya17 added 2 commits December 10, 2024 11:30

Fix tests

9a64940

Signed-off-by: Ashwin Vaidya <[email protected]>

update docstrings

3fa92b3

Signed-off-by: Ashwin Vaidya <[email protected]>

samet-akcay mentioned this pull request Jan 20, 2025

[Bug]: AnomalyScoreThreshold is incompatible with multi-GPU training #1398

Closed

1 task
