Use Kinetics instead of Kinetics400 in references (#5787) #5952

bjuncek · 2022-05-05T11:02:00Z

references/video_classification/train.py

pmeier · 2022-05-05T12:39:05Z

references/video_classification/presets.py

@@ -31,7 +30,6 @@ class VideoClassificationPresetEval:
    def __init__(self, *, crop_size, resize_size, mean=(0.43216, 0.394666, 0.37645), std=(0.22803, 0.22145, 0.216989)):
        self.transforms = transforms.Compose(
            [
-                ConvertBHWCtoBCHW(),


Why can we simply remove this? It seems something is off here. The old Kinetics400 states

vision/torchvision/datasets/kinetics.py

Line 292 in 970ba35

- video (Tensor[T, H, W, C]): the `T` video frames

whereas the new Kinetics states

vision/torchvision/datasets/kinetics.py

Line 69 in 970ba35

- video (Tensor[T, C, H, W]): the `T` video frames in torch.uint8 tensor

Was this intentional or did we botch this during the introduction of the new class? In any way, we need to adapt our deprecation warnings accordingly:

vision/torchvision/datasets/kinetics.py

Lines 252 to 254 in 970ba35

.. warning::

This class was deprecated in ``0.12`` and will be removed in ``0.14``. Please use

``Kinetics(..., num_classes='400')`` instead.

vision/torchvision/datasets/kinetics.py

Lines 308 to 311 in 970ba35

warnings.warn(

"The Kinetics400 class is deprecated since 0.12 and will be removed in 0.14."

"Please use Kinetics(..., num_classes='400') instead."

)

This was intentional -- the first iteration was convenient because that's the raw output from FFMPEG/pyav, but tv ops like transforms and conv operators work on Tensor[T, C, H, W] (as you've seen you'd have to have a transform to deal with this). The change makes sense, and I'll update the deprication warning, how does that sound?

pmeier · 2022-05-05T12:40:50Z

references/video_classification/train.py

-        dataset = torchvision.datasets.Kinetics400(
-            traindir,
+        dataset = torchvision.datasets.Kinetics(
+            args.data_path,


I'm guessing this is done, because we can now select the split through the dataset class rather than using a different root directory, correct? If yes, can we remove the traindir and valdir variables above and from the CLI?

Yup; and done

@pmeier @bjuncek I checked on the servers, I believe there is a reason why these parameter are there. The datasets contain the training/val data at different preprocessed sizes. Example:

/datasets01/kinetics/070618/400$ ls README list list_cvt scripts test train train_avi-160p train_avi-288p val val_avi-160p val_avi-288p

Can we please restore this?

As discussed with @datumbox :

the different pre-processed size don't seem to be part of the original downloadable dataset. Looks more like an internal pre-processing that was made by someone.

IF someone was relying on this originally, they should have gotten deprecation warning telling them to use Kinetics, and they should have noticed that this wasn't supported anymore, which they should have told us.

This never happened, so we can assume this support for the different sizes isn't actually needed. If it is, we'll know soon enough.

So we're OK to remove that.

pmeier · 2022-05-05T12:42:58Z

torchvision/io/video.py

-    else:
-        warnings.warn(
-            "The pts_unit 'pts' gives wrong results and will be removed in a "
-            + "follow-up version. Please use pts_unit 'sec'."
-        )


Instead of this warning, which seems to do the right thing, can't we change the reference script to not use "pts"?

Not really; or rather, it used to. The issue is that the backend (both pyav and VideoReader) consume data in pts which is unfortunate as they are not standardised per stream (e.g. audio and video have different timebases so pts=24 means different things). In a PR some time ago, we've tried to fix this by introducing sec as a default unit that would then be converted to appropriate stream. This caused an imprecision issue (bc thigs would be incorrectly rounded) which broke all datasets bc VideoClips creation requires fairly precise pts and such rounding erros cause major issues. We've initially fixed these by using pts as a default in VideoClips (which raises this warrning). This would be fine (i.e. I'd keep the warning and try to re-work the video clips) if all else was equal, but the PR mentioned above was reverted for internal issues, so this warrning is not really relevant anymore.

Having said that, I have a major PR with a VideoReading re-work in plan that will make this redundant in a way.

This sounds reasonable, but I have to little background to "sign this off". Who else from the team would be able to judge this?

@prabhat00155 most likely, as he was the one who originally implemented and then reverted the sync PR.
But we can also keep this open as I'm bound to finish the rework by the end of next week

As discussed with @datumbox

We'll put this warning back. There should only be 2 reasons to remove this warning:

fix the actual bug

completely forbid "pts".

datumbox · 2022-05-20T09:34:33Z

@bjuncek @pmeier We need to speed up the work on this as we are about to cut the branch for the release. If more work is needed, I would just address the deprecation warning on a different PR.

Concerning the removal of ConvertBHWCtoBCHW, we need to be extra careful with this because it affects how people are doing inference and use their models. See:

vision/torchvision/transforms/_presets.py

Lines 96 to 102 in 769ae13

    
           def forward(self, vid: Tensor) -> Tensor: 
        
               need_squeeze = False 
        
               if vid.ndim < 5: 
        
                   vid = vid.unsqueeze(dim=0) 
        
                   need_squeeze = True 
        
               vid = vid.permute(0, 1, 4, 2, 3)  # (N, T, H, W, C) => (N, T, C, H, W)

bjuncek · 2022-05-20T10:01:16Z

Hi @datumbox ,
So what is your proposal for this?

In summary:

Deprication warning can be straight-up removed as it's pointless at the moment. The goal is either to make it less pointless (by re-writing a solid chunk of videoIO), or to have a status quo where users can use whatever they like.
Addressing deprecation of Kinetics, by extension means removing of the BHWC to BCHW transform. I feel as if adding deprecation warning should be plenty, and we can liaison with whomever is in charge of transforms (Victor?) to figure out the best way to handle this on the preset level. I don't understand enough about _presets to have an opinion on it.

datumbox · 2022-05-20T10:23:20Z

@bjuncek I'm OK with removing the deprecation warning until we have a proper video module to recommend. See #6056

Concerning the presets, the problem is that the removal of the permutation affects how our users use the existing models to perform inference. This is a big problem and we need to figure it out prior to the release. Else we will be looking at a BC breaking change on the near future. Effectively we need to clarify what's the order of dimensions that the preprocessing (preset) transforms accept and stick to it.

cc @pmeier @vfdev-5 @NicolasHug

bjuncek · 2022-05-20T10:40:21Z

Ok, that makes sense.
So in the short term, should I just add the parameter for the output format to Kinetics class, and set it to legacy by default and add a warning that it's about to be changed?

datumbox · 2022-05-20T10:50:49Z

My current understanding (could be wrong) is that the permutation should not be removed because read_video() returns the output as (T, H, W, C) so the permutation is necessary and should remain in place. Correct me if I'm wrong, but if read_video uses this format then this is no longer about changing the order inside the Kinetics but rather changing the format everywhere in the library. That's a very big discussion to be had so close to the release with major BC-breaking implications.

For more info on using read_video with the new presets see the example at the docs.

vfdev-5 · 2022-05-20T10:55:51Z

Geometric transforms, like Resize, should assume the input data of shape [..., H, W] and for Normalize it should be (..., C, H, W). So we need to adapt input data format before feeding it into transformations and finally readapt it for the model's expected format.

bjuncek · 2022-05-20T11:06:09Z

So, you are right, the read_video outputs (t, h, w, c).
In new Kinetics class, we introduce permutation to (t, c, h, w) in order to save the users of going to the trouble of permuting stuff every time they get a sample (see here), which is correctly documented in the Kinetics class and has nothing to do with read_video.

If we want to prevent BC, we can simply comment out the lines referenced above, and return the transform to the reference.

pmeier · 2022-05-20T11:13:04Z

@datumbox The "BC break", i.e. the changed dimension order, happened in #3680 and more specifically after #3680 (comment) almost a year ago.

datumbox · 2022-05-20T11:28:53Z

Thanks for the info.

I understand the reasons why we wanted to switch to (t, c, h, w) but we could have done better work when we introduced the BC breaking change by ensuring that all usages/references/documentation/examples in the codebase are updated.

To fix things prior the release, I think we need to:

Update the references to adopt the new Kinetics (this PR)
Remove the (N, T, H, W, C) => (N, T, C, H, W) conversion from presets and update all the code examples to do permute the output of read_video immediately after reading it (PR Remove (N, T, H, W, C) => (N, T, C, H, W) from presets #6058)
Verify the accuracy of the pre-trained models is the expected one using the reference scripts + the presets

Please let me know if this makes sense to you.

datumbox · 2022-05-20T11:31:44Z

references/video_classification/train.py

-    parser.add_argument("--train-dir", default="train_avi-480p", type=str, help="name of train dir")
-    parser.add_argument("--val-dir", default="val_avi-480p", type=str, help="name of val dir")
+    parser.add_argument(
+        "--kinetics-version", default="400", type=str, choices=["400", "600"], help="Select kinetics version"


You need to update the commands shown at README.md.

torchvision/datasets/kinetics.py

Co-authored-by: Nicolas Hug <[email protected]>

NicolasHug

Thanks all. I've had a sync with @datumbox and I'll proceed to a few changes (listed below), and merge the PR.

I'll also follow up with a PR that adds a new format=... parameter to relevant datasets and video readers.

NicolasHug · 2022-05-20T15:33:59Z

references/video_classification/train.py

-        dataset = torchvision.datasets.Kinetics400(
-            traindir,
+        dataset = torchvision.datasets.Kinetics(
+            args.data_path,


As discussed with @datumbox :

the different pre-processed size don't seem to be part of the original downloadable dataset. Looks more like an internal pre-processing that was made by someone.

IF someone was relying on this originally, they should have gotten deprecation warning telling them to use Kinetics, and they should have noticed that this wasn't supported anymore, which they should have told us.

This never happened, so we can assume this support for the different sizes isn't actually needed. If it is, we'll know soon enough.

So we're OK to remove that.

NicolasHug · 2022-05-20T15:35:19Z

torchvision/io/video.py

-    else:
-        warnings.warn(
-            "The pts_unit 'pts' gives wrong results and will be removed in a "
-            + "follow-up version. Please use pts_unit 'sec'."
-        )


As discussed with @datumbox

We'll put this warning back. There should only be 2 reasons to remove this warning:

fix the actual bug

completely forbid "pts".

…5952) Summary: * Dataset creation now supports "new" version of Kinetics dataset * remove unnecessary warning for now * provide kinetics option * new reading somehow doesn't need BHWC to BCHW transform * Addressing minor comments * Adding kinetics deprication warning for the old Kinetics400 class * lint error * Update torchvision/datasets/kinetics.py * Updating README * Remove BHWC to BCHW * Put warning back * formatting Reviewed By: NicolasHug Differential Revision: D36760942 fbshipit-source-id: b7c642b263f981247593511d3a8f667a9f1b963b Co-authored-by: Nicolas Hug <[email protected]> Co-authored-by: Bruno Korbar <[email protected]> Co-authored-by: Vasilis Vryniotis <[email protected]> Co-authored-by: Nicolas Hug <[email protected]>

Bruno Korbar added 4 commits May 5, 2022 10:44

Dataset creation now supports "new" version of Kinetics dataset

a98c056

remove unnecessary warning for now

c5c36d8

provide kinetics option

2ee355a

new reading somehow doesn't need BHWC to BCHW transform

5f90f09

facebook-github-bot added the cla signed label May 5, 2022

pmeier reviewed May 5, 2022

View reviewed changes

Bruno Korbar and others added 4 commits May 12, 2022 12:14

Addressing minor comments

5559593

Adding kinetics deprication warning for the old Kinetics400 class

01b0d9b

lint error

a84b807

Merge branch 'main' into bkorbar/5787kineticstrain

6d2f7f8

Merge branch 'main' into bkorbar/5787kineticstrain

9d76b18

datumbox reviewed May 20, 2022

View reviewed changes

This was referenced May 20, 2022

Remove deprecation warning for pts on read_video #6056

Closed

Remove (N, T, H, W, C) => (N, T, C, H, W) from presets #6058

Merged

NicolasHug reviewed May 20, 2022

View reviewed changes

torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved

bjuncek and others added 4 commits May 20, 2022 14:57

Update torchvision/datasets/kinetics.py

2d2841f

Co-authored-by: Nicolas Hug <[email protected]>

Updating README

e8401c1

Merge branch 'main' into bkorbar/5787kineticstrain

11f365d

Remove BHWC to BCHW

8c05f0f

NicolasHug reviewed May 20, 2022

View reviewed changes

Put warning back

ab29225

NicolasHug approved these changes May 20, 2022

View reviewed changes

formatting

af4e30c

NicolasHug changed the title ~~Addressing deprecation wranings in Kinetics (#5787)~~ Use Kinetics instead of Kinetics400 in references (#5787) May 20, 2022

NicolasHug added module: datasets module: reference scripts code quality labels May 20, 2022

NicolasHug merged commit b969cca into pytorch:main May 20, 2022

NicolasHug mentioned this pull request May 20, 2022

Add output_format do video datasets and readers #6061

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use Kinetics instead of Kinetics400 in references (#5787) #5952

Use Kinetics instead of Kinetics400 in references (#5787) #5952

bjuncek commented May 5, 2022

pmeier May 5, 2022

bjuncek May 12, 2022

pmeier May 5, 2022

bjuncek May 12, 2022

datumbox May 20, 2022

NicolasHug May 20, 2022

pmeier May 5, 2022

bjuncek May 12, 2022

pmeier May 12, 2022

bjuncek May 13, 2022

NicolasHug May 20, 2022

datumbox commented May 20, 2022

bjuncek commented May 20, 2022

datumbox commented May 20, 2022 •

edited

Loading

bjuncek commented May 20, 2022

datumbox commented May 20, 2022

vfdev-5 commented May 20, 2022

bjuncek commented May 20, 2022 •

edited

Loading

pmeier commented May 20, 2022

datumbox commented May 20, 2022 •

edited

Loading

datumbox May 20, 2022

bjuncek May 20, 2022

NicolasHug left a comment

NicolasHug May 20, 2022

NicolasHug May 20, 2022

	.. warning::
	This class was deprecated in ``0.12`` and will be removed in ``0.14``. Please use
	``Kinetics(..., num_classes='400')`` instead.

	warnings.warn(
	"The Kinetics400 class is deprecated since 0.12 and will be removed in 0.14."
	"Please use Kinetics(..., num_classes='400') instead."
	)

Use Kinetics instead of Kinetics400 in references (#5787) #5952

Use Kinetics instead of Kinetics400 in references (#5787) #5952

Conversation

bjuncek commented May 5, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datumbox commented May 20, 2022

bjuncek commented May 20, 2022

datumbox commented May 20, 2022 • edited Loading

bjuncek commented May 20, 2022

datumbox commented May 20, 2022

vfdev-5 commented May 20, 2022

bjuncek commented May 20, 2022 • edited Loading

pmeier commented May 20, 2022

datumbox commented May 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NicolasHug left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datumbox commented May 20, 2022 •

edited

Loading

bjuncek commented May 20, 2022 •

edited

Loading

datumbox commented May 20, 2022 •

edited

Loading