Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Kinetics instead of Kinetics400 in references (#5787) #5952

Merged
merged 15 commits into from
May 20, 2022

Conversation

bjuncek
Copy link
Contributor

@bjuncek bjuncek commented May 5, 2022

See #5787

@@ -31,7 +30,6 @@ class VideoClassificationPresetEval:
def __init__(self, *, crop_size, resize_size, mean=(0.43216, 0.394666, 0.37645), std=(0.22803, 0.22145, 0.216989)):
self.transforms = transforms.Compose(
[
ConvertBHWCtoBCHW(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we simply remove this? It seems something is off here. The old Kinetics400 states

- video (Tensor[T, H, W, C]): the `T` video frames

whereas the new Kinetics states

- video (Tensor[T, C, H, W]): the `T` video frames in torch.uint8 tensor

Was this intentional or did we botch this during the introduction of the new class? In any way, we need to adapt our deprecation warnings accordingly:

.. warning::
This class was deprecated in ``0.12`` and will be removed in ``0.14``. Please use
``Kinetics(..., num_classes='400')`` instead.

warnings.warn(
"The Kinetics400 class is deprecated since 0.12 and will be removed in 0.14."
"Please use Kinetics(..., num_classes='400') instead."
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional -- the first iteration was convenient because that's the raw output from FFMPEG/pyav, but tv ops like transforms and conv operators work on Tensor[T, C, H, W] (as you've seen you'd have to have a transform to deal with this). The change makes sense, and I'll update the deprication warning, how does that sound?

dataset = torchvision.datasets.Kinetics400(
traindir,
dataset = torchvision.datasets.Kinetics(
args.data_path,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is done, because we can now select the split through the dataset class rather than using a different root directory, correct? If yes, can we remove the traindir and valdir variables above and from the CLI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup; and done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmeier @bjuncek I checked on the servers, I believe there is a reason why these parameter are there. The datasets contain the training/val data at different preprocessed sizes. Example:

/datasets01/kinetics/070618/400$ ls 
README  list  list_cvt  scripts  test  train  train_avi-160p  train_avi-288p  val  val_avi-160p  val_avi-288p

Can we please restore this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @datumbox :

the different pre-processed size don't seem to be part of the original downloadable dataset. Looks more like an internal pre-processing that was made by someone.

IF someone was relying on this originally, they should have gotten deprecation warning telling them to use Kinetics, and they should have noticed that this wasn't supported anymore, which they should have told us.

This never happened, so we can assume this support for the different sizes isn't actually needed. If it is, we'll know soon enough.

So we're OK to remove that.

Comment on lines 159 to 163
else:
warnings.warn(
"The pts_unit 'pts' gives wrong results and will be removed in a "
+ "follow-up version. Please use pts_unit 'sec'."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this warning, which seems to do the right thing, can't we change the reference script to not use "pts"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really; or rather, it used to. The issue is that the backend (both pyav and VideoReader) consume data in pts which is unfortunate as they are not standardised per stream (e.g. audio and video have different timebases so pts=24 means different things). In a PR some time ago, we've tried to fix this by introducing sec as a default unit that would then be converted to appropriate stream. This caused an imprecision issue (bc thigs would be incorrectly rounded) which broke all datasets bc VideoClips creation requires fairly precise pts and such rounding erros cause major issues. We've initially fixed these by using pts as a default in VideoClips (which raises this warrning). This would be fine (i.e. I'd keep the warning and try to re-work the video clips) if all else was equal, but the PR mentioned above was reverted for internal issues, so this warrning is not really relevant anymore.

Having said that, I have a major PR with a VideoReading re-work in plan that will make this redundant in a way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable, but I have to little background to "sign this off". Who else from the team would be able to judge this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prabhat00155 most likely, as he was the one who originally implemented and then reverted the sync PR.
But we can also keep this open as I'm bound to finish the rework by the end of next week

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @datumbox

We'll put this warning back. There should only be 2 reasons to remove this warning:

  • fix the actual bug
  • completely forbid "pts".

@datumbox
Copy link
Contributor

@bjuncek @pmeier We need to speed up the work on this as we are about to cut the branch for the release. If more work is needed, I would just address the deprecation warning on a different PR.

Concerning the removal of ConvertBHWCtoBCHW, we need to be extra careful with this because it affects how people are doing inference and use their models. See:

def forward(self, vid: Tensor) -> Tensor:
need_squeeze = False
if vid.ndim < 5:
vid = vid.unsqueeze(dim=0)
need_squeeze = True
vid = vid.permute(0, 1, 4, 2, 3) # (N, T, H, W, C) => (N, T, C, H, W)

@bjuncek
Copy link
Contributor Author

bjuncek commented May 20, 2022

Hi @datumbox ,
So what is your proposal for this?

In summary:

  1. Deprication warning can be straight-up removed as it's pointless at the moment. The goal is either to make it less pointless (by re-writing a solid chunk of videoIO), or to have a status quo where users can use whatever they like.
  2. Addressing deprecation of Kinetics, by extension means removing of the BHWC to BCHW transform. I feel as if adding deprecation warning should be plenty, and we can liaison with whomever is in charge of transforms (Victor?) to figure out the best way to handle this on the preset level. I don't understand enough about _presets to have an opinion on it.

@datumbox
Copy link
Contributor

datumbox commented May 20, 2022

@bjuncek I'm OK with removing the deprecation warning until we have a proper video module to recommend. See #6056

Concerning the presets, the problem is that the removal of the permutation affects how our users use the existing models to perform inference. This is a big problem and we need to figure it out prior to the release. Else we will be looking at a BC breaking change on the near future. Effectively we need to clarify what's the order of dimensions that the preprocessing (preset) transforms accept and stick to it.

cc @pmeier @vfdev-5 @NicolasHug

@bjuncek
Copy link
Contributor Author

bjuncek commented May 20, 2022

Ok, that makes sense.
So in the short term, should I just add the parameter for the output format to Kinetics class, and set it to legacy by default and add a warning that it's about to be changed?

@datumbox
Copy link
Contributor

My current understanding (could be wrong) is that the permutation should not be removed because read_video() returns the output as (T, H, W, C) so the permutation is necessary and should remain in place. Correct me if I'm wrong, but if read_video uses this format then this is no longer about changing the order inside the Kinetics but rather changing the format everywhere in the library. That's a very big discussion to be had so close to the release with major BC-breaking implications.

For more info on using read_video with the new presets see the example at the docs.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented May 20, 2022

Geometric transforms, like Resize, should assume the input data of shape [..., H, W] and for Normalize it should be (..., C, H, W). So we need to adapt input data format before feeding it into transformations and finally readapt it for the model's expected format.

@bjuncek
Copy link
Contributor Author

bjuncek commented May 20, 2022

So, you are right, the read_video outputs (t, h, w, c).
In new Kinetics class, we introduce permutation to (t, c, h, w) in order to save the users of going to the trouble of permuting stuff every time they get a sample (see here), which is correctly documented in the Kinetics class and has nothing to do with read_video.

If we want to prevent BC, we can simply comment out the lines referenced above, and return the transform to the reference.

@pmeier
Copy link
Collaborator

pmeier commented May 20, 2022

@datumbox The "BC break", i.e. the changed dimension order, happened in #3680 and more specifically after #3680 (comment) almost a year ago.

@datumbox
Copy link
Contributor

datumbox commented May 20, 2022

Thanks for the info.

I understand the reasons why we wanted to switch to (t, c, h, w) but we could have done better work when we introduced the BC breaking change by ensuring that all usages/references/documentation/examples in the codebase are updated.

To fix things prior the release, I think we need to:

  1. Update the references to adopt the new Kinetics (this PR)
  2. Remove the (N, T, H, W, C) => (N, T, C, H, W) conversion from presets and update all the code examples to do permute the output of read_video immediately after reading it (PR Remove (N, T, H, W, C) => (N, T, C, H, W) from presets #6058)
  3. Verify the accuracy of the pre-trained models is the expected one using the reference scripts + the presets

Please let me know if this makes sense to you.

parser.add_argument("--train-dir", default="train_avi-480p", type=str, help="name of train dir")
parser.add_argument("--val-dir", default="val_avi-480p", type=str, help="name of val dir")
parser.add_argument(
"--kinetics-version", default="400", type=str, choices=["400", "600"], help="Select kinetics version"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to update the commands shown at README.md.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all. I've had a sync with @datumbox and I'll proceed to a few changes (listed below), and merge the PR.

I'll also follow up with a PR that adds a new format=... parameter to relevant datasets and video readers.

dataset = torchvision.datasets.Kinetics400(
traindir,
dataset = torchvision.datasets.Kinetics(
args.data_path,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @datumbox :

the different pre-processed size don't seem to be part of the original downloadable dataset. Looks more like an internal pre-processing that was made by someone.

IF someone was relying on this originally, they should have gotten deprecation warning telling them to use Kinetics, and they should have noticed that this wasn't supported anymore, which they should have told us.

This never happened, so we can assume this support for the different sizes isn't actually needed. If it is, we'll know soon enough.

So we're OK to remove that.

Comment on lines 159 to 163
else:
warnings.warn(
"The pts_unit 'pts' gives wrong results and will be removed in a "
+ "follow-up version. Please use pts_unit 'sec'."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @datumbox

We'll put this warning back. There should only be 2 reasons to remove this warning:

  • fix the actual bug
  • completely forbid "pts".

@NicolasHug NicolasHug changed the title Addressing deprecation wranings in Kinetics (#5787) Use Kinetics instead of Kinetics400 in references (#5787) May 20, 2022
@NicolasHug NicolasHug merged commit b969cca into pytorch:main May 20, 2022
facebook-github-bot pushed a commit that referenced this pull request Jun 1, 2022
…5952)

Summary:
* Dataset creation now supports "new" version of Kinetics dataset

* remove unnecessary warning for now

* provide kinetics option

* new reading somehow doesn't need BHWC to BCHW transform

* Addressing minor comments

* Adding kinetics deprication warning for the old Kinetics400 class

* lint error

* Update torchvision/datasets/kinetics.py

* Updating README

* Remove BHWC to BCHW

* Put warning back

* formatting

Reviewed By: NicolasHug

Differential Revision: D36760942

fbshipit-source-id: b7c642b263f981247593511d3a8f667a9f1b963b

Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Bruno Korbar <[email protected]>
Co-authored-by: Vasilis Vryniotis <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants