-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Kinetics instead of Kinetics400 in references (#5787) #5952
Conversation
@@ -31,7 +30,6 @@ class VideoClassificationPresetEval: | |||
def __init__(self, *, crop_size, resize_size, mean=(0.43216, 0.394666, 0.37645), std=(0.22803, 0.22145, 0.216989)): | |||
self.transforms = transforms.Compose( | |||
[ | |||
ConvertBHWCtoBCHW(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can we simply remove this? It seems something is off here. The old Kinetics400
states
vision/torchvision/datasets/kinetics.py
Line 292 in 970ba35
- video (Tensor[T, H, W, C]): the `T` video frames |
whereas the new Kinetics
states
vision/torchvision/datasets/kinetics.py
Line 69 in 970ba35
- video (Tensor[T, C, H, W]): the `T` video frames in torch.uint8 tensor |
Was this intentional or did we botch this during the introduction of the new class? In any way, we need to adapt our deprecation warnings accordingly:
vision/torchvision/datasets/kinetics.py
Lines 252 to 254 in 970ba35
.. warning:: | |
This class was deprecated in ``0.12`` and will be removed in ``0.14``. Please use | |
``Kinetics(..., num_classes='400')`` instead. |
vision/torchvision/datasets/kinetics.py
Lines 308 to 311 in 970ba35
warnings.warn( | |
"The Kinetics400 class is deprecated since 0.12 and will be removed in 0.14." | |
"Please use Kinetics(..., num_classes='400') instead." | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intentional -- the first iteration was convenient because that's the raw output from FFMPEG/pyav, but tv ops like transforms and conv operators work on Tensor[T, C, H, W] (as you've seen you'd have to have a transform to deal with this). The change makes sense, and I'll update the deprication warning, how does that sound?
dataset = torchvision.datasets.Kinetics400( | ||
traindir, | ||
dataset = torchvision.datasets.Kinetics( | ||
args.data_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is done, because we can now select the split through the dataset class rather than using a different root directory, correct? If yes, can we remove the traindir
and valdir
variables above and from the CLI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup; and done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmeier @bjuncek I checked on the servers, I believe there is a reason why these parameter are there. The datasets contain the training/val data at different preprocessed sizes. Example:
/datasets01/kinetics/070618/400$ ls
README list list_cvt scripts test train train_avi-160p train_avi-288p val val_avi-160p val_avi-288p
Can we please restore this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with @datumbox :
the different pre-processed size don't seem to be part of the original downloadable dataset. Looks more like an internal pre-processing that was made by someone.
IF someone was relying on this originally, they should have gotten deprecation warning telling them to use Kinetics, and they should have noticed that this wasn't supported anymore, which they should have told us.
This never happened, so we can assume this support for the different sizes isn't actually needed. If it is, we'll know soon enough.
So we're OK to remove that.
torchvision/io/video.py
Outdated
else: | ||
warnings.warn( | ||
"The pts_unit 'pts' gives wrong results and will be removed in a " | ||
+ "follow-up version. Please use pts_unit 'sec'." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this warning, which seems to do the right thing, can't we change the reference script to not use "pts"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really; or rather, it used to. The issue is that the backend (both pyav and VideoReader) consume data in pts
which is unfortunate as they are not standardised per stream (e.g. audio and video have different timebases so pts=24 means different things). In a PR some time ago, we've tried to fix this by introducing sec
as a default unit that would then be converted to appropriate stream. This caused an imprecision issue (bc thigs would be incorrectly rounded) which broke all datasets bc VideoClips creation requires fairly precise pts and such rounding erros cause major issues. We've initially fixed these by using pts
as a default in VideoClips (which raises this warrning). This would be fine (i.e. I'd keep the warning and try to re-work the video clips) if all else was equal, but the PR mentioned above was reverted for internal issues, so this warrning is not really relevant anymore.
Having said that, I have a major PR with a VideoReading re-work in plan that will make this redundant in a way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds reasonable, but I have to little background to "sign this off". Who else from the team would be able to judge this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prabhat00155 most likely, as he was the one who originally implemented and then reverted the sync PR.
But we can also keep this open as I'm bound to finish the rework by the end of next week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with @datumbox
We'll put this warning back. There should only be 2 reasons to remove this warning:
- fix the actual bug
- completely forbid "pts".
@bjuncek @pmeier We need to speed up the work on this as we are about to cut the branch for the release. If more work is needed, I would just address the deprecation warning on a different PR. Concerning the removal of vision/torchvision/transforms/_presets.py Lines 96 to 102 in 769ae13
|
Hi @datumbox , In summary:
|
@bjuncek I'm OK with removing the deprecation warning until we have a proper video module to recommend. See #6056 Concerning the presets, the problem is that the removal of the permutation affects how our users use the existing models to perform inference. This is a big problem and we need to figure it out prior to the release. Else we will be looking at a BC breaking change on the near future. Effectively we need to clarify what's the order of dimensions that the preprocessing (preset) transforms accept and stick to it. |
Ok, that makes sense. |
My current understanding (could be wrong) is that the permutation should not be removed because For more info on using |
Geometric transforms, like Resize, should assume the input data of shape |
So, you are right, the If we want to prevent BC, we can simply comment out the lines referenced above, and return the transform to the reference. |
@datumbox The "BC break", i.e. the changed dimension order, happened in #3680 and more specifically after #3680 (comment) almost a year ago. |
Thanks for the info. I understand the reasons why we wanted to switch to To fix things prior the release, I think we need to:
Please let me know if this makes sense to you. |
parser.add_argument("--train-dir", default="train_avi-480p", type=str, help="name of train dir") | ||
parser.add_argument("--val-dir", default="val_avi-480p", type=str, help="name of val dir") | ||
parser.add_argument( | ||
"--kinetics-version", default="400", type=str, choices=["400", "600"], help="Select kinetics version" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to update the commands shown at README.md.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all. I've had a sync with @datumbox and I'll proceed to a few changes (listed below), and merge the PR.
I'll also follow up with a PR that adds a new format=...
parameter to relevant datasets and video readers.
dataset = torchvision.datasets.Kinetics400( | ||
traindir, | ||
dataset = torchvision.datasets.Kinetics( | ||
args.data_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with @datumbox :
the different pre-processed size don't seem to be part of the original downloadable dataset. Looks more like an internal pre-processing that was made by someone.
IF someone was relying on this originally, they should have gotten deprecation warning telling them to use Kinetics, and they should have noticed that this wasn't supported anymore, which they should have told us.
This never happened, so we can assume this support for the different sizes isn't actually needed. If it is, we'll know soon enough.
So we're OK to remove that.
torchvision/io/video.py
Outdated
else: | ||
warnings.warn( | ||
"The pts_unit 'pts' gives wrong results and will be removed in a " | ||
+ "follow-up version. Please use pts_unit 'sec'." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with @datumbox
We'll put this warning back. There should only be 2 reasons to remove this warning:
- fix the actual bug
- completely forbid "pts".
…5952) Summary: * Dataset creation now supports "new" version of Kinetics dataset * remove unnecessary warning for now * provide kinetics option * new reading somehow doesn't need BHWC to BCHW transform * Addressing minor comments * Adding kinetics deprication warning for the old Kinetics400 class * lint error * Update torchvision/datasets/kinetics.py * Updating README * Remove BHWC to BCHW * Put warning back * formatting Reviewed By: NicolasHug Differential Revision: D36760942 fbshipit-source-id: b7c642b263f981247593511d3a8f667a9f1b963b Co-authored-by: Nicolas Hug <[email protected]> Co-authored-by: Bruno Korbar <[email protected]> Co-authored-by: Vasilis Vryniotis <[email protected]> Co-authored-by: Nicolas Hug <[email protected]>
See #5787