
Download and Kinetics 400/600/700 Datasets #3680

Merged Jun 10, 2021 (72 commits)

Changes from 10 commits

Commits
5e60e01
Initial commit
bjuncek Apr 16, 2021
62c5e05
pmeiers comments
bjuncek Apr 20, 2021
090e526
pmeiers changes
bjuncek Apr 20, 2021
9403bcf
pmeiers comments
bjuncek Apr 20, 2021
e08cf09
replace pandas with system library to avoid crashes
bjuncek Apr 22, 2021
29a4f03
Lint
bjuncek Apr 22, 2021
8cd5209
Lint
bjuncek Apr 22, 2021
a6d2490
fixing unittest
bjuncek Apr 22, 2021
c2076a7
Merge branch 'master' into bkorbar/datasets/kinetics
bjuncek Apr 22, 2021
93d1444
Minor comments removal
bjuncek Apr 22, 2021
9e3d3f3
pmeier comments
bjuncek Apr 30, 2021
139ec6d
remove asserts
bjuncek Apr 30, 2021
33a9f98
address pmeier formatting changes
bjuncek Apr 30, 2021
abdd2f6
address pmeier changes
bjuncek Apr 30, 2021
1460886
pmeier changes
bjuncek Apr 30, 2021
7b06906
rename n_classes to num_classes
bjuncek Apr 30, 2021
62f7fd5
Merge branch 'bkorbar/datasets/kinetics' of github.com:bjuncek/vision…
bjuncek Apr 30, 2021
e76f4ab
formatting changes
bjuncek Apr 30, 2021
0a8f216
doc change to add ".mp4" to backported class
bjuncek Apr 30, 2021
94a40aa
formatting to correct line length
bjuncek Apr 30, 2021
c585a5f
adding **kwargs to Kinetics400 class
bjuncek Apr 30, 2021
8cacd80
remove urlib request and download the file directly
bjuncek Apr 30, 2021
802f8f9
annotations and files can be already downloaded
bjuncek Apr 30, 2021
af70e5f
test fix
bjuncek May 4, 2021
6ec3253
add download tests for Kinetics
pmeier May 4, 2021
b84b298
users now dont need to provide full path within the root for new Kine…
bjuncek May 4, 2021
adbc2f8
Merge branch 'master' into bkorbar/datasets/kinetics
bjuncek May 4, 2021
d7f14d0
linter
bjuncek May 4, 2021
34c4323
Merge branch 'bkorbar/datasets/kinetics' of github.com:bjuncek/vision…
bjuncek May 4, 2021
96e2bec
Update test/test_datasets_download.py
pmeier May 5, 2021
20dc75d
Update torchvision/datasets/kinetics.py
bjuncek May 5, 2021
5ea1232
revert whitespace (3680#discussion_r626382842)
bjuncek May 5, 2021
607a3cb
addressing annotation_path parameter which is unnecessary
bjuncek May 5, 2021
da586c6
Update torchvision/datasets/kinetics.py
bjuncek May 5, 2021
23cc7f3
Merge branch 'bkorbar/datasets/kinetics' of github.com:bjuncek/vision…
bjuncek May 5, 2021
fd2208b
Update torchvision/datasets/kinetics.py
bjuncek May 5, 2021
0dc04d3
kwargs update
bjuncek May 5, 2021
d645f93
Merge branch 'bkorbar/datasets/kinetics' of github.com:bjuncek/vision…
bjuncek May 5, 2021
2bdd820
expose num_download_workers as public
bjuncek May 5, 2021
5640dd9
swap os.isfile with check_integrity
bjuncek May 5, 2021
9ef70da
nit on private things
bjuncek May 5, 2021
b7b81b1
special case if there are no default arguments
bjuncek May 5, 2021
36bd2c7
revert changes to kinetics400 test case for BC
bjuncek May 5, 2021
2bda79c
add split_folder changes and support for legacy format
bjuncek May 5, 2021
1a7a978
pmeiers suggestions
bjuncek May 11, 2021
89e41e6
pmeiers suggestions - root comment
bjuncek May 11, 2021
5941dab
pmeiers comments - annotation attribute remmoved
bjuncek May 11, 2021
72d260a
pmeiers suggestion
bjuncek May 11, 2021
51231cf
pmeiers suggestion
bjuncek May 11, 2021
7b91bbe
pmeiers suggestion
bjuncek May 11, 2021
cd2e55a
pmeiers suggestion
bjuncek May 11, 2021
7b322e9
Update torchvision/datasets/kinetics.py
bjuncek May 11, 2021
328c84e
Update torchvision/datasets/kinetics.py
bjuncek May 11, 2021
22e5d48
Update torchvision/datasets/kinetics.py
bjuncek May 11, 2021
173d385
Update torchvision/datasets/kinetics.py
bjuncek May 11, 2021
5a7db27
Update torchvision/datasets/kinetics.py
bjuncek May 11, 2021
44030ee
Update torchvision/datasets/kinetics.py
bjuncek May 11, 2021
ce5f80b
minor debugging
bjuncek May 11, 2021
803bab1
nit picks
bjuncek May 11, 2021
6e64bb6
only include public kwargs into defaults
pmeier May 12, 2021
8b64d1d
add _use_legacy_structure in favour of **kwargs
pmeier May 12, 2021
94b21cc
add type hints for Kinetics400
pmeier May 12, 2021
f803946
flake8
pmeier May 12, 2021
b39646a
flake8
pmeier May 12, 2021
c47c309
flake8
pmeier May 12, 2021
18ad36d
rename to make thigs clearer
bjuncek May 24, 2021
d0fa6f4
Merge branch 'master' into bkorbar/datasets/kinetics
pmeier May 25, 2021
61334f0
Merge branch 'master' into bkorbar/datasets/kinetics
pmeier Jun 7, 2021
12b76d7
permuting the output
bjuncek Jun 8, 2021
89b9bee
Merge branch 'master' into bkorbar/datasets/kinetics
bjuncek Jun 8, 2021
1ee00b2
Merge branch 'master' into bkorbar/datasets/kinetics
pmeier Jun 9, 2021
a22e4e7
Merge branch 'master' into bkorbar/datasets/kinetics
fmassa Jun 10, 2021
4 changes: 2 additions & 2 deletions torchvision/datasets/__init__.py
@@ -20,7 +20,7 @@
from .sbd import SBDataset
from .vision import VisionDataset
from .usps import USPS
from .kinetics import Kinetics400
from .kinetics import Kinetics400, Kinetics
from .hmdb51 import HMDB51
from .ucf101 import UCF101
from .places365 import Places365
@@ -34,6 +34,6 @@
'Omniglot', 'SBU', 'Flickr8k', 'Flickr30k',
'VOCSegmentation', 'VOCDetection', 'Cityscapes', 'ImageNet',
'Caltech101', 'Caltech256', 'CelebA', 'WIDERFace', 'SBDataset',
'VisionDataset', 'USPS', 'Kinetics400', 'HMDB51', 'UCF101',
'VisionDataset', 'USPS', 'Kinetics400', "Kinetics", 'HMDB51', 'UCF101',
'Places365', 'Kitti',
)
279 changes: 261 additions & 18 deletions torchvision/datasets/kinetics.py
@@ -1,15 +1,30 @@
from .utils import list_dir
import urllib
import time
import os
import warnings


from os import path
import csv
from typing import Callable, Optional
from functools import partial
from multiprocessing import Pool

from .utils import download_and_extract_archive, download_url
from .folder import find_classes, make_dataset
from .video_utils import VideoClips
from .vision import VisionDataset


class Kinetics400(VisionDataset):
"""
`Kinetics-400 <https://deepmind.com/research/open-source/open-source-datasets/kinetics/>`_
def _dl_wrap(tarpath, videopath, line):
download_and_extract_archive(line, tarpath, videopath)


class Kinetics(VisionDataset):
"""` Generic Kinetics <https://deepmind.com/research/open-source/open-source-datasets/kinetics/>`_
dataset.

Kinetics-400 is an action recognition video dataset.
Kinetics-400/600/700 are action recognition video datasets.
This dataset considers every video as a collection of video clips of fixed size, specified
by ``frames_per_clip``, where the step in frames between each clip is given by
``step_between_clips``.
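The clip arithmetic described above can be sketched in plain Python. This is a hypothetical helper for illustration, not code from this PR:

```python
def num_clips(num_frames, frames_per_clip, step_between_clips):
    # Clips that would have fewer than frames_per_clip frames are
    # dropped, matching the behaviour described in the docstring.
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1

# Two videos of 10 and 15 frames with frames_per_clip=5 and
# step_between_clips=5 yield 2 + 3 = 5 dataset elements.
sizes = [num_clips(n, 5, 5) for n in (10, 15)]
print(sizes, sum(sizes))  # [2, 3] 5
```

The same counts appear in the worked example in the Kinetics400 docstring further down this diff.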
@@ -20,11 +35,9 @@ class Kinetics400(VisionDataset):
Note that we drop clips which do not have exactly ``frames_per_clip`` elements, so not all
frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.

Args:
root (string): Root directory of the Kinetics-400 Dataset. Should be structured as follows:

root (string): Root directory of the (split of the) Kinetics Dataset.
Directory should be structured as follows:
.. code::

root/
@@ -35,29 +48,85 @@ class Kinetics400(VisionDataset):
└── class2
├── clipx.avi
└── ...

If the split directory is not present in ``root``, it is appended automatically using the ``split`` argument.
num_classes (str): select between Kinetics-400 ("400"), Kinetics-600 ("600"), and Kinetics-700 ("700")
split (str): split of the dataset to consider; currently supports ["train", "val"]
frame_rate (float): If not None, interpolate different frame rate for each clip.
frames_per_clip (int): number of frames in a clip
step_between_clips (int): number of frames between each clip
annotation_path (str): path to official Kinetics annotation file.
transform (callable, optional): A function/transform that takes in a TxHxWxC video
and returns a transformed version.
download (bool): Download the official version of the dataset to root folder.
num_workers (int): Use multiple workers for VideoClips creation
_num_download_workers (int): Use multiprocessing in order to speed up download.

Returns:
tuple: A 3-tuple with the following entries:

- video (Tensor[T, H, W, C]): the `T` video frames
- video (Tensor[T, H, W, C]): the `T` video frames as a torch.uint8 tensor
- audio(Tensor[K, L]): the audio frames, where `K` is the number of channels
and `L` is the number of points
and `L` is the number of points, as a torch.float tensor
- label (int): class of the video clip

Raises:
RuntimeError: If ``download is True`` and the image archive is already extracted.
"""

def __init__(self, root, frames_per_clip, step_between_clips=1, frame_rate=None,
extensions=('avi',), transform=None, _precomputed_metadata=None,
num_workers=1, _video_width=0, _video_height=0,
_video_min_dimension=0, _audio_samples=0, _audio_channels=0):
super(Kinetics400, self).__init__(root)
_FILES = {
"400": "https://s3.amazonaws.com/kinetics/400/{split}/k400_{split}_path.txt",
"600": "https://s3.amazonaws.com/kinetics/600/{split}/k600_{split}_path.txt",
"700": "https://s3.amazonaws.com/kinetics/700_2020/{split}/k700_2020_{split}_path.txt",
}
_ANNOTATION = {
"400": "https://s3.amazonaws.com/kinetics/400/annotations/{split}.csv",
"600": "https://s3.amazonaws.com/kinetics/600/annotations/{split}.txt",
"700": "https://s3.amazonaws.com/kinetics/700_2020/annotations/{split}.csv",
}

def __init__(
self,
root: str,
num_classes: str = "400",
split: str = "train",
frame_rate: Optional[float] = None,
frames_per_clip: int = 5,
step_between_clips: int = 1,
annotation_path: Optional[str] = None,
transform: Optional[Callable] = None,
extensions=("avi", "mp4"),
download: bool = False,
num_workers: int = 1,
_precomputed_metadata=None,
_num_download_workers=1,
_video_width=0,
_video_height=0,
_video_min_dimension=0,
_audio_samples=0,
_audio_channels=0,
) -> None:

# TODO: support test
assert split in ["train", "val"]
assert num_classes in ["400", "600", "700"]
self.n_classes = num_classes
self.extensions = extensions
self._num_download_workers = _num_download_workers

self.root = root
self.split = split

if annotation_path is not None:
self.annotations = annotation_path

if download:
self.download_and_process_videos()
super().__init__(self.root)

self.classes, class_to_idx = find_classes(self.root)
self.samples = make_dataset(self.root, class_to_idx, extensions, is_valid_file=None)
self.samples = make_dataset(
self.root, class_to_idx, extensions, is_valid_file=None
)
video_list = [x[0] for x in self.samples]
self.video_clips = VideoClips(
video_list,
@@ -74,6 +143,93 @@ def __init__(self, root, frames_per_clip, step_between_clips=1, frame_rate=None,
)
self.transform = transform

def download_and_process_videos(self) -> None:
"""
downloads all the videos to the _root_ folder
in the expected format
"""
tic = time.time()
self._download_videos()
toc = time.time()
print("Elapsed time for downloading in mins ", (toc - tic) / 60)
self._make_ds_structure()
toc2 = time.time()
print("Elapsed time for processing in mins ", (toc2 - toc) / 60)
print("Elapsed time overall in mins ", (toc2 - tic) / 60)

def _download_videos(self) -> None:
"""download tarballs containing the video to
"tars" folder and extract them into the _split_ folder
where split is one of the official dataset splits.

Raises:
RuntimeError: if download folder exists, break to prevent
downloading entire dataset again.
"""
if path.exists(self.root):
raise RuntimeError(
f"The directory {self.root} already exists. If you want to re-download or re-extract the images, "
f"delete the directory."
)

file_url = urllib.request.urlopen(
self._FILES[self.n_classes].format(split=self.split)
)
kinetics_dir, _ = path.split(self.root)
tar_path = path.join(kinetics_dir, "tars")
annotation_path = path.join(kinetics_dir, "annotations")

# download annotations
download_url(
self._ANNOTATION[self.n_classes].format(split=self.split), annotation_path
)
self.annotations = os.path.join(annotation_path, f"{self.split}.csv")

if self._num_download_workers == 1:
for line in file_url:
line = str(line.decode("utf-8")).replace("\n", "")
download_and_extract_archive(line, tar_path, self.root)
else:
part = partial(_dl_wrap, tar_path, self.root)
lines = [str(line.decode("utf-8")).replace("\n", "") for line in file_url]
poolproc = Pool(self._num_download_workers)
poolproc.map(part, lines)

def _make_ds_structure(self):
"""move videos from
root/
├── clip1.avi
├── clip2.avi

to the correct format as described below:
root/
├── class1
│ ├── clip1.avi

"""
file_tmp = "{ytid}_{start:06}_{end:06}.mp4"
with open(self.annotations) as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
f = file_tmp.format(
ytid=row["youtube_id"],
start=int(row["time_start"]),
end=int(row["time_end"]),
)
label = (
row["label"]
.replace(" ", "_")
.replace("'", "")
.replace("(", "")
.replace(")", "")
)
os.makedirs(os.path.join(self.root, label), exist_ok=True)
existing_file = os.path.join(self.root, f)
if os.path.isfile(existing_file):
os.replace(
existing_file, os.path.join(self.root, label, f),
)

@property
def metadata(self):
return self.video_clips.metadata
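The filename template and label sanitisation used by `_make_ds_structure` above can be exercised standalone; the YouTube id and label below are made-up illustrations, not values from the dataset:

```python
# Same template string as in the diff above.
file_tmp = "{ytid}_{start:06}_{end:06}.mp4"

# Start/end times are zero-padded to six digits.
f = file_tmp.format(ytid="dQw4w9WgXcQ", start=7, end=17)

# Labels drop spaces, apostrophes, and parentheses, as in the diff.
label = (
    "playing drums (rock)"
    .replace(" ", "_")
    .replace("'", "")
    .replace("(", "")
    .replace(")", "")
)

print(f)      # dQw4w9WgXcQ_000007_000017.mp4
print(label)  # playing_drums_rock
```

This is why the `os.replace` call can locate each downloaded clip purely from the annotation CSV row and move it into its class folder.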
@@ -89,3 +245,90 @@ def __getitem__(self, idx):
video = self.transform(video)

return video, audio, label


class Kinetics400(Kinetics):
"""
`Kinetics-400 <https://deepmind.com/research/open-source/open-source-datasets/kinetics/>`_
dataset.

Kinetics-400 is an action recognition video dataset.
This dataset considers every video as a collection of video clips of fixed size, specified
by ``frames_per_clip``, where the step in frames between each clip is given by
``step_between_clips``.

To give an example, for 2 videos with 10 and 15 frames respectively, if ``frames_per_clip=5``
and ``step_between_clips=5``, the dataset size will be (2 + 3) = 5, where the first two
elements will come from video 1, and the next three elements from video 2.
Note that we drop clips which do not have exactly ``frames_per_clip`` elements, so not all
frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.

Args:
root (string): Root directory of the Kinetics-400 Dataset. Should be structured as follows:

.. code::

root/
├── class1
│ ├── clip1.avi
│ ├── clip2.avi
│ └── ...
└── class2
├── clipx.avi
└── ...

frames_per_clip (int): number of frames in a clip
step_between_clips (int): number of frames between each clip
transform (callable, optional): A function/transform that takes in a TxHxWxC video
and returns a transformed version.

Returns:
tuple: A 3-tuple with the following entries:

- video (Tensor[T, H, W, C]): the `T` video frames
- audio(Tensor[K, L]): the audio frames, where `K` is the number of channels
and `L` is the number of points
- label (int): class of the video clip
"""

def __init__(
self,
root,
frames_per_clip,
step_between_clips=1,
frame_rate=None,
extensions=("avi",),
transform=None,
_precomputed_metadata=None,
num_workers=1,
_video_width=0,
_video_height=0,
_video_min_dimension=0,
_audio_samples=0,
_audio_channels=0,
):
warnings.warn(
"torchvision now supports multiple versions of Kinetics "
"datasets, available via the Kinetics class with a separate "
"num_classes parameter. This class might get deprecated in the future."
)

super(Kinetics400, self).__init__(
root=root,
num_classes="400",
frame_rate=frame_rate,
step_between_clips=step_between_clips,
frames_per_clip=frames_per_clip,
extensions=extensions,
transform=transform,
_precomputed_metadata=_precomputed_metadata,
num_workers=num_workers,
_video_width=_video_width,
_video_height=_video_height,
_video_min_dimension=_video_min_dimension,
_audio_channels=_audio_channels,
_audio_samples=_audio_samples,
download=False,
)
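The backward-compatibility pattern used by Kinetics400 above (a thin subclass that warns and forwards pinned arguments to the new class) can be sketched independently of torchvision; the class names here are invented for illustration:

```python
import warnings

class NewDataset:
    """Stand-in for the new, more general dataset class."""
    def __init__(self, root, num_classes="400"):
        self.root = root
        self.num_classes = num_classes

class OldDataset(NewDataset):
    """Deprecated alias that pins num_classes, mirroring Kinetics400."""
    def __init__(self, root):
        warnings.warn(
            "OldDataset is deprecated; use NewDataset with the "
            "num_classes parameter instead."
        )
        super().__init__(root=root, num_classes="400")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ds = OldDataset("/data/kinetics")

print(ds.num_classes, len(caught))  # 400 1
```

Keeping the alias as a subclass means existing `isinstance` checks and pickled references keep working while new code migrates to the general class.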