Add verbose mode for dataset constructors to print directory scan stats + add a warning when video loading fails #3282

Closed
vadimkantorov opened this issue Jan 24, 2021 · 3 comments · Fixed by #3961

vadimkantorov commented Jan 24, 2021

I'm trying to create a VideoClips object from my custom folder with a video. It returns zero results because read_video_timestamps (with the PyAV backend) fails with the error av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input: 'data/pseudo-kinetics/train_256/class0/P01_01.MP4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found

The file may well be invalid, but it would still be better to print a warning

# TODO add a warning
(at least when some verbose flag is True; it may be worth introducing a verbose flag in dataset constructors), and maybe print stats over all files: how many loaded, how many were skipped because of their extensions, and how many had errors while loading. This would save a lot of time when creating a new dataset that has some problems.
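To make the proposal concrete, here is a minimal sketch of a directory scan that counts the three outcomes and reports them when a verbose flag is set. It is not torchvision code: `scan_with_stats`, its `probe` argument (any callable that raises when a file cannot be read) and the `verbose` flag are all hypothetical names used for illustration.

```python
import os
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def scan_with_stats(
    root: str,
    extensions: Iterable[str],
    probe: Callable[[str], object],
    verbose: bool = False,
) -> Tuple[List[str], Counter]:
    """Hypothetical helper: walk `root`, keep files `probe` can read, count the rest."""
    exts = tuple(e.lower() for e in extensions)
    stats = Counter(loaded=0, skipped_extension=0, failed=0)
    kept: List[str] = []
    for dirpath, _, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Extension check is case-insensitive so that ".MP4" matches ".mp4".
            if not name.lower().endswith(exts):
                stats["skipped_extension"] += 1
                continue
            try:
                probe(path)  # assumed to raise if the file cannot be loaded
            except Exception as exc:
                stats["failed"] += 1
                if verbose:
                    print(f"failed to load {path}: {exc}")
                continue
            stats["loaded"] += 1
            kept.append(path)
    if verbose:
        print(
            f"{stats['loaded']} loaded, "
            f"{stats['skipped_extension']} skipped by extension, "
            f"{stats['failed']} failed"
        )
    return kept, stats
```

With something like this, a folder that yields zero clips would report right away whether the files were filtered out by extension or failed while loading.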

cc @bjuncek

@datumbox
Contributor

Thanks for reporting.

I think it would be good to add a warning there instead of silently catching the exception. If you are interested, please send a PR that introduces a warning.
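A rough sketch of what "warn instead of silently catching" could look like, assuming a wrapper around whatever function reads the timestamps. The name `timestamps_or_warn` and the `reader` parameter are hypothetical; `reader` could be, for instance, `torchvision.io.read_video_timestamps`, and the actual fix in the library may look different.

```python
import warnings
from typing import Callable, List, Optional, Tuple

Timestamps = Tuple[List[int], Optional[float]]


def timestamps_or_warn(path: str, reader: Callable[[str], Timestamps]) -> Timestamps:
    """Hypothetical wrapper: return the reader's result, or warn and return an empty one."""
    try:
        return reader(path)
    except Exception as exc:
        # Surface the failure instead of swallowing it, but keep going
        # so that one bad file does not abort the whole scan.
        warnings.warn(f"Skipping {path}: {exc}", RuntimeWarning)
        return [], None
```

Using `warnings.warn` rather than `print` lets callers silence the message or turn it into an error through the standard warnings filters.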

vadimkantorov changed the title from "Indeed add a printed warning (at least when some verbose flag is True)" to "Add verbose mode for dataset constructors to print directory scan stats + add a warning when video loading fails" Jan 25, 2021
@vadimkantorov
Author

File lists became available: https://github.com/cvdfoundation/kinetics-dataset

So extensions could now be set properly

Contributor

bjuncek commented Jun 4, 2021

@vadimkantorov with #3680 we introduce a new Kinetics dataset class (Kinetics(num_classes=400)) that should solve the issue with differing extensions (you can also supply your own list of extensions, btw). #3932 raises an error for invalid files; #3961 raises a warning for possibly corrupt files.

As for the verbose dataset statistics, that is something we can perhaps keep in a separate issue in order to discuss with @pmeier after the dataset revamp has been implemented?

bjuncek self-assigned this Jun 4, 2021