The MSR-VTT dataset, introduced by Microsoft Research, consists of 10K videos. It was collected using 257 queries, with an average of 118 videos per query. The split of the dataset into training and validation sets follows the statistics given in [1]. This repository contains the MSR-VTT video captioning dataset in Hindi. To preserve the integrity of the MSR-VTT dataset, all 20 English captions of each video are translated into Hindi. The Hindi captions produced by Google Translate need post-editing because some of the translations are erroneous. These errors are mostly due to ambiguous word senses and typos.
NOTE:
- Details of the videos and their IDs are available here
- To get the full data including the videos, please fill in the Google form; we will mail you all the details.
Dataset details:
Data separation | # Videos | # Hindi captions | # English captions |
---|---|---|---|
Training | 6153 | 123060 | 123060 |
Validation | 497 | 9940 | 9940 |
Test | 2990 | 59800 | 59800 |
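
The caption counts in the table follow from the 20 captions per video mentioned above. A minimal sketch of that check is given below; the numbers are copied from the table, while the names `splits` and `CAPTIONS_PER_VIDEO` are illustrative and not part of the dataset.

```python
# Split sizes copied from the table: (videos, Hindi captions, English captions).
splits = {
    "Training": (6153, 123060, 123060),
    "Validation": (497, 9940, 9940),
    "Test": (2990, 59800, 59800),
}

# Each video carries 20 English captions, each translated to Hindi.
CAPTIONS_PER_VIDEO = 20

for name, (videos, hindi, english) in splits.items():
    # Both languages should have exactly 20 captions per video.
    assert hindi == english == videos * CAPTIONS_PER_VIDEO
    print(f"{name}: {videos} videos x {CAPTIONS_PER_VIDEO} = {hindi} captions per language")
```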
Sample Hindi and English captions
The same format used in the MSR-VTT dataset is adopted:

    {
        "info" : {
            "year" : str,
            "version" : str,
            "description": str,
            "contributor": str,
            "data_created": str
        },
        "videos": {
            "id": int,
            "video_id": str,
            "category": int,
            "url": str,
            "start time": float,
            "end time": float,
            "split": str
        },
        "sentences": {
            "sen_id": int,
            "video_id": str,
            "caption": str
        }
    }
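
As a rough illustration of how an annotation file in this format might be read, the sketch below groups captions by `video_id` and filters videos by `split`. It assumes that `"videos"` and `"sentences"` each hold a list of objects with the fields shown above, as in the original MSR-VTT release. The helper names, the file path passed to `load_annotations`, and the sample values are hypothetical and only serve the example.

```python
import json
from collections import defaultdict


def load_annotations(path):
    """Read an annotation file; UTF-8 is needed for the Devanagari captions."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def captions_by_video(annotation):
    """Map each video_id to the list of its caption strings."""
    grouped = defaultdict(list)
    for sentence in annotation["sentences"]:
        grouped[sentence["video_id"]].append(sentence["caption"])
    return dict(grouped)


def video_ids_for_split(annotation, split):
    """Return the video_ids whose "split" field equals the given name."""
    return [v["video_id"] for v in annotation["videos"] if v["split"] == split]


# Tiny in-memory sample shaped like the format above (all values are made up).
sample = {
    "info": {"year": "", "version": "", "description": "",
             "contributor": "", "data_created": ""},
    "videos": [
        {"id": 0, "video_id": "video0", "category": 0, "url": "",
         "start time": 0.0, "end time": 10.0, "split": "train"},
        {"id": 1, "video_id": "video1", "category": 0, "url": "",
         "start time": 5.0, "end time": 15.0, "split": "test"},
    ],
    "sentences": [
        {"sen_id": 0, "video_id": "video0", "caption": "एक आदमी गाड़ी चला रहा है"},
        {"sen_id": 1, "video_id": "video0", "caption": "एक व्यक्ति कार चला रहा है"},
        {"sen_id": 2, "video_id": "video1", "caption": "एक लड़की गाना गा रही है"},
    ],
}

grouped = captions_by_video(sample)
train_ids = video_ids_for_split(sample, "train")
print(train_ids, len(grouped["video0"]))
```

To use it on a real file, replace `sample` with the result of `load_annotations(...)` pointed at the annotation file you received; the split names stored in the `"split"` field should be checked against the actual data.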
Acknowledgements: