Commit

add download link
liutaocode committed Jan 16, 2024
1 parent 3bbd1f8 commit 8ab18d6
Showing 1 changed file with 16 additions and 15 deletions.
31 changes: 16 additions & 15 deletions README.md
@@ -26,7 +26,7 @@ Compared with other multi-modal datasets, the segment length distribution of our

## Wavs

- [ Google Drive (7.56 GB)](https://drive.google.com/file/d/1I5qfuPPGBM9keJKz0VN-OYEeRMJ7dgpl)
- [Google Drive (7.56 GB)](https://drive.google.com/file/d/1I5qfuPPGBM9keJKz0VN-OYEeRMJ7dgpl)

md5: 0057f82daaddf2ce993d1bf0679929c4
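
The checksum above can be checked after downloading. The following is a minimal sketch, assuming the archive was saved locally; the file name `wavs.zip` in the usage note is a placeholder, not the name the download actually uses.

```python
import hashlib

# MD5 value listed for the wavs archive in this README.
EXPECTED_MD5 = "0057f82daaddf2ce993d1bf0679929c4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("wavs.zip")` returns `True` only if the download is byte-identical to the published archive.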

@@ -38,8 +38,7 @@ The video file name corresponds to the audio file name.

- [Cropped faces (Google Drive, 14.49 GB)](https://drive.google.com/file/d/1poGOdkXway5MkQEGWTtM9U7TegLSOw54)


(For Chinese users, you can use [Baidu Link](https://pan.baidu.com/s/1hnrSKVDD9QS1bUnx4lV-Zg?pwd=t5t9) for faster downloading speech. )
(Researchers in China can use [Baidu Drive](https://pan.baidu.com/s/1hnrSKVDD9QS1bUnx4lV-Zg?pwd=t5t9) to speed up downloads.)


Our multimodal speaker diarization baseline includes a subtask - active speaker detection. To train the active speaker detection algorithm ([TalkNet](https://github.com/TaoRuijie/TalkNet-ASD) mentioned in our paper), we utilize 'cropped faces.' These are randomly generated from videos based on video content and rttm labels, and subsequently, manually rectified. However, if you choose not to use these resources, you can ignore the 'cropped faces.'
@@ -57,48 +56,46 @@ There are four categories of cropped-face videos:
Time is given in seconds, and Segment_id corresponds to the cropped face video id within each video folder.



**[Updates]** Please disregard files with negative filenames (approximately 90 files).



**Notes**:

* The database is **ONLY** for research purposes.
* In response to community requests, we have uploaded a video.zip file due to some videos no longer being available online. This is to facilitate better replication of our work within the research community. These videos are solely for this purpose and must not be used otherwise. All usage must be in line with our [licensing agreement](MSDWILD_license_agreement.pdf). It's important to note that these materials may be removed at any time upon request from the original video owner.


## Face id with Bounding Boxes
## Videos with frame-by-frame face position annotation

We have added **bounding boxes** for every face in every frame. Our trained annotators have reviewed the facial annotations on each frame so that no faces are missed or incorrectly tagged, and they have realigned any improperly positioned bounding boxes. The refined annotations are archived in a correspondingly named directory, with the data stored in CSV files in the format shown below. [One Sample](https://drive.google.com/file/d/106yqmxF0yfimexCsDxufeTIb3JeUKL-c)

```
CSV line: 3363,face,1,398,129,479,244,0
Description: frame id, face(fixed), face_id, x1, y1, x2, y2, 0(fixed)
Description: frame id, face (fixed), face_id, x1, y1, x2, y2, 0 (fixed)
```
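
The CSV layout described above can be read with the standard library. This is a minimal sketch, not the authors' tooling: the `FaceBox` type and `parse_face_rows` function are hypothetical names, and it assumes that every field is an integer (as in the sample line) and that rows not matching the eight-column `face` layout can be skipped.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class FaceBox(NamedTuple):
    """One annotated face on one frame (hypothetical container type)."""
    frame_id: int
    face_id: int
    x1: int
    y1: int
    x2: int
    y2: int


def parse_face_rows(lines: Iterable[str]) -> Iterator[FaceBox]:
    """Yield a FaceBox for each 'frame id, face, face_id, x1, y1, x2, y2, 0' row."""
    for row in csv.reader(lines):
        # Skip blank lines and anything that does not match the documented layout.
        if len(row) != 8 or row[1].strip() != "face":
            continue
        frame_id, _, face_id, x1, y1, x2, y2, _ = (field.strip() for field in row)
        yield FaceBox(int(frame_id), int(face_id), int(x1), int(y1), int(x2), int(y2))
```

An open file object can be passed directly, for example `with open(path, newline="") as f: boxes = list(parse_face_rows(f))`, where `path` points to one of the annotation CSV files.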

Download link: [Google Drive (uploading, please wait 2-3 days)](#)

(Researchers in China can use [Baidu Drive](https://pan.baidu.com/s/1YpLMdCAcV0eG8fHmYf_lkw?pwd=msdb) to speed up downloads.)

[Full Data Downloading URL (uploading)](#)

### Preview Usage
### How to Preview Annotations

<img src='imgs/boundingbox1.jpg' width=70% />

Click `DarkLabel.exe` and select a video file to preview.

<img src='imgs/boundingbox2.jpg' width=70% />

Move the `slider` to preview the positions and ID information of faces on different frames, while trying not to alter any other default settings
Move the `slider` to preview the positions and IDs of faces on different frames, leaving all other default settings unchanged.

**Notes**:

* The aforementioned video files have been standardized to a frame rate of 25 frames per second (fps), while the [original](https://drive.google.com/file/d/1fGYcJvqCEikZpwDq_84q4Pau5qO5Was1) frame rate may have varied.
* [DarkLabel](https://github.com/darkpgmr/DarkLabel) can be used for labelling or preview here. You can also use other tools.
* DarkLabel only supports Windows currently and you may use [wine](https://github.com/darkpgmr/DarkLabel/issues/4) to run on Mac or Linux (Not tested).
* The result can not directly converted to exactly the same [RTTM](./rttms/all.rttm) as some duration or face ids are adjusted and off-screen speech is not included here.
* The facial identification in each video is unique and differs from the identifiers found in [RTTM](./rttms/all.rttm).
* [DarkLabel](https://github.com/darkpgmr/DarkLabel) can be used for labelling or preview here. The `csv` file is generated by it. You can also use other tools by converting the `csv` file.
* `DarkLabel` currently supports only Windows (Win10 or Win11); you may use wine (mentioned in this [issue](https://github.com/darkpgmr/DarkLabel/issues/4)) to run it on Mac or Linux.
* The result cannot be directly converted to exactly the same [RTTM](./rttms/all.rttm), because some durations and face ids were adjusted and off-screen speech is not included in this part. In addition, the face ids in each video are unique and differ from the identifiers in the [RTTM](./rttms/all.rttm) mentioned above.
* Unlike the cropped faces described above, the annotations here are bounding boxes of the unprocessed faces in the original videos.
* These annotations are intended only as supplementary material for this dataset. Possible future work we envision includes training an end-to-end multimodal speaker diarization system that incorporates face location information, and an evaluation method for multimodal speaker diarization that takes face location into account.
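
Because the annotated videos are standardized to 25 fps, a frame id from the CSV can be mapped to a timestamp in seconds. The sketch below assumes frame ids start at 0, which the README does not state; pass `first_frame=1` if the numbering turns out to be 1-based. The function name is hypothetical.

```python
FPS = 25  # frame rate of the standardized videos, as stated in the notes above


def frame_to_seconds(frame_id, fps=FPS, first_frame=0):
    """Convert a frame id to a time offset in seconds.

    first_frame is an assumption: 0 treats the first frame as time 0.0.
    """
    return (frame_id - first_frame) / fps
```

As noted above, timestamps derived this way are not guaranteed to line up exactly with the RTTM segments.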


@@ -127,6 +124,10 @@ You can refer to [URL](https://github.com/liutaocode/DiarizationVisualization) t

<img src='imgs/via_example.png' width=70% />

## Acknowledgments

Thanks to [You Zhang](https://github.com/yzyouzhang) for pointing out some annotation issues and helping to improve the quality of the dataset.

## Reference

```
Expand Down
