
Visualization tools for audio-only and multi-modal speaker diarization dataset


liutaocode/DiarizationVisualization


Visualization Tools for Speaker Diarization

Introduction

Robust tools for visualizing speaker diarization are scarce, yet visualization is critical for analyzing datasets and algorithm outputs. This repository offers intuitive ways to display speaker diarization results. The main criterion for choosing the visualization software was support for interactive operation. These tools have room for improvement, but they are the best options we have found so far.

Go to: Visualization tool for Audio-only datasets

Go to: Visualization tool for Audio-visual datasets

Visualization for Audio-only datasets

Step 1: Generating Praat format

python audio_visualized.py -rttm audio_cases/afjiv.rttm -audio_path audio_cases/afjiv.wav -praat_result audio_cases/afjiv.txt
  • rttm --- the reference or system RTTM file
  • audio_path --- path to the audio file
  • praat_result --- output file to be opened in Praat

(Example is from VoxConverse)
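For orientation, an RTTM file stores one speaker turn per line in the form `SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>`. The following is a minimal sketch of reading such lines into per-speaker segments. `parse_rttm` is a hypothetical helper written for illustration; it is not the code in audio_visualized.py.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Group RTTM SPEAKER turns by speaker as (start, end) tuples in seconds."""
    segments = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and records that are not SPEAKER turns.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        speaker = fields[7]
        segments[speaker].append((onset, onset + duration))
    # Sort each speaker's turns by start time.
    return {spk: sorted(turns) for spk, turns in segments.items()}


if __name__ == "__main__":
    # Made-up example lines, not taken from afjiv.rttm.
    example = [
        "SPEAKER demo 1 0.50 1.25 <NA> <NA> spk00 <NA> <NA>",
        "SPEAKER demo 1 2.00 0.50 <NA> <NA> spk01 <NA> <NA>",
    ]
    for speaker, turns in parse_rttm(example).items():
        print(speaker, turns)
```

Keeping the turns grouped by speaker mirrors how the result is viewed later, with one timeline per speaker label.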

Step 2: Import praat_result into Praat

  • Install Praat (Mac or Windows)
  • Import praat_result into Praat
    • Open praat_result and the audio file
    • Select both objects
    • Click View & Edit

Step 3: Overview

Scroll horizontally to move along the recording. Speaker labels (e.g., spk00, spk01, ...) are shown on each timeline.
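In a Praat interval tier, intervals are contiguous and cover the whole time range, so silence between turns appears as intervals with an empty label. Assuming each speaker is drawn as such a tier (an assumption; the exact output of audio_visualized.py is not described here), the gap filling could be sketched as follows. `to_tier_intervals` is a hypothetical helper.

```python
def to_tier_intervals(turns, total_duration):
    """Return contiguous (start, end, label) intervals covering [0, total_duration].

    `turns` is a list of (start, end, label) tuples for one speaker, in seconds.
    """
    intervals = []
    cursor = 0.0
    for start, end, label in sorted(turns):
        start = max(start, cursor)      # clip any overlap with the previous turn
        end = min(end, total_duration)  # clip turns that run past the audio
        if end <= start:
            continue
        if start > cursor:
            intervals.append((cursor, start, ""))  # unlabeled gap before the turn
        intervals.append((start, end, label))
        cursor = end
    if cursor < total_duration:
        intervals.append((cursor, total_duration, ""))  # trailing gap
    return intervals


if __name__ == "__main__":
    turns = [(0.5, 1.75, "spk00"), (3.0, 4.0, "spk00")]
    for interval in to_tier_intervals(turns, total_duration=5.0):
        print(interval)
```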

Some useful shortcuts:

  • CMD + A: Show all utterances on one screen.
  • CMD + N: Zoom in to the selected area.

Visualization for Audio-visual datasets

Step 1: Generating VIA format

python audio_visual_visualized.py -rttm audio_visual_cases/00115.rttm -mp4_path audio_visual_cases/00115.mp4 -via_json_result audio_visual_cases/00115.json
  • rttm --- the reference or system RTTM file
  • mp4_path --- path to the mp4 video
  • via_json_result --- output JSON file to be imported into VIA

(Example is from MSDWild)

If the video cannot be previewed, or the preview loads slowly, try converting it to an HTML5-compatible mp4 (H.264 video, AAC audio):

ffmpeg -i original.mp4 -vcodec libx264 -acodec aac -preset fast -movflags +faststart  previewed.mp4
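When several videos need converting, the same command can be driven from Python. This is a minimal sketch that assumes ffmpeg is installed and on the PATH; `build_ffmpeg_command` and `convert_for_preview` are hypothetical helper names, not part of this repository. The `+faststart` flag moves the index (moov atom) to the start of the file, which lets a browser begin playback before the whole file has loaded.

```python
import subprocess


def build_ffmpeg_command(src, dst):
    """Build the ffmpeg argument list for an H.264/AAC, fast-start mp4."""
    return [
        "ffmpeg", "-i", src,
        "-vcodec", "libx264",
        "-acodec", "aac",
        "-preset", "fast",
        "-movflags", "+faststart",
        dst,
    ]


def convert_for_preview(src, dst):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without ffmpeg.
    print(" ".join(build_ffmpeg_command("original.mp4", "previewed.mp4")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.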

Step 2: Import the generated JSON (via_json_result) into VIA

  • Download via_video_annotator.html from URL, or use an online demo directly. The annotator runs as an offline client in the browser; we tested with version via-3.0.11 (see via_video_annotator_3.0.11.html in this repo).
  • Import the JSON by clicking the folder button.
  • You can also modify the script to support online video URLs, e.g., from an OSS (Object Storage Service).

Step 3: Overview

Press the Space key to play or pause the media.

More keys can be found on:
